The call flow through the code:
1. `nova.scheduler.client.query.SchedulerQueryClient#select_destinations`
2. `nova.scheduler.rpcapi.SchedulerAPI#select_destinations`
3. `nova.scheduler.manager.SchedulerManager#select_destinations`
4. `nova.scheduler.filter_scheduler.FilterScheduler#select_destinations`
The hop from the scheduler's rpcapi to its manager is a synchronous call.
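As a rough sketch (not the exact Nova source; the real rpcapi carries more arguments and version negotiation), that synchronous hop relies on oslo.messaging's blocking `call()`:

```python
# Minimal sketch, assuming an oslo.messaging transport is already set up;
# the topic/version values here are illustrative only.
import oslo_messaging as messaging

class SchedulerAPI(object):
    def __init__(self, transport):
        target = messaging.Target(topic='scheduler', version='4.0')
        self.client = messaging.RPCClient(transport, target)

    def select_destinations(self, ctxt, spec_obj):
        cctxt = self.client.prepare()
        # call() blocks until SchedulerManager.select_destinations returns,
        # which is what makes the rpcapi -> manager hop synchronous
        # (a cast() would be fire-and-forget instead).
        return cctxt.call(ctxt, 'select_destinations', spec_obj=spec_obj)
```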
In step 3, the scheduler calls the API provided by placement to do an initial screening of all `compute node`s. The placement API returns a dictionary in the following format:
{ "provider_summaries": { "4cae2ef8-30eb-4571-80c3-3289e86bd65c": { "resources": { "VCPU": { "used": 2, "capacity": 64 }, "MEMORY_MB": { "used": 1024, "capacity": 11374 }, "DISK_GB": { "used": 2, "capacity": 49 } } } }, "allocation_requests": [ { "allocations": [ { "resource_provider": { "uuid": "4cae2ef8-30eb-4571-80c3-3289e86bd65c" }, "resources": { "VCPU": 1, "MEMORY_MB": 512, "DISK_GB": 1 } } ] } ] }
The scheduler then screens the nodes returned by the placement API a second time. The rough pipeline is: all hosts => filtering => weighting => random.
1. get all hosts: "all hosts" here does not mean every host in the environment, but the detailed information of all the hosts returned by the placement API;
2. filtering: first the ignore hosts and force hosts are handled; if a force host or force node is specified, it is returned directly. Then, based on the available_filters and enabled_filters options in nova's configuration file, every filter is executed in turn. A few filter examples follow; the entry point for running the filters is:
`nova.filters.BaseFilterHandler#get_filtered_objects`:

```python
def get_filtered_objects(self, filters, objs, spec_obj, index=0):
    list_objs = list(objs)
    LOG.debug("Starting with %d host(s)", len(list_objs))
    part_filter_results = []
    full_filter_results = []
    log_msg = "%(cls_name)s: (start: %(start)s, end: %(end)s)"
    # Iterate over the filters specified in the configuration file
    for filter_ in filters:
        if filter_.run_filter_for_index(index):
            cls_name = filter_.__class__.__name__
            # Record how many hosts there were before running this filter
            start_count = len(list_objs)
            # Run this filter against all hosts; only the hosts that pass it are returned
            objs = filter_.filter_all(list_objs, spec_obj)
            if objs is None:
                LOG.debug("Filter %s says to stop filtering", cls_name)
                return
            list_objs = list(objs)
            end_count = len(list_objs)
            part_filter_results.append(log_msg % {"cls_name": cls_name,
                                                  "start": start_count,
                                                  "end": end_count})
            if list_objs:
                remaining = [(getattr(obj, "host", obj),
                              getattr(obj, "nodename", ""))
                             for obj in list_objs]
                full_filter_results.append((cls_name, remaining))
            else:
                LOG.info(_LI("Filter %s returned 0 hosts"), cls_name)
                full_filter_results.append((cls_name, None))
                break
            LOG.debug("Filter %(cls_name)s returned "
                      "%(obj_len)d host(s)",
                      {'cls_name': cls_name, 'obj_len': len(list_objs)})
    # What follows only prints some detailed information to the logs, omitted here
    # …………
    return list_objs
```
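To make the loop above concrete, here is a sketch of what a hypothetical filter plugged into it could look like (the class name and the 10 GB threshold are made up). `filter_all()` from the base class calls `host_passes()` once per host, which is how `list_objs` shrinks filter by filter; such a class would then be listed in enabled_filters under nova.conf's `[filter_scheduler]` section.

```python
# Hypothetical example filter, for illustration only.
from nova.scheduler import filters

class MinFreeDiskFilter(filters.BaseHostFilter):
    """Reject hosts with less than 10 GB of free disk."""

    def host_passes(self, host_state, spec_obj):
        # Called by BaseFilter.filter_all() for every candidate host; returning
        # False drops the host from list_objs in get_filtered_objects() above.
        return host_state.free_disk_mb >= 10 * 1024
```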
Next, let's look at a few of the filters.
```python
class AvailabilityZoneFilter(filters.BaseHostFilter):
    # When several instances are created in one request, AvailabilityZoneFilter
    # runs only once
    run_filter_once_per_request = True

    # Every filter has to implement this method
    def host_passes(self, host_state, spec_obj):
        # Get the availability_zone specified in the request_spec. Note that if
        # --availability-zone was not given at creation time, availability_zone
        # in the request_spec is empty.
        availability_zone = spec_obj.availability_zone

        # If availability_zone in the request_spec is empty, the operation is
        # allowed to cross AZs.
        if not availability_zone:
            return True

        # Get the host's availability_zone information: first fetch the aggregates
        # this host belongs to; the aggregate metadata carries the
        # availability_zone information.
        metadata = utils.aggregate_metadata_get_by_host(
            host_state, key='availability_zone')

        if 'availability_zone' in metadata:
            # Check whether the availability_zone requested in the request_spec
            # is among the availability zones this host belongs to.
            hosts_passes = availability_zone in metadata['availability_zone']
            host_az = metadata['availability_zone']
        else:
            hosts_passes = availability_zone == CONF.default_availability_zone
            host_az = CONF.default_availability_zone

        if not hosts_passes:
            LOG.debug("Availability Zone '%(az)s' requested. "
                      "%(host_state)s has AZs: %(host_az)s",
                      {'host_state': host_state,
                       'az': availability_zone,
                       'host_az': host_az})

        return hosts_passes
```
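Note that `aggregate_metadata_get_by_host()` collects the metadata of every aggregate the host belongs to, so `metadata['availability_zone']` is a set of zone names. A tiny worked example with made-up values:

```python
# Made-up data: the host sits in one aggregate whose metadata binds it to az-1.
metadata = {'availability_zone': {'az-1'}}

print('az-1' in metadata['availability_zone'])  # True  -> host passes
print('az-2' in metadata['availability_zone'])  # False -> host is filtered out
```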
`nova.scheduler.filters.image_props_filter.ImagePropertiesFilter#host_passes` filters mainly on the image's property values and is used in ironic scheduling:

```python
    def host_passes(self, host_state, spec_obj):
        image_props = spec_obj.image.properties if spec_obj.image else {}

        # Check whether this compute node supports the values specified in the
        # image's properties.
        if not self._instance_supported(host_state, image_props,
                                        host_state.hypervisor_version):
            LOG.debug("%(host_state)s does not support requested "
                      "instance_properties",
                      {'host_state': host_state})
            return False
        return True

    def _instance_supported(self, host_state, image_props,
                            hypervisor_version):
        img_arch = image_props.get('hw_architecture')    # architecture: i686 or x86_64
        img_h_type = image_props.get('img_hv_type')      # hypervisor type
        img_vm_mode = image_props.get('hw_vm_mode')      # virtualization type
        # …………

        # Get the instance types this compute node supports; the return value is
        # a list, for example:
        #   [["x86_64", "baremetal", "hvm"]]
        #   [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"],
        #    ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"]]
        supp_instances = host_state.supported_instances
        # …………

        # Comparison rule
        def _compare_props(props, other_props):
            # Iterate over all values specified in the image's properties
            for i in props:
                # Check whether this property is supported by the compute node
                if i and i not in other_props:
                    return False
            return True

        # Iterate over all types supported by this compute node
        for supp_inst in supp_instances:
            if _compare_props(checked_img_props, supp_inst):
                return True
        return False
```
For ironic scheduling, ImagePropertiesFilter is the one we rely on: the property values in images used by VMs differ from those in images used by bare-metal nodes. Combined with the corresponding placement scheduling, this ensures that VMs are never scheduled onto ironic nodes and that bare-metal instances are never scheduled onto qemu nodes.
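A worked sketch of that idea with made-up property values: a bare-metal image typically carries `img_hv_type=baremetal`, which only an ironic node's `supported_instances` can match, while a qemu/kvm node only matches images whose hypervisor type is `qemu` or `kvm`:

```python
# Made-up data illustrating _instance_supported() above.
ironic_supported = [('x86_64', 'baremetal', 'hvm')]
qemu_supported = [('i686', 'qemu', 'hvm'), ('x86_64', 'qemu', 'hvm'),
                  ('x86_64', 'kvm', 'hvm')]

# (hw_architecture, img_hv_type, hw_vm_mode) of a hypothetical bare-metal image
baremetal_image_props = ('x86_64', 'baremetal', 'hvm')

def instance_supported(img_props, supported):
    # Same rule as _compare_props(): every non-empty image property must appear
    # in at least one supported (arch, hv_type, vm_mode) tuple.
    return any(all(p in entry for p in img_props if p) for entry in supported)

print(instance_supported(baremetal_image_props, ironic_supported))  # True
print(instance_supported(baremetal_image_props, qemu_supported))    # False
```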
3. weighting: compute a weight for each of the filtered hosts and sort them from best to worst. A few weigher examples:
```python
class BaseWeightHandler(loadables.BaseLoader):
    object_class = WeighedObject

    def get_weighed_objects(self, weighers, obj_list, weighing_properties):
        """Return a sorted (descending), normalized list of WeighedObjects."""
        # obj_list is the list of hosts that passed the filters
        # weighing_properties is the request_spec information
        weighed_objs = [self.object_class(obj, 0.0) for obj in obj_list]

        # If only one host is left after filtering, there is nothing to compare;
        # return it directly.
        if len(weighed_objs) <= 1:
            return weighed_objs

        # Compute the weights one by one, using the weigher classes specified in
        # the configuration file
        for weigher in weighers:
            # RAMWeigher is used below as the example
            weights = weigher.weigh_objects(weighed_objs, weighing_properties)

            # Normalize the weights
            weights = normalize(weights,
                                minval=weigher.minval,
                                maxval=weigher.maxval)

            for i, weight in enumerate(weights):
                obj = weighed_objs[i]
                # Store the computed weight on the host object and add up the
                # weights of all weigher types. To give one kind of weight a
                # larger share, adjust the corresponding *_weight_multiplier
                # option; e.g. to make memory count for more in the final weight,
                # raise ram_weight_multiplier.
                obj.weight += weigher.weight_multiplier() * weight

        # Sort by weight in descending order
        return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)


class RAMWeigher(weights.BaseHostWeigher):
    minval = 0

    def weight_multiplier(self):
        """Override the weight multiplier."""
        return CONF.filter_scheduler.ram_weight_multiplier

    def _weigh_object(self, host_state, weight_properties):
        """Higher weights win.  We want spreading to be the default."""
        # Simply return the node's free RAM: the more free RAM a node has, the
        # larger its RAM-related weight.
        return host_state.free_ram_mb
```
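A small worked sketch of the weighting math with made-up numbers: each weigher's raw values are normalized to [0, 1], scaled by its `*_weight_multiplier`, and summed per host, so raising `ram_weight_multiplier` makes free RAM dominate the final ordering:

```python
# Made-up numbers for two hosts and two weighers (RAM and CPU).
free_ram_mb = {'host-a': 8192, 'host-b': 2048}
free_vcpus = {'host-a': 2, 'host-b': 8}

def normalize(values):
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / float(hi - lo) for k, v in values.items()}

ram_weight_multiplier = 2.0   # raised from the default 1.0 to favour free RAM
cpu_weight_multiplier = 1.0

totals = {}
for host in free_ram_mb:
    totals[host] = (ram_weight_multiplier * normalize(free_ram_mb)[host]
                    + cpu_weight_multiplier * normalize(free_vcpus)[host])

# Descending weight order, same as get_weighed_objects() above.
print(sorted(totals, key=totals.get, reverse=True))  # ['host-a', 'host-b']
```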
4. random: let's walk through this step in detail via the code.
```python
host_subset_size = CONF.filter_scheduler.host_subset_size
if host_subset_size < len(weighed_hosts):
    weighed_subset = weighed_hosts[0:host_subset_size]
else:
    weighed_subset = weighed_hosts
# Randomly pick one host out of the N hosts in the subset
chosen_host = random.choice(weighed_subset)
weighed_hosts.remove(chosen_host)
return [chosen_host] + weighed_hosts
```
The default value of host_subset_size is 1. The official explanation is roughly this: setting it to a positive integer greater than 1 reduces the chance that multiple scheduler processes handling similar requests all pick the same host, so picking from the best N hosts for a request reduces conflicts. However, the larger the value, the less optimal the chosen host may be for a given request.
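A tiny sketch of that effect with made-up hosts: with the default `host_subset_size = 1`, every concurrent scheduler picks the same top-weighted host, while a larger value spreads the choices at the cost of sometimes not taking the very best host:

```python
import random

weighed_hosts = ['host-a', 'host-b', 'host-c', 'host-d']  # already sorted by weight

def pick(host_subset_size):
    subset = weighed_hosts[:host_subset_size]
    return random.choice(subset)

print(pick(1))  # always 'host-a'
print(pick(3))  # 'host-a', 'host-b' or 'host-c' -> fewer collisions between schedulers
```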