The scheduler flow for boot, rebuild, resize and migrate

Code call flow:

1. nova.scheduler.client.query.SchedulerQueryClient#select_destinations
2. nova.scheduler.rpcapi.SchedulerAPI#select_destinations
3. nova.scheduler.manager.SchedulerManager#select_destinations
4. nova.scheduler.filter_scheduler.FilterScheduler#select_destinations

The rpcapi-to-manager hop of the scheduler is a synchronous call.
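
The reason this hop is synchronous is that the scheduler rpcapi issues a blocking RPC "call" rather than an asynchronous "cast". A rough sketch of that method (argument names and version details vary between Nova releases, so treat this as illustrative only):

def select_destinations(self, ctxt, spec_obj):
    # Sketch only: the real signature differs across releases.
    cctxt = self.client.prepare()
    # call() blocks until SchedulerManager.select_destinations returns,
    # unlike cast(), which would return immediately.
    return cctxt.call(ctxt, 'select_destinations', spec_obj=spec_obj)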

In step 3 the scheduler calls the API provided by Placement to do a first pass of filtering over all `compute node`s. The Placement API returns a dictionary in the following format:

{
    "provider_summaries": {
        "4cae2ef8-30eb-4571-80c3-3289e86bd65c": {
            "resources": {
                "VCPU": {
                    "used": 2,
                    "capacity": 64
                },
                "MEMORY_MB": {
                    "used": 1024,
                    "capacity": 11374
                },
                "DISK_GB": {
                    "used": 2,
                    "capacity": 49
                }
            }
        }
    },
    "allocation_requests": [
        {
            "allocations": [
                {
                    "resource_provider": {
                        "uuid": "4cae2ef8-30eb-4571-80c3-3289e86bd65c"
                    },
                    "resources": {
                        "VCPU": 1,
                        "MEMORY_MB": 512,
                        "DISK_GB": 1
                    }
                }
            ]
        }
    ]
}
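
The data above comes from Placement's GET /allocation_candidates endpoint (the scheduler reaches it through its report client rather than raw HTTP). As a minimal illustration, the same query can be issued directly with keystoneauth1; the auth URL and credentials below are placeholders, not values from this deployment:

from keystoneauth1 import loading, session

# Placeholder auth URL and credentials -- replace with real values.
auth = loading.get_plugin_loader('password').load_from_options(
    auth_url='http://controller:5000/v3',
    username='admin', password='secret', project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
sess = session.Session(auth=auth)

# Ask Placement for candidates that can fit 1 VCPU / 512 MB / 1 GB,
# mirroring the allocation_requests shown above. The allocation_candidates
# endpoint requires placement microversion 1.10 or later.
resp = sess.get('/allocation_candidates?resources=VCPU:1,MEMORY_MB:512,DISK_GB:1',
                endpoint_filter={'service_type': 'placement'},
                headers={'OpenStack-API-Version': 'placement 1.10'})
for rp_uuid, summary in resp.json()['provider_summaries'].items():
    vcpu = summary['resources']['VCPU']
    print(rp_uuid, 'VCPU used/capacity: %s/%s' % (vcpu['used'], vcpu['capacity']))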

The scheduler then filters the nodes returned by the Placement API a second time. The rough flow is: all hosts => filtering => weighting => random
1. get all hosts: "all hosts" here of course does not mean every host in the environment, but the detailed information of all the hosts returned by the Placement API;
2. filtering: first handle ignore hosts and force hosts; if force_hosts or force_nodes is specified, those hosts are returned directly. Then, according to the available_filters and enabled_filters options in nova's configuration file, every enabled filter is run in turn. A few filters are shown below as examples; the entry point for running the filters is:

nova.filters.BaseFilterHandler#get_filtered_objects
 
    def get_filtered_objects(self, filters, objs, spec_obj, index=0):
        list_objs = list(objs)
        LOG.debug("Starting with %d host(s)", len(list_objs))
        part_filter_results = []
        full_filter_results = []
        log_msg = "%(cls_name)s: (start: %(start)s, end: %(end)s)"
        # Iterate over the filters enabled in the configuration file
        for filter_ in filters:
            if filter_.run_filter_for_index(index):
                cls_name = filter_.__class__.__name__
                # Record the number of hosts before running this filter
                start_count = len(list_objs)
                # Run this filter against all hosts; only the hosts that pass are returned
                objs = filter_.filter_all(list_objs, spec_obj)
                if objs is None:
                    LOG.debug("Filter %s says to stop filtering", cls_name)
                    return
                list_objs = list(objs)
                end_count = len(list_objs)
                part_filter_results.append(log_msg % {"cls_name": cls_name,
                        "start": start_count, "end": end_count})
                if list_objs:
                    remaining = [(getattr(obj, "host", obj),
                                  getattr(obj, "nodename", ""))
                                 for obj in list_objs]
                    full_filter_results.append((cls_name, remaining))
                else:
                    LOG.info(_LI("Filter %s returned 0 hosts"), cls_name)
                    full_filter_results.append((cls_name, None))
                    break
                LOG.debug("Filter %(cls_name)s returned "
                          "%(obj_len)d host(s)",
                          {'cls_name': cls_name, 'obj_len': len(list_objs)})
        # The rest just logs the detailed per-filter results; omitted here
        …………
        return list_objs
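
To see how a filter plugs into this entry point: each filter subclasses BaseHostFilter, whose filter_all() calls host_passes() once per host and keeps only the hosts that return True. A minimal, hypothetical custom filter (not part of Nova) could look like the sketch below; it would then be made visible through the available_filters/enabled_filters options mentioned above:

from nova.scheduler import filters


class MinFreeRamFilter(filters.BaseHostFilter):
    """Hypothetical example: reject hosts whose free RAM is below the request."""

    def host_passes(self, host_state, spec_obj):
        # spec_obj.memory_mb comes from the flavor carried in the request spec.
        return host_state.free_ram_mb >= spec_obj.memory_mb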

Next, let's look at a few of the filters.

class AvailabilityZoneFilter(filters.BaseHostFilter):
 
    # When creating multiple instances in one request, AvailabilityZoneFilter only runs once
    run_filter_once_per_request = True  
    # Every filter must implement this method
    def host_passes(self, host_state, spec_obj):
        # Get the availability_zone from the request_spec. Note: if --availability-zone was not given at creation time, availability_zone in the request_spec is empty.
        availability_zone = spec_obj.availability_zone
        # An empty availability_zone in the request_spec means the operation is allowed to cross AZs.
        if not availability_zone:
            return True
        # Get the host's AZ: look up the aggregates the host belongs to; the aggregate metadata carries the availability_zone.
        metadata = utils.aggregate_metadata_get_by_host(
                host_state, key='availability_zone')
 
        if 'availability_zone' in metadata:
            # Check whether the availability_zone requested in the request_spec is among the AZs this host belongs to.
            hosts_passes = availability_zone in metadata['availability_zone']
            host_az = metadata['availability_zone']
        else:
            hosts_passes = availability_zone == CONF.default_availability_zone
            host_az = CONF.default_availability_zone
 
        if not hosts_passes:
            LOG.debug("Availability Zone '%(az)s' requested. "
                      "%(host_state)s has AZs: %(host_az)s",
                      {'host_state': host_state,
                       'az': availability_zone,
                       'host_az': host_az})
 
        return hosts_passes
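
Note that aggregate_metadata_get_by_host() collects metadata from every aggregate the host belongs to, so metadata['availability_zone'] is a set of AZ names and the check above is a set-membership test. A toy illustration (not Nova code; the AZ names are hypothetical):

# A host that belongs to two aggregates, each tagged with an AZ.
metadata = {'availability_zone': {'az1', 'az2'}}

requested_az = 'az1'
print(requested_az in metadata['availability_zone'])  # True  -> host passes
print('az3' in metadata['availability_zone'])         # False -> host filtered out
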
nova.scheduler.filters.image_props_filter.ImagePropertiesFilter#host_passes
 
    # Filter based on the image property values; heavily used for ironic scheduling.
    def host_passes(self, host_state, spec_obj):
        image_props = spec_obj.image.properties if spec_obj.image else {}
        # Check whether this compute node supports the values given in the image properties.
        if not self._instance_supported(host_state, image_props,
                                        host_state.hypervisor_version):
            LOG.debug("%(host_state)s does not support requested "
                        "instance_properties", {'host_state': host_state})
            return False
        return True
     
    def _instance_supported(self, host_state, image_props,
                            hypervisor_version):
        img_arch = image_props.get('hw_architecture') # architecture, e.g. i686 or x86_64
        img_h_type = image_props.get('img_hv_type') # hypervisor type
        img_vm_mode = image_props.get('hw_vm_mode') # VM mode (virtualization type)
        …………
        # Get the instance types supported by this compute node; the return value
        # is a list, for example:
        #   [["x86_64", "baremetal", "hvm"]]
        #   [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"], ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"]]
        supp_instances = host_state.supported_instances
        …………
        # Comparison rule
        def _compare_props(props, other_props):
            # Iterate over every value specified in the image properties
            for i in props:
                # Check whether this property value is supported by the compute node
                if i and i not in other_props:
                    return False
            return True
        # Iterate over every combination supported by this compute node
        for supp_inst in supp_instances:
            if _compare_props(checked_img_props, supp_inst):
                return True
        return False

Ironic scheduling relies heavily on ImagePropertiesFilter: the images used by VMs and the images used by bare metal carry different property values, and together with the corresponding placement-based scheduling this guarantees that VMs are never scheduled onto ironic nodes, and bare metal instances are never scheduled onto qemu nodes.
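
As a toy illustration of that matching (not Nova code; the image property values are hypothetical), an image whose properties name the baremetal hypervisor only matches an ironic node's supported_instances, so qemu/kvm hosts are filtered out, and vice versa:

ironic_node = [["x86_64", "baremetal", "hvm"]]
kvm_node = [["i686", "qemu", "hvm"], ["i686", "kvm", "hvm"],
            ["x86_64", "qemu", "hvm"], ["x86_64", "kvm", "hvm"]]

# Hypothetical properties carried by a bare-metal image.
img_props = ("x86_64", "baremetal", "hvm")

def supported(node_caps, props):
    # Same rule as _compare_props: every non-empty image property must appear
    # in at least one supported (arch, hv_type, vm_mode) combination.
    return any(all(p in caps for p in props if p) for caps in node_caps)

print(supported(ironic_node, img_props))  # True  -> bare-metal node passes
print(supported(kvm_node, img_props))     # False -> qemu/kvm node filtered out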

3. weighting: compute a weight for each of the filtered hosts and sort them from best to worst. A couple of weigher examples:

class BaseWeightHandler(loadables.BaseLoader):
    object_class = WeighedObject
 
    def get_weighed_objects(self, weighers, obj_list, weighing_properties):
        """Return a sorted (descending), normalized list of WeighedObjects."""
        # obj_list is the list of hosts that passed the filters
        # weighing_properties is the request_spec
        weighed_objs = [self.object_class(obj, 0.0) for obj in obj_list]
        # If only one host survived filtering there is nothing to weigh; return it directly
        if len(weighed_objs) <= 1:
            return weighed_objs
        # Compute weights with each weigher class enabled in the configuration file
        for weigher in weighers:
            # Take RAMWeigher as an example
            weights = weigher.weigh_objects(weighed_objs, weighing_properties)
 
            # Normalize the weights
            weights = normalize(weights,
                                minval=weigher.minval,
                                maxval=weigher.maxval)
 
            for i, weight in enumerate(weights):
                obj = weighed_objs[i]
                # Add the normalized weight to the host; the contributions of all
                # weighers are summed. To give one kind of weight more influence,
                # tune the corresponding *_weight_multiplier option, e.g. raise
                # ram_weight_multiplier to make free memory matter more.
                obj.weight += weigher.weight_multiplier() * weight
        # Sort by weight in descending order
        return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)
         
class RAMWeigher(weights.BaseHostWeigher):
    minval = 0
 
    def weight_multiplier(self):
        """Override the weight multiplier."""
        return CONF.filter_scheduler.ram_weight_multiplier
 
    def _weigh_object(self, host_state, weight_properties):
        """Higher weights win.  We want spreading to be the default."""
        # Return the node's free memory directly: the more free RAM, the higher the RAM weight.
        return host_state.free_ram_mb
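
The normalize() call in get_weighed_objects above is what makes weights from different weighers comparable before they are summed: assuming it performs standard min-max scaling (as nova.weights.normalize does), each raw weight is mapped into [0, 1] relative to the smallest and largest value produced by that weigher. A simplified sketch:

def normalize(weight_list, minval=None, maxval=None):
    # Simplified sketch of min-max normalization.
    if not weight_list:
        return []
    maxval = max(weight_list) if maxval is None else maxval
    minval = min(weight_list) if minval is None else minval
    if maxval == minval:
        return [0.0 for _ in weight_list]
    return [(w - minval) / (maxval - minval) for w in weight_list]

# Free RAM of three hosts in MB -> normalized RAM weights.
print(normalize([2048, 8192, 512]))  # [0.2, 1.0, 0.0]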

4. random: let's analyse this step in detail through the code.

host_subset_size = CONF.filter_scheduler.host_subset_size
if host_subset_size < len(weighed_hosts):
    weighed_subset = weighed_hosts[0:host_subset_size]
else:
    weighed_subset = weighed_hosts
# Randomly pick one host from the subset
chosen_host = random.choice(weighed_subset)
weighed_hosts.remove(chosen_host)
return [chosen_host] + weighed_hosts

The host_subset_size option defaults to 1. The official explanation: setting it to a positive integer greater than 1 reduces the chance that multiple scheduler processes handling similar requests will all schedule onto the same host, which would otherwise create a race. Picking one host out of the N best matches reduces such conflicts. However, the larger the value, the less optimal the chosen host may be for a given request.
