Analysis of the nova evacuate feature

Part 1

The command-line API exposed for evacuation

 

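nova evacuate [--password <password>] <server> [<host>]

Parameters:
  • <server>: the virtual machine on the failed compute node

  • <host>: the name or ID of the target compute node. If no compute node is specified, the nova scheduler selects an available one

  • --password <password>: sets the login password of the virtual machine after evacuation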

Part 2

Use cases for nova evacuate

 

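nova evacuate is mainly used when the nova-compute node hosting a virtual machine goes down: the virtual machine can then be moved from the failed nova-compute node to another available compute node. When the original compute node comes back up, it deletes its leftover local copy of each evacuated virtual machine.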

Part 3

nova evacuate code walkthrough

 

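When nova receives a nova evacuate request from a user, the nova services handle it in the following main steps:
  • 1) The nova-api service receives the request, validates its parameters, and then sends an RPC request to nova-conductor, handing the rest of the processing over to it

  • 2) After receiving the RPC message, nova-conductor branches on the parameters supplied by the user. If no specific compute node was given, it calls nova-scheduler to select an available compute node

  • 3) nova-conductor sends a cast-type RPC message to the selected nova-compute node, which carries out the actual work

  • 4) After receiving the RPC message, the nova-compute node creates the virtual machine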

Part 4

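nova-api service stage

 

nova-api: code under the api directory

 

  • 1) When the user issues the nova evacuate command, an HTTP POST request is sent to the nova-api service; the action named in the HTTP body is evacuate

  • 2) The HTTP request body is read to obtain the values of the host, force, password and on_shared_storage parameters

  • 3) If a host was given, nova-api first checks whether that host exists. If it does not, a host-not-found exception is raised; if it does, processing continues with step 4

  • 4) If the given host is the same as the host the virtual machine is on, an exception is raised: the target compute node must differ from the virtual machine's current host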

D:\tran_code\nova_v1\nova\api\openstack\compute\evacuate.py

    def _evacuate(self, req, id, body):
        """Permit admins to evacuate a server from a failed host
        to a new one.
        """
        context = req.environ["nova.context"]
        instance = common.get_instance(self.compute_api, context, id)
        context.can(evac_policies.BASE_POLICY_NAME,
                    target={'user_id': instance.user_id,
                            'project_id': instance.project_id})

        evacuate_body = body["evacuate"]
        host = evacuate_body.get("host")
        force = None
        ..........
        if host is not None:
            try:
                self.host_api.service_get_by_compute_host(context, host)
            except (exception.ComputeHostNotFound,
                    exception.HostMappingNotFound):
                msg = _("Compute host %s not found.") % host
                raise exc.HTTPNotFound(explanation=msg)
        if instance.host == host:
            msg = _("The target host can't be the same one.")
            raise exc.HTTPBadRequest(explanation=msg)

        try:
            self.compute_api.evacuate(context, instance, host,
                                      on_shared_storage, password, force)
        except exception.InstanceUnknownCell as e:
            raise exc.HTTPNotFound(explanation=e.format_message())
        .......
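The API-layer checks described in this stage can be sketched in plain Python. This is a minimal illustration under stated assumptions, not nova's own code: `build_evacuate_body`, `validate_evacuate`, the two exception classes and the `known_hosts` set are hypothetical stand-ins, and the `adminPass` key name is assumed from the compute API's evacuate action (the accepted keys vary by microversion).

```python
import json


class HostNotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 that nova-api returns."""


class SameHost(Exception):
    """Hypothetical stand-in for the HTTP 400 that nova-api returns."""


def build_evacuate_body(host=None, password=None, force=None):
    """Build the body of a server action request whose action is 'evacuate'."""
    params = {}
    if host is not None:
        params["host"] = host
    if password is not None:
        params["adminPass"] = password  # assumed key name
    if force is not None:
        params["force"] = force
    return {"evacuate": params}


def validate_evacuate(body, instance_host, known_hosts):
    """Mirror steps 2 to 4: read the parameters, then check the target host."""
    evacuate_body = body["evacuate"]
    host = evacuate_body.get("host")
    # Step 3: a requested host must exist.
    if host is not None and host not in known_hosts:
        raise HostNotFound("Compute host %s not found." % host)
    # Step 4: the target must differ from the instance's current host.
    if instance_host == host:
        raise SameHost("The target host can't be the same one.")
    return host


if __name__ == "__main__":
    body = build_evacuate_body(host="compute-2", password="secret")
    print(json.dumps(body, sort_keys=True))
    print(validate_evacuate(body, "compute-1", {"compute-1", "compute-2"}))
```

Leaving `host` out of the body makes `validate_evacuate` return None, which corresponds to letting the scheduler choose the destination later on.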

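nova-api: code under the compute directory

 

  • 1) Get the host the virtual machine is on

  • 2) Check whether the nova-compute service on that host is up. If it is up, raise an exception and stop (evacuation is only performed when the node hosting the virtual machine is down); if it is not up, continue with step 3 and the steps after it

  • 3) Create a Migration record describing this evacuation, so that later functions can look up the information for this virtual machine

  • 4) If a specific host was given, store it in the dest_compute field of the migration

  • 5) Fetch the virtual machine's request_spec by its uuid. In the T release, the request_spec content is stored in the spec column of the nova_api.request_specs table

  • 6) If a target host was given but the evacuation is not forced, set the host parameter to None so that nova-scheduler selects an available compute node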

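This function only makes use of instance, host, on_shared_storage, admin_password=None, force=None and recreate=True. The other arguments are unused and are passed to the rebuild_instance method of the nova-conductor service with default values. The code logic is as follows: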
D:\tran_code\nova_v1\nova\compute\api.py

    def evacuate(self, context, instance, host, on_shared_storage,
                 admin_password=None, force=None):
        LOG.debug('vm evacuation scheduled', instance=instance)
        # Get the host the instance is currently on.
        inst_host = instance.host
        # Look up the nova-compute service record for that host.
        service = objects.Service.get_by_compute_host(context, inst_host)
        # Evacuation only applies when the instance's nova-compute service
        # is down, so raise if the service is still up.
        if self.servicegroup_api.service_is_up(service):
            LOG.error('Instance compute service state on %s '
                      'expected to be down, but it was up.', inst_host)
            raise exception.ComputeServiceInUse(host=inst_host)

        # Set the instance task state to REBUILDING.
        instance.task_state = task_states.REBUILDING
        instance.save(expected_task_state=[None])
        self._record_action_start(context, instance, instance_actions.EVACUATE)

        # This migration record acts as a marker for the source compute
        # node, so that it can find the instance and clean up later. It is
        # not passed down as an argument. The migration type is evacuation.
        migration = objects.Migration(context,
                                      source_compute=instance.host,
                                      source_node=instance.node,
                                      instance_uuid=instance.uuid,
                                      status='accepted',
                                      migration_type='evacuation')
        # If a target host was given, record it in the migration.
        if host:
            migration.dest_compute = host
        migration.create()

        compute_utils.notify_about_instance_usage(
            self.notifier, context, instance, "evacuate")

        try:
            request_spec = objects.RequestSpec.get_by_instance_uuid(
                context, instance.uuid)
        except exception.RequestSpecNotFound:
            # Some old instances can still have no RequestSpec object attached
            # to them, we need to support the old way
            request_spec = None

        # NOTE(sbauza): Force is a boolean by the new related API version
        # This branch is taken only when a host was given and the
        # evacuation is not forced; no other case enters it.
        if force is False and host:
            nodes = objects.ComputeNodeList.get_all_by_host(context, host)
            # NOTE(sbauza): Unset the host to make sure we call the scheduler
            # The host argument was supplied, but it is cleared here so
            # that the request goes through nova-scheduler.
            host = None
            # FIXME(sbauza): Since only Ironic driver uses more than one
            # compute per service but doesn't support evacuations,
            # let's provide the first one.
            target = nodes[0]
            if request_spec:
                destination = objects.Destination(
                    host=target.host,
                    node=target.hypervisor_hostname
                )
                request_spec.requested_destination = destination

        return self.compute_task_api.rebuild_instance(context,
                       instance=instance,
                       new_pass=admin_password,
                       injected_files=None,
                       image_ref=None,
                       orig_image_ref=None,
                       orig_sys_metadata=None,
                       bdms=None,
                       recreate=True,
                       on_shared_storage=on_shared_storage,
                       host=host,
                       request_spec=request_spec,
                       )
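The decisions made in this stage (refuse when the source service is up, record a migration, and clear the host when the evacuation is not forced) can be condensed into a small standalone sketch. It assumes simplified data: `plan_evacuate`, `EvacuatePlan` and the dict-based migration are hypothetical names introduced for illustration, not nova objects.

```python
from dataclasses import dataclass
from typing import Optional


class ComputeServiceInUse(Exception):
    """Raised in this sketch when the source compute service is still up."""


@dataclass
class EvacuatePlan:
    host: Optional[str]                   # host handed on to the conductor
    requested_destination: Optional[str]  # hint left for the scheduler
    migration: dict                       # simplified migration record


def plan_evacuate(instance_host, source_is_up, host=None, force=None):
    """Hypothetical condensation of the compute-API stage."""
    # Evacuation is only allowed when the source service is down.
    if source_is_up:
        raise ComputeServiceInUse(instance_host)

    # Marker record that lets the source node clean up later.
    migration = {
        "source_compute": instance_host,
        "status": "accepted",
        "migration_type": "evacuation",
    }
    if host:
        migration["dest_compute"] = host

    requested_destination = None
    # Host given but not forced: clear host so the scheduler is consulted,
    # and keep the wish as a requested destination instead.
    if force is False and host:
        requested_destination = host
        host = None

    return EvacuatePlan(host, requested_destination, migration)


if __name__ == "__main__":
    print(plan_evacuate("compute-1", False, host="compute-2", force=False))
    print(plan_evacuate("compute-1", False, host="compute-2", force=True))
```

In the unforced case the plan carries `host=None` with the requested destination set, while the forced case keeps the host and so bypasses scheduling in the next stage.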
 

Part 5

nova-conductor service stage

 

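After nova-conductor receives the RPC request sent by nova-api, the request is handled in nova-conductor's manager.py:

1) Fetch the virtual machine's migration record by its uuid

2) Branch on the host that was passed in

3) Cases in which host has a value:
  • Case 1: the virtual machine is rebuilt on its original host using its original image;

  • Case 2: a specific host was given and the evacuation is forced.

In both cases the node parameter is empty.

4) Cases in which host has no value (three cases):
  • Case 1: no host was given for the evacuation;

  • Case 2: a host was given, but the evacuation is not forced;

  • Case 3: the virtual machine is rebuilt on its current host using a new image.

During scheduling, the instance's host is excluded so that the same host is not selected again. Once a target host has been chosen, host and node are both non-empty: host is used as the routing target of the RPC message, and node is used by later functions.

5) Send an RPC request to the nova-compute service

The code logic is as follows: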
D:\tran_code\nova_v1\nova\conductor\manager.py

    def rebuild_instance(self, context, instance, orig_image_ref, image_ref,
                         injected_files, new_pass, orig_sys_metadata,
                         bdms, recreate, on_shared_storage,
                         preserve_ephemeral=False, host=None,
                         request_spec=None):

        with compute_utils.EventReporter(context, 'rebuild_server', instance.uuid):
            node = limits = None
            try:
                # Fetch the instance's migration record by its uuid. If
                # none is found, migration is set to None.
                migration = objects.Migration.get_by_instance_and_status(
                    context, instance.uuid, 'accepted')
            except exception.MigrationNotFoundByStatus:
                LOG.debug("No migration record for the rebuild/evacuate "
                          "request.", instance=instance)
                migration = None

            # host is passed in two cases. In the first, the instance's own
            # host is passed so that the rebuild happens on the same host,
            # which skips the nova scheduler (a rebuild may use either the
            # original image or a different one). In the second, a specific
            # target host was given and the evacuation is forced, which also
            # bypasses the nova scheduler.
            if host:
                # We only create a new allocation on the specified host if
                # we're doing an evacuate since that is a move operation.
                if host != instance.host:
                    self._allocate_for_evacuate_dest_host(
                        context, instance, host, request_spec)
            else:
                # Rebuild on the same host with a new image, or evacuate to
                # a specified host without forcing.
                # When no request_spec was given, build the image metadata
                # from the instance's image and construct a request_spec.
                if not request_spec:
                    filter_properties = {'ignore_hosts': [instance.host]}
                    # build_request_spec expects a primitive image dict
                    image_meta = nova_object.obj_to_primitive(
                        instance.image_meta)
                    request_spec = scheduler_utils.build_request_spec(
                            context, image_meta, [instance])
                    request_spec = objects.RequestSpec.from_primitives(
                        context, request_spec, filter_properties)
                elif recreate:
                    # Evacuation takes this branch: the source host is added
                    # to the RequestSpec so the scheduler does not pick it.
                    # NOTE(sbauza): Augment the RequestSpec object by excluding
                    # the source host for avoiding the scheduler to pick it
                    request_spec.ignore_hosts = [instance.host]
                    # The instance's host is now excluded.
                    # NOTE(sbauza): Force_hosts/nodes needs to be reset
                    # if we want to make sure that the next destination
                    # is not forced to be the original host
                    request_spec.reset_forced_destinations()
                try:
                    request_spec.ensure_project_id(instance)
                    # nova-scheduler selects an available compute node
                    # based on the request_spec.
                    hosts = self._schedule_instances(context, request_spec,
                                                     [instance.uuid])
                    host_dict = hosts.pop(0)
                    host, node, limits = (host_dict['host'],
                                          host_dict['nodename'],
                                          host_dict['limits'])
                .......

            compute_utils.notify_about_instance_usage(
                self.notifier, context, instance, "rebuild.scheduled")

            instance.availability_zone = (
                availability_zones.get_host_availability_zone(
                    context, host))
            self.compute_rpcapi.rebuild_instance(context,
                    instance=instance,
                    new_pass=new_pass,
                    injected_files=injected_files,
                    image_ref=image_ref,
                    orig_image_ref=orig_image_ref,
                    orig_sys_metadata=orig_sys_metadata,
                    bdms=bdms,
                    recreate=recreate,
                    on_shared_storage=on_shared_storage,
                    preserve_ephemeral=preserve_ephemeral,
                    migration=migration,  # the migration object is passed on here
                    host=host, node=node, limits=limits)
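The conductor's choice between using a given host directly and asking the scheduler can be illustrated with a short standalone sketch. Everything here is a simplified assumption: `pick_destination`, `toy_scheduler` and `NoValidHost` are hypothetical names, and the toy scheduler just returns the first host that is not ignored, which is far simpler than real scheduling.

```python
class NoValidHost(Exception):
    """Raised in this sketch when the toy scheduler finds no candidate."""


def toy_scheduler(all_hosts):
    """Return a hypothetical scheduling callable over a fixed host list."""
    def schedule(ignore_hosts):
        return [
            {"host": h, "nodename": h + "-node"}
            for h in all_hosts
            if h not in ignore_hosts
        ]
    return schedule


def pick_destination(instance_host, schedule, host=None):
    """Return the (host, node) pair that the RPC to nova-compute would carry."""
    node = None
    if host:
        # Host supplied (same-host rebuild or forced evacuation):
        # the scheduler is skipped and node stays empty.
        return host, node

    # No host: schedule while excluding the instance's current host.
    candidates = schedule([instance_host])
    if not candidates:
        raise NoValidHost("no host available besides %s" % instance_host)
    chosen = candidates.pop(0)
    return chosen["host"], chosen["nodename"]


if __name__ == "__main__":
    schedule = toy_scheduler(["compute-1", "compute-2", "compute-3"])
    print(pick_destination("compute-1", schedule))
    print(pick_destination("compute-1", schedule, host="compute-3"))
```

The sketch reproduces the point made in the walkthrough: node is only filled in when the scheduler was actually consulted.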
 

Part 6

nova-compute service stage on the target node

 

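nova-compute stage, manager.py:
  • 1) The value of recreate distinguishes a nova evacuate operation from a nova rebuild operation

  • 2) When recreate is true the operation is nova evacuate; when recreate is false it is nova rebuild

  • 3) Claim resources on the target node, based on the node chosen by the scheduler

  • 4) Get the virtual machine's image information

  • 5) Read the block_device_mapping table by the virtual machine's uuid to get its block device information

  • 6) Get the virtual machine's network information

  • 7) Detach the virtual machine's block devices

  • 8) Because libvirt does not implement the rebuild driver method, the _rebuild_default_impl method is what actually gets called to carry out both evacuation and rebuild

  • 9) For a nova evacuate operation, call the spawn driver method on the target node to create the virtual machine

  • 10) For a rebuild operation, first destroy the virtual machine on the target node and then call the spawn driver method to create it again; for an evacuate operation the virtual machine is recreated directly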

The main call chain through the nova-compute service is as follows: rebuild_instance -> _do_rebuild_instance_with_claim -> _do_rebuild_instance -> _rebuild_default_impl
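The difference between evacuate and rebuild at the end of this call chain can be sketched with a stub driver. This is a simplified illustration under assumptions, not the real `_rebuild_default_impl`: `RecordingDriver`, its `detach_volumes` method and the `rebuild_default_impl` function below are hypothetical and only record the order of operations.

```python
class RecordingDriver:
    """Hypothetical stand-in for a virt driver; it only records calls."""

    def __init__(self):
        self.calls = []

    def detach_volumes(self, instance):
        self.calls.append("detach_volumes")

    def destroy(self, instance):
        self.calls.append("destroy")

    def spawn(self, instance):
        self.calls.append("spawn")


def rebuild_default_impl(driver, instance, recreate):
    """Sketch of steps 7 to 10: detach, optionally destroy, then spawn."""
    # Step 7: block devices are detached in both cases.
    driver.detach_volumes(instance)
    # Step 10: only a rebuild has a guest on this node to destroy first.
    if not recreate:
        driver.destroy(instance)
    # Steps 9 and 10: both paths end by spawning the guest.
    driver.spawn(instance)
    return driver.calls


if __name__ == "__main__":
    print(rebuild_default_impl(RecordingDriver(), "vm-1", recreate=True))
    print(rebuild_default_impl(RecordingDriver(), "vm-1", recreate=False))
```

With `recreate=True` no destroy call is recorded, which matches the evacuate case: the guest never existed on the target node.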