Exploring OpenStack (9): A Deep Dive into the Block Storage Service Cinder (Features)

Date: 2019-11-09


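Having studied Neutron, I am continuing my tour of the services that surround Nova. This stop is the block storage service, Cinder.

0. Test environment

The environment consists of:

1. One controller node running nova-api, nova-scheduler, cinder-api, cinder-scheduler, mysql, and rabbitmq

2. One Nova compute node running one VM

3. Three cinder-volume nodes, each using LVMISCSIDriver to expose its local storage

4. One volume type, created with volume_backend_name = lvmbackend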

cinder.conf on block1:
enabled_backends =  lvmdriver-b1
[lvmdriver-b1]
volume_group = cinder-volumes
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_backend_name = lvmbackend

cinder.conf on block2:
enabled_backends = lvmdriver-b21,lvmdriver-b22
storage_availability_zone=az1
[lvmdriver-b21]
iscsi_ip_address = 10.0.1.29
volume_group = cinder-volumes1
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_backend_name = lvmbackend

[lvmdriver-b22]
volume_group = cinder-volumes2
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_backend_name = lvmbackend

cinder.conf on block3:
enabled_backends = lvmdriver-network
[lvmdriver-network]
volume_group = system
volume_driver = cinder.volume.drivers.lvm.LVMISCSIDriver
volume_backend_name = lvmbackend

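Notes on a few details:

Each [***] section defines one backend, and the lines below it are that backend's configuration. A backend is enabled only if it is listed in enabled_backends.
volume_backend_name is what a volume type refers to; it is set on the type through the --property volume_backend_name attribute.
Several backends may share the same volume_backend_name. The scheduler then picks the most suitable of them according to the configured scheduling policy.
Each backend gets its own cinder-volume service instance, which shows up in the output of cinder service-list with a Host of the form <node name>@<backend name>.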
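
The Host naming convention described above can be illustrated with a small parser. This is only a sketch: parse_host and VolumeHost are hypothetical names made up for this example, not Cinder's own helpers, and the optional #pool suffix is assumed from the log lines quoted later in this post (for example network@lvmdriver-network#lvmbackend).

```python
from typing import NamedTuple, Optional


class VolumeHost(NamedTuple):
    """Parts of a Cinder host string (hypothetical helper type)."""
    node: str
    backend: Optional[str]
    pool: Optional[str]


def parse_host(host: str) -> VolumeHost:
    """Split 'node@backend#pool'; backend and pool are optional."""
    rest, _, pool = host.partition("#")      # pool suffix, if any
    node, _, backend = rest.partition("@")   # backend name, if any
    return VolumeHost(node, backend or None, pool or None)


print(parse_host("block2@lvmdriver-b21"))
print(parse_host("network@lvmdriver-network#lvmbackend"))
```

A service such as cinder-scheduler, whose Host is just controller, parses to a node name with no backend and no pool.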

The cinder services are as follows:

root@controller:/home/s1# cinder service-list
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+
|      Binary      |            Host           | Zone |  Status | State |         Updated_at         | Disabled Reason |
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+
|  cinder-backup   |         controller        | nova | enabled |   up  | 2015-01-11T16:36:00.000000 |       None      |
| cinder-scheduler |         controller        | nova | enabled |   up  | 2015-01-11T16:36:01.000000 |       None      |
|  cinder-volume   |    block1@lvmdriver-b1    | nova | enabled |   up  | 2015-01-11T16:36:08.000000 |       None      |
|  cinder-volume   |    block2@lvmdriver-b21   | az1  | enabled |   up  | 2015-01-11T16:36:06.000000 |       None      |
|  cinder-volume   |    block2@lvmdriver-b22   | az1  | enabled |   up  | 2015-01-11T16:36:05.000000 |       None      |
|  cinder-volume   | network@lvmdriver-network | nova | enabled |   up  | 2015-01-11T16:36:02.000000 |       None      |
+------------------+---------------------------+------+---------+-------+----------------------------+-----------------+

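Notes:

Cinder runs one cinder-volume service per backend.
Setting storage_availability_zone=az1 in cinder.conf assigns the cinder-volume host to that zone. Users can choose an AZ when creating a volume, and together with cinder-scheduler's AvailabilityZoneFilter this places the volume in the requested AZ. The default zone is nova.
Setting iscsi_ip_address = 10.0.1.29 in a backend section of cinder.conf selects the network interface used for iSCSI sessions, which separates the data network from the management network.
When building a multi-node environment like this one, make sure the nodes synchronize their clocks with NTP. Otherwise cinder-volume may log no errors at all and still be reported as down.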
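
One plausible explanation for the NTP remark is that a service is judged up or down by comparing its last heartbeat (the Updated_at column) with the clock of the node doing the check. The sketch below illustrates that idea under stated assumptions: is_service_up is a hypothetical function, and the 60-second threshold is assumed to correspond to the service_down_time option; check both against your Cinder release.

```python
from datetime import datetime, timedelta

# Assumption: mirrors the default of cinder.conf's service_down_time option.
SERVICE_DOWN_TIME = 60  # seconds


def is_service_up(updated_at: datetime, now: datetime,
                  down_time: int = SERVICE_DOWN_TIME) -> bool:
    """Treat a service as up if its last heartbeat is recent enough.

    The absolute value means a heartbeat stamped too far in the
    future (clock skew) also makes the service look down.
    """
    elapsed = abs((now - updated_at).total_seconds())
    return elapsed <= down_time


heartbeat = datetime(2015, 1, 11, 16, 36, 8)
print(is_service_up(heartbeat, heartbeat + timedelta(seconds=2)))   # clocks in sync
print(is_service_up(heartbeat, heartbeat + timedelta(minutes=5)))   # checker's clock 5 min ahead
```

With a five-minute skew between nodes, a perfectly healthy cinder-volume would fail this check even though it reports on time and logs no errors.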

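1. About OpenStack block storage

1.1 Storage in OpenStack

1.2 What VMs require of block storage

1.3 Cinder overview

1.4 Cinder's internal architecture

Three main components:
- cinder-api exposes the Cinder REST API
- cinder-scheduler allocates storage resources
- cinder-volume wraps the drivers; different drivers control different storage backends
RPC between the components goes through a message queue.
Cinder development concentrates on the scheduler and the drivers, in order to provide more scheduling algorithms, more features, and support for more storage backends.
Volume metadata and state are stored in the database.

1.5 Basic Cinder features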

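No.  Category          Operation
1    Volume            Create a volume
2    Volume            Create a volume from an existing volume (clone)
3    Volume            Extend a volume
4    Volume            Delete a volume
5    Volume / VM       Attach a volume to a VM
6    Volume / VM       Detach a volume from a VM
7    Volume / snapshot Create a snapshot of a volume
8    Volume / snapshot Create a volume from an existing snapshot
9    Volume / snapshot Delete a snapshot
10   Volume / image    Create a volume from an image
11   Volume / image    Create an image from a volume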

1.6 Cinder plugins

1.6.1 LVMISCSIDriver

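Notes on the figure above:

The cinder-volume service is deployed on the controller node, and a volume group is created on that node. "Local" here means that the volume group and cinder-volume live on the same node, because the Cinder LVM driver has to run local commands to manage the volume group.
Each nova-compute node acts as an iSCSI initiator and talks to the storage node. Whenever a volume from the VG is attached to a VM on a compute node, a /dev/sdx device appears on that compute node. qemu-kvm then presents this device to the VM as one of its disks.
A cinder-volume node can hold several volume groups, in which case each volume group gets its own cinder-volume service instance.
An OpenStack environment can contain several cinder-volume nodes.

Each compute node is an iSCSI initiator, for example Initiator: iqn.1993-08.org.debian:01:8d794081cd6a alias: compute1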

root@compute1:/home/s1# cat /etc/iscsi/initiatorname.iscsi
InitiatorName=iqn.1993-08.org.debian:01:8d794081cd6a

Each LUN is exported as an iSCSI target:

For example, IQN: iqn.2010-10.org.openstack:volume-3f204086-609e-449f-90a1-3a0d2c92c525

There is one TCP session between each initiator and target. To list the iSCSI sessions on the compute node:

root@compute1:/home/s1# iscsiadm -m session
tcp: [10] 192.168.1.24:3260,1 iqn.2010-10.org.openstack:volume-5cfc715d-a7b3-47b4-bded-44c0a228360c
tcp: [11] 192.168.1.19:3260,1 iqn.2010-10.org.openstack:volume-4039eb07-90eb-4a92-8fd3-e3514cb4969b
tcp: [14] 192.168.1.29:3260,1 iqn.2010-10.org.openstack:volume-3f204086-609e-449f-90a1-3a0d2c92c525
tcp: [16] 10.0.1.29:3260,1 iqn.2010-10.org.openstack:volume-1b7f6669-06db-474e-bf78-4feea529be5b
tcp: [6] 192.168.1.24:3260,1 iqn.2010-10.org.openstack:volume-39363c5f-cf3c-4461-af83-00314839f05a
tcp: [9] 192.168.1.24:3260,1 iqn.2010-10.org.openstack:volume-a0a7ccb3-8864-4fd0-aee2-0e20d43ba8dd

Details of each target:

tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2010-10.org.openstack:volume-136354c3-5920-46b9-a930-52c055c53295
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 2
            Initiator: iqn.1993-08.org.debian:01:8d794081cd6a alias: compute1
            Connection: 0
                IP Address: 192.168.1.15
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 1074 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/cinder-volumes/volume-136354c3-5920-46b9-a930-52c055c53295
            Backing store flags:
    Account information:
        s6KdhjSUrU2meEyxPTDZ
    ACL information:
        ALL

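After a volume is detached from a VM, the corresponding TCP session is removed.

1.6.2 IBM SVC/DS8K/XIV plugins

Notes on the figure above:

cinder-volume manages volumes by talking to the storage controller over the management network, so in theory a single cinder-volume instance is enough. For high availability, however, a standby cinder-volume service is usually deployed as well, under Pacemaker control, and it takes over when the primary goes down.
Taking the XIV iSCSI driver as an example, volume creation works as follows:
- The driver controls the XIV by running XIV commands remotely over SSH, for example svctask mkvdisk to create an XIV volume.
- When an XIV volume is assigned to a VM, the Cinder driver (1) determines the host name of the compute node; (2) creates a virtual host for that host on the XIV, identified by the WWPN of the HBA (for FC connections) or by the host's IQN (for iSCSI connections); (3) creates a LUN on the XIV; (4) maps the LUN to the virtual host.
- At that point the host has one /dev/sdx device per volume.
- qemu virtualizes the sdx device as a disk of the VM.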

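1.6.3 Comparison of the LVM plugin and vendor plugins

2. Cinder operations

Here are a few of the more interesting operations.

2.1 Transfer volume: hand ownership of a volume from a user in one tenant to a user in another tenant.

It takes two steps:

1. A user in the volume's current tenant runs cinder transfer-create, which produces a transfer id and an auth key:

Current tenant id: os-vol-tenant-attr:tenant_id | 96aacc75dc3a488cb073faa06a34b235

2. A user in the other tenant runs cinder transfer-accept and supplies the transfer id and the auth key:

New tenant id: os-vol-tenant-attr:tenant_id | 2f07ad0f1beb4b629e42e1113196c04b

For the volume itself, all that changes is the tenant id (the os-vol-tenant-attr:tenant_id attribute).

2.2 Volume migrate: move a volume from one backend to another

There are several cases:

1. The volume is not attached to a VM:

1.1 If the source and destination backends are on the same storage system, the migration requires the storage driver to support it directly on the storage.

1.2 If the backends are on different storage systems, or the Cinder driver does not support migration between backends on the same storage, Cinder falls back to its default migration: it creates a new volume, copies the data from the source volume to it, and then deletes the old volume.

2. The volume is attached to a VM: Cinder creates a new volume, calls Nova to copy the data from the source volume to the new one, and then deletes the old volume. Currently only the Compute libvirt driver supports this.

Note that with multiple backends the destination host must be given by its full name, for example: cinder migrate vol-b21-1 block2@lvmdriver-b22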
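
The migration cases listed above amount to a small decision table. The sketch below encodes only the description in this post; choose_migration_path and its return labels are hypothetical and do not reproduce Cinder's actual code path.

```python
def choose_migration_path(attached: bool, same_storage: bool,
                          driver_can_migrate: bool) -> str:
    """Pick a migration strategy following the cases in the text."""
    if attached:
        # Case 2: Cinder creates the new volume, Nova copies the data.
        return "nova-assisted copy"
    if same_storage and driver_can_migrate:
        # Case 1.1: the storage driver migrates on the array itself.
        return "driver migration"
    # Case 1.2: create a new volume, copy the data, delete the old one.
    return "generic copy"


print(choose_migration_path(attached=False, same_storage=True, driver_can_migrate=True))
print(choose_migration_path(attached=False, same_storage=False, driver_can_migrate=True))
print(choose_migration_path(attached=True, same_storage=True, driver_can_migrate=True))
```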

2.3 volume backup

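OpenStack Juno supports backing up volumes to Ceph, Swift, and IBM Tivoli Storage Manager (TSM).

2.4 QoS support

Cinder provides a QoS framework; the actual implementation depends on each vendor's plugin.

Taking IBM SVC as an example, QoS can be used as follows:

(1) Create a QoS spec: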

cinder qos-create qos-spec qos:IOThrottling=12345

(2) Associate the QoS spec with a volume type:

cinder qos-associate 0e710a13-3c40-4d50-8522-72bddabd93cc

(3) Create a volume of that volume type:

cinder create 1 --volume-type svc-driver25 --display-name volwit

(4) Inspect the volume. Its throttling attribute is now set, which caps the I/O allowed on the volume.

SVC Volume: throttling 12345

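3. Cinder components

About RPC: Cinder's components communicate with each other through RabbitMQ-based RPC. cinder-scheduler and cinder-volume each create an RPC connection, start consumer threads, and wait for messages on their queues. When polling finds that a message has arrived, a thread is created to handle it.

3.1 cinder-api

This is the main service interface. It accepts and processes external API requests and puts them on the RabbitMQ queue for the backend services to execute.

cinder-api offers two versions of the REST API: V1 provides the Volume, Volume type, and Snapshot operations; V2 adds QoS, Limits, and Backup operations.

Besides the APIs listed in the V1 and V2 documentation, some volume operations are performed with POST + action, for example extend volume: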

POST http://controller:8776/v1/fa2046aaead44a698de8268f94759fc1/volumes/8e87490c-fa18-4cff-b10e-27645c2a7b99/action

Action body: {"os-extend": {"new_size": 2}}

Such actions include:

os-reset_status
os-force_delete
os-force_detach
os-migrate_volume
os-migrate_volume_completion
os-reset_status
os-update_snapshot_status
os-attach
os-detach
os-reserve
os-unreserve
os-begin_detaching
os-roll_detaching
os-initialize_connection
os-terminate_connection
os-volume_upload_image
os-extend
os-update_readonly_flag
os-retype
os-set_bootable
os-promote-replica
os-reenable-replica
os-unmanage
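
As a rough illustration of the POST + action pattern, the sketch below builds (but does not send) the os-extend request shown earlier, using only the Python standard library. build_action_request is a hypothetical helper, the token value is a placeholder, and passing the Keystone token in the X-Auth-Token header is assumed to match how the deployment authenticates.

```python
import json
from urllib.request import Request


def build_action_request(endpoint, tenant_id, volume_id, action, params, token):
    """Build a volume action request: POST .../volumes/<id>/action."""
    url = f"{endpoint}/v1/{tenant_id}/volumes/{volume_id}/action"
    # The body is a one-key JSON object: {action name: action parameters}.
    body = json.dumps({action: params}).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Auth-Token": token}
    return Request(url, data=body, headers=headers, method="POST")


req = build_action_request(
    "http://controller:8776",
    "fa2046aaead44a698de8268f94759fc1",
    "8e87490c-fa18-4cff-b10e-27645c2a7b99",
    "os-extend",
    {"new_size": 2},
    token="<keystone token>",  # placeholder, not a real token
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing req to urllib.request.urlopen against a reachable Cinder endpoint; the other actions in the list differ only in the key and parameters of the body.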

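The cinder-api startup process is analyzed in Exploring OpenStack (11): cinder-api Service startup analysis and an introduction to WSGI / Paste deploy / Router (2015-02-04 16:01).

3.2 cinder-scheduler

In a multi-backend environment, cinder-scheduler decides which host a new volume is placed on:

0. It first checks host state; only hosts whose service state is up are considered.

1. When creating a volume, it uses the filter and weight algorithms to pick the best host.

2. When migrating a volume, it uses the filter and weight algorithms to check whether the destination host qualifies.

Once a host is selected, it calls cinder-volume on that host over RPC to perform the volume operation.

To keep track of host state, cinder-scheduler receives periodic status reports from cinder-volume on each host: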

2015-01-12 02:02:56.688 828 DEBUG cinder.scheduler.host_manager [req-403ef666-5551-4f31-a130-7bcad8e9d1ec - - - - -] Received volume service update from block2@lvmdriver-b21: {u'pools': [{u'pool_name': u'lvmbackend', u'QoS_support': False, u'allocated_capacity_gb': 1, u'free_capacity_gb': 3.34, u'location_info': u'LVMVolumeDriver:block2:cinder-volumes1:default:0', u'total_capacity_gb': 5.34, u'reserved_percentage': 0}], u'driver_version': u'2.0.0', u'vendor_name': u'Open Source', u'volume_backend_name': u'lvmbackend', u'storage_protocol': u'iSCSI'} update_service_capabilities /usr/lib/python2.7/dist-packages/cinder/scheduler/host_manager.py:434

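3.2.1 Host filtering algorithm

The default filters are AvailabilityZoneFilter, CapacityFilter, and CapabilitiesFilter:

AvailabilityZoneFilter checks whether the cinder host's availability zone matches the requested AZ. Hosts in a different zone are filtered out.
CapacityFilter checks the host's remaining space, free_capacity_gb, and makes sure it is large enough for the volume. Hosts without enough space are filtered out.
CapabilitiesFilter checks whether the host's properties match the extra specs of the volume type exactly. Hosts that do not match are filtered out.

After these filters, cinder-scheduler has a list of eligible hosts and moves on to weighting, where the weighting algorithm picks the best host. If the list is empty, it reports the error No valid host was found.

If scheduler_default_filters is not set in cinder.conf, cinder-scheduler uses these three filters by default.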
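
The filtering stage described above can be sketched as three predicates applied in sequence. This is a simplified model, not Cinder's filter classes: the Host dataclass, the request dictionary layout, and the sample hosts are all made up for illustration, and the real CapacityFilter also accounts for details such as reserved space that are ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    zone: str
    free_capacity_gb: float
    capabilities: dict = field(default_factory=dict)


def availability_zone_filter(host, request):
    az = request.get("availability_zone")
    return az is None or host.zone == az  # no AZ requested: any zone passes


def capacity_filter(host, request):
    return host.free_capacity_gb >= request["size_gb"]


def capabilities_filter(host, request):
    specs = request.get("extra_specs", {})
    return all(host.capabilities.get(k) == v for k, v in specs.items())


FILTERS = [availability_zone_filter, capacity_filter, capabilities_filter]


def filter_hosts(hosts, request):
    """Keep hosts that pass every filter; fail if none remain."""
    passed = [h for h in hosts if all(f(h, request) for f in FILTERS)]
    if not passed:
        raise LookupError("No valid host was found")
    return passed


caps = {"volume_backend_name": "lvmbackend"}
hosts = [  # illustrative values only
    Host("block1@lvmdriver-b1", "nova", 1.0, caps),
    Host("block2@lvmdriver-b21", "az1", 3.34, caps),
    Host("block2@lvmdriver-b22", "az1", 0.5, caps),
]
request = {"size_gb": 2, "availability_zone": "az1", "extra_specs": caps}
print([h.name for h in filter_hosts(hosts, request)])
```

In this example block1 is dropped by the zone check and block2@lvmdriver-b22 by the capacity check, leaving a single candidate for the weighting stage.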

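3.2.2 Host weighting algorithm

AllocatedCapacityWeigher: the host with the least allocated space wins. Setting allocated_capacity_weight_multiplier to a positive value reverses this; the default is -1.
CapacityWeigher: the host with the most free space wins. Setting capacity_weight_multiplier to a negative value reverses this; the default is 1.
ChanceWeigher: picks a host at random from the filtered hosts.

This step gives cinder-scheduler a weighted_hosts list. It takes the first host as the volume's destination, adds it to the retry_hosts list, and then calls cinder-volume on that host over RPC to create the volume.

If scheduler_default_weighers is not set in cinder.conf, cinder-scheduler uses CapacityWeigher by default.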
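
The effect of the multipliers is easiest to see in a small model where each weigher returns multiplier times a host metric and hosts are sorted by descending weight. This is a sketch under simplifying assumptions: the function names and host dictionaries are hypothetical, only one weigher is applied at a time, and any weight normalization the real scheduler performs is left out.

```python
import random


def capacity_weigher(host, multiplier=1.0):
    """More free space gives a higher weight (default multiplier 1)."""
    return multiplier * host["free_capacity_gb"]


def allocated_capacity_weigher(host, multiplier=-1.0):
    """Less allocated space gives a higher weight (default multiplier -1)."""
    return multiplier * host["allocated_capacity_gb"]


def chance_weigher(host):
    """Random weight, so any filtered host may come first."""
    return random.random()


def rank_hosts(hosts, weigher):
    """Return hosts ordered best-first; the first one is the target."""
    return sorted(hosts, key=weigher, reverse=True)


hosts = [  # illustrative values only
    {"name": "block1@lvmdriver-b1", "free_capacity_gb": 1.0, "allocated_capacity_gb": 4},
    {"name": "block2@lvmdriver-b21", "free_capacity_gb": 3.34, "allocated_capacity_gb": 1},
]
print(rank_hosts(hosts, capacity_weigher)[0]["name"])
print(rank_hosts(hosts, lambda h: capacity_weigher(h, multiplier=-1.0))[0]["name"])
```

With the default multiplier the emptier host comes first, which spreads volumes across backends; flipping the sign fills one backend before moving to the next.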

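3.3 cinder-volume

This service runs on the storage nodes. It manages storage space, handles read and write requests that maintain state in the Cinder database, and interacts with other processes through the message queue and directly with the block storage devices or software. Each storage node runs a volume service, and several such nodes together can form a storage resource pool.

cinder-volume implements some common operations itself, such as copy_volume_data, which is implemented in driver.py: it attaches the source and target volumes and then copies the data. Other operations call into the driver's interface.

3.3.1 Retry mechanism for failed volume creation

The scheduler_max_attempts option in cinder.conf sets how many times a failed volume creation is attempted. The default is 3; a value of 1 disables retries.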

# Maximum number of attempts to schedule an volume (integer value)
#scheduler_max_attempts=3

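cinder-scheduler and cinder-volume pass the current attempt count between them. If volume creation fails, cinder-volume calls cinder-scheduler again over RPC to recreate the volume, and cinder-scheduler checks whether the attempt count has exceeded the maximum. If it has not, the scheduler picks the next usable host and tries again. If the volume still cannot be created within the allowed number of attempts, the error No valid host was found is reported.

Here is an example of the retry process:

cinder-volume: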

Insufficient free space for volume creation on host network@lvmdriver-network#lvmbackend (requested / avail): 5/0.0
Insufficient free space for volume creation on host block2@lvmdriver-b2#lvmbackend (requested / avail): 5/4.0
Insufficient free space for volume creation on host block1@lvmdriver-b1#lvmbackend (requested / avail): 5/1.0

cinder-scheduler: No valid host was found
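
The retry behaviour can be modelled as a bounded loop over the weighted host list. The sketch below collapses the RPC round trips between cinder-scheduler and cinder-volume into one function for readability; schedule_with_retry, NoValidHost, and make_creator are hypothetical names, and the available-space figures are taken from the log lines above.

```python
class NoValidHost(Exception):
    """Raised when every allowed attempt has failed."""


def schedule_with_retry(weighted_hosts, try_create, max_attempts=3):
    """Try hosts best-first, giving up after max_attempts hosts."""
    retry_hosts = []  # hosts already tried, in order
    for host in weighted_hosts:
        if len(retry_hosts) >= max_attempts:
            break
        retry_hosts.append(host)
        if try_create(host):
            return host, retry_hosts
    raise NoValidHost("No valid host was found")


# Available space per host, as reported in the log excerpt above.
avail_gb = {
    "network@lvmdriver-network": 0.0,
    "block2@lvmdriver-b2": 4.0,
    "block1@lvmdriver-b1": 1.0,
}


def make_creator(requested_gb):
    """Creation succeeds only if the host has enough space."""
    return lambda host: avail_gb[host] >= requested_gb


print(schedule_with_retry(list(avail_gb), make_creator(3)))
try:
    schedule_with_retry(list(avail_gb), make_creator(5))
except NoValidHost as exc:
    print(exc)
```

A 5 GB request fails on all three hosts, matching the log, while a 3 GB request succeeds on the second attempt. Setting max_attempts to 1 gives the no-retry behaviour described for scheduler_max_attempts=1.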

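3.3.2 Creating a volume from an image

a. cinder-volume first tries to call the driver's clone_image method. Most drivers, including the default LVM driver, do not implement it. IBM's GPFS driver does; its docstring describes the implementation: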

Attempt to create a volume by efficiently copying image to volume. If both source and target are backed by gpfs storage and the source image is in raw format move the image to create a volume using either gpfs clone operation or with a file copy. If the image format is not raw, convert it to raw at the volume path.

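b. If the driver's clone_image method does not succeed, Cinder falls back to its default method: (1) create a raw volume and set its status to downloading; (2) download the image and copy it into the volume. Each driver can implement this step itself, and Cinder also provides a default implementation.

c. Copy the image's metadata into the volume's metadata.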
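
Steps a to c form a try-the-fast-path-then-fall-back flow, sketched below. Everything here is a toy model: the driver classes, the simplified clone_image and copy_image_to_volume signatures, and the dictionary-based volume and image are assumptions for illustration and do not match the real driver interface.

```python
class NoCloneDriver:
    """Toy driver without an efficient clone, like the default LVM driver."""

    def clone_image(self, volume, image):
        return False  # not supported: the caller must fall back

    def copy_image_to_volume(self, volume, image):
        volume["data"] = image["data"]  # stands in for download + copy


class CloningDriver(NoCloneDriver):
    """Toy driver whose backend can clone images directly."""

    def clone_image(self, volume, image):
        volume["data"] = image["data"]
        return True


def create_volume_from_image(driver, volume, image):
    """Return the list of steps taken, following a, b, c in the text."""
    steps = []
    if driver.clone_image(volume, image):          # step a
        steps.append("clone_image")
    else:                                          # step b
        volume["status"] = "downloading"
        steps.append("downloading")
        driver.copy_image_to_volume(volume, image)
        steps.append("copy_image_to_volume")
    volume["image_metadata"] = dict(image["properties"])  # step c
    steps.append("copy_metadata")
    volume["status"] = "available"
    return steps


image = {"data": b"raw-bytes", "properties": {"disk_format": "raw"}}
print(create_volume_from_image(NoCloneDriver(), {"status": "creating"}, image))
print(create_volume_from_image(CloningDriver(), {"status": "creating"}, image))
```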