RBD block storage is the most widely used and most stable of the three storage types Ceph provides. An RBD block device behaves much like a disk and can be attached to either a physical or a virtual machine; there are generally two ways to attach it:
A block is an ordered sequence of bytes, typically 512 bytes in size. Block-based storage is the most common form of storage: ordinary hard disks, floppy disks, CD-ROM drives and similar media are the simplest and fastest devices for storing data.
When a block device is provided on a physical machine, the kernel RBD module is used. With the kernel-module driver, Linux's built-in page cache (page caching) can be used to improve performance.
When a block device is provided to a virtual machine (for example QEMU/KVM), it is usually exposed through libvirt calling the librbd library.
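As a rough illustration of the libvirt/librbd path (not part of the original walkthrough), the snippet below attaches an RBD image to a running KVM guest. It is a minimal sketch: the domain name vm1, the monitor address and the secret UUID are placeholders, not values from this environment.

cat > rbd-disk.xml <<'EOF'
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='ceph_rbd/docker_image'>
    <host name='local-node-2' port='6789'/>
  </source>
  <auth username='docker'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF
virsh attach-device vm1 rbd-disk.xml --live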
Before deploying, check the kernel version to make sure it supports RBD; upgrading to a 4.5 or later kernel is recommended.
[root@local-node-1 ~]# uname -r
4.4.174-1.el7.elrepo.x86_64
[root@local-node-1 ~]# modprobe rbd
[root@local-node-1 ~]# ceph -s
  cluster:
    id:     7bd25f8d-b76f-4ff9-89ec-186287bbeaa5
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum local-node-2,local-node-3
    mgr: ceph-mgr(active)
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   9.2 GiB used, 81 GiB / 90 GiB avail
    pgs:

[root@local-node-1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME              STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host local-node-1
 0   hdd 0.00980         osd.0              up  1.00000 1.00000
 1   hdd 0.00980         osd.1              up  1.00000 1.00000
 2   hdd 0.00980         osd.2              up  1.00000 1.00000
-5       0.02939     host local-node-2
 3   hdd 0.00980         osd.3              up  1.00000 1.00000
 4   hdd 0.00980         osd.4              up  1.00000 1.00000
 5   hdd 0.00980         osd.5              up  1.00000 1.00000
-7       0.02939     host local-node-3
 6   hdd 0.00980         osd.6              up  1.00000 1.00000
 7   hdd 0.00980         osd.7              up  1.00000 1.00000
 8   hdd 0.00980         osd.8              up  1.00000 1.00000
ceph osd pool create ceph_rbd 128
rbd pool init ceph_rbd
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]'

EG:
# ceph auth get-or-create client.docker mon 'profile rbd' osd 'profile rbd pool=ceph_rbd, profile rbd-read-only pool=images'
[client.docker]
        key = AQDQkK1cpNAKJRAAnaw2ZYeFHsXrsTWX3QonkQ==
[root@local-node-1 ~]# cat /etc/ceph/ceph.client.docker.keyring
[client.docker]
        key = AQDQkK1cpNAKJRAAnaw2ZYeFHsXrsTWX3QonkQ==
rbd create --size {megabytes} {pool-name}/{image-name}

eg:
[root@local-node-1 ~]# rbd create --size 1024 ceph_rbd/docker_image
[root@local-node-1 ~]# rbd ls ceph_rbd
docker_image
[root@local-node-1 ~]# rbd trash ls ceph_rbd
[root@local-node-1 ~]# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: 1bedc6b8b4567
        block_name_prefix: rbd_data.1bedc6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        op_features:
        flags:
        create_timestamp: Wed Apr 10 14:52:48 2019
# Expand
[root@local-node-1 ~]# rbd resize --size 2048 ceph_rbd/docker_image
Resizing image: 100% complete...done.

# Shrink
[root@local-node-1 ~]# rbd resize --size 1024 ceph_rbd/docker_image --allow-shrink
Resizing image: 100% complete...done.
Ceph block device images are thin provisioned: when they are created with a given size, no physical storage is consumed until data is actually written to them. The --size option only sets an upper limit on their capacity.
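A quick way to see this thin provisioning in practice (not part of the original walkthrough) is rbd du, which reports both the provisioned size and the space actually consumed; the output below is only an example and will differ once data is written:

# rbd du ceph_rbd/docker_image
NAME         PROVISIONED USED
docker_image       1 GiB  0 B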
rbd rm ceph_rbd/docker_image
# rbd trash mv ceph_rbd/docker_image
# rbd trash ls ceph_rbd
1beb76b8b4567 docker_image
rbd trash restore ceph_rbd/1beb76b8b4567

# The trash is now empty and the original image has been restored
# rbd ls ceph_rbd
docker_image
rbd trash restore ceph_rbd/1beb76b8b4567 --image docker    # restore the image under the new name "docker"
rbd trash rm ceph_rbd/1beb76b8b4567
Tip:
# For kernels older than 4.5 the "hammer" tunables profile is recommended; otherwise mapping the rbd image may fail.
[root@local-node-1 ~]# ceph osd crush show-tunables
{
    "choose_local_tries": 0,
    "choose_local_fallback_tries": 0,
    "choose_total_tries": 50,
    "chooseleaf_descend_once": 1,
    "chooseleaf_vary_r": 1,
    "chooseleaf_stable": 0,
    "straw_calc_version": 1,
    "allowed_bucket_algs": 54,
    "profile": "hammer",
    "optimal_tunables": 0,
    "legacy_tunables": 0,
    "minimum_required_version": "hammer",
    "require_feature_tunables": 1,
    "require_feature_tunables2": 1,
    "has_v2_rules": 0,
    "require_feature_tunables3": 1,
    "has_v3_rules": 0,
    "has_v4_buckets": 1,
    "require_feature_tunables5": 0,
    "has_v5_rules": 0
}

# Set the tunables profile
[root@local-node-1 ~]# ceph osd crush tunables hammer
[root@local-node-1 ~]# rbd list ceph_rbd
docker_image
3. Map the block device on the client (the client needs Ceph installed).
[root@local-node-1 ~]# rbd device map ceph_rbd/docker_image --id admin
/dev/rbd0
==Tip:==
If the following error appears during the mapping step, the current kernel does not support some of the image's RBD features and those features need to be disabled:
# rbd device map ceph_rbd/docker_image --id admin
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address
Disable the unsupported features:
# rbd feature disable ceph_rbd/docker_image exclusive-lock, object-map, fast-diff, deep-flatten
sudo rbd device map rbd/myimage --id admin --keyring /path/to/keyring
sudo rbd device map rbd/myimage --id admin --keyfile /path/to/file

eg:
# rbd device map ceph_rbd/docker_image --id docker --keyring /etc/ceph/ceph.client.docker.keyring
/dev/rbd0
[root@local-node-1 ~]# rbd device list
id pool     image        snap device
0  ceph_rbd docker_image -    /dev/rbd0

[root@local-node-1 ~]# lsblk | grep rbd
rbd0          252:0    0   1G  0 disk
[root@local-node-1 ~]# mkfs.xfs /dev/rbd0
[root@local-node-1 ~]# mount /dev/rbd0 /mnt
7. To unmount and unmap the device, use the following commands:
[root@local-node-1 ~]# umount /mnt/
[root@local-node-1 ~]# rbd device unmap /dev/rbd0
After a mapped RBD is unmapped, the data on the block device is normally not lost (although a forced reboot may corrupt it and leave it unusable), and the image can be mapped and mounted again on another host, as sketched below.
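A minimal sketch of reattaching the image on a different host, assuming ceph.conf and the client.docker keyring have already been copied to that host; the commands mirror the mapping steps above:

rbd device map ceph_rbd/docker_image --id docker --keyring /etc/ceph/ceph.client.docker.keyring
mount /dev/rbd0 /mnt    # the filesystem created earlier is still present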
RBD supports features such as cloning, snapshots and online resizing.
A snapshot is a ==read-only copy== of an image at a particular point in time. One of the advanced features of Ceph block devices is that you can snapshot an image to preserve its history. Ceph also supports snapshot layering, which lets you clone images (such as VM images) quickly and easily. Ceph snapshots are supported by the rbd command as well as several higher-level interfaces, including QEMU, libvirt, OpenStack and CloudStack.
If an image is still handling I/O when a snapshot is taken, the snapshot may not capture accurate or up-to-date data, and it may have to be cloned into a new, mountable image before it can be used. We therefore recommend stopping I/O before taking a snapshot. If the image contains a filesystem, make sure the filesystem is in a consistent state before the snapshot, or check the mapped block device with fsck first. To stop I/O you can use the fsfreeze command. For virtual machines, qemu-guest-agent is used to freeze the filesystem automatically when a snapshot is taken.
fsfreeze -f /mnt
rbd snap create rbd/foo@snapname

EG:
rbd snap create rbd/test@test-snap
fsfreeze -u /mnt/
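For a KVM guest, the same freeze/thaw sequence can be driven through qemu-guest-agent via libvirt instead of running fsfreeze inside the guest. A minimal sketch, assuming the guest agent is running; the domain name vm1 is a placeholder:

virsh domfsfreeze vm1
rbd snap create rbd/test@test-snap
virsh domfsthaw vm1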
[root@local-node-1 ~]# rbd snap ls rbd/test
SNAPID NAME        SIZE  TIMESTAMP
     4 test-snap   1 GiB Tue Apr 16 14:49:27 2019
     5 test-snap-2 1 GiB Tue Apr 16 15:56:18 2019
rbd snap rollback rbd/test@test-snap
After the rollback, mounting the device may fail and the filesystem has to be repaired first:

# mount /dev/rbd0 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/rbd0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

# xfs_repair reports the following error:
xfs_repair /dev/rbd0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

# Repair with the -L option:
xfs_repair -L /dev/rbd0
mount /dev/rbd0 /mnt/
rbd snap rm rbd/test@test-snap

# To delete all snapshots of an image, run:
rbd snap purge rbd/foo
Ceph can clone copy-on-write replicas from a snapshot. Because snapshots are read-only, when changes are needed we use the snapshot to create a copy-on-write clone. (OpenStack uses this mechanism to create new virtual machines: images are typically kept as snapshots, and new VMs are created by cloning such a snapshot.)
Ceph only supports cloning format 2 images (i.e. those created with rbd create --image-format 2, which is the default in recent versions). The kernel client has supported cloned images since version 3.10.
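As a quick check before cloning (not part of the original steps), the image format can be confirmed with rbd info, and --image-format 2 can be passed explicitly when creating a new image; rbd/new-image below is just a placeholder name:

rbd info rbd/test | grep format                          # should report "format: 2"
rbd create --size 1024 --image-format 2 rbd/new-image    # explicit format 2 at creation time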
The overall workflow is as follows:
Create a block device image --> create a snapshot --> protect the snapshot --> clone the snapshot
# Create a snapshot
rbd snap create rbd/test@test-snap

# List the snapshots
# rbd snap list rbd/test
SNAPID NAME      SIZE  TIMESTAMP
    10 test-snap 1 GiB Tue Apr 16 17:46:48 2019

# Protect the snapshot so that it cannot be deleted
rbd snap protect rbd/test@test-snap
rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}

EG:
rbd clone rbd/test@test-snap rbd/test-new
List the newly created image:
# rbd ls
test
test-new
rbd snap unprotect rbd/test@test-snap
# rbd children rbd/test@test-snap
rbd/test-new

[root@local-node-1 ~]# rbd --pool rbd --image test-new info
rbd image 'test-new':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        id: ba9096b8b4567
        block_name_prefix: rbd_data.ba9096b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Tue Apr 16 17:53:51 2019
        parent: rbd/test@test-snap    # the parent snapshot relationship is shown here
        overlap: 1 GiB
rbd flatten rbd/test-new
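Flattening copies the parent snapshot's data into the clone, so the clone no longer references the parent. A hedged follow-up (not in the original text) to verify this and then release the parent snapshot:

rbd info rbd/test-new                    # the 'parent:' line should now be gone
rbd snap unprotect rbd/test@test-snap
rbd snap rm rbd/test@test-snap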
# rbd resize ceph_rbd/docker_image --size 4096
Resizing image: 100% complete...done.
# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
        size 4 GiB in 1024 objects
        order 22 (4 MiB objects)
        id: 1bef96b8b4567
        block_name_prefix: rbd_data.1bef96b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Wed Apr 10 15:50:21 2019
# xfs_growfs -d /mnt
meta-data=/dev/rbd0              isize=512    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 262144 to 1048576

# lsblk | grep mnt
rbd0          252:0    0   4G  0 disk /mnt
# df -h | grep mnt
/dev/rbd0       4.0G  3.1G  998M  76% /mnt
# rbd resize --size 2048 ceph_rbd/docker_image --allow-shrink
Resizing image: 100% complete...done.
# lsblk |grep mnt
rbd0          252:0    0   2G  0 disk /mnt

# xfs_growfs -d /mnt/
meta-data=/dev/rbd0              isize=512    agcount=34, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=1048576, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data size 524288 too small, old size is 1048576

xfs_growfs refuses to act here because XFS filesystems cannot be shrunk; avoid shrinking an image below the size of the filesystem it already contains.
==If the RBD image is removed, the data stored on it will be lost.==
[root@local-node-3 mnt]# df -h | grep mnt
...
/dev/rbd0       2.0G   33M  2.0G   2% /mnt

[root@local-node-3 ~]# rbd device list
id pool     image        snap device
0  ceph_rbd docker_image -    /dev/rbd0

[root@local-node-3 ~]# rbd device unmap /dev/rbd0
[root@local-node-3 ~]# rbd ls ceph_rbd
docker_image
[root@local-node-3 ~]# rbd info ceph_rbd/docker_image
rbd image 'docker_image':
        size 2 GiB in 512 objects
        order 22 (4 MiB objects)
        id: 1bef96b8b4567
        block_name_prefix: rbd_data.1bef96b8b4567
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Wed Apr 10 15:50:21 2019
[root@local-node-3 ~]# rbd trash ls ceph_rbd

# Move the rbd image to the trash (it could also be deleted directly)
[root@local-node-3 ~]# rbd trash mv ceph_rbd/docker_image
[root@local-node-3 ~]# rbd trash ls ceph_rbd
1bef96b8b4567 docker_image

# Delete the image from the trash
[root@local-node-3 ~]# rbd trash rm ceph_rbd/1bef96b8b4567
Removing image: 100% complete...done.
[root@local-node-3 ~]# rbd trash ls ceph_rbd
[root@local-node-3 ~]# rbd ls ceph_rbd
[root@local-node-3 ~]# ceph osd lspools
7 ceph_rbd
[root@local-node-3 ~]# ceph osd pool rm ceph_rbd ceph_rbd --yes-i-really-really-mean-it
pool 'ceph_rbd' removed
[root@local-node-3 ~]# ceph -s
  cluster:
    id:     7bd25f8d-b76f-4ff9-89ec-186287bbeaa5
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum local-node-1,local-node-2,local-node-3
    mgr: ceph-mgr(active)
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   9.3 GiB used, 81 GiB / 90 GiB avail
    pgs:

[root@local-node-3 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME              STATUS REWEIGHT PRI-AFF
-1       0.08817 root default
-3       0.02939     host local-node-1
 0   hdd 0.00980         osd.0              up  1.00000 1.00000
 1   hdd 0.00980         osd.1              up  1.00000 1.00000
 2   hdd 0.00980         osd.2              up  1.00000 1.00000
-5       0.02939     host local-node-2
 3   hdd 0.00980         osd.3              up  1.00000 1.00000
 4   hdd 0.00980         osd.4              up  1.00000 1.00000
 5   hdd 0.00980         osd.5              up  1.00000 1.00000
-7       0.02939     host local-node-3
 6   hdd 0.00980         osd.6              up  1.00000 1.00000
 7   hdd 0.00980         osd.7              up  1.00000 1.00000
 8   hdd 0.00980         osd.8              up  1.00000 1.00000