The Ceph block device, formerly known as the RADOS block device, provides clients with reliable, distributed, high-performance block storage disks.
The RADOS block device (RBD) uses the librbd library and stores blocks of data in sequential form, striped across multiple OSDs in the Ceph cluster. RBD is backed by Ceph's RADOS layer, so every block device is distributed across multiple Ceph nodes, which gives it high performance and excellent reliability. RBD has native support in the Linux kernel.
Any ordinary Linux host can act as a Ceph client. The client talks to the Ceph storage cluster over the network to store or retrieve user data. Ceph RBD support has been part of the mainline Linux kernel since version 2.6.34.
Perform the following operations on the client, 192.168.3.158:
[root@localhost ~]# cat /etc/hosts
……
192.168.3.165 ceph165
192.168.3.166 ceph166
192.168.3.167 ceph167
192.168.3.158 ceph158
[root@localhost ~]# hostnamectl set-hostname ceph158
# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
# mkdir -p /etc/ceph
# yum -y install epel-release
# yum -y install ceph
# cat /etc/ceph/ceph.client.rbd.keyring
# Create the Ceph block client user name and authentication key (on the admin node ceph165)
[ceph@ceph165 my-cluster]$ ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' | tee ./ceph.client.rbd.keyring
[client.rbd]
        key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
Note: client.rbd is the client name; everything after mon and osd is the capability (authorization) configuration.
Copy the configuration file and keyring to the client machine
[ceph@ceph165 my-cluster]$ scp ceph.client.rbd.keyring root@192.168.3.158:/etc/ceph
[ceph@ceph165 my-cluster]$ scp ceph.conf root@192.168.3.158:/etc/ceph
# Check that the client meets the block device (kernel) requirements
uname -r
modprobe rbd
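As a quick sanity check (a minimal sketch; the module names are standard, but the exact listing depends on the kernel build), confirm that the rbd module actually loaded:
# modprobe rbd && echo "rbd module loaded"
# lsmod | grep -E 'rbd|libceph'   # both kernel modules should appear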
# Install the Ceph client
# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo
View the keyring file
[root@ceph158 ~]# cat /etc/ceph/ceph.client.rbd.keyring
[client.rbd]
        key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
[root@ceph158 ~]# ceph -s --name client.rbd
Run the following commands on the server 192.168.3.165:
(1) Create the pool for the block device
By default, block devices are created in the rbd pool, but after installing with ceph-deploy that rbd pool does not exist yet.
# Create the pool and block device
$ ceph osd lspools   # list the cluster's storage pools
$ ceph osd pool create rbd 50
pool 'rbd' created
# 50 is the number of placement groups; since our later tests also need more PGs, we set it to 50 here
Choosing a value for pg_num is mandatory, because it cannot be calculated automatically. Here are a few commonly used values (a rule-of-thumb calculation is sketched after this list):
With fewer than 5 OSDs, set pg_num to 128
With 5 to 10 OSDs, set pg_num to 512
With 10 to 50 OSDs, set pg_num to 4096
With more than 50 OSDs, you need to understand the trade-offs and calculate pg_num yourself
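A commonly cited community rule of thumb (an approximation, not specific to this cluster) is: total PGs ≈ (number of OSDs × 100) ÷ replica count, rounded to a nearby power of two. For example:
# illustrative values: 3 OSDs, 2 replicas
osds=3; replicas=2
echo $(( osds * 100 / replicas ))   # 150 -> choose a power of two near this, e.g. 128 or 256 total PGs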
(2) Create the block device on the client
Create an RBD block device with a size of 5105 MB:
[root@ceph158 ~]# rbd create rbd2 --size 5105 --name client.rbd
On the client 192.168.3.158, list the rbd2 block device:
[root@ceph158 ~]# rbd ls --name client.rbd
rbd2
[root@ceph158 ~]# rbd ls -p rbd --name client.rbd
rbd2
[root@ceph158 ~]# rbd list --name client.rbd
rbd2
View information about the rbd2 block device:
[root@ceph158 ~]# rbd --image rbd2 info --name client.rbd
# Map it on the client; this is expected to fail
[root@ceph158 ~]# rbd map --image rbd2 --name client.rbd
- layering: layering support
- exclusive-lock: exclusive locking support
- object-map: object map support (requires exclusive-lock)
- deep-flatten: snapshot flatten support
- fast-diff: fast diff calculation (requires object-map)
Using the krbd (kernel RBD) client on client-node1, we cannot map the block device image on the CentOS 3.10 kernel, because that kernel does not support object-map, deep-flatten, and fast-diff (support for them was introduced in kernel 4.9). To work around this, we will disable the unsupported features. There are several ways to do that:
1) Disable them dynamically
rbd feature disable rbd2 exclusive-lock object-map deep-flatten fast-diff --name client.rbd
2) When creating the RBD image, enable only the layering feature.
rbd create rbd2 --size 10240 --image-feature layering --name client.rbd
3) Disable them in the Ceph configuration file
rbd_default_features=1
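For reference, rbd_default_features is a bitmask built from the standard RBD feature bits (layering=1, exclusive-lock=4, object-map=8, fast-diff=16, deep-flatten=32), so a value of 1 enables layering only. A hedged example of where the setting would typically go in ceph.conf (the section placement is an assumption):
[client]
# only enable the layering feature for newly created RBD images
rbd_default_features = 1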
# Here we disable them dynamically
[root@ceph158 ~]# rbd feature disable rbd2 exclusive-lock object-map fast-diff deep-flatten --name client.rbd
Map rbd2:
[root@ceph158 ~]# rbd map --image rbd2 --name client.rbd
List the RBD images already mapped on this host:
[root@ceph158 ~]# rbd showmapped --name client.rbd
Check the size of the rbd0 disk
Format rbd0
Create a mount directory and mount the device
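A minimal sketch of typical commands for these two steps (assuming the image is mapped as /dev/rbd0 and formatted with XFS, consistent with the xfs_growfs call used later in this article):
[root@ceph158 ~]# lsblk /dev/rbd0                  # check the size of the mapped device
[root@ceph158 ~]# blockdev --getsize64 /dev/rbd0   # size in bytes
[root@ceph158 ~]# mkfs.xfs /dev/rbd0               # format the device with XFS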
[root@ceph158 ~]# mkdir /mnt/ceph-disk1
[root@ceph158 ~]# mount /dev/rbd0 /mnt/ceph-disk1/
# Test writing data
[root@ceph158 ~]# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M
# Turn it into a service so the device is mapped and mounted automatically at boot
[root@ceph158 ~]# wget -O /usr/local/bin/rbd-mount https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount
# vim /usr/local/bin/rbd-mount
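The script maps the image and mounts it. A minimal sketch of what such a script might contain, assuming pool rbd, image rbd2, mount point /mnt/ceph-disk1, and the client.rbd user (the actual file fetched above may differ):
#!/bin/bash
# Map the RBD image and mount it; the "unmount" argument reverses both steps.
rbdimage=rbd2
rbdpool=rbd
mountpoint=/mnt/ceph-disk1
if [ "$1" = "unmount" ]; then
    umount $mountpoint
    rbd unmap /dev/rbd/$rbdpool/$rbdimage --name client.rbd
else
    rbd map $rbdpool/$rbdimage --name client.rbd
    mount /dev/rbd/$rbdpool/$rbdimage $mountpoint
fi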
[root@ceph158 ~]# chmod +x /usr/local/bin/rbd-mount
[root@ceph158 ~]# wget -O /etc/systemd/system/rbd-mount.service https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount.service
[root@ceph158 ~]# systemctl daemon-reload
[root@ceph158 ~]# systemctl enable rbd-mount.service
Created symlink from /etc/systemd/system/multi-user.target.wants/rbd-mount.service to /etc/systemd/system/rbd-mount.service.
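Likewise, a hedged sketch of what the rbd-mount.service unit might look like, assuming it simply wraps the mount script above (the file fetched from the repository may differ):
[Unit]
Description=Map a RADOS block device and mount its filesystem
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/rbd-mount mount
ExecStop=/usr/local/bin/rbd-mount unmount

[Install]
WantedBy=multi-user.target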
Unmount the manually mounted directory and test automatic mounting via the service:
[root@ceph158 ~]# umount /mnt/ceph-disk1/
[root@ceph158 ~]# systemctl status rbd-mount
Ceph: online RBD capacity expansion
Operations on the Ceph admin side
Query the total and already-allocated capacity of the pools:
[root@ceph165 ~]# ceph df
List the existing pools:
[root@ceph165 ~]# ceph osd lspools
List the existing RBD images
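The usual commands for this (standard rbd subcommands, shown here as a sketch):
[root@ceph165 ~]# rbd ls rbd          # list images in the rbd pool
[root@ceph165 ~]# rbd info rbd/rbd2   # current size and enabled features of rbd2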
Now resize rbd2 online:
[root@ceph165 ~]# rbd resize rbd/rbd2 --size 7168
Operations on the Ceph client side
[root@ceph158 ~]# rbd showmapped
[root@ceph158 ~]# df -h
[root@ceph158 ~]# xfs_growfs -d /mnt/ceph-disk1
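xfs_growfs expands the mounted XFS filesystem into the newly added space without unmounting it. If the image had been formatted with ext4 instead (an assumption; this walkthrough uses XFS), the equivalent online-grow step would be:
[root@ceph158 ~]# resize2fs /dev/rbd0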
Running ceph-deploy mon create-initial to gather the keys should generate several key files in the current directory (mine is ~/etc/ceph/), but it fails with the error below. The message means the configuration files on the two failed nodes differ from the one on the current node, and it suggests using the --overwrite-conf option to overwrite the inconsistent files.
# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...
Run the following command (I have three nodes here, ceph1 through ceph3):
# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...
After that the configuration succeeds and you can continue with initializing the disks.
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_WARN
            too few PGs per OSD (21 < min 30)
  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
……
From the cluster status above we can see that the number of PGs per OSD is 21, below the minimum of 30. There are 32 PGs in total; since I configured 2 replicas, with 3 OSDs each OSD ends up holding about 32 × 2 ÷ 3 ≈ 21 PGs, which triggers the warning about being below the minimum of 30.
If you store data and operate on the cluster in this state, it can appear to hang and stop responding to I/O, and large numbers of OSDs may go down.
Solution:
Increase the number of PGs
Because each of my pools has 8 PGs, I need to create two more pools so that the PGs per OSD become 48 × 2 ÷ 3 = 32, above the minimum of 30.
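A quick arithmetic check of both situations (illustrative shell arithmetic, assuming 2 replicas and 3 OSDs as above):
# PGs per OSD ≈ total PGs × replica count ÷ number of OSDs
echo $(( 32 * 2 / 3 ))   # 21 -> below the minimum of 30, hence the warning
echo $(( 48 * 2 / 3 ))   # 32 -> above the minimum of 30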
[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph2(active), standbys: ceph1, ceph3
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
  data:
    pools:   6 pools, 48 pgs
    objects: 219 objects, 1.1 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     48 active+clean
The cluster health now shows as normal.
If at this point the cluster status shows HEALTH_WARN application not enabled on 1 pool(s):
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            application not enabled on 1 pool(s)
[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'mytest'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since an RGW instance was added earlier, we simply tag mytest with rgw as the prompt suggests:
[root@ceph1 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'
Checking the cluster status again shows it has returned to normal:
[root@ceph1 ceph]# ceph health
HEALTH_OK
The following uses deleting the mytest pool as an example. Running ceph osd pool rm mytest fails, saying the pool name must be given twice and the --yes-i-really-really-mean-it option appended:
[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.
Repeating the pool name and adding the suggested option as instructed still fails:
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool
The error says pool deletion is disabled, and that mon_allow_pool_delete must first be set to true in ceph.conf before a pool can be destroyed. So log in to every node and edit its configuration file as follows:
[root@ceph1 ceph]# vi ceph.conf
[root@ceph1 ceph]# systemctl restart ceph-mon.target
Add the following option at the bottom of ceph.conf and set it to true; after saving and exiting, restart the monitor with systemctl restart ceph-mon.target.
[mon]
mon allow pool delete = true
其他節點操做同理。
[root@ceph2 ceph]# vi ceph.conf
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf
[root@ceph3 ceph]# systemctl restart ceph-mon.target
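As a hedged aside (not part of the procedure above), on recent releases the option can usually also be injected at runtime instead of editing every ceph.conf, though the change does not survive a monitor restart unless it is also persisted in the configuration file:
# ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'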
Running the delete command again now removes the mytest pool successfully.
[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed
After shutting down and rebooting each of the three nodes in the Ceph cluster, the cluster status looked like this:
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            324/702 objects misplaced (46.154%)
            Reduced data availability: 126 pgs inactive
            Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:creating}
    osd: 3 osds: 3 up, 3 in; 162 remapped pgs
  data:
    pools:   8 pools, 288 pgs
    objects: 234 objects, 2.8 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     43.750% pgs not active
             144/702 objects degraded (20.513%)
             324/702 objects misplaced (46.154%)
             162 active+clean+remapped
             123 undersized+peered
             3   undersized+degraded+peered
Check the details:
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
    pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
    pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
    pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
    pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
    pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
    ....
    pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
    pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
    pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
    pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
    pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
    ....
As the output shows, inactive and undersized states appearing during data recovery are not normal.
Solution:
① Handle the inactive PGs:
Simply restart the OSD service:
[root@ceph1 ~]# systemctl restart ceph-osd.target
Checking the cluster status again shows that the inactive PGs have recovered; only undersized PGs remain.
[root@ceph1 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_WARN
            1 filesystem is degraded
            241/723 objects misplaced (33.333%)
            Degraded data redundancy: 59 pgs undersized
  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:rejoin}
    osd: 3 osds: 3 up, 3 in; 229 remapped pgs
    rgw: 1 daemon active
  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.4 KiB
    usage:   3.0 GiB used, 245 GiB / 248 GiB avail
    pgs:     241/723 objects misplaced (33.333%)
             224 active+clean+remapped
             59  active+undersized
             5   active+clean
  io:
    client: 1.2 KiB/s rd, 1 op/s rd, 0 op/s wr
② Handle the undersized PGs:
Get into the habit of checking the health details first when something goes wrong. Careful analysis shows that although the configured replica count is 3, each PG 12.x has only two copies, placed on two of OSDs 0 to 2.
[root@ceph1 ~]# ceph health detail
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
    pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
    pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
    pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
    pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
    pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
    pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
    pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
    pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]
Looking further at the cluster's OSD tree, we find that after ceph2 and ceph3 went down and came back up, the osd.1 and osd.2 processes are no longer located under ceph2 and ceph3.
[root@ceph1 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME                STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9       0.16159     host centos7evcloud
 1   hdd 0.08080         osd.1                up  1.00000 1.00000
 2   hdd 0.08080         osd.2                up  1.00000 1.00000
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0                up  1.00000 1.00000
-5             0     host ceph2
-7             0     host ceph3
Check the status of the osd.1 and osd.2 services on each node.
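This would typically be done with systemctl on the nodes that are supposed to host those OSDs, for example (a sketch; the unit names follow the ceph-osd@<id> convention used in the restart commands below):
[root@ceph2 ~]# systemctl status ceph-osd@1.service
[root@ceph3 ~]# systemctl status ceph-osd@2.service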
Solution:
Go to the ceph2 and ceph3 nodes and restart the osd.1 and osd.2 services respectively, so that the two OSDs are remapped back onto ceph2 and ceph3.
[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service
Finally, the OSD tree shows the two services mapped back under the ceph2 and ceph3 nodes.
[root@ceph3 ~]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME                STATUS REWEIGHT PRI-AFF
-1       0.24239 root default
-9             0     host centos7evcloud
-3       0.08080     host ceph1
 0   hdd 0.08080         osd.0                up  1.00000 1.00000
-5       0.08080     host ceph2
 1   hdd 0.08080         osd.1                up  1.00000 1.00000
-7       0.08080     host ceph3
 2   hdd 0.08080         osd.2                up  1.00000 1.00000
The cluster status also shows the long-awaited HEALTH_OK.
[root@ceph3 ~]# ceph -s
  cluster:
    id:     13430f9a-ce0d-4d17-a215-272890f47f28
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum ceph2,ceph1,ceph3
    mgr: ceph1(active), standbys: ceph2, ceph3
    mds: cephfs-1/1/1 up {0=ceph1=up:active}
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
  data:
    pools:   8 pools, 288 pgs
    objects: 241 objects, 3.6 KiB
    usage:   3.1 GiB used, 245 GiB / 248 GiB avail
    pgs:     288 active+clean
The CephFS mount command is as follows:
mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==
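Passing the key directly on the command line leaves it in the shell history; a commonly used alternative (an aside, not part of the original steps) is to store the key in a file and point the mount at it with the secretfile option:
# echo 'AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==' > /etc/ceph/admin.secret
# chmod 600 /etc/ceph/admin.secret
# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secretfile=/etc/ceph/admin.secret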
After unmounting CephFS, remounting it fails with: mount error(2): No such file or directory
Note: first check whether the /mnt/mycephfs/ directory exists and is accessible. Mine existed, yet the No such file or directory error still appeared. Unexpectedly, after restarting the OSD service it worked and CephFS could be mounted normally.
[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==
The mount succeeds!
[root@ceph1 ~]# df -h
Filesystem                                            Size  Used Avail Use% Mounted on
/dev/vda2                                              48G  7.5G   41G  16% /
devtmpfs                                              1.9G     0  1.9G   0% /dev
tmpfs                                                 2.0G  8.0K  2.0G   1% /dev/shm
tmpfs                                                 2.0G   17M  2.0G   1% /run
tmpfs                                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                                 2.0G   24K  2.0G   1% /var/lib/ceph/osd/ceph-0
tmpfs                                                 396M     0  396M   0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/  249G  3.1G  246G   2% /mnt/mycephfs