Ceph Block Storage

Date: 2020-01-25

Tags: ceph, storage

1. Install the Ceph block storage client

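Ceph block devices, formerly known as RADOS block devices (RBD), provide clients with reliable, distributed, high-performance block storage disks.

A RADOS block device uses the librbd library and stores its data blocks striped sequentially across multiple OSDs in the Ceph cluster. RBD is backed by Ceph's RADOS layer, so every block device is spread over multiple Ceph nodes, which gives it high performance and excellent reliability. RBD also has native support in the Linux kernel.

Any ordinary Linux host can act as a Ceph client. The client talks to the Ceph storage cluster over the network to store or retrieve user data. The Ceph kernel client has been part of the mainline Linux kernel since 2.6.34, and the RBD driver since 2.6.37.

Perform the following steps on the client, 192.168.3.158.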

1.1 Set the hostname

[root@localhost ~]# cat /etc/hosts
……
192.168.3.165 ceph165
192.168.3.166 ceph166
192.168.3.167 ceph167
192.168.3.158 ceph158

[root@localhost ~]# hostnamectl set-hostname ceph158

1.2 Configure the Ceph repository file

# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo

1.3 Create the directory

# mkdir -p /etc/ceph

1.4 Install Ceph

# yum -y install epel-release
# yum -y install ceph
# cat /etc/ceph/ceph.client.rbd.keyring

# Create the Ceph block client user and its authentication key (on the admin node ceph165)

[ceph@ceph165 my-cluster]$ ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd' | tee ./ceph.client.rbd.keyring
[client.rbd]
   key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
  Note: client.rbd is the client name;
     everything from mon onward is the capability (authorization) configuration.

Copy the configuration file and the keyring to the client

[ceph@ceph165 my-cluster]$ scp ceph.client.rbd.keyring root@192.168.3.158:/etc/ceph
[ceph@ceph165 my-cluster]$ scp ceph.conf root@192.168.3.158:/etc/ceph

# Check that the client meets the block device requirements

uname -r
modprobe rbd
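The kernel check above can be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available; the function name `kernel_at_least` is hypothetical, and the minimum version is passed in as an argument rather than fixed here.

```shell
# Succeed if the running kernel version is at least the required minimum.
# $1 = kernel release string (e.g. the output of `uname -r`)
# $2 = minimum version to require (chosen by the reader)
kernel_at_least() {
    local have="${1%%-*}"   # drop the distribution suffix, e.g. "-957.el7.x86_64"
    local need="$2"
    # With version sort, the smaller of the two comes first; the check passes
    # when that smaller value is the required minimum.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]
}

# Usage (MIN_KERNEL is a placeholder the reader sets):
#   kernel_at_least "$(uname -r)" "$MIN_KERNEL" && modprobe rbd
```

A successful `modprobe rbd` remains the real test; the version comparison only gives an early, readable failure.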

# Install the Ceph client

# wget -O /etc/yum.repos.d/ceph.repo https://raw.githubusercontent.com/aishangwei/ceph-demo/master/ceph-deploy/ceph.repo

View the keyring file

[root@ceph158 ~]# cat /etc/ceph/ceph.client.rbd.keyring 
[client.rbd]
   key = AQBLBwRepKVJABAALyRx67z6efeI4xogPqHkyw==
[root@ceph158 ~]# ceph -s --name client.rbd
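If `ceph -s --name client.rbd` fails to authenticate, one quick check is whether the keyring on the client really holds a key under the expected entity name. The helper below is a sketch only; `keyring_key` is a hypothetical name, and it assumes the simple `[entity]` / `key = ...` layout shown in the keyring above.

```shell
# Print the key stored for one entity in a Ceph keyring file.
# $1 = keyring path, $2 = entity name such as client.rbd
keyring_key() {
    awk -v section="[$2]" '
        /^\[/                     { in_section = ($1 == section) }  # entering a new [entity] block
        in_section && $1 == "key" { print $3 }                      # line looks like: key = <base64>
    ' "$1"
}

# Usage:
#   keyring_key /etc/ceph/ceph.client.rbd.keyring client.rbd
# Empty output means the file has no section with that exact name.
```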

2. Create and map a block device from the client

Run the following commands on the server 192.168.3.165.

(1) Create the block device

By default, block devices are created in the rbd pool, but after an installation with ceph-deploy that pool does not exist yet.

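# Create the pool and the block device

$ ceph osd lspools    # list the cluster's storage pools
$ ceph osd pool create rbd 50
   pool 'rbd' created   # 50 is the number of placement groups; later tests also need more PGs, so it is set to 50 here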

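Choosing a pg_num value is mandatory, because it cannot be calculated automatically. A few commonly used values:

- With fewer than 5 OSDs, set pg_num to 128

- With 5 to 10 OSDs, set pg_num to 512

- With 10 to 50 OSDs, set pg_num to 4096

- With more than 50 OSDs, you need to understand the trade-offs and how to calculate pg_num yourself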
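For the "calculate it yourself" case, a commonly cited rule of thumb (not taken from this article) is total PGs = (OSDs × 100) ÷ replica count, rounded up to the next power of two. The sketch below implements only that rule; `suggest_pg_num` is a hypothetical helper, and the result is a cluster-wide starting point, not a per-pool value.

```shell
# Rule-of-thumb PG count: (OSDs * 100) / replicas, rounded up to a power of two.
# $1 = number of OSDs, $2 = replica count (pool size)
suggest_pg_num() {
    local osds="$1" replicas="$2"
    local target=$(( osds * 100 / replicas ))
    local pg=1
    # Double until the target is reached or exceeded.
    while [ "$pg" -lt "$target" ]; do
        pg=$(( pg * 2 ))
    done
    echo "$pg"
}

suggest_pg_num 3 3   # prints 128
suggest_pg_num 3 2   # prints 256
```

With 3 OSDs and 3 replicas the rule gives 128, which agrees with the first entry in the list above.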

(2) Create the block device from the client

Create an RBD block device with a capacity of 5105 MB

[root@ceph158 ~]# rbd create rbd2 --size 5105 --name client.rbd

List the rbd2 block device from the client 192.168.3.158

[root@ceph158 ~]# rbd ls --name client.rbd
 rbd2
[root@ceph158 ~]# rbd ls -p rbd --name client.rbd
 rbd2
[root@ceph158 ~]# rbd list --name client.rbd
 rbd2

View the rbd2 block device information

[root@ceph158 ~]# rbd --image rbd2 info --name client.rbd

# Map it to the client; this is expected to fail

[root@ceph158 ~]# rbd map --image rbd2 --name client.rbd

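The image features involved are:

- layering: layering support

- exclusive-lock: exclusive locking support

- object-map: object map support (requires exclusive-lock)

- deep-flatten: snapshot flatten support

- fast-diff: fast diff calculation (requires object-map)

With the krbd (kernel RBD) client, the block device image cannot be mapped on the CentOS 3.10 kernel, because that kernel does not support object-map, deep-flatten, or fast-diff (krbd gained these features only in later kernels; exclusive-lock support, for example, arrived in 4.9). To work around this, disable the unsupported features. There are several ways to do so: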

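1) Disable the features dynamically

rbd feature disable rbd2 exclusive-lock object-map deep-flatten fast-diff --name client.rbd

2) Enable only the layering feature when creating the RBD image

rbd create rbd2 --size 10240 --image-feature layering --name client.rbd

3) Disable them in the Ceph configuration file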

rbd_default_features=1
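The value 1 in `rbd_default_features` is a bitmask in which each image feature has its own bit, and layering is bit 1. The sketch below shows how such a value is composed; the helper name `feature_mask` is hypothetical, while the bit values are the standard RBD ones.

```shell
# Compose an rbd_default_features bitmask from feature names.
feature_mask() {
    local mask=0 name
    for name in "$@"; do
        case "$name" in
            layering)       mask=$(( mask | 1 ))  ;;
            striping)       mask=$(( mask | 2 ))  ;;
            exclusive-lock) mask=$(( mask | 4 ))  ;;
            object-map)     mask=$(( mask | 8 ))  ;;
            fast-diff)      mask=$(( mask | 16 )) ;;
            deep-flatten)   mask=$(( mask | 32 )) ;;
            journaling)     mask=$(( mask | 64 )) ;;
            *) echo "unknown feature: $name" >&2; return 1 ;;
        esac
    done
    echo "$mask"
}

feature_mask layering                                                   # prints 1
feature_mask layering exclusive-lock object-map fast-diff deep-flatten  # prints 61
```

Setting the option to 1 therefore means that newly created images get layering only, which is the combination an older kernel client can map.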

# Here we disable them dynamically

[root@ceph158 ~]# rbd feature disable rbd2 exclusive-lock object-map fast-diff deep-flatten --name client.rbd

Map rbd2

[root@ceph158 ~]# rbd map --image rbd2 --name client.rbd

List the rbd images mapped on this host

[root@ceph158 ~]# rbd showmapped --name client.rbd

Check the size of the rbd0 disk

Format rbd0
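The original commands for these two steps are not shown. As a hedged sketch, the helper below uses `lsblk` to show the size and `mkfs.xfs` to format; XFS is an assumption, inferred only from the later use of `xfs_growfs`, and `format_rbd` is a hypothetical name.

```shell
# Show the size of a mapped RBD device and create an XFS filesystem on it.
# Refuses to run on anything that is not a block device.
# $1 = device path, e.g. /dev/rbd0
format_rbd() {
    local dev="$1"
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device; is the image mapped?" >&2
        return 1
    fi
    lsblk "$dev"      # report the device size
    mkfs.xfs "$dev"   # assumption: XFS; this destroys existing data on the device
}

# Usage:
#   format_rbd /dev/rbd0
```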

Create a mount point and mount the device

[root@ceph158 ~]# mkdir /mnt/ceph-disk1
[root@ceph158 ~]# mount /dev/rbd0 /mnt/ceph-disk1/

# Write test data

[root@ceph158 ~]# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M

# Turn it into a service so that the device is mounted automatically at boot

[root@ceph158 ~]# wget -O /usr/local/bin/rbd-mount https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount

# vim /usr/local/bin/rbd-mount

[root@ceph158 ~]# chmod +x /usr/local/bin/rbd-mount
[root@ceph158 ~]# wget -O /etc/systemd/system/rbd-mount.service https://raw.githubusercontent.com/aishangwei/ceph-demo/master/client/rbd-mount.service
[root@ceph158 ~]# systemctl daemon-reload
[root@ceph158 ~]# systemctl enable rbd-mount.service
Created symlink from /etc/systemd/system/multi-user.target.wants/rbd-mount.service to /etc/systemd/system/rbd-mount.service.

Unmount the manually mounted directory and test automatic mounting by the service

[root@ceph158 ~]# umount /mnt/ceph-disk1/
[root@ceph158 ~]# systemctl status rbd-mount

Ceph: expanding RBD capacity online

Operations on the Ceph admin node

Check the total pool capacity and the capacity already allocated

[root@ceph165 ~]# ceph df

List the existing pools

[root@ceph165 ~]# ceph osd lspools

List the existing rbd images

Expand rbd2 dynamically

[root@ceph165 ~]# rbd resize rbd/rbd2 --size 7168

Operations on the Ceph client

[root@ceph158 ~]# rbd showmapped

[root@ceph158 ~]# df -h

[root@ceph158 ~]# xfs_growfs -d /mnt/ceph-disk1
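The `--size` values in this walkthrough carry no unit suffix, and rbd reads such values as megabytes, so 7168 corresponds to 7 GiB. A trivial conversion sketch follows; `gib_to_mib` is a hypothetical helper name.

```shell
# Convert a size in GiB to the unit-less MiB figure used with `rbd resize --size`.
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

gib_to_mib 7   # prints 7168

# Usage:
#   rbd resize rbd/rbd2 --size "$(gib_to_mib 7)"
```

After the image grows, the filesystem still has to be grown separately, which is what `xfs_growfs` does on the client.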

3. Troubleshooting Ceph cluster errors

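3.1 Configuration file contents differ between nodes

Running ceph-deploy mon create-initial to gather the keys generates several keyrings in the current directory (in my case ~/etc/ceph/), but fails with the error below. It means that the configuration files on the two nodes that failed differ from the one on the current node, and it suggests using the --overwrite-conf option to overwrite the inconsistent files.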

# ceph-deploy mon create-initial
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mon][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy][ERROR ] GenericError: Failed to create 2 monitors
...

Run the following command (I configured three nodes here, ceph1 to ceph3):

# ceph-deploy --overwrite-conf mon create ceph{3,1,2}
...
[ceph2][DEBUG ] remote hostname: ceph2
[ceph2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph2][DEBUG ] create the mon path if it does not exist
[ceph2][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-ceph2/done
...

After that the configuration succeeds, and you can go on to initialize the disks.

3.2 Warning: too few PGs per OSD (21 < min 30)

[root@ceph1 ceph]# ceph -s
cluster:
id: 8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
health: HEALTH_WARN
too few PGs per OSD (21 < min 30)
services:
mon: 3 daemons, quorum ceph2,ceph1,ceph3
……

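The cluster status above shows that each OSD holds 21 PGs, fewer than the minimum of 30. The cluster has 32 PGs in total and I had configured 2 replicas, so with 3 OSDs each OSD holds about 32 × 2 ÷ 3 ≈ 21 PGs, which is below the configured minimum of 30 and produces the warning above.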
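The per-OSD figure is just total PGs times replicas divided by the number of OSDs, truncated to an integer. A small sketch of that arithmetic, with a hypothetical helper name `pgs_per_osd`:

```shell
# Average number of PG copies per OSD.
# $1 = total PGs across all pools, $2 = replica count, $3 = number of OSDs
pgs_per_osd() {
    local pgs="$1" replicas="$2" osds="$3"
    echo $(( pgs * replicas / osds ))   # integer division, as in the warning text
}

pgs_per_osd 32 2 3   # prints 21
pgs_per_osd 48 2 3   # prints 32
```

With 48 PGs the result is 32, which clears the threshold of 30.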

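According to the author, storing and operating on data while the cluster is in this state can cause the cluster to hang and stop responding to I/O, and can take a large number of OSDs down.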

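Solution:

Increase the number of PGs

Each of my pools has 8 PGs, so I need to add two more pools to bring the per-OSD count to 48 × 2 ÷ 3 = 32, which is above the minimum of 30.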

[root@ceph1 ceph]# ceph osd pool create mytest 8
pool 'mytest' created
[root@ceph1 ceph]# ceph osd pool create mytest1 8
pool 'mytest1' created
[root@ceph1 ceph]# ceph -s
cluster:
id: 8e2248e4-3bb0-4b62-ba93-f597b1a3bd40
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph2,ceph1,ceph3
mgr: ceph2(active), standbys: ceph1, ceph3
osd: 3 osds: 3 up, 3 in
rgw: 1 daemon active
data:
pools: 6 pools, 48 pgs
objects: 219 objects, 1.1 KiB
usage: 3.0 GiB used, 245 GiB / 248 GiB avail
pgs: 48 active+clean

The cluster health status is now normal.

3.3 Cluster status is HEALTH_WARN application not enabled on 1 pool(s)

If the cluster status now shows HEALTH_WARN application not enabled on 1 pool(s):

[root@ceph1 ceph]# ceph -s
cluster:
id: 13430f9a-ce0d-4d17-a215-272890f47f28
health: HEALTH_WARN
application not enabled on 1 pool(s)
[root@ceph1 ceph]# ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
application not enabled on pool 'mytest'
use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.

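Running ceph health detail shows that the newly added pool mytest has not been tagged with an application. Since an RGW instance was added earlier, follow the hint and tag mytest with rgw: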

[root@ceph1 ceph]# ceph osd pool application enable mytest rgw
enabled application 'rgw' on pool 'mytest'
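When several pools are affected, the pool names can be pulled out of the `ceph health detail` output instead of being read by eye. This is a sketch that relies on the message wording shown above; `pools_without_app` is a hypothetical helper.

```shell
# Read `ceph health detail` output on stdin and print the names of pools
# that have no application enabled.
pools_without_app() {
    sed -n "s/.*application not enabled on pool '\([^']*\)'.*/\1/p"
}

# Usage (rgw is the application used in this article; pick the right one per pool):
#   ceph health detail | pools_without_app | while read -r pool; do
#       ceph osd pool application enable "$pool" rgw
#   done
```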

Check the cluster status again; it is back to normal.

[root@ceph1 ceph]# ceph health
HEALTH_OK

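3.4 Error when deleting a pool

Take deleting the mytest pool as an example. Running ceph osd pool rm mytest fails, and the message says that the pool name must be given a second time, followed by the --yes-i-really-really-mean-it option.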

[root@ceph1 ceph]# ceph osd pool rm mytest
Error EPERM: WARNING: this will *PERMANENTLY DESTROY* all data stored in pool mytest. If you are *ABSOLUTELY CERTAIN* that is what you want, pass the pool name *twice*, followed by --yes-i-really-really-mean-it.

Repeating the pool name and adding the option as instructed still fails:

[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
Error EPERM: pool deletion is disabled; you must first set the 
mon_allow_pool_delete config option to true before you can destroy a pool

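The error says that pool deletion is disabled: before deleting, the mon_allow_pool_delete option must first be added to the ceph.conf configuration file and set to true. So log in to each node in turn and edit its configuration file, as follows: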

[root@ceph1 ceph]# vi ceph.conf 
[root@ceph1 ceph]# systemctl restart ceph-mon.target

Add the following parameter at the bottom of ceph.conf and set it to true. After saving and exiting, restart the service with systemctl restart ceph-mon.target.

[mon]

mon allow pool delete = true

Do the same on the remaining nodes.

[root@ceph2 ceph]# vi ceph.conf 
[root@ceph2 ceph]# systemctl restart ceph-mon.target
[root@ceph3 ceph]# vi ceph.conf 
[root@ceph3 ceph]# systemctl restart ceph-mon.target
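Editing the file by hand on every node is easy to get wrong, so the same change can be scripted. This is a minimal sketch, assuming the option is simply absent from the file; `enable_pool_delete` is a hypothetical helper, and it does not rewrite an existing line that sets the option to false.

```shell
# Append the pool-deletion setting to a ceph.conf unless it is already present.
# $1 = path to ceph.conf
enable_pool_delete() {
    local conf="$1"
    # Accept both "mon allow pool delete" and "mon_allow_pool_delete" spellings.
    if grep -Eq '^[[:space:]]*mon[ _]allow[ _]pool[ _]delete[[:space:]]*=' "$conf"; then
        return 0   # already configured; leave the file untouched
    fi
    printf '\n[mon]\nmon allow pool delete = true\n' >> "$conf"
}

# Usage, followed by the monitor restart shown above:
#   enable_pool_delete /etc/ceph/ceph.conf
#   systemctl restart ceph-mon.target
```

Running it twice leaves a single copy of the setting, so it is safe to repeat across nodes.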

Delete again, and the mytest pool is removed successfully.

[root@ceph1 ceph]# ceph osd pool rm mytest mytest --yes-i-really-really-mean-it
pool 'mytest' removed

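3.5 Troubleshooting node recovery after cluster nodes go down

After shutting down and restarting each of the three nodes in the Ceph cluster, I checked the cluster status: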

[root@ceph1 ~]# ceph -s
cluster:
id: 13430f9a-ce0d-4d17-a215-272890f47f28
health: HEALTH_WARN
1 MDSs report slow metadata IOs
324/702 objects misplaced (46.154%)
Reduced data availability: 126 pgs inactive
Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
services:
mon: 3 daemons, quorum ceph2,ceph1,ceph3
mgr: ceph1(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph1=up:creating}
osd: 3 osds: 3 up, 3 in; 162 remapped pgs
data:
pools: 8 pools, 288 pgs
objects: 234 objects, 2.8 KiB
usage: 3.0 GiB used, 245 GiB / 248 GiB avail
pgs: 43.750% pgs not active
144/702 objects degraded (20.513%)
324/702 objects misplaced (46.154%)
162 active+clean+remapped
123 undersized+peered
3 undersized+degraded+peered

Check the details:

[root@ceph1 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 324/702 objects misplaced (46.154%); Reduced data availability: 126 pgs inactive; Degraded data redundancy: 144/702 objects degraded (20.513%), 3 pgs degraded, 126 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
mdsceph1(mds.0): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 42075 secs
OBJECT_MISPLACED 324/702 objects misplaced (46.154%)
PG_AVAILABILITY Reduced data availability: 126 pgs inactive
pg 8.28 is stuck inactive for 42240.369934, current state undersized+peered, last acting [0]
pg 8.2a is stuck inactive for 45566.934835, current state undersized+peered, last acting [0]
pg 8.2d is stuck inactive for 42240.371314, current state undersized+peered, last acting [0]
pg 8.2f is stuck inactive for 45566.913284, current state undersized+peered, last acting [0]
pg 8.32 is stuck inactive for 42240.354304, current state undersized+peered, last acting [0]
....
pg 8.28 is stuck undersized for 42065.616897, current state undersized+peered, last acting [0]
pg 8.2a is stuck undersized for 42065.613246, current state undersized+peered, last acting [0]
pg 8.2d is stuck undersized for 42065.951760, current state undersized+peered, last acting [0]
pg 8.2f is stuck undersized for 42065.610464, current state undersized+peered, last acting [0]
pg 8.32 is stuck undersized for 42065.959081, current state undersized+peered, last acting [0]
....

So during data recovery some PGs are reported as inactive and undersized, which is abnormal.
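Because the detail output repeats each PG once per condition, it helps to list the distinct PG IDs for a single condition. A sketch that relies on the line format shown above; `stuck_pgs` is a hypothetical helper.

```shell
# Read `ceph health detail` output on stdin and print the unique IDs of PGs
# stuck in the given condition.
# $1 = condition keyword, e.g. inactive or undersized
stuck_pgs() {
    # Lines look like: pg 8.28 is stuck inactive for 42240.369934, ...
    awk -v want="$1" '$1 == "pg" && $4 == "stuck" && $5 == want { print $2 }' | sort -u
}

# Usage:
#   ceph health detail | stuck_pgs inactive
#   ceph health detail | stuck_pgs undersized | wc -l
```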

Solution:

① Handle the inactive PGs:

Restarting the OSD service is enough


[root@ceph1 ~]# systemctl restart ceph-osd.target

Checking the cluster status again shows that the inactive PGs have recovered; only the undersized PGs remain.

[root@ceph1 ~]# ceph -s
cluster:
id: 13430f9a-ce0d-4d17-a215-272890f47f28
health: HEALTH_WARN
1 filesystem is degraded
241/723 objects misplaced (33.333%)
Degraded data redundancy: 59 pgs undersized
services:
mon: 3 daemons, quorum ceph2,ceph1,ceph3
mgr: ceph1(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph1=up:rejoin}
osd: 3 osds: 3 up, 3 in; 229 remapped pgs
rgw: 1 daemon active
data:
pools: 8 pools, 288 pgs
objects: 241 objects, 3.4 KiB
usage: 3.0 GiB used, 245 GiB / 248 GiB avail
pgs: 241/723 objects misplaced (33.333%)
224 active+clean+remapped
59 active+undersized
5 active+clean
io:
client: 1.2 KiB/s rd, 1 op/s rd, 0 op/s wr

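② Handle the undersized PGs:

Get into the habit of checking the health details first when something goes wrong. A careful look shows that although the configured replica count is 3, each PG 12.x has only two copies, stored on two of OSDs 0 to 2.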

[root@ceph1 ~]# ceph health detail 
HEALTH_WARN 241/723 objects misplaced (33.333%); Degraded data redundancy: 59 pgs undersized
OBJECT_MISPLACED 241/723 objects misplaced (33.333%)
PG_DEGRADED Degraded data redundancy: 59 pgs undersized
pg 12.8 is stuck undersized for 1910.001993, current state active+undersized, last acting [2,0]
pg 12.9 is stuck undersized for 1909.989334, current state active+undersized, last acting [2,0]
pg 12.a is stuck undersized for 1909.995807, current state active+undersized, last acting [0,2]
pg 12.b is stuck undersized for 1910.009596, current state active+undersized, last acting [1,0]
pg 12.c is stuck undersized for 1910.010185, current state active+undersized, last acting [0,2]
pg 12.d is stuck undersized for 1910.001526, current state active+undersized, last acting [1,0]
pg 12.e is stuck undersized for 1909.984982, current state active+undersized, last acting [2,0]
pg 12.f is stuck undersized for 1910.010640, current state active+undersized, last acting [2,0]

Looking further at the OSD tree shows that after ceph2 and ceph3 went down and came back, the osd.1 and osd.2 daemons are no longer placed under ceph2 and ceph3.

[root@ceph1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF 
-1 0.24239 root default
-9 0.16159 host centos7evcloud
1 hdd 0.08080 osd.1 up 1.00000 1.00000 
2 hdd 0.08080 osd.2 up 1.00000 1.00000 
-3 0.08080 host ceph1
0 hdd 0.08080 osd.0 up 1.00000 1.00000 
-5 0 host ceph2
-7 0 host ceph3
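The misplacement is easier to spot when the tree is flattened into OSD and host pairs. The sketch below parses the plain-text layout shown above; `osd_hosts` is a hypothetical helper, and it assumes each OSD line follows the host line it belongs to.

```shell
# Read `ceph osd tree` output on stdin and print "<osd> <host>" pairs.
osd_hosts() {
    awk '
        $3 == "host" { host = $4; next }   # remember the most recent host bucket
        {
            # Any field of the form osd.N on a later line belongs to that host.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^osd\.[0-9]+$/) print $i, host
        }
    '
}

# Usage:
#   ceph osd tree | osd_hosts
```

For the tree above this prints osd.1 and osd.2 under centos7evcloud and osd.0 under ceph1.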

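Check the service status of osd.1 and osd.2 respectively.

Solution:

Log in to ceph2 and ceph3 and restart the osd.1 and osd.2 services respectively, which maps the two services back onto ceph2 and ceph3.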

[root@ceph1 ~]# ssh ceph2
[root@ceph2 ~]# systemctl restart ceph-osd@1.service
[root@ceph2 ~]# ssh ceph3
[root@ceph3 ~]# systemctl restart ceph-osd@2.service

Finally, the OSD tree shows that the two services are mapped to ceph2 and ceph3 again.

[root@ceph3 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF 
-1 0.24239 root default
-9 0 host centos7evcloud
-3 0.08080 host ceph1
0 hdd 0.08080 osd.0 up 1.00000 1.00000 
-5 0.08080 host ceph2
1 hdd 0.08080 osd.1 up 1.00000 1.00000 
-7 0.08080 host ceph3
2 hdd 0.08080 osd.2 up 1.00000 1.00000

The cluster status also shows the long-awaited HEALTH_OK.

[root@ceph3 ~]# ceph -s
cluster:
id: 13430f9a-ce0d-4d17-a215-272890f47f28
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph2,ceph1,ceph3
mgr: ceph1(active), standbys: ceph2, ceph3
mds: cephfs-1/1/1 up {0=ceph1=up:active}
osd: 3 osds: 3 up, 3 in
rgw: 1 daemon active
data:
pools: 8 pools, 288 pgs
objects: 241 objects, 3.6 KiB
usage: 3.1 GiB used, 245 GiB / 248 GiB avail
pgs: 288 active+clean

3.6 Error when remounting CephFS after unmounting

The mount command is:

mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

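Remounting CephFS after unmounting fails with: mount error(2): No such file or directory

Note: first check that the /mnt/mycephfs/ directory exists and is accessible. Mine existed, yet the error No such file or directory persisted. After I restarted the OSD service, however, the problem unexpectedly went away and CephFS could be mounted normally.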

[root@ceph1 ~]# systemctl restart ceph-osd.target
[root@ceph1 ~]# mount -t ceph 10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ /mnt/mycephfs/ -o name=admin,secret=AQBAI/JbROMoMRAAbgRshBRLLq953AVowLgJPw==

The mount succeeded:

[root@ceph1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda2 48G 7.5G 41G 16% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 2.0G 8.0K 2.0G 1% /dev/shm
tmpfs 2.0G 17M 2.0G 1% /run
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
tmpfs 2.0G 24K 2.0G 1% /var/lib/ceph/osd/ceph-0
tmpfs 396M 0 396M 0% /run/user/0
10.0.86.246:6789,10.0.86.221:6789,10.0.86.253:6789:/ 249G 3.1G 246G 2% /mnt/mycephfs

Reference link

https://blog.csdn.net/SL_World/article/details/84584366