Scenario: the disk behind osd.21 on ceph-4 has failed and needs to be replaced with a new one. The OSDs on ceph-4 are osd.21 through osd.26.
Preparation
Lower the OSD priority
In most failure scenarios we have to power the node off. To keep the operation invisible to users, we lower the priority of the node we are about to work on ahead of time. First check the version:
ceph -v
Start by comparing version numbers: the Ceph release backing our second-generation OpenStack is 10.x (Jewel), and we have primary-affinity enabled. A user's I/O request is handled by the primary PG first and then written to the other replicas.
First find the OSDs that belong to host ceph-4, then set their primary-affinity to 0, meaning the PGs on them should not act as primary unless the other replicas are down:
for osd in {21..26}; do ceph osd primary-affinity "$osd" 0; done
Use ceph osd tree to confirm the setting on the corresponding node:
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-4 21.81499     host ceph-4
21  3.63599         osd.21      up  0.79999                0
22  3.63599         osd.22      up  0.79999                0
23  3.63599         osd.23      up  0.79999                0
24  3.63599         osd.24      up  0.99989                0
25  3.63599         osd.25      up  0.79999                0
26  3.63599         osd.26      up  0.79999                0
Prevent OSDs from being marked out of the cluster:
ceph osd set noout
By default, an OSD that stays unresponsive for too long is automatically marked out of the cluster, which triggers data migration. Powering off and replacing a disk takes a while, so to avoid pointless back-and-forth migration we temporarily stop the cluster from kicking OSDs out automatically. Use ceph -s to check that the flag has been applied.
You can see the cluster status change to WARN, with an extra note that the noout flag is set and one more entry on the flags line:
    cluster 936a5233-9441-49df-95c1-01de82a192f4
     health HEALTH_WARN
            noout flag(s) set
            election epoch 382, quorum 0,1,2,3,4,5 ceph-1,ceph-2,ceph-3,ceph-4,ceph-5,ceph-6
     fsmap e85: 1/1/1 up {0=ceph-2=up:active}
     osdmap e62563: 111 osds: 109 up, 109 in
            flags noout,sortbitwise,require_jewel_osds
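If you only want to check the flags rather than read the whole status output, the osdmap can be queried directly; a minimal sketch using standard commands:
# show just the osdmap flags; expect noout to be listed
ceph osd dump | grep flags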
Checks before stopping the service: verify that the PG primaries have finished switching over.
ceph pg ls | grep "\[2[1-6],"
39.5 1 0 0 0 0 12 1 1 active+clean 2019-01-17 06:49:18.749517 16598'1 62554:2290 [22,44,76] 22 [22,44,76] 22 16598'1 2019-01-17 06:49:18.749416 16598'1 2019-01-11 15:21:07.641442
You can see that PG 39.5 still uses osd.22 as its primary, so keep waiting. The waiting is what keeps the operation invisible to users; if the situation is urgent, you can publish a notice explaining it and skip this step.
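If you would rather not re-run the check by hand, the same grep can be wrapped in a loop that polls until no PG still leads with one of ceph-4's OSDs; a minimal sketch reusing the pattern above:
# poll until the grep comes back empty, i.e. no PG set starts with osd.21-26
while ceph pg ls | grep -q "\[2[1-6],"; do
    echo "waiting for primary PGs to move off osd.21-26 ..."
    sleep 60
done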
Check how many nodes are already powered off or out of service
A cluster that stores 3 replicas can tolerate the failure of any two hosts, so make sure the number of nodes already powered off does not exceed that limit, or you risk a much bigger failure.
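A quick way to check, a rough sketch using only standard status commands:
# how many OSDs are up/in overall
ceph osd stat
# list any OSDs currently down (and eyeball which hosts they belong to in the tree)
ceph osd tree | grep -i down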
Power off and replace the disk
Rebuild the OSD. In my setup the journal partitions live on an SSD; /dev/nvme0n1p1 is the partition used by osd.21.
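If you need to double-check which journal partition the failed OSD was using, on a filestore OSD deployed the usual way the journal is a symlink inside the OSD's data directory; a sketch, assuming that default layout:
# the journal symlink should point at the NVMe partition (here expected to be /dev/nvme0n1p1)
ls -l /var/lib/ceph/osd/ceph-21/journal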
Rebuild with ceph-deploy
First delete the old OSD id, in my case osd.21 (a consolidated script version of these five steps follows the list).
1. Stop the OSD process (this stops the daemon so the other OSDs know the node no longer serves requests):
systemctl stop ceph-osd@21.service
2. Mark the OSD out (this tells the monitors the node can no longer serve and that its data needs to be recovered onto other OSDs):
ceph osd out osd.21
3. Remove the OSD from the CRUSH map (this tells the cluster the node is not coming back and removes it from data placement entirely, forcing a CRUSH recalculation; otherwise the dead OSD keeps its crush weight and skews the host's crush weight):
ceph osd crush remove osd.21
4. Delete the OSD:
ceph osd rm osd.21
5. Delete the OSD's auth entry (if you skip this, the id stays occupied, and we want to recreate an OSD with id 21 later):
ceph auth del osd.21
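For reference, the five steps above collapse into a short script; a minimal sketch, with the OSD id pulled out into a variable so it can be reused for other disks:
OSD_ID=21                                     # the failed OSD
systemctl stop ceph-osd@"${OSD_ID}".service   # 1. stop the daemon
ceph osd out "osd.${OSD_ID}"                  # 2. mark it out
ceph osd crush remove "osd.${OSD_ID}"         # 3. drop it from the CRUSH map
ceph osd rm "osd.${OSD_ID}"                   # 4. delete the OSD entry
ceph auth del "osd.${OSD_ID}"                 # 5. free its auth key so the id can be reused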
Then follow the normal OSD creation process (the new disk is /dev/sdh, and its journal uses /dev/nvme0n1p1).
Wait until the cluster status is health: HEALTH_OK before starting the steps below.
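If you want to script this wait rather than watch the screen, a minimal sketch polling the health status:
# block until the cluster reports HEALTH_OK
until ceph health | grep -q HEALTH_OK; do
    sleep 30
done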
It is recommended to raise the OSD priority back up gradually, for example 0.3, then 0.5, then 0.8.
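A sketch of such a ramp, assuming the "priority" being raised is the primary-affinity we set to 0 at the start (adjust the OSD id and pauses to taste):
# hypothetical gradual ramp-up for the rebuilt osd.21
for aff in 0.3 0.5 0.8 1.0; do
    ceph osd primary-affinity 21 "$aff"
    sleep 600   # let the cluster settle before the next bump
done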
Method 1:
1. Zap the disk
Use the following command to wipe the disk (delete its partition table) so it can be used by Ceph:
ceph-deploy disk zap {osd-server-name}:{disk-name}
ceph-deploy disk zap ceph-4:sdh
2. Prepare the OSD
ceph-deploy osd prepare {node-name}:{data-disk}[:{journal-disk}]
ceph-deploy osd prepare ceph-4:sdh:/dev/nvme0n1p1
3. Activate the OSD
ceph-deploy osd activate {node-name}:{data-disk-partition}[:{journal-disk-partition}]
ceph-deploy osd activate ceph-4:/dev/sdh1:/dev/nvme0n1p1
Finally, watch the cluster state while the data migration actually runs.
[root@ceph-4 ~]# ceph -s
    cluster 936a5233-9441-49df-95c1-01de82a192f4
     health HEALTH_WARN
            189 pgs backfill_wait
            8 pgs backfilling
            35 pgs degraded
            1 pgs recovering
            34 pgs recovery_wait
            35 pgs stuck degraded
            232 pgs stuck unclean
            recovery 33559/38322730 objects degraded (0.088%)
            recovery 1127457/38322730 objects misplaced (2.942%)
     monmap e5: 6 mons at {ceph-1=100.100.200.201:6789/0,ceph-2=100.100.200.202:6789/0,ceph-3=100.100.200.203:6789/0,ceph-4=100.100.200.204:6789/0,ceph-5=100.100.200.205:6789/0,ceph-6=100.100.200.206:6789/0}
            election epoch 394, quorum 0,1,2,3,4,5 ceph-1,ceph-2,ceph-3,ceph-4,ceph-5,ceph-6
     fsmap e87: 1/1/1 up {0=ceph-2=up:active}
     osdmap e64105: 111 osds: 109 up, 109 in; 197 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v79953954: 5064 pgs, 24 pools, 87760 GB data, 12364 kobjects
            257 TB used, 149 TB / 407 TB avail
            33559/38322730 objects degraded (0.088%)
            1127457/38322730 objects misplaced (2.942%)
                4828 active+clean
                 189 active+remapped+wait_backfill
                  34 active+recovery_wait+degraded
                   8 active+remapped+backfilling
                   4 active+clean+scrubbing+deep
                   1 active+recovering+degraded
recovery io 597 MB/s, 102 objects/s
  client io 1457 kB/s rd, 30837 kB/s wr, 271 op/s rd, 846 op/s wr
Check the PG state; osd.21 has joined back and is in normal use.
[root@ceph-4 ~]# ceph pg ls | grep "\[21,"
13.2c 4837 0 0 9674 0 39526913024 3030 3030 active+remapped+wait_backfill 2019-04-18 10:28:26.804788 64115'27875808 64115:22041197 [21,109,54] 21 [109,54,26] 109 63497'27874419 2019-04-17 11:38:23.070192 63497'27874419 2019-04-17 11:38:23.070192
13.3c4 4769 0 0 4769 0 38960525312 3053 3053 active+remapped+wait_backfill 2019-04-18 10:28:26.478877 64115'30669377 64115:23715977 [21,103,28] 21 [22,103,28] 22 64048'30669336 2019-04-18 09:30:54.731664 63494'30667018 2019-04-17 07:01:35.747738
13.732 4852 0 0 4852 0 39605818368 3033 3033 active+remapped+wait_backfill 2019-04-18 10:28:26.625253 64115'35861577 64115:24577872 [21,109,66] 21 [22,109,66] 22 63494'35851988 2019-04-17 00:23:06.041163 63335'35775155 2019-04-12 08:20:06.199557
14.28 631 0 0 0 0 1245708288 2766 2766 active+clean 2019-04-18 10:37:27.551922 63012'2766 64115:1344 [21,88,59] 21 [21,88,59] 21 63012'2766 2019-04-18 00:35:27.505205 63012'2766 2019-04-18 00:35:27.505205
14.cd 638 0 0 0 0 1392508928 2878 2878 active+clean 2019-04-18 10:34:59.825683 63012'2878 64115:2004 [21,103,82] 21 [21,103,82] 21 63012'2878 2019-04-17 15:29:04.106329 63012'2878 2019-04-14 23:32:15.401675
14.144 642 0 0 642 0 1296039936 3065 3065 active+remapped+wait_backfill 2019-04-18 10:28:26.475525 63012'19235 64115:1809456 [21,108,39] 21 [39,108,85] 39 63012'19235 2019-04-17 11:38:26.926001 63012'19235 2019-04-14 22:23:28.753992
15.1d4 1718 0 0 1718 0 7180375552 3079 3079 active+remapped+wait_backfill 2019-04-18 10:28:27.009492 64115'55809177 64115:56046351 [21,101,30] 21 [23,101,30] 23 64047'55805700 2019-04-18 02:22:58.405438 63329'55704522 2019-04-11 08:45:36.280574
15.255 1636 0 0 1636 0 6831808512 3329 3329 active+remapped+wait_backfill 2019-04-18 10:28:27.077705 64115'50269798 64115:51399985 [21,93,78] 21 [22,93,78] 22 63494'50261333 2019-04-17 08:16:31.952291 63332'50165547 2019-04-11 15:24:03.756344
18.65 0 0 0 0 0 0 0 0 active+clean 2019-04-18 10:28:25.128442 0'0 64106:8 [21,74,60] 21 [21,74,60] 21 0'0 2019-04-17 05:13:16.502404 0'0 2019-04-11 09:45:49.746497
18.8b 0 0 0 0 0 0 0 0 active+clean 2019-04-18 10:28:27.414351 0'0 64106:8 [21,69,27] 21 [21,69,27] 21 0'0 2019-04-18 09:47:32.328522 0'0 2019-04-15 12:37:33.118690
20.b 0 0 0 0 0 0 16 16 active+clean 2019-04-18 10:28:27.417344 55957'16 64106:8 [21,84,54] 21 [21,84,54] 21 55957'16 2019-04-17 21:59:53.986712 55957'16 2019-04-13 21:31:20.267855
21.1 0 0 0 0 0 0 0 0 active+clean 2019-04-18 10:28:27.414667 0'0 64106:8 [21,56,85] 21 [21,56,85] 21 0'0 2019-04-17 15:32:15.034621 0'0 2019-04-17 15:32:15.034621
38.49 16 0 0 0 0 24102240 127 127 active+clean 2019-04-18 10:44:55.727955 16602'127 64114:43 [21,50,81] 21 [21,50,81] 21 16602'127 2019-04-17 09:35:03.687063 16602'127 2019-04-14 17:57:45.779953
38.b2 15 0 0 0 0 23948729 107 107 active+clean 2019-04-18 10:33:32.139150 16602'107 64110:41 [21,86,69] 21 [21,86,69] 21 16602'107 2019-04-18 05:53:38.968505 16602'107 2019-04-16 23:42:37.321383
38.c2 14 0 0 0 0 33846787 118 118 active+clean 2019-04-18 10:28:31.234530 16602'118 64106:39 [21,72,56] 21 [21,72,56] 21 16602'118 2019-04-17 09:48:27.765306 16602'118 2019-04-14 00:56:36.180240
38.ce 9 0 0 0 0 16840679 65 65 active+clean 2019-04-18 10:33:30.229084 16602'65 64110:29 [21,75,30] 21 [21,75,30] 21 16602'65 2019-04-18 03:29:52.768179 16602'65 2019-04-18 03:29:52.768179
Once the OSD rebuild is complete, move on to the next step and restore the OSD flags.
Everything changed during the intervention has to be reverted:
ceph osd unset noout
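The noout flag is not the only change we made: the primary-affinity that was zeroed at the beginning should be restored as well. A minimal sketch, assuming the pre-change value was the default of 1:
for osd in {21..26}; do ceph osd primary-affinity "$osd" 1; done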
Wait for the cluster to recover: let automatic recovery bring the cluster back to HEALTH_OK. If a HEALTH_ERR state shows up along the way, follow up promptly and search Google.
Method 2:
ceph-deploy osd create --data /dev/sdk wuhan31-ceph03
The host name is wuhan31-ceph03 and the disk is /dev/sdk.
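Method 2 uses what looks like the newer ceph-deploy 2.x (ceph-volume based) syntax. A sketch of the surrounding steps under that assumption; note that in this release line the host and device are separated by a space rather than a colon:
# wipe the disk first, then create the OSD (bluestore by default in ceph-deploy 2.x)
ceph-deploy disk zap wuhan31-ceph03 /dev/sdk
ceph-deploy osd create --data /dev/sdk wuhan31-ceph03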