In production it is inevitable that a ceph OSD disk will fail at some point. This article describes how to replace a failed disk in that scenario. The environment is OpenStack deployed with kolla; each of the 3 HA nodes carries 3 OSDs, and we will remove and replace osd.6, osd.7 and osd.8.
1. First, check and collect the relevant information:
Check ceph cluster space usage:

# docker exec -it ceph_mon ceph df
GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED
    584G  548G   37548M    6.27
POOLS:
    NAME                       ID  USED    %USED  MAX AVAIL  OBJECTS
    rbd                        0   0       0      1017M      0
    .rgw.root                  1   1588    0      1017M      4
    default.rgw.control        2   0       0      1017M      8
    default.rgw.data.root      3   670     0      1017M      2
    default.rgw.gc             4   0       0      1017M      32
    default.rgw.log            5   0       0      1017M      127
    images                     6   13286M  92.89  1017M      6674
    volumes                    7   230     0      1017M      7
    backups                    8   0       0      1017M      0
    vms                        9   5519M   84.44  1017M      1495
    default.rgw.users.uid      10  209     0      1017M      2
    default.rgw.buckets.index  11  0       0      1017M      1
    default.rgw.buckets.data   12  0       0      1017M      1

Check the OSD distribution and weights:

# docker exec -it ceph_mon ceph osd tree
ID WEIGHT  TYPE NAME               UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 9.00000 root default
-2 3.00000     host 192.168.1.132
 0 1.00000         osd.0                up  1.00000          1.00000
 3 1.00000         osd.3                up  1.00000          1.00000
 6 1.00000         osd.6                up  1.00000          1.00000
-3 3.00000     host 192.168.1.130
 1 1.00000         osd.1                up  1.00000          1.00000
 5 1.00000         osd.5                up  1.00000          1.00000
 8 1.00000         osd.8                up  1.00000          1.00000
-4 3.00000     host 192.168.1.131
 2 1.00000         osd.2                up  1.00000          1.00000
 4 1.00000         osd.4                up  1.00000          1.00000
 7 1.00000         osd.7                up  1.00000          1.00000

View the crushmap:

# docker exec -it ceph_mon ceph osd getcrushmap -o /var/log/kolla/ceph/crushmap.bin
got crush map from osdmap epoch 247
# docker exec -it ceph_mon crushtool -d /var/log/kolla/ceph/crushmap.bin -o /var/log/kolla/ceph/crushmap
# cd /var/lib/docker/volumes/kolla_logs/_data/ceph
# more crushmap
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host 192.168.1.132 {
    id -2    # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 1.000
    item osd.3 weight 1.000
    item osd.6 weight 1.000
}
host 192.168.1.130 {
    id -3    # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0    # rjenkins1
    item osd.1 weight 1.000
    item osd.5 weight 1.000
    item osd.8 weight 1.000
}
host 192.168.1.131 {
    id -4    # do not change unnecessarily
    # weight 3.000
    alg straw
    hash 0    # rjenkins1
    item osd.2 weight 1.000
    item osd.4 weight 1.000
    item osd.7 weight 1.000
}
root default {
    id -1    # do not change unnecessarily
    # weight 9.000
    alg straw
    hash 0    # rjenkins1
    item 192.168.1.132 weight 3.000
    item 192.168.1.130 weight 3.000
    item 192.168.1.131 weight 3.000
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule disks {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
# end crush map

Check the disk mounts:

[root@control01 ceph]# df -h
Filesystem               Size  Used  Avail Use% Mounted on
/dev/mapper/centos-root   88G   14G    74G  16% /
devtmpfs                 9.8G     0   9.8G   0% /dev
tmpfs                    9.8G     0   9.8G   0% /dev/shm
tmpfs                    9.8G   20M   9.8G   1% /run
tmpfs                    9.8G     0   9.8G   0% /sys/fs/cgroup
/dev/sdd1                5.0G  4.8G   227M  96% /var/lib/ceph/osd/a007c495-e2e9-4a08-a565-83dfef1df30d
/dev/sdb1                 95G  4.1G    91G   5% /var/lib/ceph/osd/b7267089-cae5-4e28-b9ff-a37c373c0d34
/dev/sdc1                 95G  4.6G    91G   5% /var/lib/ceph/osd/c35eea02-f07d-4557-acdb-7280a571aaf9

Check the health status:

# docker exec -it ceph_mon ceph health detail
HEALTH_ERR 1 full osd(s); 2 near full osd(s); full flag(s) set
osd.8 is full at 95%
osd.6 is near full at 90%
osd.7 is near full at 85%
full flag(s) set

# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_ERR
            1 full osd(s)
            2 near full osd(s)
            full flag(s) set
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 60, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e247: 9 osds: 9 up, 9 in
            flags nearfull,full,sortbitwise,require_jewel_osds
      pgmap v179945: 640 pgs, 13 pools, 18806 MB data, 8353 objects
            37548 MB used, 548 GB / 584 GB avail
                 640 active+clean

View the filesystem configuration file /etc/fstab:

# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Jan 22 13:32:42 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root   /          xfs   defaults   0 0
UUID=c2c088f8-a530-4099-a9ef-0fd41508d304   /boot      xfs   defaults   0 0
UUID=7FA3-61ED            /boot/efi  vfat  defaults,uid=0,gid=0,umask=0077,shortname=winnt   0 0
UUID=b7267089-cae5-4e28-b9ff-a37c373c0d34   /var/lib/ceph/osd/b7267089-cae5-4e28-b9ff-a37c373c0d34   xfs   defaults,noatime   0 0
UUID=c35eea02-f07d-4557-acdb-7280a571aaf9   /var/lib/ceph/osd/c35eea02-f07d-4557-acdb-7280a571aaf9   xfs   defaults,noatime   0 0
UUID=a007c495-e2e9-4a08-a565-83dfef1df30d   /var/lib/ceph/osd/a007c495-e2e9-4a08-a565-83dfef1df30d   xfs   defaults,noatime   0 0

Check the mapping between disk partitions and mounted filesystems:

# lsblk | grep osd
├─sdb1   8:17   0   95G  0 part /var/lib/ceph/osd/b7267089-cae5-4e28-b9ff-a37c373c0d34
├─sdc1   8:33   0   95G  0 part /var/lib/ceph/osd/c35eea02-f07d-4557-acdb-7280a571aaf9
├─sdd1   8:49   0    5G  0 part /var/lib/ceph/osd/a007c495-e2e9-4a08-a565-83dfef1df30d
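If you need to confirm which physical device backs a given OSD id on a host, a minimal sketch like the following can help. It assumes the kolla mount layout shown above, where each OSD data directory under /var/lib/ceph/osd/ contains the standard whoami file written by ceph:

for dir in /var/lib/ceph/osd/*; do
    [ -f "$dir/whoami" ] || continue                  # skip anything that is not an OSD data dir
    id=$(cat "$dir/whoami")                           # the OSD id stored in the data dir
    dev=$(findmnt -n -o SOURCE --target "$dir")       # the device mounted at that directory
    echo "osd.$id -> $dev ($dir)"
done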
2. Run a crush reweight so that the data on these OSDs migrates to the other OSDs; at the same time you can use ceph -w to watch the data migration in real time:
# docker exec -it ceph_mon ceph osd crush reweight osd.8 0.0    // run the same command to reweight osd.7 and osd.6 to 0
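Since the same reweight has to be applied to all three OSDs, a small loop is convenient; this is only a sketch using the container name from above:

# Set the crush weight of osd.6, osd.7 and osd.8 to 0 so their PGs drain to the remaining OSDs
for id in 6 7 8; do
    docker exec -it ceph_mon ceph osd crush reweight osd.$id 0.0
done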
3. Remove the OSDs from the cluster:
# docker exec -it ceph_mon ceph osd out osd.8             // run the same command to out osd.7 and osd.6
# docker exec -it ceph_mon ceph osd crush remove osd.8    // run the same command to crush remove osd.7 and osd.6
# docker exec -it ceph_mon ceph auth del osd.8            // run the same command to auth del osd.7 and osd.6
# docker stop ceph_osd_8                                  // on the other two servers, stop ceph_osd_7 and ceph_osd_6 the same way
# docker exec -it ceph_mon ceph osd rm osd.8              // run the same command to rm osd.7 and osd.6
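The per-OSD removal steps can likewise be looped. The sketch below assumes the same container names as above and keeps the final ceph osd rm until after the ceph_osd_<id> containers have been stopped on their owning hosts:

# Mon-side steps for all three OSDs
for id in 6 7 8; do
    docker exec -it ceph_mon ceph osd out osd.$id
    docker exec -it ceph_mon ceph osd crush remove osd.$id
    docker exec -it ceph_mon ceph auth del osd.$id
done
# On each host that owns one of the OSDs: docker stop ceph_osd_<id>
# Then remove the (now down) OSDs from the osdmap
for id in 6 7 8; do
    docker exec -it ceph_mon ceph osd rm osd.$id
done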
4.整個操做過程,能夠使用 # docker exec -it ceph_mon ceph -w 查看實時的ceph集羣數據變化,或者使用# docker exec -it ceph_mon ceph -s 查看整體狀況。微信
# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_OK
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 60, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e297: 6 osds: 6 up, 6 in
            flags sortbitwise,require_jewel_osds
      pgmap v181612: 640 pgs, 13 pools, 14884 MB data, 7296 objects
            30025 MB used, 540 GB / 569 GB avail
                 640 active+clean
As shown above, the three disks were removed successfully and the PGs of the ceph cluster have been rebalanced across the remaining OSDs.
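Instead of watching ceph -w interactively, you can also poll until the cluster is healthy again; a minimal sketch, using the same container name as above:

# Poll the cluster every 30 seconds and stop once it reports HEALTH_OK
until docker exec ceph_mon ceph health | grep -q HEALTH_OK; do
    docker exec ceph_mon ceph -s | grep -E 'health|degraded|misplaced'
    sleep 30
done
echo "cluster is HEALTH_OK"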
5. After replacing the disks, use the kolla deployment tool to add the OSD disks again. You can refer to the second half of my other blog post, 「Openstack 之 kolla 部署ceph」, about adding disks for capacity expansion; briefly:
Add the disk directly on the host, label each disk, and then re-run deploy. The steps are as follows:
1). Label the OSD disk:
parted /dev/sdd -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
Note: when the whole disk is used as a single OSD, only the label KOLLA_CEPH_OSD_BOOTSTRAP can be used. If the journal partition lives on a separate SSD partition, use distinct labels: for example /dev/sdb labeled KOLLA_CEPH_OSD_BOOTSTRAP_SDC, and the journal partition /dev/sdh1 labeled KOLLA_CEPH_OSD_BOOTSTRAP_SDC_J.
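For example, the two labels from the note above could be created like this (the device names /dev/sdc and /dev/sdh and the roughly 10 GB journal size are only illustrative; adapt them to your hardware):

# Data disk: the whole disk gets one partition with the suffixed bootstrap label
parted /dev/sdc -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_SDC 1 -1
# Journal partition on the SSD (run mklabel only if the SSD is still empty), same suffix plus _J
parted /dev/sdh -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_SDC_J 1 10240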
2). Pre-deployment check:
tools/kolla_ansible prechecks -i 3node
Note: 3node is the inventory file; replace it according to your environment.
The precheck reported that port 6780 was in use. That port turned out to be occupied by the ceph_rgw container, so stop the container temporarily: docker stop ceph_rgw
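How the conflict was tracked down is not shown above; one way to check it on the host (a sketch assuming ss is available and the standard kolla container name) is:

# Show which process is listening on 6780, then map it back to the ceph_rgw container
ss -tlnp | grep ':6780'
docker inspect -f '{{.State.Pid}} {{.Name}}' ceph_rgw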
3). Deploy:
tools/kolla_ansible deploy -i 3node
Note: 3node is the inventory file; replace it according to your environment.
After the deployment finishes, start the ceph_rgw container that was stopped above:
docker start ceph_rgw
Appendix: while deploy was running, ceph was monitored with ceph -s; the output was as follows:
[root@control02 mariadb]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_ERR
            2 pgs are stuck inactive for more than 300 seconds
            13 pgs degraded
            83 pgs peering
            2 pgs recovering
            11 pgs recovery_wait
            2 pgs stuck inactive
            75 pgs stuck unclean
            recovery 424/14592 objects degraded (2.906%)
            recovery 1826/14592 objects misplaced (12.514%)
            3/9 in osds are down
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 66, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e354: 9 osds: 6 up, 9 in; 382 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v340179: 640 pgs, 13 pools, 14885 MB data, 7296 objects
            30253 MB used, 540 GB / 569 GB avail
            424/14592 objects degraded (2.906%)
            1826/14592 objects misplaced (12.514%)
                 423 active+clean
                 113 active+remapped
                  81 remapped+peering
                  11 active+recovery_wait+degraded
                   8 active
                   2 active+recovering+degraded
                   2 peering
recovery io 118 MB/s, 102 objects/s

[root@control02 mariadb]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_WARN
            11 pgs backfill_wait
            85 pgs degraded
            4 pgs recovering
            81 pgs recovery_wait
            31 pgs stuck unclean
            recovery 4087/14835 objects degraded (27.550%)
            recovery 486/14835 objects misplaced (3.276%)
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 66, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e366: 9 osds: 9 up, 9 in; 11 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v340248: 640 pgs, 13 pools, 14885 MB data, 7296 objects
            30572 MB used, 824 GB / 854 GB avail
            4087/14835 objects degraded (27.550%)
            486/14835 objects misplaced (3.276%)
                 544 active+clean
                  81 active+recovery_wait+degraded
                  11 active+remapped+wait_backfill
                   4 active+recovering+degraded
recovery io 54446 kB/s, 67 objects/s

[root@control02 mariadb]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_WARN
            1 pgs backfill_wait
            23 pgs degraded
            3 pgs recovering
            20 pgs recovery_wait
            10 pgs stuck unclean
            recovery 1373/14594 objects degraded (9.408%)
            recovery 4/14594 objects misplaced (0.027%)
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 66, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e380: 9 osds: 9 up, 9 in; 1 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v340305: 640 pgs, 13 pools, 14885 MB data, 7296 objects
            30234 MB used, 825 GB / 854 GB avail
            1373/14594 objects degraded (9.408%)
            4/14594 objects misplaced (0.027%)
                 616 active+clean
                  20 active+recovery_wait+degraded
                   3 active+recovering+degraded
                   1 active+remapped+wait_backfill
recovery io 99837 kB/s, 35 objects/s

[root@control02 mariadb]# docker exec -it ceph_mon ceph -s
    cluster 33932e16-1909-4d68-b085-3c01d0432adc
     health HEALTH_OK
     monmap e2: 3 mons at {192.168.1.130=192.168.1.130:6789/0,192.168.1.131=192.168.1.131:6789/0,192.168.1.132=192.168.1.132:6789/0}
            election epoch 66, quorum 0,1,2 192.168.1.130,192.168.1.131,192.168.1.132
     osdmap e382: 9 osds: 9 up, 9 in
            flags sortbitwise,require_jewel_osds
      pgmap v340363: 640 pgs, 13 pools, 14885 MB data, 7296 objects
            30204 MB used, 825 GB / 854 GB avail
                 640 active+clean
As the monitoring record above shows, after the OSDs were added the ceph cluster went through a rebalancing process: from the initial HEALTH_ERR state, to HEALTH_WARN, and finally to HEALTH_OK.
Note added on March 29, 2018:
After a server reboot, the CentOS operating system dropped into emergency mode with the following prompt:
Welcome to emergency mode! After logging in, type "journalctl -xb" to view system logs, "systemctl reboot" to reboot, "systemctl default" to try again to boot into default mode.
Give root password for maintenance
It turned out that after the OSD disk was replaced, the entry for the original disk was still present in /etc/fstab, so the corresponding device could not be found at boot time and the system failed to start. The fix is to delete (or comment out) the record for the replaced OSD disk in that file:
[root@control01 etc]# more /etc/fstab
#
# /etc/fstab
# Created by anaconda on Mon Jan 22 13:32:42 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root   /          xfs   defaults   0 0
UUID=c2c088f8-a530-4099-a9ef-0fd41508d304   /boot      xfs   defaults   0 0
UUID=7FA3-61ED            /boot/efi  vfat  defaults,uid=0,gid=0,umask=0077,shortname=winnt   0 0
UUID=b7267089-cae5-4e28-b9ff-a37c373c0d34   /var/lib/ceph/osd/b7267089-cae5-4e28-b9ff-a37c373c0d34   xfs   defaults,noatime   0 0
UUID=c35eea02-f07d-4557-acdb-7280a571aaf9   /var/lib/ceph/osd/c35eea02-f07d-4557-acdb-7280a571aaf9   xfs   defaults,noatime   0 0
#UUID=a007c495-e2e9-4a08-a565-83dfef1df30d   /var/lib/ceph/osd/a007c495-e2e9-4a08-a565-83dfef1df30d   xfs   defaults,noatime   0 0
UUID=e758ebfe-e4fc-41e2-a493-b66ad5aecd0f   /var/lib/ceph/osd/e758ebfe-e4fc-41e2-a493-b66ad5aecd0f   xfs   defaults,noatime   0 0
As shown above, the second-to-last line (commented out here) corresponds to the OSD disk that was replaced. To find out which records need to be removed, use this command:
[root@control01 etc]# lsblk | grep osd
├─sdb1   8:17   0   95G  0 part /var/lib/ceph/osd/b7267089-cae5-4e28-b9ff-a37c373c0d34
├─sdc1   8:33   0   95G  0 part /var/lib/ceph/osd/c35eea02-f07d-4557-acdb-7280a571aaf9
├─sdd1   8:49   0   95G  0 part /var/lib/ceph/osd/e758ebfe-e4fc-41e2-a493-b66ad5aecd0f
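A small sketch that automates this check (assuming the fstab layout shown above, where every ceph OSD mount uses a UUID= entry): it prints any ceph OSD line in /etc/fstab whose UUID can no longer be resolved to a block device, i.e. the candidates for removal.

# List ceph OSD fstab entries whose UUID no longer exists on the system
awk '/\/var\/lib\/ceph\/osd\// && $1 ~ /^UUID=/ {print substr($1,6)}' /etc/fstab |
while read -r uuid; do
    blkid -U "$uuid" >/dev/null 2>&1 || echo "stale fstab entry: UUID=$uuid"
done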