(1)SUSE Storage6 實驗環境搭建詳細步驟 - Win10 + VMware WorkStation
html
(2)SUSE Linux Enterprise 15 SP1 系統安裝node
(3)SUSE Ceph 快速部署 - Storage6python
(4)SUSE Ceph 增長節點、減小節點、 刪除OSD磁盤等操做 - Storage6
vim
(5)深刻理解 DeepSea 和 Salt 部署工具 - Storage6bash
以前咱們已快速部署好一套Ceph集羣(3節點),現要測試在現有集羣中在線方式增長節點網絡
主機名架構 |
Public網絡app |
管理網絡工具 |
集羣網絡post |
說明 |
admin |
192.168.2.39 |
172.200.50.39 |
--- |
管理節點 |
node001 |
192.168.2.40 |
172.200.50.40 |
192.168.3.40 |
MON,OSD |
node002 |
192.168.2.41 |
172.200.50.41 |
192.168.3.41 |
MON,OSD |
node003 |
192.168.2.42 |
172.200.50.42 |
192.168.3.42 |
MON,OSD |
node004 |
192.168.2.43 |
172.200.50.43 |
192.168.3.43 |
OSD |
能夠看到架構圖中增長了node004節點,而且node004節點只是做爲OSD節點,並沒有MON或MGR服務
一、收集集羣信息
(1)集羣狀態
# ceph -s cluster: id: f7b451b3-4a4c-4681-a4ef-4b5359242a92 health: HEALTH_OK services: mon: 3 daemons, quorum node001,node002,node003 (age 90m) mgr: node001(active, since 89m), standbys: node002, node003 osd: 6 osds: 6 up (since 90m), 6 in (since 23h) data: pools: 0 pools, 0 pgs objects: 0 objects, 0 B usage: 12 GiB used, 48 GiB / 60 GiB avail pgs:
(2)集羣OSD磁盤信息
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.05878 root default -5 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -7 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000
(3)查看新增節點磁盤
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 20G 0 disk ├─sda1 8:1 0 1G 0 part /boot └─sda2 8:2 0 19G 0 part ├─vgoo-lvswap 254:0 0 2G 0 lvm [SWAP] └─vgoo-lvroot 254:1 0 17G 0 lvm / sdb 8:16 0 10G 0 disk sdc 8:32 0 10G 0 disk sr0 11:0 1 1024M 0 rom nvme0n1 259:0 0 20G 0 disk nvme0n2 259:1 0 20G 0 disk nvme0n3 259:2 0 20G 0 disk
二、初始化操做系統
(1)初始化步驟
參考快速部署 storage6 文檔
(2)顯示倉庫
# zypper lr Repository priorities are without effect. All enabled repositories share the same priority. # | Alias | Name | Enabled | GPG Check | Refresh ---+----------------------------------------------------+----------------------------------------------------+---------+-----------+-------- 1 | SLE-Module-Basesystem-SLES15-SP1-Pool | SLE-Module-Basesystem-SLES15-SP1-Pool | Yes | (r ) Yes | No 2 | SLE-Module-Basesystem-SLES15-SP1-Upadates | SLE-Module-Basesystem-SLES15-SP1-Upadates | Yes | (r ) Yes | No 3 | SLE-Module-Legacy-SLES15-SP1-Pool | SLE-Module-Legacy-SLES15-SP1-Pool | Yes | (r ) Yes | No 4 | SLE-Module-Legacy-SLES15-SP1-Updates | SLE-Module-Legacy-SLES15-SP1-Updates | Yes | ( p) Yes | No 5 | SLE-Module-Server-Applications-SLES15-SP1-Pool | SLE-Module-Server-Applications-SLES15-SP1-Pool | Yes | (r ) Yes | No 6 | SLE-Module-Server-Applications-SLES15-SP1-Upadates | SLE-Module-Server-Applications-SLES15-SP1-Upadates | Yes | (r ) Yes | No 7 | SLE-Product-SLES15-SP1-Pool | SLE-Product-SLES15-SP1-Pool | Yes | (r ) Yes | No 8 | SLE-Product-SLES15-SP1-Updates | SLE-Product-SLES15-SP1-Updates | Yes | (r ) Yes | No 9 | SUSE-Enterprise-Storage-6-Pool | SUSE-Enterprise-Storage-6-Pool | Yes | (r ) Yes | No 10 | SUSE-Enterprise-Storage-6-Updates | SUSE-Enterprise-Storage-6-Updates | Yes | (r ) Yes | No
(3)hosts文件
192.168.2.39 admin.example.com admin 192.168.2.40 node001.example.com node001 192.168.2.41 node002.example.com node002 192.168.2.42 node003.example.com node003 192.168.2.43 node004.example.com node004 192.168.2.44 node005.example.com node005
三、安裝 satlt-minion
zypper -n in salt-minion sed -i '17i\master: 192.168.2.39' /etc/salt/minion systemctl restart salt-minion.service systemctl enable salt-minion.service systemctl status salt-minion.service
# salt-key Accepted Keys: admin.example.com node001.example.com node002.example.com node003.example.com Denied Keys: Unaccepted Keys: node004.example.com <==== 新加節點 Rejected Keys:
# salt-key -A
# salt "node004*" test.ping node004.example.com: True
四、預防集羣數據平衡
# ceph osd set norebalance norebalance is set admin:/etc/salt/pki/master # ceph -s cluster: id: f7b451b3-4a4c-4681-a4ef-4b5359242a92 health: HEALTH_WARN norebalance flag(s) set
services:
mon: 3 daemons, quorum node001,node002,node003 (age 2h)
mgr: node001(active, since 2h), standbys: node002, node003
osd: 6 osds: 6 up (since 2h), 6 in (since 24h)
flags norebalance
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 12 GiB used, 48 GiB / 60 GiB avail
pgs:
(1)新建 global.conf 文件(admin節點)
# vim /srv/salt/ceph/configuration/files/ceph.conf.d/global.conf osd_crush_initial_weight = 0
(2)建立新配置文件(admin節點)
# salt '*' state.apply ceph.configuration.create
注意:執行時有報錯,可忽略
node003.example.com: Name: /var/cache/salt/minion/files/base/ceph/configuration - Function: file.absent - Result: Changed Started: - 15:42:45.362265 Duration: 22.133 ms ---------- ID: /srv/salt/ceph/configuration/cache/ceph.conf Function: file.managed Result: False Comment: Unable to manage file: Jinja error: 'select.minions' Traceback (most recent call last): File "/usr/lib/python3.6/site-pack
(3)執行新配置文件,而且只在node001 node002 node003節點上生效 (admin節點)
# salt 'node00[1-3]*' state.apply ceph.configuration
(4)檢查個節點配置文件 (node001,node002,node003)
# cat /etc/ceph/ceph.conf osd crush initial weight = 0
五、執行stage0 1 2 (admin節點)
# salt-run state.orch ceph.stage.0 # salt-run state.orch ceph.stage.1 # salt-run state.orch ceph.stage.2 # salt 'node004*' pillar.items # 查看pillar設置是否正確 public_network: 192.168.2.0/24 roles: - storage # 僅僅是 storage 角色 time_server: admin.example.com
六、檢查生成 OSD 報告(admin節點)
# salt-run disks.report node004.example.com: |_ - 0 - Total OSDs: 2 Solid State VG: Targets: block.db Total size: 19.00 GB Total LVs: 2 Size per LV: 1.86 GB Devices: /dev/nvme0n2 Type Path LV Size % of device ------------------------------------------------------------------------- [data] /dev/sdb 9.00 GB 100.0% [block.db] vg: vg/lv 1.86 GB 10% ------------------------------------------------------------------------- [data] /dev/sdc 9.00 GB 100.0% [block.db] vg: vg/lv 1.86 GB 10%
七、運行stage3, 把node004節點添加進來,並自動建立OSD (admin節點)
# salt-run state.orch ceph.stage.3
八、執行後檢查集羣OSD狀態 (admin節點)
能夠發現新增節點權重都是0,這是因爲以前配置的「osd_crush_initial_weight」參數,預防新增節點或磁盤進來時進行數據平衡。
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.05878 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0 host node004 6 hdd 0 osd.6 up 1.00000 1.00000 <=== 新增節點OSD權重爲0 7 hdd 0 osd.7 up 1.00000 1.00000
九、手動增長OSD磁盤權重 (admin節點)
注意:生產環境請在變動時間執行,執行時會數據平衡,影響讀寫。固然也能夠經過Ceph參數或QOS來控制讀寫速率,後續文檔中會提到。
# ceph osd crush reweight osd.6 0.00980 # ceph osd crush reweight osd.7 0.00980
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.07837 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.01959 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000 7 hdd 0.00980 osd.7 up 1.00000 1.00000
一、首先咱們經過 VMware workstation 的虛擬機 node004 節點上添加一塊10G大小的磁盤
二、開啓虛擬機後,node004主機終端中查看新增磁盤
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 20G 0 disk ├─sda1 8:1 0 1G 0 part /boot └─sda2 8:2 0 19G 0 part ├─vgoo-lvroot 254:0 0 17G 0 lvm / └─vgoo-lvswap 254:1 0 2G 0 lvm [SWAP] sdb 8:16 0 10G 0 disk └─ceph--block--0515f9d7--3407--46a5-- 254:4 0 9G 0 lvm sdc 8:32 0 10G 0 disk └─ceph--block--9f7394b2--3ad3--4cd8-- 254:5 0 9G 0 lvm sdd 8:48 0 10G 0 disk <== 新增磁盤 sr0 11:0 1 1024M 0 rom nvme0n1 259:0 0 20G 0 disk nvme0n2 259:1 0 20G 0 disk ├─ceph--block--dbs--57d07a01--4440--4 254:2 0 1G 0 lvm └─ceph--block--dbs--57d07a01--4440--4 254:3 0 1G 0 lvm nvme0n3 259:2 0 20G 0 disk
三、查看 VG LV 信息
# lvs LV VG Attr LSize osd-block-9a914f7d-ae9c-451a-ac7e-bcb6cb1fc926 ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc -wi-ao---- 9.00g osd-block-79f5920f-b41c-4dd0-94e9-dc85dbb2e7e4 ceph-block-9f7394b2-3ad3-4cd8-8267-7e5993af1271 -wi-ao---- 9.00g osd-block-db-2244293e-ca96-4847-a5cb-9112f59836fa ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao---- 1.00g osd-block-db-2b295cc9-caff-45ad-a179-d7e3ba46a39d ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao---- 1.00g osd-block-db-test ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-a----- 2.00g lvroot vgoo -wi-ao---- 17.00g lvswap vgoo -wi-ao---- 2.00g
四、建立 OSD 磁盤的 VG 和 LV
# vgcreate ceph-block-0 /dev/sdd # lvcreate -l 100%FREE -n block-0 ceph-block-0
五、在已在nvme0n2磁盤上建立的 VG 上建立 LV
咱們從第3個步驟中能夠看到,nvme0n2磁盤已經被 VG ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586所使用,咱們要在該VG上建立LV,由於通常一塊PCIE SSD磁盤能夠承擔10塊OSD數據磁盤,做爲他們WAL和DB加速磁盤使用。
# lvcreate -L 2GB -n db-0 ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586
六、顯示 VG LV 信息
# lvs LV VG Attr LSize block-0 ceph-block-0 -wi-a----- 10.00g osd-block-9a914f7d-ae9c-451a-ac7e-bcb6cb1fc926 ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc -wi-ao---- 9.00g osd-block-79f5920f-b41c-4dd0-94e9-dc85dbb2e7e4 ceph-block-9f7394b2-3ad3-4cd8-8267-7e5993af1271 -wi-ao---- 9.00g db-0 ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-a----- 2.00g osd-block-db-2244293e-ca96-4847-a5cb-9112f59836fa ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao---- 1.00g osd-block-db-2b295cc9-caff-45ad-a179-d7e3ba46a39d ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao---- 1.00g lvroot vgoo -wi-ao---- 17.00g lvswap vgoo -wi-ao---- 2.00g
七、使用 ceph-volume 方式建立
# ceph-volume lvm create --bluestore --data ceph-block-0/block-0 --block.db ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586/db-0
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.07837 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.01959 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000 7 hdd 0.00980 osd.7 up 1.00000 1.00000 8 hdd 0 osd.8 up 1.00000 1.00000 <==== 能夠看到 osd.8 被建立出來
八、設置權重
注意:生產環境操做時,會數據平衡會影響到讀寫性能。
# ceph osd crush reweight osd.8 0.00980
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.08817 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.02939 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000 7 hdd 0.00980 osd.7 up 1.00000 1.00000 8 hdd 0.00980 osd.8 up 1.00000 1.00000
語法: salt-run osd.remove OSD_ID
一、批量刪除 node004 節點上 OSD.7 OSD.8
admin:~ # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.08817 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.02939 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000 7 hdd 0.00980 osd.7 up 1.00000 1.00000 8 hdd 0.00980 osd.8 up 1.00000 1.00000
admin:~ # salt-run osd.remove 7 8 Removing osd 7 on host node004.example.com Draining the OSD Waiting for ceph to catch up. osd.7 is safe to destroy Purging from the crushmap Zapping the device Removing osd 8 on host node004.example.com Draining the OSD Waiting for ceph to catch up. osd.8 is safe to destroy Purging from the crushmap Zapping the device
二、顯示 osd 信息,node004主機上 osd.7 和 osd.8 已被刪除
admin:~ # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.06857 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.00980 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000
三、刪除 OSD 其餘命令
(1)刪除節點上全部osd
# salt-run osd.remove OSD_HOST_NAME
(2)當 WAL 或 DB 設備損壞時,移除破損的磁盤
# salt-run osd.remove OSD_ID force=True
從集羣中移出node004 osd節點,移除前請確保集羣有足夠的空間容納node004上的數據
一、手動方式
(1)管理節點查看 OSD 信息
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.06857 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000 -9 0.00980 host node004 6 hdd 0.00980 osd.6 up 1.00000 1.00000
(2)node004 節點上中止OSD服務
# systemctl stop ceph-osd@6.service # systemctl stop ceph-osd.target
(3)node004 節點上,卸載掛載目錄
# umount /var/lib/ceph/osd/ceph-6
(4)admin節點上,移除OSD
# ceph osd out 6 # ceph osd crush remove osd.6 # ceph osd rm osd.6 # ceph auth del osd.6
(5)admin節點上,從CRUSH MAP上移除節點信息
# ceph osd crush rm node004
(6)檢查集羣是否清理乾淨node004
# ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.05878 root default -7 0.01959 host node001 2 hdd 0.00980 osd.2 up 1.00000 1.00000 5 hdd 0.00980 osd.5 up 1.00000 1.00000 -3 0.01959 host node002 0 hdd 0.00980 osd.0 up 1.00000 1.00000 3 hdd 0.00980 osd.3 up 1.00000 1.00000 -5 0.01959 host node003 1 hdd 0.00980 osd.1 up 1.00000 1.00000 4 hdd 0.00980 osd.4 up 1.00000 1.00000
(7)node004節點,刪除全部相關 VG LV 信息
node004:~ # vgs VG #PV #LV #SN Attr VSize VFree ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc 1 1 0 wz--n- 9.00g 0 ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 1 1 0 wz--n- 19.00g 18.00g vg00 1 2 0 wz--n- 19.00g 0
node004:~ # lvs LV VG Attr LSize osd-block-9a914f7d-ae9c-451a-ac7e-bcb6cb1fc926 ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc -wi-a----- 9.00g osd-block-db-2b295cc9-caff-45ad-a179-d7e3ba46a39d ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-a----- 1.00g lvroot vg00 -wi-ao---- 17.00g lvswap vg00 -wi-ao---- 2.00g
# for i in `vgs | grep ceph- | awk '{ print $1 }'`; do vgremove -f $i; done
node004:~ # lvs LV VG Attr LSize lvroot vg00 -wi-ao---- 17.00g lvswap vg00 -wi-ao---- 2.00g
node004:~ # vgs VG #PV #LV #SN Attr VSize VFree vg00 1 2 0 wz--n- 19.00g 0
二、DeepSea 方式
(1)管理節點上修改 policy.cfg 文件
vim /srv/pillar/ceph/proposals/policy.cfg ## Cluster Assignment #cluster-ceph/cluster/*.sls <=== 註釋掉 cluster-ceph/cluster/node00[1-3]*.sls <=== 匹配 target cluster-ceph/cluster/admin*.sls <=== 匹配 target ## Roles # ADMIN role-master/cluster/admin*.sls role-admin/cluster/admin*.sls # Monitoring role-prometheus/cluster/admin*.sls role-grafana/cluster/admin*.sls # MON role-mon/cluster/node00[1-3]*.sls # MGR (mgrs are usually colocated with mons) role-mgr/cluster/node00[1-3]*.sls # COMMON config/stack/default/global.yml config/stack/default/ceph/cluster.yml # Storage # 定義爲 storage 角色 #role-storage/cluster/node00*.sls <=== 註釋掉 role-storage/cluster/node00[1-3]*.sls <=== 匹配 target
(2)修改 drive_group.yml 文件
# vim /srv/salt/ceph/configuration/files/drive_groups.yml # This is the default configuration and # will create an OSD on all available drives drive_group_hdd_nvme: target: 'node00[1-3]*' <== 匹配 target data_devices: size: '9GB:12GB' db_devices: rotational: 0 limit: 1 block_db_size: '2G'
(3)執行salt命令,stage2和stage5
# salt-run state.orch ceph.stage.2 # salt-run state.orch ceph.stage.5