2017-7-14 Deploying a three-node Ceph cluster (3 mon + 9 osd)

 

    I recommend picking up the book Learning Ceph (《Ceph分佈式存儲學習指南》). It praises Ceph to the skies, but honestly Ceph really is impressive: three-way replication with no single point of failure; no RAID required; distributed scalability; weighted placement across disks of different sizes; erasure coding to save space; copy-on-write to spin up hundreds of OpenStack instances quickly; the CRUSH algorithm dynamically decides where data is stored and accessed, so no metadata lookup table is needed; it detects component failures on its own and recovers very quickly; there is no single performance bottleneck; and it is a software-defined storage solution that combines block, object, and file storage in one.

 

1. Basic environment preparation
3 machines, each with: 1 GB of RAM, 2 NICs, and 3 bare 20 GB SATA disks
A "$" prompt means the step is performed identically on all three nodes
$ cat /etc/hosts
10.20.0.101 ceph-node1
10.20.0.102 ceph-node2
10.20.0.103 ceph-node3
$ yum install epel-release
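A quick sanity check of name resolution before continuing (a minimal sketch that only assumes the /etc/hosts entries above):
$ for n in ceph-node1 ceph-node2 ceph-node3; do ping -c 1 $n >/dev/null && echo "$n ok"; done //each node should print "ok"; if not, recheck /etc/hosts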

 

 

2. Install ceph-deploy
[root@ceph-node1 ~]# ssh-keygen //on node1, set up passwordless SSH key logins to the other nodes
[root@ceph-node1 ~]# ssh-copy-id ceph-node2
[root@ceph-node1 ~]# ssh-copy-id ceph-node3
[root@ceph-node1 ~]# yum install ceph-deploy -y

Install Ceph itself on all three nodes: yum install ceph

[root@ceph01 yum.repos.d]# cat ceph.repo
[ceph]
name=ceph
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0


[root@ceph-node1 ~]# ceph-deploy new ceph-node1 ceph-node2 ceph-node3 //new creates the cluster with these nodes as its initial monitors and writes the configuration file and monitor keyring into the current directory
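For reference, the generated ceph.conf looks roughly like the sketch below (assuming all three nodes were passed to new as above; the fsid is the cluster id that ceph status prints later). The public_network line is not written automatically and is added here by hand, which avoids the "neither `public_addr` nor `public_network` keys are defined for monitors" warning that shows up further down when adding monitors; the /24 subnet is an assumption chosen to match the addresses used here.
[root@ceph-node1 ~]# cat ceph.conf
[global]
fsid = ea54af9f-f286-40b2-933d-9e98e7595f1a
mon_initial_members = ceph-node1, ceph-node2, ceph-node3
mon_host = 10.20.0.101,10.20.0.102,10.20.0.103
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.20.0.0/24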

+++++++++++++++++++++++++++++++++++++++
[root@ceph-node1 ~]# ceph -v
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
[root@ceph-node1 ~]# ceph-deploy mon create-initial //create the first monitor
[root@ceph-node1 ~]# ceph status //it is normal for the cluster to be in an error state at this point
cluster ea54af9f-f286-40b2-933d-9e98e7595f1a
health HEALTH_ERR
[root@ceph-node1 ~]# systemctl start ceph
[root@ceph-node1 ~]# systemctl enable ceph

 

3. Create the object storage devices (OSDs) and add them to the Ceph cluster
[root@ceph-node1 ~]# ceph-deploy disk list ceph-node1 //list the disks ceph-node1 already has; oddly, sdb, sdc and sdd are not listed, even though they definitely exist
//Use the zap command below with care: it destroys any partition table and data already on the disk. ceph-node1 is a hostname; it could just as well be ceph-node2
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node1:sdb ceph-node1:sdc ceph-node1:sdd
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node1:sdb ceph-node1:sdc ceph-node1:sdd //wipes each disk, creates a new filesystem (XFS by default), uses the first partition as the data partition and the second as the journal partition, then adds the OSDs to the cluster
[root@ceph-node1 ~]# ceph status //the cluster is still not healthy; we need to add more nodes so it can form distributed, redundant object storage, and only then will it report a healthy state
cluster ea54af9f-f286-40b2-933d-9e98e7595f1a
health HEALTH_WARN
64 pgs stuck inactive
64 pgs stuck unclean
monmap e1: 1 mons at {ceph-node1=10.20.0.101:6789/0}
election epoch 2, quorum 0 ceph-node1
osdmap e6: 3 osds: 0 up, 0 in
pgmap v7: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
64 creating
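Two read-only commands help show why the placement groups are stuck (a quick sketch; by default pools are created with replication size 3 and the CRUSH rule places each replica on a different host, so a cluster whose OSDs all sit on one host cannot bring its PGs to active+clean):
[root@ceph-node1 ~]# ceph osd dump | grep 'replicated size' //shows each pool's size and min_size
[root@ceph-node1 ~]# ceph osd tree //only ceph-node1 carries OSDs so far, so there is nowhere to place the other replicas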

 

4. Scaling up the Ceph cluster: add monitors and OSDs on the other nodes
Note: a Ceph storage cluster needs at least one running monitor; for availability you want an odd number of monitors, e.g. 3 or 5, so that they can form a quorum.
(1) Deploy monitors on ceph-node2 and ceph-node3, but run the commands from ceph-node1!
[root@ceph-node1 ~]# ceph-deploy mon add ceph-node2
[root@ceph-node1 ~]# ceph-deploy mon add ceph-node3
++++++++++++++++++++++++++
Error: [root@ceph-node1 ~]# ceph-deploy mon create ceph-node2
[ceph-node3][WARNIN] Executing /sbin/chkconfig ceph on
[ceph-node3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph-node3][WARNIN] monitor: mon.ceph-node3, might not be running yet
[ceph-node3][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node3.asok mon_status
[ceph-node2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
Troubleshooting: ① Seeing it run chkconfig on CentOS 7, I suspected node1 had not actually started the ceph service on node2 remotely, and that starting it manually on node2 would fix things
[root@ceph-node2 ~]# systemctl status ceph
● ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/rc.d/init.d/ceph)
Active: inactive (dead)
Even after enabling the service it still failed.
② Damn, it turns out the book is wrong: once the cluster already has monitors, additional monitors must be added with mon add, not mon create. Unbelievable!

++++++++++++++++++++++++++++++++
[root@ceph-node1 ~]# ceph status
monmap e3: 3 mons at {ceph-node1=10.20.0.101:6789/0,ceph-node2=10.20.0.102:6789/0,ceph-node3=10.20.0.103:6789/0}
election epoch 8, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3
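With all three monitors in the map, quorum can also be inspected directly (a short sketch using the standard monitor commands):
[root@ceph-node1 ~]# ceph mon stat //one-line summary of the monmap and which monitors are in quorum
[root@ceph-node1 ~]# ceph quorum_status --format json-pretty //quorum members, leader and election epoch in detail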

(2) Add more OSD nodes; again the commands are all run from ceph-node1.
[root@ceph-node1 ~]# ceph-deploy disk list ceph-node2 ceph-node3
//Make sure the device names are correct, otherwise it is easy to format the system disk by mistake! (A quick verification sketch follows this command block.)
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd
//In practice the osd create command below is better split into two steps, prepare and activate; as for why, I am not sure.
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd
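As mentioned above, it is worth double-checking the device names on each target node before and after zapping (a minimal sketch; the device names are the ones used in this setup):
[root@ceph-node2 ~]# lsblk //sdb/sdc/sdd should be bare 20G disks with no mountpoints, while sda holds the OS
[root@ceph-node2 ~]# ceph-disk list //shows which devices ceph already regards as prepared or active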

++++++++++++++++++++++++++++++++++++++++++++
Error:
[ceph-node3][WARNIN] ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--new=2:0:5120M', '--change-name=2:ceph journal', '--partition-guid=2:fa28bc46-55de-464a-8151-9c2b51f9c00d', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/sdd']' returned non-zero exit status 4
[ceph-node3][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdd
[ceph_deploy][ERROR ] GenericError: Failed to create 3 OSDs
Unresolved: it turns out that when typing the osd create command I accidentally wrote node3 where I meant node2. Damn, this is only going to get messier.
+++++++++++++++++++++++++++++

[root@ceph-node1 ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08995 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0.02998 host ceph-node2
3 0.00999 osd.3 down 0 1.00000
4 0.00999 osd.4 down 0 1.00000
8 0.00999 osd.8 down 0 1.00000
-4 0.02998 host ceph-node3
5 0.00999 osd.5 down 0 1.00000
6 0.00999 osd.6 down 0 1.00000
7 0.00999 osd.7 down 0 1.00000

Six OSDs are stuck in the down state and ceph-deploy osd activate keeps failing; judging from the earlier errors, the osd create step itself had already failed.
Not resolved yet: since the cluster has only just been deployed and holds no data, the broken OSDs can simply be removed and recreated.
Reference: http://www.cnblogs.com/zhangzhengyan/p/5839897.html
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd

(1) Remove osd.4 and the other OSD IDs shown in ceph osd tree from the CRUSH map (see the note on cephx keys after these commands)
[root@ceph-node1 ~]# ceph osd crush remove osd.3
[root@ceph-node1 ~]# ceph osd crush remove osd.4
[root@ceph-node1 ~]# ceph osd crush remove osd.8
[root@ceph-node1 ~]# ceph osd crush remove osd.5
[root@ceph-node1 ~]# ceph osd crush remove osd.6
[root@ceph-node1 ~]# ceph osd crush remove osd.7

(2)[root@ceph-node1 ~]# ceph osd rm 3
[root@ceph-node1 ~]# ceph osd rm 4
[root@ceph-node1 ~]# ceph osd rm 5
[root@ceph-node1 ~]# ceph osd rm 6
[root@ceph-node1 ~]# ceph osd rm 7
[root@ceph-node1 ~]# ceph osd rm 8
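One step is missing from this cleanup, and it comes back to bite later (the "entity osd.3 exists but key does not match" error below): the removed OSDs' cephx keys are still registered. A fuller removal, sketched here for a single ID, also deletes the key:
[root@ceph-node1 ~]# ceph osd crush remove osd.3
[root@ceph-node1 ~]# ceph auth del osd.3 //drop the stale cephx key so a re-created OSD can register a fresh one
[root@ceph-node1 ~]# ceph osd rm 3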

[root@ceph-node1 ~]# ceph osd tree //finally cleaned up
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02998 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0 host ceph-node2
-4 0 host ceph-node3
Logging in to node2 and node3 shows sdb/sdc/sdd have been wiped (only the GPT label is left), so now re-run ceph-deploy osd create.
Damn it, still an error: [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdd

++++++++++++++++++++++++++++++++++
Error: (1) Remotely activating node2's OSDs from node1 fails. prepare plus activate can replace the single osd create step.
[root@ceph-node1 ~]# ceph-deploy osd prepare ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy osd activate ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd

[ceph-node2][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/sdb: Line is truncated:
[ceph-node2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sdb
Solution: it is a partition format/permission issue; running ceph-disk activate-all on the node that reported the error fixes it.
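After activate-all it is worth confirming from node1 that the OSDs really came up (a short sketch):
[root@ceph-node2 ~]# ceph-disk activate-all //mounts and starts every prepared-but-inactive OSD data partition on this node
[root@ceph-node1 ~]# ceph osd tree //the OSDs under this host should now show "up" with weight 1.00000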

(2) The disk in question should be node2's sdb, yet ceph osd tree shows the command ended up acting on node3's sdb, so of course it fails.
Starting Ceph osd.4 on ceph-node2...
Running as unit ceph-osd.4.1500013086.674713414.service.
Error EINVAL: entity osd.3 exists but key does not match
[root@ceph-node1 ~]# ceph osd tree
3 0 osd.3 down 0 1.00000

Solution: [root@ceph-node1 ~]# ceph auth del osd.3
[root@ceph-node1 ~]# ceph osd rm 3
Running lsblk on node2 shows sdb is not right: no OSD is mounted on it, so:
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb
[root@ceph-node1 ~]# ceph-deploy osd prepare ceph-node2:sdb
[root@ceph-node1 ~]# ceph osd tree //at least the OSD now lands on node2 instead of node3, which is a relief
-3 0.02998 host ceph-node2
3 0.00999 osd.3 down 0 1.00000
[root@ceph-node1 ~]# ceph-deploy osd activate ceph-node2:sdb //bound to fail; going by the experience above, it has to be activated locally on node2
[root@ceph-node2 ~]# ceph-disk activate-all

[root@ceph-node1 ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08995 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0.02998 host ceph-node2
4 0.00999 osd.4 up 1.00000 1.00000
5 0.00999 osd.5 up 1.00000 1.00000
3 0.00999 osd.3 up 1.00000 1.00000
-4 0.02998 host ceph-node3
6 0.00999 osd.6 up 1.00000 1.00000
7 0.00999 osd.7 up 1.00000 1.00000
8 0.00999 osd.8 up 1.00000 1.00000
Phew, finally solved. One small mistyped device name earlier cost me all this time. Be more careful!


=========================================
Extension:
[root@ceph-node1 ~]# lsblk //the data partitions of the OSDs that are up are all mounted under /var/lib/ceph/osd (see the sketch after this listing)
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 39.5G 0 part
├─centos-root 253:0 0 38.5G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 15G 0 part /var/lib/ceph/osd/ceph-0
└─sdb2 8:18 0 5G 0 part
sdc 8:32 0 20G 0 disk
├─sdc1 8:33 0 15G 0 part /var/lib/ceph/osd/ceph-1
└─sdc2 8:34 0 5G 0 part
sdd 8:48 0 20G 0 disk
├─sdd1 8:49 0 15G 0 part /var/lib/ceph/osd/ceph-2
└─sdd2 8:50 0 5G 0 part
sr0 11:0 1 1024M 0 rom
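To confirm which OSD ID lives on which disk, the data partition itself records it (a small sketch; paths follow the default /var/lib/ceph/osd layout shown above):
[root@ceph-node1 ~]# cat /var/lib/ceph/osd/ceph-0/whoami //prints the OSD id stored on that data partition, here 0
[root@ceph-node1 ~]# df -h | grep /var/lib/ceph/osd //one line per mounted OSD data partition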

   What a relief to have all these problems sorted out...
