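Simply editing /etc/hosts and /etc/ceph/ceph.conf is not enough! The Ceph monitors keep the cluster's configuration in the monmap, and it cannot be changed casually, because the monitors are effectively the brain of the cluster and far too important for that. From now on, it is best to give the monitors private-network IP addresses.
2. How do we fix it?
I'll be lazy and just paste the IRC log from when I asked the experts. The main idea is to stop all the monitors, remove them from the cluster until only one monitor is left, and then add them back one at a time.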
4:20:07 PM - zren: Hello, may I ask a quick question here: what should I do to recover my cluster after a long period of downtime? 2/3 nodes's IP has changed during this time. "ceph -s" still try to connect the old IP even after I've set the new ip in /etc/hosts. And the "mon_host=" list in /etc/ceph/ceph.conf still shows the old IP addresses, should I correct the list manually?
4:28:15 PM - oms101: zren -> never done this but do change the /etc/ceph/ceph.conf
4:28:38 PM - oms101: This is definitely used by the client tools
4:52:04 PM - joao: <oms101> zren -> never done this but do change the /etc/ceph/ceph.conf
4:52:07 PM - joao: this may not be sufficient
4:52:15 PM - joao: how did the ip change?
4:52:51 PM - joao: did you properly moved the monitors to the new ips first?
4:53:03 PM - joao: i'm guessing no
4:53:22 PM - joao: so you'll likely have a monmap with the old ips in it
4:53:55 PM - joao: likelihood is that the monitors won't even be able to form quorum because they have wrong ips for the monitors
4:53:56 PM - oms101: yes good point joao
4:54:28 PM - joao: in which case, your best chance will be extracting the current monmap from all monitors and injecting a new map
4:54:38 PM - oms101: http://docs.ceph.com/docs/master/man/8/monmaptool/
4:54:49 PM - oms101: is useful documentation on the monmaptool
4:54:49 PM - joao: this will mean shutting down your monitors, but given you likely don't even have quorum who cares anyway
4:56:01 PM - joao: if we're pointing to upstream, i'd instead suggest http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
4:56:22 PM - joao: absolutely no clue if this has been mapped to our internal docs, although i hope so
4:56:48 PM - joao: omg
4:57:03 PM - zren: joao: first of all, thanks! it changed because the network facility in server room was reconstructed by the IT guy, hah.
5:22:07 PM - zren: joao: come back again;-) unfortunately, I got this error when trying to get the copy of monmap file according the link you point to:
5:22:07 PM - zren: ceph1:~ # ceph-mon -i `hostname` --extract-monmap /tmp/monmap
5:22:07 PM - zren: IO error: lock /var/lib/ceph/mon/ceph-ceph1/store.db/LOCK: Resource temporarily unavailable
5:22:07 PM - zren: 2016-06-29 17:14:04.155152 7fb9cf3607c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-ceph1': (22) Invalid argument
5:23:15 PM - joao: zren, the monitor is running
5:23:24 PM - joao: as i said, you need to shut them down
5:24:37 PM - zren: joao: Yes, according to the link, I stopped 2/3 nodes, so only one surviving monitor is left;-)
5:25:02 PM - joao: zren,you need to do that on *all* the monitors
5:25:11 PM - joao: you need the same map epoch on all the monitors
5:25:17 PM - joao: otherwise that will lead to inconsistencies
5:25:52 PM - joao: the idea is roughly to do
5:26:15 PM - zren: joao: OK, thanks! will try.. please treat me as a very newbie hah;-)
5:27:00 PM - joao: you only need to extract the monmap from the monitor with the latest monmap
5:27:11 PM - joao: but need to inject it into every monitor
5:27:23 PM - zren: joao: got it;-)
5:27:29 PM - smithfarm1 has left the room (Quit: Ping timeout: 121 seconds).
5:27:43 PM - joao: if by some chance you ended up running the cluster with quorum with less than 3 monitors, then you need to check which one has the latest monmap
5:28:02 PM - joao: in that case, extract the monmap on all the monitors and use the monmaptool to check the latest epoch
5:28:16 PM - joao: monmaptool --print /path/to/monmap
5:28:22 PM - joao: that will give you the map epoch
5:28:39 PM - joao: i can't emphasize this enough: use the latest epoch
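joao's advice to "use the latest epoch" can be sketched as a small shell helper. This is only a rough sketch, not something from the chat: the function names monmap_epoch and latest_monmap and the example file paths are made up for illustration, and it assumes that `monmaptool --print` writes a line of the form "epoch N", and that the monmap extracted from each stopped monitor has already been copied to one host.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Ceph).

# Print the epoch recorded in one extracted monmap file.
# Assumes "monmaptool --print" emits a line such as "epoch 3".
monmap_epoch() {
    monmaptool --print "$1" | awk '$1 == "epoch" && !seen { print $2; seen = 1 }'
}

# Print the path of the monmap with the highest epoch among the arguments.
latest_monmap() {
    best_file=""
    best_epoch=-1
    for f in "$@"; do
        e=$(monmap_epoch "$f")
        # Skip files whose epoch could not be read as a number.
        case "$e" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$e" -gt "$best_epoch" ]; then
            best_epoch=$e
            best_file=$f
        fi
    done
    if [ -n "$best_file" ]; then
        printf '%s\n' "$best_file"
    fi
}

# Example with hypothetical paths, one file copied from each monitor:
# latest_monmap /tmp/monmap.ceph1 /tmp/monmap.ceph2 /tmp/monmap.ceph3
```

The file it prints is the one to edit and then inject into every monitor; the others are only there for comparison.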
The steps below show roughly how to bring the second monitor back after all the monitors have been stopped:
ceph2:~ # ceph mon remove ceph2
ceph2:~ # rm -rf /var/lib/ceph/mon/ceph-ceph2/*
ceph2:~ # mkdir tmp
ceph2:~ # ceph mon getmap -o tmp/monmap
ceph2:~ # ceph auth get mon. -o tmp/keyring
ceph2:~ # ceph-mon -i ceph2 --mkfs --monmap tmp/monmap --keyring tmp/keyring
ceph2:~ # ceph-mon -i ceph2 --public-addr "new-ip":6789
ceph2:~ # systemctl start ceph-mon@ceph2.service
ceph2:~ # systemctl status ceph-mon@ceph2.service
You can also try the commands below, taken from [3]:
#Add the new monitor locations
# monmaptool --create --add mon0 192.168.32.2:6789 --add osd1 192.168.32.3:6789 \
  --add osd2 192.168.32.4:6789 --fsid 61a520db-317b-41f1-9752-30cedc5ffb9a \
  --clobber monmap
#Retrieve the monitor map
# ceph mon getmap -o monmap.bin
#Check new contents
# monmaptool --print monmap.bin
#Inject the monmap
# ceph-mon -i mon0 --inject-monmap monmap.bin
# ceph-mon -i osd1 --inject-monmap monmap.bin
# ceph-mon -i osd2 --inject-monmap monmap.bin
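To keep the order straight, the extract, edit and inject flow that joao describes in the chat can be written as a dry run that only prints the commands. This is a rough sketch under stated assumptions: the function name print_monmap_plan, the name=ip argument format and the default /tmp/monmap path are all made up here, and it assumes each monitor's ID matches its name in the monmap. It runs nothing itself. Every monitor must already be stopped, and each printed `ceph-mon --inject-monmap` line is meant to be run on that monitor's own host.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands for rewriting monitor
# addresses in an extracted monmap and injecting it into every monitor.
# Arguments are hypothetical name=ip pairs, e.g. ceph1=10.0.0.1
MONMAP=${MONMAP:-/tmp/monmap}   # assumed path of the extracted monmap
MON_PORT=${MON_PORT:-6789}      # monitor port used elsewhere in this post

print_monmap_plan() {
    # Replace each monitor's old entry with one carrying the new address.
    for pair in "$@"; do
        name=${pair%%=*}
        ip=${pair#*=}
        printf 'monmaptool --rm %s %s\n' "$name" "$MONMAP"
        printf 'monmaptool --add %s %s:%s %s\n' "$name" "$ip" "$MON_PORT" "$MONMAP"
    done
    # Show the edited map so it can be checked before injecting.
    printf 'monmaptool --print %s\n' "$MONMAP"
    # Inject the same map into every stopped monitor.
    for pair in "$@"; do
        printf 'ceph-mon -i %s --inject-monmap %s\n' "${pair%%=*}" "$MONMAP"
    done
}

# Example with made-up addresses:
# print_monmap_plan ceph1=10.0.0.1 ceph2=10.0.0.2 ceph3=10.0.0.3
```

Printing the plan first makes it easy to check names and addresses before touching a monitor's store.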
References:
[1] http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
[2] http://docs.ceph.com/docs/master/man/8/monmaptool/
[3] http://os.51cto.com/art/201412/462140.htm