ceph1:~ # ceph -s
cluster f411aff0-1b95-4496-9310-68fa6d568903
health HEALTH_WARN
clock skew detected on mon.ceph1
Monitor clock skew detected
monmap e9: 2 mons at {ceph1=147.2.208.114:6789/0,ceph2=147.2.208.44:6789/0}
election epoch 58, quorum 0,1 ceph2,ceph1
osdmap e127: 3 osds: 3 up, 3 in
pgmap v2318: 72 pgs, 2 pools, 0 bytes data, 0 objects
105 MB used, 45941 MB / 46046 MB avail
72 active+clean
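The fix is to configure an NTP server. I had already configured one, yet for some reason it had no effect. Hold on, though!
Only later did I realize that I had not understood NTP well enough: I had pointed the nodes at a public NTP server in Hong Kong. How precise could the time from that possibly be?
By default Ceph tolerates far less than one second of clock difference between monitors (mon_clock_drift_allowed defaults to 0.05 s), so getting time that precise requires a local NTP server. What a Ceph cluster needs is not an accurate international standard time but a precise common time base shared by all cluster nodes. Once the local server was set up, ceph -w went back to HEALTH_OK very quickly.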
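The point that the nodes must agree with each other, rather than with UTC, can be illustrated with a small sketch. The node offsets below are made-up numbers, the helper names are hypothetical, and the pairwise comparison is a simplification (as far as I understand, the leader monitor compares the other monitors against its own clock, which is the same thing for two monitors). The 0.05 s limit is the documented default of mon_clock_drift_allowed.

```python
from itertools import combinations

# Default of mon_clock_drift_allowed per the Ceph monitor config reference.
DEFAULT_DRIFT_ALLOWED = 0.05  # seconds


def max_pairwise_skew(offsets):
    """Return the largest clock difference between any two nodes, in seconds.

    `offsets` maps a node name to its clock offset from some reference clock.
    """
    if len(offsets) < 2:
        return 0.0
    return max(abs(a - b) for a, b in combinations(offsets.values(), 2))


def within_tolerance(offsets, allowed=DEFAULT_DRIFT_ALLOWED):
    """True when every pair of nodes agrees to within `allowed` seconds."""
    return max_pairwise_skew(offsets) <= allowed


# Hypothetical offsets from UTC, in seconds (illustrative values only).
# Both nodes are about 3 s off UTC, yet they agree with each other.
synced_to_local = {"ceph1": 3.010, "ceph2": 3.020}
# Each node is close to UTC, but the two disagree by 0.12 s.
synced_to_remote = {"ceph1": 0.070, "ceph2": -0.050}

print(within_tolerance(synced_to_local))   # True: pairwise skew is about 0.01 s
print(within_tolerance(synced_to_remote))  # False: pairwise skew is about 0.12 s
```

In the first case the whole cluster is wrong by three seconds in absolute terms and would still be healthy; in the second, each node is individually closer to UTC but the monitors would report a skew.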
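Alternatively, work around the warning by adjusting parameters:
1. On the admin node, edit ceph.conf and add:
mon_clock_drift_allowed = 5
mon_clock_drift_warn_backoff = 30
For these two parameters, see: http://docs.ceph.com/docs/hammer/rados/configuration/mon-config-ref/#clock
What is a sensible value for mon_clock_drift_allowed? The skew reported in a warning like this one is a useful guide: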
2016-07-01 17:44:14.860902 mon.0 [WRN] mon.1 ****:6789/0 clock skew 3.0706s > max 2s
2. Run the following command, where ceph1 and so on are the names of the monitor nodes:
ceph-deploy --overwrite-conf admin ceph1 ceph2 ceph3
3. Restart the monitor:
systemctl restart ceph-mon@ceph1.service
4. Verify:
ceph1:~ # ceph -w
2016-07-01 18:19:08.168452 7fb98021d700 0 -- :/1003345 >> 147.2.208.73:6789/0 pipe(0x7fb96c05d370 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7fb96c0599d0).fault
cluster f411aff0-1b95-4496-9310-68fa6d568903
health HEALTH_OK
monmap e9: 2 mons at {ceph1=147.2.208.114:6789/0,ceph2=147.2.208.44:6789/0}
election epoch 68, quorum 0,1 ceph2,ceph1
osdmap e134: 3 osds: 3 up, 3 in
pgmap v2369: 72 pgs, 2 pools, 0 bytes data, 0 objects
106 MB used, 45940 MB / 46046 MB avail
72 active+clean
2016-07-01 18:19:03.545418 mon.1 [INF] mon.ceph1 calling new monitor election
2016-07-01 18:19:18.653547 mon.0 [INF] mon.ceph2 calling new monitor election
2016-07-01 18:19:18.686790 mon.0 [INF] mon.ceph2@0 won leader election with quorum 0,1
2016-07-01 18:19:18.687641 mon.0 [INF] HEALTH_OK
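For picking the mon_clock_drift_allowed value in step 1, the skew figures can be pulled out of the monitor warnings automatically. The following is only a rough sketch: the regular expression assumes the exact message format of the warning quoted in step 1, and the function names and the 1.5x headroom factor are my own illustrative choices, not anything Ceph recommends.

```python
import re

# Matches the observed skew and current limit in a monitor warning such as:
#   ... [WRN] mon.1 ****:6789/0 clock skew 3.0706s > max 2s
SKEW_RE = re.compile(
    r"clock skew (?P<skew>\d+(?:\.\d+)?)s > max (?P<limit>\d+(?:\.\d+)?)s"
)


def parse_clock_skew(line):
    """Return (observed_skew, current_limit) in seconds, or None if no match."""
    m = SKEW_RE.search(line)
    if m is None:
        return None
    return float(m.group("skew")), float(m.group("limit"))


def suggest_drift_allowed(lines, headroom=1.5):
    """Suggest a limit: worst observed skew times a hypothetical headroom factor."""
    parsed = (parse_clock_skew(line) for line in lines)
    skews = [p[0] for p in parsed if p is not None]
    if not skews:
        return None
    return round(max(skews) * headroom, 2)


log = [
    "2016-07-01 17:44:14.860902 mon.0 [WRN] mon.1 ****:6789/0 clock skew 3.0706s > max 2s",
    "2016-07-01 18:19:18.687641 mon.0 [INF] HEALTH_OK",
]
print(parse_clock_skew(log[0]))    # (3.0706, 2.0)
print(suggest_drift_allowed(log))  # 4.61
```

With the warning above, the suggestion lands just under the value of 5 used in step 1. Keep in mind that raising the limit only hides the skew; fixing time synchronization remains the real cure.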