I was happily running a Jenkins pipeline when etcd suddenly became unreachable. After a reboot, the log showed repeated publish error messages:
Oct 31 10:22:42 k8s-master etcd: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=http://etcd:2379,http://etcd:4001
Oct 31 10:22:42 k8s-master etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
Oct 31 10:22:42 k8s-master etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
Oct 31 10:22:42 k8s-master etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
Oct 31 10:22:42 k8s-master etcd: etcd Version: 3.1.3
Oct 31 10:22:42 k8s-master etcd: Git SHA: 21fdcc6
Oct 31 10:22:42 k8s-master etcd: Go Version: go1.7.4
Oct 31 10:22:42 k8s-master etcd: Go OS/Arch: linux/amd64
Oct 31 10:22:42 k8s-master etcd: setting maximum number of CPUs to 1, total number of available CPUs is 1
Oct 31 10:22:42 k8s-master etcd: the server is already initialized as member before, starting as etcd member...
Oct 31 10:22:42 k8s-master etcd: listening for peers on http://localhost:2380
Oct 31 10:22:42 k8s-master etcd: listening for client requests on 0.0.0.0:2379
Oct 31 10:22:42 k8s-master etcd: listening for client requests on 0.0.0.0:4001
Oct 31 10:22:42 k8s-master etcd: recovered store from snapshot at index 210021
Oct 31 10:22:42 k8s-master etcd: name = master
Oct 31 10:22:42 k8s-master etcd: data dir = /var/lib/etcd/default.etcd
Oct 31 10:22:42 k8s-master etcd: member dir = /var/lib/etcd/default.etcd/member
Oct 31 10:22:42 k8s-master etcd: heartbeat = 100ms
Oct 31 10:22:42 k8s-master etcd: election = 1000ms
Oct 31 10:22:42 k8s-master etcd: snapshot count = 10000
Oct 31 10:22:42 k8s-master etcd: advertise client URLs = http://etcd:2379,http://etcd:4001
Oct 31 10:22:42 k8s-master etcd: ignored file 0000000000000001-0000000000012700.wal.broken in wal
Oct 31 10:22:42 k8s-master etcd: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 215587
Oct 31 10:22:42 k8s-master etcd: 8e9e05c52164694d became follower at term 19
Oct 31 10:22:42 k8s-master etcd: newRaft 8e9e05c52164694d [peers: [8e9e05c52164694d], term: 19, commit: 215587, applied: 210021, lastindex: 215587, lastterm: 19]
Oct 31 10:22:42 k8s-master etcd: enabled capabilities for version 3.1
Oct 31 10:22:42 k8s-master etcd: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32 from store
Oct 31 10:22:42 k8s-master etcd: set the cluster version to 3.1 from store
Oct 31 10:22:42 k8s-master etcd: starting server... [version: 3.1.3, cluster version: 3.1]
Oct 31 10:22:49 k8s-master etcd: publish error: etcdserver: request timed out
Oct 31 10:22:56 k8s-master etcd: publish error: etcdserver: request timed out
Oct 31 10:23:03 k8s-master etcd: publish error: etcdserver: request timed out
Oct 31 10:23:10 k8s-master etcd: publish error: etcdserver: request timed out
Oct 31 10:23:17 k8s-master etcd: publish error: etcdserver: request timed out
Oct 31 10:23:24 k8s-master etcd: publish error: etcdserver: request timed out
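The publish error keeps coming back every seven seconds or so, and counting the occurrences is a quick way to tell whether etcd is still stuck. Below is a minimal sketch: the helper name count_publish_errors is made up for illustration, and it assumes the log arrives one entry per line on stdin, as journalctl prints it.

```shell
# count_publish_errors: read etcd log lines on stdin and print how many
# of them report a publish timeout. (Hypothetical helper, not part of etcd.)
count_publish_errors() {
  # grep -c prints the number of matching lines; it exits non-zero when the
  # count is 0, so "|| true" keeps the function's exit status clean.
  grep -c 'publish error: etcdserver: request timed out' || true
}
```

Typical use, assuming the unit is named etcd.service as in the flanneld unit file further down: journalctl -u etcd --no-pager | count_publish_errors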
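After some digging, the cause is probably that the host was under heavy load and the data was not written back within 5 s, which left the state inconsistent.
The fix was to delete the data in the data dir, /var/lib/etcd/default.etcd, after which etcd started successfully.
The downside is that essentially all of the data is lost, so the flanneld network config has to be written again: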
etcdctl mk /atomic.io/network/config '{ "Network": "10.0.0.0/16" }'
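All of the PV, PVC, and pod information is gone as well.
On one of the nodes, startup then failed because it could not connect to etcd: the flannel component needs to connect to etcd to obtain its network segment, and the docker service depends on the flanneld service: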
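Since deleting the data dir throws everything away, a gentler variant is to rename the directory instead, so the old snapshot and WAL files are still around if they turn out to be needed. This is not what I did above, only a sketch of the alternative; the helper name set_aside_dir and the .bak.<timestamp> suffix are assumptions chosen for illustration.

```shell
# set_aside_dir DIR: rename DIR to DIR.bak.<UTC timestamp> and print the
# new path. (Hypothetical helper; stop etcd before running it on a data dir.)
set_aside_dir() {
  local dir="${1%/}"            # drop a trailing slash, if any
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  local backup="${dir}.bak.$(date -u +%Y%m%dT%H%M%SZ)"
  # mv within the same filesystem is a rename, so no data is copied.
  mv -- "$dir" "$backup" && printf '%s\n' "$backup"
}
```

For example: systemctl stop etcd, then set_aside_dir /var/lib/etcd/default.etcd, then systemctl start etcd. etcd still comes up with an empty store, so the flanneld config has to be rewritten either way.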
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service
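To see at a glance how a unit file ties a service to its neighbours, the ordering and dependency directives can be pulled out with grep. A minimal sketch follows; the helper name unit_relations is hypothetical. Keep in mind that After= and Before= only control start order; the hard dependency of docker on flanneld comes from the docker.service.requires directory mentioned further down.

```shell
# unit_relations FILE: print the After=, Before=, Requires= and Wants=
# lines of a systemd unit file. (Hypothetical helper for illustration.)
unit_relations() {
  grep -E '^(After|Before|Requires|Wants)=' -- "$1"
}
```

For example: unit_relations /usr/lib/systemd/system/flanneld.service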
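The main commands used for analysis were:
systemctl list-unit-files        list all available unit files
systemctl list-units             list all units currently loaded and active
systemctl --failed               list all failed units
systemctl mask httpd.service     disable the service so it cannot be started
systemctl unmask httpd.service   undo the mask
systemctl kill httpd             kill the service's processes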
The fix: disable the flanneld service with systemctl mask flanneld.service (mask disables a unit, unmask reverses it), then delete
/usr/lib/systemd/system/flanneld.service
/etc/systemd/system/docker.service.requires/flanneld.service,
reload the unit configuration with systemctl daemon-reload, and finally run systemctl start docker.service.
Docker then started successfully.