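Problem description: the whole office building lost power on Friday, and when we came back on Monday many pods were stuck in the Terminating state. Investigation showed that some nodes in our test Kubernetes cluster are desktop PCs, which have to be powered on by hand after an outage. Once powered on, the nodes returned to normal, but journalctl -fu kubelet kept printing the following error: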
[root@k8s-node4 pods]# journalctl -fu kubelet
-- Logs begin at 二 2019-05-21 08:52:08 CST. --
5月 21 14:48:48 k8s-node4 kubelet[2493]: E0521 14:48:48.748460 2493 kubelet_volumes.go:140] Orphaned pod "d29f26dc-77bb-11e9-971b-0050568417a2" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
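When a node reports several such errors, it can help to pull just the pod IDs out of the log. The following is a minimal sketch, not part of the original procedure: the function name orphaned_pod_uids is made up for illustration, and it assumes the message format matches the line shown above.

```shell
# Hypothetical helper: read kubelet log text on stdin and print each
# distinct pod UID that appears in an 'Orphaned pod "<uid>"' message.
# Assumes the message format shown in the log excerpt above.
orphaned_pod_uids() {
  grep -oE 'Orphaned pod "[0-9a-f-]+"' \
    | sed -E 's/.*"([0-9a-f-]+)"/\1/' \
    | sort -u
}

# Possible usage on the affected node (not executed here):
#   journalctl -u kubelet --no-pager | orphaned_pod_uids
```

Because the kubelet collapses similar errors into one line ("There were a total of 1 errors similar to this"), the list may be incomplete until the reported pod has been dealt with.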
We cd into the /var/lib/kubelet/pods directory and list its contents with ls:
[root@k8s-node4 pods]# ls
36e224e2-7b73-11e9-99bc-0050568417a2  42e8cd65-76b1-11e9-971b-0050568417a2  42eaca2d-76b1-11e9-971b-0050568417a2
36e30462-7b73-11e9-99bc-0050568417a2  42e94e29-76b1-11e9-971b-0050568417a2  d29f26dc-77bb-11e9-971b-0050568417a2
As you can see, the pod ID from the error message is in this listing. We cd into that directory (d29f26dc-77bb-11e9-971b-0050568417a2) and find the following files:
[root@k8s-node4 d29f26dc-77bb-11e9-971b-0050568417a2]# ls
containers  etc-hosts  plugins  volumes
We look at the etc-hosts file:
[root@k8s-node4 d29f26dc-77bb-11e9-971b-0050568417a2]# cat etc-hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.7.7 sagent-b4dd8b5b9-zq649
On the master node we run kubectl get pod|grep sagent-b4dd8b5b9-zq649 and find that this pod no longer exists.
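The lookup above (read the pod name from etc-hosts, then search for it on the master) can be sketched as a small helper. This is only an illustration: pod_name_from_etc_hosts is a hypothetical name, and it assumes the pod's own entry is the last non-comment line of the file, as in the output shown here. That may not hold for pods with extra host entries.

```shell
# Hypothetical helper: print the hostname from the last non-comment entry
# of a kubelet-managed etc-hosts file.
# Assumption: the pod's own "IP hostname" entry is the last such line.
pod_name_from_etc_hosts() {
  awk '!/^#/ && NF >= 2 { name = $2 } END { if (name != "") print name }' "$1"
}

# Possible usage (not executed here):
#   name=$(pod_name_from_etc_hosts /var/lib/kubelet/pods/<pod-id>/etc-hosts)
#   kubectl get pod --all-namespaces | grep "$name"   # run where kubectl is configured
```

If the grep on the master returns nothing, the directory on the node belongs to a pod that the cluster no longer knows about.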
The problem has been discussed upstream, and someone has submitted a PR to fix it. At the time of writing, that PR is still unmerged.
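The current workaround is to go to the /var/lib/kubelet/pods directory on the affected node and delete the directory named after the pod ID in the error (rm -rf name), then remove the node from the cluster on the master node (kubectl delete node), and then run the following on the affected node: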
kubeadm reset
systemctl stop kubelet
systemctl stop docker
systemctl start docker
systemctl start kubelet
Once these commands have finished, join the node to the cluster again.
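Since the first step of the workaround is an rm -rf under /var/lib/kubelet/pods, a small guard against typos may be worthwhile. The sketch below is an assumption-laden illustration rather than the original procedure: remove_orphaned_pod_dir is a hypothetical name, it only accepts arguments that look like a 36-character pod UID, and it does not check whether anything under volumes is still mounted.

```shell
# Hypothetical helper: delete one pod directory under a kubelet pods root,
# refusing any name that does not look like a pod UID.
# Does NOT check for active mounts below the directory.
remove_orphaned_pod_dir() {
  pods_root=$1
  uid=$2
  # Only lowercase hex digits and dashes are allowed, 36 characters in total.
  case "$uid" in
    ""|*[!0-9a-f-]*)
      echo "refusing: '$uid' does not look like a pod UID" >&2
      return 1
      ;;
  esac
  if [ "${#uid}" -ne 36 ]; then
    echo "refusing: '$uid' is not 36 characters long" >&2
    return 1
  fi
  if [ ! -d "$pods_root/$uid" ]; then
    echo "no such directory: $pods_root/$uid" >&2
    return 1
  fi
  rm -rf -- "${pods_root:?}/$uid"
}

# Possible usage on the affected node (not executed here):
#   remove_orphaned_pod_dir /var/lib/kubelet/pods d29f26dc-77bb-11e9-971b-0050568417a2
```

The remaining steps (kubectl delete node, kubeadm reset, restarting docker and kubelet) are left as manual commands, because they affect the whole node.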