Environment: a Kubernetes cluster managed by Rancher.
Symptom: one Node frequently reported the error "PLEG is not healthy: pleg was last seen active 3m46.752815514s ago; threshold is 3m0s", roughly once every 5-10 minutes.
Troubleshooting:
kubectl get pods --all-namespaces
This showed one Pod, istio-ingressgateway-6bbdd58f8c-nlgnd, stuck in the Terminating state; in other words, it could not be killed. On the Node, the kubelet log
docker logs --tail 100 kubelet
also showed this Pod in an abnormal state:
I0218 01:21:17.383650 10311 kubelet.go:1775] skipping pod synchronization - [PLEG is not healthy: pleg was last seen active 3m46.752815514s ago; threshold is 3m0s]
...
E0218 01:21:30.654433 10311 generic.go:271] PLEG: pod istio-ingressgateway-6bbdd58f8c-nlgnd/istio-system failed reinspection: rpc error: code = DeadlineExceeded desc = context deadline exceeded
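The log check above can be sketched in Python for cases where the kubelet log is long. This is only a minimal sketch: the regular expressions are written against the two log lines quoted above and are an assumption about the message layout, not a stable kubelet format. The function names are hypothetical.

```python
import re

# Matches the "failed reinspection" line quoted above. In that line the pod
# appears as <name>/<namespace>; this layout is assumed from the sample only.
REINSPECT_RE = re.compile(
    r"PLEG: pod (?P<name>[^/\s]+)/(?P<namespace>\S+) failed reinspection"
)

# Matches the "PLEG is not healthy" message and captures the elapsed time
# and the threshold as raw strings (for example "3m46.752815514s" and "3m0s").
UNHEALTHY_RE = re.compile(
    r"pleg was last seen active (?P<elapsed>\S+) ago; "
    r"threshold is (?P<threshold>[^\]\s]+)"
)

# The two log lines from this note, used as sample input.
SAMPLE_LOG = (
    "I0218 01:21:17.383650 10311 kubelet.go:1775] skipping pod synchronization - "
    "[PLEG is not healthy: pleg was last seen active 3m46.752815514s ago; "
    "threshold is 3m0s]\n"
    "E0218 01:21:30.654433 10311 generic.go:271] PLEG: pod "
    "istio-ingressgateway-6bbdd58f8c-nlgnd/istio-system failed reinspection: "
    "rpc error: code = DeadlineExceeded desc = context deadline exceeded\n"
)


def pods_failing_reinspection(log_text):
    """Return sorted, de-duplicated (namespace, name) pairs found in the log."""
    found = {
        (m.group("namespace"), m.group("name"))
        for m in REINSPECT_RE.finditer(log_text)
    }
    return sorted(found)


def unhealthy_reports(log_text):
    """Return (elapsed, threshold) string pairs, one per unhealthy message."""
    return [
        (m.group("elapsed"), m.group("threshold"))
        for m in UNHEALTHY_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    print(pods_failing_reinspection(SAMPLE_LOG))
    print(unhealthy_reports(SAMPLE_LOG))
```

In practice the input would be the saved output of docker logs --tail 100 kubelet rather than the embedded sample.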
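kubectl delete pod
A normal delete was tried first, but the command hung. Next,
kubectl delete pod --force --grace-period=0
was used to force-delete the Pod. On the Node,
docker ps -a | grep ingressgateway-6bbdd58f8c-nlgnd
showed the container in the Exited state. So the fix is to delete the Pod with
kubectl delete pod --force --grace-period=0
Related issue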
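The search for stuck Pods and the force-delete step above can be sketched as a small Python helper. It assumes the usual text layout of kubectl get pods --all-namespaces (a header row containing NAMESPACE, NAME and STATUS columns). The sample output below is illustrative, not captured from the cluster in this note, and the function names are hypothetical. The helper only builds the command; it does not run it.

```python
# Illustrative output of `kubectl get pods --all-namespaces`.
# Only the istio-ingressgateway pod name comes from this note; the other
# row, the READY/RESTARTS/AGE values and the spacing are made up.
SAMPLE_OUTPUT = """\
NAMESPACE      NAME                                    READY   STATUS        RESTARTS   AGE
istio-system   istio-ingressgateway-6bbdd58f8c-nlgnd   1/1     Terminating   0          12d
kube-system    example-running-pod                     1/1     Running       0          12d
"""


def terminating_pods(kubectl_output):
    """Return (namespace, name) for every row whose STATUS is Terminating."""
    lines = kubectl_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    ns_idx = header.index("NAMESPACE")
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    result = []
    for line in lines[1:]:
        fields = line.split()
        # Skip short or malformed rows instead of raising IndexError.
        if len(fields) > status_idx and fields[status_idx] == "Terminating":
            result.append((fields[ns_idx], fields[name_idx]))
    return result


def force_delete_command(namespace, name):
    """Build the force-delete command from this note as an argument list."""
    return [
        "kubectl", "delete", "pod", name,
        "--namespace", namespace,
        "--force", "--grace-period=0",
    ]


if __name__ == "__main__":
    for namespace, name in terminating_pods(SAMPLE_OUTPUT):
        # Print the command for review; pass the list to subprocess.run
        # only after confirming the Pod really should be removed.
        print(" ".join(force_delete_command(namespace, name)))
```

Printing the command instead of executing it is deliberate: a forced delete removes the Pod object without waiting for the kubelet to confirm that the container has stopped, so it is worth checking the container state on the Node first, as done above with docker ps -a.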