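Because the k8s cluster was short on resources, some of the servers were shut down, given more capacity, and started again. Afterwards, some pods never came back up.
The pod logs showed that resolving in-cluster domain names was failing, which pointed to a DNS problem.
Listing the DNS-related pods showed that some of the nodelocaldns pods kept restarting.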
root@master1:~# k get pods --all-namespaces | grep dns
kube-system   coredns-74d59cc5c6-9brnf   1/1   Running            0      15d
kube-system   coredns-74d59cc5c6-g46rf   1/1   Running            0      15d
kube-system   nodelocaldns-bwnml         1/1   Running            0      15d
kube-system   nodelocaldns-f8tmj         0/1   CrashLoopBackOff   2926   12d
kube-system   nodelocaldns-rtngg         0/1   CrashLoopBackOff   44     15d
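On a larger cluster the listing above can get long. The following is a minimal sketch for narrowing it to DNS pods that are not healthy. It assumes the default column order of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE), and `unhealthy_dns_pods` is a hypothetical helper name, not something from the original setup.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints "namespace/pod status restarts" for every pod whose name contains
# "dns" and whose STATUS column is anything other than Running.
unhealthy_dns_pods() {
  awk '$2 ~ /dns/ && $4 != "Running" { print $1 "/" $2, $4, $5 }'
}
```

It could be used as `kubectl get pods --all-namespaces | unhealthy_dns_pods`.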
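I logged in to the affected node and ran netstat -ntple | grep 53 to see what was using port 53. The port was held by the named process, which would explain why nodelocaldns could not start there.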
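Note that `grep 53` matches any line containing "53", including PIDs, inode numbers, and ports such as 5353, and that the `-t` flag limits netstat to TCP sockets. A slightly stricter filter might look like the sketch below. It assumes the usual netstat layout where the local address is the fourth column and the PID/program name is the last one; `port53_listeners` is a hypothetical helper name.

```shell
# Hypothetical helper: reads netstat output on stdin and prints the local
# address and the owning PID/program for sockets bound to port 53 exactly.
port53_listeners() {
  awk '$4 ~ /:53$/ { print $4, $NF }'
}
```

For example, `netstat -ntple | port53_listeners`; adding `-u` to the netstat flags would include UDP sockets as well.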
I killed the named process, then searched /lib/systemd/system and found that the bind9 service was set to start at boot. Disabling that service completed the fix.
root@node1:/lib/systemd/system# grep -rn named ./
./bind9-pkcs11.service:3:Documentation=man:named(8)
./bind9-pkcs11.service:8:Environment=KRB5_KTNAME=/etc/bind/named.keytab
./bind9-pkcs11.service:10:ExecStart=/usr/sbin/named-pkcs11 -f -u bind
./bind9-resolvconf.service:3:Documentation=man:named(8) man:resolvconf(8)
./bind9-resolvconf.service:11:ExecStart=/bin/sh -c 'echo nameserver 127.0.0.1 | /sbin/resolvconf -a lo.named'
./bind9-resolvconf.service:12:ExecStop=/sbin/resolvconf -d lo.named
./systemd-hostnamed.service:12:Documentation=man:systemd-hostnamed.service(8) man:hostname(5) man:machine-info(5)
./systemd-hostnamed.service:13:Documentation=https://www.freedesktop.org/wiki/Software/systemd/hostnamed
./systemd-hostnamed.service:16:ExecStart=/lib/systemd/systemd-hostnamed
./bind9.service:3:Documentation=man:named(8)
./bind9.service:10:ExecStart=/usr/sbin/named -f $OPTIONS
root@node1:/lib/systemd/system# systemctl disable bind9
Synchronizing state of bind9.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable bind9
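The plain `grep -rn named` above also matches unrelated units such as systemd-hostnamed. As a rough sketch, the search could be narrowed to unit files whose ExecStart line launches a binary called exactly `named`. The helper name `units_starting_named` and its default directory argument are assumptions made for illustration.

```shell
# Hypothetical helper: list unit files under a directory (default
# /lib/systemd/system) whose ExecStart runs a binary named exactly "named".
# Lines for named-pkcs11 or systemd-hostnamed do not match the pattern.
units_starting_named() {
  grep -rlE '^ExecStart=[^ ]*/named( |$)' "${1:-/lib/systemd/system}"
}
```

After disabling the unit it reports, `systemctl is-enabled bind9` can be used to check the unit's state.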