Today I spent some time investigating how to remove nodes from a Kubernetes cluster that was built by kubeadm.
For example, I have a 3-node cluster called k8stest. The application is deployed in namespace test-1, and each worker node (k8stest2 and k8stest3) holds some pods:
kubectl get pods -n test-1 -o wide
NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE                    NOMINATED NODE   READINESS GATES
is-en-conductor-0                        1/1     Running   0          5h40m   192.168.1.2   k8stest3.fyre.ibm.com   <none>           <none>
is-engine-compute-0                      1/1     Running   0          5h39m   192.168.1.3   k8stest3.fyre.ibm.com   <none>           <none>
is-engine-compute-1                      1/1     Running   0          5h38m   192.168.2.4   k8stest2.fyre.ibm.com   <none>           <none>
is-servicesdocker-pod-7b4d9d5c48-vvfn6   1/1     Running   0          5h41m   192.168.2.3   k8stest2.fyre.ibm.com   <none>           <none>
is-xmetadocker-pod-5ff59fff46-tkmqn      1/1     Running   0          5h42m   192.168.2.2   k8stest2.fyre.ibm.com   <none>           <none>
You can use kubectl drain to safely evict all of your pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.). Safe evictions allow the pod's containers to gracefully terminate and will respect the PodDisruptionBudgets you have specified.
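If the application defines any PodDisruptionBudgets, it is worth reviewing them before draining. A quick check looks like this (a sketch only; whether test-1 actually contains any budgets depends on your deployment):

kubectl get pdb -n test-1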
kubectl drain evicts or deletes all pods except mirror pods (which cannot be deleted through the API server). If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings. If there are any pods that are neither mirror pods nor managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet or Job, then drain will not delete any pods unless you use --force. --force will also allow deletion to proceed if the managing resource of one or more pods is missing.
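Before draining, it also helps to see exactly which pods are currently running on the node you are about to empty, so you know which of the cases above applies. One way to do that (a sketch using the node name from this cluster) is to filter pods by node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=k8stest2.fyre.ibm.com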
Let's first drain k8stest2:
kubectl drain k8stest2.fyre.ibm.com --delete-local-data --force --ignore-daemonsets
node/k8stest2.fyre.ibm.com cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-txjpn, kube-proxy-52njn
pod/is-engine-compute-1 evicted
pod/is-xmetadocker-pod-5ff59fff46-tkmqn evicted
pod/is-servicesdocker-pod-7b4d9d5c48-vvfn6 evicted
node/k8stest2.fyre.ibm.com evicted
When kubectl drain returns successfully, that indicates that all of the pods (except the ones excluded as described in the previous paragraph) have been safely evicted (respecting the desired graceful termination period, and without violating any application-level disruption SLOs). It is then safe to bring down the node by powering down its physical machine or, if running on a cloud platform, deleting its virtual machine.
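If the goal were only temporary maintenance rather than removal, the node could later be made schedulable again instead of being deleted (shown here as a sketch; I did not run it in this walkthrough):

kubectl uncordon k8stest2.fyre.ibm.com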
Let's ssh to the k8stest2 node and see what happened there: the application payloads are gone, and only the DaemonSet-managed system containers remain:
ssh k8stest2.fyre.ibm.com
docker ps
CONTAINER ID        IMAGE                  COMMAND                    CREATED             STATUS              PORTS               NAMES
0fbbb64d93d0        fa6f35a1c14d           "/install-cni.sh"          6 hours ago         Up 6 hours                              k8s_install-cni_calico-node-txjpn_kube-system_4b916269-3d49-11e9-b6b3-00163e01eecc_0
b78013d4f454        427a0694c75c           "start_runit"              6 hours ago         Up 6 hours                              k8s_calico-node_calico-node-txjpn_kube-system_4b916269-3d49-11e9-b6b3-00163e01eecc_0
c6aaf7cbf713        01cfa56edcfc           "/usr/local/bin/kube..."   6 hours ago         Up 6 hours                              k8s_kube-proxy_kube-proxy-52njn_kube-system_4b944a11-3d49-11e9-b6b3-00163e01eecc_0
542bc4662ee4        k8s.gcr.io/pause:3.1   "/pause"                   6 hours ago         Up 6 hours                              k8s_POD_calico-node-txjpn_kube-system_4b916269-3d49-11e9-b6b3-00163e01eecc_0
86ee508f0aa1        k8s.gcr.io/pause:3.1   "/pause"                   6 hours ago         Up 6 hours                              k8s_POD_kube-proxy-52njn_kube-system_4b944a11-3d49-11e9-b6b3-00163e01eecc_0
The drained node is marked unschedulable to prevent new pods from arriving:
kubectl get nodes
NAME                    STATUS                     ROLES    AGE     VERSION
k8stest1.fyre.ibm.com   Ready                      master   6h11m   v1.13.2
k8stest2.fyre.ibm.com   Ready,SchedulingDisabled   <none>   5h57m   v1.13.2
k8stest3.fyre.ibm.com   Ready                      <none>   5h57m   v1.13.2
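For reference, the cordon step can also be done on its own: kubectl cordon only marks a node unschedulable without evicting anything, which is useful when you want to stop new pods from landing on a node before deciding whether to drain it. A sketch (shown against k8stest3 purely as an illustration, not something I ran here):

kubectl cordon k8stest3.fyre.ibm.com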
Because the dedicated node k8stest2 was drained, is-servicesdocker and is-xmetadocker stay in Pending:
NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE                    NOMINATED NODE   READINESS GATES
is-en-conductor-0                        1/1     Running   0          6h3m    192.168.1.2   k8stest3.fyre.ibm.com   <none>           <none>
is-engine-compute-0                      1/1     Running   0          6h2m    192.168.1.3   k8stest3.fyre.ibm.com   <none>           <none>
is-engine-compute-1                      1/1     Running   0          9m26s   192.168.1.4   k8stest3.fyre.ibm.com   <none>           <none>
is-servicesdocker-pod-7b4d9d5c48-vz7x4   0/1     Pending   0          9m39s   <none>        <none>                  <none>           <none>
is-xmetadocker-pod-5ff59fff46-m4xj2      0/1     Pending   0          9m39s   <none>        <none>                  <none>           <none>
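To confirm why a pod is stuck in Pending, kubectl describe shows the scheduler events; the pod name below is taken from the output above and will differ on your cluster:

kubectl describe pod is-servicesdocker-pod-7b4d9d5c48-vz7x4 -n test-1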
Now it's safe to delete the node:
kubectl delete node k8stest2.fyre.ibm.com
node "k8stest2.fyre.ibm.com" deleted
kubectl get nodes
NAME                    STATUS   ROLES    AGE     VERSION
k8stest1.fyre.ibm.com   Ready    master   6h22m   v1.13.2
k8stest3.fyre.ibm.com   Ready    <none>   6h8m    v1.13.2
Repeat the steps above for worker node k8stest3, and then only the master node survives:
kubectl get nodes
NAME                    STATUS   ROLES    AGE     VERSION
k8stest1.fyre.ibm.com   Ready    master   6h25m   v1.13.2
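For reference, the repeated steps for k8stest3 are the same two commands used for k8stest2, just with the other node name:

kubectl drain k8stest3.fyre.ibm.com --delete-local-data --force --ignore-daemonsets
kubectl delete node k8stest3.fyre.ibm.com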
It's time to deal with the master node:
kubectl drain k8stest1.fyre.ibm.com --delete-local-data --force --ignore-daemonsets
node/k8stest1.fyre.ibm.com cordoned
WARNING: Ignoring DaemonSet-managed pods: calico-node-vlqh5, kube-proxy-5tfgr
pod/docker-registry-85577757d5-952wq evicted
pod/coredns-86c58d9df4-kwjr8 evicted
pod/coredns-86c58d9df4-4p7g2 evicted
node/k8stest1.fyre.ibm.com evicted
Let's see what happened to the infrastructure pods: the CoreDNS pods were evicted (and are now Pending, since no schedulable node is left), while the mirror pods such as etcd and kube-apiserver are still running:
kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
calico-node-vlqh5                               2/2     Running   0          6h31m
coredns-86c58d9df4-5ctw2                        0/1     Pending   0          2m15s
coredns-86c58d9df4-mg8rf                        0/1     Pending   0          2m15s
etcd-k8stest1.fyre.ibm.com                      1/1     Running   0          6h31m
kube-apiserver-k8stest1.fyre.ibm.com            1/1     Running   0          6h31m
kube-controller-manager-k8stest1.fyre.ibm.com   1/1     Running   0          6h30m
kube-proxy-5tfgr                                1/1     Running   0          6h31m
kube-scheduler-k8stest1.fyre.ibm.com            1/1     Running   0          6h31m
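The surviving control-plane pods (etcd, kube-apiserver, kube-controller-manager, kube-scheduler) are static pods that the kubelet runs directly from manifest files rather than through the API server, which is why drain cannot evict them. On a default kubeadm install those manifests live under /etc/kubernetes/manifests (the path may differ if you customized kubeadm):

ls /etc/kubernetes/manifests/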
Note: do not run kubectl delete node on the master node.
Run this on every node to revert any changes made by kubeadm init or kubeadm join:
kubeadm reset -f
All containers are gone. Also check whether kubectl still works:
docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
kubectl get nodes
The connection to the server 9.30.219.224:6443 was refused - did you specify the right host or port?
Finally, we need to remove the RPM packages and clean up leftover files on every node:
yum erase -y kubeadm.x86_64 kubectl.x86_64 kubelet.x86_64 kubernetes-cni.x86_64 cri-tools socat
## calico
/bin/rm -rf /opt/cni/bin/*
/bin/rm -rf /var/lib/calico
/bin/rm -rf /run/calico
## config
/bin/rm -rf /root/.kube
## etcd
/bin/rm -rf /var/lib/etcd/*
## kubernetes
/bin/rm -rf /etc/kubernetes/
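Depending on the environment there may be a little more residue: kubeadm reset does not clean up CNI configuration or iptables rules. Assuming the default CNI config directory /etc/cni/net.d, an additional cleanup could look like this (a sketch only; adjust to your own setup before running):

## CNI config (not removed by kubeadm reset; default path assumed)
/bin/rm -rf /etc/cni/net.d
## flush iptables rules left behind by kube-proxy/calico
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X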