The k8s-001 node was removed from the cluster for some reason, and now it needs to rejoin the cluster as a control plane node; errors came up during the join.
Cluster version: 1.13.1
3 control plane nodes, 2 worker nodes
If you simply try to rejoin the cluster directly, you will usually hit the following problem:
[kubeconfig] Writing "controller-manager.conf" kubeconfig file [kubeconfig] Writing "scheduler.conf" kubeconfig file [etcd] Checking Etcd cluster health error syncing endpoints with etc: dial tcp 10.0.3.4:2379: connect: connection refused
This is because the control plane node 10.0.3.4 (k8s-001) has already been deleted, but stale state for it is still recorded in the kubeadm-config ConfigMap:
root@k8s-002:/home# kubectl get configmaps -n kube-system kubeadm-config -oyaml
...
  ClusterStatus: |
    apiEndpoints:
      k8s-001:
        advertiseAddress: 10.0.3.4
        bindPort: 6443
      k8s-002:
        advertiseAddress: 10.0.3.5
        bindPort: 6443
      k8s-003:
        advertiseAddress: 10.0.3.6
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta1
    kind: ClusterStatus
...
You can see that k8s-001 is still listed in the cluster status, and when a node rejoins the cluster kubeadm checks the etcd health of every endpoint recorded there.
Therefore k8s-001 has to be removed from this ConfigMap first.
root@k8s-002:/home# kubectl edit configmaps -n kube-system kubeadm-config
Delete the following k8s-001 entry and save:
      k8s-001:
        advertiseAddress: 10.0.3.4
        bindPort: 6443
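To double-check that the edit took effect, you can dump just the ClusterStatus document again (an optional quick check; the jsonpath expression assumes the default data key kubeadm uses in this ConfigMap):

kubectl get configmap -n kube-system kubeadm-config -o jsonpath='{.data.ClusterStatus}'

k8s-001 should no longer appear under apiEndpoints.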
In a cluster built with kubeadm, if etcd was not deployed manually (i.e. kubeadm set up a stacked etcd), an etcd instance runs on every control plane node. When the k8s-001 node was deleted, the etcd cluster did not automatically remove the member running on that node, so it has to be removed by hand.
Start by looking at the etcd cluster membership.
Set up a shortcut first:
root@k8s-002:/home# export ETCDCTL_API=3
root@k8s-002:/home# alias etcdctl='etcdctl --endpoints=https://10.0.3.5:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
List the etcd cluster members:
root@k8s-002:/home# etcdctl member list
57b3a6dc282908df, started, k8s-003, https://10.0.3.6:2380, https://10.0.3.6:2379
58bfa292d53697d0, started, k8s-001, https://10.0.3.4:2380, https://10.0.3.4:2379
f38fd5735de92e88, started, k8s-002, https://10.0.3.5:2380, https://10.0.3.5:2379
Although the cluster looks healthy, k8s-001 no longer actually exists. If you try to join the cluster at this point, you will get the following error:
[kubeconfig] Writing "admin.conf" kubeconfig file [kubeconfig] Writing "controller-manager.conf" kubeconfig file [kubeconfig] Writing "scheduler.conf" kubeconfig file [etcd] Checking Etcd cluster health [kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Activating the kubelet service [tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap... [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "k8s-001" as an annotation error creating local etcd static pod manifest file: etcdserver: unhealthy cluster
Remove the dead member (k8s-001):
root@k8s-002:/home# etcdctl member remove 58bfa292d53697d0
Member 58bfa292d53697d0 removed from cluster f06e01da83f7000d
root@k8s-002:/home# etcdctl member list
57b3a6dc282908df, started, k8s-003, https://10.0.3.6:2380, https://10.0.3.6:2379
f38fd5735de92e88, started, k8s-002, https://10.0.3.5:2380, https://10.0.3.5:2379
After k8s-001 rejoins as a control plane, everything is back to normal:
root@k8s-002:/home# kubectl get pod --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   calico-node-4956t                  1/1     Running   0          128m
kube-system   calico-node-hkcmq                  1/1     Running   0          5h58m
kube-system   calico-node-lsqsg                  1/1     Running   0          5h58m
kube-system   calico-node-q2zpt                  1/1     Running   0          5h58m
kube-system   calico-node-qdg49                  1/1     Running   0          5h58m
kube-system   coredns-89cc84847-sl2s5            1/1     Running   0          6h3m
kube-system   coredns-89cc84847-x57kv            1/1     Running   0          6h3m
kube-system   etcd-k8s-001                       1/1     Running   0          39m
kube-system   etcd-k8s-002                       1/1     Running   1          3h8m
kube-system   etcd-k8s-003                       1/1     Running   0          3h7m
kube-system   kube-apiserver-k8s-001             1/1     Running   0          128m
kube-system   kube-apiserver-k8s-002             1/1     Running   1          6h1m
kube-system   kube-apiserver-k8s-003             1/1     Running   2          6h
kube-system   kube-controller-manager-k8s-001    1/1     Running   0          128m
kube-system   kube-controller-manager-k8s-002    1/1     Running   1          6h1m
kube-system   kube-controller-manager-k8s-003    1/1     Running   0          6h
kube-system   kube-proxy-5stnn                   1/1     Running   0          5h59m
kube-system   kube-proxy-92vtd                   1/1     Running   0          6h1m
kube-system   kube-proxy-sz998                   1/1     Running   0          5h59m
kube-system   kube-proxy-wp2jx                   1/1     Running   0          6h
kube-system   kube-proxy-xl5nn                   1/1     Running   0          128m
kube-system   kube-scheduler-k8s-001             1/1     Running   0          128m
kube-system   kube-scheduler-k8s-002             1/1     Running   0          6h1m
kube-system   kube-scheduler-k8s-003             1/1     Running   1          6h
root@k8s-002:/home# etcdctl member list
57b3a6dc282908df, started, k8s-003, https://10.0.3.6:2380, https://10.0.3.6:2379
f38fd5735de92e88, started, k8s-002, https://10.0.3.5:2380, https://10.0.3.5:2379
fc790bd58a364c97, started, k8s-001, https://10.0.3.4:2380, https://10.0.3.4:2379
Every time a kubeadm join on k8s-001 fails, run kubeadm reset on it to reset the node state. After resetting, if k8s-001 is to rejoin the cluster as a control plane, the certificates under the /etc/kubernetes/pki directory of another healthy control plane node must first be copied to k8s-001; a sketch of the copy step follows.
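A minimal sketch of the copy step, run from the healthy control plane k8s-002. The file list is the set of shared certificates and keys a kubeadm 1.13 control plane normally needs; root SSH access to 10.0.3.4 and the default PKI paths are assumptions:

# Run on a healthy control plane (e.g. k8s-002); assumes root SSH access to k8s-001 (10.0.3.4)
NODE=10.0.3.4
ssh root@${NODE} "mkdir -p /etc/kubernetes/pki/etcd"
# Cluster CA, service-account keys and front-proxy CA
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
    /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
    /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
    root@${NODE}:/etc/kubernetes/pki/
# etcd CA
scp /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/ca.key \
    root@${NODE}:/etc/kubernetes/pki/etcd/
# admin kubeconfig
scp /etc/kubernetes/admin.conf root@${NODE}:/etc/kubernetes/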
Print the kubeadm join command for joining the cluster:
root@master:~# kubeadm token create --print-join-command
kubeadm join your.k8s.domain:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Join the cluster as a regular worker node:
kubeadm join your.k8s.domain:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Join the cluster as a control plane:
kubeadm join your.k8s.domain:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --experimental-control-plane
Note that the --experimental-control-plane flag must be replaced with --control-plane on 1.15+.
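For example, on 1.15+ the control plane join above would become (same placeholder token and hash):

kubeadm join your.k8s.domain:6443 --token xxxxxx.xxxxxxxxxxxxxxxx --discovery-token-ca-cert-hash sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --control-plane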