In 《Kubernetes探祕-多master節點容錯部署》 I described how to improve Kubernetes fault tolerance by deploying multiple master nodes. The critical piece is that etcd, which stores the cluster's control data, must replicate in real time across the member nodes, while kube-apiserver fails over its primary IP address via keepalived. 《Kubernetes探祕-etcd節點和實例擴容》 covered the steps for scaling etcd to multiple nodes in detail, but in practice the operations must follow a strict order and attend to small details; otherwise the expansion fails, and it is easy to leave the entire etcd cluster unreachable. This post collects that hands-on experience of scaling an etcd cluster.
All etcd cluster connections use HTTPS, so certificates must be generated and configured; see the method in 《Kubernetes探祕-etcd節點和實例擴容》. The certificate files must be copied into /etc/kubernetes/pki on every node. Every node also needs a fixed IP address; on Ubuntu 18.04 configure it with Netplan (see 《Ubuntu 18.04設置靜態IP》) and apply it immediately with sudo netplan apply (allow a moment for the change to take effect).
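For reference, a minimal Netplan sketch for one node's static address might look like the following. The interface name enp3s0, the gateway 10.1.1.1, and the file name are assumptions; adjust them for your network.

```yaml
# /etc/netplan/01-netcfg.yaml -- hypothetical static-IP config for one node
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0:                      # replace with your NIC name (see `ip link`)
      dhcp4: no
      addresses: [10.1.1.201/24]
      gateway4: 10.1.1.1         # assumed gateway
      nameservers:
        addresses: [10.1.1.1]
```

Apply it with sudo netplan apply, then verify the address with ip addr.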
Copy the certificate directory to each node, for example:
sudo scp -r root@10.1.1.201:/etc/kubernetes/pki /etc/kubernetes
We first install a single Kubernetes master node; the configuration of the other nodes will be derived from it. Reference:
準備工做完成後,使用下面的命令安裝Kubernetes的單實例Master節點。api
sudo kubeadm init --kubernetes-version=v1.13.1 --apiserver-advertise-address=10.1.1.199
Because my machine has multiple NICs, --apiserver-advertise-address=10.1.1.199 pins the apiserver's service address. This address is also the keepalived virtual IP (install keepalived in advance; see 《Keepalived快速使用》), which automatically floats between nodes on failure so the other nodes can still reach the master.
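As a sketch, a minimal keepalived configuration floating 10.1.1.199 between the masters could look like this. The interface name, priority, and password are assumptions; each node needs a distinct priority, and all but one should use state BACKUP.

```
# /etc/keepalived/keepalived.conf -- hypothetical VIP config for one master
vrrp_instance VI_1 {
    state MASTER              # BACKUP on the other master nodes
    interface enp3s0          # adjust to your NIC
    virtual_router_id 51
    priority 100              # lower value on the other nodes
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8s-vip     # assumed shared secret
    }
    virtual_ipaddress {
        10.1.1.199            # the apiserver-advertise-address above
    }
}
```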
Run kubectl get pod --all-namespaces to check that the single-instance cluster is healthy.
The master node already runs one etcd instance holding the cluster's most basic state. To guard against data loss while scaling the etcd cluster, back it up first; see 《Kubernetes的etcd數據查看和遷移》 for details. Note that backup and restore differ between the etcd v2 and v3 APIs; Kubernetes has used API v3 since 1.13.0, and everything below uses the v3 API.
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.202]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    snapshot save /home/supermap/openthings/etcd$(date +%Y%m%d_%H%M%S)_snapshot.db
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.199]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --data-dir=/var/lib/etcd \
    --initial-advertise-peer-urls=https://10.1.1.199:2380 \
    --initial-cluster=podc01=https://10.1.1.199:2380 \
    --initial-cluster-token=etcd-cluster \
    --name=podc01 \
    snapshot restore /home/supermap/etcd_snapshot.db
You can name the backup file whatever you like, as long as the restore command points at the same file.
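For example, a small shell sketch that builds a timestamped snapshot path in the same form as the command above (the directory is the one used in this post; the variable names are my own):

```shell
# Build a timestamped snapshot file name of the form etcdYYYYmmdd_HHMMSS_snapshot.db
SNAP_DIR=/home/supermap/openthings
SNAP_FILE="etcd$(date +%Y%m%d_%H%M%S)_snapshot.db"

# Print the full path; pass it to `etcdctl snapshot save`,
# and reuse exactly the same path with `snapshot restore`.
echo "${SNAP_DIR}/${SNAP_FILE}"
```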
The steps include:
First, check the etcd cluster's running status:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.199]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Then, update the etcd member's peer URLs:
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.199]:2379 \
    member update podc01 --peer-urls=https://10.1.1.201:2380
Third, change the etcd instance's client URLs. Stop kubelet, edit the etcd manifest, then start kubelet again.
sudo systemctl stop kubelet
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.201:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.pem
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.201:2380
    - --initial-cluster=podc01=https://10.1.1.201:2380
    - --key-file=/etc/kubernetes/pki/etcd/server-key.pem
    - --listen-client-urls=https://127.0.0.1:2379,https://10.1.1.201:2379
    - --listen-peer-urls=https://10.1.1.201:2380
    - --name=podc01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer1.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer1-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/client.pem --key=/etc/kubernetes/pki/etcd/client-key.pem get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
sudo systemctl start kubelet
Check the etcd service status:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Edit the /etc/kubernetes/manifests/kube-apiserver.yaml file as follows:
#    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
#    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
#    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
#    - --etcd-servers=https://127.0.0.1:2379
    - --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
    - --etcd-servers=https://10.1.1.201:2379
This points kube-apiserver at the new etcd service address.
Restart kubelet, as follows:
# Restart the kubelet service.
sudo systemctl restart kubelet

# List running containers.
docker ps

# List all containers, including stopped ones;
# if etcd failed to start and exited, find it here.
docker ps -a

# View the logs of a specific container.
docker logs idxxxx
Check the etcd status again:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    endpoint status -w table
Check the Kubernetes cluster status (kubectl get pod --all-namespaces).
Now add the remaining nodes one by one (each etcd node's IP address must have been included when the certificates were generated earlier).
I let the Kubernetes kubelet host the etcd instances (etcd can also be deployed as a standalone system service, e.g. under systemd).
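If you prefer the systemd route instead, a hypothetical, abridged unit sketch is below; the binary path is an assumption, and the TLS certificate flags from the etcd manifest shown earlier would need to be added to ExecStart as well.

```
# /etc/systemd/system/etcd.service -- hypothetical unit, flags abridged
[Unit]
Description=etcd key-value store
After=network-online.target

[Service]
ExecStart=/usr/local/bin/etcd \
  --name=podc01 \
  --data-dir=/var/lib/etcd \
  --listen-client-urls=https://127.0.0.1:2379,https://10.1.1.201:2379 \
  --advertise-client-urls=https://10.1.1.201:2379 \
  --listen-peer-urls=https://10.1.1.201:2380 \
  --initial-advertise-peer-urls=https://10.1.1.201:2380
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```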
Use kubeadm join to add each new node (this creates the base kubelet service, so the node can serve as both an etcd node and a Kubernetes node). Get the join command on the master node, as follows:
# Run on the master node
kubeadm token create --print-join-command
Copy the master node's /etc/kubernetes/pki directory straight to the child node, as follows:
# Run on the child node
sudo scp -r root@10.1.1.201:/etc/kubernetes/pki /etc/kubernetes/
Stop the kubelet service first:
sudo systemctl stop kubelet
Add the node with etcdctl's member add command:
# Run on the child node: register its peer URL with the etcd cluster.
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.201]:2379 \
    member add podc02 --peer-urls=https://10.1.1.202:2380
此時,etcdctl member list查當作員爲unstarted狀態。命令以下:
ETCDCTL_API=3 etcdctl --endpoints=https://[10.1.1.201]:2379 \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    member list -w table
Place the etcd.yaml file in /etc/kubernetes/manifests on each child node, just as on the master, then restart kubelet with sudo systemctl restart kubelet. On startup, kubelet launches every *.yaml under /etc/kubernetes/manifests as a static pod. (Deleting a static pod in the Dashboard only kills the running instance; kubelet restarts it automatically, so it is never permanently removed.)
# Run on the child node
sudo scp -r root@10.1.1.201:/etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/manifests/
sudo nano /etc/kubernetes/manifests/etcd.yaml
# /etc/kubernetes/manifests/etcd.yaml on child node podc02
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.1.1.202:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.pem
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://10.1.1.202:2380
    - --initial-cluster=podc01=https://10.1.1.201:2380,podc02=https://10.1.1.202:2380
    - --initial-cluster-token=etcd-cluster
    - --initial-cluster-state=existing
    - --key-file=/etc/kubernetes/pki/etcd/server-key.pem
    - --listen-client-urls=https://127.0.0.1:2379,https://10.1.1.202:2379
    - --listen-peer-urls=https://10.1.1.202:2380
    - --name=podc02
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer2.pem
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer2-key.pem
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.pem
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.pem --cert=/etc/kubernetes/pki/etcd/client.pem --key=/etc/kubernetes/pki/etcd/client-key.pem get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd-certs
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}
Once the etcd parameters are confirmed correct, start the kubelet service:
sudo systemctl start kubelet
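Note that the only fields changing from node to node in the manifest are the member name, the IP address, and the --initial-cluster list, which must contain every member added so far. As a sketch, a small shell helper (my own, not part of the original workflow) that builds the --initial-cluster value from "name=ip" pairs:

```shell
# Build the --initial-cluster value from "name=ip" pairs of the members so far.
build_initial_cluster() {
  local out="" pair
  for pair in "$@"; do
    local name="${pair%%=*}" ip="${pair#*=}"
    # Append "name=https://ip:2380", comma-separated.
    out="${out:+$out,}${name}=https://${ip}:2380"
  done
  printf '%s\n' "$out"
}

# When adding the second node podc02, the cluster already contains podc01:
build_initial_cluster podc01=10.1.1.201 podc02=10.1.1.202
# -> podc01=https://10.1.1.201:2380,podc02=https://10.1.1.202:2380
```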
! Repeat steps 3.1-3.6 above for every remaining etcd node, strictly in order.
You can install etcd-client on the host (e.g. sudo apt install etcd-client on Ubuntu), and then etcdctl can connect directly to the etcd service running in the container.
List the etcd cluster members:
# etcd cluster member list
echo ""
echo "============================="
echo "+ etcd cluster member list..."
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    member list -w table --endpoints=https://[10.1.1.201]:2379
The output looks like:
=============================
+ etcd cluster member list...
+------------------+---------+--------+-------------------------+-------------------------+
|        ID        | STATUS  |  NAME  |       PEER ADDRS        |      CLIENT ADDRS       |
+------------------+---------+--------+-------------------------+-------------------------+
|  741ead392743e35 | started | podc02 | https://10.1.1.202:2380 | https://10.1.1.202:2379 |
| 72077d56570df47f | started | podc01 | https://10.1.1.201:2380 | https://10.1.1.201:2379 |
| dfc70cacefa4fbbb | started | podc04 | https://10.1.1.204:2380 | https://10.1.1.204:2379 |
| e3ecb8f6d5866785 | started | podc03 | https://10.1.1.203:2380 | https://10.1.1.203:2379 |
+------------------+---------+--------+-------------------------+-------------------------+
Check the status of the etcd cluster members:
# member status, all endpoints
echo ""
echo "========================="
echo "+ etcd cluster status... "
ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd-certs/ca.pem \
    --cert=/etc/kubernetes/pki/etcd-certs/client.pem \
    --key=/etc/kubernetes/pki/etcd-certs/client-key.pem \
    --endpoints=https://[10.1.1.201]:2379,https://[10.1.1.202]:2379,https://[10.1.1.203]:2379,https://[10.1.1.204]:2379 \
    endpoint status -w table
The output looks like:
=========================
+ etcd cluster status...
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://[10.1.1.201]:2379 | 72077d56570df47f | 3.2.24  | 4.2 MB  | true      | 1875      | 253980     |
| https://[10.1.1.202]:2379 |  741ead392743e35 | 3.2.24  | 4.2 MB  | false     | 1875      | 253980     |
| https://[10.1.1.203]:2379 | e3ecb8f6d5866785 | 3.2.24  | 4.2 MB  | false     | 1875      | 253980     |
| https://[10.1.1.204]:2379 | dfc70cacefa4fbbb | 3.2.24  | 4.2 MB  | false     | 1875      | 253980     |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
Finally, edit the /etc/kubernetes/manifests/kube-apiserver.yaml file as follows:
#    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
#    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
#    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
#    - --etcd-servers=https://127.0.0.1:2379
    - --etcd-cafile=/etc/kubernetes/pki/etcd-certs/ca.pem
    - --etcd-certfile=/etc/kubernetes/pki/etcd-certs/client.pem
    - --etcd-keyfile=/etc/kubernetes/pki/etcd-certs/client-key.pem
    - --etcd-servers=https://10.1.1.201:2379
This points kube-apiserver at the new etcd service address (--etcd-servers also accepts several comma-separated endpoints for redundancy).
⚠️ Note:

Next steps:

References: