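A brief aside: Kubernetes officially released 1.14 on March 25, and this article was finished on the 28th. It is probably the most recent Chinese-language guide to upgrading to 1.14 anywhere on the web right now; if it helps, please give it a like.
This upgrade mainly follows the official guide, Upgrading kubeadm clusters from v1.13 to v1.14.
Points to note before upgrading (translated from the official documentation):
As required, review the Kubernetes 1.14 release notes, paying particular attention to the Urgent Upgrade Notes. Checked against my own workloads, nothing in them is a major change, so the upgrade can go ahead.
The Kubernetes cluster was built by following the Kubernetes 1.13 fresh installation guide, and looks like this: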
[hall@192-168-10-21 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
192-168-10-14 Ready master 36h v1.13.0
192-168-10-18 Ready <none> 103d v1.13.0
192-168-10-21 Ready master 104d v1.13.0
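Backing up business data needs no introduction. For safety, it is best to run the upgrade on a test cluster first and move on to the production cluster only after that succeeds.
The upgrade mainly changes the Kubernetes system services, kubelet above all, so it is prudent to back up the kubelet configuration first, as follows:
1 View the kubelet service configuration: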
[root@192-168-10-94 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 二 2019-03-19 18:38:46 CST; 1 weeks 1 days ago
Docs: https://kubernetes.io/docs/
Main PID: 6033 (kubelet)
Tasks: 17
Memory: 59.1M
CGroup: /system.slice/kubelet.service
└─6033 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=re...
2 View the service's drop-in configuration file, 10-kubeadm.conf:
[root@192-168-10-94 ~]# cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
3 Back up the configuration files involved
/etc/kubernetes/bootstrap-kubelet.conf (may not exist; that is fine)
/etc/kubernetes/kubelet.conf
/var/lib/kubelet/kubeadm-flags.env
/etc/sysconfig/kubelet
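The backup itself can be sketched as a small shell function. This is only an illustration: backup_kubelet_configs is a hypothetical helper name and /root/kubelet-backup is an assumed destination, neither comes from kubeadm. It copies each file that exists and reports the ones that do not.

```shell
# Hypothetical helper (not part of kubeadm): copy each existing file into a
# backup directory, recreating its absolute path underneath that directory.
backup_kubelet_configs() (
    dest="$1"
    shift
    for f in "$@"; do
        if [ -e "$f" ]; then
            mkdir -p "$dest$(dirname "$f")" || exit 1
            cp -p "$f" "$dest$f" || exit 1
            echo "backed up: $f"
        else
            # bootstrap-kubelet.conf, for instance, may legitimately be absent
            echo "skipped (not found): $f"
        fi
    done
)

# Example call, run as root (the destination directory is an assumption):
# backup_kubelet_configs /root/kubelet-backup \
#     /etc/kubernetes/bootstrap-kubelet.conf \
#     /etc/kubernetes/kubelet.conf \
#     /var/lib/kubelet/kubeadm-flags.env \
#     /etc/sysconfig/kubelet
```

Keeping the original directory layout under the backup directory makes it obvious where each file has to go if it must be restored.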
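Now the upgrade proper begins.
Before upgrading, make sure the cluster has more than one control-plane node, so that it stays available throughout. With a single control-plane node, recovering from a failed upgrade could be troublesome. To add control-plane nodes, see the Kubernetes 1.13 fresh installation guide mentioned above.
Unless stated otherwise, every command in this article other than kubectl is run as root.
1 First check whether kubeadm 1.14.0 is available in the repo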
yum list --showduplicates kubeadm --disableexcludes=kubernetes
My local repo metadata did not list 1.14.0. After clearing the cache with the command below and checking again, 1.14.0 showed up.
yum --disablerepo=\* --enablerepo=kubernetes clean all
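The check in step 1 can also be scripted. The sketch below assumes the usual yum list layout, where the second column is the version-release string; has_package_version is a hypothetical helper name. It reads the listing on standard input and succeeds only if the wanted version appears.

```shell
# Hypothetical helper: succeed if the yum listing on stdin contains the
# given version-release string in its second column (assumed layout).
has_package_version() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Example use, as root:
# yum list --showduplicates kubeadm --disableexcludes=kubernetes \
#     | has_package_version 1.14.0-0 || echo "1.14.0 not in repo metadata yet"
```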
2 Check the kubeadm version information again
[root@192-168-10-21 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:35:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
3 Install the kubeadm tool
yum install -y kubeadm-1.14.0-0 --disableexcludes=kubernetes
4 Confirm that kubeadm has been upgraded
[root@192-168-10-21 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:51:21Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
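Comparing the long version.Info lines by eye is error-prone, so a script might pull out just the GitVersion field. A minimal sketch, assuming the output format shown above; extract_git_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the GitVersion value found in the
# "kubeadm version" output supplied on stdin.
extract_git_version() {
    sed -n 's/.*GitVersion:"\([^"]*\)".*/\1/p'
}

# Example use:
# [ "$(kubeadm version | extract_git_version)" = "v1.14.0" ] \
#     || echo "kubeadm is not at v1.14.0"
```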
5 Check and plan the upgrade
[root@192-168-10-21 ~]# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.0
[upgrade/versions] kubeadm version: v1.14.0
Awesome, you're up-to-date! Enjoy! kubeadm upgrade apply v1.14.0
The message here differs from the official documentation, but it is normal output.
6 Upgrade the cluster to 1.14 with kubeadm
kubeadm upgrade apply v1.14.0
Depending on the cluster, this takes a few minutes and prints a lot of output, roughly as follows:
[root@192-168-10-21 ~]# kubeadm upgrade apply v1.14.0
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
.....
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.14.0". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
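7 Check the CNI plugin and decide whether it needs upgrading
In the Kubernetes architecture, networking is specified only as the Container Network Interface (CNI); the actual implementation is left to other components. My cluster uses flannel, so check it: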
kubectl get pods -n kube-system
...
kubectl describe pod/kube-flannel-ds-amd64-5xxh7 -n kube-system
...
Image: quay.io/coreos/flannel:v0.11.0-amd64
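Rather than scrolling through the describe output, the image lines can be filtered out directly. A small sketch; list_images is a hypothetical helper name, and it assumes the "Image:" label format shown above.

```shell
# Hypothetical helper: print the distinct container images named on
# "Image:" lines of "kubectl describe pod" output read from stdin.
list_images() {
    awk '$1 == "Image:" { print $2 }' | sort -u
}

# Example use:
# kubectl describe pod/kube-flannel-ds-amd64-5xxh7 -n kube-system | list_images
```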
It is running 0.11, which the flannel home page shows to be the latest release, so nothing needs to be done in this step.
8 Upgrade kubectl and kubelet
yum install -y kubelet-1.14.0-0 kubectl-1.14.0-0 --disableexcludes=kubernetes
9 Restart kubelet
[root@192-168-10-21 ~]# systemctl restart kubelet
Warning: kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
[root@192-168-10-21 ~]# systemctl daemon-reload
[root@192-168-10-21 ~]# systemctl restart kubelet
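As the warning in the transcript shows, the unit file changed on disk, so systemd has to reload it before the restart takes effect. The sequence can be wrapped as below; restart_kubelet is a hypothetical helper name. Note that is-active only reports the state at that instant, so a kubelet that crashes a few seconds later would still pass this check.

```shell
# Hypothetical helper: reload unit files first, then restart kubelet and
# print its state ("active" on success; non-zero exit status otherwise).
restart_kubelet() {
    systemctl daemon-reload &&
    systemctl restart kubelet &&
    systemctl is-active kubelet
}
```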
In practice, restarting kubelet failed with the error: Failed to start ContainerManager failed to initialise top level QOS containers. See Appendix 1 for the troubleshooting.
10 Check the upgrade result
[hall@192-168-10-21 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
192-168-10-14 Ready master 38h v1.13.0
192-168-10-18 Ready <none> 103d v1.13.0
192-168-10-21 Ready master 104d v1.14.0
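With more nodes, it helps to list only those still to be upgraded. The sketch below reads the kubectl get nodes table on standard input and assumes VERSION is the last column, as in the output above; nodes_not_at_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of nodes whose VERSION column
# (assumed to be the last one) differs from the target version.
nodes_not_at_version() {
    awk -v want="$1" 'NR > 1 && $NF != want { print $1 }'
}

# Example use:
# kubectl get nodes | nodes_not_at_version v1.14.0
```

Run against the table above, it would print 192-168-10-14 and 192-168-10-18, the two nodes still at v1.13.0.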
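192-168-10-21 is Ready and its version is now 1.14.0, so the primary control-plane node has been upgraded successfully.
On the other control-plane node, upgrade the kubeadm, kubectl, and kubelet packages as described above.
Upgrade to 1.14
The primary control-plane node has already run the check and the upgrade, so 192-168-10-14 only needs to run kubeadm upgrade apply v1.14.0.
Unfortunately, another problem came up: failed to get APIEndpoint information for this node. See Appendix 2 for the troubleshooting.
Restart kubelet
Check the upgrade result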
[hall@192-168-10-21 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
192-168-10-14 Ready master 39h v1.14.0
192-168-10-18 Ready <none> 103d v1.13.0
192-168-10-21 Ready master 104d v1.14.0
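1 Temporary standby capacity
The cluster has only one worker node, so for safety, first remove the master taint from one control-plane node so that it can carry the workloads temporarily: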
[tyhall51@192-168-10-21 ~]$ kubectl taint node 192-168-10-14 node-role.kubernetes.io/master-
node/192-168-10-14 untainted
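Then drain the worker node so that nothing is scheduled onto it during the upgrade. In the output below the eviction aborts because some pods use local storage, but the node is still cordoned: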
[tyhall51@192-168-10-21 ~]$ kubectl drain 192-168-10-18 --ignore-daemonsets
node/192-168-10-18 cordoned
error: unable to drain node "192-168-10-18", aborting command...
There are pending nodes to be drained:
192-168-10-18
error: cannot delete Pods with local storage (use --delete-local-data to override): kube-system/elasticsearch-logging-0, kube-system/elasticsearch-logging-1, kube-system/monitoring-influxdb-8b7d57f5c-2bhlw
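If evicting those pods is acceptable, the error message itself names the override flag, --delete-local-data. A hedged sketch follows; drain_worker is a hypothetical helper name. The flag lets the drain proceed by deleting the pods' emptyDir data, so whether to use it depends on what elasticsearch-logging and monitoring-influxdb keep there.

```shell
# Hypothetical helper: drain a node, also evicting pods that use local
# (emptyDir) storage. Their local data is lost when they are deleted.
drain_worker() {
    kubectl drain "$1" --ignore-daemonsets --delete-local-data
}

# Example use:
# drain_worker 192-168-10-18
```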
2 Install the kubeadm tool
3 Upgrade the node configuration to 1.14 with kubeadm
[root@192-168-10-18 ~]# kubeadm upgrade node config --kubelet-version v1.14.0
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
4 Upgrade kubectl and kubelet
kubelet must be restarted here as well.
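Steps 2 to 4 on the worker node reuse commands already shown in this article, so they can be strung together as one function. This is a sketch only: upgrade_worker_node is a hypothetical helper name, and the -0 package release suffix is taken from the yum commands used earlier. Each command runs only if the previous one succeeded.

```shell
# Hypothetical helper: run steps 2-4 on a worker node, as root.
# $1 is the bare version number, e.g. 1.14.0
upgrade_worker_node() (
    version="$1"
    yum install -y "kubeadm-$version-0" --disableexcludes=kubernetes &&
    kubeadm upgrade node config --kubelet-version "v$version" &&
    yum install -y "kubelet-$version-0" "kubectl-$version-0" --disableexcludes=kubernetes &&
    systemctl daemon-reload &&
    systemctl restart kubelet
)

# Example use:
# upgrade_worker_node 1.14.0
```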
5 Remove the temporary standby capacity
First uncordon the worker node:
[tyhall51@192-168-10-21 ~]$ kubectl uncordon 192-168-10-18
node/192-168-10-18 uncordoned
Then restore the taint on the master node:
[tyhall51@192-168-10-21 ~]$ kubectl taint node 192-168-10-14 node-role.kubernetes.io/master=:NoSchedule
node/192-168-10-14 tainted
6 Check the result
[tyhall51@192-168-10-21 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
192-168-10-14 Ready master 40h v1.14.0
192-168-10-18 Ready <none> 103d v1.14.0
192-168-10-21 Ready master 104d v1.14.0
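That completes the upgrade of Kubernetes from 1.13 to 1.14. Overall, the process was fairly painless. To summarize the upgrade process:
Appendix 1: kubelet failed to restart, and systemctl status kubelet showed this error: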
Failed to start ContainerManager failed to initialise top level QOS containers
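According to https://github.com/kubernetes/kubernetes/issues/43704, adding --cgroups-per-qos=false --enforce-node-allocatable="" to the kubelet startup flags resolves it. From backing up the kubelet configuration earlier, I knew that /var/lib/kubelet/kubeadm-flags.env defines the kubelet startup flags. I added the two flags there and restarted kubelet, and it returned to normal.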
Appendix 2: on 192-168-10-14, kubeadm upgrade apply failed as follows:
[root@192-168-10-14 ~]# kubeadm upgrade apply v1.14.0
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/config] FATAL: failed to getAPIEndpoint: failed to get APIEndpoint information for this node
Following the hint, edit kubeadm-config with kubectl -n kube-system edit cm kubeadm-config -oyaml and set apiEndpoints to:
apiEndpoints:
  192-168-10-21:
    advertiseAddress: 192.168.10.21
    bindPort: 6443
  192-168-10-14:
    advertiseAddress: 192.168.10.14
    bindPort: 6443
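Judging by this fix, the error seems to mean that the node had no entry of its own under apiEndpoints. Before running the upgrade on another control-plane node, a rough textual check like the one below might catch that early. has_api_endpoint is a hypothetical helper name; it merely looks for an indented line consisting of the node name followed by a colon, and the name is used as a regular expression, so it is not a real YAML parse.

```shell
# Hypothetical helper: succeed if the text on stdin contains an indented
# "<node-name>:" key, as the apiEndpoints entries above do.
has_api_endpoint() {
    grep -Eq "^[[:space:]]+$1:[[:space:]]*\$"
}

# Example use:
# kubectl -n kube-system get cm kubeadm-config -oyaml \
#     | has_api_endpoint 192-168-10-14 || echo "no apiEndpoints entry for this node"
```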
Run kubeadm upgrade apply v1.14.0 again, and it completes normally.