A Deep Dive into k8s Deployment Rolling Updates (Part 2)

1. Background

● This article explores in detail how a deployment behaves during a rolling update
● The relevant parameters:
  livenessProbe: liveness probe; determines whether a pod is still alive or needs to be restarted
  readinessProbe: readiness probe; determines whether a pod is ready to serve traffic
  maxSurge: the maximum number of pods that may exist above the desired replica count during a rolling update
  maxUnavailable: the maximum number of pods that may be unavailable during a rolling update


2. Environment

Component  Version
OS         Ubuntu 18.04.1 LTS
docker     18.06.0-ce


3. Preparing Images and the YAML File

First, prepare two images with different versions for testing (two nginx image versions have already been created on Alibaba Cloud):

docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
docker pull registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1

Both images provide the same service, except that nginx:delay_v1 waits 20 seconds before starting nginx
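The internals of the delayed image are not shown in this article; as a rough sketch (the base image and entrypoint are my assumptions — only the 20-second delay comes from the description above), it could be built like this:

```dockerfile
# Hypothetical build for nginx:delay_v1:
# sleep 20 seconds, then run nginx in the foreground.
FROM nginx:alpine
CMD ["sh", "-c", "sleep 20 && exec nginx -g 'daemon off;'"]
```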

root@k8s-master:~# docker run -d --rm -p 10080:80 nginx:v1
e88097841c5feef92e4285a2448b943934ade5d86412946bc8d86e262f80a050
root@k8s-master:~# curl http://127.0.0.1:10080
----------
version: v1
hostname: f5189a5d3ad3

The YAML file:

root@k8s-master:~# more roll_update.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
    selector:
      app: roll-update
    ports:
    - protocol: TCP
      port: 10080
      targetPort: 80

4. livenessProbe and readinessProbe

livenessProbe: liveness probe, used mainly to determine whether a pod needs to be restarted
readinessProbe: readiness probe, used to determine whether a pod is ready to serve traffic

● During a rolling update, pods are dynamically deleted and then recreated. The liveness probe ensures there are always enough live pods to serve traffic; as soon as the pod count falls short, k8s immediately starts new pods
● However, while a pod is starting up the service inside it is still initializing and not yet usable; if traffic arrives at that moment, requests will fail
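As a sketch of how the two probes attach to a container spec (the TCP check and the timing values are illustrative, mirroring the readinessProbe used later in this article):

```yaml
containers:
- name: nginx
  image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
  livenessProbe:           # repeated failures -> kubelet restarts the container
    tcpSocket:
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
  readinessProbe:          # failing -> pod is removed from Service endpoints, gets no traffic
    tcpSocket:
      port: 80
    initialDelaySeconds: 5
    periodSeconds: 10
```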

Let's simulate this scenario.

First, apply the configuration file above:

root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS    RESTARTS   AGE       IP              NODE
update-deployment-7db77f7cc6-c4s2v   1/1       Running   0          28s       10.10.235.232   k8s-master
update-deployment-7db77f7cc6-nfgtd   1/1       Running   0          28s       10.10.36.82     k8s-node1
update-deployment-7db77f7cc6-tflfl   1/1       Running   0          28s       10.10.169.158   k8s-node2
root@k8s-master:~# kubectl get svc
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
nginx-service   ClusterIP   10.254.254.199   <none>        10080/TCP   1m

Open a new terminal and test the availability of the service (fetch the nginx content once per second in a loop):

root@k8s-master:~# while :; do curl http://10.254.254.199:10080; sleep 1; done
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
----------
version: v1
hostname: update-deployment-7db77f7cc6-c4s2v
----------
version: v1
hostname: update-deployment-7db77f7cc6-tflfl
----------
version: v1
hostname: update-deployment-7db77f7cc6-nfgtd
...

Now update the image to nginx:delay_v1. This image delays startup: it sleeps for 20 seconds and only then starts the nginx service. This simulates the window during startup in which the pod already exists but is not yet actually serving

root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched
...
----------
version: v1
hostname: update-deployment-7db77f7cc6-h6hvt
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
curl: (7) Failed to connect to 10.254.254.199 port 10080: Connection refused
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-n22vz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-njmpz
----------
version: delay_v1
hostname: update-deployment-d788c7dc6-6th87

As you can see, because of the delayed startup nginx was not actually ready to serve, yet traffic was already being routed to the backend, leaving the service unavailable for a stretch.

Therefore, adding a readinessProbe is an essential safeguard:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: update-deployment
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: roll-update
    spec:
      containers:
      - name: nginx
        image: registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:v1
        imagePullPolicy: Always
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
    selector:
      app: roll-update
    ports:
    - protocol: TCP
      port: 10080
      targetPort: 80

Repeat the steps above: first create nginx:v1, then patch it to nginx:delay_v1

root@k8s-master:~# kubectl apply -f roll_update.yaml
deployment.extensions "update-deployment" created
service "nginx-service" created
root@k8s-master:~# kubectl patch deployment update-deployment --patch '{"metadata":{"annotations":{"kubernetes.io/change-cause":"update version to v2"}} ,"spec": {"template": {"spec": {"containers": [{"name": "nginx","image":"registry.cn-beijing.aliyuncs.com/mrvolleyball/nginx:delay_v1"}]}}}}'
deployment.extensions "update-deployment" patched
root@k8s-master:~# kubectl get pod -owide
NAME                                 READY     STATUS        RESTARTS   AGE       IP              NODE
busybox                              1/1       Running       0          45d       10.10.235.255   k8s-master
lifecycle-demo                       1/1       Running       0          32d       10.10.169.186   k8s-node2
private-reg                          1/1       Running       0          92d       10.10.235.209   k8s-master
update-deployment-54d497b7dc-4mlqc   0/1       Running       0          13s       10.10.169.178   k8s-node2
update-deployment-54d497b7dc-pk4tb   0/1       Running       0          13s       10.10.36.98     k8s-node1
update-deployment-6d5d7c9947-l7dkb   1/1       Terminating   0          1m        10.10.169.177   k8s-node2
update-deployment-6d5d7c9947-pbzmf   1/1       Running       0          1m        10.10.36.97     k8s-node1
update-deployment-6d5d7c9947-zwt4z   1/1       Running       0          1m        10.10.235.246   k8s-master

● Because a readinessProbe is set, a pod that has started is not put into service immediately, which is why READY shows 0/1
● Some pods remain in the Terminating state for a while, because the rolling-update constraints require that a minimum number of pods stay available

Checking the curl loop again, the image version was updated smoothly to nginx:delay_v1 with no errors:

root@k8s-master:~# while :; do curl http://10.254.66.136:10080; sleep 1; done
...
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: v1
hostname: update-deployment-6d5d7c9947-pbzmf
----------
version: v1
hostname: update-deployment-6d5d7c9947-zwt4z
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-pk4tb
----------
version: delay_v1
hostname: update-deployment-54d497b7dc-4mlqc
...
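To watch such an update from the control-plane side (rather than with a curl loop), kubectl's rollout subcommands can be used, for example:

```shell
# Blocks until the rollout finishes (or fails), printing progress as pods turn over
kubectl rollout status deployment/update-deployment

# Shows recorded revisions; the kubernetes.io/change-cause annotation set in the
# patch above appears in the CHANGE-CAUSE column
kubectl rollout history deployment/update-deployment
```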

5. maxSurge and maxUnavailable

● A rolling update can proceed in different ways: delete an old pod first and then add a new one, or add a new pod first and then delete an old one. Throughout the process the service must remain available (that is, the livenessProbe and readinessProbe checks must pass)
● In practice, maxSurge and maxUnavailable control whether old pods are deleted first or new pods are added first, and at what granularity
● With a desired replica count of 3:
  maxSurge=1 maxUnavailable=0: at most 4 (3+1) pods may exist, and 3 (3-0) pods must be serving at all times. A new pod is created first; once it is available an old pod is deleted, and this repeats until the update is complete
  maxSurge=0 maxUnavailable=1: at most 3 (3+0) pods may exist, and 2 (3-1) pods must be serving at all times. An old pod is deleted first, then a new pod is created, and this repeats until the update is complete
● In the end, both the maxSurge and maxUnavailable constraints must be satisfied. If maxSurge and maxUnavailable are both 0, the update cannot proceed at all: deleting is forbidden and adding is forbidden, so the conditions can never be met
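In the deployment spec these two parameters sit under the update strategy; a sketch of the "add a new pod first, never drop below 3" case described above:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 3+1=4 pods may exist during the update
      maxUnavailable: 0  # all 3 desired pods must remain available
```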

6. Summary

● This article covered the use of maxSurge, maxUnavailable, livenessProbe, and readinessProbe during a deployment rolling update
● One open question remains. In a large system a service may run many pods (say 100), and a rolling update will inevitably leave them at mixed versions (some old, some new), so users may well see inconsistent results across successive requests until the update completes. We will leave that discussion for later



That concludes this article. My knowledge here is limited; if I have missed or muddled anything, corrections are most welcome...
