Using Liveness and Readiness Health Checks in Kubernetes

1. Overview

Strong self-healing is an important feature of container-orchestration engines such as Kubernetes. By default, Kubernetes heals by restarting containers that have failed. Beyond that, what other ways do we have to health-check containers orchestrated by Kubernetes? Liveness and Readiness probes are a good choice.

2. Hands-on steps

2.1 The default health check.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: healthcheck
  name: healthcheck
spec:
  restartPolicy: OnFailure
  containers:
  - name: healthcheck
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 10;exit 1

Save the YAML above as HealthCheck.yaml and apply it:

[root@k8s-m health-check]# kubectl apply -f HealthCheck.yaml
pod/healthcheck created
[root@k8s-m health-check]# kubectl get pod
NAME          READY   STATUS             RESTARTS   AGE
healthcheck   0/1     CrashLoopBackOff   3          4m52s

As we can see, the pod is not running normally and has been restarted 3 times. The restart details can be inspected with kubectl describe, which we won't repeat here. Now run the following commands:

[root@k8s-m health-check]# sh -c "sleep 2;exit 1"
[root@k8s-m health-check]# echo $?
1

The command runs and exits with code 1. On Linux, an exit code of 0 conventionally means a command succeeded. Because the container's main process exits with a non-zero code, Kubernetes treats the container as failed and keeps restarting it. However, in many cases the service has actually failed while the process has not exited. There, too, a restart is often a simple and effective remedy: for example, a web service returning 500 Internal Server Error can have many causes, and a restart may fix it quickly.

2.2 In Kubernetes, a Liveness probe tells Kubernetes when to restart a container to self-heal.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthcheck;sleep 30; rm -rf /tmp/healthcheck;sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthcheck
      initialDelaySeconds: 10
      periodSeconds: 5

Save this as Liveness.yaml and create the Pod:

[root@k8s-m health-check]# kubectl apply -f Liveness.yaml
pod/liveness created
[root@k8s-m health-check]# kubectl get pod
NAME       READY   STATUS    RESTARTS   AGE
liveness   1/1     Running   1          5m50s

From the YAML we can see that the container creates /tmp/healthcheck on startup, deletes it after 30s, and then sleeps for 600s. The probe runs cat /tmp/healthcheck to check the container's health: as long as the file exists, the container is considered healthy; once it is gone, the container is killed and restarted.

initialDelaySeconds: 10 tells Kubernetes to wait 10s after the container starts before the first probe; in general this value should be larger than the container's startup time. periodSeconds: 5 runs the probe every 5s; if the Liveness probe fails three consecutive times, the container is killed and restarted.
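The remaining probe fields were left at their defaults in the YAML above. Written out explicitly (the values marked "default" below are Kubernetes defaults, not something this example sets), the liveness block is equivalent to this sketch:

```yaml
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthcheck
      initialDelaySeconds: 10   # wait 10s after startup before the first probe
      periodSeconds: 5          # probe every 5s
      timeoutSeconds: 1         # default: each probe attempt times out after 1s
      successThreshold: 1       # default: one success marks the probe healthy again
      failureThreshold: 3       # default: three consecutive failures kill the container
```

failureThreshold: 3 is what produces the "three consecutive failures" behavior described above.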

2.3 A Readiness probe tells Kubernetes when a container can be added to a Service's load-balancing pool to serve traffic.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness
spec:
  restartPolicy: OnFailure
  containers:
  - name: readiness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthcheck;sleep 30; rm -rf /tmp/healthcheck;sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthcheck
      initialDelaySeconds: 10
      periodSeconds: 5

Apply the file:

[root@k8s-m health-check]# kubectl apply -f Readiness.yaml
pod/readiness created
[root@k8s-m health-check]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
readiness   0/1     Running   0          84s
[root@k8s-m health-check]# kubectl get pod
NAME        READY   STATUS      RESTARTS   AGE
readiness   0/1     Completed   0          23m

As the YAML shows, the Readiness and Liveness configurations are almost identical; one can be turned into the other with minor edits. In the kubectl get pod output, the visible difference between the two health checks lies in the READY and RESTARTS columns: a failed Readiness probe never restarts the container (RESTARTS stays 0); instead, READY changes from 1/1 to 0/1 after a while, meaning the container is no longer considered available. You can watch this happen with:

[root@k8s-m health-check]# while true;do kubectl describe pod readiness;done

Liveness and Readiness are two independent health-check mechanisms; they do not depend on each other and can be used together.
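Because the two mechanisms are independent, a single container can declare both. A minimal sketch combining the two probes used above (same busybox demo, purely illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: combined-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: combined-check
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthcheck;sleep 30; rm -rf /tmp/healthcheck;sleep 600
    livenessProbe:            # failure -> container is killed and restarted
      exec:
        command: ["cat", "/tmp/healthcheck"]
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:           # failure -> pod is only removed from Service endpoints
      exec:
        command: ["cat", "/tmp/healthcheck"]
      initialDelaySeconds: 10
      periodSeconds: 5
```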

3. Going further

3.1 Health checks in scale-up.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - name: web
        image: httpd
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /health-check
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    run: web
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 80

The YAML above creates a Service named web-svc and a Deployment named web.

[root@k8s-m health-check]# kubectl apply -f HealthCheck-web-deployment.yaml
deployment.apps/web unchanged
service/web-svc created
[root@k8s-m health-check]# kubectl get service web-svc
NAME      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
web-svc   ClusterIP   10.101.1.6   <none>        8080/TCP   2m20s
[root@k8s-m health-check]# kubectl get deployment web
NAME   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
web    3         3         3            0           3m26s
[root@k8s-m health-check]# kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
web-7d96585f7f-q5p4d   0/1     Running   0          3m35s
web-7d96585f7f-w6tqx   0/1     Running   0          3m35s
web-7d96585f7f-xrqwm   0/1     Running   0          3m35s

Focus on the readinessProbe section of the Deployment: the health-check mechanism used here is Readiness, and the probe method is httpGet. Kubernetes considers this probe successful when the HTTP response code is at least 200 and below 400. scheme selects the protocol, HTTP (the default) or HTTPS; path sets the request path; port sets the port.

Probing starts 10s after the container starts. If http://container_ip:8080/health-check does not return a code in the 200-399 range, the container is not ready and does not receive requests from the Service web-svc. /health-check is the endpoint our own application code must implement. A sample probe result:

[root@k8s-m health-check]# kubectl describe pod web
Warning  Unhealthy  57s (x219 over 19m)  kubelet, k8s-n2    Readiness probe failed: Get http://10.244.2.61:8080/healthy: dial tcp 10.244.2.61:8080: connect: connection refused
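The failure above is expected: the stock httpd image listens on port 80, not 8080, and serves no health-check endpoint. As a hedged sketch (assuming an unmodified httpd image whose default welcome page returns 200), a probe that would actually pass could simply target that page:

```yaml
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /        # httpd's default welcome page returns 200
            port: 80       # the port the httpd image actually listens on
          initialDelaySeconds: 10
          periodSeconds: 5
```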

3.2 Health checks in rolling updates (Rolling Update).

By default, during a Rolling Update Kubernetes assumes the new containers are ready and gradually replaces old replicas with them. If the new version is faulty, the whole application may be unable to handle requests once the update completes, leaving it unable to serve traffic; in a production environment the consequences could be severe. With health checks configured correctly, only new replicas that pass the Readiness probe are added to the Service; if they fail the probe, the existing replicas are not replaced and the business keeps running normally.

Next, create two YAML files, app.v1.yaml and app.v2.yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 8
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 10;touch /tmp/health-check;sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/health-check
          initialDelaySeconds: 10
          periodSeconds: 5

app.v2.yaml:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 8
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        - sleep 3000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/health-check
          initialDelaySeconds: 10
          periodSeconds: 5

Apply app.v1.yaml:

[root@k8s-m health-check]# kubectl apply -f app.v1.yaml --record
deployment.apps/app created
[root@k8s-m health-check]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
app-844b9b5bf-9nnrb   1/1     Running   0          2m52s
app-844b9b5bf-b8tw2   1/1     Running   0          2m52s
app-844b9b5bf-j2n9c   1/1     Running   0          2m52s
app-844b9b5bf-ml8c5   1/1     Running   0          2m52s
app-844b9b5bf-mtgr9   1/1     Running   0          2m52s
app-844b9b5bf-n4dn8   1/1     Running   0          2m52s
app-844b9b5bf-ppzh6   1/1     Running   0          2m52s
app-844b9b5bf-z55d4   1/1     Running   0          2m52s

Update to app.v2.yaml:

[root@k8s-m health-check]# kubectl apply -f app.v2.yaml --record
deployment.apps/app configured
[root@k8s-m health-check]# kubectl get pod
NAME                  READY   STATUS              RESTARTS   AGE
app-844b9b5bf-9nnrb   1/1     Running             0          3m30s
app-844b9b5bf-b8tw2   1/1     Running             0          3m30s
app-844b9b5bf-j2n9c   1/1     Running             0          3m30s
app-844b9b5bf-ml8c5   1/1     Terminating         0          3m30s
app-844b9b5bf-mtgr9   1/1     Running             0          3m30s
app-844b9b5bf-n4dn8   1/1     Running             0          3m30s
app-844b9b5bf-ppzh6   1/1     Terminating         0          3m30s
app-844b9b5bf-z55d4   1/1     Running             0          3m30s
app-cd49b84-bxvtc     0/1     ContainerCreating   0          6s
app-cd49b84-gkkj8     0/1     ContainerCreating   0          6s
app-cd49b84-jfzcm     0/1     ContainerCreating   0          6s
app-cd49b84-xl8ws     0/1     ContainerCreating   0          6s

Check again a little later:

[root@k8s-m health-check]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
app-844b9b5bf-9nnrb   1/1     Running   0          4m59s
app-844b9b5bf-b8tw2   1/1     Running   0          4m59s
app-844b9b5bf-j2n9c   1/1     Running   0          4m59s
app-844b9b5bf-mtgr9   1/1     Running   0          4m59s
app-844b9b5bf-n4dn8   1/1     Running   0          4m59s
app-844b9b5bf-z55d4   1/1     Running   0          4m59s
app-cd49b84-bxvtc     0/1     Running   0          95s
app-cd49b84-gkkj8     0/1     Running   0          95s
app-cd49b84-jfzcm     0/1     Running   0          95s
app-cd49b84-xl8ws     0/1     Running   0          95s

All pods now show STATUS Running, but four of them still show READY 0/1. Look at the Deployment:

[root@k8s-m health-check]# kubectl get deployment app
NAME   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
app    8         10        4            6           7m20s

DESIRED shows the desired replica count of 8; CURRENT shows 10 current replicas; UP-TO-DATE shows 4 updated replicas; AVAILABLE shows 6 available replicas. Without intervention, this state would persist indefinitely. Note that the rolling update destroyed 2 old replicas and created 4 new ones; we will come back to this at the end.

Roll the version back to v1:

[root@k8s-m health-check]# kubectl rollout history deployment app
deployment.extensions/app
REVISION  CHANGE-CAUSE
1         kubectl apply --filename=app.v1.yaml --record=true
2         kubectl apply --filename=app.v2.yaml --record=true

[root@k8s-m health-check]# kubectl rollout undo deployment app --to-revision=1
deployment.extensions/app
[root@k8s-m health-check]# kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
app-844b9b5bf-8qqhk   1/1     Running   0          2m37s
app-844b9b5bf-9nnrb   1/1     Running   0          18m
app-844b9b5bf-b8tw2   1/1     Running   0          18m
app-844b9b5bf-j2n9c   1/1     Running   0          18m
app-844b9b5bf-mtgr9   1/1     Running   0          18m
app-844b9b5bf-n4dn8   1/1     Running   0          18m
app-844b9b5bf-pqpm5   1/1     Running   0          2m37s
app-844b9b5bf-z55d4   1/1     Running   0          18m

4. Summary

4.1 Liveness and Readiness are two different health-check mechanisms in Kubernetes. They are very similar but behave differently, and they can be used together or separately. The differences were covered above.

4.2 In an earlier article on Rolling Update I discussed the replacement rules used during an update; this article again used the defaults. The maxSurge and maxUnavailable parameters determine how many replicas exist in each state during the update, and both default to 25%. During the update, the total replica count is capped at 8 + 8 × 0.25 = 10, and the number of available replicas must stay at least 8 - 8 × 0.25 = 6. That is why 2 old replicas were destroyed and 4 new ones created.
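With replicas: 8 and both parameters at the default 25%, the bookkeeping can be written directly into the strategy block (Kubernetes rounds maxSurge up and maxUnavailable down when resolving percentages):

```yaml
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # ceil(8 * 0.25) = 2 extra pods  -> at most 8 + 2 = 10 in total
      maxUnavailable: 25%  # floor(8 * 0.25) = 2            -> at least 8 - 2 = 6 available
```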

4.3 When going live in production, use health checks wherever possible to keep the business unaffected. They can be implemented in many ways; choose based on your actual situation.

5. Related material

5.1 Official documentation on Liveness and Readiness

5.2 Official documentation on maxSurge and maxUnavailable

5.3 The code used in this article
