Kubernetes: Configuring Taints and Tolerations

Date: 2019-12-05

Tags: kubernetes, configuration, taint, toleration

Use taints and tolerations to run pods on specific nodes.

Reference (official documentation): https://k8smeetup.github.io/docs/concepts/configuration/taint-and-toleration/

I. Taint effects

A taint's effect defines how the node repels pods:

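NoSchedule: affects scheduling only; pods already running on the node are not affected. Note that a pod with a matching toleration may still be scheduled onto other nodes in the cluster.
NoExecute: affects both scheduling and pods already running on the node; pods that do not tolerate the taint are evicted.
PreferNoSchedule: a soft version of NoSchedule; the scheduler tries to avoid the node but still uses it if no other node is available.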
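
The three effects can be summarized as a small lookup table. The following is only an illustrative Python sketch, not Kubernetes code; the names EFFECTS and EffectBehavior are hypothetical and exist just for this example. For each effect it records how strongly non-tolerating pods are kept away at scheduling time and whether such pods are evicted when already running.

```python
# Illustrative lookup table only; EFFECTS and EffectBehavior are hypothetical
# names for this sketch, not part of any Kubernetes API.
from typing import NamedTuple


class EffectBehavior(NamedTuple):
    scheduling: str            # "hard" = never schedule, "soft" = avoid if possible
    evicts_running_pods: bool  # are non-tolerating pods already on the node evicted?


EFFECTS = {
    "NoSchedule": EffectBehavior(scheduling="hard", evicts_running_pods=False),
    "PreferNoSchedule": EffectBehavior(scheduling="soft", evicts_running_pods=False),
    "NoExecute": EffectBehavior(scheduling="hard", evicts_running_pods=True),
}

for name, behavior in EFFECTS.items():
    print(f"{name}: scheduling={behavior.scheduling}, "
          f"evicts running pods={behavior.evicts_running_pods}")
```

Only NoExecute reaches pods that are already running, which is why it is the effect used in the rest of this walkthrough.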

II. Adding taints

Add a taint to each of the three nodes worker1, worker2, and worker3:

kubectl taint node rancher-k8s-worker1 item-name=assistant:NoExecute
kubectl taint node rancher-k8s-worker2 item-name=sca:NoExecute
kubectl taint node rancher-k8s-worker3 item-name=kuiyuan:NoExecute

Notes:

1) The first command gives worker1 a taint with key item-name and value assistant. Only pods that have a toleration matching this taint can be scheduled onto worker1. The same applies to worker2 and worker3.

2) The NoExecute effect also affects pods that are already running on the node:

If a pod does not tolerate a taint with effect NoExecute, it is evicted immediately.
If a pod tolerates the taint and its toleration does not specify tolerationSeconds, it keeps running on the node indefinitely.
If a pod tolerates the taint and its toleration specifies tolerationSeconds, it keeps running on the node for that length of time and is then evicted.
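
The three NoExecute cases above can be sketched as a small function. This is a simplified Python model, not the actual Kubernetes implementation: the Taint and Toleration classes and the tolerates and eviction_delay functions are hypothetical names, and taking the shortest tolerationSeconds when several tolerations match is an assumption of this sketch.

```python
# Simplified model of toleration matching; all names are hypothetical and
# this is not the Kubernetes source.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Taint:
    key: str
    value: str
    effect: str


@dataclass
class Toleration:
    key: str = ""                              # empty key is only meaningful with "Exists"
    operator: str = "Equal"                    # "Equal" or "Exists"
    value: str = ""
    effect: str = ""                           # empty effect matches every effect
    toleration_seconds: Optional[int] = None   # None means "no time limit"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def eviction_delay(taint: Taint, tolerations: List[Toleration]) -> Optional[int]:
    """Seconds until eviction for a NoExecute taint.

    0 means evicted immediately, None means never evicted.
    """
    matching = [t for t in tolerations if tolerates(t, taint)]
    if not matching:
        return 0
    bounded = [t.toleration_seconds for t in matching
               if t.toleration_seconds is not None]
    if not bounded:
        return None
    return max(0, min(bounded))


taint = Taint("item-name", "assistant", "NoExecute")
plain = Toleration(key="item-name", value="assistant", effect="NoExecute")
timed = Toleration(key="item-name", value="assistant", effect="NoExecute",
                   toleration_seconds=60)

print(eviction_delay(taint, []))       # 0: no toleration, evicted immediately
print(eviction_delay(taint, [plain]))  # None: tolerated with no time limit
print(eviction_delay(taint, [timed]))  # 60: tolerated for 60 seconds
```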

Appendix: commands to remove the taints

kubectl taint node rancher-k8s-worker1 item-name-
kubectl taint node rancher-k8s-worker2 item-name-
kubectl taint node rancher-k8s-worker3 item-name-

III. Adding tolerations to pods

Run a pod with the matching toleration on each of the three nodes.

1) A pod whose toleration matches the taint with key item-name and value assistant:

cat > nginx-assistant.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-assistant
  labels:
    app: nginx-assistant
spec:
  containers:
  - name: nginx-assistant
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    operator: Equal
    value: assistant
    effect: NoExecute
EOF

2) A pod whose toleration matches the taint with key item-name and value sca:

cat > nginx-sca.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sca
  labels:
    app: nginx-sca
spec:
  containers:
  - name: nginx-sca
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    operator: Equal
    value: sca
    effect: NoExecute
EOF

3) A pod whose toleration matches the taint with key item-name and value kuiyuan:

cat > nginx-kuiyuan.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: nginx-kuiyuan
  labels:
    app: nginx-kuiyuan
spec:
  containers:
  - name: nginx-kuiyuan
    image: nginx
    resources:
      limits:
        cpu: 30m
        memory: 20Mi
      requests:
        cpu: 20m
        memory: 10Mi
  tolerations:
  - key: item-name
    operator: Equal
    value: kuiyuan
    effect: NoExecute
EOF

# Create the pods

kubectl apply -f ./

IV. Checking which node each pod runs on

kubectl get pod -o wide

NAME                    READY   STATUS    RESTARTS   AGE    IP            NODE                  NOMINATED NODE   READINESS GATES
nginx-79748b4cb-25cqr   1/1     Running   0          111s   10.42.10.25   rancher-k8s-worker4   <none>           <none>
nginx-79748b4cb-tnknc   1/1     Running   0          107s   10.42.10.26   rancher-k8s-worker4   <none>           <none>
nginx-79748b4cb-xpx76   1/1     Running   0          101s   10.42.10.27   rancher-k8s-worker4   <none>           <none>
nginx-assistant         1/1     Running   0          33m    10.42.4.246   rancher-k8s-worker1   <none>           <none>
nginx-kuiyuan           1/1     Running   0          33m    10.42.5.239   rancher-k8s-worker3   <none>           <none>
nginx-sca               1/1     Running   0          33m    10.42.3.203   rancher-k8s-worker2   <none>           <none>

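The three pods are running on their corresponding nodes, while the three nginx pods that define no toleration were evicted and ended up on worker4. (A pod that tolerates none of the taints is scheduled onto a node that has no taint.)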
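
The placement above can be reasoned about with a small model of which nodes each pod is allowed to use. This is a hedged Python sketch: NODE_TAINTS and eligible_nodes are hypothetical names, only the Equal operator is modelled (a taint is tolerated when the pod carries the identical key, value, and effect), and worker4 is assumed to be untainted, as the output suggests.

```python
# Hypothetical model of the four-node cluster from this walkthrough.
# Taints and tolerations are written as (key, value, effect) tuples.
NODE_TAINTS = {
    "rancher-k8s-worker1": {("item-name", "assistant", "NoExecute")},
    "rancher-k8s-worker2": {("item-name", "sca", "NoExecute")},
    "rancher-k8s-worker3": {("item-name", "kuiyuan", "NoExecute")},
    "rancher-k8s-worker4": set(),  # assumption: no taint on this node
}


def eligible_nodes(pod_tolerations):
    """Return the nodes whose taints are all tolerated by the pod."""
    return sorted(
        node for node, taints in NODE_TAINTS.items()
        if taints <= pod_tolerations  # every taint has a matching toleration
    )


print(eligible_nodes({("item-name", "assistant", "NoExecute")}))
# ['rancher-k8s-worker1', 'rancher-k8s-worker4']
print(eligible_nodes(set()))
# ['rancher-k8s-worker4']
```

In this model nginx-assistant is allowed on both worker1 and worker4: a toleration permits a pod to run on a tainted node but does not pin it there. The placement on worker1 shown in the output is what the scheduler chose in this cluster.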

V. Taint-based evictions (alpha feature)

This is the per-pod eviction behavior that applies when a node runs into problems.

1) When certain conditions are true, the node controller automatically adds a taint to the node.

The built-in taints currently include:

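node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node condition Ready being "False".
node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node condition Ready being "Unknown".
node.kubernetes.io/out-of-disk: the node has run out of disk space.
node.kubernetes.io/memory-pressure: the node is under memory pressure.
node.kubernetes.io/disk-pressure: the node is under disk pressure.
node.kubernetes.io/network-unavailable: the node's network is unavailable.
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an "external" cloud provider, it adds this taint to the node to mark it as unusable. After a controller in cloud-controller-manager initializes the node, the kubelet removes the taint.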

When the TaintBasedEvictions alpha feature is enabled, the NodeController adds these taints automatically, and the regular logic that evicts pods based on the node's Ready condition is disabled.

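Note: to preserve the existing rate limiting of pod evictions caused by node problems, the system actually adds these taints in a rate-limited way. This prevents mass pod evictions in scenarios such as the master losing contact with the nodes. With this alpha feature, combined with tolerationSeconds, a pod can specify how long it stays on a node that has one or all of these problems.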

For example, check the tolerations of the nginx-assistant pod created earlier:

kubectl describe pod nginx-assistant

Tolerations:     item-name=assistant:NoExecute
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

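Besides the toleration we defined, the pod also tolerates the not-ready and unreachable taints by default, each with tolerationSeconds set to 5 minutes (300 seconds). These automatically added tolerations ensure that a pod stays on its node for 5 minutes by default after one of these problems is detected. Both default tolerations are added by the DefaultTolerationSeconds admission controller.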

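You can also set this duration yourself. For example, a pod may prefer to stay on its node for a longer time when the network is disconnected, waiting for the network to recover rather than being evicted. In that case the pod's toleration might look like this: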

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
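
How the default tolerations interact with a toleration you set yourself can be sketched in Python. This is a simplified model of the behavior described above, not the admission controller's actual code: add_default_tolerations and covers are hypothetical names, tolerations are plain dicts using the manifest field names, and the taint keys are the ones shown in the kubectl describe output earlier.

```python
# Simplified model of default-toleration injection; hypothetical helper names.
DEFAULT_TOLERATION_SECONDS = 300
DEFAULT_KEYS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)


def covers(toleration, key):
    """True if the toleration already addresses `key` with NoExecute.

    An empty key or empty effect is treated as a wildcard in this model.
    """
    key_ok = toleration.get("key", "") in ("", key)
    effect_ok = toleration.get("effect", "") in ("", "NoExecute")
    return key_ok and effect_ok


def add_default_tolerations(tolerations):
    """Return a copy of the list with any missing default tolerations added."""
    result = [dict(t) for t in tolerations]
    for key in DEFAULT_KEYS:
        if not any(covers(t, key) for t in result):
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": DEFAULT_TOLERATION_SECONDS,
            })
    return result


pod_tolerations = [
    {"key": "node.kubernetes.io/unreachable", "operator": "Exists",
     "effect": "NoExecute", "tolerationSeconds": 6000},
]
for toleration in add_default_tolerations(pod_tolerations):
    print(toleration["key"], toleration["tolerationSeconds"])
```

In this model the pod keeps its own 6000-second toleration for unreachable, and only the missing not-ready toleration is added with the 300-second default.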

2) When the pods of a DaemonSet are created, the NoExecute tolerations that are added automatically for these taints do not specify tolerationSeconds.

For example, system pods (canal, DNS, and so on) do not specify tolerationSeconds:

kubectl describe pod/canal-75pct -n kube-system

Tolerations:     :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule

This ensures that DaemonSet pods are never evicted when these problems occur, which matches the behavior when the TaintBasedEvictions feature is disabled.
