In the previous article we looked at how the NetworkPolicy resource works on Kubernetes and how to use it; for a refresher, see http://www.javashuo.com/article/p-aukervvv-nz.html. Today let's talk about pod scheduling strategies.
Kubernetes has a very important component called kube-scheduler. Its main job is to watch the apiserver for pod resources whose nodeName field is empty; an empty nodeName means the pod has not been scheduled yet. At that point kube-scheduler evaluates all the nodes in the cluster against the pod's definition, picks the node best suited to run the pod, fills that node's hostname into the pod's nodeName field, and writes the pod definition back to the apiserver. The apiserver then notifies the kubelet on the node named in nodeName; that kubelet reads the pod definition from the apiserver and, based on the attributes in the manifest, calls the local docker runtime to start the pod, then reports the pod's status back to the apiserver, which persists it into etcd. Throughout this process kube-scheduler's role is to schedule the pod and report the scheduling decision back to the apiserver. The question, then, is how kube-scheduler decides which of the many nodes is the best one to run a given pod.
The scheduler makes its decisions according to scheduling algorithms; different algorithms judge by different criteria and can produce different results. When the scheduler finds an unscheduled pod on the apiserver, it first runs every node in the cluster through a set of predicate functions and eliminates the nodes that are unfit to run the pod; this is the predicate stage. The remaining nodes enter the priority stage, where each priority function scores every candidate node; the scores from all priority functions are summed per node, and the node with the highest total is the scheduling result. If several nodes tie for the highest score, the scheduler picks one of them at random; this final step is the select stage. In short, scheduling goes through three stages: the predicate stage filters out and discards nodes that cannot run the pod; the priority stage scores the remaining nodes and finds the highest-scoring ones; and the select stage picks one node, at random if there is a tie, as the node that will finally run the pod. The overall flow is shown in the figure below.
Tip: the predicate stage is a one-vote-veto mechanism; if any single predicate function rejects a node, that node is eliminated immediately. The nodes that pass all predicates move on to the priority stage, where each priority function scores them and the scores are summed per node. The scheduler then picks the node with the highest total score as the final result; if several nodes share the highest score, one of them is chosen at random, and the result is reported back to the apiserver.
Factors that influence scheduling
NodeName: nodeName is the most direct way to influence pod placement. As described above, the scheduler decides whether a pod needs scheduling by checking whether its nodeName field is empty. If the user explicitly sets nodeName in the pod manifest, the scheduler is bypassed entirely: since nodeName is non-empty, the scheduler treats the pod as already scheduled. This is effectively a manual binding of the pod to a specific node.
NodeSelector: compared with nodeName, nodeSelector is a looser constraint, but it is still an important scheduling factor. If a pod manifest specifies a nodeSelector, only nodes whose labels match the selector can run the pod; if no node satisfies the selector, the pod stays in the Pending state.
Node Affinity: node affinity expresses a pod's affinity for nodes, i.e. which nodes the pod prefers (or prefers not) to run on. Its scheduling logic is finer-grained than nodeName and nodeSelector.
Pod Affinity: pod affinity expresses affinity between pods, i.e. which pod or pods a given pod prefers to be together with. The opposite, a pod preferring not to be together with certain pods, is called pod anti-affinity. "Together" means being in the same location, and the location can be divided per hostname, per zone, and so on. Because of this, defining what counts as a location is essential when expressing pod affinity or anti-affinity: it is the yardstick used to decide where a pod may run.
Taints and tolerations: a taint is set on a node, and tolerations are defined on a pod to state how much of a node's taints the pod can tolerate. If a pod tolerates a node's taints it may run on that node; otherwise it may not. Scheduling here combines the taints on the nodes with the pod's tolerations for those taints; a minimal sketch is shown right after this list.
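Taints and tolerations are not demonstrated later in this article, so here is a minimal, illustrative sketch only: assuming a node had been tainted with a hypothetical taint such as kubectl taint node node01.k8s.org node-type=dev:NoSchedule, a pod would need a matching toleration like the one below before the scheduler would place it there. The taint key, value and pod name are assumptions for illustration, not taken from the cluster used in the examples.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-toleration
spec:
  tolerations:                     # tolerate the hypothetical node-type=dev:NoSchedule taint
  - key: "node-type"
    operator: "Equal"              # the taint value must equal "dev"; Exists would match any value
    value: "dev"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx:1.14-alpine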
Example: scheduling with nodeName
[root@master01 ~]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  nodeName: node01.k8s.org
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
[root@master01 ~]#
Tip: nodeName directly specifies which node the pod runs on, without going through the default scheduler. The manifest above pins nginx-pod to node01.k8s.org.
Apply the manifest
[root@master01 ~]# kubectl apply -f pod-demo.yaml
pod/nginx-pod created
[root@master01 ~]# kubectl get pods -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod   1/1     Running   0          10s   10.244.1.28   node01.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod is indeed running on the node we specified by hand.
Example: scheduling with nodeSelector
[root@master01 ~]# cat pod-demo-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-nodeselector
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
[root@master01 ~]#
Tip: nodeSelector matches against node labels: the pod can only be scheduled onto nodes that carry the specified labels, and cannot be scheduled onto any other node. If no node satisfies the selector, the pod stays Pending until some node gains the matching label, at which point it is scheduled there.
Apply the manifest
[root@master01 ~]# kubectl apply -f pod-demo-nodeselector.yaml
pod/nginx-pod-nodeselector created
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          9m38s   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeselector   0/1     Pending   0          16s     <none>        <none>           <none>           <none>
[root@master01 ~]#
Tip: the pod stays in the Pending state because no node in the cluster carries a label that satisfies its node selector.
Verify: label node02 accordingly and see whether the pod gets scheduled onto node02.
[root@master01 ~]# kubectl get nodes --show-labels
NAME               STATUS   ROLES                  AGE   VERSION   LABELS
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master01.k8s.org,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
node01.k8s.org     Ready    <none>                 29d   v1.20.0   app=nginx-1.14-alpine,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node01.k8s.org,kubernetes.io/os=linux
node02.k8s.org     Ready    <none>                 29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02.k8s.org,kubernetes.io/os=linux
node03.k8s.org     Ready    <none>                 29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node03.k8s.org,kubernetes.io/os=linux
node04.k8s.org     Ready    <none>                 19d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node04.k8s.org,kubernetes.io/os=linux
[root@master01 ~]# kubectl label node node02.k8s.org disktype=ssd
node/node02.k8s.org labeled
[root@master01 ~]# kubectl get nodes --show-labels
NAME               STATUS   ROLES                  AGE   VERSION   LABELS
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master01.k8s.org,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
node01.k8s.org     Ready    <none>                 29d   v1.20.0   app=nginx-1.14-alpine,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node01.k8s.org,kubernetes.io/os=linux
node02.k8s.org     Ready    <none>                 29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node02.k8s.org,kubernetes.io/os=linux
node03.k8s.org     Ready    <none>                 29d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node03.k8s.org,kubernetes.io/os=linux
node04.k8s.org     Ready    <none>                 19d   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node04.k8s.org,kubernetes.io/os=linux
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          12m     10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeselector   1/1     Running   0          3m26s   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: once node02 is labeled with disktype=ssd, the pod is scheduled onto node02.
Example: scheduling with the nodeAffinity field of affinity
[root@master01 ~]# cat pod-demo-affinity-nodeaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-nodeaffinity
spec:
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: foo
            operator: Exists
            values: []
        - matchExpressions:
          - key: disktype
            operator: Exists
            values: []
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: foo
            operator: Exists
            values: []
      - weight: 2
        preference:
          matchExpressions:
          - key: disktype
            operator: Exists
            values: []
[root@master01 ~]#
提示:對於nodeaffinity來講,它有兩種限制,一種是硬限制,用requiredDuringSchedulingIgnoredDuringExecution字段來定義,該字段爲一個對象,其裏面只有nodeSelectorTerms一個字段能夠定義,該字段爲一個列表對象,可使用matchExpressions字段來定義匹配對應節點標籤的表達式(其中對應表達式中可使用的操做符有In、NotIn、Exists、DoesNotExists、Lt、Gt;Lt和Gt用於字符串比較,Exists和DoesNotExists用來判斷對應標籤key是否存在,In和NotIn用來判斷對應標籤的值是否在某個集合中),也可使用matchFields字段來定義對應匹配節點字段;所謂硬限制是指必須知足對應定義的節點標籤選擇表達式或節點字段選擇器,對應pod纔可以被調度在對應節點上運行,不然對應pod不能被調度到節點上運行,若是沒有知足對應的節點標籤表達式或節點字段選擇器,則對應pod會一直被掛起;第二種是軟限制,用preferredDuringSchedulingIgnoredDuringExecution字段定義,該字段爲一個列表對象,裏面能夠用weight來定義對應軟限制的權重,該權重會被調度器在最後計算node得分時加入到對應節點總分中;preference字段是用來定義對應軟限制匹配條件;即知足對應軟限制的節點在調度時會被調度器把對應權重加入對應節點總分;對於軟限制來講,只有當硬限制匹配有多個node時,對應軟限制纔會生效;即軟限制是在硬限制的基礎上作的第二次限制,它表示在硬限制匹配多個node,優先使用軟限制中匹配的node,若是軟限制中給定的權重和匹配條件不能讓多個node決勝出最高分,即便用默認調度調度機制,從多個最高分node中隨機挑選一個node做爲最後調度結果;若是在軟限制中給定權重和對應匹配條件可以決勝出對應node最高分,則對應node就爲最後調度結果;簡單講軟限制和硬限制一塊兒使用,軟限制是輔助硬限制對node進行挑選;若是隻是單純的使用軟限制,則優先把pod調度到權重較高對應條件匹配的節點上;若是權重同樣,則調度器會根據默認規則從最後得分中挑選一個最高分,做爲最後調度結果;以上示例表示運行pod的硬限制必須是對應節點上知足有key爲foo的節點標籤或者key爲disktype的節點標籤;若是對應硬限制沒有匹配到任何節點,則對應pod不作任何調度,即處於pending狀態,若是對應硬限制都匹配,則在軟限制中匹配key爲foo的節點將在總分中加上10,對key爲disktype的節點總分加2分;即軟限制中,pod更傾向key爲foo的節點標籤的node上;這裏須要注意的是nodeAffinity沒有node anti Affinity,要想實現反親和性可使用NotIn或者DoesNotExists操做符來匹配對應條件;
Apply the resource manifest
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node03.k8s.org     Ready    <none>                 29d   v1.20.0
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity-nodeaffinity.yaml
pod/nginx-pod-nodeaffinity created
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          122m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity   1/1     Running   0          7s     10.244.2.22   node02.k8s.org   <none>           <none>
nginx-pod-nodeselector   1/1     Running   0          113m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: after applying the manifest the pod is scheduled onto node02, because node02 carries a label with key disktype, which satisfies the pod's hard constraint.
Verify: delete the pod and remove the disktype label from node02, then apply the manifest again and see how the pod is scheduled.
[root@master01 ~]# kubectl delete -f pod-demo-affinity-nodeaffinity.yaml
pod "nginx-pod-nodeaffinity" deleted
[root@master01 ~]# kubectl label node node02.k8s.org disktype-
node/node02.k8s.org labeled
[root@master01 ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-pod                1/1     Running   0          127m
nginx-pod-nodeselector   1/1     Running   0          118m
[root@master01 ~]# kubectl get node -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity-nodeaffinity.yaml
pod/nginx-pod-nodeaffinity created
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          128m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity   0/1     Pending   0          9s     <none>        <none>           <none>           <none>
nginx-pod-nodeselector   1/1     Running   0          118m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: after deleting the old pod and removing the label from node02, re-applying the manifest leaves the pod stuck in Pending: no node in the cluster satisfies the pod's hard constraint, so it cannot be scheduled.
Verify: delete the pod, label node01 with key foo and node03 with key disktype, then apply the manifest again and see how the pod is scheduled.
[root@master01 ~]# kubectl delete -f pod-demo-affinity-nodeaffinity.yaml
pod "nginx-pod-nodeaffinity" deleted
[root@master01 ~]# kubectl label node node01.k8s.org foo=bar
node/node01.k8s.org labeled
[root@master01 ~]# kubectl label node node03.k8s.org disktype=ssd
node/node03.k8s.org labeled
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0   bar
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity-nodeaffinity.yaml
pod/nginx-pod-nodeaffinity created
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          132m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity   1/1     Running   0          5s     10.244.1.29   node01.k8s.org   <none>           <none>
nginx-pod-nodeselector   1/1     Running   0          123m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: when the hard constraint is matched by multiple nodes, the pod is preferentially scheduled onto the node matching the higher-weight soft condition. In other words, when the hard constraint alone cannot decide, the soft-constraint condition with the larger weight wins.
Verify: remove the label from node01 and see whether the pod is evicted or rescheduled to another node.
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0   bar
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl label node node01.k8s.org foo-
node/node01.k8s.org labeled
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                1/1     Running   0          145m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity   1/1     Running   0          12m    10.244.1.29   node01.k8s.org   <none>           <none>
nginx-pod-nodeselector   1/1     Running   0          135m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: once the pod is running, it is neither evicted nor rescheduled even after the node stops satisfying its hard constraint. Node affinity only takes effect at scheduling time; after scheduling is done, a node that no longer matches does not cause the pod to be removed or rescheduled. In short, nodeAffinity cannot undo placement that is already a done deal.
How node affinity rules take effect
1. When nodeAffinity and nodeSelector are used together, they are ANDed: a node must satisfy both at the same time before the pod can be scheduled onto it.
Example: defining a pod scheduling policy with both nodeAffinity and nodeSelector
[root@master01 ~]# cat pod-demo-affinity-nodesector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-nodeaffinity-nodeselector
spec:
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: foo
            operator: Exists
            values: []
  nodeSelector:
    disktype: ssd
[root@master01 ~]#
Tip: this manifest says the pod is to run on a node that carries a label with key foo and, at the same time, the label disktype=ssd.
Apply the manifest
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity-nodesector.yaml
pod/nginx-pod-nodeaffinity-nodeselector created
[root@master01 ~]# kubectl get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                             1/1     Running   0          168m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity                1/1     Running   0          35m    10.244.1.29   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity-nodeselector   0/1     Pending   0          7s     <none>        <none>           <none>           <none>
nginx-pod-nodeselector                1/1     Running   0          159m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod stays in Pending after creation: no node carries both a label with key foo and the label disktype=ssd, so the pod cannot be scheduled and just hangs.
2. When a nodeAffinity specifies multiple nodeSelectorTerms, the terms are ORed: each matchExpressions list defines its own matching conditions, and a node only needs to satisfy one of them.
[root@master01 ~]# cat pod-demo-affinity2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-nodeaffinity2
spec:
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: foo
            operator: Exists
            values: []
        - matchExpressions:
          - key: disktype
            operator: Exists
            values: []
[root@master01 ~]#
Tip: this manifest says the pod must be scheduled onto a node that carries a label with key foo or a label with key disktype.
Apply the manifest
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity2.yaml
pod/nginx-pod-nodeaffinity2 created
[root@master01 ~]# kubectl get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE    IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                             1/1     Running   0          179m   10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity                1/1     Running   0          46m    10.244.1.29   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity-nodeselector   0/1     Pending   0          10m    <none>        <none>           <none>           <none>
nginx-pod-nodeaffinity2               1/1     Running   0          6s     10.244.3.21   node03.k8s.org   <none>           <none>
nginx-pod-nodeselector                1/1     Running   0          169m   10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod is scheduled onto node03, because node03 satisfies the condition of carrying a label with key foo or key disktype (it has disktype=ssd).
3. Multiple expressions inside the same matchExpressions entry are ANDed: each key entry in the list states one condition, and a node must satisfy all of them.
Example: specifying multiple conditions under one matchExpressions entry
[root@master01 ~]# cat pod-demo-affinity3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod-nodeaffinity3
spec:
  containers:
  - name: nginx
    image: nginx:1.14-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: foo
            operator: Exists
            values: []
          - key: disktype
            operator: Exists
            values: []
[root@master01 ~]#
Tip: this manifest says the pod is to run on a node that carries both a label with key foo and a label with key disktype.
Apply the manifest
[root@master01 ~]# kubectl get nodes -L foo,disktype
NAME               STATUS   ROLES                  AGE   VERSION   FOO   DISKTYPE
master01.k8s.org   Ready    control-plane,master   29d   v1.20.0
node01.k8s.org     Ready    <none>                 29d   v1.20.0
node02.k8s.org     Ready    <none>                 29d   v1.20.0
node03.k8s.org     Ready    <none>                 29d   v1.20.0         ssd
node04.k8s.org     Ready    <none>                 19d   v1.20.0
[root@master01 ~]# kubectl apply -f pod-demo-affinity3.yaml
pod/nginx-pod-nodeaffinity3 created
[root@master01 ~]# kubectl get pods -o wide
NAME                                  READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
nginx-pod                             1/1     Running   0          3h8m    10.244.1.28   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity                1/1     Running   0          56m     10.244.1.29   node01.k8s.org   <none>           <none>
nginx-pod-nodeaffinity-nodeselector   0/1     Pending   0          20m     <none>        <none>           <none>           <none>
nginx-pod-nodeaffinity2               1/1     Running   0          9m38s   10.244.3.21   node03.k8s.org   <none>           <none>
nginx-pod-nodeaffinity3               0/1     Pending   0          7s      <none>        <none>           <none>           <none>
nginx-pod-nodeselector                1/1     Running   0          179m    10.244.2.18   node02.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod stays in Pending after creation, because no node carries both a label with key foo and a label with key disktype.
Pod affinity works and is used much like node affinity: it also has hard and soft constraints with the same logic. When a hard (required) rule is defined, the soft (preferred) rules only help pick among the nodes that already satisfy it, and if the hard rule cannot be satisfied, the pod can only hang. If only soft rules are used, the pod preferentially runs on the node matching the higher-weight rule; if no node satisfies any soft rule, the default scheduling rules pick the highest-scoring node.
Example: using the hard constraint of podAffinity under affinity
[root@master01 ~]# cat require-podaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity-1
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["nginx"]}
        topologyKey: kubernetes.io/hostname
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
[root@master01 ~]#
Tip: this is how the hard constraint of podAffinity is written. podAffinity is defined under spec.affinity with the podAffinity field; requiredDuringSchedulingIgnoredDuringExecution defines the hard constraint and is a list, where labelSelector selects the labels of the pods this pod wants to be co-located with, and topologyKey defines what counts as a "location", typically a node label key. The manifest above says the hard requirement for running myapp is that the target node already runs a pod carrying the label app=nginx; in other words, whichever node the app=nginx pod runs on is where myapp runs. If no such pod exists, this pod also stays in the Pending state.
Apply the manifest
[root@master01 ~]# kubectl get pods -L app -o wide
NAME        READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES   APP
nginx-pod   1/1     Running   0          8m25s   10.244.4.25   node04.k8s.org   <none>           <none>            nginx
[root@master01 ~]# kubectl apply -f require-podaffinity.yaml
pod/with-pod-affinity-1 created
[root@master01 ~]# kubectl get pods -L app -o wide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES   APP
nginx-pod             1/1     Running   0          8m43s   10.244.4.25   node04.k8s.org   <none>           <none>            nginx
with-pod-affinity-1   1/1     Running   0          6s      10.244.4.26   node04.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod runs on node04 because a pod labeled app=nginx already exists there, which satisfies the hard podAffinity constraint.
Verify: delete the two pods above, apply the manifest again, and see whether the pod can still run.
[root@master01 ~]# kubectl delete all --all
pod "nginx-pod" deleted
pod "with-pod-affinity-1" deleted
service "kubernetes" deleted
[root@master01 ~]# kubectl apply -f require-podaffinity.yaml
pod/with-pod-affinity-1 created
[root@master01 ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
with-pod-affinity-1   0/1     Pending   0          8s    <none>   <none>   <none>           <none>
[root@master01 ~]#
Tip: the pod is in Pending because no node runs a pod labeled app=nginx, so the hard podAffinity constraint is not satisfied.
Example: using the soft constraint of podAffinity under affinity
[root@master01 ~]# cat prefernece-podaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity-2
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: rack
      - weight: 20
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: zone
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
[root@master01 ~]#
Tip: the soft constraint of podAffinity is defined with preferredDuringSchedulingIgnoredDuringExecution; weight is the weight of each soft condition, i.e. the amount added to the final score of any node that matches it. The manifest above says: treating the node label key rack as the location, add 80 to the score of a node whose location already runs a pod labeled app=db; treating the node label key zone as the location, add 20 to the score of a node whose location already runs a pod labeled app=db. If no node satisfies either condition, the default scheduling rules apply.
Apply the manifest
[root@master01 ~]# kubectl get node -L rack,zone
NAME               STATUS   ROLES                  AGE   VERSION   RACK   ZONE
master01.k8s.org   Ready    control-plane,master   30d   v1.20.0
node01.k8s.org     Ready    <none>                 30d   v1.20.0
node02.k8s.org     Ready    <none>                 30d   v1.20.0
node03.k8s.org     Ready    <none>                 30d   v1.20.0
node04.k8s.org     Ready    <none>                 20d   v1.20.0
[root@master01 ~]# kubectl get pods -o wide -L app
NAME                  READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES   APP
with-pod-affinity-1   0/1     Pending   0          22m   <none>   <none>   <none>           <none>
[root@master01 ~]# kubectl apply -f prefernece-podaffinity.yaml
pod/with-pod-affinity-2 created
[root@master01 ~]# kubectl get pods -o wide -L app
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
with-pod-affinity-1   0/1     Pending   0          22m   <none>        <none>           <none>           <none>
with-pod-affinity-2   1/1     Running   0          6s    10.244.4.28   node04.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod comes up normally and lands on node04. In this case scheduling did not follow the soft constraint at all but the default rules, because no node satisfied any of the soft conditions.
Verify: delete the pod, put a rack label on node01 and a zone label on node03, run the pod again, and see how it is scheduled.
[root@master01 ~]# kubectl delete -f prefernece-podaffinity.yaml
pod "with-pod-affinity-2" deleted
[root@master01 ~]# kubectl label node node01.k8s.org rack=group1
node/node01.k8s.org labeled
[root@master01 ~]# kubectl label node node03.k8s.org zone=group2
node/node03.k8s.org labeled
[root@master01 ~]# kubectl get node -L rack,zone
NAME               STATUS   ROLES                  AGE   VERSION   RACK     ZONE
master01.k8s.org   Ready    control-plane,master   30d   v1.20.0
node01.k8s.org     Ready    <none>                 30d   v1.20.0   group1
node02.k8s.org     Ready    <none>                 30d   v1.20.0
node03.k8s.org     Ready    <none>                 30d   v1.20.0            group2
node04.k8s.org     Ready    <none>                 20d   v1.20.0
[root@master01 ~]# kubectl apply -f prefernece-podaffinity.yaml
pod/with-pod-affinity-2 created
[root@master01 ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
with-pod-affinity-1   0/1     Pending   0          27m   <none>        <none>           <none>           <none>
with-pod-affinity-2   1/1     Running   0          9s    10.244.4.29   node04.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod is still scheduled onto node04, which shows that the location labels on the nodes alone do not change the scheduling result.
Verify: delete the pod, create a pod labeled app=db on node01 and on node03, then apply the manifest again and see how the pod is scheduled.
[root@master01 ~]# kubectl delete -f prefernece-podaffinity.yaml
pod "with-pod-affinity-2" deleted
[root@master01 ~]# cat pod-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod1
  labels:
    app: db
spec:
  nodeSelector:
    rack: group1
  containers:
  - name: redis
    image: redis:4-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: redis
      containerPort: 6379
---
apiVersion: v1
kind: Pod
metadata:
  name: redis-pod2
  labels:
    app: db
spec:
  nodeSelector:
    zone: group2
  containers:
  - name: redis
    image: redis:4-alpine
    imagePullPolicy: IfNotPresent
    ports:
    - name: redis
      containerPort: 6379
[root@master01 ~]# kubectl apply -f pod-demo.yaml
pod/redis-pod1 created
pod/redis-pod2 created
[root@master01 ~]# kubectl get pods -L app -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-pod1            1/1     Running   0          34s   10.244.1.35   node01.k8s.org   <none>           <none>            db
redis-pod2            1/1     Running   0          34s   10.244.3.24   node03.k8s.org   <none>           <none>            db
with-pod-affinity-1   0/1     Pending   0          34m   <none>        <none>           <none>           <none>
[root@master01 ~]# kubectl apply -f prefernece-podaffinity.yaml
pod/with-pod-affinity-2 created
[root@master01 ~]# kubectl get pods -L app -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-pod1            1/1     Running   0          52s   10.244.1.35   node01.k8s.org   <none>           <none>            db
redis-pod2            1/1     Running   0          52s   10.244.3.24   node03.k8s.org   <none>           <none>            db
with-pod-affinity-1   0/1     Pending   0          35m   <none>        <none>           <none>           <none>
with-pod-affinity-2   1/1     Running   0          9s    10.244.1.36   node01.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod runs on node01, because node01 already runs a pod labeled app=db and carries a node label with key rack, which satisfies the soft condition with weight 80, so the pod leans toward node01.
Example: using the hard and soft constraints of podAffinity together
[root@master01 ~]# cat require-preference-podaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity-3
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["db"]}
        topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: rack
      - weight: 20
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: zone
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
[root@master01 ~]#
Tip: this manifest says the pod must run on a node that already runs a pod labeled app=db; if no node qualifies, the pod can only hang. If several nodes satisfy the hard constraint, the soft constraints break the tie: a qualifying node that also carries a node label with key rack gets 80 added to its score, and one with a node label with key zone gets 20.
Apply the manifest
[root@master01 ~]# kubectl get pods -o wide -L app
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-pod1            1/1     Running   0          13m   10.244.1.35   node01.k8s.org   <none>           <none>            db
redis-pod2            1/1     Running   0          13m   10.244.3.24   node03.k8s.org   <none>           <none>            db
with-pod-affinity-1   0/1     Pending   0          48m   <none>        <none>           <none>           <none>
with-pod-affinity-2   1/1     Running   0          13m   10.244.1.36   node01.k8s.org   <none>           <none>
[root@master01 ~]# kubectl apply -f require-preference-podaffinity.yaml
pod/with-pod-affinity-3 created
[root@master01 ~]# kubectl get pods -o wide -L app
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-pod1            1/1     Running   0          14m   10.244.1.35   node01.k8s.org   <none>           <none>            db
redis-pod2            1/1     Running   0          14m   10.244.3.24   node03.k8s.org   <none>           <none>            db
with-pod-affinity-1   0/1     Pending   0          48m   <none>        <none>           <none>           <none>
with-pod-affinity-2   1/1     Running   0          13m   10.244.1.36   node01.k8s.org   <none>           <none>
with-pod-affinity-3   1/1     Running   0          6s    10.244.1.37   node01.k8s.org   <none>           <none>
[root@master01 ~]#
Tip: the pod is scheduled onto node01, because node01 satisfies the hard constraint and also matches the highest-weight soft constraint.
Verify: delete the pods above, re-apply the manifest, and see whether the pod can still run.
[root@master01 ~]# kubectl delete all --all
pod "redis-pod1" deleted
pod "redis-pod2" deleted
pod "with-pod-affinity-1" deleted
pod "with-pod-affinity-2" deleted
pod "with-pod-affinity-3" deleted
service "kubernetes" deleted
[root@master01 ~]# kubectl apply -f require-preference-podaffinity.yaml
pod/with-pod-affinity-3 created
[root@master01 ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
with-pod-affinity-3   0/1     Pending   0          5s    <none>   <none>   <none>           <none>
[root@master01 ~]#
Tip: the newly created pod sits in Pending because no node satisfies its hard constraint, so it cannot be scheduled and can only hang.
Example: scheduling with podAntiAffinity under affinity
[root@master01 ~]# cat require-preference-podantiaffinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity-4
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - {key: app, operator: In, values: ["db"]}
        topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: rack
      - weight: 20
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - {key: app, operator: In, values: ["db"]}
          topologyKey: zone
  containers:
  - name: myapp
    image: ikubernetes/myapp:v1
[root@master01 ~]#
Tip: podAntiAffinity is used exactly like podAffinity, with the opposite logic: podAntiAffinity keeps the pod away from nodes that match the conditions, whereas podAffinity places the pod on nodes that match. The manifest above says the pod must not run on any node that runs a pod labeled app=db, and the soft rules further steer it away (with weights 80 and 20) from nodes that share the same rack or zone location with a pod labeled app=db. In other words it can only run on nodes where none of those three conditions hold. If every node satisfies the hard condition, the pod can only hang; if only the soft rules were used, the pod would still run, grudgingly, on the node with the lowest resulting score.
Apply the manifest
[root@master01 ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
with-pod-affinity-3   0/1     Pending   0          22m   <none>   <none>   <none>           <none>
[root@master01 ~]# kubectl apply -f require-preference-podantiaffinity.yaml
pod/with-pod-affinity-4 created
[root@master01 ~]# kubectl get pods -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES
with-pod-affinity-3   0/1     Pending   0          22m   <none>        <none>           <none>           <none>
with-pod-affinity-4   1/1     Running   0          6s    10.244.4.30   node04.k8s.org   <none>           <none>
[root@master01 ~]# kubectl get node -L rack,zone
NAME               STATUS   ROLES                  AGE   VERSION   RACK     ZONE
master01.k8s.org   Ready    control-plane,master   30d   v1.20.0
node01.k8s.org     Ready    <none>                 30d   v1.20.0   group1
node02.k8s.org     Ready    <none>                 30d   v1.20.0
node03.k8s.org     Ready    <none>                 30d   v1.20.0            group2
node04.k8s.org     Ready    <none>                 20d   v1.20.0
[root@master01 ~]#
Tip: the pod is scheduled onto node04, because node04 meets none of the three avoidance conditions; node02 would also have been an eligible node.
Verify: delete the pods above, run a pod labeled app=db on each of the four nodes, apply the manifest again, and see how the pod is scheduled.
[root@master01 ~]# kubectl delete all --all
pod "with-pod-affinity-3" deleted
pod "with-pod-affinity-4" deleted
service "kubernetes" deleted
[root@master01 ~]# cat pod-demo.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: redis-ds
  labels:
    app: db
spec:
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: redis
        image: redis:4-alpine
        ports:
        - name: redis
          containerPort: 6379
[root@master01 ~]# kubectl apply -f pod-demo.yaml
daemonset.apps/redis-ds created
[root@master01 ~]# kubectl get pods -L app -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-ds-4bnmv   1/1     Running   0          44s   10.244.2.26   node02.k8s.org   <none>           <none>            db
redis-ds-c2h77   1/1     Running   0          44s   10.244.1.38   node01.k8s.org   <none>           <none>            db
redis-ds-mbxcd   1/1     Running   0          44s   10.244.4.32   node04.k8s.org   <none>           <none>            db
redis-ds-r2kxv   1/1     Running   0          44s   10.244.3.25   node03.k8s.org   <none>           <none>            db
[root@master01 ~]# kubectl apply -f require-preference-podantiaffinity.yaml
pod/with-pod-affinity-5 created
[root@master01 ~]# kubectl get pods -o wide -L app
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES   APP
redis-ds-4bnmv        1/1     Running   0          2m29s   10.244.2.26   node02.k8s.org   <none>           <none>            db
redis-ds-c2h77        1/1     Running   0          2m29s   10.244.1.38   node01.k8s.org   <none>           <none>            db
redis-ds-mbxcd        1/1     Running   0          2m29s   10.244.4.32   node04.k8s.org   <none>           <none>            db
redis-ds-r2kxv        1/1     Running   0          2m29s   10.244.3.25   node03.k8s.org   <none>           <none>            db
with-pod-affinity-5   0/1     Pending   0          9s      <none>        <none>           <none>           <none>
[root@master01 ~]#
Tip: no node can run the pod and it stays Pending, because every node now satisfies the hard constraint that excludes it.
To sum up the experiments above: whether the affinity is pod-to-node or pod-to-pod, once a hard (required) rule is defined the pod will only ever run on a node that satisfies it, and if no node does, the pod hangs. If only soft (preferred) rules are defined, the pod preferentially runs on the node matching the higher-weight conditions; if no node matches any soft rule, scheduling falls back to the default policy and picks the highest-scoring node. Anti-affinity follows the same logic, except that a node matching the hard or soft conditions is one the pod avoids rather than prefers. One more caveat: with pod-to-pod affinity on a cluster with many nodes, the rules should not be made too fine-grained; the granularity should stay moderate, because overly precise rules make the scheduler spend far more resources filtering nodes for every pod and drag down overall cluster performance. In large clusters, node affinity is the recommended choice.