Kubernetes (K8S): Troubleshooting ClusterIP Service Access Failures in IPVS Proxy Mode

Date: 2020-09-09

This article describes how to handle the case where Kubernetes (K8S) runs kube-proxy in IPVS mode and requests to a Service of type ClusterIP fail to reach the backend Pods.

Background and Symptoms

With Kubernetes (K8S) using the IPVS proxy mode and a Service of type ClusterIP, requests to the Service do not reach the backend Pods.

Host Configuration Plan

Server name (hostname)	OS version	Spec	Internal IP	External IP (simulated)
k8s-master	CentOS7.7	2C/4G/20G	172.16.1.110	10.0.0.110
k8s-node01	CentOS7.7	2C/4G/20G	172.16.1.111	10.0.0.111
k8s-node02	CentOS7.7	2C/4G/20G	172.16.1.112	10.0.0.112

Reproducing the Scenario

Deployment YAML

YAML file

[root@k8s-master service]# pwd
/root/k8s_practice/service
[root@k8s-master service]# cat myapp-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      release: v1
  template:
    metadata:
      labels:
        app: myapp
        release: v1
        env: test
    spec:
      containers:
      - name: myapp
        image: registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80

Start the Deployment and check its status

[root@k8s-master service]# kubectl apply -f myapp-deploy.yaml
deployment.apps/myapp-deploy created
[root@k8s-master service]#
[root@k8s-master service]# kubectl get deploy -o wide
NAME           READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                                      SELECTOR
myapp-deploy   3/3     3            3           14s   myapp        registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1   app=myapp,release=v1
[root@k8s-master service]# kubectl get rs -o wide
NAME                      DESIRED   CURRENT   READY   AGE   CONTAINERS   IMAGES                                                      SELECTOR
myapp-deploy-5695bb5658   3         3         3       21s   myapp        registry.cn-beijing.aliyuncs.com/google_registry/myapp:v1   app=myapp,pod-template-hash=5695bb5658,release=v1
[root@k8s-master service]#
[root@k8s-master service]# kubectl get pod -o wide --show-labels
NAME                            READY   STATUS    RESTARTS   AGE     IP             NODE         NOMINATED NODE   READINESS GATES   LABELS
myapp-deploy-5695bb5658-7tgfx   1/1     Running   0          39s     10.244.2.111   k8s-node02   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1
myapp-deploy-5695bb5658-95zxm   1/1     Running   0          39s     10.244.3.165   k8s-node01   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1
myapp-deploy-5695bb5658-xtxbp   1/1     Running   0          39s     10.244.3.164   k8s-node01   <none>           <none>            app=myapp,env=test,pod-template-hash=5695bb5658,release=v1

Access the Pods with curl

[root@k8s-master service]# curl 10.244.2.111/hostname.html
myapp-deploy-5695bb5658-7tgfx
[root@k8s-master service]#
[root@k8s-master service]# curl 10.244.3.165/hostname.html
myapp-deploy-5695bb5658-95zxm
[root@k8s-master service]#
[root@k8s-master service]# curl 10.244.3.164/hostname.html
myapp-deploy-5695bb5658-xtxbp

ClusterIP Service Definition

YAML file

[root@k8s-master service]# pwd
/root/k8s_practice/service
[root@k8s-master service]# cat myapp-svc-ClusterIP.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-clusterip
  namespace: default
spec:
  type: ClusterIP  # optional; ClusterIP is the default type
  selector:
    app: myapp
    release: v1
  ports:
  - name: http
    port: 8080  # port exposed by the Service
    targetPort: 80  # port on the backend Pods that traffic is forwarded to

Start the Service and check its status

[root@k8s-master service]# kubectl apply -f myapp-svc-ClusterIP.yaml
service/myapp-clusterip created
[root@k8s-master service]#
[root@k8s-master service]# kubectl get svc -o wide
NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE   SELECTOR
kubernetes        ClusterIP   10.96.0.1        <none>        443/TCP    16d   <none>
myapp-clusterip   ClusterIP   10.102.246.104   <none>        8080/TCP   6s    app=myapp,release=v1

Check the IPVS rules

[root@k8s-master service]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
………………
TCP  10.102.246.104:8080 rr
  -> 10.244.2.111:80              Masq    1      0          0
  -> 10.244.3.164:80              Masq    1      0          0
  -> 10.244.3.165:80              Masq    1      0          0

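The IPVS rules are in place, so under normal circumstances a request to the Service should be forwarded to one of the backend Pods and a response returned.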
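To check from a script that a virtual service has the expected real servers, the `ipvsadm -Ln` listing can be filtered. This is only a minimal sketch: the function name `ipvs_backends` is hypothetical, and the parsing assumes the output layout shown in the listing above.

```shell
# Hypothetical helper: read `ipvsadm -Ln` output on stdin and print the
# real servers (backend ip:port) listed under one TCP virtual service.
ipvs_backends() {
  local vip="$1"   # virtual service address, e.g. 10.102.246.104:8080
  awk -v vip="$vip" '
    $1 == "TCP"                 { in_svc = ($2 == vip); next }  # virtual-service line
    $1 == "UDP" || $1 == "SCTP" { in_svc = 0; next }            # a different service starts
    in_svc && $1 == "->"        { print $2 }                    # real-server line
  '
}
```

A possible invocation is `ipvsadm -Ln | ipvs_backends 10.102.246.104:8080`, which should print one `ip:port` per backend Pod.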

curl Results

Accessing the Pods directly works, as shown below.

[root@k8s-master service]# curl 10.244.2.111/hostname.html
myapp-deploy-5695bb5658-7tgfx
[root@k8s-master service]#
[root@k8s-master service]# curl 10.244.3.165/hostname.html
myapp-deploy-5695bb5658-95zxm
[root@k8s-master service]#
[root@k8s-master service]# curl 10.244.3.164/hostname.html
myapp-deploy-5695bb5658-xtxbp

Access through the Service, however, fails with the following error.

[root@k8s-master service]# curl 10.102.246.104:8080
curl: (7) Failed connect to 10.102.246.104:8080; Connection timed out
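When repeating this comparison, a short connect timeout avoids waiting for curl's default timeout on every failed attempt. The sketch below is an illustration only: the function names `probe` and `probe_all` are hypothetical, and the addresses are the Pod IPs and ClusterIP from the output above.

```shell
# Hypothetical helper: try to connect to a URL with a 3-second connect
# timeout and report whether curl succeeded. Only connectivity is checked;
# curl exits 0 for any HTTP status as long as a response arrives.
probe() {
  local url="$1"
  if curl -s -o /dev/null --connect-timeout 3 "$url"; then
    echo "ok   $url"
  else
    echo "FAIL $url"
  fi
}

# Probe each Pod directly, then the Service.
probe_all() {
  local target
  for target in 10.244.2.111 10.244.3.165 10.244.3.164 10.102.246.104:8080; do
    probe "http://$target/hostname.html"
  done
}
```

In the failing state described here, calling `probe_all` would be expected to report `ok` for the three Pod addresses and `FAIL` for the Service address.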

Troubleshooting

Verify with a packet capture

Capture packets with the following command, then analyze the capture in Wireshark.

tcpdump -i any -n -nn port 80 -w ./$(date +%Y%m%d%H%M%S).pcap

In Wireshark, the capture shows that the request was sent to the Pod but no reply came back, so TCP retransmitted it [TCP Retransmission].
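The same check can be done on the command line without the Wireshark GUI. This is a sketch that assumes `tshark` (Wireshark's command-line tool) is installed; the function name `list_retransmissions` is hypothetical, and the file name in the usage line is a placeholder.

```shell
# Hypothetical helper: list retransmitted TCP segments in a capture file,
# one line per packet: frame number, source IP, destination IP, destination port.
list_retransmissions() {
  local pcap="$1"
  tshark -r "$pcap" -Y 'tcp.analysis.retransmission' \
    -T fields -e frame.number -e ip.src -e ip.dst -e tcp.dstport
}
```

For example, `list_retransmissions ./capture.pcap` prints only the packets that Wireshark would flag as retransmissions.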

Check the kube-proxy logs

[root@k8s-master service]# kubectl get pod -A | grep 'kube-proxy'
kube-system            kube-proxy-6bfh7                             1/1     Running   1          3h52m
kube-system            kube-proxy-6vfkf                             1/1     Running   1          3h52m
kube-system            kube-proxy-bvl9n                             1/1     Running   1          3h52m
[root@k8s-master service]#
[root@k8s-master service]# kubectl logs -n kube-system kube-proxy-6bfh7
W0601 13:01:13.170506       1 feature_gate.go:235] Setting GA feature gate SupportIPVSProxyMode=true. It will be removed in a future release.
I0601 13:01:13.338922       1 node.go:135] Successfully retrieved node IP: 172.16.1.112
I0601 13:01:13.338960       1 server_others.go:172] Using ipvs Proxier.  ##### confirms that IPVS mode is in use
W0601 13:01:13.339400       1 proxier.go:420] IPVS scheduler not specified, use rr by default
I0601 13:01:13.339638       1 server.go:571] Version: v1.17.4
I0601 13:01:13.340126       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I0601 13:01:13.340159       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0601 13:01:13.340500       1 conntrack.go:83] Setting conntrack hashsize to 32768
I0601 13:01:13.346991       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I0601 13:01:13.347035       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I0601 13:01:13.347703       1 config.go:313] Starting service config controller
I0601 13:01:13.347718       1 shared_informer.go:197] Waiting for caches to sync for service config
I0601 13:01:13.347736       1 config.go:131] Starting endpoints config controller
I0601 13:01:13.347743       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0601 13:01:13.448223       1 shared_informer.go:204] Caches are synced for endpoints config
I0601 13:01:13.448236       1 shared_informer.go:204] Caches are synced for service config

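The kube-proxy logs show nothing abnormal.

Inspect and Change the NIC Settings

Note: the following was done on the k8s-master node.

Further searching suggested that the problem may be caused by "checksum offloading". The current settings are: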

[root@k8s-master service]# ethtool -k flannel.1 | grep checksum
rx-checksumming: on
tx-checksumming: on     ##### currently on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on    ##### currently on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]

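flannel's network setup leaves transmit (TX) checksum offloading enabled on the flannel.1 interface, whereas it should be disabled so that checksumming is left to the physical NIC. Disable it as follows: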

# Disable temporarily
[root@k8s-master service]# ethtool -K flannel.1 tx-checksum-ip-generic off
Actual changes:
tx-checksumming: off
    tx-checksum-ip-generic: off
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-ecn-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]
    tx-tcp-mangleid-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
[root@k8s-master service]#
# Query the settings again
[root@k8s-master service]# ethtool -k flannel.1 | grep checksum
rx-checksumming: on
tx-checksumming: off     ##### now off
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: off     ##### now off
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]

Note that this change is only temporary. After a reboot, the flannel virtual interface will have TX checksum offloading enabled again.
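Because the setting reverts after a reboot, it can be useful to read the current state from a script instead of scanning the full listing. This is a minimal sketch; the function name `offload_state` is hypothetical, and it assumes the `feature: state` line format seen in the `ethtool -k` output above.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the state ("on" or "off") of a single offload feature.
offload_state() {
  local feature="$1"   # e.g. tx-checksum-ip-generic
  awk -v f="$feature:" '$1 == f { print $2; exit }'
}
```

A possible use is `ethtool -k flannel.1 | offload_state tx-checksum-ip-generic`, which prints `off` once the change above has been applied.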

Now try curl again:

[root@k8s-master ~]# curl 10.102.246.104:8080
Hello MyApp | Version: v1 | <a href="hostname.html">Pod Name</a>
[root@k8s-master ~]#
[root@k8s-master ~]# curl 10.102.246.104:8080/hostname.html
myapp-deploy-5695bb5658-7tgfx
[root@k8s-master ~]#
[root@k8s-master ~]# curl 10.102.246.104:8080/hostname.html
myapp-deploy-5695bb5658-95zxm
[root@k8s-master ~]#
[root@k8s-master ~]# curl 10.102.246.104:8080/hostname.html
myapp-deploy-5695bb5658-xtxbp
[root@k8s-master ~]#
[root@k8s-master ~]# curl 10.102.246.104:8080/hostname.html
myapp-deploy-5695bb5658-7tgfx

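As shown above, the Service is now reachable, and requests are distributed across the three Pods in turn.

Permanently Disable TX Checksum Offloading on the flannel Interface

Note: do this on every machine.

Create a systemd service with the following unit file: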

[root@k8s-node02 ~]# cat /etc/systemd/system/k8s-flannel-tx-checksum-off.service
[Unit]
Description=Turn off checksum offload on flannel.1
After=sys-devices-virtual-net-flannel.1.device

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off

Enable the service at boot and start it now:

systemctl enable k8s-flannel-tx-checksum-off
systemctl start  k8s-flannel-tx-checksum-off
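Since the unit has to exist on every machine, the copy-and-enable steps can be scripted from the node that already has the file. The sketch below is an assumption-laden illustration, not part of the original procedure: the function name `rollout` is hypothetical, and it assumes passwordless root SSH between the nodes.

```shell
# Hypothetical rollout sketch: copy the unit file to each given host,
# then reload systemd, enable the unit at boot and start it.
UNIT=k8s-flannel-tx-checksum-off.service

rollout() {
  local host
  for host in "$@"; do
    scp "/etc/systemd/system/$UNIT" "root@$host:/etc/systemd/system/$UNIT" &&
      ssh "root@$host" "systemctl daemon-reload && systemctl enable $UNIT && systemctl start $UNIT" ||
      echo "rollout failed on $host" >&2
  done
}
```

Run from k8s-node02 (where the unit file was created above), a call such as `rollout k8s-master k8s-node01` would cover the remaining two nodes. The local node is deliberately left out of the host list so that the file is never copied onto itself.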