Kubernetes Network Analysis: Communication Between Containers

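This article assumes you already have a Kubernetes cluster up and running. It looks at how a request reaches a Pod and is then handled by a Container. It is all hands-on material.
If you have not heard of Kubernetes or do not know what a Pod is, please read this first: http://www.infoq.com/cn/articles/Kubernetes-system-architecture-introduction
For basic Kubernetes administration, see my blog post: http://my.oschina.net/xue777hua/blog/514816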

##1. Introduction##

Figure: Kubernetes physical deployment diagram

The figure above shows the basic structure of Kubernetes.

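  • The Master manages multiple Slave nodes (Nodes)
  • Each Slave node can run multiple Pods
  • A Pod can be deployed as multiple replicas, and the replicas can run on different Nodes
  • A Pod can contain multiple Containers, and the Containers in one Pod share the same network namespace

The last point is the most important one: the Containers within one Pod share the same network namespace. This is achieved through a Mapped Container.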

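##2. Mapped Container##

The basic idea is as follows:

  • Container A uses docker's normal network mode
  • Container B does not get a network of its own; its network mode refers to container A, so it reuses A's network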
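
The two roles above can be reproduced with plain docker. The following is a minimal sketch and is not taken from the original setup: the function name `start_mapped_pair`, the container names, and the `DOCKER` override variable are assumptions made for illustration. It relies only on docker's `--net=container:<name>` option, which makes a new container join an existing container's network namespace.

```shell
#!/bin/sh
# Sketch: reproduce the "mapped container" setup with plain docker.
# DOCKER can be overridden (for example DOCKER=echo) to get a dry run
# that only prints the commands instead of executing them.
DOCKER=${DOCKER:-docker}

# start_mapped_pair OWNER JOINER
# OWNER and JOINER are hypothetical container names chosen by the caller.
start_mapped_pair() {
    owner=$1
    joiner=$2
    # Container A: ordinary docker networking (the default bridge mode).
    $DOCKER run -d --name "$owner" busybox sleep 3600
    # Container B: joins A's network namespace instead of getting its own.
    $DOCKER run -d --name "$joiner" --net="container:$owner" busybox sleep 3600
}
```

After `start_mapped_pair neta netb`, running `ip a` in both containers via `docker exec` should list the same interfaces. Setting `DOCKER=echo` first turns the call into a dry run.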

###2.1 Shared network mode###

Here is an example to verify this. I created a busybox Pod.

[root@centos7-node-221 ~]$ kubectl get po
NAME      READY     STATUS    RESTARTS   AGE
busybox   1/1       Running   224        9d
[root@centos7-node-221 ~]$ kubectl describe po busybox
Name:				busybox
Namespace:			default
Image(s):			busybox
Node:				centos7-node-226/192.168.1.226
Labels:				<none>
Status:				Running
Reason:				
Message:			
IP:				172.16.58.6
Replication Controllers:	<none>
Containers:
  busybox:
    Image:			busybox
    State:			Running
      Started:			Thu, 08 Oct 2015 08:20:30 -0400
    Last Termination State:	Terminated
      Exit Code:		0
      Started:			Thu, 08 Oct 2015 07:20:26 -0400
      Finished:			Thu, 08 Oct 2015 08:20:26 -0400
    Ready:			True
    Restart Count:		224
    Variables:
Conditions:
  Type		Status
  Ready 	True 
Volumes:
  default-token-lv94w:
    Type:	Secret (a secret that should populate this volume)
    SecretName:	default-token-lv94w
Events:
  FirstSeen	LastSeen	Count	From				SubobjectPath			Reason	Message
  9d		37m		225	{kubelet centos7-node-226}	spec.containers{busybox}	pulled	Container image "busybox" already present on machine
  37m		37m		1	{kubelet centos7-node-226}	spec.containers{busybox}	Created	Created with docker id fc8580292210
  37m		37m		1	{kubelet centos7-node-226}	spec.containers{busybox}	Started	Started with docker id fc8580292210

Let's go to 192.168.1.226 and look at this Pod and its Containers.

[root@centos7-node-226 ~]$ docker ps | grep busybox
fc8580292210        busybox                                               "sleep 3600"        37 minutes ago      Up 37 minutes                           k8s_busybox.62fa0587_busybox_default_86e98e8c-665f-11e5-af98-525400d7abb6_7f734c4d                                                
02d259dc8ab5        gcr.io/google_containers/pause:0.8.0                  "/pause"            9 days ago          Up 9 days                               k8s_POD.7be6d81d_busybox_default_86e98e8c-665f-11e5-af98-525400d7abb6_ff9224f5

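There are two containers: a pause container and the busybox container. The pause container is the main network container, and every other container in the Pod shares its network. Here is the network mode of each of the two containers.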

[root@centos7-node-226 ~]$ docker inspect 02d259dc8ab5  | grep NetworkMode
        "NetworkMode": "bridge",
[root@centos7-node-226 ~]$ docker inspect fc8580292210 | grep NetworkMode
        "NetworkMode": "container:02d259dc8ab59c1746d54d2df24d8733b2b9379a9fdfbfdc2066429b4a934a04", # this container ID is the long form of the previous container's ID

So fc8580292210 (busybox) uses the network namespace of the pause container.
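
The same check can be scripted. This is a small sketch under stated assumptions: the helper name `netns_owner` is made up for illustration, and the sample value is copied from the `docker inspect` output above. On a live node the value could be read with `docker inspect --format '{{.HostConfig.NetworkMode}}' <container>`.

```shell
#!/bin/sh
# netns_owner MODE
# Prints the ID of the container whose network namespace is being reused when
# MODE has the form "container:<id>". Returns non-zero for any other mode
# (for example "bridge"), which means the container has its own network setup.
netns_owner() {
    case $1 in
        container:?*) printf '%s\n' "${1#container:}" ;;
        *) return 1 ;;
    esac
}

# Sample value copied from the docker inspect output of the busybox container.
mode='container:02d259dc8ab59c1746d54d2df24d8733b2b9379a9fdfbfdc2066429b4a934a04'
if owner=$(netns_owner "$mode"); then
    echo "shares the network namespace of $owner"
else
    echo "has its own network mode: $mode"
fi
```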

Let's verify this further.

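###2.2 IP address, hostname, and network IO###

On 192.168.1.224 I deployed a DNS Pod with four application containers (plus the pause container), all sharing one network namespace. We confirm the sharing by checking their IP address, hostname, and network IO. Here are the container IDs: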

[root@centos7-node-224 ~]$ docker ps | grep dns
b00a08d078d6        dockerimages.yinnut.com:15043/skydns:2015-03-11-001   "/skydns -machines=h   8 hours ago         Up 8 hours                              k8s_skydns.c878079e_kube-dns-v9-y05vd_kube-system_12725077-64c0-11e5-9309-525400d7abb6_46f95e60                                   
4e843585b938        dockerimages.yinnut.com:15043/exechealthz:1.0         "/exechealthz '-cmd=   11 days ago         Up 11 days                              k8s_healthz.8ab20f84_kube-dns-v9-y05vd_kube-system_12725077-64c0-11e5-9309-525400d7abb6_f7c469e5                                  
296ff779abb2        dockerimages.yinnut.com:15043/kube2sky:1.11           "/kube2sky -domain=c   11 days ago         Up 11 days                              k8s_kube2sky.2a46d768_kube-dns-v9-y05vd_kube-system_12725077-64c0-11e5-9309-525400d7abb6_349c7246                                 
f0118fac6952        dockerimages.yinnut.com:15043/etcd:2.0.9              "/usr/local/bin/etcd   11 days ago         Up 11 days                              k8s_etcd.64e02c2f_kube-dns-v9-y05vd_kube-system_12725077-64c0-11e5-9309-525400d7abb6_9235054b                                     
f281dbf1ec41        gcr.io/google_containers/pause:0.8.0                  "/pause"               11 days ago         Up 11 days                              k8s_POD.6e934112_kube-dns-v9-y05vd_kube-system_12725077-64c0-11e5-9309-525400d7abb6_a8ea96d0

We check these properties for the first three containers: b00a08d078d6, 4e843585b938, and 296ff779abb2.

####2.2.1 DNS settings and hostname####

[root@centos7-node-224 ~]$ for id in b00a08d078d6 4e843585b938 296ff779abb2 ; do echo $id; docker exec $id cat /etc/hosts ; docker exec $id cat /etc/resolv.conf ; echo  "" ; done
b00a08d078d6
172.16.60.4	kube-dns-v9-y05vd
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
nameserver 192.168.1.208
search 8.8.8.8
options ndots:5

4e843585b938
172.16.60.4	kube-dns-v9-y05vd
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
nameserver 192.168.1.208
search 8.8.8.8
options ndots:5

296ff779abb2
172.16.60.4	kube-dns-v9-y05vd
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
nameserver 192.168.1.208
search 8.8.8.8
options ndots:5

####2.2.2 IP address####

[root@centos7-node-224 ~]$ for id in b00a08d078d6 4e843585b938 296ff779abb2 ; do echo $id; docker exec $id ip a  ; echo  "" ; done
b00a08d078d6
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

4e843585b938
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

296ff779abb2
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever

####2.2.3 Network connections and IO####

[root@centos7-node-224 ~]$ for id in b00a08d078d6 4e843585b938 296ff779abb2 ; do echo $id; docker exec $id netstat -lan  ; echo  "" ; done
b00a08d078d6
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 127.0.0.1:4001          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      
tcp        0      0 172.16.60.4:48582       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35394         ESTABLISHED 
tcp        0      0 172.16.60.4:48584       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:60161         127.0.0.1:2379          ESTABLISHED 
tcp        0      0 127.0.0.1:51445         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51445         ESTABLISHED 
tcp        0      0 127.0.0.1:35550         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:35394         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:2379          127.0.0.1:60161         ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51433         ESTABLISHED 
tcp        0      0 127.0.0.1:51433         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35550         ESTABLISHED 
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path

4e843585b938
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 127.0.0.1:4001          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      
tcp        0      0 172.16.60.4:48582       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35394         ESTABLISHED 
tcp        0      0 172.16.60.4:48584       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:60161         127.0.0.1:2379          ESTABLISHED 
tcp        0      0 127.0.0.1:51445         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51445         ESTABLISHED 
tcp        0      0 127.0.0.1:35550         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:35394         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:2379          127.0.0.1:60161         ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51433         ESTABLISHED 
tcp        0      0 127.0.0.1:51433         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35550         ESTABLISHED 
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path

296ff779abb2
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 127.0.0.1:4001          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:7001          0.0.0.0:*               LISTEN      
tcp        0      0 172.16.60.4:48582       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35394         ESTABLISHED 
tcp        0      0 172.16.60.4:48584       10.254.0.1:443          ESTABLISHED 
tcp        0      0 127.0.0.1:60161         127.0.0.1:2379          ESTABLISHED 
tcp        0      0 127.0.0.1:51445         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51445         ESTABLISHED 
tcp        0      0 127.0.0.1:35550         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:35394         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:2379          127.0.0.1:60161         ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:51433         ESTABLISHED 
tcp        0      0 127.0.0.1:51433         127.0.0.1:4001          ESTABLISHED 
tcp        0      0 127.0.0.1:4001          127.0.0.1:35550         ESTABLISHED 
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
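
The three checks above compare the per-container output by eye. As a rough sketch, the comparison can be automated by fingerprinting each container's output and counting the distinct fingerprints. The helper name `distinct_count` is an assumption for illustration. Stable files such as /etc/hosts are a better fit for this than `netstat`, whose output can change between two runs.

```shell
#!/bin/sh
# distinct_count
# Reads one value per line on stdin and prints how many distinct values it saw.
distinct_count() {
    awk '!seen[$0]++ { n++ } END { print n + 0 }'
}

# Hypothetical use on the node (container IDs from the docker ps output above):
#   for id in b00a08d078d6 4e843585b938 296ff779abb2; do
#       docker exec "$id" cat /etc/hosts | cksum
#   done | distinct_count
# A result of 1 means every container returned byte-identical content.
```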

##3. Communication between Containers##

Now let's analyze the most complex part: communication between Containers.

###3.1 Container communication within a Pod###

The simplest case first: Containers in the same Pod share a network namespace, so they can reach each other directly at 127.0.0.1.
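
Which ports are reachable this way can be read off the netstat output in section 2.2.3: anything in LISTEN state on 127.0.0.1 is visible to every Container in the Pod. Below is a small sketch that filters those listeners out of `netstat -lan` output. The helper name `loopback_ports` is an assumption for illustration, and the sample lines are copied from section 2.2.3.

```shell
#!/bin/sh
# loopback_ports
# Reads `netstat -lan` output on stdin and prints the TCP ports that are in
# LISTEN state on 127.0.0.1, one per line.
loopback_ports() {
    awk '$1 == "tcp" && $6 == "LISTEN" && $4 ~ /^127\.0\.0\.1:/ {
        split($4, parts, ":")
        print parts[2]
    }'
}

# Sample lines copied from the netstat output in section 2.2.3.
loopback_ports <<'EOF'
tcp        0      0 127.0.0.1:4001          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN
tcp        0      0 172.16.60.4:48582       10.254.0.1:443          ESTABLISHED
EOF
```

For the sample input this prints 4001 and 2379. On a node, the input could instead come from `docker exec <id> netstat -lan`.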

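###3.2 Container communication across machines###

####3.2.1 Background####

Example: the fluentd-elasticsearch container on 192.168.1.224 needs to connect to the elasticsearch-logging container on 192.168.1.223.

  • The elasticsearch-logging container on 192.168.1.223 and its IP address: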
[root@centos7-node-223 ~]$ docker ps |grep elasticsearch-logging
667cfd84c979        dockerimages.yinnut.com:15043/elasticsearch:1.7       "/run.sh"              12 days ago          Up 12 days                              k8s_elasticsearch-logging.89fda9f_elasticsearch-logging-v1-i8x6q_kube-system_8b558d2c-62a3-11e5-9d7b-525400d7abb6_2a02a2c8        
5201c8cbdebd        gcr.io/google_containers/pause:0.8.0                  "/pause"               12 days ago          Up 12 days                              k8s_POD.8ecd2043_elasticsearch-logging-v1-i8x6q_kube-system_8b558d2c-62a3-11e5-9d7b-525400d7abb6_4022db35                         
[root@centos7-node-223 ~]$ docker inspect 5201c8cbdebd |grep IPAddress
        "IPAddress": "172.16.77.4", # IP address
        "SecondaryIPAddresses": null,

As shown, the Pod of the elasticsearch-logging container has the IP address 172.16.77.4.

  • The fluentd-elasticsearch container on 192.168.1.224 and its IP address:
[root@centos7-node-224 ~]$ docker ps |grep fluentd-elasticsearch
d326d81468b5        gcr.io/google_containers/fluentd-elasticsearch:1.11   "td-agent -q"          12 days ago         Up 12 days                              k8s_fluentd-elasticsearch.27a08aa3_fluentd-elasticsearch-centos7-node-224_kube-system_7dcc6ce562f3742190a876fda85e2359_58c54ef3   
f9b76639d241        gcr.io/google_containers/pause:0.8.0                  "/pause"               12 days ago         Up 12 days                              k8s_POD.7be6d81d_fluentd-elasticsearch-centos7-node-224_kube-system_7dcc6ce562f3742190a876fda85e2359_333e52c0                     
[root@centos7-node-224 ~]$ docker inspect f9b76639d241 | grep IPAddress
        "IPAddress": "172.16.60.2",
        "SecondaryIPAddresses": null,

As shown, the Pod of the fluentd-elasticsearch container has the IP address 172.16.60.2.

  • Let's look at the network connections of fluentd-elasticsearch:
[root@centos7-node-224 ~]$ docker exec d326d81468b5 netstat -nla | grep 172.16.77.4
tcp        0      0 172.16.60.2:56354       172.16.77.4:9200        TIME_WAIT  
tcp        0      0 172.16.60.2:56350       172.16.77.4:9200        TIME_WAIT  
tcp        0      0 172.16.60.2:56347       172.16.77.4:9200        TIME_WAIT  
tcp        0      0 172.16.60.2:56357       172.16.77.4:9200        TIME_WAIT  
tcp        0      0 172.16.60.2:56344       172.16.77.4:9200        TIME_WAIT  
tcp        0      0 172.16.60.2:56352       172.16.77.4:9200        TIME_WAIT

It is indeed connected to port 9200 on 172.16.77.4, and the elasticsearch-logging container on the other side is indeed listening on port 9200:

[root@centos7-node-223 ~]$ docker exec  667cfd84c979  ss -l|grep LISTEN
tcp    LISTEN     0      50                  :::9200                 :::*       
tcp    LISTEN     0      50                  :::9300                 :::*

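So how does this actually work?

####3.2.2 Container-to-container communication flow####

192.168.1.224/fluentd-elasticsearch -> 192.168.1.223/elasticsearch-logging

192.168.1.224/fluentd-elasticsearch needs to connect to the elasticsearch-logging container.

  • Name-to-IP resolution: elasticsearch-logging resolves to 10.254.24.205, the service IP confirmed by the kubectl get svc output further down. Note that dig does not apply the resolver search list by default, so the bare-name query below returns SERVFAIL; it does show that the DNS server in use is 10.254.0.10.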
root@fluentd-elasticsearch-centos7-node-224:/$ dig elasticsearch-logging

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> elasticsearch-logging
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 39181
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;elasticsearch-logging.		IN	A

;; Query time: 1 msec
;; SERVER: 10.254.0.10#53(10.254.0.10)
;; WHEN: Fri Oct 09 06:04:35 UTC 2015
;; MSG SIZE  rcvd: 39
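  • The request goes to 10.254.24.205:9200. According to the routing table it reaches the gateway 172.16.60.1, which is the address of the docker0 interface on this container's host.
# inside the container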
root@fluentd-elasticsearch-centos7-node-224:/$ ip route
default via 172.16.60.1 dev eth0 
172.16.60.0/24 dev eth0  proto kernel  scope link  src 172.16.60.2 
# on the physical host
[root@centos7-node-224 ~]$ ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.16.60.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::5484:7aff:fefe:9799  prefixlen 64  scopeid 0x20<link>
        ether 56:84:7a:fe:97:99  txqueuelen 0  (Ethernet)
        RX packets 10182154  bytes 1777103288 (1.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 11195534  bytes 2271907616 (2.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
  • iptables forwards the request to 192.168.1.224:36967, a port on which the kube-proxy process is listening.
[root@centos7-node-224 ~]$ iptables-save  | grep 10.254.24.205 |grep 9200
-A KUBE-PORTALS-CONTAINER -d 10.254.24.205/32 -p tcp -m comment --comment "kube-system/elasticsearch-logging:" -m tcp --dport 9200 -j REDIRECT --to-ports 36967
-A KUBE-PORTALS-HOST -d 10.254.24.205/32 -p tcp -m comment --comment "kube-system/elasticsearch-logging:" -m tcp --dport 9200 -j DNAT --to-destination 192.168.1.224:36967
[root@centos7-node-224 ~]$ netstat -nlp|grep 36967
tcp6       0      0 :::36967                :::*                    LISTEN      930/kube-proxy
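  • Who answers requests to 10.254.24.205:9200? From the analysis above it appears to be kube-proxy. Since kube-proxy is, as the name says, a proxy, to whom does it forward the request? To a Pod, of course. We can see that this service's Selector is k8s-app=elasticsearch-logging.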
[root@centos7-node-224 ~]$ kubectl get svc --all-namespaces  | grep 10.254.24.205
kube-system   elasticsearch-logging   10.254.24.205    <none>        9200/TCP            k8s-app=elasticsearch-logging   14d
  • The matching Pods are elasticsearch-logging-v1-gph4i and elasticsearch-logging-v1-i8x6q.
[root@centos7-node-224 ~]$ kubectl get po -l k8s-app=elasticsearch-logging --all-namespaces
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE
kube-system   elasticsearch-logging-v1-gph4i   1/1       Running   6          14d
kube-system   elasticsearch-logging-v1-i8x6q   1/1       Running   5          14d
  • Looking at the IP address of the elasticsearch-logging-v1-i8x6q Pod, we find it is 172.16.77.4.
[root@centos7-node-224 ~]$ kubectl describe po elasticsearch-logging-v1-i8x6q --namespace=kube-system | grep IP
IP:				172.16.77.4
  • So kube-proxy clearly forwards a share of the requests to one of these Pods, the one with IP address 172.16.77.4, and this Pod runs on machine 192.168.1.223.
[root@centos7-node-223 ~]$ docker inspect 5201c8cbdebd | grep IPAddress
        "IPAddress": "172.16.77.4",
        "SecondaryIPAddresses": null,
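  • How do we then communicate with 172.16.77.4? Cross-machine communication relies on an overlay network such as flannel, or an L2 network such as OVS.
    For an analysis of how Flannel carries traffic between docker containers on different machines, see my earlier post on the flannel-based overlay network under docker.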
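
The first hops of the flow above can be sketched in a few lines of shell. This is an illustration under stated assumptions, not what the kernel or kube-proxy actually runs: the function names (`ip_to_int`, `in_cidr`, `next_hop`, `redirect_port`) are made up, the two routes are hard-coded from the container's `ip route` output, and the sample rule is the KUBE-PORTALS-CONTAINER line from `iptables-save`.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the IPv4 address as a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into four positional parameters
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP NET/BITS: succeed if IP lies inside the given network.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# next_hop DEST: routing decision based on the two routes seen in the container:
#   172.16.60.0/24 dev eth0   and   default via 172.16.60.1
next_hop() {
    if in_cidr "$1" 172.16.60.0/24; then
        echo "direct on eth0"
    else
        echo "via gateway 172.16.60.1"
    fi
}

# redirect_port: read iptables-save rules on stdin and print the local port
# named by any "-j REDIRECT --to-ports N" rule.
redirect_port() {
    sed -n 's/.*-j REDIRECT --to-ports \([0-9][0-9]*\).*/\1/p'
}

# Step 1: the service IP is outside the container's subnet, so it leaves via docker0.
next_hop 10.254.24.205
# Step 2: on the host, the portal rule redirects the connection to kube-proxy's port.
echo '-A KUBE-PORTALS-CONTAINER -d 10.254.24.205/32 -p tcp -m comment --comment "kube-system/elasticsearch-logging:" -m tcp --dport 9200 -j REDIRECT --to-ports 36967' | redirect_port
```

With the values from this article the script prints `via gateway 172.16.60.1` followed by `36967`, which matches the gateway and the kube-proxy port found by hand.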

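##4. Summary##

Communication between Containers is the most complex part of Kubernetes networking. To sum up:

  1. First you need a working cross-machine container network. I use Flannel here, and the final Container-to-Container traffic is carried by Flannel.
  2. Kubernetes lets the Containers in one Pod share a single network namespace, which makes communication between closely related Containers very easy.
  3. To access another service in the Kubernetes cluster, a request passes through the NAT rules in the host's iptables and is handed to kube-proxy. kube-proxy knows the endpoints of every service and which Pods handle requests for that service (this is determined through the Label Selector).
  4. kube-proxy picks a matching Pod and sends the request to it. The Pod's address is in 172.16.x.x, i.e. an address attached to the docker0 bridge, so at this point the traffic is handed to the Flannel layer (if it crosses machines) or handled directly on the local machine (if it does not).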
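
Points 3 and 4 say that kube-proxy spreads requests over the Pods behind a service. The sketch below illustrates this with a plain round-robin choice. Round-robin is an assumption here: the analysis above only shows that a share of the requests goes to each Pod, not which policy is used. The function name `pick_backend` is likewise made up; the Pod names are the two found with the Label Selector.

```shell
#!/bin/sh
# pick_backend N BACKEND...
# Prints the backend that would serve request number N (0-based) when the
# backends are used in turn. Returns non-zero if no backend is given.
pick_backend() {
    n=$1
    shift
    [ "$#" -gt 0 ] || return 1
    skip=$(( $n % $# ))
    while [ "$skip" -gt 0 ]; do
        shift
        skip=$(( $skip - 1 ))
    done
    printf '%s\n' "$1"
}

# Four consecutive requests alternate between the two elasticsearch-logging Pods.
for n in 0 1 2 3; do
    pick_backend "$n" elasticsearch-logging-v1-gph4i elasticsearch-logging-v1-i8x6q
done
```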
