Libnetwork was originally created by merging the network-related code from libcontainer and Docker Engine. It is Docker's container networking library, and its core is the Container Network Model (CNM) it defines.
The Libnetwork CNM defines Docker's container network model: any driver developed against this model can work with the Docker daemon to provide container networking. Docker's native drivers are none, bridge, overlay and macvlan; third-party drivers include flannel, weave, calico and others.
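A quick way to see which driver backs each network on a given engine is docker network ls; the DRIVER column maps directly to the drivers listed above (the output below is only a sketch, IDs and any extra networks will differ per host):

# Illustrative sketch: list local networks and the libnetwork driver behind each one.
docker network ls
# NETWORK ID     NAME      DRIVER    SCOPE
# xxxxxxxxxxxx   bridge    bridge    local
# xxxxxxxxxxxx   host      host      local
# xxxxxxxxxxxx   none      null      local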
CNM defines the following three components:
Sandbox
A Sandbox is an isolated network-configuration environment for a Docker container; it holds the container's interfaces, routing table and DNS settings. A Linux network namespace is the standard implementation of a Sandbox. A Sandbox can contain Endpoints from different Networks.
Endpoint
An Endpoint is the interface (a veth pair) through which a Sandbox communicates on a Network; it attaches the Sandbox to the Network. An Endpoint can belong to only one Network and only one Sandbox, but a single Sandbox can hold several Endpoints.
Network
A Network is a group of Endpoints that can communicate with each other directly; a Linux bridge or a VXLAN overlay are typical implementations.
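As a rough, hand-built illustration of the primitives behind these three concepts (not the exact sequence libnetwork performs, and all names below are made up): a network namespace plays the Sandbox, a veth pair the Endpoint, and a Linux bridge the Network.

# Hand-built sketch of Sandbox / Endpoint / Network using plain iproute2.
ip netns add demo-sandbox                                     # the Sandbox: an isolated network namespace
ip link add demo-br type bridge && ip link set demo-br up     # the Network: a Linux bridge
ip link add veth-host type veth peer name veth-cont           # the Endpoint: a veth pair
ip link set veth-cont netns demo-sandbox                      # one end goes into the Sandbox...
ip link set veth-host master demo-br && ip link set veth-host up   # ...the other attaches to the Network
ip -n demo-sandbox addr add 10.0.0.2/24 dev veth-cont
ip -n demo-sandbox link set veth-cont up
ip -n demo-sandbox link set lo up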
After the swarm is initialized, Docker has already created two network namespaces (Sandboxes) on swarm-manager: one for the ingress overlay network (1-i6xug49nwd) and the ingress_sbox:
[root@swarm-manager ~]# ll /var/run/docker/netns/
total 0
-r--r--r-- 1 root root 0 Aug  5 10:45 1-i6xug49nwd
-r--r--r-- 1 root root 0 Aug  5 10:45 ingress_sbox
[root@swarm-manager ~]# nsenter --net=/var/run/docker/netns/ingress_sbox ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.255.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:0a:ff:00:02  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:12:00:02  txqueuelen 0  (Ethernet)
        RX packets 90  bytes 75247 (73.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 123  bytes 10271 (10.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 6  bytes 504 (504.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 504 (504.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@swarm-manager ~]# iptables -t mangle -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
[root@swarm-manager ~]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
[root@swarm-manager ~]# nsenter --net=/var/run/docker/netns/1-i6xug49nwd ifconfig
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.255.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 3a:31:2c:7f:21:a8  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        ether 4e:75:b1:f9:5b:55  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vxlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        ether 3a:31:2c:7f:21:a8  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
With nsenter --net=<SandboxKey> ip addr
you can enter a container's Sandbox directly and inspect its network configuration.
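Since the SandboxKey comes from docker inspect, the two commands are often combined; a sketch, where <container> is a placeholder for a container name or ID:

# Enter a container's Sandbox via its SandboxKey and list its addresses.
nsenter --net=$(docker inspect -f '{{.NetworkSettings.SandboxKey}}' <container>) ip addr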
Create a custom overlay network:
[root@swarm-manager ~]# docker network create -d overlay --subnet 192.168.10.0/24 my-network
[root@swarm-manager ~]# docker network ls -f name=my-network
NETWORK ID          NAME                DRIVER              SCOPE
nvpbs39b6ctz        my-network          overlay             swarm
[root@swarm-manager ~]# docker network inspect -f {{.IPAM.Config}} my-network
[{192.168.10.0/24  192.168.10.1 map[]}]
Besides connecting all the Endpoints, br0 is also connected to a vxlan device that establishes VXLAN tunnels to the other hosts; cross-host container traffic travels through these tunnels.
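To confirm that vxlan0 really is a VXLAN tunnel endpoint, ip -d link inside the same namespace prints its VXLAN attributes. This is a sketch reusing the ingress namespace shown above; expect a line like "vxlan id <VNI> ... dstport 4789", with values depending on the overlay network.

# Show driver-level details (VNI, UDP port) of the vxlan device in the overlay namespace.
nsenter --net=/var/run/docker/netns/1-i6xug49nwd ip -d link show vxlan0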
Create a service attached to my-network (the default endpoint mode is VIP). On swarm-node1, two new namespaces appear: 1-vxe1cwk14a for the my-network overlay and 18531514ffd0 for the task container's Sandbox:
[root@swarm-manager ~]# docker service create --replicas 2 --name nginx-vip --network my-network nginx
[root@swarm-manager ~]# docker service inspect -f {{.Endpoint.VirtualIPs}} nginx-vip
[{nvpbs39b6ctzrfw6vj809kjbu 192.168.10.2/24}]
[root@swarm-node1 ~]# ls -lrt /var/run/docker/netns/
total 0
-r--r--r--. 1 root root 0 Aug  6 11:24 ingress_sbox
-r--r--r--. 1 root root 0 Aug  6 11:24 1-i6xug49nwd
-r--r--r--. 1 root root 0 Aug 10 16:30 1-vxe1cwk14a
-r--r--r--. 1 root root 0 Aug 10 16:30 18531514ffd0
[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/1-vxe1cwk14a ifconfig
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.10.1  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 5a:f5:4c:f3:98:d5  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

veth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        ether 96:57:63:17:80:83  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

vxlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        ether 5a:f5:4c:f3:98:d5  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/18531514ffd0 ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.10.4  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:c0:a8:0a:04  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.3  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:12:00:03  txqueuelen 0  (Ethernet)
        RX packets 7484  bytes 16821353 (16.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4780  bytes 321518 (313.9 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 14  bytes 1717 (1.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 14  bytes 1717 (1.6 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Inside a task container, the service name nginx-vip resolves to the VIP (192.168.10.2), while tasks.nginx-vip resolves to the task containers' real IPs:
[root@swarm-node1 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
d66e400533af        nginx:latest        "nginx -g 'daemon ..."   18 minutes ago      Up 18 minutes       80/tcp              nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
# docker exec -it d66e400533af sh
root@d66e400533af:/# nslookup nginx-vip
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   nginx-vip
Address: 192.168.10.2

root@d66e400533af:/# nslookup tasks.nginx-vip
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   tasks.nginx-vip
Address: 192.168.10.4
Name:   tasks.nginx-vip
Address: 192.168.10.3
Traffic to the VIP 192.168.10.2 is marked 0x112 (274) in the OUTPUT chain of the iptables mangle table; IPVS uses this firewall mark to forward traffic for the service IP to the containers at 192.168.10.3 and 192.168.10.4.
[root@swarm-node1 ~]# docker inspect -f {{.NetworkSettings.SandboxKey}} d66e400533af
/var/run/docker/netns/18531514ffd0
[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/18531514ffd0 sh
sh-4.2# ip add show eth0
102: eth0@if103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
    link/ether 02:42:c0:a8:0a:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.10.4/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.10.2/32 scope global eth0
       valid_lft forever preferred_lft forever
sh-4.2# ip route
default via 172.18.0.1 dev eth1
172.18.0.0/16 dev eth1  proto kernel  scope link  src 172.18.0.3
192.168.10.0/24 dev eth0  proto kernel  scope link  src 192.168.10.4
sh-4.2# iptables -t mangle -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A OUTPUT -d 192.168.10.2/32 -j MARK --set-xmark 0x112/0xffffffff
sh-4.2# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  274 rr
  -> 192.168.10.3:0               Masq    1      0          0
  -> 192.168.10.4:0               Masq    1      0          0
In VIP mode, swarm mode gives the container a NIC "eth0@if103" attached to the overlay network (my-network) and allocates a VIP for the service; it also gives the container a NIC "eth1@if105" attached to the docker_gwbridge network, used for reaching external networks.
Every container attached to my-network can reach the service either through the service name or through the VIP; when the service name is used, the built-in DNS server first resolves it to the VIP, which IPVS then load-balances across the task containers.
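One way to watch the VIP actually being balanced is to check the IPVS counters inside the task container's Sandbox (the 18531514ffd0 namespace inspected above) while requests are sent to nginx-vip. This is a sketch, not part of the original capture:

# IPVS statistics in the container's Sandbox; the counters for 192.168.10.3 and
# 192.168.10.4 should grow roughly evenly as requests hit the VIP 192.168.10.2.
nsenter --net=/var/run/docker/netns/18531514ffd0 ipvsadm -L -n --stats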
Now create a service attached to the custom overlay network, this time specifying the endpoint mode as dnsrr:
[root@swarm-manager ~]# docker service create --endpoint-mode dnsrr --replicas 2 --name nginx-dnsrr --network my-network nginx
[root@swarm-manager ~]# docker service inspect -f {{.Spec.EndpointSpec.Mode}} nginx-dnsrr
dnsrr
[root@swarm-node1 ~]# docker inspect -f {{.NetworkSettings.SandboxKey}} b68f0b4465b4
/var/run/docker/netns/b4efcf686a74
[root@swarm-node1 ~]# ls -lrt /var/run/docker/netns/
total 0
-r--r--r--. 1 root root 0 Aug  6 11:24 ingress_sbox
-r--r--r--. 1 root root 0 Aug  6 11:24 1-i6xug49nwd
-r--r--r--. 1 root root 0 Aug 10 16:30 1-vxe1cwk14a
-r--r--r--. 1 root root 0 Aug 10 16:30 18531514ffd0
-r--r--r--. 1 root root 0 Aug 10 18:08 b4efcf686a74
[root@swarm-node1 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
b68f0b4465b4        nginx:latest        "nginx -g 'daemon ..."   About a minute ago   Up About a minute   80/tcp              nginx-dnsrr.2.kpx2tqqmdugdpwpdwynbyzf9j
d66e400533af        nginx:latest        "nginx -g 'daemon ..."   2 hours ago          Up 2 hours          80/tcp              nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
[root@swarm-node1 ~]# docker exec -it b68f0b4465b4 bash
root@b68f0b4465b4:/# nslookup nginx-dnsrr
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   my-nginx-dnsrr
Address: 192.168.10.5
Name:   my-nginx-dnsrr
Address: 192.168.10.6

root@b68f0b4465b4:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.10.6  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:c0:a8:0a:06  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.4  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:12:00:04  txqueuelen 0  (Ethernet)
        RX packets 7656  bytes 16832156 (16.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5053  bytes 343122 (335.0 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 6  bytes 788 (788.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 788 (788.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
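In dnsrr mode the embedded DNS returns the task IPs directly and no VIP is allocated for the service, which can be checked on the manager (a sketch; the VirtualIPs field is expected to come back empty for this service):

# No virtual IP is assigned in dnsrr mode, so VirtualIPs should be empty here.
docker service inspect -f '{{.Endpoint.VirtualIPs}}' nginx-dnsrr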
Finally, create a service with a published port so it can be reached from outside the cluster through the routing mesh:
[root@swarm-manager ~]# docker service create --replicas 2 --name nginx-ingress --network my-network --publish 80:80 nginx
[root@swarm-manager ~]# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
7yzee08a9ryq        nginx-dnsrr         replicated          2/2                 nginx:latest
qx5epc99yu8q        nginx-vip           replicated          2/2                 nginx:latest
udiaexlplqq2        nginx-ingress       replicated          2/2                 nginx:latest        *:80->80/tcp
[root@swarm-manager ~]# docker service inspect -f {{.Endpoint.VirtualIPs}} nginx-ingress
[{i6xug49nwdsxauqqpli3apvym 10.255.0.5/16} {vxe1cwk14avlfp2xjgymhkhdl 192.168.10.7/24}]
[root@swarm-node1 ~]# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
42aed469b4cc        nginx:latest        "nginx -g 'daemon ..."   About an hour ago   Up About an hour    80/tcp              nginx-ingress.2.u00j8ich8cv4bo0grx1hbsr7q
b68f0b4465b4        nginx:latest        "nginx -g 'daemon ..."   About an hour ago   Up About an hour    80/tcp              nginx-dnsrr.2.kpx2tqqmdugdpwpdwynbyzf9j
d66e400533af        nginx:latest        "nginx -g 'daemon ..."   3 hours ago         Up 3 hours          80/tcp              nginx-vip.2.p52ud6cmmgonl236dtvhuzibk
[root@swarm-node1 ~]# docker exec -it 42aed469b4cc bash
root@42aed469b4cc:/# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.255.0.7  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:0a:ff:00:07  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.5  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 02:42:ac:12:00:05  txqueuelen 0  (Ethernet)
        RX packets 4049  bytes 10745529 (10.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3164  bytes 211730 (206.7 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

eth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 192.168.10.9  netmask 255.255.255.0  broadcast 0.0.0.0
        ether 02:42:c0:a8:0a:09  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 0  (Local Loopback)
        RX packets 4  bytes 620 (620.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4  bytes 620 (620.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

root@42aed469b4cc:/# ip add show eth0
110: eth0@if111: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:0a:ff:00:07 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.255.0.7/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.255.0.5/32 scope global eth0
       valid_lft forever preferred_lft forever
root@42aed469b4cc:/# ip add show eth2
114: eth2@if115: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:42:c0:a8:0a:09 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet 192.168.10.9/24 scope global eth2
       valid_lft forever preferred_lft forever
    inet 192.168.10.7/32 scope global eth2
       valid_lft forever preferred_lft forever
Apart from lo, three NICs were created in the container:

- eth0: attached to the ingress network; external access to the service is provided through the routing mesh.
- eth1: attached to the docker_gwbridge network; when the container itself initiates outbound traffic, docker_gwbridge SNATs it to the external network.
- eth2: attached to the custom my-network overlay network.

If a port is published when the service is created, swarm mode listens on port 80 on every node through the routing mesh, even on nodes that run no task of the service, and uses iptables to NAT the traffic; when a client hits port 80 on any node in the cluster, the swarm load balancer routes the request to an active container. If you do not want nodes without a task to listen on the published port, publish it with --publish mode=host,target=80,published=8080 instead (see the sketch after this list).
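A minimal sketch of host-mode publishing (the service name nginx-hostmode is illustrative): only nodes that actually run a task bind the published port, and the routing mesh is bypassed for it.

# Publish in host mode: port 8080 is bound only on nodes running a task of this service.
# (With mode=host, two tasks scheduled on the same node would clash on port 8080.)
docker service create --name nginx-hostmode \
  --network my-network \
  --publish mode=host,target=80,published=8080 nginx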
After the service is created, VirtualIPs contains two entries: 10.255.0.5 is attached to the ingress network, and 192.168.10.7 to the custom my-network network. When an external client accesses the service, swarm load balancing works as follows:
1. The user accesses the Nginx service on swarm-node1 (172.16.100.21:80).
2. According to the rule -A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80, iptables forwards the request to 172.18.0.2:80 inside the ingress sandbox.
[root@swarm-node1 ~]# iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N DOCKER
-N DOCKER-INGRESS
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER-INGRESS
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -o docker_gwbridge -m addrtype --src-type LOCAL -j MASQUERADE
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.18.0.0/16 ! -o docker_gwbridge -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i docker_gwbridge -j RETURN
-A DOCKER-INGRESS -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.18.0.2:80
-A DOCKER-INGRESS -j RETURN
3. Inside the ingress sandbox, iptables sets a different mark for each published port (here 0x114, i.e. 276).
[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/ingress_sbox iptables -t mangle -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-A PREROUTING -p tcp -m tcp --dport 80 -j MARK --set-xmark 0x114/0xffffffff
-A OUTPUT -d 10.255.0.5/32 -j MARK --set-xmark 0x114/0xffffffff
4. According to that mark, IPVS forwards the request to the corresponding real servers (the container namespaces):
[root@swarm-node1 ~]# nsenter --net=/var/run/docker/netns/ingress_sbox ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
FWM  276 rr
  -> 10.255.0.6:0                 Masq    1      0          0
  -> 10.255.0.7:0                 Masq    1      0          0
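To verify the whole chain end to end, port 80 on any node in the cluster should answer, whether or not that node runs an nginx-ingress task; a sketch using the swarm-node1 address from step 1 (adjust the IP to your environment):

# The routing mesh DNATs into ingress_sbox and IPVS picks a task container,
# so this should return HTTP 200 from any node's address.
curl -s -o /dev/null -w '%{http_code}\n' http://172.16.100.21:80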