In my post "When Docker Meets systemd", I mentioned the task I had been working on for the past couple of days: using kubeadm to install and deploy the latest Kubernetes release, k8s 1.5.1, on Ubuntu 16.04.
In the middle of the year, Docker announced that the swarmkit toolkit would be integrated into Docker Engine, an announcement that caused quite a stir in the lightweight container world. Developers are lazy, after all ^0^: with docker swarmkit built in, where is the motivation to install any other container orchestration tool, even if docker engine is not quite the IE browser of its day in terms of how heavily people use it? In response to this market move by Docker Inc., Kubernetes, the leader in container cluster management and service orchestration, released Kubernetes 1.4.0 three months later. That release added the kubeadm tool. kubeadm works much like the swarmkit tooling integrated into docker engine: it aims to improve the developer experience of installing, debugging and using k8s and to lower the barrier to entry. In theory, two commands, init and join, are enough to stand up a complete Kubernetes cluster.
However, like swarmkit when it first landed in the docker engine, kubeadm is still under active development and not all that stable. Even in the current latest k8s 1.5.1 release it remains in Alpha, and the official recommendation is not to use it for production clusters. Every run of kubeadm init prints this reminder:
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
That said, the k8s 1.3.7 cluster we deployed earlier has been running well, which gives us the confidence to keep going down the k8s road and to do it well. But deploying and managing k8s really is a tedious experience, so we decided to test whether kubeadm can deliver something beyond expectations. The lessons learned installing kubernetes 1.3.7 on Aliyun Ubuntu 14.04 gave me a tiny bit of confidence, yet the actual installation was still full of twists and turns. That is partly because kubeadm is unstable, and partly because of the quality of cni and the third-party network add-ons; trouble on either side makes the install a rough ride.
Of the three operating systems kubeadm supports, Ubuntu 16.04+, CentOS 7 and HypriotOS v1.0.1+, we chose Ubuntu 16.04. Since Aliyun does not yet offer an official 16.04 image, we spun up two Ubuntu 14.04 ECS instances and manually upgraded them to Ubuntu 16.04.1 via apt-get. The exact version: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-58-generic x86_64).
Ubuntu 16.04 uses systemd as its init system; for installing and configuring Docker you can refer to my post "When Docker Meets systemd". For the Docker version I chose the latest stable release available: 1.12.5.
# docker version
Client:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.5
 API version:  1.24
 Go version:   go1.6.4
 Git commit:   7392c3b
 Built:        Fri Dec 16 02:42:17 2016
 OS/Arch:      linux/amd64
As for the Kubernetes version, as mentioned above, we use the newly released Kubernetes 1.5.1. 1.5.1 is an urgent fix on top of 1.5.0, mainly "to address default flag values which in isolation were not problematic, but in concert could result in an insecure cluster". The official recommendation is to skip 1.5.0 and go straight to 1.5.1.
Let me say it again: installing, configuring and getting Kubernetes fully working is hard, even harder on Aliyun, and sometimes it takes a bit of luck. Kubernetes, Docker, cni and the various network add-ons are all under active development; a step, tip or trick that works today may be outdated tomorrow, so keep that in mind when following the steps in this post ^0^.
This time we spun up two ECS instances, one as the master node and one as the minion node. With kubeadm's default installation, the master node does not take part in Pod scheduling and carries no workload, i.e. no non-core-component Pods get created on the master node. That restriction can be lifted with the kubectl taint command, but more on that later.
Cluster topology:
master node: 10.47.217.91, hostname: iZ25beglnhtZ
minion node: 10.28.61.30, hostname: iZ2ze39jeyizepdxhwqci6Z
The main reference for this installation is the official Kubernetes guide "Installing Kubernetes on Linux with kubeadm".
In this section we prepare the packages, i.e. download kubeadm and the k8s core components needed for this installation onto both nodes. Note: if you have an accelerator (proxy), the steps below will go especially smoothly; if not, well... The following commands must be run on both nodes.
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
OK
Add the Kubernetes repository under the sources.list.d directory:
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

# cat /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
Update the package index:
# apt-get update ... ... Hit:2 http://mirrors.aliyun.com/ubuntu xenial InRelease Hit:3 https://apt.dockerproject.org/repo ubuntu-xenial InRelease Get:4 http://mirrors.aliyun.com/ubuntu xenial-security InRelease [102 kB] Get:1 https://packages.cloud.google.com/apt kubernetes-xenial InRelease [6,299 B] Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 Packages [1,739 B] Get:6 http://mirrors.aliyun.com/ubuntu xenial-updates InRelease [102 kB] Get:7 http://mirrors.aliyun.com/ubuntu xenial-proposed InRelease [253 kB] Get:8 http://mirrors.aliyun.com/ubuntu xenial-backports InRelease [102 kB] Fetched 568 kB in 19s (28.4 kB/s) Reading package lists... Done
In this installation we can download the Kubernetes core components, including kubelet, kubeadm, kubectl and kubernetes-cni, simply via apt-get:
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni Reading package lists... Done Building dependency tree Reading state information... Done The following package was automatically installed and is no longer required: libtimedate-perl Use 'apt autoremove' to remove it. The following additional packages will be installed: ebtables ethtool socat The following NEW packages will be installed: ebtables ethtool kubeadm kubectl kubelet kubernetes-cni socat 0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded. Need to get 37.6 MB of archives. After this operation, 261 MB of additional disk space will be used. Get:2 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ebtables amd64 2.0.10.4-3.4ubuntu1 [79.6 kB] Get:6 http://mirrors.aliyun.com/ubuntu xenial/main amd64 ethtool amd64 1:4.5-1 [97.5 kB] Get:7 http://mirrors.aliyun.com/ubuntu xenial/universe amd64 socat amd64 1.7.3.1-1 [321 kB] Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.3.0.1-07a8a2-00 [6,877 kB] Get:3 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.5.1-00 [15.1 MB] Get:4 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.5.1-00 [7,954 kB] Get:5 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubeadm amd64 1.6.0-alpha.0-2074-a092d8e0f95f52-00 [7,120 kB] Fetched 37.6 MB in 36s (1,026 kB/s) ... ... Unpacking kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ... Processing triggers for systemd (229-4ubuntu13) ... Processing triggers for ureadahead (0.100.0-19) ... Processing triggers for man-db (2.7.5-1) ... Setting up ebtables (2.0.10.4-3.4ubuntu1) ... update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults Setting up ethtool (1:4.5-1) ... Setting up kubernetes-cni (0.3.0.1-07a8a2-00) ... Setting up socat (1.7.3.1-1) ... Setting up kubelet (1.5.1-00) ... Setting up kubectl (1.5.1-00) ... Setting up kubeadm (1.6.0-alpha.0-2074-a092d8e0f95f52-00) ... Processing triggers for systemd (229-4ubuntu13) ... Processing triggers for ureadahead (0.100.0-19) ... ... ...
The downloaded kube components are not started automatically. Under /lib/systemd/system we can see kubelet.service:
# ls /lib/systemd/system|grep kube
kubelet.service

//kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
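Since the packages only install the unit and nothing is configured yet, a quick systemd check confirms the state of kubelet before handing things over to kubeadm (just a sanity check on my part, not a step from the kubeadm guide):

# systemctl status kubelet
# systemctl is-enabled kubelet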
The kubelet version:

# kubelet --version
Kubernetes v1.5.1
All the k8s core components are in place; next we bootstrap the kubernetes cluster. That is also where the problems begin, and those problems and how they were solved are the real focus of this post.
As said before, in theory a cluster can be built with kubeadm's init and join commands; init initializes the cluster on the master node. Unlike pre-1.4 deployment methods, the k8s core components installed by kubeadm all run as containers on the master node. So before kubeadm init, it is best to hook an accelerator (proxy) onto the docker engine on the master node, because kubeadm pulls many core component images from the gcr.io/google_containers repository, roughly the following:
gcr.io/google_containers/kube-controller-manager-amd64   v1.5.1           cd5684031720   2 weeks ago    102.4 MB
gcr.io/google_containers/kube-apiserver-amd64            v1.5.1           8c12509df629   2 weeks ago    124.1 MB
gcr.io/google_containers/kube-proxy-amd64                v1.5.1           71d2b27b03f6   2 weeks ago    175.6 MB
gcr.io/google_containers/kube-scheduler-amd64            v1.5.1           6506e7b74dac   2 weeks ago    53.97 MB
gcr.io/google_containers/etcd-amd64                      3.0.14-kubeadm   856e39ac7be3   5 weeks ago    174.9 MB
gcr.io/google_containers/kubedns-amd64                   1.9              26cf1ed9b144   5 weeks ago    47 MB
gcr.io/google_containers/dnsmasq-metrics-amd64           1.0              5271aabced07   7 weeks ago    14 MB
gcr.io/google_containers/kube-dnsmasq-amd64              1.4              3ec65756a89b   3 months ago   5.13 MB
gcr.io/google_containers/kube-discovery-amd64            1.0              c5e0c9a457fc   3 months ago   134.2 MB
gcr.io/google_containers/exechealthz-amd64               1.2              93a43bfb39bf   3 months ago   8.375 MB
gcr.io/google_containers/pause-amd64                     3.0              99e59f495ffa   7 months ago   746.9 kB
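Without an accelerator these gcr.io pulls tend to hang (you will see that in the init log below). One common way to point the Docker daemon at an HTTP proxy on a systemd host is a drop-in unit; the sketch below is only an illustration, and the proxy address is a placeholder, not my actual accelerator setup:

# mkdir -p /etc/systemd/system/docker.service.d
# cat <<EOF > /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://your-proxy.example.com:8118"
Environment="HTTPS_PROXY=http://your-proxy.example.com:8118"
Environment="NO_PROXY=localhost,127.0.0.1,mirrors.aliyun.com"
EOF
# systemctl daemon-reload && systemctl restart docker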
In the kubeadm docs, installing the Pod network is a separate step; kubeadm init does not pick and install a default Pod network for you. Our first choice is Flannel as the Pod network, not only because our previous cluster used flannel and it has been stable, but also because Flannel is the overlay network add-on CoreOS built specifically for k8s; the flannel repository's README even says: "flannel is a network fabric for containers, designed for Kubernetes". If we want to use Flannel, then per the kubeadm docs we must pass the option --pod-network-cidr=10.244.0.0/16 to init.
Run kubeadm init:
# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Starting the kubelet service
[init] Using Kubernetes version: v1.5.1
[tokens] Generated token: "2e7da9.7fc5668ff26430c7"
[certificates] Generated Certificate Authority key and certificate.
[certificates] Generated API Server key and certificate
[certificates] Generated Service Account signing keys
[certificates] Created keys and certificates in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf"
[apiclient] Created API client, waiting for the control plane to become ready   //may hang here if no accelerator/proxy is configured
[apiclient] All control plane components are healthy after 54.789750 seconds
[apiclient] Waiting for at least one node to register and become ready
[apiclient] First node is ready after 1.003053 seconds
[apiclient] Creating a test deployment
[apiclient] Test deployment succeeded
[token-discovery] Created the kube-discovery deployment, waiting for it to become ready
[token-discovery] kube-discovery is ready after 62.503441 seconds
[addons] Created essential addon: kube-proxy
[addons] Created essential addon: kube-dns

Your Kubernetes master has initialized successfully!

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node:

kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187
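Before poking further, a couple of quick sanity checks give a feel for the control plane (optional; componentstatuses is a standard kubectl resource):

# kubectl get componentstatuses
# kubectl get nodes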
What has changed on the master node after a successful init? The k8s core components are all running normally:
# ps -ef|grep kube root 2477 2461 1 16:36 ? 00:00:04 kube-proxy --kubeconfig=/run/kubeconfig root 30860 1 12 16:33 ? 00:01:09 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=10.96.0.10 --cluster-domain=cluster.local root 30952 30933 0 16:33 ? 00:00:01 kube-scheduler --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080 root 31128 31103 2 16:33 ? 00:00:11 kube-controller-manager --address=127.0.0.1 --leader-elect --master=127.0.0.1:8080 --cluster-name=kubernetes --root-ca-file=/etc/kubernetes/pki/ca.pem --service-account-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem --cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem --insecure-experimental-approve-all-kubelet-csrs-for-group=system:kubelet-bootstrap --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16 root 31223 31207 2 16:34 ? 00:00:10 kube-apiserver --insecure-bind-address=127.0.0.1 --admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota --service-cluster-ip-range=10.96.0.0/12 --service-account-key-file=/etc/kubernetes/pki/apiserver-key.pem --client-ca-file=/etc/kubernetes/pki/ca.pem --tls-cert-file=/etc/kubernetes/pki/apiserver.pem --tls-private-key-file=/etc/kubernetes/pki/apiserver-key.pem --token-auth-file=/etc/kubernetes/pki/tokens.csv --secure-port=6443 --allow-privileged --advertise-address=123.56.200.187 --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --anonymous-auth=false --etcd-servers=http://127.0.0.1:2379 root 31491 31475 0 16:35 ? 00:00:00 /usr/local/bin/kube-discovery
and mostly in the form of containers:
# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES c16c442b7eca gcr.io/google_containers/kube-proxy-amd64:v1.5.1 "kube-proxy --kubecon" 6 minutes ago Up 6 minutes k8s_kube-proxy.36dab4e8_kube-proxy-sb4sm_kube-system_43fb1a2c-cb46-11e6-ad8f-00163e1001d7_2ba1648e 9f73998e01d7 gcr.io/google_containers/kube-discovery-amd64:1.0 "/usr/local/bin/kube-" 8 minutes ago Up 8 minutes k8s_kube-discovery.7130cb0a_kube-discovery-1769846148-6z5pw_kube-system_1eb97044-cb46-11e6-ad8f-00163e1001d7_fd49c2e3 dd5412e5e15c gcr.io/google_containers/kube-apiserver-amd64:v1.5.1 "kube-apiserver --ins" 9 minutes ago Up 9 minutes k8s_kube-apiserver.1c5a91d9_kube-apiserver-iz25beglnhtz_kube-system_eea8df1717e9fea18d266103f9edfac3_8cae8485 60017f8819b2 gcr.io/google_containers/etcd-amd64:3.0.14-kubeadm "etcd --listen-client" 9 minutes ago Up 9 minutes k8s_etcd.c323986f_etcd-iz25beglnhtz_kube-system_3a26566bb004c61cd05382212e3f978f_06d517eb 03c2463aba9c gcr.io/google_containers/kube-controller-manager-amd64:v1.5.1 "kube-controller-mana" 9 minutes ago Up 9 minutes k8s_kube-controller-manager.d30350e1_kube-controller-manager-iz25beglnhtz_kube-system_9a40791dd1642ea35c8d95c9e610e6c1_3b05cb8a fb9a724540a7 gcr.io/google_containers/kube-scheduler-amd64:v1.5.1 "kube-scheduler --add" 9 minutes ago Up 9 minutes k8s_kube-scheduler.ef325714_kube-scheduler-iz25beglnhtz_kube-system_dc58861a0991f940b0834f8a110815cb_9b3ccda2 .... ...
However, these core components are not running on the pod network (right, the pod network does not even exist yet); they use the host network instead. Take the kube-apiserver pod as an example:
kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 1h 10.47.217.91 iz25beglnhtz
kube-apiserver's IP is the host IP, which suggests the container is on the host network; this can be confirmed from the network attributes of its corresponding pause container:
# docker ps |grep apiserver a5a76bc59e38 gcr.io/google_containers/kube-apiserver-amd64:v1.5.1 "kube-apiserver --ins" About an hour ago Up About an hour k8s_kube-apiserver.2529402_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_ec0079fc ef4d3bf057a6 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD.d8dbe16c_kube-apiserver-iz25beglnhtz_kube-system_25d646be9a0092138dc6088fae6f1656_bbfd8a31
Inspect the pause container and you can see its NetworkMode value:
"NetworkMode": "host",
If something goes wrong partway through kubeadm init, for example it hangs because you forgot to set up the accelerator beforehand, you will probably ctrl+c out of it. After fixing the configuration and re-running kubeadm init, you may then see output like this:
# kubeadm init --pod-network-cidr=10.244.0.0/16
[kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] Some fatal errors occurred:
    Port 10250 is in use
    /etc/kubernetes/manifests is not empty
    /etc/kubernetes/pki is not empty
    /var/lib/kubelet is not empty
    /etc/kubernetes/admin.conf already exists
    /etc/kubernetes/kubelet.conf already exists
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
kubeadm automatically checks whether the current environment contains "leftovers" from a previous run. If it does, they must be cleaned up before init can be run again. We can wipe the environment with "kubeadm reset" and start over.
# kubeadm reset
[preflight] Running pre-flight checks
[reset] Draining node: "iz25beglnhtz"
[reset] Removing node: "iz25beglnhtz"
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf]
After kubeadm init, if you poke around the cluster state or the core components' logs, you will notice some "anomalies", for example the kubelet log keeps printing this error over and over:
Dec 26 16:36:48 iZ25beglnhtZ kubelet[30860]: E1226 16:36:48.365885 30860 docker_manager.go:2201] Failed to setup network for pod "kube-dns-2924299975-pddz5_kube-system(43fd7264-cb46-11e6-ad8f-00163e1001d7)" using network plugins "cni": cni config unintialized; Skipping pod
With kubectl get pod --all-namespaces -o wide you will also find the kube-dns pod stuck in ContainerCreating.
None of this matters yet, because we have not installed a Pod network for the cluster. As said earlier, we want the Flannel network, so we run the following install command:
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created
After a short wait, let's look at the cluster again on the master node:
# ps -ef|grep kube|grep flannel root 6517 6501 0 17:20 ? 00:00:00 /opt/bin/flanneld --ip-masq --kube-subnet-mgr root 6573 6546 0 17:20 ? 00:00:00 /bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done # kubectl get pods --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE kube-system dummy-2088944543-s0c5g 1/1 Running 0 50m kube-system etcd-iz25beglnhtz 1/1 Running 0 50m kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 50m kube-system kube-controller-manager-iz25beglnhtz 1/1 Running 0 50m kube-system kube-discovery-1769846148-6z5pw 1/1 Running 0 50m kube-system kube-dns-2924299975-pddz5 4/4 Running 0 49m kube-system kube-flannel-ds-5ww9k 2/2 Running 0 4m kube-system kube-proxy-sb4sm 1/1 Running 0 49m kube-system kube-scheduler-iz25beglnhtz 1/1 Running 0 49m
At least all of the cluster's core components are now running. It looks like a success.
Next it is the minion node's turn to join the cluster. Here we use kubeadm's second command: kubeadm join.
Run the following on the minion node (note: make sure the master node's port 9898 is open in the firewall):
# kubeadm join --token=2e7da9.7fc5668ff26430c7 123.56.200.187 [kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters. [preflight] Running pre-flight checks [tokens] Validating provided token [discovery] Created cluster info discovery client, requesting info from "http://123.56.200.187:9898/cluster-info/v1/?token-id=2e7da9" [discovery] Cluster info object received, verifying signature using given token [discovery] Cluster info signature and contents are valid, will use API endpoints [https://123.56.200.187:6443] [bootstrap] Trying to connect to endpoint https://123.56.200.187:6443 [bootstrap] Detected server version: v1.5.1 [bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443" [csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request [csr] Received signed certificate from the API server: Issuer: CN=kubernetes | Subject: CN=system:node:iZ2ze39jeyizepdxhwqci6Z | CA: false Not before: 2016-12-26 09:31:00 +0000 UTC Not After: 2017-12-26 09:31:00 +0000 UTC [csr] Generating kubelet configuration [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf" Node join complete: * Certificate signing request sent to master and response received. * Kubelet informed of new secure connection details. Run 'kubectl get nodes' on the master to see this machine join.
That also went smoothly. The k8s components we see on the minion node:
d85cf36c18ed gcr.io/google_containers/kube-proxy-amd64:v1.5.1 "kube-proxy --kubecon" About an hour ago Up About an hour k8s_kube-proxy.36dab4e8_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_5826f32b a60e373b48b8 gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD.d8dbe16c_kube-proxy-lsn0t_kube-system_b8eddf1c-cb4e-11e6-ad8f-00163e1001d7_46bfcf67 a665145eb2b5 quay.io/coreos/flannel-git:v0.6.1-28-g5dde68d-amd64 "/bin/sh -c 'set -e -" About an hour ago Up About an hour k8s_install-cni.17d8cf2_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_01e12f61 5b46f2cb0ccf gcr.io/google_containers/pause-amd64:3.0 "/pause" About an hour ago Up About an hour k8s_POD.d8dbe16c_kube-flannel-ds-tr8zr_kube-system_06eca729-cb72-11e6-ad8f-00163e1001d7_ac880d20
On the master node, check the current cluster state:
# kubectl get nodes
NAME                      STATUS         AGE
iz25beglnhtz              Ready,master   1h
iz2ze39jeyizepdxhwqci6z   Ready          21s
The k8s cluster has been created "successfully"! Or has it? The "fun" has only just begun :(!
The afterglow of the successful join had barely faded when I found a problem with the Flannel pod network, and the troubleshooting officially began :(.
Right after the join everything was fine, but before long an error showed up in kubectl get pod --all-namespaces:
kube-system kube-flannel-ds-tr8zr 1/2 CrashLoopBackOff 189 16h
It turned out that a container in the flannel pod on the minion node was failing; the specific error traced down:
# docker logs bc0058a15969 E1227 06:17:50.605110 1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-tr8zr': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-tr8zr: dial tcp 10.96.0.1:443: i/o timeout
10.96.0.1 is the cluster IP of the apiserver service on the pod network, and the flannel component on the minion node cannot reach that cluster IP! What makes this odd is that sometimes, after the Pod has been restarted N times by the scheduler or deleted and recreated, it suddenly turns Running again; the behavior is quite erratic.
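A quick way to reproduce the symptom outside the pod is to probe the service VIP from the minion node itself and to check whether kube-proxy has programmed iptables rules for it; this is only a diagnostic sketch, not a fix. Even an authentication error from the apiserver would prove the VIP is reachable, whereas a timeout mirrors what flannel sees:

# curl -k --max-time 5 https://10.96.0.1:443/version
# iptables -t nat -L KUBE-SERVICES -n | grep 10.96.0.1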
At least two open issues in the flannel repository on github.com are closely related to this problem:
https://github.com/coreos/flannel/issues/545
https://github.com/coreos/flannel/issues/535
There is no definite solution for this yet. Once the flannel pod on the minion node recovers to running on its own, we can move on.
In the issue below, many developers discuss one possible cause of the flannel pod failing to start on the minion node, along with a temporary workaround:
https://github.com/kubernetes/kubernetes/issues/34101
Roughly, the theory is that kube-proxy on the minion node picks the wrong interface, and the following can fix it. Run on the minion node:
# kubectl -n kube-system get ds -l 'component=kube-proxy' -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=10.244.0.0/16"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l 'component=kube-proxy'
daemonset "kube-proxy" configured
pod "kube-proxy-lsn0t" deleted
pod "kube-proxy-sb4sm" deleted
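To confirm the extra flag actually landed on the re-created kube-proxy pods, something like the following can be used (a sketch; jsonpath is a standard kubectl output format):

# kubectl -n kube-system get ds kube-proxy -o jsonpath='{.spec.template.spec.containers[0].command}'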
After that, the flannel pods' status:
kube-system   kube-flannel-ds-qw291   2/2   Running   8    17h
kube-system   kube-flannel-ds-x818z   2/2   Running   17   1h
After 17 restarts, the flannel pod on the minion node finally came up. Its flannel container's startup log:
# docker logs 1f64bd9c0386 I1227 07:43:26.670620 1 main.go:132] Installing signal handlers I1227 07:43:26.671006 1 manager.go:133] Determining IP address of default interface I1227 07:43:26.670825 1 kube.go:233] starting kube subnet manager I1227 07:43:26.671514 1 manager.go:163] Using 59.110.67.15 as external interface I1227 07:43:26.671575 1 manager.go:164] Using 59.110.67.15 as external endpoint I1227 07:43:26.746811 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN I1227 07:43:26.749785 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE I1227 07:43:26.752343 1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE I1227 07:43:26.755126 1 manager.go:246] Lease acquired: 10.244.1.0/24 I1227 07:43:26.755444 1 network.go:58] Watching for L3 misses I1227 07:43:26.755475 1 network.go:66] Watching for new subnet leases I1227 07:43:27.755830 1 network.go:153] Handling initial subnet events I1227 07:43:27.755905 1 device.go:163] calling GetL2List() dev.link.Index: 10 I1227 07:43:27.756099 1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67
That issue says that explicitly specifying --api-advertise-addresses at kubeadm init avoids the problem. However, do not put more than one IP there for now: the docs say multiple IPs are supported, but in practice, when you explicitly give it two or more IPs, like this:
#kubeadm init --api-advertise-addresses=10.47.217.91,123.56.200.187 --pod-network-cidr=10.244.0.0/16
the master initializes fine, but when the minion node runs the join command, it panics:
# kubeadm join --token=92e977.f1d4d090906fc06a 10.47.217.91 [kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters. ... ... [bootstrap] Successfully established connection with endpoint "https://10.47.217.91:6443" [bootstrap] Successfully established connection with endpoint "https://123.56.200.187:6443" E1228 10:14:05.405294 28378 runtime.go:64] Observed a panic: "close of closed channel" (close of closed channel) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:70 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:63 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:49 /usr/local/go/src/runtime/asm_amd64.s:479 /usr/local/go/src/runtime/panic.go:458 /usr/local/go/src/runtime/chan.go:311 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52 /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93 /usr/local/go/src/runtime/asm_amd64.s:2086 [csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request panic: close of closed channel [recovered] panic: close of closed channel goroutine 29 [running]: panic(0x1342de0, 0xc4203eebf0) /usr/local/go/src/runtime/panic.go:500 +0x1a1 k8s.io/kubernetes/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/runtime/runtime.go:56 +0x126 panic(0x1342de0, 0xc4203eebf0) /usr/local/go/src/runtime/panic.go:458 +0x243 k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1.1() /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:85 +0x29d k8s.io/kubernetes/pkg/util/wait.JitterUntil.func1(0xc420563ee0) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:96 +0x5e k8s.io/kubernetes/pkg/util/wait.JitterUntil(0xc420563ee0, 0x12a05f200, 0x0, 0xc420022e01, 0xc4202c2060) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:97 +0xad k8s.io/kubernetes/pkg/util/wait.Until(0xc420563ee0, 0x12a05f200, 0xc4202c2060) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/util/wait/wait.go:52 +0x4d k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection.func1(0xc4203a82f0, 0xc420269b90, 0xc4202c2060, 0xc4202c20c0, 0xc4203d8d80, 0x401, 0x480, 0xc4201e75e0, 0x17, 0xc4201e7560, ...) /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:93 +0x100 created by k8s.io/kubernetes/cmd/kubeadm/app/node.EstablishMasterConnection /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/node/bootstrap.go:94 +0x3ed
The join panic is discussed in detail in this issue: https://github.com/kubernetes/kubernetes/issues/36988
As mentioned earlier, for security reasons the master node carries no workload by default and does not take part in pod scheduling. We have only a couple of machines here, so the master node has to pitch in as well. The following command lets the master node participate in pod scheduling:
# kubectl taint nodes --all dedicated-
node "iz25beglnhtz" tainted
Next we create a deployment; its manifest:
//run-my-nginx.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.10.1
        ports:
        - containerPort: 80
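The deployment is created from the manifest in the usual way:

# kubectl create -f run-my-nginx.yaml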
After creating it, the my-nginx pod scheduled onto the master started fine, but the pod on the minion node kept failing; the reason:
Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 28s 28s 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-2560993602-0440x to iz2ze39jeyizepdxhwqci6z 27s 1s 26 {kubelet iz2ze39jeyizepdxhwqci6z} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-2560993602-0440x_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-2560993602-0440x_default(ba5ce554-cbf1-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"
The file /run/flannel/subnet.env really does not exist on the minion node, but the master node has it:
// /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
So I manually created /run/flannel/subnet.env on the minion node, copied in the content of the master's file of the same name, and saved it. A short while later, the my-nginx pod on the minion node went from error to running.
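For reference, the manual workaround amounts to something like this on the minion node (a sketch of what was done; the values simply mirror the master's file shown above):

# mkdir -p /run/flannel
# cat <<EOF > /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF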
Change the replicas of the earlier my-nginx deployment to 3, and create a my-nginx service on top of that deployment's pods:
//my-nginx-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30062
    protocol: TCP
  selector:
    run: my-nginx
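The service is created the same way, and the NodePort can then be probed on each node (30062 as declared above):

# kubectl create -f my-nginx-svc.yaml
# curl localhost:30062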
After the change, I tested service connectivity with curl localhost:30062. Requests that the VIP load-balanced to the my-nginx pods on the master node all got responses, but requests balanced to the pod on the minion node blocked until they timed out. Looking at the pod info showed that the my-nginx pod newly scheduled onto the minion node had not actually come up; the error:
Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-1948696469-ph11m to iz2ze39jeyizepdxhwqci6z 2m 0s 177 {kubelet iz2ze39jeyizepdxhwqci6z} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-ph11m_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-ph11m_default(3700d74a-cc12-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"
Looking at the /var/lib/cni/networks/cbr0 directory on the minion node, it contains the following files:
10.244.1.10 10.244.1.12 10.244.1.14 10.244.1.16 10.244.1.18 10.244.1.2 10.244.1.219 10.244.1.239 10.244.1.3 10.244.1.5 10.244.1.7 10.244.1.9 10.244.1.100 10.244.1.120 10.244.1.140 10.244.1.160 10.244.1.180 10.244.1.20 10.244.1.22 10.244.1.24 10.244.1.30 10.244.1.50 10.244.1.70 10.244.1.90 10.244.1.101 10.244.1.121 10.244.1.141 10.244.1.161 10.244.1.187 10.244.1.200 10.244.1.220 10.244.1.240 10.244.1.31 10.244.1.51 10.244.1.71 10.244.1.91 10.244.1.102 10.244.1.122 10.244.1.142 10.244.1.162 10.244.1.182 10.244.1.201 10.244.1.221 10.244.1.241 10.244.1.32 10.244.1.52 10.244.1.72 10.244.1.92 10.244.1.103 10.244.1.123 10.244.1.143 10.244.1.163 10.244.1.183 10.244.1.202 10.244.1.222 10.244.1.242 10.244.1.33 10.244.1.53 10.244.1.73 10.244.1.93 10.244.1.104 10.244.1.124 10.244.1.144 10.244.1.164 10.244.1.184 10.244.1.203 10.244.1.223 10.244.1.243 10.244.1.34 10.244.1.54 10.244.1.74 10.244.1.94 10.244.1.105 10.244.1.125 10.244.1.145 10.244.1.165 10.244.1.185 10.244.1.204 10.244.1.224 10.244.1.244 10.244.1.35 10.244.1.55 10.244.1.75 10.244.1.95 10.244.1.106 10.244.1.126 10.244.1.146 10.244.1.166 10.244.1.186 10.244.1.205 10.244.1.225 10.244.1.245 10.244.1.36 10.244.1.56 10.244.1.76 10.244.1.96 10.244.1.107 10.244.1.127 10.244.1.147 10.244.1.167 10.244.1.187 10.244.1.206 10.244.1.226 10.244.1.246 10.244.1.37 10.244.1.57 10.244.1.77 10.244.1.97 10.244.1.108 10.244.1.128 10.244.1.148 10.244.1.168 10.244.1.188 10.244.1.207 10.244.1.227 10.244.1.247 10.244.1.38 10.244.1.58 10.244.1.78 10.244.1.98 10.244.1.109 10.244.1.129 10.244.1.149 10.244.1.169 10.244.1.189 10.244.1.208 10.244.1.228 10.244.1.248 10.244.1.39 10.244.1.59 10.244.1.79 10.244.1.99 10.244.1.11 10.244.1.13 10.244.1.15 10.244.1.17 10.244.1.19 10.244.1.209 10.244.1.229 10.244.1.249 10.244.1.4 10.244.1.6 10.244.1.8 last_reserved_ip 10.244.1.110 10.244.1.130 10.244.1.150 10.244.1.170 10.244.1.190 10.244.1.21 10.244.1.23 10.244.1.25 10.244.1.40 10.244.1.60 10.244.1.80 10.244.1.111 10.244.1.131 10.244.1.151 10.244.1.171 10.244.1.191 10.244.1.210 10.244.1.230 10.244.1.250 10.244.1.41 10.244.1.61 10.244.1.81 10.244.1.112 10.244.1.132 10.244.1.152 10.244.1.172 10.244.1.192 10.244.1.211 10.244.1.231 10.244.1.251 10.244.1.42 10.244.1.62 10.244.1.82 10.244.1.113 10.244.1.133 10.244.1.153 10.244.1.173 10.244.1.193 10.244.1.212 10.244.1.232 10.244.1.252 10.244.1.43 10.244.1.63 10.244.1.83 10.244.1.114 10.244.1.134 10.244.1.154 10.244.1.174 10.244.1.194 10.244.1.213 10.244.1.233 10.244.1.253 10.244.1.44 10.244.1.64 10.244.1.84 10.244.1.115 10.244.1.135 10.244.1.155 10.244.1.175 10.244.1.195 10.244.1.214 10.244.1.234 10.244.1.254 10.244.1.45 10.244.1.65 10.244.1.85 10.244.1.116 10.244.1.136 10.244.1.156 10.244.1.176 10.244.1.196 10.244.1.215 10.244.1.235 10.244.1.26 10.244.1.46 10.244.1.66 10.244.1.86 10.244.1.117 10.244.1.137 10.244.1.157 10.244.1.177 10.244.1.197 10.244.1.216 10.244.1.236 10.244.1.27 10.244.1.47 10.244.1.67 10.244.1.87 10.244.1.118 10.244.1.138 10.244.1.158 10.244.1.178 10.244.1.198 10.244.1.217 10.244.1.237 10.244.1.28 10.244.1.48 10.244.1.68 10.244.1.88 10.244.1.119 10.244.1.139 10.244.1.159 10.244.1.179 10.244.1.199 10.244.1.218 10.244.1.238 10.244.1.29 10.244.1.49 10.244.1.69 10.244.1.89
The entire 10.244.1.x range has been used up, so naturally there are no available IPs left for new pods. Why the range filled up is still unclear. These two open issues are related:
https://github.com/containernetworking/cni/issues/306
https://github.com/kubernetes/kubernetes/issues/21656
Go into /var/lib/cni/networks/cbr0 and run the following to release the IPs that kubelet has presumably leaked:
# For every container-ID hash recorded in the IP files, check whether that
# container still exists; if not, delete the IP files that reference it.
for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do
  if [ -z $(docker ps -a | grep $hash | awk '{print $1}') ]; then
    grep -irl $hash ./
  fi
done | xargs rm
Afterwards, the directory listing becomes:
ls -l
total 32
drw-r--r-- 2 root root 12288 Dec 27 17:11 ./
drw-r--r-- 3 root root  4096 Dec 27 13:52 ../
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.2
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.3
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.4
-rw-r--r-- 1 root root    10 Dec 27 17:11 last_reserved_ip
But the pod still fails, and this time the failure reason has changed:
Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 23s 23s 1 {default-scheduler } Normal Scheduled Successfully assigned my-nginx-1948696469-7p4nn to iz2ze39jeyizepdxhwqci6z 22s 1s 22 {kubelet iz2ze39jeyizepdxhwqci6z} Warning FailedSync Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-7p4nn_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-7p4nn_default(a40fe652-cc14-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.244.1.1/24; Skipping pod"
And the files under /var/lib/cni/networks/cbr0 start multiplying rapidly again! The problem is at a dead end.
By this point I was pretty much exhausted, so I ran kubeadm reset on both nodes to start from scratch.
After kubeadm reset, the bridge device cni0 and the interface flannel.1 that flannel created are still alive. To restore the environment completely to its initial state, delete both devices with the following commands:
# ifconfig cni0 down
# brctl delbr cni0
# ip link delete flannel.1
"Tempered" by the previous problems, the fresh init and join went remarkably smoothly this time, and the minion node showed no further anomalies.
# kubectl get nodes -o wide NAME STATUS AGE EXTERNAL-IP iz25beglnhtz Ready,master 5m <none> iz2ze39jeyizepdxhwqci6z Ready 51s <none> # kubectl get pod --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE default my-nginx-1948696469-71h1l 1/1 Running 0 3m default my-nginx-1948696469-zwt5g 1/1 Running 0 3m default my-ubuntu-2560993602-ftdm6 1/1 Running 0 3m kube-system dummy-2088944543-lmlbh 1/1 Running 0 5m kube-system etcd-iz25beglnhtz 1/1 Running 0 6m kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 6m kube-system kube-controller-manager-iz25beglnhtz 1/1 Running 0 6m kube-system kube-discovery-1769846148-l5lfw 1/1 Running 0 5m kube-system kube-dns-2924299975-mdq5r 4/4 Running 0 5m kube-system kube-flannel-ds-9zwr1 2/2 Running 0 5m kube-system kube-flannel-ds-p7xh2 2/2 Running 0 1m kube-system kube-proxy-dwt5f 1/1 Running 0 5m kube-system kube-proxy-vm6v2 1/1 Running 0 1m kube-system kube-scheduler-iz25beglnhtz 1/1 Running 0 6m
Next we create the my-nginx deployment and service to test flannel network connectivity. Curling the my-nginx service's nodeport, we can reach the two nginx pods on the master, but the pod on the minion node still cannot be reached.
The flannel docker log on the master:
I1228 02:52:22.097083 1 network.go:225] L3 miss: 10.244.1.2 I1228 02:52:22.097169 1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60 I1228 02:52:22.097335 1 network.go:236] AddL3 succeeded I1228 02:52:55.169952 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:00.801901 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:03.801923 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:04.801764 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:05.801848 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:06.888269 1 network.go:225] L3 miss: 10.244.1.2 I1228 02:53:06.888340 1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60 I1228 02:53:06.888507 1 network.go:236] AddL3 succeeded I1228 02:53:39.969791 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:45.153770 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:48.154822 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:49.153774 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:50.153734 1 network.go:220] Ignoring not a miss: 46:6c:7a:a6:06:60, 10.244.1.2 I1228 02:53:52.154056 1 network.go:225] L3 miss: 10.244.1.2 I1228 02:53:52.154110 1 device.go:191] calling NeighSet: 10.244.1.2, 46:6c:7a:a6:06:60 I1228 02:53:52.154256 1 network.go:236] AddL3 succeeded
The log is full of "Ignoring not a miss" lines, which suggests something is wrong with the vxlan network. The problem looks very close to what is described in this issue:
https://github.com/coreos/flannel/issues/427
Flannel uses vxlan as its default backend, on the kernel's default vxlan udp port 8472. Flannel also supports a udp backend, which uses udp port 8285. So let's try switching the flannel backend. The steps:
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "udp",
        "Port": 8285
      }
    }
---
... ...
# kubectl delete -f kube-flannel.yml
configmap "kube-flannel-cfg" deleted
daemonset "kube-flannel-ds" deleted

# kubectl apply -f kube-flannel.yml
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created

# netstat -an|grep 8285
udp        0      0 123.56.200.187:8285     0.0.0.0:*
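To see whether traffic really flows over the udp backend, a packet capture on both nodes helps; this is just a sketch, and the eth0 interface name is an assumption that may differ on your ECS instances:

# tcpdump -ni flannel0
# tcpdump -ni eth0 udp port 8285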
Testing showed the udp port is open: tcpdump -i flannel0 on both nodes shows udp packets being sent and received. But the pod network between the two nodes still does not work.
Under normal circumstances, the startup logs of the flannel pods on the master and minion nodes look like this:
flannel running on the master node:
I1227 04:56:16.577828 1 main.go:132] Installing signal handlers I1227 04:56:16.578060 1 kube.go:233] starting kube subnet manager I1227 04:56:16.578064 1 manager.go:133] Determining IP address of default interface I1227 04:56:16.578576 1 manager.go:163] Using 123.56.200.187 as external interface I1227 04:56:16.578616 1 manager.go:164] Using 123.56.200.187 as external endpoint E1227 04:56:16.579079 1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found I1227 04:56:17.583744 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN I1227 04:56:17.585367 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE I1227 04:56:17.587765 1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE I1227 04:56:17.589943 1 manager.go:246] Lease acquired: 10.244.0.0/24 I1227 04:56:17.590203 1 network.go:58] Watching for L3 misses I1227 04:56:17.590255 1 network.go:66] Watching for new subnet leases I1227 07:43:27.164103 1 network.go:153] Handling initial subnet events I1227 07:43:27.164211 1 device.go:163] calling GetL2List() dev.link.Index: 5 I1227 07:43:27.164350 1 device.go:168] calling NeighAdd: 59.110.67.15, ca:50:97:1f:c2:ea
flannel running on the minion node:
# docker logs 1f64bd9c0386 I1227 07:43:26.670620 1 main.go:132] Installing signal handlers I1227 07:43:26.671006 1 manager.go:133] Determining IP address of default interface I1227 07:43:26.670825 1 kube.go:233] starting kube subnet manager I1227 07:43:26.671514 1 manager.go:163] Using 59.110.67.15 as external interface I1227 07:43:26.671575 1 manager.go:164] Using 59.110.67.15 as external endpoint I1227 07:43:26.746811 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN I1227 07:43:26.749785 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE I1227 07:43:26.752343 1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE I1227 07:43:26.755126 1 manager.go:246] Lease acquired: 10.244.1.0/24 I1227 07:43:26.755444 1 network.go:58] Watching for L3 misses I1227 07:43:26.755475 1 network.go:66] Watching for new subnet leases I1227 07:43:27.755830 1 network.go:153] Handling initial subnet events I1227 07:43:27.755905 1 device.go:163] calling GetL2List() dev.link.Index: 10 I1227 07:43:27.756099 1 device.go:168] calling NeighAdd: 123.56.200.187, ca:68:7c:9b:cc:67
But during the testing of problem 5 above, we found the following errors in the flannel containers' startup logs:
master node:
# docker logs c2d1cee3df3d I1228 06:53:52.502571 1 main.go:132] Installing signal handlers I1228 06:53:52.502735 1 manager.go:133] Determining IP address of default interface I1228 06:53:52.503031 1 manager.go:163] Using 123.56.200.187 as external interface I1228 06:53:52.503054 1 manager.go:164] Using 123.56.200.187 as external endpoint E1228 06:53:52.503869 1 network.go:106] failed to register network: failed to acquire lease: node "iz25beglnhtz" not found I1228 06:53:52.503899 1 kube.go:233] starting kube subnet manager I1228 06:53:53.522892 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN I1228 06:53:53.524325 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE I1228 06:53:53.526622 1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE I1228 06:53:53.528438 1 manager.go:246] Lease acquired: 10.244.0.0/24 I1228 06:53:53.528744 1 network.go:58] Watching for L3 misses I1228 06:53:53.528777 1 network.go:66] Watching for new subnet leases
minion node:
# docker logs dcbfef45308b I1228 05:28:05.012530 1 main.go:132] Installing signal handlers I1228 05:28:05.012747 1 manager.go:133] Determining IP address of default interface I1228 05:28:05.013011 1 manager.go:163] Using 59.110.67.15 as external interface I1228 05:28:05.013031 1 manager.go:164] Using 59.110.67.15 as external endpoint E1228 05:28:05.013204 1 network.go:106] failed to register network: failed to acquire lease: node "iz2ze39jeyizepdxhwqci6z" not found I1228 05:28:05.013237 1 kube.go:233] starting kube subnet manager I1228 05:28:06.041602 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN I1228 05:28:06.042863 1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE I1228 05:28:06.044896 1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE I1228 05:28:06.046497 1 manager.go:246] Lease acquired: 10.244.1.0/24 I1228 05:28:06.046780 1 network.go:98] Watching for new subnet leases I1228 05:28:07.047052 1 network.go:191] Subnet added: 10.244.0.0/24
Both nodes show the "register network" failure: failed to register network: failed to acquire lease: node "xxxx" not found. It is hard to say whether these two errors are what breaks the network between the two nodes. Over the whole course of testing, the problem came and went. A similar discussion appears in this flannel issue:
https://github.com/coreos/flannel/issues/435
The many problems with the Flannel pod network made me decide, for now, to stop using Flannel in the kubeadm-created kubernetes cluster.
Among the pod network add-ons Kubernetes supports there are, besides Flannel, also calico, Weave Net and others. Here we try Calico, a pod network built on BGP, the border gateway protocol. The Calico project has a dedicated document for installing the Pod network on a kubeadm-built K8s cluster. We satisfy all the requirements and constraints it describes, for example:
the master node carries the kubeadm.alpha.kubernetes.io/role: master label:
# kubectl get nodes -o wide --show-labels NAME STATUS AGE EXTERNAL-IP LABELS iz25beglnhtz Ready,master 3m <none> beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubeadm.alpha.kubernetes.io/role=master,kubernetes.io/hostname=iz25beglnhtz
Before installing calico, we again run kubeadm reset to reset the environment and remove the various network devices flannel created; see the commands in the sections above.
With calico, kubeadm init no longer needs the --pod-network-cidr=10.244.0.0/16 option:
# kubeadm init --api-advertise-addresses=10.47.217.91 [kubeadm] WARNING: kubeadm is in alpha, please do not use it for production clusters. [preflight] Running pre-flight checks [preflight] Starting the kubelet service [init] Using Kubernetes version: v1.5.1 [tokens] Generated token: "531b3f.3bd900d61b78d6c9" [certificates] Generated Certificate Authority key and certificate. [certificates] Generated API Server key and certificate [certificates] Generated Service Account signing keys [certificates] Created keys and certificates in "/etc/kubernetes/pki" [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf" [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/admin.conf" [apiclient] Created API client, waiting for the control plane to become ready [apiclient] All control plane components are healthy after 13.527323 seconds [apiclient] Waiting for at least one node to register and become ready [apiclient] First node is ready after 0.503814 seconds [apiclient] Creating a test deployment [apiclient] Test deployment succeeded [token-discovery] Created the kube-discovery deployment, waiting for it to become ready [token-discovery] kube-discovery is ready after 1.503644 seconds [addons] Created essential addon: kube-proxy [addons] Created essential addon: kube-dns Your Kubernetes master has initialized successfully! You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: http://kubernetes.io/docs/admin/addons/ You can now join any number of machines by running the following on each node: kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91
# kubectl apply -f http://docs.projectcalico.org/v2.0/getting-started/kubernetes/installation/hosted/kubeadm/calico.yaml
configmap "calico-config" created
daemonset "calico-etcd" created
service "calico-etcd" created
daemonset "calico-node" created
deployment "calico-policy-controller" created
job "configure-calico" created
The actual creation takes a while, because calico has to pull several images:
# docker images
REPOSITORY                      TAG      IMAGE ID       CREATED         SIZE
quay.io/calico/node             v1.0.0   74bff066bc6a   7 days ago      256.4 MB
calico/ctl                      v1.0.0   069830246cf3   8 days ago      43.35 MB
calico/cni                      v1.5.5   ada87b3276f3   12 days ago     67.13 MB
gcr.io/google_containers/etcd   2.2.1    a6cd91debed1   14 months ago   28.19 MB
calico created two network devices locally on the master node:
# ip a
... ...
47: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.91.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
48: califa32a09679f@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 62:39:10:55:44:c8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Run the following to join the minion node to the cluster:
# kubeadm join --token=531b3f.3bd900d61b78d6c9 10.47.217.91
calico also created a network device on the minion node:
57988: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 192.168.136.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
After the join succeeds, let's check the cluster status:
# kubectl get pods --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE kube-system calico-etcd-488qd 1/1 Running 0 18m 10.47.217.91 iz25beglnhtz kube-system calico-node-jcb3c 2/2 Running 0 18m 10.47.217.91 iz25beglnhtz kube-system calico-node-zthzp 2/2 Running 0 4m 10.28.61.30 iz2ze39jeyizepdxhwqci6z kube-system calico-policy-controller-807063459-f21q4 1/1 Running 0 18m 10.47.217.91 iz25beglnhtz kube-system dummy-2088944543-rtsfk 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system etcd-iz25beglnhtz 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system kube-controller-manager-iz25beglnhtz 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system kube-discovery-1769846148-51wdk 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system kube-dns-2924299975-fhf5f 4/4 Running 0 23m 192.168.91.1 iz25beglnhtz kube-system kube-proxy-2s7qc 1/1 Running 0 4m 10.28.61.30 iz2ze39jeyizepdxhwqci6z kube-system kube-proxy-h2qds 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz kube-system kube-scheduler-iz25beglnhtz 1/1 Running 0 23m 10.47.217.91 iz25beglnhtz
All components are ok. That seems like a good omen! But whether the cross-node pod network actually works still has to be verified.
We reuse the my-nginx-svc.yaml and run-my-nginx.yaml from the flannel test above to create the my-nginx service and deployment. Note: before that, run "kubectl taint nodes --all dedicated-" on the master node so that the master node carries workload too.
Unfortunately the result is very similar to flannel's: http requests that land on the master node get nginx responses, while the pod on the minion node remains unreachable.
This time I did not want to linger on calico; I wanted to move quickly to the next candidate and see whether weave net meets the requirements.
After all the attempts above, the results were disheartening. Weave Net looked like the last straw to clutch at. With the groundwork already laid, I will not list the detailed output of every command here. Weave Net also has dedicated official documentation for integrating with a kubernetes cluster, and that is mainly what we followed.
After kubeadm reset, we re-initialized the cluster and then installed the weave network add-on:
# kubectl apply -f https://git.io/weave-kube
daemonset "weave-net" created
With both Flannel and calico, at least installing the pod network add-on went smoothly. With Weave Net this time, we got hit over the head right away :(:
# kubectl get pod --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE kube-system dummy-2088944543-4kxtk 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system etcd-iz25beglnhtz 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system kube-apiserver-iz25beglnhtz 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system kube-controller-manager-iz25beglnhtz 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system kube-discovery-1769846148-pzv8p 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system kube-dns-2924299975-09dcb 0/4 ContainerCreating 0 42m <none> iz25beglnhtz kube-system kube-proxy-z465f 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system kube-scheduler-iz25beglnhtz 1/1 Running 0 42m 10.47.217.91 iz25beglnhtz kube-system weave-net-3wk9h 0/2 CrashLoopBackOff 16 17m 10.47.217.91 iz25beglnhtz
After installation, the weave-net pod reports CrashLoopBackOff. Tracing its container log gives the following error:
# docker logs cde899efa0af
time="2016-12-28T08:25:29Z" level=info msg="Starting Weaveworks NPC 1.8.2"