For the design of kubeadm, see: https://github.com/kubernetes/kubeadm/blob/master/docs/design/design_v1.10.md
Node network | Pod network | Service network |
---|---|---|
192.168.101.0/24 | 10.244.0.0/16 (flannel default) | 10.96.0.0/12 |
The `kubeadm init` command initializes the master node: the API Server, controller-manager, scheduler, and etcd run as static Pods on the master, while kube-proxy runs as a Pod (managed by a DaemonSet) on every node. `kubeadm join` adds a worker node to the cluster. The flannel add-on likewise runs as Pods on the master and on each node.

Node role | IP address |
---|---|
master (node01) | 192.168.101.40 |
worker node 1 (node02) | 192.168.101.41 |
worker node 2 (node03) | 192.168.101.42 |
All three nodes have an identical system environment:
```
root@node01:~# cat /etc/issue
Ubuntu 18.04.4 LTS \n \l

root@node01:~# uname -r
4.15.0-111-generic
root@node01:~# lsb_release -cr
Release:        18.04
Codename:       bionic
```
Run the following on the master and on every worker node:
```
# Disable swap now, keep it off after reboots,
# and comment out the swap line in /etc/fstab
root@node01:~# swapoff -a
root@node01:~# vim /etc/rc.local
#/bin/bash
swapoff -a
root@node01:~# chmod +x /etc/rc.local

# Stop and disable the ufw firewall (on CentOS 7 it would be firewalld instead)
root@node01:~# systemctl stop ufw.service
root@node01:~# systemctl disable ufw.service
Synchronizing state of ufw.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable ufw
root@node01:~# systemctl list-unit-files | grep ufw
ufw.service                disabled

# SELinux is not configured here; if it is enabled, disable it

# Flush the iptables rules
root@node01:~# dpkg -l | grep iptables    # the iptables tools are installed by default
ii  iptables   1.6.1-2ubuntu2   amd64   administration tools for packet filtering and NAT
root@node01:~# iptables -F
root@node01:~# iptables -X
root@node01:~# iptables -Z

# Switch the apt sources to the Aliyun mirrors
root@node01:~# vim /etc/apt/sources.list
deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse

# Install a time-sync daemon
root@node01:~# apt-get update && apt-get install chrony

# Fix the timezone
root@node01:~# cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Add the Aliyun docker-ce repository
root@node01:~# apt-get -y install apt-transport-https ca-certificates curl software-properties-common
root@node01:~# curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
root@node01:~# echo "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker-ce.list

# Install docker-ce
root@node01:~# apt-get update && apt-get install docker-ce

# Add the Aliyun kubernetes repository
root@node01:~# apt-get install -y apt-transport-https
root@node01:~# curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
root@node01:~# cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

# Add host entries for every node to /etc/hosts
192.168.101.40 node01.k8s.com node01
192.168.101.41 node02.k8s.com node02
192.168.101.42 node03.k8s.com node03
```
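As an optional sanity check after the preparation above (a sketch, not part of the original procedure), the following confirms that chrony is actually syncing, the timezone took effect, and swap is really off:

```bash
# The "^*" line in the output is the currently selected time source
root@node01:~# chronyc sources -v

# Local time and timezone should match expectations
root@node01:~# timedatectl status

# The Swap line should show 0 used / 0 total
root@node01:~# free -m
```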
On Ubuntu, iptables is not managed as a service the way it is on CentOS; iptables is just an administration tool, so it is enough to make sure no rules are in effect.
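A minimal sketch for confirming that no rules are active (assuming only the default filter table is in use):

```bash
# A clean host shows only the three default chains (INPUT, FORWARD, OUTPUT)
# with policy ACCEPT and no rule lines
root@node01:~# iptables -S
root@node01:~# iptables -L -n -v
```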
Ubuntu Aliyun mirror configuration reference: https://developer.aliyun.com/mirror/ubuntu?spm=a2c6h.13651102.0.0.3e221b11HFtiVe
Docker-ce Aliyun mirror configuration reference: https://developer.aliyun.com/mirror/docker-ce?spm=a2c6h.13651102.0.0.3e221b11O3EaIz
Kubernetes Aliyun mirror configuration reference: https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.3e221b11HFtiVe
The steps above install docker-ce 19.03.12, which is newer than what kubernetes expects; the kubeadm design document notes:
```
Kubernetes system requirements:
  if running on linux:
    [error] if not Kernel 3.10+ or 4+ with specific KernelSpec.
    [error] if required cgroups subsystem aren't in set up.
  if using docker:
    [error/warning] if Docker endpoint does not exist or does not work, if docker version >17.03
    Note: starting from 1.9, kubeadm provides better support for CRI-generic functionality;
    in that case, docker specific controls are skipped or replaced by similar controls for crictl
```
For a production environment, install version 17.03.
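If you do want to pin docker-ce to a specific, older release, something along these lines works with the Aliyun repository configured above; the version string below is only an example and should be checked against `apt-cache madison` first:

```bash
# List the versions the repository actually offers
root@node01:~# apt-cache madison docker-ce

# Install and hold a specific version (the version string here is illustrative)
root@node01:~# apt-get install -y docker-ce=5:18.09.9~3-0~ubuntu-bionic
root@node01:~# apt-mark hold docker-ce
```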
```
root@node01:~# apt-get install kubelet kubeadm kubectl
...
Do you want to continue? [Y/n] y
Get:1 http://mirrors.aliyun.com/ubuntu bionic/main amd64 conntrack amd64 1:1.4.4+snapshot20161117-6ubuntu2 [30.6 kB]
Get:2 http://mirrors.aliyun.com/ubuntu bionic/main amd64 socat amd64 1.7.3.2-2ubuntu2 [342 kB]
Get:3 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 cri-tools amd64 1.13.0-01 [8775 kB]
Get:4 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubernetes-cni amd64 0.8.6-00 [25.0 MB]
Get:5 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubelet amd64 1.18.6-00 [19.4 MB]
Get:6 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubectl amd64 1.18.6-00 [8826 kB]
Get:7 https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial/main amd64 kubeadm amd64 1.18.6-00 [8167 kB]
Fetched 70.6 MB in 15s (4599 kB/s)
...
```
kubectl is the client tool for the API Server; if you will not run API Server commands on the worker nodes, you do not need to install it there.
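Because an unplanned upgrade of these packages can break the cluster, it is common (a suggestion, not part of the original steps) to pin them right after installation:

```bash
# Prevent apt from upgrading the kubernetes components during routine updates
root@node01:~# apt-mark hold kubelet kubeadm kubectl

# Release the hold later when you deliberately upgrade
# root@node01:~# apt-mark unhold kubelet kubeadm kubectl
```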
During initialization kubeadm pulls a number of images that are hosted on k8s.gcr.io, which is not reachable from mainland China. One workaround is to set up a proxy and have the docker daemon use it when pulling images; the configuration looks like this:
```
root@node01:~# vim /lib/systemd/system/docker.service
[Service]
Environment="HTTPS_PROXY=http://x.x.x.x:10080"
Environment="NO_PROXY=127.0.0.0/8,192.168.101.0/24"
...

# Restart docker
root@node01:~# systemctl daemon-reload
root@node01:~# systemctl stop docker
root@node01:~# systemctl start docker
```
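An alternative that avoids editing the packaged unit file (so the change survives docker-ce upgrades) is a systemd drop-in; this is only a sketch, and the proxy address is obviously a placeholder:

```bash
root@node01:~# mkdir -p /etc/systemd/system/docker.service.d
root@node01:~# cat <<'EOF' >/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://x.x.x.x:10080"
Environment="NO_PROXY=127.0.0.0/8,192.168.101.0/24"
EOF
root@node01:~# systemctl daemon-reload && systemctl restart docker

# Verify the environment variables were picked up
root@node01:~# systemctl show --property=Environment docker
```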
Make sure the two bridge-related kernel parameters are set to 1:
```
root@node01:~# cat /proc/sys/net/bridge/bridge-nf-call-iptables
1
root@node01:~# cat /proc/sys/net/bridge/bridge-nf-call-ip6tables
1
```
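If either value is 0, or the files are missing because the br_netfilter module is not loaded, a sketch along these lines makes the setting persistent:

```bash
# Load the module now and on every boot
root@node01:~# modprobe br_netfilter
root@node01:~# echo br_netfilter > /etc/modules-load.d/k8s.conf

# Persist the sysctl values and apply them
root@node01:~# cat <<EOF >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
root@node01:~# sysctl --system
```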
Make sure the kubelet service is enabled at boot; it stays stopped for now:
```
root@node01:~# systemctl is-enabled kubelet
enabled
```
Add a startup option for the docker daemon:
```
root@node01:~# vim /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  ....
}
```
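After changing daemon.json, docker has to be restarted for the option to take effect; a quick check (sketch) that the driver actually switched:

```bash
root@node01:~# systemctl restart docker

# Should now print "systemd" instead of "cgroupfs"
root@node01:~# docker info --format '{{.CgroupDriver}}'
```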
Without this option, `kubeadm init` prints a warning and the initialization eventually fails; the output looks like this:
```
...
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
...
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
(the two lines above repeat several times)

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.
	Here is one example how you may list all Kubernetes containers running in docker:
		- 'docker ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'docker logs CONTAINERID'
```
Initialize kubernetes
```
# Which images need to be pulled
root@node01:~# kubeadm config images list
W0725 13:02:07.511180    6409 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
k8s.gcr.io/kube-apiserver:v1.18.6
k8s.gcr.io/kube-controller-manager:v1.18.6
k8s.gcr.io/kube-scheduler:v1.18.6
k8s.gcr.io/kube-proxy:v1.18.6
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.3-0
k8s.gcr.io/coredns:1.6.7

# Pull the required images first
root@node01:~# kubeadm config images pull
W0722 16:17:21.699535    8329 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[config/images] Pulled k8s.gcr.io/kube-apiserver:v1.18.6
[config/images] Pulled k8s.gcr.io/kube-controller-manager:v1.18.6
[config/images] Pulled k8s.gcr.io/kube-scheduler:v1.18.6
[config/images] Pulled k8s.gcr.io/kube-proxy:v1.18.6
[config/images] Pulled k8s.gcr.io/pause:3.2
[config/images] Pulled k8s.gcr.io/etcd:3.4.3-0
[config/images] Pulled k8s.gcr.io/coredns:1.6.7

# Initialize the master
root@node01:~# kubeadm init --kubernetes-version=v1.18.6 --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
W0722 17:02:21.625550   25074 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.6
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [node01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.101.40]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [node01 localhost] and IPs [192.168.101.40 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [node01 localhost] and IPs [192.168.101.40 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0722 17:02:25.619105   25074 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0722 17:02:25.620260   25074 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 25.005958 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node node01 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node node01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: ri964b.aos1fa4h7y2zmu5g
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.101.40:6443 --token ri964b.aos1fa4h7y2zmu5g \
    --discovery-token-ca-cert-hash sha256:c7c8e629116b4bda1af8ad83236291f1a38ca01bb0abd8a7a8a46c286547d609
```
Note:

```
kubeadm join 192.168.101.40:6443 --token ri964b.aos1fa4h7y2zmu5g \
    --discovery-token-ca-cert-hash sha256:c7c8e629116b4bda1af8ad83236291f1a38ca01bb0abd8a7a8a46c286547d609
```

The token in this join command has a limited lifetime, 24 hours by default. If it has expired, joining a node fails with an error like "error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID...". The fix is to run `kubeadm token create --ttl 0` on the master to generate a new token ("--ttl 0" means the token never expires; decide for yourself whether you want that option). `kubeadm token list` shows the existing tokens.
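A convenient way (sketch) to get a complete, ready-to-paste join command with a fresh token is:

```bash
# Prints a full "kubeadm join ... --token ... --discovery-token-ca-cert-hash ..." line
root@node01:~# kubeadm token create --print-join-command
```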
The master node is initialized. Following the hint in the output, create a regular user to manage the kubernetes cluster:
```
root@node01:~# adduser k8s

# Grant sudo privileges
root@node01:~# visudo
# add this line
k8s ALL=(ALL) NOPASSWD:ALL

# As the k8s user
k8s@node01:~$ mkdir -p $HOME/.kube
k8s@node01:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
k8s@node01:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
k8s@node01:~$ ls -al .kube/config
-rw------- 1 k8s k8s 5454 Jul 22 17:37 .kube/config
```
At this point the cluster component status, the node status, and the running Pods all show problems:
```
# The cluster components report Unhealthy
k8s@node01:~$ sudo kubectl get componentstatus
NAME                 STATUS      MESSAGE                                                                                     ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}

# Two Pods are not running
k8s@node01:~$ sudo kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-66bff467f8-7dr57         0/1     Pending   0          85m
coredns-66bff467f8-xzf9p         0/1     Pending   0          85m
etcd-node01                      1/1     Running   0          85m
kube-apiserver-node01            1/1     Running   0          85m
kube-controller-manager-node01   1/1     Running   0          85m
kube-proxy-vlbxb                 1/1     Running   0          85m
kube-scheduler-node01            1/1     Running   0          85m

# The master node is also NotReady
k8s@node01:~$ sudo kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
node01   NotReady   master   49m   v1.18.6
```
The latter two problems will be resolved once the flannel network add-on is installed.
Install the flannel network add-on
```
k8s@node01:~$ sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created
```
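It can take a minute for the flannel image to be pulled and the DaemonSet Pods to start; a quick way (sketch, assuming the upstream manifest's `app=flannel` label) to watch it converge:

```bash
# Watch the flannel Pods until they are Running on every node
k8s@node01:~$ sudo kubectl get pods -n kube-system -l app=flannel -o wide -w

# Once flannel is up, coredns should leave Pending and the node should turn Ready
k8s@node01:~$ sudo kubectl get nodes
```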
After installation, docker's default Cgroup Driver is cgroupfs:
```
root@node03:/var/lib/kubelet# docker info | grep -i cgroup
Cgroup Driver: cgroupfs
```
The kubelet, however, defaults to the systemd Cgroup Driver, and kubelet and docker must use the same driver to work together properly. When the master was initialized we edited /etc/docker/daemon.json, passing the docker daemon an option that sets its Cgroup Driver to systemd. Alternatively, you can change the kubelet's startup arguments so that it works in cgroupfs mode: just make sure the configuration file below contains --cgroup-driver=cgroupfs.
```
$ cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2 --resolv-conf=/run/systemd/resolve/resolv.conf"
```
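If you take this route instead of changing docker, the flag has to be edited in that file and the kubelet restarted. A sketch of the steps, assuming the file currently contains --cgroup-driver=systemd:

```bash
# Confirm docker is running with cgroupfs (its default)
root@node03:~# docker info --format '{{.CgroupDriver}}'

# Switch the kubelet flag to match, then restart the kubelet so it is picked up
root@node03:~# sed -i 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/' /var/lib/kubelet/kubeadm-flags.env
root@node03:~# systemctl daemon-reload
root@node03:~# systemctl restart kubelet
```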
The worker node node03 was joined to the cluster in exactly this way.
Some commonly used kubectl commands:

Command | Purpose |
---|---|
kubectl get componentstatus (short form: kubectl get cs) | show the health of the cluster components |
kubectl get nodes | list the cluster nodes |
kubectl get pods -n kube-system | list the Pods running in the "kube-system" namespace |
kubectl get ns | list the cluster's namespaces |
kubectl get deployment -w | watch deployment information in real time |
kubectl describe node NODENAME | show detailed information about one node |
kubectl cluster-info | show cluster information |
kubectl get services (short form: kubectl get svc) | list services |
kubectl get pods --show-labels | show pods together with their labels |
kubectl edit svc SERVICE_NAME | edit a running service's definition |
kubectl describe deployment DEPLOYMENT_NAME | show detailed information about the given deployment |
Next, join the worker nodes; on node02:

```
# Install the required components
root@node02:~# apt-get update && apt-get -y install kubelet kubeadm

# Copy /etc/docker/daemon.json from the master node -- the important part is
# "exec-opts": ["native.cgroupdriver=systemd"], otherwise the kubelet will not start.
# Restart docker after changing the file.

# Enable the services at boot
root@node02:~# systemctl enable docker kubelet

# Join the cluster
root@node02:~# kubeadm join 192.168.101.40:6443 --token ri964b.aos1fa4h7y2zmu5g --discovery-token-ca-cert-hash sha256:c7c8e629116b4bda1af8ad83236291f1a38ca01bb0abd8a7a8a46c286547d609
W0722 18:42:58.676548   25113 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
```
Back on the master node, check the status again:
```
# The nodes are now Ready
k8s@node01:~$ sudo kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
node01   Ready    master   117m   v1.18.6
node02   Ready    <none>   24m    v1.18.6

# All Pods are running
k8s@node01:~$ sudo kubectl get pods -n kube-system -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE     NOMINATED NODE   READINESS GATES
coredns-66bff467f8-7dr57         1/1     Running   0          116m    10.244.1.3       node02   <none>           <none>
coredns-66bff467f8-xzf9p         1/1     Running   0          116m    10.244.1.2       node02   <none>           <none>
etcd-node01                      1/1     Running   0          116m    192.168.101.40   node01   <none>           <none>
kube-apiserver-node01            1/1     Running   0          116m    192.168.101.40   node01   <none>           <none>
kube-controller-manager-node01   1/1     Running   0          116m    192.168.101.40   node01   <none>           <none>
kube-flannel-ds-amd64-djjs7      1/1     Running   0          6m35s   192.168.101.41   node02   <none>           <none>
kube-flannel-ds-amd64-hthnk      1/1     Running   0          6m35s   192.168.101.40   node01   <none>           <none>
kube-proxy-r2v2p                 1/1     Running   0          23m     192.168.101.41   node02   <none>           <none>
kube-proxy-vlbxb                 1/1     Running   0          116m    192.168.101.40   node01   <none>           <none>
kube-scheduler-node01            1/1     Running   0          116m    192.168.101.40   node01   <none>           <none>
```
Join node03 the same way; the final cluster state looks like this:
```
k8s@node01:~$ sudo kubectl get nodes
NAME     STATUS   ROLES    AGE    VERSION
node01   Ready    master   124m   v1.18.6
node02   Ready    <none>   31m    v1.18.6
node03   Ready    <none>   47s    v1.18.6
```
To remove a node from the cluster, perform the following steps in order:
```
k8s@node01:~$ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
node01   Ready    master   12d   v1.18.6
node02   Ready    <none>   12d   v1.18.6
node03   Ready    <none>   12d   v1.18.6
node04   Ready    <none>   24h   v1.18.6

# node04 is the node to remove.
# Drain the Pods off node04 first; DaemonSet Pods do not need to be migrated.
k8s@node01:~$ kubectl drain node04 --delete-local-data --force --ignore-daemonsets
node/node04 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/canal-ggt5n, kube-system/kube-flannel-ds-amd64-xhksw, kube-system/kube-proxy-g9rpd
node/node04 drained
k8s@node01:~$ kubectl get nodes
NAME     STATUS                     ROLES    AGE   VERSION
node01   Ready                      master   12d   v1.18.6
node02   Ready                      <none>   12d   v1.18.6
node03   Ready                      <none>   12d   v1.18.6
node04   Ready,SchedulingDisabled   <none>   24h   v1.18.6
k8s@node01:~$ kubectl delete nodes node04
node "node04" deleted
k8s@node01:~$ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
node01   Ready    master   12d   v1.18.6
node02   Ready    <none>   12d   v1.18.6
node03   Ready    <none>   12d   v1.18.6

# Then, on node04 itself, run
root@node04:~# kubeadm reset
```
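`kubeadm reset` does not clean up everything; its own output says it leaves CNI configuration, iptables rules, and kubeconfig files behind. A rough cleanup sketch (paths depend on the CNI plugin in use) for the removed node:

```bash
# On the removed node, after kubeadm reset
root@node04:~# rm -rf /etc/cni/net.d                               # leftover CNI configuration
root@node04:~# iptables -F && iptables -t nat -F && iptables -X    # rules created by kube-proxy/CNI
root@node04:~# rm -rf $HOME/.kube/config                           # stale kubeconfig, if one was copied here
```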
After the master initialization, the following two components still report Unhealthy:
```
k8s@node01:~$ sudo kubectl get cs
NAME                 STATUS      MESSAGE                                                                                     ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
etcd-0               Healthy     {"health":"true"}
```
Search results suggest this happens when controller-manager and scheduler run on a different node than the one where `kubectl get cs` is executed, which would explain why the call to http://127.0.0.1:10252 fails. In my case, however, `kubectl get cs` is run on node01, the same node where both controller-manager and scheduler run. Either way, testing shows the condition does not affect use of the cluster.
Troubleshooting approach:
First confirm that the master really is not listening on ports 10251 and 10252 (a quick check is sketched below).
Then check whether the two components' Pods are running.
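A minimal sketch of that first check:

```bash
# Nothing should show up for 10251 (scheduler) or 10252 (controller-manager);
# no output means nothing is listening on those ports
k8s@node01:~$ sudo ss -tnlp | grep -E '10251|10252'
```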
```
k8s@node01:~$ sudo kubectl get pods -n kube-system -o wide | grep 'scheduler\|controller-manager'
kube-controller-manager-node01   1/1   Running   1   7m42s   192.168.101.40   node01   <none>   <none>
kube-scheduler-node01            1/1   Running   0   6h32m   192.168.101.40   node01   <none>   <none>
```
Both components are running normally. So their Pods really are running without listening on the corresponding ports, which means the configuration used to run them needs to change. The master initialization output shows that the static Pod manifests for the components were created under /etc/kubernetes/manifests, so that is the place to start:
```
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0722 17:02:25.619105   25074 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
```
```
k8s@node01:~$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
```
Edit the manifest files and remove the --port=0 line; back up the files before modifying them.
Note: when backing up the manifests, do not leave the backup copies in the same directory, i.e. /etc/kubernetes/manifests; move them to another directory, or create a subdirectory such as /etc/kubernetes/manifests/bak. The kubelet treats every file in that directory as a static Pod manifest, so a stale backup keeps the old definition running. Otherwise, after the steps below, the master will still not listen on ports 10251 and 10252 and the components will never return to the Healthy state.
```
k8s@node01:~$ sudo vim /etc/kubernetes/manifests/kube-controller-manager.yaml
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --node-cidr-mask-size=24
    - --port=0                     ########## delete this line ##########
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --use-service-account-credentials=true

k8s@node01:~$ sudo vim /etc/kubernetes/manifests/kube-scheduler.yaml
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --port=0                     ########## delete this line ##########

# Restart the kubelet service
k8s@node01:~$ sudo systemctl restart kubelet

# Check the listening ports and the component status
k8s@node01:~$ sudo ss -tanlp | grep '10251\|10252'
LISTEN   0   128   *:10251   *:*   users:(("kube-scheduler",pid=51054,fd=5))
LISTEN   0   128   *:10252   *:*   users:(("kube-controller",pid=51100,fd=5))
k8s@node01:~$ sudo kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
```
That completes the installation of a single-master kubernetes cluster. For a master HA setup, see the official documentation: https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/high-availability/