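Today, while installing a Kubernetes v1.13.1 cluster with kubeadm, I found that one machine would never install successfully; it kept failing when starting kubelet, with the following error:
Symptom: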
[root@master taoweizhong]# systemctl status kubelet -l
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2019-07-31 05:37:08 PDT; 2min 8s ago
Docs: https://kubernetes.io/docs/
Main PID: 6161 (kubelet)
Tasks: 16
Memory: 82.0M
CGroup: /system.slice/kubelet.service
└─6161 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --fail-swap-on=false
Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.008683 6161 kubelet.go:2248] node "master" not found
Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.109488 6161 kubelet.go:2248] node "master" not found
Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.210368 6161 kubelet.go:2248] node "master" not found
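The master IP set when kubelet was initialized was wrong, so kubelet could not connect to the API server on the master. Checking the kubelet.conf configuration file showed that the entry server: https://192.168.135.139:6443 did not point at the current machine's IP (this happened because the machine uses a dynamic IP):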
[root@master taoweizhong]# cat /etc/kubernetes/kubelet.conf
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <omitted>
    server: https://192.168.135.143:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:master
  name: system:node:master@kubernetes
current-context: system:node:master@kubernetes
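To check quickly which address a kubeconfig file points at, a small helper along these lines may be useful. It is only a sketch: the function name kubeconfig_server_host is made up for illustration, and it assumes the file contains a plain "server: https://host:port" line as in the output above.

```shell
#!/usr/bin/env bash
# Print the host part of the first "server:" entry in a kubeconfig file.
# kubeconfig_server_host is a hypothetical helper name, not part of kubeadm.
kubeconfig_server_host() {
  local file="$1"
  # Match e.g. "    server: https://192.168.135.143:6443" and keep only the host.
  sed -n -E 's#^[[:space:]]*server:[[:space:]]*https?://([^:/[:space:]]+).*#\1#p' "$file" | head -n 1
}
```

For example, kubeconfig_server_host /etc/kubernetes/kubelet.conf prints the configured host, which can then be compared by eye with the addresses shown by ip -4 addr.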
Set the correct IP address in the following files:
[root@master kubernetes]# vim admin.conf
[root@master kubernetes]# vim controller-manager.conf
[root@master kubernetes]# vim kubelet.conf
[root@master kubernetes]# vim scheduler.conf
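Editing four files by hand is easy to get wrong, so the same change can be sketched with sed. This is a minimal sketch, not the author's method: replace_server_ip and its arguments are hypothetical names, and it assumes every file contains a "server: https://OLD_IP:port" line.

```shell
#!/usr/bin/env bash
# Replace an old API server IP with a new one in the kubeconfig files listed above.
# replace_server_ip, dir, old_ip and new_ip are hypothetical names for illustration.
replace_server_ip() {
  local dir="$1" old_ip="$2" new_ip="$3" f old_re
  # Escape the dots so sed treats the old IP literally.
  old_re="$(printf '%s' "$old_ip" | sed 's/\./\\./g')"
  for f in admin.conf controller-manager.conf kubelet.conf scheduler.conf; do
    [ -f "$dir/$f" ] || continue
    # Keep a .bak copy of each file before editing it in place.
    sed -i.bak -E "s#(server:[[:space:]]*https://)${old_re}:#\1${new_ip}:#" "$dir/$f"
  done
}
```

With the addresses from this post the call would be replace_server_ip /etc/kubernetes 192.168.135.139 192.168.135.143, followed by systemctl restart kubelet so that the new address is picked up.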
Locating an etcd startup error
While configuring the flannel network for Docker, setting the etcd key produced the following error:
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
; error #1: dial tcp 127.0.0.1:2379: getsockopt: connection refused
error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
error #1: dial tcp 127.0.0.1:2379: getsockopt: connection refused
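Solution:
Edit the etcd configuration file:
vim /etc/etcd/etcd.conf
On line 6, append http://127.0.0.1:2379 so that etcd also accepts client connections from the local machine itself: ETCD_LISTEN_CLIENT_URLS="http://192.168.135.143:2379,http://127.0.0.1:2379"
Then restart the etcd service.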
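The edit described above can also be scripted. The sketch below uses a hypothetical helper name (add_loopback_client_url) and assumes the ETCD_LISTEN_CLIENT_URLS value is non-empty and enclosed in double quotes, as in the example line.

```shell
#!/usr/bin/env bash
# Append the loopback client URL to ETCD_LISTEN_CLIENT_URLS if it is not already listed.
# add_loopback_client_url is a hypothetical helper name.
add_loopback_client_url() {
  local conf="$1" url="http://127.0.0.1:2379"
  # Nothing to do when the loopback URL is already present on that line.
  if grep -E '^ETCD_LISTEN_CLIENT_URLS=' "$conf" | grep -qF "$url"; then
    return 0
  fi
  # Insert ",<url>" just before the closing quote of the value; keep a .bak copy.
  sed -i.bak -E "s#^(ETCD_LISTEN_CLIENT_URLS=\"[^\"]*)\"#\1,${url}\"#" "$conf"
}
```

A typical call would be add_loopback_client_url /etc/etcd/etcd.conf followed by systemctl restart etcd. Running it a second time changes nothing, because the URL is detected first.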
Jul 31 05:59:03 master kubelet[22561]: W0731 05:59:03.364199 22561 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 31 05:59:04 master kubelet[22561]: E0731 05:59:04.542692 22561 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
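When deploying flannel as the network plugin for k8s, the YAML files found online are all much the same.
One detail needs attention, though.
Previously, only the toleration for the master taint was needed.
Now a toleration for the not-ready state is needed as well.
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoSchedule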
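Before applying a flannel manifest it may be worth checking whether it already carries the not-ready toleration. The check below is a rough sketch: has_not_ready_toleration is a hypothetical name, and it only looks for the key as text without parsing the YAML.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the manifest mentions the not-ready toleration key.
# has_not_ready_toleration is a hypothetical helper name; this is a text match, not a YAML parse.
has_not_ready_toleration() {
  grep -qE 'key:[[:space:]]*node\.kubernetes\.io/not-ready' "$1"
}
```

For example: has_not_ready_toleration kube-flannel.yml || echo "not-ready toleration missing".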
[root@master taoweizhong]# kubectl get cs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
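Cause: kubectl on this machine has not been pointed at the Kubernetes master; this was not set up when the cluster was initialized.
Solution: run the following command: export KUBECONFIG=/etc/kubernetes/admin.conf
The /etc/kubernetes/admin.conf file is generated when the cluster is initialized and holds the API server address and the admin credentials that kubectl uses.
Running kubectl on a Kubernetes worker node produced the following error: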
# kubectl get pod
The connection to the server localhost:8080 was refused - did you specify the right host or port?
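The cause is that kubectl has to run with the kubernetes-admin credentials. To fix it, copy /etc/kubernetes/admin.conf from the master node to the same directory on the worker node, then set the environment variable: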
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
Make it take effect in the current shell:
source ~/.bash_profile
[root@master kubernetes]# scp admin.conf root@192.168.135.130:/home/taoweizhong
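kubectl get node can now be used to see how many nodes there are. To use kubectl on a worker node, copy /etc/kubernetes/admin.conf from k8s-master to that node and run export KUBECONFIG=/etc/kubernetes/admin.conf there. This was already mentioned during initialization; the file can be copied with scp.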
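Running the echo command above more than once leaves duplicate lines in ~/.bash_profile. A small guard avoids that. This is a sketch only; ensure_kubeconfig_export is a hypothetical name, and the profile path is a parameter so the function can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# Add the KUBECONFIG export to a profile file only if the exact line is not there yet.
# ensure_kubeconfig_export is a hypothetical helper name.
ensure_kubeconfig_export() {
  local profile="${1:-$HOME/.bash_profile}"
  local line='export KUBECONFIG=/etc/kubernetes/admin.conf'
  # Create the file if it does not exist, then append the line once.
  touch "$profile"
  grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
}
```

Note that the path in KUBECONFIG has to match the place where admin.conf actually ends up on the worker node. If the file is copied to a home directory, as in the scp command above, it either needs to be moved to /etc/kubernetes or the export line adjusted.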
When installing software with yum on Linux, yum reports that it is locked:
Another app is currently holding the yum lock; waiting for it to exit...
The lock can be released by force by deleting yum's PID file:
# rm -f /var/run/yum.pid
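Deleting the PID file while another yum is still working can leave two package operations running at once, so a slightly more careful variant first checks whether the recorded process is alive. This is a sketch under assumptions: clear_stale_yum_lock is a hypothetical name, the PID file is assumed to contain just a process ID, and kill -0 is assumed to be permitted for that process (for example, when run as root).

```shell
#!/usr/bin/env bash
# Remove the yum PID file only when the process recorded in it is no longer running.
# clear_stale_yum_lock is a hypothetical helper name.
clear_stale_yum_lock() {
  local pidfile="${1:-/var/run/yum.pid}" pid
  # No lock file means there is nothing to clean up.
  [ -f "$pidfile" ] || return 0
  # Keep only the digits from the file content.
  pid="$(tr -cd '0-9' < "$pidfile")"
  # kill -0 sends no signal; it only tests whether the process exists.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "yum lock is held by running process $pid; not removing $pidfile" >&2
    return 1
  fi
  rm -f "$pidfile"
}
```

If the function reports a running process, waiting for it to finish (or inspecting it with ps) is usually safer than removing the lock.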
[root@slave2 taoweizhong]# kubectl apply -f kube-flannel.yml
unable to recognize "kube-flannel.yml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2019-08-01 07:53:40 PDT; 6s ago
Docs: https://kubernetes.io/docs/
Process: 23003 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 23003 (code=exited, status=255)
Aug 01 07:53:40 slave2 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Aug 01 07:53:40 slave2 systemd[1]: Unit kubelet.service entered failed state.
Aug 01 07:53:40 slave2 systemd[1]: kubelet.service failed.
[root@slave2 taoweizhong]#
Aug 02 06:13:59 slave2 kubelet[15376]: E0802 06:13:59.567319 15376 pod_workers.go:190] Error syncing pod 6286750b-83ea-4c93-a895-f03a7d3ac8f6 ("kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)"), skipping: failed to "CreatePodSandbox" for "kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"kube-proxy-6m2hd\": Error response from daemon: error creating overlay mount to /var/lib/docker/overlay2/458f364c092810a4ce67b80279af2f9de926d5caf0d639c46130e6876b2aca59-init/merged: no such file or directory"
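After some searching online, one workable fix is the following (change the storage driver type and disable SELinux):
Stop the docker service:
systemctl stop docker
Clean up the images: rm -rf /var/lib/docker
Change the storage driver: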
vi /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver overlay"
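The storage option can also be set without opening an editor. The sketch below is illustrative only: set_docker_storage_driver is a hypothetical name, and it assumes the driver name is a simple word such as overlay.

```shell
#!/usr/bin/env bash
# Set DOCKER_STORAGE_OPTIONS in a docker-storage style file to use the given driver.
# set_docker_storage_driver is a hypothetical helper name.
set_docker_storage_driver() {
  local file="$1" driver="$2"
  local line="DOCKER_STORAGE_OPTIONS=\"--storage-driver ${driver}\""
  if grep -q '^DOCKER_STORAGE_OPTIONS=' "$file" 2>/dev/null; then
    # Replace the existing assignment, keeping a .bak copy of the file.
    sed -i.bak "s#^DOCKER_STORAGE_OPTIONS=.*#${line}#" "$file"
  else
    # No assignment yet, so append one.
    echo "$line" >> "$file"
  fi
}
```

For the case described here the call would be set_docker_storage_driver /etc/sysconfig/docker-storage overlay, after which docker has to be started again with systemctl start docker.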
[ERROR Swap]: running with swap on is not supported. Please disable swap
[ERROR SystemVerification]: missing cgroups: memory
[ERROR ImagePull]: failed to pull image [k8s.gcr.io/kube-apiserver-amd64:v1.12.2]
Using swapoff -a is not recommended.
After commenting out the swap mount in /etc/fstab, the installation succeeded.
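Commenting out the swap entry can be done with sed. This is a sketch, assuming a standard fstab layout in which the third whitespace-separated field is the filesystem type; comment_out_swap is a hypothetical name.

```shell
#!/usr/bin/env bash
# Comment out every active fstab entry whose filesystem type (third field) is "swap".
# comment_out_swap is a hypothetical helper name.
comment_out_swap() {
  local fstab="${1:-/etc/fstab}"
  # Skip lines that are already comments; prefix "#" to swap entries; keep a .bak copy.
  sed -i.bak -E '/^[[:space:]]*#/! s/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Trying it on a copy first (for example, cp /etc/fstab /tmp/fstab.test and then comment_out_swap /tmp/fstab.test) is a cheap way to confirm that only the swap line changes.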
Reset the Kubernetes services and the network by deleting the network configuration and links:
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker