Because of business requirements I have been studying Kubernetes recently, so the first step was to deploy a cluster. Even following the official documentation I ran into a few pitfalls, so I put together this record of the problems I hit along the way. This article covers deploying Kubernetes 1.18.2 + Dashboard + Metrics Server + Ingress on CentOS 7.
1, Kubernetes version: 1.18.2
2, Docker CE version: 19.03.8-3
3, All five hosts run CentOS 7 with kernel 3.10.0-957
4, Five hosts are used for the deployment; host list:
172.18.2.175 master1
172.18.2.180 master2
172.18.2.181 master3
172.18.2.186 work1
172.18.2.187 work2
172.18.2.182 apiserver-lb
1, etcd is an open-source, highly available, strongly consistent distributed key-value store written in Go. It can be used for configuration sharing and for service registration and discovery, and every node in the etcd cluster can serve requests (a quick health-check sketch follows this list).
2, Kubernetes components only communicate through the API server; they do not talk to each other directly. The API server is the only component that talks to etcd — all other components change cluster state by going through the API server.
3, controller-manager and scheduler watch the API server; when something changes there, they perform the corresponding actions.
4, Since every component needs to talk to the API server, by default each component is pointed at the IP of a single API server. To make the API service highly available we deploy a separate highly available load-balancing service with a VIP whose backends are the three API servers; the load-balancing layer does the forwarding and the health checks of the API servers.
5, By default, the components on a master node only talk to the API server or etcd on that same node.
6, A highly available control plane needs at least 3 master nodes; the official recommendation is to scale out according to cluster size.
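Once the cluster described below is up, a quick way to sanity-check etcd is to run etcdctl inside one of the etcd pods. This is only a minimal sketch; the pod name and certificate paths are the kubeadm defaults:

# kubectl -n kube-system exec etcd-master1 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health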
1, Make sure the time zone and time on every machine are correct; if not, run the following:
# rm -rf /etc/localtime; ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# /usr/sbin/ntpdate -u ntp.ubuntu.com cn.pool.ntp.org; clock -w
# echo "*/30 * * * * /usr/sbin/ntpdate -u ntp.ubuntu.com cn.pool.ntp.org;clock -w" >> /var/spool/cron/root; chmod 600 /var/spool/cron/root
2, Set the hostname on every machine
hostnamectl set-hostname <hostname>
3, On every machine add hostname-to-IP mappings for all machines; some services talk to each other by hostname, for example the metrics server fetches node status by hostname
# cat << EOF >> /etc/hosts
172.18.2.175 master1
172.18.2.180 master2
172.18.2.181 master3
172.18.2.186 work1
172.18.2.187 work2
EOF
4, Make sure the MAC address of every machine is unique
# ip addr
5, Make sure the product_uuid of every machine is unique
# cat /sys/class/dmi/id/product_uuid
6, Disable swap on every machine
# swapoff -a
# sed -i.bak '/ swap /s/^/#/' /etc/fstab
7, Since v1.2 kube-proxy uses iptables by default to implement proxying, and the bridge-netfilter setting is what lets iptables filter bridged traffic. If containers are attached to a bridge, bridge-nf-call-iptables must be set to 1 so that iptables can see the bridged traffic and kube-proxy works correctly; by default iptables does not filter bridged traffic.
# lsmod | grep br_netfilter
# modprobe br_netfilter
Note: on older kernels modprobe may fail because the module cannot be found; upgrading the kernel fixes this.
# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# sysctl --system
8, Disable firewalld and SELinux on every machine
# systemctl disable --now firewalld
# setenforce 0
# sed -i 's/SELINUX=enforcing/SELINUX=permissive/g' /etc/selinux/config
9, Add the Aliyun Kubernetes repo and the official Docker yum repo on every machine
# cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# yum install -y yum-utils
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
10, Each machine needs the ports for its role to be open.

Master nodes:

| Protocol | Direction | Port range | Purpose | Used by |
|---|---|---|---|---|
| TCP | Inbound | 6443 | Kubernetes API server | All components |
| TCP | Inbound | 2379-2380 | etcd server client API | kube-apiserver, etcd |
| TCP | Inbound | 10250 | Kubelet API | kubelet itself, control-plane components |
| TCP | Inbound | 10251 | kube-scheduler | kube-scheduler itself |
| TCP | Inbound | 10252 | kube-controller-manager | kube-controller-manager itself |

Worker nodes:

| Protocol | Direction | Port range | Purpose | Used by |
|---|---|---|---|---|
| TCP | Inbound | 10250 | Kubelet API | kubelet itself, control-plane components |
| TCP | Inbound | 30000-32767 | NodePort services | All components |
11, Install Docker on every machine
# yum install docker-ce -y
# systemctl enable --now docker
12, Install kubeadm, kubelet and kubectl on every machine
kubeadm: the command used to initialize the cluster.
kubelet: runs on every node in the cluster and starts pods, containers, etc.
kubectl: the command-line tool for talking to the cluster.
# yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
# systemctl enable --now kubelet
1, Create an HA load balancer for the API servers, listening on port 6443/TCP
2, Configure its backends: 172.18.2.175:6443, 172.18.2.180:6443, 172.18.2.181:6443
3, Enable source-address session persistence
4, After configuration the HA load balancer VIP is 172.18.2.182 (a minimal sketch of such a setup follows this list)
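Building the load balancer itself is outside the scope of this article. As a reference, below is a minimal sketch of an equivalent HAProxy configuration; HAProxy is only one possible choice, and the bind address assumes the VIP 172.18.2.182 lives on the load-balancer hosts:

# cat /etc/haproxy/haproxy.cfg
defaults
    mode    tcp                 # pass TCP through; TLS is terminated by the API servers
    timeout connect 5s
    timeout client  1h          # long timeouts for apiserver watch connections
    timeout server  1h

frontend k8s-apiserver
    bind 172.18.2.182:6443      # the VIP
    default_backend k8s-masters

backend k8s-masters
    balance source              # source-address session persistence
    option tcp-check            # simple TCP health check of each API server
    server master1 172.18.2.175:6443 check
    server master2 172.18.2.180:6443 check
    server master3 172.18.2.181:6443 check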
1, Run the init command on master1
# kubeadm init --kubernetes-version 1.18.2 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --control-plane-endpoint apiserver-lb:6443 --upload-certs
W0513 07:18:48.318511   30399 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.2
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
The output above contains a warning, which we fix in step 2.
2, Change Docker's cgroup driver to systemd, as recommended by Kubernetes, and use the Aliyun registry mirror. The official Docker registry is hosted abroad and is slow to reach from China; Aliyun provides a mirror that speeds up pulling the official images. Make the following change on every machine.
# vim /etc/docker/daemon.json
{
  "registry-mirrors": ["https://v16stybc.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
# systemctl daemon-reload
# systemctl restart docker
3, Run the init command on master1 again
# kubeadm init --kubernetes-version 1.18.2 --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --control-plane-endpoint apiserver-lb:6443 --upload-certs
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join apiserver-lb:6443 --token i7ffha.cbp9wse6jhy4uz2q \
    --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d \
    --control-plane --certificate-key e6d08e338ee5e0178a85c01067e223d2a00b5ac0e452bca58561976cf2187dd5

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join apiserver-lb:6443 --token i7ffha.cbp9wse6jhy4uz2q \
    --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d
The output above already provides the commands for joining the other masters and the worker nodes (the token has an expiry time, 2h by default; after it expires these commands stop working and a new token has to be generated manually). However, they must only be run once all services on master1 are ready — see the following steps.
Explanation of the command options:
--image-repository: by default kubeadm pulls the control-plane images from k8s.gcr.io, which is not reachable from China, so we point it at the Aliyun mirror.
--control-plane-endpoint: the domain name mapped to the VIP, plus the port.
--upload-certs: upload the certificates shared between masters to the cluster.
4, Following the hint in the output of step 3, run the following on master1
# mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
We use Calico as the CNI (Container Network Interface) for pod-to-pod communication. Edit the following field in calico.yaml so that Calico's IPv4 address pool matches the Kubernetes service CIDR.
# wget https://docs.projectcalico.org/v3.14/manifests/calico.yaml
# vim calico.yaml
            - name: CALICO_IPV4POOL_CIDR
              value: "10.96.0.0/12"
# kubectl apply -f calico.yaml
5, After about 10 minutes, run the following on master1 and check that all pods are in the Running state before moving on to the next steps
# kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE    IP             NODE      NOMINATED NODE   READINESS GATES
kube-system   calico-kube-controllers-789f6df884-66bf8   1/1     Running   0          75s    10.97.40.67    master1   <none>           <none>
kube-system   calico-node-65dks                          1/1     Running   0          75s    172.18.2.175   master1   <none>           <none>
kube-system   coredns-546565776c-wwdmq                   1/1     Running   0          115s   10.97.40.65    master1   <none>           <none>
kube-system   coredns-546565776c-z66mm                   1/1     Running   0          115s   10.97.40.66    master1   <none>           <none>
kube-system   etcd-master1                               1/1     Running   0          116s   172.18.2.175   master1   <none>           <none>
kube-system   kube-apiserver-master1                     1/1     Running   0          116s   172.18.2.175   master1   <none>           <none>
kube-system   kube-controller-manager-master1            1/1     Running   0          116s   172.18.2.175   master1   <none>           <none>
kube-system   kube-proxy-ghc7q                           1/1     Running   0          115s   172.18.2.175   master1   <none>           <none>
kube-system   kube-scheduler-master1                     1/1     Running   0          116s   172.18.2.175   master1   <none>           <none>
6, If something goes wrong during initialization, run the following and then initialize again
# kubeadm reset
# rm -rf $HOME/.kube/config
7, On a master, verify that the API server can be reached through the load balancer (the load balancer must be configured correctly first)
# curl https://apiserver-lb:6443/version -k
{
  "major": "1",
  "minor": "18",
  "gitVersion": "v1.18.2",
  "gitCommit": "52c56ce7a8272c798dbc29846288d7cd9fbae032",
  "gitTreeState": "clean",
  "buildDate": "2020-04-16T11:48:36Z",
  "goVersion": "go1.13.9",
  "compiler": "gc",
  "platform": "linux/amd64"
}
8, If less than 2h have passed since master1 was initialized, run the following on master2 and master3 to initialize them
# kubeadm join apiserver-lb:6443 --token i7ffha.cbp9wse6jhy4uz2q \
    --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d \
    --control-plane --certificate-key e6d08e338ee5e0178a85c01067e223d2a00b5ac0e452bca58561976cf2187dd5
After master2 and master3 have been initialized, check the node status:
# kubectl get nodes
NAME      STATUS   ROLES    AGE    VERSION
master1   Ready    master   3h7m   v1.18.2
master2   Ready    master   169m   v1.18.2
master3   Ready    master   118m   v1.18.2
9, If more than 2h have passed since master1 was initialized, the token has expired; regenerate the token and certificate key on master1 and then initialize master2 and master3.
Regenerate the token and certificate key on master1:
# kubeadm init phase upload-certs --upload-certs
W0514 13:22:23.433664     656 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key: b55acff8cd105fe152c7de6e49372f9ccde71fc74bdf6ec22a08feaf9f00eba4
# kubeadm token create --print-join-command
W0514 13:22:41.748101     955 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join apiserver-lb:6443 --token 1iznqy.ulvp986lej4zcace --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d
The new command to initialize master2 and master3 is:
# kubeadm join apiserver-lb:6443 --token 1iznqy.ulvp986lej4zcace --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d --control-plane --certificate-key b55acff8cd105fe152c7de6e49372f9ccde71fc74bdf6ec22a08feaf9f00eba4
The new command to initialize the worker nodes is:
# kubeadm join apiserver-lb:6443 --token 1iznqy.ulvp986lej4zcace --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d
1, Run the following join command on work1 and work2 to initialize them as worker nodes
# kubeadm join apiserver-lb:6443 --token 1iznqy.ulvp986lej4zcace --discovery-token-ca-cert-hash sha256:1f084d1ac878308635f1dbe8676bac33fe3df6d52fa212834787a0bc71f1db6d
2, To re-initialize or remove a worker node, follow these steps.
On the worker node to be re-initialized, run:
# kubeadm reset
On a master, run:
# kubectl delete node work1
# kubectl delete node work2
3, On master1, check that all master and worker nodes are running normally
# kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master1   Ready    master   4h31m   v1.18.2
master2   Ready    master   4h13m   v1.18.2
master3   Ready    master   3h22m   v1.18.2
work1     Ready    <none>   82m     v1.18.2
work2     Ready    <none>   81m     v1.18.2
# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-789f6df884-vdz42   1/1     Running   1          4h37m
kube-system   calico-node-429s9                          1/1     Running   1          89m
kube-system   calico-node-4cmwj                          1/1     Running   1          4h37m
kube-system   calico-node-bhw9s                          1/1     Running   1          89m
kube-system   calico-node-rw752                          1/1     Running   1          3h29m
kube-system   calico-node-xcqp8                          1/1     Running   1          4h21m
kube-system   coredns-546565776c-jjlsm                   1/1     Running   1          4h38m
kube-system   coredns-546565776c-ztglq                   1/1     Running   1          4h38m
kube-system   etcd-master1                               1/1     Running   2          4h38m
kube-system   etcd-master2                               1/1     Running   2          4h20m
kube-system   etcd-master3                               1/1     Running   1          3h29m
kube-system   kube-apiserver-master1                     1/1     Running   1          4h38m
kube-system   kube-apiserver-master2                     1/1     Running   2          4h20m
kube-system   kube-apiserver-master3                     1/1     Running   1          3h29m
kube-system   kube-controller-manager-master1            1/1     Running   2          4h38m
kube-system   kube-controller-manager-master2            1/1     Running   1          4h20m
kube-system   kube-controller-manager-master3            1/1     Running   1          3h29m
kube-system   kube-proxy-5lf4b                           1/1     Running   1          89m
kube-system   kube-proxy-dwh7w                           1/1     Running   1          4h38m
kube-system   kube-proxy-nndpn                           1/1     Running   1          89m
kube-system   kube-proxy-spclw                           1/1     Running   1          4h21m
kube-system   kube-proxy-zc25r                           1/1     Running   1          3h29m
kube-system   kube-scheduler-master1                     1/1     Running   2          4h38m
kube-system   kube-scheduler-master2                     1/1     Running   2          4h20m
kube-system   kube-scheduler-master3                     1/1     Running   1          3h29m
The Dashboard can deploy containerized applications to the Kubernetes cluster, troubleshoot them, and manage cluster resources. You can use the Dashboard to get an overview of the applications running in the cluster and to create or modify Kubernetes resources (Deployments, Jobs, DaemonSets and so on). For example, you can scale a Deployment, trigger a rolling update, restart a pod, or use the wizard to deploy a new application.
The Dashboard also shows the state of the resources in the cluster and any errors.
1, Install the Dashboard on master1
Download the manifests:
# wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
Expose the dashboard via NodePort.
Change the following section of recommended.yaml

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 443
      targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard

to

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30008
  selector:
    k8s-app: kubernetes-dashboard
Deploy the dashboard
# kubectl apply -f recommended.yaml
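Before opening the browser, it can be worth checking that the dashboard pods are running and that the NodePort service was created as expected (a quick sanity check; the output will differ per cluster):

# kubectl get pods,svc -n kubernetes-dashboard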
2, Open https://172.18.2.175:30008 in Firefox (Chrome and Safari enforce stricter security policies and refuse the self-signed certificate; Firefox lets you add an exception).
Get the token for logging in to the dashboard
# kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep dashboard-admin | awk '{print $1}')
3, After logging in we find that no namespace can be selected; check the log of the kubernetes-dashboard pod to find the cause
# kubectl logs -f kubernetes-dashboard-7b544877d5-225rk -n kubernetes-dashboard
2020/05/14 08:21:35 Getting list of all pet sets in the cluster
2020/05/14 08:21:35 Non-critical error occurred during resource retrieval: pods is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "pods" in API group "" in the namespace "default"
2020/05/14 08:21:35 Non-critical error occurred during resource retrieval: events is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "events" in API group "" in the namespace "default"
2020/05/14 08:21:35 [2020-05-14T08:21:35Z] Outcoming response to 10.97.40.64:58540 with 200 status code
2020/05/14 08:21:35 Non-critical error occurred during resource retrieval: statefulsets.apps is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "statefulsets" in API group "apps" in the namespace "default"
2020/05/14 08:21:35 Non-critical error occurred during resource retrieval: pods is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "pods" in API group "" in the namespace "default"
2020/05/14 08:21:35 Non-critical error occurred during resource retrieval: events is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "events" in API group "" in the namespace "default"
The log above shows that the dashboard does not have permission to access other namespaces and the related resources; we fix it by adjusting RBAC:
# vim r.yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
rules:
  # Allow Metrics Scraper to get metrics from the Metrics server
  - apiGroups: ["", "apps", "batch", "extensions", "metrics.k8s.io"]
    resources: ["*"]
    verbs: ["get", "list", "watch"]
# kubectl apply -f r.yaml
4, Refresh the dashboard and the data is now displayed correctly.
Since Kubernetes 1.8 the heapster project has been officially deprecated. To treat core resource monitoring as a first-class citizen, resource usage metrics such as container CPU and memory usage are available in Kubernetes through the Metrics API. These metrics can be read directly by users, for example with the kubectl top command, or consumed by controllers in the cluster, such as the Horizontal Pod Autoscaler, to make decisions. There are two main parts:
1,Metrics API
Through the Metrics API you can get the amount of resources currently used by a given node or pod. The API does not store metric values, so it is not possible to, say, retrieve the resource usage of a node from 10 minutes ago. A quick way to query the API directly is shown after the next item.
2,Metrics Server
It is a cluster-wide aggregator of resource usage data. Since Kubernetes 1.8 it is deployed by default as a Deployment in clusters created by the kube-up.sh script. If you installed Kubernetes a different way, you can deploy it with the provided deployment manifests. Metrics Server collects metrics from the Summary API exposed by the kubelet on every node.
1, Install the Metrics Server
Download the manifest and replace k8s.gcr.io, which cannot be reached from China:
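Once the Metrics Server installed below is running, the Metrics API can also be queried straight through the API server; for example (metrics.k8s.io/v1beta1 is the API group and version served by Metrics Server 0.3.x):

# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | python -m json.tool
# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods" | python -m json.tool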
# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
# sed -i 's#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g' components.yaml
# kubectl apply -f components.yaml
2, Testing
Make sure the metrics server is running
# kubectl get pods -A | grep "metrics-server"
kube-system   metrics-server-68b7c54c96-nqpds   1/1   Running   0   48s
Fetching node CPU and memory usage fails:
# kubectl top nodes
error: metrics not available yet
3, Check the log of the metrics-server-68b7c54c96-nqpds pod to find the cause
# kubectl logs -f metrics-server-68b7c54c96-nqpds -n kube-system
E0514 11:20:58.357516       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:work2: unable to fetch metrics from Kubelet work2 (work2): Get https://work2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup work2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master2: unable to fetch metrics from Kubelet master2 (master2): Get https://master2:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master2 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master1: unable to fetch metrics from Kubelet master1 (master1): Get https://master1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:work1: unable to fetch metrics from Kubelet work1 (work1): Get https://work1:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup work1 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:master3: unable to fetch metrics from Kubelet master3 (master3): Get https://master3:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup master3 on 10.96.0.10:53: no such host]
Judging from the log above, this is a DNS resolution problem. In Kubernetes, CoreDNS handles DNS resolution for all pods, and master1, master2, master3, work1 and work2 are the hostnames of the servers, not of pods, so there are no records for them.
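This is easy to confirm from inside the cluster: a throwaway busybox pod (the image tag here is just an example) fails to resolve the node hostnames while cluster-internal names resolve fine:

# kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup master1
# kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default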
4, Googling turns up two solutions.
First solution: have the metrics server fetch node metrics by IP without verifying the kubelet's TLS certificate. The downside is that it is less secure, since the kubelet serving certificates are no longer verified. Find the args lines in components.yaml and change them to the following:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
# kubectl apply -f components.yaml
After a few minutes, node CPU and memory metrics are returned correctly
# kubectl top nodes
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master1   204m         10%    1189Mi          68%
master2   137m         6%     1079Mi          62%
master3   141m         7%     1085Mi          62%
work1     92m          4%     879Mi           50%
work2     94m          4%     876Mi           50%
Second solution: keep accessing the kubelets securely over https with hostnames and adjust the other components instead. The downside is that it is more work, and this step also has to be taken into account when scaling out.
1) Add hostname resolution for all machines to CoreDNS
Get the current CoreDNS configuration:
# kubectl -n kube-system get configmap coredns -o yaml > coredns.yaml
Add a hosts block to the CoreDNS configuration; the hosts plugin adds the listed mappings to CoreDNS resolution (when no entries or file are given it defaults to loading /etc/hosts):
# cat coredns.yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        hosts {
           172.18.2.175 master1
           172.18.2.180 master2
           172.18.2.181 master3
           172.18.2.186 work1
           172.18.2.187 work2
           172.18.2.182 apiserver-lb
           fallthrough
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-05-14T02:21:41Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:Corefile: {}
    manager: kubeadm
    operation: Update
    time: "2020-05-14T02:21:41Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "216"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: a0e4adaa-8577-4b99-aef2-a543988a6ea8
# kubectl apply -f coredns.yaml
2) Check the log of the metrics-server-68b7c54c96-d9r25 pod
# kubectl logs -f metrics-server-68b7c54c96-d9r25 -n kube-system
E0514 11:52:59.242690       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:master1: unable to fetch metrics from Kubelet master1 (master1): Get https://master1:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:master3: unable to fetch metrics from Kubelet master3 (master3): Get https://master3:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:work1: unable to fetch metrics from Kubelet work1 (work1): Get https://work1:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:work2: unable to fetch metrics from Kubelet work2 (work2): Get https://work2:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority, unable to fully scrape metrics from source kubelet_summary:master2: unable to fetch metrics from Kubelet master2 (master2): Get https://master2:10250/stats/summary?only_cpu_and_memory=true: x509: certificate signed by unknown authority]
A new error appears, and it looks like a certificate problem. Googling suggests that the kubelet certificates on the master and worker nodes were signed by each node's own local CA, so the metrics server does not trust them. This can be solved by regenerating the kubelet certificates for all nodes (masters and workers) on master1, signed with master1's local CA.
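One way to confirm this is to compare the issuer of a node's kubelet serving certificate with the cluster CA (the paths below are the kubeadm defaults):

# openssl x509 -in /var/lib/kubelet/pki/kubelet.crt -noout -issuer
# openssl x509 -in /etc/kubernetes/pki/ca.crt -noout -subject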
3) Regenerate the kubelet certificate on master1
Install CFSSL
curl -s -L -o /bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
curl -s -L -o /bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
curl -s -L -o /bin/cfssl-certinfo https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x /bin/cfssl*
Create the configuration for certificates with a 2-year expiry
# mkdir ~/mycerts; cd ~/mycerts
# cp /etc/kubernetes/pki/ca.crt ca.pem
# cp /etc/kubernetes/pki/ca.key ca-key.pem
# cat kubelet-csr.json
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "master1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [{
    "C": "US",
    "ST": "NY",
    "L": "City",
    "O": "Org",
    "OU": "Unit"
  }]
}
# cat ca-config.json
{
  "signing": {
    "default": {
      "expiry": "17520h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "17520h"
      }
    }
  }
}
# cat config.json
{
  "signing": {
    "default": {
      "expiry": "168h"
    },
    "profiles": {
      "www": {
        "expiry": "17520h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },
      "client": {
        "expiry": "17520h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      }
    }
  }
}
# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
    --config=ca-config.json -profile=kubernetes \
    kubelet-csr.json | cfssljson -bare kubelet
# scp kubelet.pem root@master1:/var/lib/kubelet/pki/kubelet.crt
# scp kubelet-key.pem root@master1:/var/lib/kubelet/pki/kubelet.key
4) On master1, generate the kubelet certificate for master2: simply change master1 to master2 in the kubelet-csr.json from step (3) and in the scp targets, then run the rest of step (3) unchanged. The certificates for master3, work1 and work2 are generated the same way (a rough loop that automates this is sketched below).
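A sketch of automating this for all nodes, assuming passwordless SSH from master1 to every node and the files from step (3) already present in ~/mycerts (treat it as an illustration rather than a finished script):

# cd ~/mycerts
# for node in master1 master2 master3 work1 work2; do
    # write a per-node CSR by swapping the hostname into the template from step (3)
    sed "s/master1/${node}/" kubelet-csr.json > kubelet-${node}-csr.json
    # sign it with the cluster CA copied earlier
    cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
      --config=ca-config.json -profile=kubernetes \
      kubelet-${node}-csr.json | cfssljson -bare kubelet-${node}
    # push the new serving cert and key into the node's kubelet directory
    scp kubelet-${node}.pem root@${node}:/var/lib/kubelet/pki/kubelet.crt
    scp kubelet-${node}-key.pem root@${node}:/var/lib/kubelet/pki/kubelet.key
  done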
5) Restart kubelet on every machine
# systemctl restart kubelet
6) A few minutes later the node CPU and memory metrics work again, and the dashboard now shows node CPU and memory as well.
# kubectl top nodes
NAME      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master1   246m         12%    1202Mi          69%
master2   152m         7%     1094Mi          62%
master3   160m         8%     1096Mi          63%
work1     97m          4%     882Mi           50%
work2     98m          4%     879Mi           50%
Installing the NGINX ingress controller has no pitfalls; just follow the official documentation: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
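Once the controller is running, a quick way to exercise it is to create a plain Ingress resource. The host name and backend service below are made-up examples, and networking.k8s.io/v1beta1 is the Ingress API version served by Kubernetes 1.18:

# cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: demo-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"   # hand this Ingress to the NGINX controller
spec:
  rules:
  - host: demo.example.com                 # made-up host; point it at a node or the LB in /etc/hosts for testing
    http:
      paths:
      - path: /
        backend:
          serviceName: demo-service        # assumes a ClusterIP service named demo-service exists on port 80
          servicePort: 80
EOF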
References:
https://docs.docker.com/engine/install/centos/
https://kubernetes.io/zh/docs/tasks/access-application-cluster/web-ui-dashboard/
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
https://kubernetes.io/zh/docs/tasks/debug-application-cluster/resource-metrics-pipeline/
https://coredns.io/plugins/hosts/
https://stackoverflow.com/questions/53212149/x509-certificate-signed-by-unknown-authority-kubeadm
https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/