[k8s Deployment] 6. Deploying the worker nodes

Unless otherwise noted, all operations in this article are performed on the zhaoyixin-k8s-01 node.

The kubernetes worker nodes run the following components:

  • containerd
  • kubelet
  • kube-proxy
  • calico
  • kube-nginx

0. Install dependency packages

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "yum install -y epel-release" &
    ssh root@${node_ip} "yum install -y chrony conntrack ipvsadm ipset jq iptables curl sysstat libseccomp wget socat git" &
  done

1. kube-apiserver high availability

This section uses nginx's layer-4 (TCP) transparent proxying to give the worker-node components highly available access to the kube-apiserver cluster.

The nginx-proxy-based kube-apiserver high-availability scheme

  • kube-controller-manager and kube-scheduler on the control-plane nodes run as multiple instances, each connecting to the kube-apiserver on its own host, so as long as one instance is healthy they remain highly available;
  • Pods inside the cluster access kube-apiserver via the K8S service domain name kubernetes; kube-dns automatically resolves it to the IPs of the kube-apiserver nodes, so this path is highly available as well;
  • an nginx process runs on every node with the apiserver instances as its backends; nginx health-checks and load-balances across them;
  • kubelet and kube-proxy access kube-apiserver through this local nginx (listening on 127.0.0.1), which makes kube-apiserver access highly available.
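
With this scheme, the KUBE_APISERVER variable used later when generating the worker kubeconfig files is expected to point at the local proxy rather than at any single apiserver. A minimal sanity check, assuming environment.sh follows the layout used throughout this series:

source /opt/k8s/bin/environment.sh
echo ${KUBE_APISERVER}    # expected: https://127.0.0.1:8443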

Download and build nginx

Download the source:

cd /opt/k8s/work
wget http://nginx.org/download/nginx-1.15.3.tar.gz
tar -xzvf nginx-1.15.3.tar.gz

Configure the build:

cd /opt/k8s/work/nginx-1.15.3
mkdir nginx-prefix
yum install -y gcc make
./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
  • --with-stream: enables layer-4 (TCP) transparent proxying;
  • --without-xxx: disables all other features so that the resulting dynamically linked binary has minimal dependencies;

Output:

Configuration summary
  + PCRE library is not used
  + OpenSSL library is not used
  + zlib library is not used

  nginx path prefix: "/opt/k8s/work/nginx-1.15.3/nginx-prefix"
  nginx binary file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx"
  nginx modules path: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/modules"
  nginx configuration prefix: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/conf"
  nginx configuration file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/conf/nginx.conf"
  nginx pid file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/nginx.pid"
  nginx error log file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/error.log"
  nginx http access log file: "/opt/k8s/work/nginx-1.15.3/nginx-prefix/logs/access.log"
  nginx http client request body temporary files: "client_body_temp"
  nginx http proxy temporary files: "proxy_temp"

Build and install:

cd /opt/k8s/work/nginx-1.15.3
make && make install

Verify the built nginx:

cd /opt/k8s/work/nginx-1.15.3
./nginx-prefix/sbin/nginx -v
nginx version: nginx/1.15.3

Install and deploy nginx

Create the directory structure:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
  done

Copy the binary to each node and rename it kube-nginx:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
    scp /opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx  root@${node_ip}:/opt/k8s/kube-nginx/sbin/kube-nginx
    ssh root@${node_ip} "chmod a+x /opt/k8s/kube-nginx/sbin/*"
  done

Configure nginx and enable layer-4 transparent proxying:

cd /opt/k8s/work
cat > kube-nginx.conf << \EOF
worker_processes 1;

events {
    worker_connections  1024;
}

stream {
    upstream backend {
        hash $remote_addr consistent;
        server 192.168.16.8:6443        max_fails=3 fail_timeout=30s;
        server 192.168.16.10:6443        max_fails=3 fail_timeout=30s;
        server 192.168.16.6:6443        max_fails=3 fail_timeout=30s;
    }

    server {
        listen 127.0.0.1:8443;
        proxy_connect_timeout 1s;
        proxy_pass backend;
    }
}
EOF
  • the server entries under upstream backend are the IPs of the cluster's kube-apiserver nodes and must be adjusted to your environment; a sketch for generating them from NODE_IPS follows;
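
The server lines above can also be generated from the NODE_IPS variable instead of being typed by hand; a minimal sketch, assuming environment.sh defines NODE_IPS as in the earlier steps:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo "        server ${node_ip}:6443        max_fails=3 fail_timeout=30s;"
  done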

Distribute the configuration file:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-nginx.conf  root@${node_ip}:/opt/k8s/kube-nginx/conf/kube-nginx.conf
  done

Configure the systemd unit file and start the service

Create the kube-nginx systemd unit file:

cd /opt/k8s/work
cat > kube-nginx.service <<EOF
[Unit]
Description=kube-apiserver nginx proxy
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=forking
ExecStartPre=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -t
ExecStart=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx
ExecReload=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -s reload
PrivateTmp=true
Restart=always
RestartSec=5
StartLimitInterval=0
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

Distribute the systemd unit file:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-nginx.service  root@${node_ip}:/etc/systemd/system/
  done

Start the kube-nginx service:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx"
  done

Check that the kube-nginx service is running

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'"
  done

Make sure the status is active (running); otherwise inspect the logs with journalctl -u kube-nginx to find the cause.
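
A quick way to confirm that the local proxy actually reaches a kube-apiserver (a hedged check: any HTTP status code, e.g. 200 or 401 depending on whether anonymous requests are allowed, shows the layer-4 forwarding works, while a connection error means it does not):

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "curl -sk -o /dev/null -w '%{http_code}\n' https://127.0.0.1:8443/version"
  done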

2. Deploy containerd

containerd implements the Kubernetes Container Runtime Interface (CRI) and provides the core container-runtime features such as image and container management; compared with dockerd it is simpler, more robust, and more portable.

Download and distribute the binaries

Download the binaries:

cd /opt/k8s/work
wget https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.17.0/crictl-v1.17.0-linux-amd64.tar.gz \
  https://github.com/opencontainers/runc/releases/download/v1.0.0-rc10/runc.amd64 \
  https://github.com/containernetworking/plugins/releases/download/v0.8.5/cni-plugins-linux-amd64-v0.8.5.tgz \
  https://github.com/containerd/containerd/releases/download/v1.3.3/containerd-1.3.3.linux-amd64.tar.gz

Unpack:

cd /opt/k8s/work
mkdir containerd
tar -xvf containerd-1.3.3.linux-amd64.tar.gz -C containerd
tar -xvf crictl-v1.17.0-linux-amd64.tar.gz

mkdir cni-plugins
sudo tar -xvf cni-plugins-linux-amd64-v0.8.5.tgz -C cni-plugins

sudo mv runc.amd64 runc

Distribute the binaries to all worker nodes:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp containerd/bin/*  crictl  cni-plugins/*  runc  root@${node_ip}:/opt/k8s/bin
    ssh root@${node_ip} "chmod a+x /opt/k8s/bin/* && mkdir -p /etc/cni/net.d"
  done

Create and distribute the containerd configuration file

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat << EOF | sudo tee containerd-config.toml
version = 2
root = "${CONTAINERD_DIR}/root"
state = "${CONTAINERD_DIR}/state"

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64:3.1"
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/k8s/bin"
      conf_dir = "/etc/cni/net.d"
  [plugins."io.containerd.runtime.v1.linux"]
    shim = "containerd-shim"
    runtime = "runc"
    runtime_root = ""
    no_shim = false
    shim_debug = false
EOF
cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /etc/containerd/ ${CONTAINERD_DIR}/{root,state}"
    scp containerd-config.toml root@${node_ip}:/etc/containerd/config.toml
  done

Create the containerd systemd unit file

cd /opt/k8s/work
cat <<EOF | sudo tee containerd.service
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
ExecStartPre=/sbin/modprobe overlay
ExecStart=/opt/k8s/bin/containerd
Restart=always
RestartSec=5
Delegate=yes
KillMode=process
OOMScoreAdjust=-999
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
EOF

Distribute the systemd unit file and start the containerd service

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp containerd.service root@${node_ip}:/etc/systemd/system
    ssh root@${node_ip} "systemctl enable containerd && systemctl restart containerd"
  done

Create and distribute the crictl configuration file

crictl is a command-line tool for CRI-compatible container runtimes that provides functionality similar to the docker command; see the official documentation for details.

cd /opt/k8s/work
cat << EOF | sudo tee crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF

Distribute it to all worker nodes:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp crictl.yaml root@${node_ip}:/etc/crictl.yaml
  done
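
As a hedged sanity check (containerd was started in the previous step, and crictl reads /etc/crictl.yaml by default), crictl should now be able to talk to containerd on every node:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/opt/k8s/bin/crictl version"
  done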

3. Deploy kubelet

kubelet runs on every worker node; it receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run, and logs.

On startup, kubelet automatically registers the node with kube-apiserver, and its built-in cadvisor collects and monitors the node's resource usage.

For security, this deployment disables kubelet's insecure HTTP port and authenticates and authorizes every request, rejecting unauthorized access (for example from apiserver or heapster).

Download and distribute the kubelet binary

See the "Download the binaries" section in [k8s Deployment] 5. Deploying the master nodes.

Create the kubelet bootstrap kubeconfig files

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"

    # create a bootstrap token
    export BOOTSTRAP_TOKEN=$(kubeadm token create \
      --description kubelet-bootstrap-token \
      --groups system:bootstrappers:${node_name} \
      --kubeconfig ~/.kube/config)

    # set cluster parameters
    kubectl config set-cluster kubernetes \
      --certificate-authority=/etc/kubernetes/cert/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # set client credentials
    kubectl config set-credentials kubelet-bootstrap \
      --token=${BOOTSTRAP_TOKEN} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # set context parameters
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=kubelet-bootstrap \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # use the default context
    kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  done
  • only the token is written into the kubeconfig; once bootstrapping finishes, kube-controller-manager issues the client and server certificates for the kubelet;

View the tokens kubeadm created for each node:

$ kubeadm token list --kubeconfig ~/.kube/config
TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION               EXTRA GROUPS
3uk8cy.1yeeawz00uxr2r01   23h       2020-05-31T15:05:58+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:zhaoyixin-k8s-03
udg7tq.qh9dksbq0u0jxjat   23h       2020-05-31T15:05:55+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:zhaoyixin-k8s-01
vl120m.v8a8hdecwkpo4cyn   23h       2020-05-31T15:05:57+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:zhaoyixin-k8s-02
  • tokens are valid for 1 day; once expired they can no longer be used to bootstrap a kubelet and are cleaned up by kube-controller-manager's tokencleaner (re-creating an expired token is sketched after this list);
  • when kube-apiserver receives a kubelet bootstrap token, it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers; a ClusterRoleBinding is created for this group later;
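
If a node's token expires before that node has bootstrapped, it can simply be re-created in the same way as above; a minimal sketch for a single node (the node name here is just an example):

export BOOTSTRAP_TOKEN=$(kubeadm token create \
  --description kubelet-bootstrap-token \
  --groups system:bootstrappers:zhaoyixin-k8s-01 \
  --kubeconfig ~/.kube/config)
kubectl config set-credentials kubelet-bootstrap \
  --token=${BOOTSTRAP_TOKEN} \
  --kubeconfig=kubelet-bootstrap-zhaoyixin-k8s-01.kubeconfig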

Distribute the bootstrap kubeconfig files to all worker nodes

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"
    scp kubelet-bootstrap-${node_name}.kubeconfig root@${node_name}:/etc/kubernetes/kubelet-bootstrap.kubeconfig
  done

Create and distribute the kubelet configuration file

Create the kubelet configuration file template (for configurable options, refer to the comments in the kubelet source code):

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kubelet-config.yaml.template <<EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: "##NODE_IP##"
staticPodPath: ""
syncFrequency: 1m
fileCheckFrequency: 20s
httpCheckFrequency: 20s
staticPodURL: ""
port: 10250
readOnlyPort: 0
rotateCertificates: true
serverTLSBootstrap: true
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: "/etc/kubernetes/cert/ca.pem"
authorization:
  mode: Webhook
registryPullQPS: 0
registryBurst: 20
eventRecordQPS: 0
eventBurst: 20
enableDebuggingHandlers: true
enableContentionProfiling: true
healthzPort: 10248
healthzBindAddress: "##NODE_IP##"
clusterDomain: "${CLUSTER_DNS_DOMAIN}"
clusterDNS:
  - "${CLUSTER_DNS_SVC_IP}"
nodeStatusUpdateFrequency: 10s
nodeStatusReportFrequency: 1m
imageMinimumGCAge: 2m
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
volumeStatsAggPeriod: 1m
kubeletCgroups: ""
systemCgroups: ""
cgroupRoot: ""
cgroupsPerQOS: true
cgroupDriver: cgroupfs
runtimeRequestTimeout: 10m
hairpinMode: promiscuous-bridge
maxPods: 220
podCIDR: "${CLUSTER_CIDR}"
podPidsLimit: -1
resolvConf: /etc/resolv.conf
maxOpenFiles: 1000000
kubeAPIQPS: 1000
kubeAPIBurst: 2000
serializeImagePulls: false
evictionHard:
  memory.available:  "100Mi"
  nodefs.available:  "10%"
  nodefs.inodesFree: "5%"
  imagefs.available: "15%"
evictionSoft: {}
enableControllerAttachDetach: true
failSwapOn: true
containerLogMaxSize: 20Mi
containerLogMaxFiles: 10
systemReserved: {}
kubeReserved: {}
systemReservedCgroup: ""
kubeReservedCgroup: ""
enforceNodeAllocatable: ["pods"]
EOF
  • address: the address the kubelet secure port (https, 10250) listens on; it must not be 127.0.0.1, otherwise kube-apiserver, heapster, etc. cannot call the kubelet API;
  • readOnlyPort=0: disables the read-only port (default 10255), equivalent to leaving it unset;
  • authentication.anonymous.enabled: set to false so anonymous access to port 10250 is not allowed;
  • authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS certificate authentication;
  • authentication.webhook.enabled=true: enables HTTPS bearer token authentication;
  • requests that pass neither x509 certificate nor webhook authentication (from kube-apiserver or other clients) are rejected with Unauthorized;
  • authorization.mode=Webhook: kubelet uses the SubjectAccessReview API to ask kube-apiserver whether a given user or group is allowed to operate on a resource (RBAC);
  • featureGates.RotateKubeletClientCertificate and featureGates.RotateKubeletServerCertificate: rotate certificates automatically; the certificate lifetime is determined by kube-controller-manager's --experimental-cluster-signing-duration flag;
  • kubelet must run as root;

Create and distribute a kubelet configuration file for each node:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do 
    echo ">>> ${node_ip}"
    sed -e "s/##NODE_IP##/${node_ip}/" kubelet-config.yaml.template > kubelet-config-${node_ip}.yaml.template
    scp kubelet-config-${node_ip}.yaml.template root@${node_ip}:/etc/kubernetes/kubelet-config.yaml
  done

Create and distribute the kubelet systemd unit file

Create the kubelet systemd unit file template:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kubelet.service.template <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=containerd.service
Requires=containerd.service

[Service]
WorkingDirectory=${K8S_DIR}/kubelet
ExecStart=/opt/k8s/bin/kubelet \\
  --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \\
  --cert-dir=/etc/kubernetes/cert \\
  --network-plugin=cni \\
  --cni-conf-dir=/etc/cni/net.d \\
  --container-runtime=remote \\
  --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \\
  --root-dir=${K8S_DIR}/kubelet \\
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
  --config=/etc/kubernetes/kubelet-config.yaml \\
  --hostname-override=##NODE_NAME## \\
  --image-pull-progress-deadline=15m \\
  --volume-plugin-dir=${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/ \\
  --logtostderr=true \\
  --v=2
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
EOF
  • if --hostname-override is set, kube-proxy must set it as well, otherwise the Node will not be found;
  • --bootstrap-kubeconfig: points to the bootstrap kubeconfig file; kubelet uses the username and token in it to send a TLS Bootstrapping request to kube-apiserver;
  • after K8S approves the kubelet's CSR, it writes the certificate and private key into the --cert-dir directory and then writes the --kubeconfig file;
  • --pod-infra-container-image: do not use Red Hat's pod-infrastructure:latest image, as it cannot reap container zombie processes;

Create and distribute the kubelet systemd unit file for each node:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do 
    echo ">>> ${node_name}"
    sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service
    scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service
  done

Grant kube-apiserver access to the kubelet API

When kubectl exec, run, logs, and similar commands are executed, the apiserver forwards the request to the kubelet's https port. The RBAC rule below authorizes the user of the certificate used by the apiserver (kubernetes.pem, CN: kubernetes-master) to access the kubelet API:

kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes-master

Bootstrap Token Auth and granting permissions

On startup, kubelet checks whether the file referenced by --kubeconfig exists; if it does not, kubelet uses the kubeconfig specified by --bootstrap-kubeconfig to send a certificate signing request (CSR) to kube-apiserver.

When kube-apiserver receives the CSR, it authenticates the token in it; on success it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers. This process is called Bootstrap Token Auth.

By default this user and group have no permission to create CSRs, so kubelet fails to start.

The fix is to create a clusterrolebinding that binds the group system:bootstrappers to the clusterrole system:node-bootstrapper:

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers

Automatically approve CSRs and issue kubelet client certificates

After the kubelet creates a CSR, the next step is to get it approved. There are two ways:

  1. kube-controller-manager approves it automatically;
  2. approve it manually with the kubectl certificate approve command;

Once the CSR is approved, kubelet asks kube-controller-manager to issue the client certificate; the csrapproving controller in kube-controller-manager uses the SubjectAccessReview API to check whether the kubelet's request (whose group is system:bootstrappers) has the required permission.

Create three ClusterRoleBindings that grant the groups system:bootstrappers and system:nodes permission to approve client, renew client, and renew server certificates respectively (server CSRs are approved manually, see below):

cd /opt/k8s/work
cat > csr-crb.yaml <<EOF
 # Approve all CSRs for the group "system:bootstrappers"
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: auto-approve-csrs-for-group
 subjects:
 - kind: Group
   name: system:bootstrappers
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
   apiGroup: rbac.authorization.k8s.io
---
 # To let a node of the group "system:nodes" renew its own credentials
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: node-client-cert-renewal
 subjects:
 - kind: Group
   name: system:nodes
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
   apiGroup: rbac.authorization.k8s.io
---
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: approve-node-server-renewal-csr
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/selfnodeserver"]
  verbs: ["create"]
---
 # To let a node of the group "system:nodes" renew its own server credentials
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: node-server-cert-renewal
 subjects:
 - kind: Group
   name: system:nodes
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: approve-node-server-renewal-csr
   apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f csr-crb.yaml
  • auto-approve-csrs-for-group: automatically approves a node's first CSR; note that for the first CSR the requesting group is system:bootstrappers;
  • node-client-cert-renewal: automatically approves renewal of a node's expiring client certificate; the group of the auto-issued certificate is system:nodes;
  • node-server-cert-renewal: automatically approves renewal of a node's expiring server certificate; the group of the auto-issued certificate is system:nodes (a quick check of the bindings follows);
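
A hedged way to confirm the bindings exist after applying the manifest:

kubectl get clusterrolebinding auto-approve-csrs-for-group node-client-cert-renewal node-server-cert-renewal
kubectl describe clusterrolebinding auto-approve-csrs-for-group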

Start the kubelet service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/"
    ssh root@${node_ip} "/usr/sbin/swapoff -a"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
  done
  • the working directory must be created before starting the service;
  • the swap partition must be turned off, otherwise kubelet fails to start;

After startup, kubelet uses --bootstrap-kubeconfig to send a CSR to kube-apiserver; once the CSR is approved, kube-controller-manager creates the TLS client certificate and private key for the kubelet and writes the --kubeconfig file.

Note: kube-controller-manager only creates certificates and private keys for TLS Bootstrap if its --cluster-signing-cert-file and --cluster-signing-key-file flags are configured.

Check the kubelet status

After a short wait, the CSRs of all three nodes have been approved automatically:

$ kubectl get csr
NAME        AGE   REQUESTOR                      CONDITION
csr-ct8r8   55s   system:node:zhaoyixin-k8s-02   Pending
csr-dtm97   72s   system:bootstrap:udg7tq        Approved,Issued
csr-hwsnh   70s   system:bootstrap:vl120m        Approved,Issued
csr-jxml4   57s   system:node:zhaoyixin-k8s-01   Pending
csr-tw6m6   70s   system:bootstrap:3uk8cy        Approved,Issued
csr-xh7j6   56s   system:node:zhaoyixin-k8s-03   Pending
  • the Pending CSRs are for the kubelet server certificates and must be approved manually; see below.

All nodes have registered (the NotReady status is expected; it clears once the network plugin is installed later):

$ kubectl get node
NAME               STATUS     ROLES    AGE   VERSION
zhaoyixin-k8s-01   NotReady   <none>   98s   v1.16.6
zhaoyixin-k8s-02   NotReady   <none>   96s   v1.16.6
zhaoyixin-k8s-03   NotReady   <none>   96s   v1.16.6

kube-controller-manager has generated a kubeconfig file and a key pair for each node:

$ ls -l /etc/kubernetes/kubelet.kubeconfig
-rw------- 1 root root 2258 May 30 15:08 /etc/kubernetes/kubelet.kubeconfig
$ ls -l /etc/kubernetes/cert/kubelet-client-*
-rw------- 1 root root 1289 May 30 15:08 /etc/kubernetes/cert/kubelet-client-2020-05-30-15-08-26.pem
lrwxrwxrwx 1 root root   59 May 30 15:08 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2020-05-30-15-08-26.pem
  • the kubelet server certificate is not generated automatically;

Manually approve the server cert CSRs

For security reasons, the CSR approving controllers do not automatically approve kubelet server certificate signing requests; they must be approved manually:

$ kubectl get csr
NAME        AGE     REQUESTOR                      CONDITION
csr-ct8r8   2m23s   system:node:zhaoyixin-k8s-02   Pending
csr-dtm97   2m40s   system:bootstrap:udg7tq        Approved,Issued
csr-hwsnh   2m38s   system:bootstrap:vl120m        Approved,Issued
csr-jxml4   2m25s   system:node:zhaoyixin-k8s-01   Pending
csr-tw6m6   2m38s   system:bootstrap:3uk8cy        Approved,Issued
csr-xh7j6   2m24s   system:node:zhaoyixin-k8s-03   Pending

$ # approve manually
$ kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve

$ # the server certificates have now been generated
$  ls -l /etc/kubernetes/cert/kubelet-*
-rw------- 1 root root 1289 May 30 15:08 /etc/kubernetes/cert/kubelet-client-2020-05-30-15-08-26.pem
lrwxrwxrwx 1 root root   59 May 30 15:08 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2020-05-30-15-08-26.pem
-rw------- 1 root root 1338 May 30 15:12 /etc/kubernetes/cert/kubelet-server-2020-05-30-15-12-23.pem
lrwxrwxrwx 1 root root   59 May 30 15:12 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2020-05-30-15-12-23.pem

kubelet API authentication and authorization

kubelet is configured with the following authentication parameters:

  • authentication.anonymous.enabled: set to false so anonymous access to port 10250 is not allowed;
  • authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS certificate authentication;
  • authentication.webhook.enabled=true: enables HTTPS bearer token authentication;

and with the following authorization parameter:

  • authorization.mode=Webhook: enables RBAC authorization;

When kubelet receives a request, it authenticates the certificate signature against clientCAFile or checks whether the bearer token is valid; if both fail, the request is rejected with Unauthorized:

$ curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.16.8:10250/metrics
Unauthorized

$ curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.16.8:10250/metrics
Unauthorized

After authentication succeeds, kubelet sends a SubjectAccessReview request to kube-apiserver to check whether the user and group behind the certificate or token are authorized (via RBAC) to operate on the resource; this check can be previewed as sketched below.
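
The effect of that webhook check can be previewed from the apiserver side with kubectl auth can-i (a hedged illustration; kubelet's webhook authorizer maps a GET of /metrics to the nodes/metrics subresource, as the Forbidden message further below also shows):

kubectl auth can-i get nodes --subresource=metrics --as=system:kube-controller-manager    # expected: no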

Certificate authentication and authorization

$ # a certificate without sufficient permissions;
$ curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.16.8:10250/metrics
Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)

$ # the admin certificate with full permissions, created when deploying the kubectl command-line tool;
$ curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.16.8:10250/metrics|head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
  • the values of --cacert, --cert, and --key must be file paths; for example, ./admin.pem above must keep the ./ prefix, otherwise the request returns 401 Unauthorized;

Bearer token authentication and authorization

Create a ServiceAccount and bind it to the ClusterRole system:kubelet-api-admin so that it is allowed to call the kubelet API:

kubectl create sa kubelet-api-test
kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test
SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')
TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')
echo ${TOKEN}
$ curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.16.8:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0

cadvisor and metrics

cadvisor is embedded in the kubelet binary; it collects resource usage (CPU, memory, disk, network) for every container on its node.

When a browser accesses kube-apiserver's secure port 6443, it warns that the certificate is not trusted. This is because the apiserver's server certificate is signed by the root certificate ca.pem we created; import ca.pem into the operating system and mark it as permanently trusted.

On Windows, import ca.pem with the following command:

keytool -import -v -trustcacerts -alias appmanagement -file "PATH...\\ca.pem" -storepass password -keystore cacerts

We also need to generate a client certificate for the browser to use when accessing the apiserver's https port 6443.

Here we use the admin certificate and private key created when deploying the kubectl command-line tool, together with the CA certificate above, to create a PKCS#12/PFX certificate that the browser can use:

$ openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem

Import the resulting admin.pfx into the system certificate store.

Browsing to https://192.168.16.8:10250/metrics and https://192.168.16.8:10250/metrics/cadvisor returns the kubelet and cadvisor metrics respectively.

Note:

  • the kubelet configuration sets authentication.anonymous.enabled to false, so anonymous access to the https service on port 10250 is not allowed.

How a client chooses a certificate

  1. certificate selection is negotiated during the SSL/TLS handshake between client and server;
  2. if the server requires a client certificate, it sends the client a list of CAs it accepts during the handshake (this list can be inspected as sketched after this list);
  3. the client searches its certificate store (usually the operating system store; on macOS, the Keychain) for certificates signed by one of those CAs and offers them to the user for selection;
  4. the user selects a certificate (and its private key), which the client then uses to communicate with the server.
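
The CA list mentioned in step 2 can be inspected with openssl; a hedged sketch (the exact output wording depends on the OpenSSL version and the negotiated TLS version):

echo | openssl s_client -connect 192.168.16.8:10250 2>/dev/null | grep -A 2 "Acceptable client certificate CA names"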

4. Deploy kube-proxy

kube-proxy runs on all worker nodes; it watches the apiserver for changes to services and endpoints and creates routing rules to provide service IPs and load balancing.

Download and distribute the kube-proxy binary

See the "Download the binaries" section in [k8s Deployment] 5. Deploying the master nodes.

Create the kube-proxy certificate

Create the certificate signing request:

cd /opt/k8s/work
cat > kube-proxy-csr.json <<EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "zhaoyixin"
    }
  ]
}
EOF
  • CN: sets the certificate's User to system:kube-proxy;
  • the predefined RoleBinding system:node-proxier binds the User system:kube-proxy to the Role system:node-proxier, which grants permission to call the proxy-related kube-apiserver APIs;
  • the certificate is only used by kube-proxy as a client certificate, so the hosts field is empty;

Generate the certificate and private key:

cd /opt/k8s/work
cfssl gencert -ca=/opt/k8s/work/ca.pem \
  -ca-key=/opt/k8s/work/ca-key.pem \
  -config=/opt/k8s/work/ca-config.json \
  -profile=kubernetes  kube-proxy-csr.json | cfssljson -bare kube-proxy
ls kube-proxy*

Create and distribute the kubeconfig file

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/k8s/work/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-credentials kube-proxy \
  --client-certificate=kube-proxy.pem \
  --client-key=kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

Distribute the kubeconfig file:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"
    scp kube-proxy.kubeconfig root@${node_name}:/etc/kubernetes/
  done

Create the kube-proxy configuration file

Create the kube-proxy configuration file template:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-proxy-config.yaml.template <<EOF
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  burst: 200
  kubeconfig: "/etc/kubernetes/kube-proxy.kubeconfig"
  qps: 100
bindAddress: ##NODE_IP##
healthzBindAddress: ##NODE_IP##:10256
metricsBindAddress: ##NODE_IP##:10249
enableProfiling: true
clusterCIDR: ${CLUSTER_CIDR}
hostnameOverride: ##NODE_NAME##
mode: "ipvs"
portRange: ""
iptables:
  masqueradeAll: false
ipvs:
  scheduler: rr
  excludeCIDRs: []
EOF
  • bindAddress: the listen address;
  • clientConnection.kubeconfig: the kubeconfig used to connect to the apiserver;
  • clusterCIDR: kube-proxy uses --cluster-cidr to tell in-cluster traffic from external traffic; only when --cluster-cidr or --masquerade-all is set does kube-proxy SNAT requests to Service IPs;
  • hostnameOverride: must match the kubelet's value, otherwise kube-proxy cannot find its Node after startup and will not create any ipvs rules;
  • mode: use ipvs mode;

Create and distribute a kube-proxy configuration file for each node:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
  do 
    echo ">>> ${NODE_NAMES[i]}"
    sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy-config.yaml.template > kube-proxy-config-${NODE_NAMES[i]}.yaml.template
    scp kube-proxy-config-${NODE_NAMES[i]}.yaml.template root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy-config.yaml
  done

Create and distribute the kube-proxy systemd unit file

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
cat > kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=${K8S_DIR}/kube-proxy
ExecStart=/opt/k8s/bin/kube-proxy \\
  --config=/etc/kubernetes/kube-proxy-config.yaml \\
  --logtostderr=true \\
  --v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

Distribute the kube-proxy systemd unit file:

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do 
    echo ">>> ${node_name}"
    scp kube-proxy.service root@${node_name}:/etc/systemd/system/
  done

Start and check the kube-proxy service

cd /opt/k8s/work
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-proxy"
    ssh root@${node_ip} "modprobe ip_vs_rr"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
  done

Check the startup result

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status kube-proxy|grep Active"
  done

Make sure the status is active (running); otherwise inspect the logs with journalctl -u kube-proxy to find the cause.

Check the listening ports

$ sudo netstat -lnpt|grep kube-prox
tcp        0      0 192.168.16.8:10249      0.0.0.0:*               LISTEN      11115/kube-proxy    
tcp        0      0 192.168.16.8:10256      0.0.0.0:*               LISTEN      11115/kube-proxy
  • 10249: the http prometheus metrics port;
  • 10256: the http healthz port; both can be probed as sketched below;
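
A hedged check of the two ports from any machine that can reach the node (the exact metric names may differ between versions):

curl -s http://192.168.16.8:10249/metrics | head -3
curl -s http://192.168.16.8:10256/healthz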

Check the ipvs routing rules

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "/usr/sbin/ipvsadm -ln"
  done

Expected output:

>>> 192.168.16.8
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.16.6:6443            Masq    1      0          0         
  -> 192.168.16.8:6443            Masq    1      0          0         
  -> 192.168.16.10:6443           Masq    1      0          0         
>>> 192.168.16.10
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.16.6:6443            Masq    1      0          0         
  -> 192.168.16.8:6443            Masq    1      0          0         
  -> 192.168.16.10:6443           Masq    1      0          0         
>>> 192.168.16.6
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.254.0.1:443 rr
  -> 192.168.16.6:6443            Masq    1      0          0         
  -> 192.168.16.8:6443            Masq    1      0          0         
  -> 192.168.16.10:6443           Masq    1      0          0

As shown, all https requests to the K8S service kubernetes are forwarded to port 6443 on the kube-apiserver nodes; this can be exercised directly as sketched below.
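
Run on any of the nodes, a hedged check of the service VIP (an HTTP status code such as 200 or 401 shows the ipvs rule forwarded the request to an apiserver, whereas a timeout means it did not):

curl -sk -o /dev/null -w '%{http_code}\n' https://10.254.0.1/version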

5. Deploy the calico network

kubernetes requires that all nodes in the cluster (including the master nodes) can reach one another over the Pod network.

calico uses IPIP or BGP (IPIP by default) to build an interconnected Pod network across the nodes; a quick way to observe this once it is running is sketched below.
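
Once calico is running (deployed below), each node should have a tunl0 interface and IPIP routes towards the other nodes' Pod subnets; a hedged way to check, to be run after the plugin is installed:

ip addr show tunl0
ip route | grep tunl0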

Install the calico network plugin

cd /opt/k8s/work
curl https://docs.projectcalico.org/manifests/calico.yaml -O

Modify the configuration:

$ cp calico.yaml calico.yaml.orig
$ diff calico.yaml.orig calico.yaml
630c630,632
<               value: "192.168.0.0/16"
---
>               value: "172.30.0.0/16"
>             - name: IP_AUTODETECTION_METHOD
>               value: "interface=eth.*"
699c701
<             path: /opt/cni/bin
---
>             path: /opt/k8s/bin
  • the Pod network is changed to 172.30.0.0/16, i.e. the Pod CIDR defined in the environment.sh environment file in [k8s Deployment] 1. Environment preparation and initialization;
  • calico auto-detects the interface used for node interconnection; if a node has several interfaces, a regular expression for the interface name can be configured, such as eth.* above (adjust it to your servers' interface names);

Deploy the calico plugin:

$ kubectl apply -f  calico.yaml
  • the calico plugin runs as a daemonset on all K8S nodes.

Check the calico status

$ kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE   IP              NODE               NOMINATED NODE   READINESS GATES
calico-kube-controllers-77d6cbc65f-c9s88   1/1     Running   0          22m   172.30.219.1    zhaoyixin-k8s-02   <none>           <none>
calico-node-gf8h2                          1/1     Running   0          22m   192.168.16.10   zhaoyixin-k8s-02   <none>           <none>
calico-node-n26rj                          1/1     Running   0          22m   192.168.16.8    zhaoyixin-k8s-01   <none>           <none>
calico-node-nx8hz                          1/1     Running   0          22m   192.168.16.6    zhaoyixin-k8s-03   <none>           <none>
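
With the network plugin running, the nodes that registered earlier should also move from NotReady to Ready:

kubectl get node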

Use crictl to list the images calico uses:

$ crictl  images
docker.io/calico/cni                                      v3.14.1             35a7136bc71a7       77.6MB
docker.io/calico/node                                     v3.14.1             04a9b816c7535       90.6MB
docker.io/calico/pod2daemon-flexvol                       v3.14.1             7f93af2e7e114       37.5MB
registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64   3.1                 21a595adc69ca       326kB
  • if crictl prints nothing or fails, the configuration file /etc/crictl.yaml may be missing; its contents are as follows:
$ cat /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

References

opsnull/follow-me-install-kubernetes-cluster
