Kubernetes Binary HA Deployment (pitfalls fixed)

01. System Initialization and Global Variables

Cluster machines

test1:192.168.0.91
test2:192.168.0.92
test3:192.168.0.93


Hostnames

Set a permanent hostname, then log back in:

sudo hostnamectl set-hostname test1 # replace test1 with this machine's hostname

The hostname is stored in the /etc/hostname file;

On every machine, edit /etc/hosts and add the hostname-to-IP mappings:

grep test /etc/hosts
192.168.0.91 test1    test1
192.168.0.92 test2    test2
192.168.0.93 test3    test3


Add the k8s and docker accounts

On every machine, add a k8s account with passwordless sudo:

sudo useradd -m k8s
sudo sh -c 'echo 123456 | passwd k8s --stdin' # set a password for the k8s account
sudo visudo
sudo grep '%wheel.*NOPASSWD: ALL' /etc/sudoers
%wheel    ALL=(ALL)    NOPASSWD: ALL
sudo gpasswd -a k8s wheel
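
If you prefer not to edit /etc/sudoers interactively with visudo, a drop-in file achieves the same result; a sketch (the file name 90-wheel-nopasswd is an arbitrary choice):

echo '%wheel ALL=(ALL) NOPASSWD: ALL' | sudo tee /etc/sudoers.d/90-wheel-nopasswd
sudo chmod 440 /etc/sudoers.d/90-wheel-nopasswd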


On every machine, add a docker account, add the k8s account to the docker group, and configure the dockerd parameters:

sudo useradd -m docker
sudo gpasswd -a k8s docker
sudo mkdir -p  /etc/docker/
sudo tee /etc/docker/daemon.json <<EOF
{
    "registry-mirrors": ["https://hub-mirror.c.163.com", "https://docker.mirrors.ustc.edu.cn"],
    "max-concurrent-downloads": 20
}
EOF



Passwordless SSH to the other nodes

Unless otherwise noted, all operations in this document are performed on test1, which then distributes files and runs commands on the other nodes remotely.


Configure test1 so it can log in to the k8s and root accounts of every node without a password:

[k8s@test1 k8s]$ ssh-keygen -t rsa
[k8s@test1 k8s]$ ssh-copy-id root@test1
[k8s@test1 k8s]$ ssh-copy-id root@test2
[k8s@test1 k8s]$ ssh-copy-id root@test3

[k8s@test1 k8s]$ ssh-copy-id k8s@test1
[k8s@test1 k8s]$ ssh-copy-id k8s@test2
[k8s@test1 k8s]$ ssh-copy-id k8s@test3


Add the binary directory /opt/k8s/bin to the PATH variable

Add the environment variable on every machine:

sudo sh -c "echo 'PATH=/opt/k8s/bin:$PATH:$HOME/bin:$JAVA_HOME/bin' >>/root/.bashrc"
echo 'PATH=/opt/k8s/bin:$PATH:$HOME/bin:$JAVA_HOME/bin' >>~/.bashrc


Install dependency packages

Install the dependencies on every machine:

CentOS:

sudo yum install -y epel-release
sudo yum install -y conntrack ipvsadm ipset jq sysstat curl iptables libseccomp

ipvs depends on ipset;

Disable the firewall

Disable the firewall on every machine:

sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo iptables -F && sudo iptables -X && sudo iptables -F -t nat && sudo iptables -X -t nat
sudo iptables -P FORWARD ACCEPT


Disable swap

If swap is enabled, kubelet fails to start (this can be overridden by setting --fail-swap-on=false), so disable swap on every machine:

sudo swapoff -a


To keep swap from being mounted automatically at boot, comment out the corresponding entry in /etc/fstab:

sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab


Disable SELinux

Disable SELinux, otherwise Kubernetes may later fail to mount directories with a Permission denied error:

sudo setenforce 0
grep SELINUX /etc/selinux/config 
SELINUX=disabled


Also edit the config file so the change is permanent;
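
For example, a one-line change that persists across reboots (assuming the stock /etc/selinux/config layout):

sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config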


Stop dnsmasq

When dnsmasq is enabled (e.g., in GUI environments), the system DNS server is set to 127.0.0.1, which prevents Docker containers from resolving domain names, so stop and disable it:

sudo service dnsmasq stop
sudo systemctl disable dnsmasq



Set kernel parameters

cat > kubernetes.conf <<EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches=89100
EOF

sudo cp kubernetes.conf  /etc/sysctl.d/kubernetes.conf
sudo sysctl -p /etc/sysctl.d/kubernetes.conf
sudo mount -t cgroup -o cpu,cpuacct none /sys/fs/cgroup/cpu,cpuacct


Load kernel modules

sudo modprobe br_netfilter
sudo modprobe ip_vs


Set the system time zone

sudo timedatectl set-timezone Asia/Shanghai

Write the current UTC time to the hardware clock

sudo timedatectl set-local-rtc 0


Restart services that depend on the system time

sudo systemctl restart rsyslog 
sudo systemctl restart crond


Create directories

Create the directories on every machine:

sudo mkdir -p /opt/k8s/bin
sudo chown -R k8s /opt/k8s

sudo mkdir -p /etc/kubernetes/cert
sudo chown -R k8s /etc/kubernetes

sudo mkdir -p /etc/etcd/cert
sudo chown -R k8s /etc/etcd/cert

sudo mkdir -p /var/lib/etcd && sudo chown -R k8s /var/lib/etcd


Cluster environment variables

The deployment steps that follow use the global environment variables defined below; adjust them for your own machines and network:

#!/usr/bin/bash


# encryption key required for the EncryptionConfig

ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)


# preferably use currently unused CIDRs for the service and Pod networks


# service CIDR; unroutable before deployment, routable inside the cluster afterwards (guaranteed by kube-proxy and ipvs)

SERVICE_CIDR="10.254.0.0/16"


# Pod CIDR, a /16 is recommended; unroutable before deployment, routable inside the cluster afterwards (guaranteed by flanneld)

CLUSTER_CIDR="172.30.0.0/16"


# service NodePort port range

export NODE_PORT_RANGE="8400-9000"


# array of cluster node IPs

export NODE_IPS=(192.168.0.91 192.168.0.92 192.168.0.93)


# array of hostnames corresponding to the IPs above

export NODE_NAMES=(test1 test2 test3)

# kube-apiserver VIP (the IP advertised by the HA component keepalived)

export MASTER_VIP="192.168.0.235"

# kube-apiserver VIP address (the HA component haproxy listens on port 8443)

export KUBE_APISERVER="https://${MASTER_VIP}:8443"

# network interface on the HA nodes that carries the VIP

export VIP_IF="eth0"

# etcd cluster client endpoints

export ETCD_ENDPOINTS="https://192.168.0.91:2379,https://192.168.0.92:2379,https://192.168.0.93:2379"

# IPs and ports for etcd peer (inter-cluster) communication

export ETCD_NODES="test1=https://192.168.0.91:2380,test2=https://192.168.0.92:2380,test3=https://192.168.0.93:2380"

# etcd key prefix for the flanneld network configuration

export FLANNEL_ETCD_PREFIX="/kubernetes/network"

# kubernetes service IP (usually the first IP of SERVICE_CIDR)

export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"

# cluster DNS service IP (pre-allocated from SERVICE_CIDR)

export CLUSTER_DNS_SVC_IP="10.254.0.2"

# cluster DNS domain

export CLUSTER_DNS_DOMAIN="cluster.local."

# add the binary directory /opt/k8s/bin to PATH

export PATH=/opt/k8s/bin:$PATH

The packaged variable definitions are in environment.sh; later deployment steps will remind you to source this script.

Distribute the environment variable script

Copy the global variable script to /opt/k8s/bin on every node:

source environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp environment.sh k8s@${node_ip}:/opt/k8s/bin/
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done





02. Create the CA Certificate and Key

To keep the cluster secure, the Kubernetes components encrypt and authenticate their communication with x509 certificates.

The CA (Certificate Authority) is a self-signed root certificate used to sign all other certificates created later.

This document uses cfssl, CloudFlare's PKI toolkit, to create all certificates.


Install the cfssl toolset

sudo mkdir -p /opt/k8s/cert && sudo chown -R k8s /opt/k8s && cd /opt/k8s


wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
mv cfssl_linux-amd64 /opt/k8s/bin/cfssl

wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
mv cfssljson_linux-amd64 /opt/k8s/bin/cfssljson

wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
mv cfssl-certinfo_linux-amd64 /opt/k8s/bin/cfssl-certinfo

chmod +x /opt/k8s/bin/*
export PATH=/opt/k8s/bin:$PATH


Create the root certificate (CA)

The CA certificate is shared by every node in the cluster; only one CA is needed, and every certificate created afterwards is signed by it.

Create the config file

The CA config file defines the usage scenarios (profiles) and parameters (usages, expiry, server auth, client auth, encryption, etc.) of the root certificate; a specific profile is selected later when signing other certificates.

cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
            "signing",
            "key encipherment",
            "server auth",
            "client auth"
        ],
        "expiry": "87600h"
      }
    }
  }
}
EOF

signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE;
server auth: a client can use this CA to verify the certificate presented by a server;
client auth: a server can use this CA to verify the certificate presented by a client;


Create the certificate signing request file

cat > ca-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF

CN: Common Name; kube-apiserver extracts this field from the certificate as the requesting User Name; browsers use it to verify whether a website is legitimate;
O: Organization; kube-apiserver extracts this field as the Group the requesting user belongs to;
kube-apiserver uses the extracted User and Group as the identity for RBAC authorization;

Generate the CA certificate and private key

cfssl gencert -initca ca-csr.json | cfssljson -bare ca
ls ca*
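
Optionally, inspect the generated root certificate to confirm its CN, O and validity period; a quick read-only check:

cfssl-certinfo -cert ca.pem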


Distribute the certificate files

Copy the generated CA certificate, key and config file to /etc/kubernetes/cert on every node:

source /opt/k8s/bin/environment.sh # import the NODE_IPS variable
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert && chown -R k8s /etc/kubernetes"
    scp ca*.pem ca-config.json k8s@${node_ip}:/etc/kubernetes/cert
  done
The k8s account needs read/write access to /etc/kubernetes and the files under it;





03. Deploy the kubectl Command-Line Tool

kubectl is the command-line management tool for a Kubernetes cluster; this document describes how to install and configure it.

kubectl reads the kube-apiserver address, certificates, user name and other information from ~/.kube/config by default; without that file, kubectl commands fail with:

kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?


Download and distribute the kubectl binary

Download and unpack:

wget https://dl.k8s.io/v1.10.4/kubernetes-client-linux-amd64.tar.gz
tar -xzvf kubernetes-client-linux-amd64.tar.gz


Distribute it to every node that will use kubectl:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kubernetes/client/bin/kubectl k8s@${node_ip}:/opt/k8s/bin/
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done



Create the admin certificate and private key

kubectl talks to the apiserver over the https secure port; the apiserver authenticates and authorizes the certificate it presents.

As the cluster management tool, kubectl needs the highest privileges, so an admin certificate with full permissions is created here.

Create the certificate signing request:

cat > admin-csr.json <<EOF
{
  "CN": "admin",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "system:masters",
      "OU": "4Paradigm"
    }
  ]
}
EOF

O is system:masters; when kube-apiserver receives a request using this certificate, it sets the request's Group to system:masters;

The predefined ClusterRoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to all APIs;

This certificate is only used by kubectl as a client certificate, so the hosts field is empty;


Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes admin-csr.json | cfssljson -bare admin
ls admin*
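
Optionally, confirm that the O field of the new certificate is system:masters before distributing it; a quick read-only check:

cfssl-certinfo -cert admin.pem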


Create the kubeconfig file

The kubeconfig file is kubectl's configuration file; it contains all the information needed to reach the apiserver, such as the apiserver address, the CA certificate and the client's own certificate;

source /opt/k8s/bin/environment.sh
# set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/cert/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kubectl.kubeconfig

# set client credentials
kubectl config set-credentials admin \
  --client-certificate=admin.pem \
  --client-key=admin-key.pem \
  --embed-certs=true \
  --kubeconfig=kubectl.kubeconfig

# set context parameters
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin \
  --kubeconfig=kubectl.kubeconfig
  
# set the default context

kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig

--certificate-authority: the root certificate used to verify the kube-apiserver certificate;

--client-certificate, --client-key: the generated admin certificate and key, used when connecting to kube-apiserver;

--embed-certs=true: embeds the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig (otherwise only the file paths are written); you can confirm this with the check shown below;
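
To confirm what was embedded, dump the generated file; certificate data is redacted in the output:

kubectl config view --kubeconfig=kubectl.kubeconfig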


Distribute the kubeconfig file

Distribute it to every node that runs kubectl commands:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "mkdir -p ~/.kube"
    scp kubectl.kubeconfig k8s@${node_ip}:~/.kube/config
    ssh root@${node_ip} "mkdir -p ~/.kube"
    scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
  done

It is saved as each user's ~/.kube/config file;






04. Deploy the etcd Cluster

etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration and coordination (leader election, distributed locks, etc.). Kubernetes stores all of its runtime data in etcd.

This document covers deploying a three-node highly available etcd cluster:

download and distribute the etcd binaries;

create x509 certificates for the etcd nodes, used to encrypt traffic between clients (such as etcdctl) and the cluster as well as between cluster members;

create the etcd systemd unit file and configure the service parameters;

check that the cluster is working;

The names and IPs of the etcd nodes are:

test1:192.168.0.91
test2:192.168.0.92
test3:192.168.0.93


Download and distribute the etcd binaries
tar -xvf etcd-v3.3.7-linux-amd64.tar.gz


Distribute the binaries to every node in the cluster:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp etcd-v3.3.7-linux-amd64/etcd* k8s@${node_ip}:/opt/k8s/bin
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done

Create the etcd certificate and private key

Create the certificate signing request:

cat > etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "192.168.0.91",
    "192.168.0.92",
    "192.168.0.93"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}

EOF

The hosts field lists the IPs or domain names of the etcd nodes authorized to use this certificate; the IPs of all three etcd nodes are listed here;


Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
    -ca-key=/etc/kubernetes/cert/ca-key.pem \
    -config=/etc/kubernetes/cert/ca-config.json \
    -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
ls etcd*


Distribute the generated certificate and key to each etcd node:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /etc/etcd/cert && chown -R k8s /etc/etcd/cert"
    scp etcd*.pem k8s@${node_ip}:/etc/etcd/cert/
  done


Create the etcd systemd unit template file

source /opt/k8s/bin/environment.sh
cat > etcd.service.template <<EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
User=k8s
Type=notify
WorkingDirectory=/var/lib/etcd/
ExecStart=/opt/k8s/bin/etcd \\
  --data-dir=/var/lib/etcd \\
  --name=##NODE_NAME## \\
  --cert-file=/etc/etcd/cert/etcd.pem \\
  --key-file=/etc/etcd/cert/etcd-key.pem \\
  --trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
  --peer-cert-file=/etc/etcd/cert/etcd.pem \\
  --peer-key-file=/etc/etcd/cert/etcd-key.pem \\
  --peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --listen-peer-urls=https://##NODE_IP##:2380 \\
  --initial-advertise-peer-urls=https://##NODE_IP##:2380 \\
  --listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \\
  --advertise-client-urls=https://##NODE_IP##:2379 \\
  --initial-cluster-token=etcd-cluster-0 \\
  --initial-cluster=${ETCD_NODES} \\
  --initial-cluster-state=new
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

User: run as the k8s account;

WorkingDirectory, --data-dir: use /var/lib/etcd as the working and data directory; it must be created before starting the service;

--name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;

--cert-file, --key-file: certificate and key used for etcd server/client communication;

--trusted-ca-file: the CA that signed the client certificates, used to verify them;

--peer-cert-file, --peer-key-file: certificate and key used for etcd peer communication;

--peer-trusted-ca-file: the CA that signed the peer certificates, used to verify them;


Distribute the etcd systemd unit files to each node

Substitute the template variables to generate a unit file per node:

source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
  do
    sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service 
  done

ls *.service

NODE_NAMES and NODE_IPS are bash arrays of equal length holding the node names and their corresponding IPs;


Distribute the generated systemd unit files:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/lib/etcd && chown -R k8s /var/lib/etcd" 
    scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
  done

The etcd data and working directory must be created first;




Start the etcd service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd &"
  done

When etcd starts for the first time, it waits for the other members to join the cluster, so systemctl start etcd appears to hang for a while; this is normal.
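
Once all three instances are up, you can also list the cluster members from any node; a sketch using the test1 endpoint:

ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
  --endpoints=https://192.168.0.91:2379 \
  --cacert=/etc/kubernetes/cert/ca.pem \
  --cert=/etc/etcd/cert/etcd.pem \
  --key=/etc/etcd/cert/etcd-key.pem member list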


Check the startup result

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "systemctl status etcd|grep Active"
  done


Verify the service status

After deploying the etcd cluster, run the following on any etcd node:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
    --endpoints=https://${node_ip}:2379 \
    --cacert=/etc/kubernetes/cert/ca.pem \
    --cert=/etc/etcd/cert/etcd.pem \
    --key=/etc/etcd/cert/etcd-key.pem endpoint health
  done

Expected output:

https://192.168.0.91:2379 is healthy: successfully committed proposal: took = 2.192932ms
https://192.168.0.92:2379 is healthy: successfully committed proposal: took = 3.546896ms
https://192.168.0.93:2379 is healthy: successfully committed proposal: took = 3.013667ms

The cluster is healthy when every endpoint reports healthy.








05. Deploy the flannel Network

Kubernetes requires that all nodes in the cluster (including the master nodes) can reach each other over the Pod network. flannel uses vxlan to build a Pod network that connects the nodes.

When flannel starts for the first time, it reads the Pod CIDR from etcd, allocates an unused /24 subnet for the local node, and creates the flannel.1 interface (the name may differ, e.g. flannel1).

flannel writes the allocated Pod subnet into /run/flannel/docker; docker later uses the environment variables in this file to configure the docker0 bridge.


Download and distribute the flanneld binary

Download the latest release from https://github.com/coreos/flannel/releases:

mkdir flannel
wget https://github.com/coreos/flannel/releases/download/v0.10.0/flannel-v0.10.0-linux-amd64.tar.gz
tar -xzvf flannel-v0.10.0-linux-amd64.tar.gz -C flannel

Distribute the flanneld binaries to every node in the cluster:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp  flannel/{flanneld,mk-docker-opts.sh} k8s@${node_ip}:/opt/k8s/bin/
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done

Create the flannel certificate and private key

flannel reads and writes subnet allocation data in the etcd cluster, and etcd has mutual x509 authentication enabled, so flanneld needs its own certificate and key.

Create the certificate signing request:

cat > flanneld-csr.json <<EOF
{
  "CN": "flanneld",
  "hosts": [],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF

This certificate is only used by flanneld as a client certificate, so the hosts field is empty;

Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
ls flanneld*pem

Distribute the generated certificate and key to all nodes (masters and workers):

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /etc/flanneld/cert && chown -R k8s /etc/flanneld"
    scp flanneld*.pem k8s@${node_ip}:/etc/flanneld/cert
  done

Write the cluster Pod CIDR into etcd

Note: this step only needs to be performed once.

source /opt/k8s/bin/environment.sh
etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --ca-file=/etc/kubernetes/cert/ca.pem \
  --cert-file=/etc/flanneld/cert/flanneld.pem \
  --key-file=/etc/flanneld/cert/flanneld-key.pem \
  set ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'

The current flanneld release (v0.10.0) does not support etcd v3, so the config key and subnet data are written with the etcd v2 API;

The Pod CIDR ${CLUSTER_CIDR} written here must be a /16 and must match the --cluster-cidr parameter of kube-controller-manager;

Create the flanneld systemd unit file

source /opt/k8s/bin/environment.sh
export IFACE=eth0

cat > flanneld.service << EOF
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service

[Service]
Type=notify
ExecStart=/opt/k8s/bin/flanneld \\
  -etcd-cafile=/etc/kubernetes/cert/ca.pem \\
  -etcd-certfile=/etc/flanneld/cert/flanneld.pem \\
  -etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \\
  -etcd-endpoints=${ETCD_ENDPOINTS} \\
  -etcd-prefix=${FLANNEL_ETCD_PREFIX} \\
  -iface=${IFACE}
ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=on-failure

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
EOF

The mk-docker-opts.sh script writes the Pod subnet assigned to flanneld into /run/flannel/docker; docker uses the environment variables in this file when it starts to configure the docker0 bridge;

flanneld communicates with other nodes over the interface that carries the system default route; on nodes with multiple interfaces (e.g. internal and public), use the -iface parameter to pick the interface, as with eth0 above (a quick way to check is shown below);

flanneld must run as root;
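
If you are unsure which interface carries the default route (and therefore which value to pass to -iface), a quick check is:

ip route show default

The interface named after dev in the output (eth0 in this environment) is the one flanneld uses by default.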


Distribute the flanneld systemd unit file to all nodes

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp flanneld.service root@${node_ip}:/etc/systemd/system/
  done

Start the flanneld service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
  done


Check the startup result

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "systemctl status flanneld|grep Active"
  done

Make sure the state is active (running); otherwise check the logs to find the cause:

$ journalctl -u flanneld

Check the Pod subnets assigned to each flanneld

View the cluster Pod CIDR (/16):

source /opt/k8s/bin/environment.sh
etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --ca-file=/etc/kubernetes/cert/ca.pem \
  --cert-file=/etc/flanneld/cert/flanneld.pem \
  --key-file=/etc/flanneld/cert/flanneld-key.pem \
  get ${FLANNEL_ETCD_PREFIX}/config

Output:

{"Network":"172.30.0.0/16", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}

View the list of allocated Pod subnets (/24):

source /opt/k8s/bin/environment.sh
etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --ca-file=/etc/kubernetes/cert/ca.pem \
  --cert-file=/etc/flanneld/cert/flanneld.pem \
  --key-file=/etc/flanneld/cert/flanneld-key.pem \
  ls ${FLANNEL_ETCD_PREFIX}/subnets
Output:

/kubernetes/network/subnets/172.30.81.0-24
/kubernetes/network/subnets/172.30.29.0-24
/kubernetes/network/subnets/172.30.39.0-24

View the node IP and flannel interface address for a given Pod subnet:

source /opt/k8s/bin/environment.sh
etcdctl \
  --endpoints=${ETCD_ENDPOINTS} \
  --ca-file=/etc/kubernetes/cert/ca.pem \
  --cert-file=/etc/flanneld/cert/flanneld.pem \
  --key-file=/etc/flanneld/cert/flanneld-key.pem \
  get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.81.0-24

Output:

{"PublicIP":"192.168.0.91","BackendType":"vxlan","BackendData":{"VtepMAC":"12:21:93:9e:b1:eb"}}

Verify that the nodes can reach each other over the Pod network

After deploying flannel on every node, check that the flannel interface was created (its name may be flannel0, flannel.0, flannel.1, etc.):

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
  done


Output:

inet 172.30.81.0/32 scope global flannel.1
inet 172.30.29.0/32 scope global flannel.1
inet 172.30.39.0/32 scope global flannel.1

From each node, ping all flannel interface IPs and make sure they respond:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh ${node_ip} "ping -c 1 172.30.81.0"
    ssh ${node_ip} "ping -c 1 172.30.29.0"
    ssh ${node_ip} "ping -c 1 172.30.39.0"
  done







06-0. Deploy the Master Nodes

A Kubernetes master node runs the following components:

kube-apiserver
kube-scheduler
kube-controller-manager
kube-scheduler and kube-controller-manager run in cluster mode: leader election picks one active instance while the others block.

kube-apiserver can run as multiple instances (three in this document), but the other components need a single, highly available address for it; this document uses keepalived and haproxy to provide a VIP and load balancing for kube-apiserver.

Download the latest binaries

Download the server tarball from the CHANGELOG page.

wget https://dl.k8s.io/v1.10.4/kubernetes-server-linux-amd64.tar.gz
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes

tar -xzvf  kubernetes-src.tar.gz

Copy the binaries to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp server/bin/* k8s@${node_ip}:/opt/k8s/bin/
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done








06-1. Deploy the High-Availability Components

This document describes making kube-apiserver highly available with keepalived and haproxy:

keepalived provides the VIP through which kube-apiserver is reached;

haproxy listens on the VIP, proxies to all kube-apiserver instances, and provides health checking and load balancing;
Nodes that run keepalived and haproxy are called LB nodes. Since keepalived runs in a one-master/multi-backup model, at least two LB nodes are required.

This document reuses the three master machines; haproxy listens on port 8443, which must differ from kube-apiserver's port 6443 to avoid a conflict.

While running, keepalived periodically checks the local haproxy process; if haproxy is unhealthy, a new election is triggered and the VIP floats to the newly elected master node, keeping the VIP highly available.

All components (kubectl, apiserver, controller-manager, scheduler, etc.) reach the kube-apiserver service through the VIP and haproxy's port 8443.


Install the packages

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "yum install -y keepalived haproxy"
  done

Configure and distribute the haproxy config file

haproxy config file:

cat > haproxy.cfg <<EOF
global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /var/run/haproxy-admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    nbproc 1

defaults
    log     global
    timeout connect 5000
    timeout client  10m
    timeout server  10m

listen  admin_stats
    bind 0.0.0.0:10080
    mode http
    log 127.0.0.1 local0 err
    stats refresh 30s
    stats uri /status
    stats realm welcome login\ Haproxy
    stats auth admin:123456
    stats hide-version
    stats admin if TRUE

listen kube-master
    bind 0.0.0.0:8443
    mode tcp
    option tcplog
    balance source
    server 192.168.0.91 192.168.0.91:6443 check inter 2000 fall 2 rise 2 weight 1
    server 192.168.0.92 192.168.0.92:6443 check inter 2000 fall 2 rise 2 weight 1
    server 192.168.0.93 192.168.0.93:6443 check inter 2000 fall 2 rise 2 weight 1
EOF

haproxy serves its status page on port 10080;

haproxy listens on port 8443 on all interfaces; this port must match the one in the ${KUBE_APISERVER} environment variable;

the server lines list the IPs and ports on which the kube-apiserver instances listen;


Distribute haproxy.cfg to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp haproxy.cfg root@${node_ip}:/etc/haproxy
  done


Start the haproxy service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl restart haproxy"
  done


Check the haproxy service status

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status haproxy|grep Active"
  done


Make sure the state is active (running); otherwise check the logs to find the cause:

journalctl -u haproxy

Check that haproxy is listening on port 8443:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "netstat -lnpt|grep haproxy"
  done


The output should look like:

tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      120583/haproxy

Configure and distribute the keepalived config files

keepalived runs as one master and several backups, so there are two kinds of config file. There is a single master config, while the number of backup configs depends on the number of nodes; for this document the plan is:

master: 192.168.0.91
backup: 192.168.0.92, 192.168.0.93


master config file:

source /opt/k8s/bin/environment.sh
cat  > keepalived-master.conf <<EOF
global_defs {
    router_id lb-master-105
}

vrrp_script check-haproxy {
    script "killall -0 haproxy"
    interval 5
    weight -30
}

vrrp_instance VI-kube-master {
    state MASTER
    priority 120
    dont_track_primary
    interface ${VIP_IF}
    virtual_router_id 68
    advert_int 3
    track_script {
        check-haproxy
    }
    virtual_ipaddress {
        ${MASTER_VIP}
    }
}
EOF

The interface carrying the VIP (interface ${VIP_IF}) is eth0;
killall -0 haproxy checks whether the local haproxy process is alive; if not, the priority is reduced by 30, triggering a new election;
router_id and virtual_router_id identify the keepalived instances belonging to this HA group; if there are multiple keepalived HA groups, each must use different values;


backup config file:

source /opt/k8s/bin/environment.sh
cat  > keepalived-backup.conf <<EOF
global_defs {
    router_id lb-backup-105
}

vrrp_script check-haproxy {
    script "killall -0 haproxy"
    interval 5
    weight -30
}

vrrp_instance VI-kube-master {
    state BACKUP
    priority 110
    dont_track_primary
    interface ${VIP_IF}
    virtual_router_id 68
    advert_int 3
    track_script {
        check-haproxy
    }
    virtual_ipaddress {
        ${MASTER_VIP}
    }
}
EOF


The interface carrying the VIP (interface ${VIP_IF}) is eth0;

killall -0 haproxy checks whether the local haproxy process is alive; if not, the priority is reduced by 30, triggering a new election;

router_id and virtual_router_id identify the keepalived instances belonging to this HA group; if there are multiple keepalived HA groups, each must use different values;

the priority must be lower than the master's;



Distribute the keepalived config files


Distribute the master config:

scp keepalived-master.conf root@192.168.0.91:/etc/keepalived/keepalived.conf


Distribute the backup configs:

scp keepalived-backup.conf root@192.168.0.92:/etc/keepalived/keepalived.conf
scp keepalived-backup.conf root@192.168.0.93:/etc/keepalived/keepalived.conf


Start the keepalived service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl restart keepalived"
  done


Check the keepalived service
source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status keepalived|grep Active"
  done


Make sure the state is active (running); otherwise check the logs to find the cause:

journalctl -u keepalived


Check which node holds the VIP and make sure the VIP answers ping:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh ${node_ip} "/usr/sbin/ip addr show ${VIP_IF}"
    ssh ${node_ip} "ping -c 1 ${MASTER_VIP}"
  done


View the haproxy status page

Open ${MASTER_VIP}:10080/status in a browser to view the haproxy status page:
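
An optional failover test, a sketch that assumes test1 (192.168.0.91) currently holds the VIP: stop keepalived there, confirm the VIP moves to one of the backup nodes, then restore the service:

source /opt/k8s/bin/environment.sh
ssh root@192.168.0.91 "systemctl stop keepalived"
sleep 5
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh ${node_ip} "/usr/sbin/ip addr show ${VIP_IF} | grep -w ${MASTER_VIP} || true"
  done
ssh root@192.168.0.91 "systemctl start keepalived"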






06-2. Deploy the kube-apiserver Component

This document describes deploying a 3-node highly available master cluster with keepalived and haproxy; the LB VIP is the environment variable ${MASTER_VIP}.

Prerequisites

Download the latest binaries and install/configure flanneld; see 06-0.部署master節點.md

Create the kubernetes certificate and private key


Create the certificate signing request:

source /opt/k8s/bin/environment.sh
cat > kubernetes-csr.json <<EOF
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "192.168.0.91",
    "192.168.0.92",
    "192.168.0.93",
    "${MASTER_VIP}",
    "${CLUSTER_KUBERNETES_SVC_IP}",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF

The hosts field lists the IPs and domain names authorized to use this certificate; it contains the VIP, the apiserver node IPs, and the kubernetes service IP and domain names;

the last character of a domain name must not be a dot (e.g. kubernetes.default.svc.cluster.local. is invalid), otherwise signing fails with: x509: cannot parse dnsName "kubernetes.default.svc.cluster.local.";

if you use a domain other than cluster.local, e.g. opsnull.com, change the last two names in the list to kubernetes.default.svc.opsnull and kubernetes.default.svc.opsnull.com;

the kubernetes service IP is created automatically by the apiserver; it is normally the first IP of the --service-cluster-ip-range CIDR and can later be retrieved with:


$ kubectl get svc kubernetes
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   10.254.0.1   <none>        443/TCP   1d


Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
ls kubernetes*pem


Copy the generated certificate and key files to the master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert/ && sudo chown -R k8s /etc/kubernetes/cert/"
    scp kubernetes*.pem k8s@${node_ip}:/etc/kubernetes/cert/
  done


The k8s account needs read/write access to /etc/kubernetes/cert/;

Create the encryption config file
source /opt/k8s/bin/environment.sh
cat > encryption-config.yaml <<EOF
kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF


Copy the encryption config file into /etc/kubernetes on the master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/
  done

The rendered file: encryption-config.yaml



Generate the service account key

cd /etc/kubernetes/
openssl genrsa -out /etc/kubernetes/sa.key 2048
openssl rsa -in /etc/kubernetes/sa.key -pubout -out /etc/kubernetes/sa.pub
ls /etc/kubernetes/sa.*
cd $HOME

Distribute the service account key to all nodes

ansible k8s -m copy -a 'src=/etc/kubernetes/sa.key dest=/etc/kubernetes/cert/ force=yes'
ansible k8s -m copy -a 'src=/etc/kubernetes/sa.pub dest=/etc/kubernetes/cert/ force=yes'




Create the kube-apiserver systemd unit template file

source /opt/k8s/bin/environment.sh
cat > kube-apiserver.service.template <<EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
ExecStart=/opt/k8s/bin/kube-apiserver \\
  --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota \\
  --anonymous-auth=false \\
  --experimental-encryption-provider-config=/etc/kubernetes/encryption-config.yaml \\
  --advertise-address=##NODE_IP## \\
  --bind-address=##NODE_IP## \\
  --insecure-port=0 \\
  --authorization-mode=Node,RBAC \\
  --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname \\
  --runtime-config=api/all \\
  --enable-bootstrap-token-auth \\
  --service-cluster-ip-range=${SERVICE_CIDR} \\
  --service-node-port-range=${NODE_PORT_RANGE} \\
  --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem \\
  --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem \\
  --client-ca-file=/etc/kubernetes/cert/ca.pem \\
  --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem \\
  --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem \\
  --service-account-key-file=/etc/kubernetes/cert/sa.pub \\
  --etcd-cafile=/etc/kubernetes/cert/ca.pem \\
  --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem \\
  --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem \\
  --etcd-servers=${ETCD_ENDPOINTS} \\
  --enable-swagger-ui=true \\
  --allow-privileged=true \\
  --apiserver-count=3 \\
  --audit-log-maxage=30 \\
  --audit-log-maxbackup=3 \\
  --audit-log-maxsize=100 \\
  --audit-log-path=/var/log/kube-apiserver-audit.log \\
  --event-ttl=1h \\
  --alsologtostderr=true \\
  --logtostderr=false \\
  --log-dir=/var/log/kubernetes \\
  --v=2
Restart=on-failure
RestartSec=5
Type=notify
User=k8s
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF


--experimental-encryption-provider-config: enables encryption of secrets at rest;

--authorization-mode=Node,RBAC: enables Node and RBAC authorization and rejects unauthorized requests;

--enable-admission-plugins: the admission plugins to enable (the list above includes ServiceAccount);

--service-account-key-file: the public key used to verify ServiceAccount tokens; it pairs with the private key given to kube-controller-manager via --service-account-private-key-file;

--tls-*-file: the certificate, key and CA used by the apiserver. --client-ca-file is used to verify the certificates presented by clients (kube-controller-manager, kube-scheduler, kubelet, kube-proxy, etc.);

--kubelet-client-certificate, --kubelet-client-key: if set, the apiserver uses https to reach the kubelet APIs; RBAC rules must be defined for the certificate's user (kubernetes for the kubernetes*.pem certificate above), otherwise calls to the kubelet API are rejected as unauthorized;

--bind-address: must not be 127.0.0.1, otherwise the secure port 6443 is unreachable from outside;

--insecure-port=0: disables the insecure port (8080);

--service-cluster-ip-range: the Service Cluster IP CIDR;

--service-node-port-range: the NodePort port range;

--runtime-config=api/all=true: enables all API versions, e.g. autoscaling/v2alpha1;

--enable-bootstrap-token-auth: enables token authentication for kubelet bootstrap;

--apiserver-count=3: the number of kube-apiserver instances running in the cluster;

User=k8s: run as the k8s account;



Create and distribute per-node kube-apiserver systemd unit files

Substitute the template variables to create a unit file for each node:

source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
  do
    sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service 
  done
ls kube-apiserver*.service

NODE_NAMES and NODE_IPS are bash arrays of equal length holding the node names and their corresponding IPs;



Distribute the generated systemd unit files:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/log/kubernetes && chown -R k8s /var/log/kubernetes"
    scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service
  done

The log directory must be created first;

The file is renamed to kube-apiserver.service;

Rendered unit file: kube-apiserver.service



Start the kube-apiserver service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
  done


Check the kube-apiserver status

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'"
  done


Error: if the kube-controller-manager.kubeconfig file cannot be found, kube-controller-manager logs errors like the following:


[root@test1 ~]# journalctl -u kube-controller-manager
-- Logs begin at Mon 2019-02-04 17:56:47 EST, end at Tue 2019-02-05 01:04:33 EST. --
Feb 04 23:58:13 test1 systemd[1]: [/etc/systemd/system/kube-controller-manager.service:7] Failed to parse service restart specifier, 
Feb 04 23:58:13 test1 systemd[1]: [/etc/systemd/system/kube-controller-manager.service:7] Failed to parse service restart specifier, 
Feb 04 23:58:14 test1 kube-controller-manager[45817]: Flag --port has been deprecated, see --secure-port instead.
Feb 04 23:58:14 test1 kube-controller-manager[45817]: Flag --horizontal-pod-autoscaler-use-rest-clients has been deprecated, Heapster
Feb 04 23:58:14 test1 kube-controller-manager[45817]: I0204 23:58:14.297286   45817 flags.go:33] FLAG: --address="0.0.0.0"



Make sure the state is active (running); otherwise check the logs on the master node to find the cause:

journalctl -u kube-apiserver

Print the data kube-apiserver has written to etcd

source /opt/k8s/bin/environment.sh
ETCDCTL_API=3 etcdctl \
    --endpoints=${ETCD_ENDPOINTS} \
    --cacert=/etc/kubernetes/cert/ca.pem \
    --cert=/etc/etcd/cert/etcd.pem \
    --key=/etc/etcd/cert/etcd-key.pem \
    get /registry/ --prefix --keys-only


Check the cluster information

kubectl cluster-info
Kubernetes master is running at https://192.168.0.235:8443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

kubectl get all --all-namespaces
NAMESPACE   NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
default     service/kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   35m

kubectl get componentstatuses
NAME                 STATUS      MESSAGE                                                                                        ERROR
controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: getsockopt: connection refused
scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: getsockopt: connection refused
etcd-1               Healthy     {"health":"true"}
etcd-0               Healthy     {"health":"true"}
etcd-2               Healthy     {"health":"true"}


Notes:


If kubectl reports the following error, the ~/.kube/config in use is wrong; switch to the correct account and run the command again:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

When running kubectl get componentstatuses, the apiserver queries 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode they may not be on the same machine as the kube-apiserver handling the request, so they can show as Unhealthy even though they are actually working fine.
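
To check the apiserver itself through the VIP, you can also query its health endpoint with the admin kubeconfig; the expected output is ok:

kubectl get --raw=/healthz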



Check the ports kube-apiserver listens on

sudo netstat -lnpt|grep kube
tcp        0      0 192.168.0.91:6443     0.0.0.0:*               LISTEN      13075/kube-apiserve


6443: the secure https port; all requests are authenticated and authorized;

since the insecure port is disabled, nothing listens on 8080;



Grant the kubernetes certificate access to the kubelet API (note: in practice this step alone did not seem to be enough; a higher-privileged binding still has to be created, and it is created before starting kubelet)

kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes

When kubectl exec, run, logs and similar commands are executed, the apiserver forwards the request to kubelet; the RBAC rule defined here authorizes the apiserver to call the kubelet API.







06-3. Deploy the Highly Available kube-controller-manager Cluster


This document describes deploying a highly available kube-controller-manager cluster.

The cluster has 3 nodes; after startup, leader election picks one leader while the other nodes block. When the leader becomes unavailable, the remaining nodes elect a new one, keeping the service available.

For secure communication, an x509 certificate and key are generated first; kube-controller-manager uses them in two cases:

when talking to kube-apiserver's secure port;

when serving Prometheus-format metrics on the secure https port 10252;



Prerequisites

Download the latest binaries and install/configure flanneld; see 06-0.部署master節點.md

Create the kube-controller-manager certificate and private key

Create the certificate signing request:

cat > kube-controller-manager-csr.json <<EOF
{
    "CN": "system:kube-controller-manager",
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "hosts": [
      "127.0.0.1",
      "192.168.0.91",
      "192.168.0.92",
      "192.168.0.93"
    ],
    "names": [
      {
        "C": "CN",
        "ST": "BeiJing",
        "L": "BeiJing",
        "O": "system:kube-controller-manager",
        "OU": "4Paradigm"
      }
    ]
}
EOF

The hosts list contains all kube-controller-manager node IPs;

CN is system:kube-controller-manager and O is system:kube-controller-manager; the built-in ClusterRoleBinding system:kube-controller-manager grants kube-controller-manager the permissions it needs.


Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager


Distribute the generated certificate and key to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-controller-manager*.pem k8s@${node_ip}:/etc/kubernetes/cert/
  done


Create and distribute the kubeconfig file

The kubeconfig file contains all the information needed to reach the apiserver, such as the apiserver address, the CA certificate and the client's own certificate;

source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/cert/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-controller-manager.kubeconfig

kubectl config set-credentials system:kube-controller-manager \
  --client-certificate=kube-controller-manager.pem \
  --client-key=kube-controller-manager-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-controller-manager.kubeconfig

kubectl config set-context system:kube-controller-manager \
  --cluster=kubernetes \
  --user=system:kube-controller-manager \
  --kubeconfig=kube-controller-manager.kubeconfig

kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig



Distribute the kubeconfig to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-controller-manager.kubeconfig k8s@${node_ip}:/etc/kubernetes/
  done



Create and distribute the kube-controller-manager systemd unit file

source /opt/k8s/bin/environment.sh

cat > kube-controller-manager.service <<EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/opt/k8s/bin/kube-controller-manager \\
  --port=0 \\
  --secure-port=10252 \\
  --bind-address=127.0.0.1 \\
  --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \\
  --service-cluster-ip-range=${SERVICE_CIDR} \\
  --allocate-node-cidrs=true \\
  --cluster-cidr=${CLUSTER_CIDR} \\
  --cluster-name=kubernetes \\
  --cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \\
  --cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \\
  --experimental-cluster-signing-duration=8760h \\
  --root-ca-file=/etc/kubernetes/cert/ca.pem \\
  --service-account-private-key-file=/etc/kubernetes/cert/sa.key \\
  --leader-elect=true \\
  --feature-gates=RotateKubeletServerCertificate=true \\
  --controllers=*,bootstrapsigner,tokencleaner \\
  --horizontal-pod-autoscaler-use-rest-clients=true \\
  --horizontal-pod-autoscaler-sync-period=10s \\
  --tls-cert-file=/etc/kubernetes/cert/kube-controller-manager.pem \\
  --tls-private-key-file=/etc/kubernetes/cert/kube-controller-manager-key.pem \\
  --use-service-account-credentials=true \\
  --alsologtostderr=true \\
  --logtostderr=false \\
  --log-dir=/var/log/kubernetes \\
  --v=2
Restart=on-failure
RestartSec=5
User=k8s

[Install]
WantedBy=multi-user.target
EOF

--port=0: disables the insecure http /metrics endpoint; --address then has no effect and --bind-address takes effect;

--secure-port=10252, --bind-address=127.0.0.1: serve https /metrics on 127.0.0.1:10252;

--kubeconfig: the kubeconfig that kube-controller-manager uses to connect to and authenticate against kube-apiserver;

--cluster-signing-*-file: sign the certificates created by TLS Bootstrap;

--experimental-cluster-signing-duration: validity period of TLS Bootstrap certificates;

--root-ca-file: the CA certificate placed into container ServiceAccounts, used to verify the kube-apiserver certificate;

--service-account-private-key-file: the private key used to sign ServiceAccount tokens; it must pair with the public key given to kube-apiserver via --service-account-key-file;

--service-cluster-ip-range: the Service Cluster IP CIDR; must match the same parameter of kube-apiserver;

--leader-elect=true: cluster mode with leader election; the elected leader does the work while the other nodes block;

--feature-gates=RotateKubeletServerCertificate=true: enables automatic rotation of kubelet server certificates;

--controllers=*,bootstrapsigner,tokencleaner: the controllers to enable; tokencleaner automatically removes expired bootstrap tokens;

--horizontal-pod-autoscaler-*: custom-metrics parameters, supporting autoscaling/v2alpha1;

--tls-cert-file, --tls-private-key-file: server certificate and key used when serving metrics over https;

--use-service-account-credentials=true: each controller uses its own ServiceAccount credentials (see the permissions section below);

User=k8s: run as the k8s account;

kube-controller-manager does not verify the client certificates of metrics requests, so --tls-ca-file is not needed (and is deprecated anyway).



Distribute the systemd unit file to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-controller-manager.service root@${node_ip}:/etc/systemd/system/
  done


kube-controller-manager permissions

The ClusterRole system:kube-controller-manager has very limited permissions: it can only create resources such as secrets and serviceaccounts; each controller's permissions live in the ClusterRoles system:controller:XXX.

Adding --use-service-account-credentials=true to kube-controller-manager's startup parameters makes the main controller create a ServiceAccount XXX-controller for each controller.

The built-in ClusterRoleBindings system:controller:XXX grant each XXX-controller ServiceAccount the corresponding ClusterRole system:controller:XXX.
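
These roles and bindings can be inspected once the cluster is running; read-only examples:

kubectl describe clusterrole system:kube-controller-manager
kubectl get clusterrole | grep '^system:controller:'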



Start the kube-controller-manager service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/log/kubernetes && chown -R k8s /var/log/kubernetes"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
  done

The log directory must be created first;



Check the service status

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "systemctl status kube-controller-manager|grep Active"
  done



Make sure the state is active (running); otherwise check the logs to find the cause:

journalctl -u kube-controller-manager



View the exported metrics

Note: run the following commands on a kube-controller-manager node.

kube-controller-manager listens on port 10252 and accepts https requests:

sudo netstat -lnpt|grep kube-controll
tcp        0      0 127.0.0.1:10252         0.0.0.0:*               LISTEN      18377/kube-controll



curl -s --cacert /etc/kubernetes/cert/ca.pem https://127.0.0.1:10252/metrics |head

# HELP ClusterRoleAggregator_adds Total number of adds handled by workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_adds counter
ClusterRoleAggregator_adds 3
# HELP ClusterRoleAggregator_depth Current depth of workqueue: ClusterRoleAggregator
# TYPE ClusterRoleAggregator_depth gauge
ClusterRoleAggregator_depth 0
# HELP ClusterRoleAggregator_queue_latency How long an item stays in workqueueClusterRoleAggregator before being requested.
# TYPE ClusterRoleAggregator_queue_latency summary
ClusterRoleAggregator_queue_latency{quantile="0.5"} 57018
ClusterRoleAggregator_queue_latency{quantile="0.9"} 57268

curl --cacert passes the CA certificate used to verify the kube-controller-manager https server certificate;




Test the high availability of the kube-controller-manager cluster

Stop the kube-controller-manager service on one or two nodes and watch the logs on the other nodes to see whether one of them acquires leadership.

View the current leader

kubectl get endpoints kube-controller-manager --namespace=kube-system  -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"test2_084534e2-6cc4-11e8-a418-5254001f5b65","leaseDurationSeconds":15,"acquireTime":"2018-06-10T15:40:33Z","renewTime":"2018-06-10T16:19:08Z","leaderTransitions":12}'
  creationTimestamp: 2018-06-10T13:59:42Z
  name: kube-controller-manager
  namespace: kube-system
  resourceVersion: "4540"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
  uid: 862cc048-6cb6-11e8-96fa-525400ba84c6

As shown, the current leader is the test2 node.






06-4. Deploy the Highly Available kube-scheduler Cluster

This document describes deploying a highly available kube-scheduler cluster.

The cluster has 3 nodes; after startup, leader election picks one leader while the other nodes block. When the leader becomes unavailable, the remaining nodes elect a new one, keeping the service available.

For secure communication, an x509 certificate and key are generated first; kube-scheduler uses them in two cases:

when talking to kube-apiserver's secure port;

when serving Prometheus-format metrics on port 10251 (currently plain http only);



Prerequisites

Download the latest binaries and install/configure flanneld; see 06-0.部署master節點.md

Create the kube-scheduler certificate and private key

Create the certificate signing request:

cat > kube-scheduler-csr.json <<EOF
{
    "CN": "system:kube-scheduler",
    "hosts": [
      "127.0.0.1",
      "192.168.0.91",
      "192.168.0.92",
      "192.168.0.93"
    ],
    "key": {
        "algo": "rsa",
        "size": 2048
    },
    "names": [
      {
        "C": "CN",
        "ST": "BeiJing",
        "L": "BeiJing",
        "O": "system:kube-scheduler",
        "OU": "4Paradigm"
      }
    ]
}
EOF

The hosts list contains all kube-scheduler node IPs;

CN is system:kube-scheduler and O is system:kube-scheduler; the built-in ClusterRoleBinding system:kube-scheduler grants kube-scheduler the permissions it needs.


Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler


Create and distribute the kubeconfig file

The kubeconfig file contains all the information needed to reach the apiserver, such as the apiserver address, the CA certificate and the client's own certificate;

source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/cert/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-scheduler.kubeconfig

kubectl config set-credentials system:kube-scheduler \
  --client-certificate=kube-scheduler.pem \
  --client-key=kube-scheduler-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-scheduler.kubeconfig

kubectl config set-context system:kube-scheduler \
  --cluster=kubernetes \
  --user=system:kube-scheduler \
  --kubeconfig=kube-scheduler.kubeconfig

kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig

The certificate and key created in the previous step, as well as the kube-apiserver address, are written into the kubeconfig file;



Distribute the kubeconfig to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-scheduler.kubeconfig k8s@${node_ip}:/etc/kubernetes/
  done



Create and distribute the kube-scheduler systemd unit file

cat > kube-scheduler.service <<EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/GoogleCloudPlatform/kubernetes

[Service]
ExecStart=/opt/k8s/bin/kube-scheduler \\
  --address=127.0.0.1 \\
  --kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \\
  --leader-elect=true \\
  --alsologtostderr=true \\
  --logtostderr=false \\
  --log-dir=/var/log/kubernetes \\
  --v=2
Restart=on-failure
RestartSec=5
User=k8s

[Install]
WantedBy=multi-user.target
EOF

--address: serve http /metrics on 127.0.0.1:10251; kube-scheduler does not yet support serving https;

--kubeconfig: the kubeconfig that kube-scheduler uses to connect to and authenticate against kube-apiserver;

--leader-elect=true: cluster mode with leader election; the elected leader does the work while the other nodes block;

User=k8s: run as the k8s account;

The complete unit is in kube-scheduler.service.




Distribute the systemd unit file to all master nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp kube-scheduler.service root@${node_ip}:/etc/systemd/system/
  done



Start the kube-scheduler service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/log/kubernetes && chown -R k8s /var/log/kubernetes"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
  done

The log directory must be created first;




Check the service status

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "systemctl status kube-scheduler|grep Active"
  done



Make sure the state is active (running); otherwise check the logs to find the cause:

journalctl -u kube-scheduler



View the exported metrics

Note: run the following commands on a kube-scheduler node.

kube-scheduler listens on port 10251 and accepts http requests:

sudo netstat -lnpt|grep kube-sche
tcp        0      0 127.0.0.1:10251         0.0.0.0:*               LISTEN      23783/kube-schedule



curl -s http://127.0.0.1:10251/metrics |head

# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 9.7715e-05
go_gc_duration_seconds{quantile="0.25"} 0.000107676
go_gc_duration_seconds{quantile="0.5"} 0.00017868
go_gc_duration_seconds{quantile="0.75"} 0.000262444
go_gc_duration_seconds{quantile="1"} 0.001205223



Test the high availability of the kube-scheduler cluster

Pick one or two master nodes, stop the kube-scheduler service on them, and check (via the systemd logs) whether another node acquires leadership.

View the current leader
kubectl get endpoints kube-scheduler --namespace=kube-system  -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"test3_61f34593-6cc8-11e8-8af7-5254002f288e","leaseDurationSeconds":15,"acquireTime":"2018-06-10T16:09:56Z","renewTime":"2018-06-10T16:20:54Z","leaderTransitions":1}'
  creationTimestamp: 2018-06-10T16:07:33Z
  name: kube-scheduler
  namespace: kube-system
  resourceVersion: "4645"
  selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
  uid: 62382d98-6cc8-11e8-96fa-525400ba84c6

As shown, the current leader is the test3 node.






07-0. Deploy the Worker Nodes

A Kubernetes worker node runs the following components:

docker

kubelet

kube-proxy

Install and configure flanneld

See 05-部署flannel網絡.md


Install dependency packages

CentOS:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "yum install -y epel-release"
    ssh root@${node_ip} "yum install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp && /usr/sbin/modprobe ip_vs "
  done


Ubuntu:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "apt-get install -y conntrack ipvsadm ipset jq iptables curl sysstat libseccomp && /usr/sbin/modprobe ip_vs "
  done






07-1. Deploy the docker Component

docker is the container runtime and manages the container lifecycle. kubelet interacts with docker through the Container Runtime Interface (CRI).

Install dependency packages

See 07-0.部署worker節點.md

Download and distribute the docker binaries

Download the latest release from https://download.docker.com/linux/static/stable/x86_64/:

wget https://download.docker.com/linux/static/stable/x86_64/docker-18.03.1-ce.tgz

tar -xvf docker-18.03.1-ce.tgz




Distribute the binaries to all worker nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp docker/docker*  k8s@${node_ip}:/opt/k8s/bin/
    ssh k8s@${node_ip} "chmod +x /opt/k8s/bin/*"
  done



Create and distribute the systemd unit file

cat > docker.service <<"EOF"
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.io

[Service]
Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
EnvironmentFile=-/run/flannel/docker
ExecStart=/opt/k8s/bin/dockerd --log-level=error $DOCKER_NETWORK_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=on-failure
RestartSec=5
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF


The EOF marker is quoted so that bash does not expand variables in the document, such as $DOCKER_NETWORK_OPTIONS;


dockerd invokes other docker commands at runtime, such as docker-proxy, so the directory containing the docker binaries must be on PATH;


When flanneld starts it writes the network configuration into /run/flannel/docker; dockerd reads the DOCKER_NETWORK_OPTIONS variable from that file before starting and uses it to configure the docker0 bridge (see the example after these notes);


If several EnvironmentFile options are specified, /run/flannel/docker must come last (so that docker0 uses the bip generated by flanneld);

docker must run as root;
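
For reference, the file written by mk-docker-opts.sh looks roughly like the following; the subnet and MTU values are only illustrative and depend on what flanneld was assigned:

cat /run/flannel/docker
DOCKER_NETWORK_OPTIONS=" --bip=172.30.39.1/24 --ip-masq=true --mtu=1450"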




Since version 1.13, docker may set the default policy of the iptables FORWARD chain to DROP, which breaks pinging Pod IPs on other nodes; if this happens, set the policy back to ACCEPT manually:

sudo iptables -P FORWARD ACCEPT



Also append the following command to /etc/rc.local so that a node reboot does not reset the iptables FORWARD policy to DROP:

echo '/sbin/iptables -P FORWARD ACCEPT' >> /etc/rc.local
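
Note that on CentOS 7 /etc/rc.d/rc.local is not executable by default; if you rely on rc.local, you will most likely also need:

sudo chmod +x /etc/rc.d/rc.local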



Distribute the systemd unit file to all worker nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    scp docker.service root@${node_ip}:/etc/systemd/system/docker.service
  done



Configure and distribute the docker config file

Configure the docker registry mirrors (dockerd must be restarted for this to take effect):

cat > docker-daemon.json <<EOF
{
    "registry-mirrors": ["https://hub-mirror.c.163.com", "https://docker.mirrors.ustc.edu.cn"],
    "max-concurrent-downloads": 20
}
EOF



Distribute the docker config file to all worker nodes:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p  /etc/docker/"
    scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json
  done



Start the docker service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "systemctl stop firewalld && systemctl disable firewalld"
    ssh root@${node_ip} "/usr/sbin/iptables -F && /usr/sbin/iptables -X && /usr/sbin/iptables -F -t nat && /usr/sbin/iptables -X -t nat"
    ssh root@${node_ip} "/usr/sbin/iptables -P FORWARD ACCEPT"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
    ssh root@${node_ip} 'for intf in /sys/devices/virtual/net/docker0/brif/*; do echo 1 > $intf/hairpin_mode; done'
    ssh root@${node_ip} "sudo sysctl -p /etc/sysctl.d/kubernetes.conf"
  done

Disable firewalld (CentOS 7) / ufw (Ubuntu 16.04); otherwise duplicate iptables rules may be created;

Clean up old iptables rules and chains;

Enable hairpin mode on the virtual NICs under the docker0 bridge;




Check the service status

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "systemctl status docker|grep Active"
  done


Make sure the state is active (running); otherwise check the logs to find the cause:

journalctl -u docker



Check the docker0 bridge

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh k8s@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
  done


Confirm that on each worker node the docker0 bridge and the flannel.1 interface are in the same subnet (172.30.39.0 and 172.30.39.1 below):

3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether ce:2f:d6:53:e5:f3 brd ff:ff:ff:ff:ff:ff
    inet 172.30.39.0/32 scope global flannel.1
      valid_lft forever preferred_lft forever
    inet6 fe80::cc2f:d6ff:fe53:e5f3/64 scope link
      valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:bf:65:16:5c brd ff:ff:ff:ff:ff:ff
    inet 172.30.39.1/24 brd 172.30.39.255 scope global docker0
      valid_lft forever preferred_lft forever





07-2. Deploy the kubelet Component

kubelet runs on every worker node; it receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run and logs.

When kubelet starts it automatically registers the node with kube-apiserver; the built-in cadvisor collects and monitors the node's resource usage.

For security, this document only enables the secure https port, authenticating and authorizing all requests and rejecting unauthorized access (e.g. unauthenticated apiserver or heapster requests).



Download and distribute the kubelet binary

See 06-0.部署master節點.md


Install dependency packages

See 07-0.部署worker節點.md


Create the kubelet bootstrap kubeconfig file

source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"

    # create a token
    export BOOTSTRAP_TOKEN=$(kubeadm token create \
      --description kubelet-bootstrap-token \
      --groups system:bootstrappers:${node_name} \
      --kubeconfig ~/.kube/config)

    # set cluster parameters
    kubectl config set-cluster kubernetes \
      --certificate-authority=/etc/kubernetes/cert/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # set client authentication parameters
    kubectl config set-credentials kubelet-bootstrap \
      --token=${BOOTSTRAP_TOKEN} \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # set context parameters
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=kubelet-bootstrap \
      --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig

    # set the default context
    kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
  done

The kubeconfig file contains a Token rather than a certificate; the client certificate is created later by kube-controller-manager.


List the tokens created by kubeadm for each node:

kubeadm token list --kubeconfig ~/.kube/config
TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION               EXTRA GROUPS
k0s2bj.7nvw1zi1nalyz4gz   23h       2018-06-14T15:14:31+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:test1
mkus5s.vilnjk3kutei600l   23h       2018-06-14T15:14:32+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:test3
zkiem5.0m4xhw0jc8r466nk   23h       2018-06-14T15:14:32+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:test2


The tokens created here are valid for 1 day; once expired they can no longer be used and are cleaned up by kube-controller-manager's tokencleaner (if that controller is enabled);

When kube-apiserver receives a kubelet bootstrap token, it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers;


The Secrets associated with each token:

kubectl get secrets  -n kube-system
NAME                     TYPE                                  DATA      AGE
bootstrap-token-k0s2bj   bootstrap.kubernetes.io/token         7         1m
bootstrap-token-mkus5s   bootstrap.kubernetes.io/token         7         1m
bootstrap-token-zkiem5   bootstrap.kubernetes.io/token         7         1m
default-token-99st7      kubernetes.io/service-account-token   3         2d
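To see where the bootstrap user and group come from, you can decode one of these Secrets. A minimal sketch using the bootstrap-token-k0s2bj Secret listed above (the data keys follow the standard bootstrap token Secret layout):

# the token ID; the authenticated user becomes system:bootstrap:<token-id>
kubectl -n kube-system get secret bootstrap-token-k0s2bj -o jsonpath='{.data.token-id}' | base64 -d; echo
# the extra group recorded for the token, e.g. system:bootstrappers:test1
kubectl -n kube-system get secret bootstrap-token-k0s2bj -o jsonpath='{.data.auth-extra-groups}' | base64 -d; echo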



Distribute the bootstrap kubeconfig files to all worker nodes

source /opt/k8s/bin/environment.sh

for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"
    scp kubelet-bootstrap-${node_name}.kubeconfig k8s@${node_name}:/etc/kubernetes/kubelet-bootstrap.kubeconfig
  done



Create and distribute the kubelet configuration file

Starting with v1.10, some kubelet parameters must be set in a configuration file; kubelet --help marks them as:

DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag



Create the kubelet configuration template file:

source /opt/k8s/bin/environment.sh
cat > kubelet.config.json.template <<EOF
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "authentication": {
    "x509": {
      "clientCAFile": "/etc/kubernetes/cert/ca.pem"
    },
    "webhook": {
      "enabled": true,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": false
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "address": "##NODE_IP##",
  "port": 10250,
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "hairpinMode": "promiscuous-bridge",
  "serializeImagePulls": false,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "clusterDomain": "${CLUSTER_DNS_DOMAIN}",
  "clusterDNS": ["${CLUSTER_DNS_SVC_IP}"]
}
EOF

address: the kubelet API listen address; it must not be 127.0.0.1, otherwise kube-apiserver, heapster and other callers cannot reach the kubelet API;

readOnlyPort=0: disables the read-only port (which defaults to 10255);

authentication.anonymous.enabled: set to false, so anonymous access to port 10250 is not allowed;

authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS client certificate authentication;

authentication.webhook.enabled=true: enables HTTPS bearer token authentication;

Requests (from kube-apiserver or other clients) that pass neither x509 certificate nor webhook authentication are rejected with Unauthorized;

authorization.mode=Webhook: kubelet uses the SubjectAccessReview API to ask kube-apiserver whether a given user/group is allowed to operate on a resource (RBAC);

featureGates.RotateKubeletClientCertificate and featureGates.RotateKubeletServerCertificate: rotate certificates automatically; certificate lifetime is determined by kube-controller-manager's --experimental-cluster-signing-duration flag;

kubelet must be run as root;



Create and distribute the kubelet configuration file for each node:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do 
    echo ">>> ${node_ip}"
    sed -e "s/##NODE_IP##/${node_ip}/" kubelet.config.json.template > kubelet.config-${node_ip}.json
    scp kubelet.config-${node_ip}.json root@${node_ip}:/etc/kubernetes/kubelet.config.json
  done

The rendered kubelet.config.json file: kubelet.config.json

Create and distribute the kubelet systemd unit file



Create the kubelet systemd unit template:

cat > kubelet.service.template <<EOF
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/var/lib/kubelet
ExecStart=/opt/k8s/bin/kubelet \\
  --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \\
  --cert-dir=/etc/kubernetes/cert \\
  --network-plugin=cni \\
  --cni-bin-dir=/opt/cni/bin \\
  --cni-conf-dir=/etc/cni/net.d \\
  --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
  --config=/etc/kubernetes/kubelet.config.json \\
  --hostname-override=##NODE_NAME## \\
  --pod-infra-container-image=registry.access.redhat.com/rhel7/pod-infrastructure:latest \\
  --allow-privileged=true \\
  --alsologtostderr=true \\
  --logtostderr=false \\
  --log-dir=/var/log/kubernetes/ \\
  --v=2
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

If --hostname-override is set here, kube-proxy must be set to the same value, otherwise the Node will not be found;

--bootstrap-kubeconfig: points to the bootstrap kubeconfig file; kubelet uses the user name and token in this file to send a TLS bootstrapping request to kube-apiserver;

After K8S approves the kubelet's CSR, the certificate and private key are written to the --cert-dir directory and then referenced from the --kubeconfig file;

The rendered unit file: kubelet.service



Create and distribute the kubelet systemd unit file for each node:

source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do 
    echo ">>> ${node_name}"
    sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service
    scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service
  done



Bootstrap Token Auth and granting permissions

At startup, kubelet checks whether the file configured by --kubeconfig exists; if it does not, kubelet uses --bootstrap-kubeconfig to send a certificate signing request (CSR) to kube-apiserver.

When kube-apiserver receives the CSR it authenticates the token (the one created earlier with kubeadm); on success it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers. This process is called Bootstrap Token Auth.

By default this user and group have no permission to create CSRs, so kubelet fails to start with errors like the following:

sudo journalctl -u kubelet -a |grep -A 2 'certificatesigningrequests'
May 06 06:42:36 test1 kubelet[26986]: F0506 06:42:36.314378   26986 server.go:233] failed to run Kubelet: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:lemy40" cannot create certificatesigningrequests.certificates.k8s.io at the cluster scope
May 06 06:42:36 test1 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
May 06 06:42:36 test1 systemd[1]: kubelet.service: Failed with result 'exit-code'.



The fix is to create a clusterrolebinding that binds the group system:bootstrappers to the clusterrole system:node-bootstrapper:

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
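To double-check that the binding exists and covers the expected group, an optional quick look:

# confirm the group system:bootstrappers is bound to system:node-bootstrapper
kubectl describe clusterrolebinding kubelet-bootstrap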


Grant kube-apiserver access to the kubelet API (the ClusterRoleBinding below binds the User kubernetes, the identity kube-apiserver uses when calling kubelets, to the kubelet API resources):

Without this, kubectl exec into a Pod fails.

cat > apiserver-to-kubelet.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kubernetes-to-kubelet
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/proxy
      - nodes/stats
      - nodes/log
      - nodes/spec
      - nodes/metrics
    verbs:
      - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kubernetes
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kubernetes-to-kubelet
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: kubernetes
EOF


Create the authorization objects:

kubectl create -f apiserver-to-kubelet.yaml 

[root@test4 ~]# kubectl create -f apiserver-to-kubelet.yaml 
clusterrole.rbac.authorization.k8s.io/system:kubernetes-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kubernetes created

Enter a container again to verify:

[root@test4 ~]# kubectl exec -it http-test-dm2-6dbd76c7dd-cv9qf sh
/ # exit

Now it is possible to exec into a container and inspect it.

Note: this output was pasted in from an earlier experiment where the error had already been fixed; at this point in the deployment there are no Pods yet, so this step cannot actually be performed now.



Start the kubelet service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/lib/kubelet"
    ssh root@${node_ip} "/usr/sbin/swapoff -a"
    ssh root@${node_ip} "mkdir -p /var/log/kubernetes && chown -R k8s /var/log/kubernetes"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
  done

Swap must be turned off, otherwise kubelet fails to start;

The working and log directories must be created first;



Check the logs

journalctl -u kubelet |tail
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.388242   22343 feature_gate.go:226] feature gates: &{{} map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.394342   22343 mount_linux.go:211] Detected OS with systemd
Jun 13 16:05:40 test2 kubelet[22343]: W0613 16:05:40.394494   22343 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.399508   22343 server.go:376] Version: v1.10.4
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.399583   22343 feature_gate.go:226] feature gates: &{{} map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.399736   22343 plugins.go:89] No cloud provider specified.
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.399752   22343 server.go:492] No cloud provider specified: "" from the config file: ""
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.399777   22343 bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.446068   22343 csr.go:105] csr for this node already exists, reusing
Jun 13 16:05:40 test2 kubelet[22343]: I0613 16:05:40.453761   22343 csr.go:113] csr for this node is still valid

After startup, kubelet uses --bootstrap-kubeconfig to send a CSR to kube-apiserver; once the CSR is approved, kube-controller-manager creates a TLS client certificate and private key for the kubelet and writes the --kubeconfig file.

Note: kube-controller-manager only creates certificates and keys for TLS bootstrapping if its --cluster-signing-cert-file and --cluster-signing-key-file flags are configured (a sketch of the relevant flags follows).
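For reference, a minimal excerpt of what these flags might look like in the kube-controller-manager unit file; the CA paths match the /etc/kubernetes/cert layout used in this document, and the 8760h signing duration is an assumption to adjust for your own environment:

# excerpt from kube-controller-manager.service (other flags omitted)
  --cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \
  --experimental-cluster-signing-duration=8760h \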


Check the CSRs

kubectl get csr
NAME                                                   AGE       REQUESTOR                 CONDITION
node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk   43s       system:bootstrap:zkiem5   Pending
node-csr-oVbPmU-ikVknpynwu0Ckz_MvkAO_F1j0hmbcDa__sGA   27s       system:bootstrap:mkus5s   Pending
node-csr-u0E1-ugxgotO_9FiGXo8DkD6a7-ew8sX2qPE6KPS2IY   13m       system:bootstrap:k0s2bj   Pending

kubectl get nodes
No resources found.

The CSRs from all three worker nodes are in Pending state;



Approve kubelet CSR requests

CSRs can be approved manually or automatically. The automatic approach is recommended, because since v1.8 the certificates generated from approved CSRs can be rotated automatically.


Method 1: manually approve CSR requests

List the CSRs:

kubectl get csr
NAME                                                   AGE       REQUESTOR                 CONDITION
node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk   43s       system:bootstrap:zkiem5   Pending
node-csr-oVbPmU-ikVknpynwu0Ckz_MvkAO_F1j0hmbcDa__sGA   27s       system:bootstrap:mkus5s   Pending
node-csr-u0E1-ugxgotO_9FiGXo8DkD6a7-ew8sX2qPE6KPS2IY   13m       system:bootstrap:k0s2bj   Pending

Approve a CSR:

kubectl certificate approve node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk
certificatesigningrequest.certificates.k8s.io "node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk" approved
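Approving each CSR by name quickly becomes tedious with more nodes. A minimal sketch that approves every CSR still in Pending state (assuming all pending CSRs really come from your own kubelets):

# approve every CSR that is still Pending
kubectl get csr | grep Pending | awk '{print $1}' | xargs -r kubectl certificate approve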


Check the result of the approval:

kubectl describe  csr node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk
Name:               node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Wed, 13 Jun 2018 16:05:04 +0800
Requesting User:    system:bootstrap:zkiem5
Status:             Approved
Subject:
         Common Name:    system:node:test2
         Serial Number:
         Organization:   system:nodes
Events:  <none>

Requesting User: the user that submitted the CSR; kube-apiserver authenticates and authorizes it;

Subject: the certificate information being requested;

The certificate's CN is system:node:test2 and its Organization is system:nodes, so kube-apiserver's Node authorization mode grants this certificate the corresponding permissions;




Method 2: automatically approve CSR requests

Create three ClusterRoleBindings, used to automatically approve the initial client certificate, renew the client certificate, and renew the server certificate respectively:

cat > csr-crb.yaml <<EOF
 # Approve all CSRs for the group "system:bootstrappers"
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: auto-approve-csrs-for-group
 subjects:
 - kind: Group
   name: system:bootstrappers
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
   apiGroup: rbac.authorization.k8s.io
---
 # To let a node of the group "system:nodes" renew its own credentials
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: node-client-cert-renewal
 subjects:
 - kind: Group
   name: system:nodes
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
   apiGroup: rbac.authorization.k8s.io
---
# A ClusterRole which instructs the CSR approver to approve a node requesting a
# serving cert matching its client cert.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: approve-node-server-renewal-csr
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/selfnodeserver"]
  verbs: ["create"]
---
 # To let a node of the group "system:nodes" renew its own server credentials
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
   name: node-server-cert-renewal
 subjects:
 - kind: Group
   name: system:nodes
   apiGroup: rbac.authorization.k8s.io
 roleRef:
   kind: ClusterRole
   name: approve-node-server-renewal-csr
   apiGroup: rbac.authorization.k8s.io
EOF

auto-approve-csrs-for-group: automatically approves a node's first CSR; note that for the first CSR the requesting Group is system:bootstrappers;

node-client-cert-renewal: automatically approves renewal of a node's expiring client certificate; the generated certificate's Group is system:nodes;

node-server-cert-renewal: automatically approves renewal of a node's expiring server certificate; the generated certificate's Group is system:nodes;


Apply the configuration:

kubectl apply -f csr-crb.yaml

Check the kubelet status

After a while (1-10 minutes), the CSRs of all three nodes have been automatically approved:

kubectl get csr
NAME                                                   AGE       REQUESTOR                 CONDITION
csr-98h25                                              6m        system:node:test2    Approved,Issued
csr-lb5c9                                              7m        system:node:test3    Approved,Issued
csr-m2hn4                                              14m       system:node:test1    Approved,Issued
node-csr-7q7i0q4MF_K2TSEJj16At4CJFLlJkHIqei6nMIAaJCU   28m       system:bootstrap:k0s2bj   Approved,Issued
node-csr-ND77wk2P8k2lHBtgBaObiyYw0uz1Um7g2pRvveMF-c4   35m       system:bootstrap:mkus5s   Approved,Issued
node-csr-Nysmrw55nnM48NKwEJuiuCGmZoxouK4N8jiEHBtLQso   6m        system:bootstrap:zkiem5   Approved,Issued
node-csr-QzuuQiuUfcSdp3j5W4B2UOuvQ_n9aTNHAlrLzVFiqrk   1h        system:bootstrap:zkiem5   Approved,Issued
node-csr-oVbPmU-ikVknpynwu0Ckz_MvkAO_F1j0hmbcDa__sGA   1h        system:bootstrap:mkus5s   Approved,Issued
node-csr-u0E1-ugxgotO_9FiGXo8DkD6a7-ew8sX2qPE6KPS2IY   1h        system:bootstrap:k0s2bj   Approved,Issued


All nodes are Ready:

kubectl get nodes
NAME         STATUS    ROLES     AGE       VERSION
test1   Ready     <none>    18m       v1.10.4
test2   Ready     <none>    10m       v1.10.4
test3   Ready     <none>    11m       v1.10.4



Check that kube-controller-manager has generated a kubeconfig file and a key pair for each node:

ls -l /etc/kubernetes/kubelet.kubeconfig
-rw------- 1 root root 2293 Jun 13 17:07 /etc/kubernetes/kubelet.kubeconfig

ls -l /etc/kubernetes/cert/|grep kubelet
-rw-r--r-- 1 root root 1046 Jun 13 17:07 kubelet-client.crt
-rw------- 1 root root  227 Jun 13 17:07 kubelet-client.key
-rw------- 1 root root 1334 Jun 13 17:07 kubelet-server-2018-06-13-17-07-45.pem
lrwxrwxrwx 1 root root   58 Jun 13 17:07 kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2018-06-13-17-07-45.pem

The kubelet-server certificate is rotated periodically (a quick way to check a certificate's validity period is sketched below);
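A minimal sketch for checking a certificate's subject and validity window with openssl, applied here to the client certificate listed above (the server certificate can be inspected the same way once the certificate block is extracted from its combined PEM file):

# show the subject and the notBefore/notAfter dates
openssl x509 -in /etc/kubernetes/cert/kubelet-client.crt -noout -subject -dates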



Check the ports kubelet listens on (note: in my own test with v1.13.0 the 4194 port is no longer present, and adding the cadvisor port option back prevented kubelet from starting)

The API endpoints provided by kubelet

After startup, kubelet listens on several ports to receive requests from kube-apiserver and other components:

sudo netstat -lnpt|grep kubelet
tcp        0      0 192.168.0.92:4194     0.0.0.0:*               LISTEN      2490/kubelet
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      2490/kubelet
tcp        0      0 192.168.0.92:10250    0.0.0.0:*               LISTEN      2490/kubelet

4194: cadvisor HTTP service;

10248: healthz HTTP service (a quick check is sketched below);

10250: HTTPS API service; note that the read-only port 10255 is not opened;
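A minimal check of the unauthenticated healthz endpoint; it binds to localhost only, so run it on the worker node itself:

# should print: ok
curl http://127.0.0.1:10248/healthz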

For example, when running kubectl exec -it nginx-ds-5rmws -- sh, kube-apiserver sends the kubelet a request like:

POST /exec/default/nginx-ds-5rmws/my-nginx?command=sh&input=1&output=1&tty=1

kubelet serves the following HTTPS endpoints on port 10250:

/pods、/runningpods
/metrics、/metrics/cadvisor、/metrics/probes
/spec
/stats、/stats/container
/logs
/run/, /exec/, /attach/, /portForward/, /containerLogs/ and other management endpoints

For details see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3

Because anonymous authentication is disabled and webhook authorization is enabled, every request to the HTTPS API on port 10250 must be authenticated and authorized.

The predefined ClusterRole system:kubelet-api-admin grants access to all kubelet APIs:

kubectl describe clusterrole system:kubelet-api-admin
Name:         system:kubelet-api-admin
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate=true
PolicyRule:
  Resources      Non-Resource URLs  Resource Names  Verbs
  ---------      -----------------  --------------  -----
  nodes          []                 []              [get list watch proxy]
  nodes/log      []                 []              [*]
  nodes/metrics  []                 []              [*]
  nodes/proxy    []                 []              [*]
  nodes/spec     []                 []              [*]
  nodes/stats    []                 []              [*]

kubelet API authentication and authorization

kubelet is configured with the following authentication parameters:

authentication.anonymous.enabled: set to false, so anonymous access to port 10250 is not allowed;

authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS client certificate authentication;

authentication.webhook.enabled=true: enables HTTPS bearer token authentication;

and with the following authorization parameter:

authorization.mode=Webhook: enables RBAC authorization;

When kubelet receives a request, it authenticates the certificate signature against clientCAFile or checks whether the bearer token is valid. If neither succeeds, the request is rejected with Unauthorized:

curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.0.92:10250/metrics
Unauthorized

curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.0.92:10250/metrics
Unauthorized

After authentication, kubelet sends a SubjectAccessReview to kube-apiserver to ask whether the user/group behind the certificate or token has permission to operate on the resource (RBAC); a sketch of such a review follows.
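A minimal sketch of the kind of SubjectAccessReview kubelet submits; this example asks whether system:kube-controller-manager may get nodes/metrics, matching the Forbidden response shown below (the user and resource here are only an illustration):

cat <<EOF | kubectl create -f - -o yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:kube-controller-manager
  resourceAttributes:
    verb: get
    resource: nodes
    subresource: metrics
EOF

The status.allowed field in the returned object is the authorization decision kubelet acts on.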



Certificate authentication and authorization:

A certificate with insufficient permissions:

curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.0.92:10250/metrics
Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)


Using the all-powerful admin certificate created when deploying the kubectl command-line tool:

curl -s --cacert /etc/kubernetes/cert/ca.pem --cert ./admin.pem --key ./admin-key.pem https://192.168.0.92:10250/metrics|head

# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="345600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="604800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="2.592e+06"} 0

The values of --cacert, --cert and --key must be file paths; the ./ in ./admin.pem above cannot be omitted, otherwise the request returns 401 Unauthorized;




Bearer token authentication and authorization:

Create a ServiceAccount and bind it to the ClusterRole system:kubelet-api-admin so that it is allowed to call the kubelet API:

kubectl create sa kubelet-api-test

kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test

SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')

TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')

echo ${TOKEN}

curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.0.92:10250/metrics|head

# HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="21600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="43200"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="86400"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="172800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="345600"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="604800"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="2.592e+06"} 0

cadvisor and metrics

cadvisor collects resource usage (CPU, memory, disk, network) for all containers on its node, and exposes it both on its own HTTP web page (port 4194) and on port 10250 in Prometheus metrics format.


Opening http://192.168.0.91:4194/containers/ in a browser shows the cadvisor monitoring page.


Opening https://172.27.129.80:10250/metrics and https://172.27.129.80:10250/metrics/cadvisor in a browser returns the kubelet and cadvisor metrics respectively.


Note:

kubelet.config.json sets authentication.anonymous.enabled to false, so anonymous access to the HTTPS service on port 10250 is not allowed;

See A.瀏覽器訪問kube-apiserver安全端口.md to create and import the required certificates, then access port 10250 as above (a sketch of exporting the admin certificate for the browser follows);
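Browsers generally import client certificates in PKCS#12 format. A minimal sketch for bundling the admin certificate and key created earlier (admin.pem and admin-key.pem are the files from the kubectl deployment step; you choose the export password at the prompt):

# bundle the admin certificate and key into a .p12 file for browser import
openssl pkcs12 -export -in admin.pem -inkey admin-key.pem -out admin.p12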




Get the kubelet configuration

Fetch each node's configuration from kube-apiserver:

source /opt/k8s/bin/environment.sh

Use the all-powerful admin certificate created when deploying the kubectl command-line tool:

curl -sSL --cacert /etc/kubernetes/cert/ca.pem --cert ./admin.pem --key ./admin-key.pem ${KUBE_APISERVER}/api/v1/nodes/test1/proxy/configz | jq \
  '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"'
{
  "syncFrequency": "1m0s",
  "fileCheckFrequency": "20s",
  "httpCheckFrequency": "20s",
  "address": "172.27.129.80",
  "port": 10250,
  "readOnlyPort": 10255,
  "authentication": {
    "x509": {},
    "webhook": {
      "enabled": false,
      "cacheTTL": "2m0s"
    },
    "anonymous": {
      "enabled": true
    }
  },
  "authorization": {
    "mode": "AlwaysAllow",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "registryPullQPS": 5,
  "registryBurst": 10,
  "eventRecordQPS": 5,
  "eventBurst": 10,
  "enableDebuggingHandlers": true,
  "healthzPort": 10248,
  "healthzBindAddress": "127.0.0.1",
  "oomScoreAdj": -999,
  "clusterDomain": "cluster.local.",
  "clusterDNS": [
    "10.254.0.2"
  ],
  "streamingConnectionIdleTimeout": "4h0m0s",
  "nodeStatusUpdateFrequency": "10s",
  "imageMinimumGCAge": "2m0s",
  "imageGCHighThresholdPercent": 85,
  "imageGCLowThresholdPercent": 80,
  "volumeStatsAggPeriod": "1m0s",
  "cgroupsPerQOS": true,
  "cgroupDriver": "cgroupfs",
  "cpuManagerPolicy": "none",
  "cpuManagerReconcilePeriod": "10s",
  "runtimeRequestTimeout": "2m0s",
  "hairpinMode": "promiscuous-bridge",
  "maxPods": 110,
  "podPidsLimit": -1,
  "resolvConf": "/etc/resolv.conf",
  "cpuCFSQuota": true,
  "maxOpenFiles": 1000000,
  "contentType": "application/vnd.kubernetes.protobuf",
  "kubeAPIQPS": 5,
  "kubeAPIBurst": 10,
  "serializeImagePulls": false,
  "evictionHard": {
    "imagefs.available": "15%",
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
  },
  "evictionPressureTransitionPeriod": "5m0s",
  "enableControllerAttachDetach": true,
  "makeIPTablesUtilChains": true,
  "iptablesMasqueradeBit": 14,
  "iptablesDropBit": 15,
  "featureGates": {
    "RotateKubeletClientCertificate": true,
    "RotateKubeletServerCertificate": true
  },
  "failSwapOn": true,
  "containerLogMaxSize": "10Mi",
  "containerLogMaxFiles": 5,
  "enforceNodeAllocatable": [
    "pods"
  ],
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1"
}







07-3. Deploy the kube-proxy component

kube-proxy runs on all worker nodes. It watches the apiserver for changes to Services and Endpoints and creates routing rules to load-balance service traffic.

This document deploys kube-proxy in iptables mode, matching the mode set in the configuration file below.

Download and distribute the kube-proxy binaries

See 06-0.部署master節點.md

Install dependencies

Each node needs the ipvsadm and ipset commands installed and the ip_vs kernel module loaded.

See 07-0.部署worker節點.md



Create the kube-proxy certificate

Create the certificate signing request:

cat > kube-proxy-csr.json <<EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "BeiJing",
      "L": "BeiJing",
      "O": "k8s",
      "OU": "4Paradigm"
    }
  ]
}
EOF

CN: sets the certificate's User to system:kube-proxy;

The predefined ClusterRoleBinding system:node-proxier binds the User system:kube-proxy to the ClusterRole system:node-proxier, which grants permission to call the proxy-related kube-apiserver APIs;

This certificate is only used by kube-proxy as a client certificate, so the hosts field is empty;



Generate the certificate and private key:

cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
  -ca-key=/etc/kubernetes/cert/ca-key.pem \
  -config=/etc/kubernetes/cert/ca-config.json \
  -profile=kubernetes  kube-proxy-csr.json | cfssljson -bare kube-proxy



Create and distribute the kubeconfig file

source /opt/k8s/bin/environment.sh
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/cert/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-credentials kube-proxy \
  --client-certificate=kube-proxy.pem \
  --client-key=kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig

kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

--embed-certs=true: embeds the contents of ca.pem and kube-proxy.pem into the generated kube-proxy.kubeconfig file (without it, only the certificate file paths are written);



Distribute the kubeconfig file:

source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do
    echo ">>> ${node_name}"
    scp kube-proxy.kubeconfig k8s@${node_name}:/etc/kubernetes/
  done




Create the kube-proxy configuration file

Starting with v1.10, some kube-proxy parameters can be set in a configuration file. You can generate the file with the --write-config-to option, or refer to the kubeproxyconfig type definitions: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/apis/kubeproxyconfig/types.go

Create the kube-proxy config template:

cat >kube-proxy.config.yaml.template <<EOF
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: ##NODE_IP##
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
clusterCIDR: ${CLUSTER_CIDR}
healthzBindAddress: ##NODE_IP##:10256
hostnameOverride: ##NODE_NAME##
kind: KubeProxyConfiguration
metricsBindAddress: ##NODE_IP##:10249
mode: "iptables"
EOF

bindAddress: the listen address;

clientConnection.kubeconfig: the kubeconfig file used to connect to the apiserver;

clusterCIDR: kube-proxy uses the cluster CIDR to distinguish traffic originating inside the cluster from traffic outside it; only when --cluster-cidr or --masquerade-all is set will kube-proxy SNAT requests to Service IPs;

hostnameOverride: must match the kubelet's value, otherwise kube-proxy will not find its Node after startup and will not create any iptables rules;

mode: use iptables mode;

Create and distribute the kube-proxy configuration file for each node:

source /opt/k8s/bin/environment.sh
for (( i=0; i < 3; i++ ))
  do 
    echo ">>> ${NODE_NAMES[i]}"
    sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy.config.yaml.template > kube-proxy-${NODE_NAMES[i]}.config.yaml
    scp kube-proxy-${NODE_NAMES[i]}.config.yaml root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy.config.yaml
  done



The rendered kube-proxy.config.yaml file: kube-proxy.config.yaml

Create and distribute the kube-proxy systemd unit file

source /opt/k8s/bin/environment.sh
cat > kube-proxy.service <<EOF
[Unit]
Description=Kubernetes Kube-Proxy Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=network.target

[Service]
WorkingDirectory=/var/lib/kube-proxy
ExecStart=/opt/k8s/bin/kube-proxy \\
  --config=/etc/kubernetes/kube-proxy.config.yaml \\
  --alsologtostderr=true \\
  --logtostderr=false \\
  --log-dir=/var/log/kubernetes \\
  --v=2
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF




The rendered unit file: kube-proxy.service

Distribute the kube-proxy systemd unit file:

source /opt/k8s/bin/environment.sh
for node_name in ${NODE_NAMES[@]}
  do 
    echo ">>> ${node_name}"
    scp kube-proxy.service root@${node_name}:/etc/systemd/system/
  done




Start the kube-proxy service

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    ssh root@${node_ip} "mkdir -p /var/lib/kube-proxy"
    ssh root@${node_ip} "mkdir -p /var/log/kubernetes && chown -R k8s /var/log/kubernetes"
    ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
  done

The working and log directories must be created first; this document follows https://github.com/opsnull/follow-me-install-kubernetes-cluster
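As with the docker and kubelet services, it is worth checking the result. A minimal sketch, assuming the same NODE_IPS as above and that kube-proxy in iptables mode has created the KUBE-SERVICES chain in the nat table:

source /opt/k8s/bin/environment.sh
for node_ip in ${NODE_IPS[@]}
  do
    echo ">>> ${node_ip}"
    # the service should report active (running)
    ssh k8s@${node_ip} "systemctl status kube-proxy|grep Active"
    # list the first few rules kube-proxy installed in the nat table
    ssh root@${node_ip} "/usr/sbin/iptables -t nat -S KUBE-SERVICES | head -n 5"
  done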