I. Where the Name K8S Comes From
K8S is an abbreviation of Kubernetes: the 8 stands in for the eight letters "ubernete" between the leading K and the trailing s.
II. Hands-On: Single-Node K8S
Environment:

- Ubuntu 16.04
- GPU driver 418.56
- Docker 18.06
- Kubernetes 1.13.5
1. Set Up the Environment
First, back up the existing APT source configuration:
cp /etc/apt/sources.list /etc/apt/sources.list.cp
Edit the file and replace its contents with the Aliyun mirror entries:
vim /etc/apt/sources.list

deb-src http://archive.ubuntu.com/ubuntu xenial main restricted
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu/ xenial multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb http://archive.canonical.com/ubuntu xenial partner
deb-src http://archive.canonical.com/ubuntu xenial partner
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted multiverse universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-security multiverse
Refresh the package index:
apt-get update
Automatically repair any packages left in a broken state:
apt --fix-broken install
Upgrade the system. On GPU machines this step can be skipped, since the upgrade may replace the GPU driver and cause problems:
apt-get upgrade
Disable the firewall:
ufw disable
Install the SELinux utilities:
apt install selinux-utils
Disable SELinux:
setenforce 0
vim /etc/selinux/config

SELINUX=disabled
Configure the kernel network parameters:
tee /etc/sysctl.d/k8s.conf <<-'EOF'
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter
Reload and check that the IPv4/IPv6 settings have taken effect:
sysctl --system
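To double-check that the three switches really took effect, the values can also be read back directly from /proc. A small sketch (my own addition, not part of the original steps; the bridge files only exist once br_netfilter is loaded):

```shell
#!/bin/sh
# Read back the forwarding/bridging switches written to k8s.conf above.
for f in /proc/sys/net/ipv4/ip_forward \
         /proc/sys/net/bridge/bridge-nf-call-iptables \
         /proc/sys/net/bridge/bridge-nf-call-ip6tables; do
  if [ -r "$f" ]; then
    echo "$f = $(cat "$f")"          # expect 1 after sysctl --system
  else
    echo "$f missing (is br_netfilter loaded?)"
  fi
done
```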
Configure iptables:
iptables -P FORWARD ACCEPT
vim /etc/rc.local

/usr/sbin/iptables -P FORWARD ACCEPT
Permanently disable the swap partition:
sed -i 's/.*swap.*/#&/' /etc/fstab
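The sed one-liner comments out every fstab line that mentions swap. A quick dry run on a scratch copy shows the effect (the sample fstab contents below are made up for illustration):

```shell
#!/bin/sh
# Try the swap-disabling sed on a scratch file instead of the real /etc/fstab.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
UUID=abcd-1234 /    ext4 errors=remount-ro 0 1
UUID=ef56-7890 none swap sw                0 0
EOF
sed -i 's/.*swap.*/#&/' "$tmp"   # same expression as above: prefix swap lines with '#'
grep swap "$tmp"                 # the swap entry is now commented out
rm -f "$tmp"
```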
2. Install Docker
Run the following commands:
apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt-get update
apt-get purge docker-ce docker docker-engine docker.io && rm -rf /var/lib/docker
apt-get autoremove docker-ce docker docker-engine docker.io
apt-get install -y docker-ce=18.06.3~ce~3-0~ubuntu
Start Docker and enable it at boot:
systemctl enable docker && systemctl start docker
Docker configuration:
vim /etc/docker/daemon.json

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  },
  "insecure-registries": ["http://k8s.gcr.io"],
  "data-root": "",
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
The configuration above is for GPU machines. For machines without a GPU, use:
{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["http://k8s.gcr.io"],
  "live-restore": true
}
Reload the configuration, restart Docker, and verify:
systemctl daemon-reload && systemctl restart docker && docker info
3. Install K8S
Set up the package source before pulling anything:
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
tee /etc/apt/sources.list.d/kubernetes.list <<-'EOF'
deb https://mirrors.aliyun.com/kubernetes/apt kubernetes-xenial main
EOF
Update and install the pinned versions:
apt-get update
apt-get purge kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
apt-get autoremove kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
apt-get install -y kubelet=1.13.5-00 kubeadm=1.13.5-00 kubectl=1.13.5-00
apt-mark hold kubelet kubeadm kubectl
Start kubelet and enable it at boot:
systemctl enable kubelet && sudo systemctl start kubelet
Install the K8S component images. Because gcr.io is not reachable from inside China, pull them from the registry.cn-hangzhou.aliyuncs.com mirror instead:
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24
docker pull registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6
Re-tag them with the names kubeadm expects:
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-apiserver:v1.13.5 k8s.gcr.io/kube-apiserver:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-controller-manager:v1.13.5 k8s.gcr.io/kube-controller-manager:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-scheduler:v1.13.5 k8s.gcr.io/kube-scheduler:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/gg-gcr-io/kube-proxy:v1.13.5 k8s.gcr.io/kube-proxy:v1.13.5
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
docker tag registry.cn-hangzhou.aliyuncs.com/kuberimages/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
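The pull-and-tag pairs all follow one pattern, so they can be driven from a list instead of typed out. A sketch (my own convenience script, not from the original; the echo prints each command for review first, drop it to actually execute):

```shell
#!/bin/sh
# One "<mirror-image> <target-image>" pair per line; pull then re-tag each.
ALIYUN=registry.cn-hangzhou.aliyuncs.com
while read -r src dst; do
  echo docker pull "$src"
  echo docker tag "$src" "$dst"
done <<EOF
$ALIYUN/gg-gcr-io/kube-apiserver:v1.13.5 k8s.gcr.io/kube-apiserver:v1.13.5
$ALIYUN/gg-gcr-io/kube-controller-manager:v1.13.5 k8s.gcr.io/kube-controller-manager:v1.13.5
$ALIYUN/gg-gcr-io/kube-scheduler:v1.13.5 k8s.gcr.io/kube-scheduler:v1.13.5
$ALIYUN/gg-gcr-io/kube-proxy:v1.13.5 k8s.gcr.io/kube-proxy:v1.13.5
$ALIYUN/kuberimages/pause:3.1 k8s.gcr.io/pause:3.1
$ALIYUN/kuberimages/etcd:3.2.24 k8s.gcr.io/etcd:3.2.24
$ALIYUN/kuberimages/coredns:1.2.6 k8s.gcr.io/coredns:1.2.6
EOF
```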
4. Initialize with kubeadm
Initialize k8s with kubeadm, substituting your actual host IP:
kubeadm init --kubernetes-version=v1.13.5 \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.16.0.0/16 \
  --apiserver-advertise-address=${masterIp} | tee kubeadm-init.log
If the host IP is not known in advance, the cluster can also be initialized from a YAML file:
vi /etc/hosts

10.10.5.100 k8s.api.server

vi kube-init.yaml

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "k8s.api.server"
controlPlaneEndpoint: "k8s.api.server:6443"
networking:
  serviceSubnet: "10.1.0.0/16"
  podSubnet: "10.244.0.0/16"
HA version:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.5
imageRepository: registry.aliyuncs.com/google_containers
apiServer:
  certSANs:
  - "api.k8s.com"
controlPlaneEndpoint: "api.k8s.com:6443"
etcd:
  external:
    endpoints:
    - https://ETCD_0_IP:2379
    - https://ETCD_1_IP:2379
    - https://ETCD_2_IP:2379
networking:
  serviceSubnet: 10.1.0.0/16
  podSubnet: 10.244.0.0/16
Note: the apiVersion belongs to the kubeadm.k8s.io group because kubeadm performs the initialization. Finally, run:
kubeadm init --config=kube-init.yaml
If a problem comes up, fix it, reset, and run init again. For more options, run:
kubeadm --help
5. Fixing a Failed Deployment
First remove the node (cluster version):
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>
Then clear the init state by running the following on the node to be removed (note: after any error during init or join, this command rolls the node back):
kubeadm reset
6. Troubleshooting
If initialization runs into problems, start by inspecting container status and the network with:
sudo docker ps -a | grep kube | grep -v pause
sudo docker logs CONTAINERID
sudo docker images && systemctl status -l kubelet
netstat -nlpt
kubectl describe ep kubernetes
kubectl describe svc kubernetes
kubectl get svc kubernetes
kubectl get ep
netstat -nlpt | grep apiser
vi /var/log/syslog
7. Configure apiserver Access for the Current User
sudo mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
8. Network Plugin
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
wget https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
vi calico.yaml

- name: CALICO_IPV4POOL_IPIP
  value: "off"
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"

kubectl apply -f calico.yaml
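Editing calico.yaml by hand works, but the two value changes can also be scripted with sed. A sketch, demonstrated here on a stand-in fragment rather than the real file (the default values "Always" and "192.168.0.0/16" are assumptions about what Calico ships with; check your downloaded calico.yaml):

```shell
#!/bin/sh
# Stand-in fragment of calico.yaml (the real file comes from the wget above).
cat > calico-frag.yaml <<'EOF'
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
EOF
# Turn IPIP off and point the pool at the podSubnet used at kubeadm init time.
sed -i 's#"Always"#"off"#; s#192.168.0.0/16#10.244.0.0/16#' calico-frag.yaml
cat calico-frag.yaml
rm -f calico-frag.yaml
```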
On a single machine, allow pods to be scheduled on the master node:
kubectl taint nodes --all node-role.kubernetes.io/master-
To forbid scheduling pods on the master again:
kubectl taint nodes k8s node-role.kubernetes.io/master=true:NoSchedule
That completes the single-node deployment. If your project ships as an integrated software/hardware appliance, you are done here. Just remember to allow scheduling on the master node in single-node mode!
Next up: the cluster version.
Using the machine deployed above as the master node, continue with:
scp /etc/kubernetes/admin.conf $nodeUser@$nodeIp:/home/$nodeUser
scp /etc/kubernetes/pki/etcd/* $nodeUser@$nodeIp:/home/$nodeUser/etcd
kubeadm token generate
kubeadm token create $token_name --print-join-command --ttl=0
kubeadm join $masterIP:6443 --token $token_name --discovery-token-ca-cert-hash $hash
When setting up the node machines, if CUDA is needed, these resources may help:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation
https://blog.csdn.net/u012235003/article/details/54575758
https://blog.csdn.net/qq_39670011/article/details/90404111
Now for the actual steps. First blacklist the nouveau driver:
vim /etc/modprobe.d/blacklist-nouveau.conf

blacklist nouveau
options nouveau modeset=0

update-initramfs -u
Reboot Ubuntu and check that nouveau is disabled (no output means success):
lsmod | grep nouveau
apt-get remove --purge nvidia*

Download CUDA from https://developer.nvidia.com/cuda-downloads and install the build dependencies:

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
Install CUDA:
sh cuda_10.1.168_418.67_linux.run

In the installer, type accept, select "Install" (Enter), then select "Yes". Afterwards, add CUDA to the environment:

echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
echo 'export PATH=/usr/local/cuda-10.1/NsightCompute-2019.3:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
Reboot the machine and verify the CUDA installation.
Check whether the "nvidia*" device files exist:
cd /dev && ls -al
If they are missing, create a script nv.sh:
vi nv.sh

#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done
  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

chmod +x nv.sh && bash nv.sh
Reboot once more and check the CUDA version:
nvcc -V
Build and run the samples:
cd /usr/local/cuda-10.1/samples && make
cd /usr/local/cuda-10.1/samples/bin/x86_64/linux/release
./deviceQuery
If the output ends with "Result = PASS", CUDA is installed correctly.
Install nvidia-docker:
vim /etc/docker/daemon.json

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "storage-driver": "overlay2",
  "default-runtime": "nvidia",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": [$harborRgistry],
  "live-restore": true
}
Restart Docker:
sudo systemctl daemon-reload && sudo systemctl restart docker && docker info
Verify that nvidia-docker works:
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
On the node machine, switch to the node user:
su $nodeUser
Configure apiserver access for the node user, then join the cluster:
mkdir -p $HOME/.kube
cp -i admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
mkdir -p $HOME/etcd
sudo rm -rf /etc/kubernetes
sudo mkdir -p /etc/kubernetes/pki/etcd
sudo cp /home/$nodeUser/etcd/* /etc/kubernetes/pki/etcd
sudo kubeadm join $masterIP:6443 --token $token_name --discovery-token-ca-cert-hash $hash
For example:
sudo kubeadm join 192.168.8.116:6443 --token vyi4ga.foyxqr2iz9i391q3 --discovery-token-ca-cert-hash sha256:929143bcdaa3e23c6faf20bc51ef6a57df02edf9df86cedf200320a9b4d3220a
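If the --discovery-token-ca-cert-hash value from kubeadm-init.log gets lost, it can be recomputed on the master: the value is the SHA-256 digest of the CA certificate's DER-encoded public key. A sketch (the cert path is the kubeadm default):

```shell
#!/bin/sh
# Recompute the discovery-token-ca-cert-hash from a CA certificate.
# Usage: ca_hash /etc/kubernetes/pki/ca.crt
ca_hash() {
  openssl x509 -pubkey -in "$1" |
    openssl rsa -pubin -outform der 2>/dev/null |
    openssl dgst -sha256 -hex |
    sed 's/^.* //'                  # strip the "(stdin)= " prefix
}
# Pass the printed value to kubeadm join as: --discovery-token-ca-cert-hash sha256:<hash>
```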
Check that the node has joined the master:
kubectl get node
The above covered single-node k8s deployment as well as deploying and installing HA master nodes.
Closing Bonus
Open-source code for a microservice architecture built on k8s:
https://gitee.com/damon_one/spring-cloud-k8s
Stars and feedback are very welcome.
About the Author
Pen name: Damon. Technology enthusiast with long experience in Java development and Spring Cloud microservice architecture design, as well as containerizing microservices with Docker and k8s for automated, end-to-end project deployment. Currently learning Go, researching k8s, and exploring the edge-computing framework KubeEdge. Founder of the public account 程序猿Damon. Personal WeChat: MrNull008. Feel free to reach out.
You can also follow me on InfoQ: https://www.infoq.cn/profile/1905020/following/user