Kubelet from Getting Started to Giving Up: GPU Support

The "Kubelet from Getting Started to Giving Up" series walks through the Kubelet component from the basics all the way down to the source code. In this installment, "Kubernetes and GPUs Take Flight Together", zouyee first covers how NVIDIA GPUs are enabled on Kubernetes; follow-up articles will introduce the Device Plugin concepts and the source code of Kubelet's Device Manager component.

1. Background

1.1 Requirements

Before Kubernetes 1.8, the recommended way to consume devices such as GPUs was the built-in Accelerators feature gate. To stay true to Kubernetes' plugin-oriented design, where each component does one job, the device plugin framework was introduced in Kubernetes 1.8 and promoted in 1.10, allowing users to bring system hardware resources into the Kubernetes ecosystem. This article covers how to install and deploy NVIDIA GPUs, then introduces Device Plugins, how they work, and their source code, including the plugin framework, using and scheduling GPUs, error handling, and optimization.

1.2 Related Technology

Device Plugins were promoted to beta in Kubernetes 1.10. The framework was first introduced in Kubernetes 1.8 so that third-party vendors could plug device resources into Kubernetes and expose them to containers as Extended Resources.

With the Device Plugin approach, users do not need to modify Kubernetes itself; a third-party device vendor develops a plugin that implements the Device Plugin gRPC interfaces, and that is all it takes (think about it: isn't volume management in Kubernetes a similar story? CSI, CNI, CRI?).

Typical Device Plugin implementations currently include:

a) AMD GPU device plugin

b) Intel device plugins for GPU, FPGA and QuickAssist devices

c) KubeVirt device plugins for hardware-assisted virtualization

d) NVIDIA GPU device plugin

e) RDMA device plugin for high-performance, low-latency RDMA NICs

f) Solarflare low-latency 10GbE NIC plugin

g) SR-IOV network device plugin

h) Xilinx FPGA device plugin

When a device plugin starts, it exposes several gRPC services and communicates with the Kubelet through /var/lib/kubelet/device-plugins/kubelet.sock.
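
To make the flow concrete, here is a minimal, illustrative Go sketch (not the NVIDIA plugin's actual code) of how a plugin registers itself with the Kubelet over kubelet.sock, assuming the v1beta1 device plugin API; after registering, the plugin must also serve the DevicePlugin gRPC service (ListAndWatch, Allocate, etc.) on its own socket under /var/lib/kubelet/device-plugins/.

// Minimal sketch of device plugin registration with the Kubelet (v1beta1 API).
// The endpoint and resource name below are examples, not fixed values.
package main

import (
    "context"
    "net"
    "time"

    "google.golang.org/grpc"
    pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

func registerWithKubelet(endpoint, resourceName string) error {
    // Dial the Kubelet's registration socket, i.e. /var/lib/kubelet/device-plugins/kubelet.sock.
    conn, err := grpc.Dial(pluginapi.KubeletSocket,
        grpc.WithInsecure(),
        grpc.WithBlock(),
        grpc.WithTimeout(5*time.Second),
        grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
            return net.DialTimeout("unix", addr, timeout)
        }))
    if err != nil {
        return err
    }
    defer conn.Close()

    // Tell the Kubelet which socket the plugin serves on and which extended
    // resource it provides; the Kubelet then calls back ListAndWatch/Allocate.
    client := pluginapi.NewRegistrationClient(conn)
    _, err = client.Register(context.Background(), &pluginapi.RegisterRequest{
        Version:      pluginapi.Version, // "v1beta1"
        Endpoint:     endpoint,          // e.g. "nvidia-gpu.sock"
        ResourceName: resourceName,      // e.g. "nvidia.com/gpu"
    })
    return err
}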

2. Deployment

NVIDIA GPUs can currently be deployed in three ways: with Docker, with containerd, and with the GPU Operator.

Since built-in Docker support is being removed from Kubernetes (see the earlier article on Kubernetes deprecating the built-in Docker/dockershim CRI for details), the text below focuses on the containerd deployment; the Operator approach will get its own article later. nvidia-container-toolkit already supports both containerd and cri-o. Before walking through the containerd deployment, here are some issues encountered along the way:

1)Error while dialing dial unix:///run/containerd/containerd.sock

The Kubelet side of the problem looks like this:

Events:
  Type     Reason         Age                   From               Message
  ----     ------         ----                  ----               -------
  Normal   Scheduled      10m                   default-scheduler  Successfully assigned gpu-operator-resources/nvidia-device-plugin-daemonset-f99md to cl-gpu-md-0-f4gm6
  Warning  InspectFailed  10m (x3 over 10m)     kubelet            Failed to inspect image "nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2": rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused"

And the corresponding error from one of the NVIDIA device plugin DaemonSet pods:

# kubectl logs -f nvidia-device-plugin-daemonset-q9svq -nkube-system
2021/02/11 01:32:29 Loading NVML
2021/02/11 01:32:29 Failed to initialize NVML: could not load NVML library.
2021/02/11 01:32:29 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2021/02/11 01:32:29 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2021/02/11 01:32:29 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2021/02/11 01:32:29 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes

This happens because default_runtime_name = "runc" in containerd's config file (config.toml) was not changed to default_runtime_name = "nvidia"; see the config.toml example in section 2.1 below.

Related issue: https://github.com/NVIDIA/gpu-operator/issues/143

2)devices.allow: no such file or directory: unknown

Related issue: https://github.com/NVIDIA/libnvidia-container/issues/119

When the kubelet is configured with the systemd cgroup driver, NVIDIA's container prestart hook resolves the cgroup path differently from containerd, which produces the error below:

containerd[76114]: time="2020-12-04T08:52:13.029072066Z" level=error msg="StartContainer for "7a1453c6e7ab8af7395ccc8dac5efcffa94a0834aa7b252e1dcd5b51f92bf13e" failed" error="failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: open failed: /sys/fs/cgroup/devices/system.slice/containerd.service/kubepods-pod80540e95304d8cece2ae2afafd8b8976.slice/devices.allow: no such file or directory: unknown"

The fix is to upgrade libnvidia-container or the container toolkit; a rough sketch of the upgrade follows.
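
For reference only, the upgrade on a CentOS node might look like the following, assuming the packages were originally installed from NVIDIA's yum repository (package names and versions may differ in your environment):

# Refresh repository metadata, then upgrade the NVIDIA container stack
$ yum clean expire-cache
$ yum upgrade -y libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
# Restart containerd so the updated prestart hook is picked up
$ systemctl restart containerd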

With that out of the way, on to the deployment itself.

2.1 Containerd

Versions

Software               Version
Operating system       CentOS
Kernel                 4.19.25
GPU model              Tesla T4
NVIDIA driver          418.39
CUDA                   10.1
Kubernetes             1.18.5
NVIDIA Device Plugin   v0.7.3
containerd             v1.4.3
runc                   1.0.0-rc1

Installation

Note: the steps below describe an offline deployment on an isolated network. If your environment has Internet access, simply follow the same steps and configuration.

a. Install the driver

$ tar -zxvf gpu.tar.gz
## Install dependencies
$ cd gpu/runtime
$ tar -zxvf dependency.tar.gz
$ cd dependency
## Check whether the node has a CUDA-capable NVIDIA GPU
$ cd ./lspci/
$ yum localinstall -y *.rpm
$ lspci | grep -i nvidia
## Install the devel packages
$ cd ../devel
$ yum localinstall -y *.rpm
## Install gcc
$ cd ../gcc
$ yum localinstall -y *.rpm
## Unload the nouveau driver
$ lsmod | grep nouveau
$ rmmod nouveau
## Install the driver; the process is shown in the steps below.
## To update the driver, download it from https://developer.nvidia.com/cuda-75-downloads-archive
$ cd ../../../driver
$ sh cuda_10.1.105_418.39_linux.run
## Test the driver; if the command below prints output, the installation succeeded


Run the following command to verify the installation:

$ nvidia-smi

Appendix: driver installation steps

(1) Type accept and press Enter

(2) Select Install and press Enter

b. Configure containerd

## Update runc; download from https://github.com/opencontainers/runc/releases
$ cd ../runtime
$ cp runc /usr/bin/
## Update containerd; download from https://github.com/containerd/containerd/releases
$ tar -zxvf containerd-1.4.3-linux-amd64.tar.gz
$ cp bin/* /usr/bin/
## Install nvidia-container-runtime. The yum repo is https://nvidia.github.io/nvidia-docker/centos7/nvidia-docker.repo;
## with Internet access this would simply be: yum install -y nvidia-container-runtime
$ tar -zxvf nvidia-container-runtime.tar.gz
$ cd nvidia-container-runtime
$ yum localinstall -y *.rpm

Modify the containerd startup configuration:

# Configure containerd
$ mkdir /etc/containerd/
$ vi /etc/containerd/config.toml
# Configure containerd.service
$ vi /usr/lib/systemd/system/containerd.service
$ systemctl daemon-reload
$ systemctl restart containerd
# Configure crictl
$ tar -zxvf crictl-v1.18.0-linux-amd64.tar.gz
$ mv crictl /usr/bin/
$ vi /etc/profile    # add the alias line below
alias crictl='crictl --runtime-endpoint unix:///run/containerd/containerd.sock'
$ source /etc/profile
# Verify that containerd and nvidia-container-runtime are installed correctly
$ cd test-image
$ ctr images import cuda-vector-add_v0.1.tar
$ ctr images push --plain-http registry.paas/cmss/cuda-vector-add:v0.1

Run a quick check:

ctr run -t --gpus 0 registry.paas/cmss/cuda-vector-add:v0.1 cp nvidia-smi

If the GPU is visible inside the container, nvidia-smi prints its usual device table.
Clean up the test container:

ctr c rm cp

1)config.toml

Run containerd config default > /etc/containerd/config.toml to generate a default configuration, then make the following changes:



Note: as described above, 1) the default_runtime_name value must be nvidia, 2) a new entry is added under runtimes, and 3) if you have an internal image registry, docker.io can be replaced with the internal registry name.
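
For reference, here is a minimal sketch of the relevant config.toml sections, assuming containerd 1.4's v2 config format; the internal registry endpoint is only an example and should be adjusted to your environment:

version = 2

[plugins."io.containerd.grpc.v1.cri".containerd]
  # 1) switch the default runtime from "runc" to "nvidia"
  default_runtime_name = "nvidia"

  # 2) add an "nvidia" entry under runtimes that points to nvidia-container-runtime
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"

# 3) optional: mirror docker.io to an internal registry (endpoint is an example)
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://registry.paas"]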

2)containerd.service

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
KillMode=process
Delegate=yes
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity

[Install]
WantedBy=multi-user.target

c. Deploy the Device Plugin

After the Kubernetes cluster is deployed, update the kubelet runtime configuration:

$ vi /apps/conf/kubernetes/kubelet.env
# set the following flags:
--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock

$ cd device-plugin
$ docker load -i k8s-device-plugin_v0.7.3.tar
$ docker push
// https://github.com/NVIDIA/k8s-device-plugin/tree/master/deployments/static
$ kubectl apply -f nvidia-device-plugin.yml
$ kubectl logs -f nvidia-device-plugin-daemonset-q9svq -nkube-system
2021/02/08 06:32:36 Loading NVML
2021/02/08 06:32:42 Starting FS watcher.
2021/02/08 06:32:42 Starting OS watcher.
2021/02/08 06:32:42 Retreiving plugins.
2021/02/08 06:32:42 Starting GRPC server for 'nvidia.com/gpu'
2021/02/08 06:32:42 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2021/02/08 06:32:42 Registered device plugin for 'nvidia.com/gpu' with Kubelet

d. Functional test

$ cd test-image
# Launch the test pod
$ kubectl apply -f demo.yml
// https://github.com/NVIDIA/gpu-operator/blob/master/tests/gpu-pod.yaml
$ kubectl logs -f cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
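
For reference, demo.yml is essentially a Pod that requests one GPU through the nvidia.com/gpu extended resource. A minimal sketch, assuming the cuda-vector-add image pushed to the internal registry earlier (the upstream gpu-pod.yaml linked above is the original source):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: registry.paas/cmss/cuda-vector-add:v0.1
    resources:
      limits:
        nvidia.com/gpu: 1   # request a single GPU via the extended resource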

For the follow-up articles in this series, see the WeChat public account: DCOS

mp.weixin.qq.com/s/klpgz3KGO…
