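NVIDIA Driver Update and Docker GPU Installation Update

NVIDIA's Linux support has improved considerably of late: Docker and Kubernetes can now use GPUs directly. The latest NVIDIA graphics driver is 440.31, the Ubuntu 18.04 repositories have reached the 430 series, and CUDA is at version 10.1.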

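1. NVIDIA driver update

Using a GPU in Docker used to require installing nvidia-docker2 (the method is described below); that is no longer necessary.

Containers in Kubernetes can use the GPU directly as well. The Docker usage looks like this: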

# Test nvidia-smi with the latest official CUDA image
$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
$ docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
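
The nested quoting in the device examples above is easy to get wrong. Docker parses the --gpus value as comma-separated fields, so a device list that itself contains commas has to reach Docker wrapped in literal double quotes. Here is a minimal sketch of a helper that builds that value; the name gpus_device_arg is made up for illustration and is not part of Docker:

```shell
# Hypothetical helper (not part of Docker): join GPU indices or UUIDs
# into the value that `docker run --gpus` expects for specific devices.
gpus_device_arg() (
    # The function body runs in a subshell, so changing IFS here
    # does not leak into the calling shell.
    IFS=,
    # "$*" joins all arguments with the first character of IFS;
    # the literal double quotes around the list are kept on purpose.
    printf '"device=%s"\n' "$*"
)

# Example (needs Docker 19.03+ and GPUs with indices 1 and 2):
#   docker run --gpus "$(gpus_device_arg 1 2)" nvidia/cuda:9.0-base nvidia-smi
```

Building the value in one place avoids hand-writing the single-quote/double-quote pair each time the device list changes.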

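Problem:

  • After installing with the method above, docker ps showed no containers at all; once nvidia-docker2 was installed it worked. This is probably a version compatibility issue and needs further verification.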

1.1 Downloading the NVIDIA driver

Download it directly:

wget -c http://us.download.nvidia.com/XFree86/Linux-x86_64/440.31/NVIDIA-Linux-x86_64-440.31.run

If an NVIDIA driver was installed previously, it has to be uninstalled first and the new one installed afterwards. For example:

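# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get --purge remove 'nvidia-*'
# If the old driver came from a .run installer, uninstall it with that installer instead:
# sudo ./NVIDIA-Linux-x86_64-410.57.run --uninstall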

sudo update-initramfs -u
sudo reboot now
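
Once the new driver is installed, one way to confirm which version the kernel actually loaded is to read /proc/driver/nvidia/version. The sketch below assumes that file contains a line of the form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.31  ..."; the function name nvidia_driver_version is illustrative only:

```shell
# Illustrative helper: print the driver version found in text formatted
# like /proc/driver/nvidia/version (read from standard input).
nvidia_driver_version() {
    # Keep only the NVRM line, then extract the first dotted number on it.
    grep -m1 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Example, once the driver is installed and loaded:
#   nvidia_driver_version < /proc/driver/nvidia/version
```

nvidia-smi --query-gpu=driver_version --format=csv,noheader reports the same information, but the /proc file is available even when nvidia-smi is not on the PATH.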

1.2 Downloading CUDA

On Ubuntu, run:

wget -c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
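
After the cuda package is installed, the toolkit binaries are usually not on the PATH yet. The following is a small sketch of adding them without creating duplicates on repeated runs. It assumes the default prefix /usr/local/cuda-10.1 used by the 10.1 packages, and path_with_dir is a name invented here:

```shell
# Illustrative helper: print PATH-style string $2 with directory $1
# prepended, unless $1 is already one of its entries.
path_with_dir() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;        # already present, keep as is
        *)        printf '%s\n' "$1:$2" ;;     # not present, prepend
    esac
}

# Assumption: CUDA 10.1 was installed under the default prefix.
CUDA_BIN=/usr/local/cuda-10.1/bin
PATH=$(path_with_dir "$CUDA_BIN" "$PATH")
export PATH

# Report the compiler version if the toolkit is where we assumed.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found; check that $CUDA_BIN exists" >&2
fi
```

Putting the same lines in ~/.profile makes the setting persistent; the duplicate check keeps PATH from growing each time the file is sourced.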

1.3 Testing Docker's GPU support

Docker version (the runtime must be specified):

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

The older --runtime=nvidia option still works (it requires nvidia-docker2), but the latest versions use the --gpus parameter instead (nvidia-docker2 is not required).
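
Which of the two options applies depends on the Docker version, with 19.03 as the cut-over. A rough sketch of choosing the flag from a version string follows; gpu_flag_for is a hypothetical name, and the function assumes the usual MAJOR.MINOR.PATCH form:

```shell
# Hypothetical helper: print the GPU option suited to a Docker version.
# Assumes a version string such as 19.03.5 or 18.09.7-ce.
gpu_flag_for() (
    version=${1%%[-+]*}        # drop suffixes such as "-ce"
    major=${version%%.*}       # text before the first dot
    rest=${version#*.}
    minor=${rest%%.*}          # text between the first and second dot
    minor=${minor#0}           # "03" -> "3", so it compares as a plain number
    minor=${minor:-0}
    if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
        echo "--gpus all"
    else
        echo "--runtime=nvidia"
    fi
)

# Example (the command substitution is left unquoted on purpose so that
# "--gpus all" is split into two arguments):
#   v=$(docker version --format '{{.Server.Version}}')
#   docker run $(gpu_flag_for "$v") --rm nvidia/cuda nvidia-smi
```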

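2. Fixing the Docker GPU support update problem

Running apt update on Ubuntu 18.04 produced the following error:

"Failed to fetch https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64/InRelease  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY xxx"

The public key from the earlier installation has probably expired. To fix it:

  • On Debian-based Linux (such as Ubuntu):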
DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update
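
The DIST variable in the commands above decides which repository list is fetched, so it is worth checking its value before piping anything into tee. A minimal sketch of the same computation as a reusable function; dist_id is a name chosen here, and the optional file argument exists only to make it easy to try against a sample file:

```shell
# Illustrative helper: build the distribution string (ID + VERSION_ID)
# from an os-release style file, defaulting to /etc/os-release.
dist_id() (
    # The body runs in a subshell, so the variables defined by the
    # sourced file do not leak into the calling shell.
    . "${1:-/etc/os-release}"
    printf '%s%s\n' "$ID" "$VERSION_ID"
)

# Example:
#   DIST=$(dist_id)
#   echo "https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list"
```

On Ubuntu 18.04 the result should be ubuntu18.04, matching the path in the error message above.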

After that, apt update works normally again.

3. Installing and updating nvidia-docker2

See the NVIDIA project page (https://github.com/NVIDIA/nvidia-docker).

  • Starting with Docker 19.03, NVIDIA GPU support is built into Docker and nvidia-docker2 is no longer needed.
  • The original note reads:
    • Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime. If you are an existing user of the nvidia-docker2 packages, review the instructions in the "Upgrading with nvidia-docker2" section.

For example:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

Installation (the commands below add the nvidia-docker repository and install the nvidia-container-toolkit package):

# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
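
After restarting Docker, it can help to check whether the legacy nvidia runtime is registered, since that determines whether --runtime=nvidia will work alongside --gpus. The sketch below assumes that docker info prints a "Runtimes:" line listing the registered runtimes; has_nvidia_runtime is a made-up name:

```shell
# Illustrative helper: succeed if `docker info` output on standard input
# lists a runtime named "nvidia" on its "Runtimes:" line.
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Example:
#   if docker info 2>/dev/null | has_nvidia_runtime; then
#       echo "--runtime=nvidia is available"
#   else
#       echo "only --gpus is available"
#   fi
```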

 

For other operating systems, see:
