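NVidia's Linux support has improved a lot recently: Docker and Kubernetes can now use GPUs directly. NVidia's latest graphics driver is 440.31, the driver in Ubuntu 18.04's built-in repositories has reached version 430, and CUDA has reached version 10.1.
Using a GPU in Docker used to require installing nvidia-docker2 (the method is given below); that is no longer necessary:
Containers in Kubernetes can now use GPUs directly as well. As follows: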
# Test nvidia-smi with the latest official CUDA image
$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
$ docker run --gpus 2 nvidia/cuda:9.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
$ docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
$ docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:9.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
$ docker run --gpus all,capabilities=utility nvidia/cuda:9.0-base nvidia-smi
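The nested quoting in `'"device=1,2"'` is easy to get wrong: the inner double quotes have to reach Docker literally, because a comma otherwise separates options inside the `--gpus` value. A minimal sketch of a helper that builds that argument from a list of GPU indices or UUIDs; the function name `gpus_device_arg` is hypothetical and not part of Docker.

```shell
#!/usr/bin/env bash
# gpus_device_arg ID...: print a --gpus value selecting the given devices,
# including the literal double quotes that Docker expects around the list.
# (Hypothetical helper, for illustration only.)
gpus_device_arg() {
  local IFS=,                      # "$*" joins the arguments with a comma
  printf '"device=%s"\n' "$*"
}

# Usage (needs Docker with GPU support):
#   docker run --gpus "$(gpus_device_arg 1 2)" nvidia/cuda:9.0-base nvidia-smi
```

Passing `"$(gpus_device_arg 1 2)"` is intended to be equivalent to typing `'"device=1,2"'` by hand.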
Problem:
Direct download:
wget -c http://us.download.nvidia.com/XFree86/Linux-x86_64/440.31/NVIDIA-Linux-x86_64-440.31.run
If an NVidia driver was installed previously, uninstall it first and then install the new one. Reference:
As follows:
sudo apt-get --purge remove 'nvidia-*'
# If the driver was installed from a .run file, uninstall it with the installer instead:
# sudo ./NVIDIA-Linux-x86_64-410.57.run --uninstall
sudo update-initramfs -u
sudo reboot now
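Before rebooting, it can help to confirm that no nvidia packages are left installed. A rough sketch that filters `dpkg -l` output; the function name `list_nvidia_pkgs` is hypothetical, and it only looks at packages whose names start with `nvidia-`.

```shell
#!/usr/bin/env bash
# list_nvidia_pkgs: read `dpkg -l` output on stdin and print the names of
# packages that are fully installed (status "ii") and start with "nvidia-".
# (Hypothetical helper, for illustration only.)
list_nvidia_pkgs() {
  awk '$1 == "ii" && $2 ~ /^nvidia-/ { print $2 }'
}

# Usage:
#   dpkg -l | list_nvidia_pkgs
```

An empty result suggests the purge removed everything matching that prefix.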
On Ubuntu, run:
wget -c https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
Docker version (the runtime must be specified):
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
The old --runtime=nvidia option still works (it requires nvidia-docker2 to be installed), but the latest version uses the --gpus option (which does not require nvidia-docker2).
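Which option to use depends on the Docker version. A minimal sketch that picks between the two, assuming that 19.03 is the first Docker release with `--gpus` and that GNU `sort -V` is available; the function names `version_ge` and `gpu_flag` are hypothetical.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on GNU sort -V for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# gpu_flag VERSION: print the docker-run option suited to that Docker version.
# Assumption: --gpus is available from Docker 19.03 onward.
gpu_flag() {
  if version_ge "$1" "19.03"; then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

# Usage (needs a running Docker daemon):
#   ver=$(docker version --format '{{.Server.Version}}')
#   docker run $(gpu_flag "$ver") --rm nvidia/cuda:9.0-base nvidia-smi
```

The `$(gpu_flag "$ver")` call is left unquoted on purpose so that `--gpus all` splits into two arguments.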
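Running apt update on Ubuntu 18.04 produced the following error message:
"Failed to download https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64/InRelease: the following signatures could not be verified because the public key is not available: NO_PUBKEY xxx"
The pubkey from the earlier version has probably expired. The fix: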
DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
  sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update
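The `DIST` value above is just the `ID` and `VERSION_ID` fields of /etc/os-release joined together, which then selects the repository list to download. A small sketch of that step in isolation, run against a sample file rather than the real /etc/os-release; the function names `dist_id` and `list_url` are hypothetical.

```shell
#!/usr/bin/env bash
# dist_id FILE: print ID and VERSION_ID joined, as read from an os-release file.
dist_id() {
  # Source the file in a subshell so its variables do not leak into the caller.
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# list_url DIST: print the libnvidia-container list URL for a distribution string.
list_url() {
  printf 'https://nvidia.github.io/libnvidia-container/%s/libnvidia-container.list\n' "$1"
}

# Demonstration with a temporary sample file standing in for /etc/os-release.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="18.04"\n' > "$sample"
list_url "$(dist_id "$sample")"
rm -f "$sample"
```

On Ubuntu 18.04 the real file yields `ubuntu18.04`, so the list is fetched from the `ubuntu18.04` directory of the repository.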
After that, updates work normally again.
See NVidia's project page (https://github.com/NVIDIA/nvidia-docker).
As follows:
docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
Install nvidia-docker2:
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
For other operating systems, see: