Reference:
Ubuntu 16.04.5 desktop basic configuration and remote desktop
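To use the GPU build of MXNet you need an NVIDIA GPU. If you update the NVIDIA driver without first disabling the graphics driver that ships with Ubuntu, you run into problems such as "X server is running" errors, repeated prompts to reboot, or an installation that reports success but still cannot communicate with the driver.
On Ubuntu desktop there is a much simpler way. Under "Software & Updates" there is an "Additional Drivers" tab where the system automatically detects the official NVIDIA driver; select it, install it, and reboot.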
After installation, check the driver information:
user@gpu:~$ nvidia-smi
Sat Sep 22 17:50:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0  On |                  N/A |
|  0%   44C    P8    14W / 300W |    249MiB / 11170MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1184      G   /usr/lib/xorg/Xorg                           126MiB |
|    0      1773      G   compiz                                       120MiB |
+-----------------------------------------------------------------------------+
user@gpu:~$
The driver version must be >= 384.81.
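
The comparison against 384.81 has to be made component by component: compared as plain strings, "384.130" sorts before "384.81". The following is a minimal sketch of that check; the helper names `parse_version` and `meets_minimum` are illustrative assumptions, not part of any NVIDIA tool.

```python
def parse_version(text):
    """Split a dotted version such as '384.130' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum="384.81"):
    """Return True when the installed driver version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # 384.130 is the driver version shown in the nvidia-smi output above.
    print(meets_minimum("384.130"))
    # An older driver, used here only as a made-up counterexample.
    print(meets_minimum("375.66"))
```

The installed version string can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `meets_minimum`.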
Download: official NVIDIA site
Select your own driver model, operating system version, and language.
My selection was:
Field | Value |
---|---|
Product type | GeForce |
Product series | GeForce 10 Series |
Product family | GeForce GTX 1080 Ti |
Operating system | Linux 64-bit |
Language | English |
The file I downloaded is NVIDIA-Linux-x86_64-390.87.run
user@gpu:~$ mkdir ~/driver
user@gpu:~$ cd ~/driver
user@gpu:driver$ sudo chmod +x NVIDIA-Linux-x86_64-390.87.run
user@gpu:driver$ sudo sh NVIDIA-Linux-x86_64-390.87.run
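The first step of the installer shows the license terms; choose accept. Then follow the prompts. Partway through, a warning says that the 32-bit files cannot be installed; ignore it and continue. Follow the remaining prompts step by step.
If the NVIDIA driver installer reports the following error:
ERROR: You appear to be running an X server; please exit X before installing. For further details, please see the section INSTALLING THE NVIDIA DRIVER in the README available on the Linux driver download page at www.nvidia.com.
Stopping the display manager is usually enough to stop X:
sudo systemctl stop lightdm.service
A more general method: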
sudo systemctl stop display-manager
After the installation completes, reboot:
user@gpu:driver$ sudo reboot
After rebooting, check the driver information:
user@gpu:driver$ nvidia-smi
To uninstall:
user@gpu:driver$ sudo sh NVIDIA-Linux-x86_64-390.87.run --uninstall
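Removing the Nouveau kernel driver (fixing the NVIDIA install error)
Reference: https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html
Warning: this tutorial may break your system. Make sure you back up the system before performing these steps.
If the Nouveau kernel driver is currently in use, installing the official NVIDIA driver returns an error. This section explains how to fix that error and install the official driver.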
ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
In this step we remove all NVIDIA-related packages.
user@gpu:~$ sudo apt-get remove nvidia* && sudo apt autoremove
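If you get the following message, nothing was removed. The referenced tutorial treats it as meaning that no NVIDIA packages were ever installed, in which case it is harmless. Note that the message itself is printed by shells such as zsh when the unquoted pattern matches no files; quoting the pattern as 'nvidia*' passes it through to apt-get.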
no matches found: nvidia*
Now install some required dependencies:
user@gpu:~$ sudo apt-get install dkms build-essential linux-headers-generic
Now blacklist and disable the nouveau kernel driver:
user@gpu:~$ sudo vim /etc/modprobe.d/blacklist.conf
# append the following lines
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Run the following command to disable kernel mode setting for nouveau:
user@gpu:~$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
Finally, rebuild the initramfs and reboot:
user@gpu:~$ sudo update-initramfs -u
user@gpu:~$ reboot
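
After the reboot it is worth confirming that nouveau is no longer loaded before running the NVIDIA installer again. The sketch below scans the text of /proc/modules, where the first field of each line is a module name; the function name `module_loaded` is an illustrative assumption.

```python
from pathlib import Path


def module_loaded(proc_modules_text, name="nouveau"):
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Only the first field is the module name; later fields list
        # size, use count, and the modules that depend on this one.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        print("nouveau loaded:", module_loaded(modules_file.read_text()))
    else:
        print("/proc/modules not available on this system")
```

The shell equivalent is `lsmod | grep nouveau`, which prints nothing once the blacklist has taken effect.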
To avoid the following error when installing CUDA, change Ubuntu's default runlevel to 3.
Installing the NVIDIA display driver... It appears that an X server is running. Please exit X before installation. If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp.
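user@gpu:~$ runlevel
N 5
Switching between command-line mode and graphical mode
To enter the graphical interface now (one time only; after a reboot the system still boots to the command line), run: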
user@gpu:~$ sudo systemctl start lightdm
To make the system boot into the graphical interface by default, run:
user@gpu:~$ sudo systemctl set-default graphical.target
Then reboot the system:
user@gpu:~$ sudo reboot
To make the system boot to the command line by default, run:
user@gpu:~$ sudo systemctl set-default multi-user.target
Then reboot the system:
user@gpu:~$ sudo reboot
user@gpu:~$ runlevel
N 3
user@gpu:~$
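
The two runlevel values seen in this walkthrough correspond to the systemd targets used above: 3 to multi-user.target and 5 to graphical.target. A minimal sketch of reading the `runlevel` output and mapping it to a target follows; the names `RUNLEVEL_TARGETS` and `target_for` are illustrative assumptions, and only the two runlevels used here are covered.

```python
# Only the two runlevels that appear in this walkthrough are mapped.
RUNLEVEL_TARGETS = {
    "3": "multi-user.target",  # command-line mode
    "5": "graphical.target",   # graphical mode
}


def parse_runlevel(output):
    """Split `runlevel` output such as 'N 3' into (previous, current)."""
    previous, current = output.split()
    return previous, current


def target_for(output):
    """Return the systemd target for the current runlevel, or None."""
    _, current = parse_runlevel(output)
    return RUNLEVEL_TARGETS.get(current)


if __name__ == "__main__":
    print(target_for("N 5"))
    print(target_for("N 3"))
```

On the machine itself, `systemctl get-default` reports the configured default target directly.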
Latest version:
https://developer.nvidia.com/cuda-downloads
Archived versions:
https://developer.nvidia.com/cuda-toolkit-archive
user@gpu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1643293725 Sep 22 16:35 cuda_9.0.176_384.81_linux.run
The NVIDIA driver installed above is version 384.130; the driver bundled with this package is version 384.81.
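
Both version numbers can be read off the runfile name. The sketch below assumes the naming scheme `cuda_<toolkit>_<driver>_linux.run`, inferred only from the single file name shown here; the function names are illustrative.

```python
import re

# Assumed naming scheme, inferred from cuda_9.0.176_384.81_linux.run.
RUNFILE_PATTERN = re.compile(
    r"^cuda_(?P<toolkit>[\d.]+)_(?P<driver>[\d.]+)_linux\.run$"
)


def parse_runfile(name):
    """Return (toolkit_version, bundled_driver_version) from a runfile name."""
    match = RUNFILE_PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognised runfile name: " + name)
    return match.group("toolkit"), match.group("driver")


def as_tuple(version):
    """Turn '384.81' into (384, 81) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def bundled_driver_is_older(runfile_name, installed_driver):
    """True when the runfile's bundled driver is older than the installed one."""
    _, bundled = parse_runfile(runfile_name)
    return as_tuple(bundled) < as_tuple(installed_driver)


if __name__ == "__main__":
    print(parse_runfile("cuda_9.0.176_384.81_linux.run"))
    print(bundled_driver_is_older("cuda_9.0.176_384.81_linux.run", "384.130"))
```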
user@gpu:~$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
Otherwise the CUDA installer reports the following:
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
user@gpu:/data/tools$ sudo sh cuda_9.0.176_384.81_linux.run
...... # press Space to page through the license
......
Do you accept the previously read EULA?
accept/decline/quit: accept # accept the license
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: y # install the NVIDIA accelerated graphics driver
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n # do not install the OpenGL libraries
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: # keep the default: do not run nvidia-xconfig
Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y # install the CUDA 9.0 Toolkit
Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]: # CUDA install location
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y # create the symbolic link
Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y # install the CUDA samples
Enter CUDA Samples Location
 [ default is /home/user ]: # CUDA samples location
Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Installing the CUDA Samples in /home/user ...
Copying samples to /home/user/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-9.0
Samples: Installed in /home/user
Please make sure that # reminder to set the environment variables
- PATH includes /usr/local/cuda-9.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall # how to uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_14141.log
user@gpu:/data/tools/tensorflow-gpu$
user@gpu:~$ vim ~/.bashrc
# append at the end of the file
# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
user@gpu:~$ source ~/.bashrc
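
To confirm that the two variables picked up the CUDA directories after sourcing ~/.bashrc, the colon-separated values can be inspected. This is a minimal sketch; `path_contains` is an illustrative helper name, and the directories are the ones from the installer summary above.

```python
import os


def path_contains(path_value, directory):
    """Return True if `directory` is one entry of a colon-separated path list."""
    wanted = directory.rstrip("/")
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    return wanted in entries


if __name__ == "__main__":
    print("PATH ok:",
          path_contains(os.environ.get("PATH", ""), "/usr/local/cuda-9.0/bin"))
    print("LD_LIBRARY_PATH ok:",
          path_contains(os.environ.get("LD_LIBRARY_PATH", ""),
                        "/usr/local/cuda-9.0/lib64"))
```

Matching whole entries rather than substrings avoids a false positive from a similarly named directory such as /usr/local/cuda-9.0/bin-old.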
user@gpu:/data/tools/tensorflow-gpu$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
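
The release line of the `nvcc -V` output is the part that matters when choosing a matching cuDNN package. A small sketch of extracting it follows; the function name `nvcc_release` is an illustrative assumption, and the sample text is the output shown above.

```python
import re

SAMPLE_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
"""


def nvcc_release(output):
    """Return (release, full_version) from `nvcc -V` text, or None."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", output)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(nvcc_release(SAMPLE_OUTPUT))
```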
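GPU-accelerated deep learning
Before installing cuDNN, make sure CUDA and the NVIDIA driver are installed correctly.
You need to register for and log in to an NVIDIA account.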
https://developer.nvidia.com/cudnn
Select the cuDNN version that matches your system and CUDA version (three packages for Ubuntu).
user@gpu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1643293725 Sep 22 16:35 cuda_9.0.176_384.81_linux.run
-rw-rw-r-- 1 user user  125687148 Sep 22 16:33 libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user  115870862 Sep 22 16:33 libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user    4913038 Sep 22 16:33 libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
user@gpu:/data/tools$ sudo dpkg -i libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 265027 files and directories currently installed.)
Preparing to unpack libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7 (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7 (7.3.0.29-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
user@gpu:/data/tools$ sudo dpkg -i libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 265033 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-dev (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7-dev (7.3.0.29-1+cuda9.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
user@gpu:/data/tools$ sudo dpkg -i libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 265039 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-doc (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7-doc (7.3.0.29-1+cuda9.0) ...
user@gpu:/data/tools$ cp -r /usr/src/cudnn_samples_v7 $HOME
user@gpu:/data/tools$ cd $HOME/cudnn_samples_v7/mnistCUDNN
user@gpu:~/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
# run ./mnistCUDNN
user@gpu:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7300 , CUDNN_VERSION from cudnn.h : 7300 (7.3.0)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28 Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 11170, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032384 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044032 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.053248 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.076640 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.090112 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9×××88 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.031712 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.033664 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.074752 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.079872 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.084992 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
If the installation succeeded, the message "Test passed!" is printed.
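
The first line of the mnistCUDNN output reports the library version as the integer 7300 next to the readable form 7.3.0. The sketch below decodes that integer, assuming the cuDNN 7 encoding of major * 1000 + minor * 100 + patch; other cuDNN major versions may encode it differently, and `decode_cudnn_version` is an illustrative name.

```python
def decode_cudnn_version(number):
    """Split a cuDNN 7 style version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, remainder = divmod(number, 1000)
    minor, patch = divmod(remainder, 100)
    return major, minor, patch


if __name__ == "__main__":
    # 7300 is the value printed by cudnnGetVersion() in the run above.
    print(decode_cudnn_version(7300))
```

A mismatch between the value from cudnnGetVersion() and the one from cudnn.h would suggest that the runtime library and the development headers come from different packages.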
First install Python 3.6 and pip3.
Ubuntu 16.04 ships with Python 2.7.12 and Python 3.5.2 by default.
sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6
sudo apt-get install python3.6-gdbm
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
sudo update-alternatives --config python3
In a terminal, run:
python3 -V
sudo apt-get install python3-pip python3-dev
sudo pip3 install --upgrade pip
If pip3 fails after the upgrade with the following error:
ImportError: cannot import name 'main'
fix it as follows:
sudo cp -a /usr/bin/pip3{,_backup}
sudo vim /usr/bin/pip3
Change the original:
from pip import main
if __name__ == '__main__':
    sys.exit(main())
to:
from pip import __main__
if __name__ == '__main__':
    sys.exit(__main__._main())
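
An alternative that leaves /usr/bin/pip3 untouched is to run pip as a module of the interpreter, which does not go through the wrapper script's `from pip import main` line. This is a sketch of building such a command line, not the method used in this walkthrough; `pip_command` is an illustrative helper name.

```python
import sys


def pip_command(*args, python=sys.executable):
    """Build an argument list that runs pip as a module of `python`."""
    return [python, "-m", "pip", *args]


if __name__ == "__main__":
    # Printed only; pass the list to subprocess.run() to execute it.
    print(" ".join(pip_command("--version", python="python3")))
```

From the shell the same idea is `python3 -m pip install --upgrade pip`.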
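Specifying the Aliyun mirror makes the download much faster:
sudo pip3 install --index-url https://mirrors.aliyun.com/pypi/simple tensorflow-gpu
You can also install a specific version:
sudo pip3 install --upgrade tfBinaryURL
tfBinaryURL is the URL of the TensorFlow Python package. The correct value of tfBinaryURL depends on the operating system, the Python version, and GPU support. Look up the appropriate value in the TensorFlow installation documentation. For example, to install CPU-only TensorFlow for Linux with Python 3.6, run: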
sudo pip3 install --upgrade https://download.tensorflow.google.cn/linux/cpu/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
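
The wheel file name at the end of the URL encodes which interpreter it targets: in tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl, cp36 is the Python tag for CPython 3.6. The sketch below splits the name following the standard wheel naming convention and compares the Python tag with an interpreter version; the function names are illustrative assumptions.

```python
import sys


def wheel_tags(url):
    """Extract name, version and compatibility tags from a wheel URL."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    # The last three fields are always python tag, ABI tag, platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(tags, version_info=sys.version_info):
    """True when the wheel's CPython tag matches the given (major, minor)."""
    return tags["python"] == "cp{}{}".format(version_info[0], version_info[1])


if __name__ == "__main__":
    url = ("https://download.tensorflow.google.cn/linux/cpu/"
           "tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl")
    tags = wheel_tags(url)
    print(tags)
    print("matches this interpreter:", matches_interpreter(tags))
```

Running this with python3 pointing at Python 3.5 would report a mismatch, which is the situation in which pip refuses the wheel as unsupported on this platform.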
sudo pip3 uninstall tensorflow
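Run a short TensorFlow program
Invoke Python from your shell as follows:
On my machine python points to the default 2.7, and python3 points to python3.6:
$ python3
Enter the following short program in the Python interactive shell:
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
If the system outputs the following, you are ready to begin writing TensorFlow programs: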
Hello, TensorFlow!
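
The hello program also succeeds on a CPU-only installation, so it does not by itself show that the GPU is being used. A hedged sketch of listing the GPU devices TensorFlow can see follows; it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`, and the helper name `gpu_names` is an illustrative assumption.

```python
def gpu_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


if __name__ == "__main__":
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this interpreter")
    else:
        # An empty list means TensorFlow found no usable GPU.
        print(gpu_names(device_lib.list_local_devices()))
```

An empty list on this machine would point back at the driver, CUDA, or cuDNN steps above, or at having installed the CPU-only package.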