Installing TensorFlow-GPU on Ubuntu 16.04 with an NVIDIA 1080 Ti

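Environment

1 Install Ubuntu 16.04.5: basic system configuration and remote desktop

Reference:
Ubuntu 16.04.5 desktop: basic configuration and remote desktop

2 Install the NVIDIA driver (important)

2.1 Method 1: desktop installation

The GPU build of TensorFlow (like that of MXNet) requires an NVIDIA GPU. If you update the NVIDIA driver without first disabling the graphics driver that ships with Ubuntu, you can run into problems such as an "X server is running" error, endless prompts to reboot, or a driver that cannot be loaded even though the installation reported success.

On the desktop edition of Ubuntu there is a very simple way. Under "Software & Updates" there is an "Additional Drivers" tab; the system automatically detects the official NVIDIA graphics driver, so you only need to select it, apply the change, and reboot.
After installation, check the driver information: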

user@gpu:~$ nvidia-smi 
Sat Sep 22 17:50:29 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:03:00.0  On |                  N/A |
|  0%   44C    P8    14W / 300W |    249MiB / 11170MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1184      G   /usr/lib/xorg/Xorg                           126MiB |
|    0      1773      G   compiz                                       120MiB |
+-----------------------------------------------------------------------------+
user@gpu:~$

The driver version must be >= 384.81.
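The version requirement above can also be checked programmatically. The following is a minimal sketch rather than part of the original procedure: it assumes the banner format shown in the nvidia-smi output above, and the helper names (parse_driver_version, meets_minimum) are made up for illustration. Versions are compared as integer tuples, because comparing them as decimal numbers would wrongly rank 384.130 below 384.81.

```python
import re

MIN_DRIVER = (384, 81)  # minimum driver version stated in this article

def parse_driver_version(smi_text):
    """Extract the driver version from nvidia-smi banner text as a tuple of ints."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_text)
    if match is None:
        raise ValueError("no 'Driver Version' field found")
    return tuple(int(part) for part in match.group(1).split("."))

def meets_minimum(smi_text, minimum=MIN_DRIVER):
    """Return True when the reported driver is at least `minimum`."""
    return parse_driver_version(smi_text) >= minimum

# Banner line copied from the nvidia-smi output shown above.
sample = "| NVIDIA-SMI 384.130                Driver Version: 384.130                   |"
print(parse_driver_version(sample), meets_minimum(sample))
```

In practice the text could come from running nvidia-smi through subprocess; the sketch takes a string so that it can be tried on a machine without a GPU.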

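2.2 Method 2: server edition installation

2.2.1 Download the driver

Download: the official NVIDIA site
Select your product model, operating system, and language.
My selection:

Field             Value
Product Type      GeForce
Product Series    GeForce 10 Series
Product           GeForce GTX 1080 Ti
Operating System  Linux 64-bit
Language          English

The file I downloaded is NVIDIA-Linux-x86_64-390.87.run.

2.2.2 Install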

user@gpu:~$ mkdir ~/driver
user@gpu:~$ cd ~/driver
user@gpu:driver$ sudo chmod +x NVIDIA-Linux-x86_64-390.87.run
user@gpu:driver$ sudo sh NVIDIA-Linux-x86_64-390.87.run

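The first step displays the license agreement; choose accept. Then follow the prompts. Partway through, a warning says that the 32-bit compatibility files cannot be installed; ignore it and continue, then follow the remaining prompts step by step.

If the NVIDIA driver installer reports the following error: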

ERROR: You appear to be running an X server; please exit X before 
installing. For further details, please see the section INSTALLING 
THE NVIDIA DRIVER in the README available on the Linux driver 
download page at www.nvidia.com.

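Stopping the display manager is usually enough to stop X:

sudo systemctl stop lightdm.service

A more general approach:

sudo systemctl stop display-manager

After the installation completes, reboot:

user@gpu:driver$ sudo reboot

After installation, check the driver information:

user@gpu:driver$ nvidia-smi

To uninstall: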

user@gpu:driver$ sudo sh NVIDIA-Linux-x86_64-390.87.run --uninstall

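2.3 Method 3: disable Ubuntu's built-in graphics driver

Remove the Nouveau kernel driver (fixes the NVIDIA installation error)
Reference: https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html

Warning: this procedure may break your system. Make sure you back up the system before performing these steps.

If the Nouveau kernel driver is currently in use, installing the official NVIDIA driver returns an error. This section explains how to fix the error and install the official driver.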

ERROR: The Nouveau kernel driver is currently in use by your system.  
This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding.  
Please consult the NVIDIA driver README and
your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.

2.3.1 Remove all nvidia packages

In this step we remove all nvidia-related packages.

user@gpu:~$ sudo apt-get remove nvidia* && sudo apt autoremove

If you see the following error, it means no nvidia packages were ever installed, which is fine:

no matches found: nvidia*

Now install some required dependencies:

user@gpu:~$ sudo apt-get install dkms build-essential linux-headers-generic

2.3.2 Blacklist the nouveau driver

Now block and disable the nouveau kernel driver:

user@gpu:~$ sudo vim /etc/modprobe.d/blacklist.conf
# append the following lines

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
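
When the file is edited by hand it is easy to add a line twice or to miss one. As a rough sketch (the helper name missing_lines and the sample file content are made up for illustration, and nothing is written to disk), the lines still missing from an existing configuration could be computed like this:

```python
# Lines this article adds to /etc/modprobe.d/blacklist.conf
REQUIRED = [
    "blacklist nouveau",
    "blacklist lbm-nouveau",
    "options nouveau modeset=0",
    "alias nouveau off",
    "alias lbm-nouveau off",
]

def missing_lines(config_text, required=REQUIRED):
    """Return the required lines not yet present, ignoring comments and extra whitespace."""
    present = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0]          # drop trailing comments
        normalized = " ".join(line.split())  # collapse runs of whitespace
        if normalized:
            present.add(normalized)
    return [line for line in required if line not in present]

# Example file content, invented for this sketch.
existing = """# example blacklist.conf
blacklist   nouveau
options nouveau modeset=0  # disable kernel mode setting
"""
for line in missing_lines(existing):
    print(line)
```

Only the lines that are printed would need to be appended to the real file.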

2.3.3 Update the initramfs

Enter the following command to disable kernel mode setting for nouveau:

user@gpu:~$ echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf

Finally, rebuild the initramfs and reboot:

user@gpu:~$ sudo update-initramfs -u
user@gpu:~$ reboot

3 Install NVIDIA CUDA 9.0

3.1 Change Ubuntu's default runlevel to 3

To avoid the following error when installing CUDA, change Ubuntu's default runlevel to 3.

Installing the NVIDIA display driver...
It appears that an X server is running. Please exit X before installation. 
If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp.

3.1.1 Check the current runlevel

user@gpu:~$ runlevel
N 5

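3.1.2 Change the runlevel to 3

Switching between command-line mode and the graphical interface

Command line --> graphical interface:

To enter the graphical interface once (after a reboot the system still boots to the command line), run:

user@gpu:~$ sudo systemctl start lightdm

To make the system boot into the graphical interface by default, run:

user@gpu:~$ sudo systemctl set-default graphical.target

Then reboot the system:

user@gpu:~$ sudo reboot

Graphical interface --> command line:

To make the system boot to the command line by default, run:

user@gpu:~$ sudo systemctl set-default multi-user.target

Then reboot the system:

user@gpu:~$ sudo reboot

3.1.3 Verify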

user@gpu:~$ runlevel
N 3
user@gpu:~$
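
On a systemd-based release such as Ubuntu 16.04, the runlevel numbers are compatibility names for systemd targets, which is why setting multi-user.target as the default makes runlevel report 3. The sketch below spells out that correspondence; the function name target_for is hypothetical, and the input is assumed to have the two-field form printed by runlevel above.

```python
# SysV runlevels and the systemd targets they correspond to.
RUNLEVEL_TARGETS = {
    "0": "poweroff.target",
    "1": "rescue.target",
    "2": "multi-user.target",
    "3": "multi-user.target",
    "4": "multi-user.target",
    "5": "graphical.target",
    "6": "reboot.target",
}

def target_for(runlevel_output):
    """Map output such as 'N 3' (previous and current runlevel) to a systemd target."""
    previous, current = runlevel_output.split()
    return RUNLEVEL_TARGETS[current]

# The two states shown in this section: before and after the change.
print(target_for("N 5"), "->", target_for("N 3"))
```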

3.2 Download NVIDIA CUDA 9.0

3.2.1 Download location

Latest version:
https://developer.nvidia.com/cuda-downloads

Archived versions:
https://developer.nvidia.com/cuda-toolkit-archive

user@gpu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1643293725 Sep 22 16:35 cuda_9.0.176_384.81_linux.run

The NVIDIA driver installed above is version 384.130; the driver bundled with this package is version 384.81.

3.2.2 Install the packages that provide libGLU.so, libX11.so, libXi.so, and libXmu.so

user@gpu:~$ sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Otherwise the CUDA installer reports the following:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

3.3 Install NVIDIA CUDA 9.0

user@gpu:/data/tools$ sudo sh cuda_9.0.176_384.81_linux.run
......
# press the space bar to page through the license
......
Do you accept the previously read EULA?
accept/decline/quit: accept             # accept the license

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: y                    # install the NVIDIA accelerated graphics driver

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n # do not install the OpenGL libraries

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]:    # keep the default: do not run nvidia-xconfig

Install the CUDA 9.0 Toolkit?
(y)es/(n)o/(q)uit: y                    # install the CUDA 9.0 Toolkit

Enter Toolkit Location
 [ default is /usr/local/cuda-9.0 ]:    # CUDA install location

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y                    # create the symbolic link

Install the CUDA 9.0 Samples?
(y)es/(n)o/(q)uit: y                    # install the CUDA samples

Enter CUDA Samples Location
 [ default is /home/user ]:             # CUDA samples location

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-9.0 ...
Installing the CUDA Samples in /home/user ...
Copying samples to /home/user/NVIDIA_CUDA-9.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.0
Samples:  Installed in /home/user

Please make sure that                   # reminder to set the environment variables
 -   PATH includes /usr/local/cuda-9.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.0/lib64, or, add /usr/local/cuda-9.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall    # how to uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_14141.log
user@gpu:/data/tools/tensorflow-gpu$

3.4 Add environment variables

user@gpu:~$ vim ~/.bashrc           # append at the end
# cuda
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
user@gpu:~$ source ~/.bashrc
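
The ${VAR:+:${VAR}} form in those export lines appends a colon and the old value only when the variable is already set and non-empty, so an unset LD_LIBRARY_PATH does not end up with a trailing colon (the dynamic loader treats an empty entry as the current directory). A small Python illustration of the same rule follows; the function name prepend_path is made up for this sketch.

```python
def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' and the old value only if it is set and non-empty."""
    return new_dir + (":" + current if current else "")

# Variable unset or empty: no trailing colon is produced.
print(prepend_path("/usr/local/cuda-9.0/lib64", ""))
# Variable already set: the new directory is placed in front of the old value.
print(prepend_path("/usr/local/cuda-9.0/bin", "/usr/bin:/bin"))
```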

3.5 Verify

user@gpu:/data/tools/tensorflow-gpu$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
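
If this check needs to be scripted, the release number can be pulled out of the last line of that output. This is only a sketch that assumes the line format shown above; the function name cuda_release is hypothetical.

```python
import re

def cuda_release(nvcc_output):
    """Return (release, build) from `nvcc -V` text, e.g. ('9.0', '9.0.176')."""
    match = re.search(r"release\s+(\d+\.\d+),\s*V(\d+(?:\.\d+)+)", nvcc_output)
    if match is None:
        raise ValueError("unrecognized nvcc -V output")
    return match.group(1), match.group(2)

# Last line of the nvcc -V output shown above.
sample = "Cuda compilation tools, release 9.0, V9.0.176"
release, build = cuda_release(sample)
print(release, build)
```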

4 Install NVIDIA cuDNN 7 (7.3.0.29-1)

cuDNN provides GPU acceleration for deep learning.
Before installing cuDNN, make sure CUDA and the NVIDIA driver are installed correctly.

4.1 Download the deb packages

You need to register and log in with an NVIDIA account.
https://developer.nvidia.com/cudnn
Choose the cuDNN version that matches your system and CUDA version (three packages for Ubuntu).

user@gpu:/data/tools$ ll
total 1952872
drwxr-xr-x 3 user user        269 Sep 14 13:25 ./
drwxr-xr-x 3 user user         19 Sep 14 10:21 ../
-rw-rw-r-- 1 user user 1643293725 Sep 22 16:35 cuda_9.0.176_384.81_linux.run
-rw-rw-r-- 1 user user 125687148 Sep 22 16:33 libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user 115870862 Sep 22 16:33 libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
-rw-rw-r-- 1 user user   4913038 Sep 22 16:33 libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
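
The package filenames themselves show which CUDA release each cuDNN build targets. As an illustrative sketch (the helper names parse_deb and cuda_suffix are invented here, and the name_version_arch.deb layout is the usual Debian convention), the three files can be checked for a consistent version and a cuda9.0 suffix:

```python
def parse_deb(filename):
    """Split a Debian package filename of the form name_version_arch.deb."""
    stem = filename[:-len(".deb")] if filename.endswith(".deb") else filename
    name, version, arch = stem.split("_")
    return name, version, arch

def cuda_suffix(version):
    """Return the CUDA version after '+cuda', or None if there is no such suffix."""
    _, sep, suffix = version.partition("+cuda")
    return suffix if sep else None

# Filenames copied from the directory listing above.
files = [
    "libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb",
    "libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb",
    "libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb",
]
parsed = [parse_deb(f) for f in files]
for name, version, arch in parsed:
    print(name, version, arch, cuda_suffix(version))
```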

4.2 Install cuDNN

user@gpu:/data/tools$ sudo dpkg -i libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7.
(Reading database ... 265027 files and directories currently installed.)
Preparing to unpack libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7 (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7 (7.3.0.29-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...

user@gpu:/data/tools$ sudo dpkg -i libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 265033 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-dev (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7-dev (7.3.0.29-1+cuda9.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode

user@gpu:/data/tools$ sudo dpkg -i libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 265039 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.3.0.29-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-doc (7.3.0.29-1+cuda9.0) ...
Setting up libcudnn7-doc (7.3.0.29-1+cuda9.0) ...

4.3 Verify that cuDNN was installed successfully

user@gpu:/data/tools$ cp -r /usr/src/cudnn_samples_v7 $HOME
user@gpu:/data/tools$ cd $HOME/cudnn_samples_v7/mnistCUDNN
user@gpu:~/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm

# run ./mnistCUDNN
user@gpu:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN 
cudnnGetVersion() : 7300 , CUDNN_VERSION from cudnn.h : 7300 (7.3.0)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 28  Capabilities 6.1, SmClock 1683.0 Mhz, MemSize (Mb) 11170, MemClock 5505.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032384 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044032 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.053248 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.076640 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.090112 time requiring 207360 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9×××88 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.031712 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.033664 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.074752 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.079872 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.084992 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

If the installation succeeded, the message "Test passed!" is printed.
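
The first line of the sample output reports the library version as the integer 7300 next to the readable form 7.3.0. For cuDNN 7 that integer is major*1000 + minor*100 + patch level, which the short sketch below decodes. The function name is made up, and other cuDNN major versions may encode the number differently.

```python
def decode_cudnn_version(value):
    """Split a cuDNN 7 style version integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

# 7300 is the value printed by cudnnGetVersion() in the output above.
print(decode_cudnn_version(7300))
```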

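5 Install TensorFlow with native pip

First install Python 3.6 and pip3.

5.1 Install Python 3.6 (optional)

Ubuntu 16.04 ships with Python 2.7.12 and Python 3.5.2 by default.

5.1.1 Add the third-party repository

sudo add-apt-repository ppa:jonathonf/python-3.6

5.1.2 Refresh the package index and install Python 3.6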

sudo apt-get update
sudo apt-get install python3.6
sudo apt-get install python3.6-gdbm

5.1.3 Make Python 3.6 the preferred python3

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
sudo update-alternatives --config python3

5.1.4 Test

In a terminal, enter:

python3 -V

5.2 Install pip3

5.2.1 Install

sudo apt-get install python3-pip python3-dev

5.2.2 Upgrade pip3 (optional)

sudo pip3 install --upgrade pip

If pip3 then reports the following error when installing other packages:
ImportError: cannot import name 'main'

Fix it as follows:

sudo cp -a /usr/bin/pip3{,_backup}
sudo vim /usr/bin/pip3

Change the original:

from pip import main
if __name__ == '__main__':
    sys.exit(main())

to:

from pip import __main__
if __name__ == '__main__':
    sys.exit(__main__._main())

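6 Install TensorFlow-GPU

6.1 Install

Specifying the Aliyun mirror makes the download much faster:
sudo pip3 install --index-url https://mirrors.aliyun.com/pypi/simple tensorflow-gpu

You can also install a specific version:
sudo pip3 install --upgrade tfBinaryURL

tfBinaryURL is the URL of the TensorFlow Python package. The correct value depends on the operating system, the Python version, and GPU support. You can look up the appropriate value in the TensorFlow installation documentation. For example, to install the CPU-only TensorFlow for Python 3.6 on Linux, run: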

sudo pip3 install --upgrade https://download.tensorflow.google.cn/linux/cpu/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl
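
The wheel filename in such a URL encodes which interpreter it is built for: the last three dash-separated fields are the Python tag, the ABI tag, and the platform tag. The following sketch, with made-up helper names and assuming a CPython interpreter, extracts those tags and compares the Python tag with the interpreter that runs the script:

```python
import sys

def wheel_tags(url):
    """Return (python_tag, abi_tag, platform_tag) from a wheel URL or filename."""
    filename = url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    return tuple(parts[-3:])

def interpreter_tag():
    """CPython tag of the running interpreter, e.g. 'cp36' for Python 3.6."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)

# URL copied from the example command above.
url = "https://download.tensorflow.google.cn/linux/cpu/tensorflow-1.8.0-cp36-cp36m-linux_x86_64.whl"
python_tag, abi_tag, platform_tag = wheel_tags(url)
print(python_tag, abi_tag, platform_tag)
print("matches this interpreter:", python_tag == interpreter_tag())
```

A mismatch between the two tags is a common reason for pip to reject a wheel as unsupported on the current platform.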

6.2 Uninstall (not performed here)

sudo pip3 uninstall tensorflow-gpu

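7 Verify the installation

Run a short TensorFlow program.
Invoke Python from the shell as follows:

On my machine python points to the default 2.7, and python3 points to python3.6.
$ python3

Enter the following short program in the Python interactive shell: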

# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

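If the system prints the following, you are ready to start writing TensorFlow programs (under Python 3 the string is shown as a bytes literal, b'Hello, TensorFlow!'):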

Hello, TensorFlow!

References

https://blog.csdn.net/Jonms/article/details/79318566
