Ubuntu 16.04 + CUDA 8.0 + cuDNN v5.1 + TensorFlow (GPU support): A Detailed Installation and Configuration Guide

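With the rapid development of image recognition and deep learning, the GPU era is fast approaching. Because GPUs run deep learning algorithms so efficiently, setting up a GPU-equipped server has become essential.
This article describes how to set up the TensorFlow (GPU support) framework on Ubuntu 16.04. The graphics card used is a GeForce GTX 1080 Ti (OC) with 11 GB of memory, a clock of 1569-1708 MHz, 3584 CUDA cores, and Compute Capability 6.1. The installation and configuration steps are described in detail below.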

For the list of hardware used in my lab, see the Deep Learning Hardware Buying Guide.

1. Install Ubuntu 16.04

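In this article we install a Win 10 Pro + Ubuntu 16.04 LTS dual-boot system. Tutorials on installing the two systems are everywhere online, so I won't go into detail here. Two tutorial links for reference: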
http://jingyan.baidu.com/article/d5c4b52bc0ae1fda560dc5ac.html
http://jingyan.baidu.com/article/dca1fa6fa3b905f1a44052bd.html
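I recommend Universal USB Installer, which is simple, small, and easy to use. The Ubuntu 16.04 partition sizes and settings used in this setup are as follows (for reference only):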

Mount point  Partition type  File system  Size
boot         primary         Ext4         8192 MB (8 GB)
/            logical         Ext4         81920 MB (80 GB)
/home        logical         Ext4         153600 MB (150 GB)
swap         logical         -            8192 MB (8 GB)
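
As a quick sanity check before partitioning, the sizes above can be added up to see how much free disk space the Ubuntu side needs. This is only a small sketch; the dictionary simply copies the numbers from the list, and the helper name total_gb is made up for illustration.

```python
# Partition sizes in MB, copied from the list above.
PARTITIONS_MB = {
    "boot": 8192,
    "/": 81920,
    "/home": 153600,
    "swap": 8192,
}


def total_gb(partitions_mb):
    """Return the combined size in GB (1 GB = 1024 MB)."""
    return sum(partitions_mb.values()) / 1024


if __name__ == "__main__":
    print("Ubuntu needs about %.0f GB of free disk space" % total_gb(PARTITIONS_MB))
```

With these numbers the total comes to 246 GB, so the disk needs at least that much unallocated space next to Windows.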

After installation, log in to Ubuntu 16.04. If the system drops into grub rescue mode, repair grub using the following method:
http://yhz61010.iteye.com/blog/2302418
If the system still cannot boot, or boots to a black screen, after grub has been repaired, try the following:
http://blog.csdn.net/Good_Day_Day/article/details/74352534

1.1 Install the NVIDIA driver

NVIDIA driver download link:
http://www.nvidia.cn/Download/index.aspx

1.1.1 For Win 10 pro:

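Follow the link above to the download page, choose the appropriate version, then download and install it. Alternatively, install Driver Genius (驅動精靈) to update the system drivers. If compatibility problems come up during installation, press Win+I to open Settings and update the system.
Once the driver is installed, you can benchmark the hardware in Windows with tools such as AS SSD Benchmark, CPU-Z, TechPowerUp GPU-Z, 3DMark, and FurMark.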

1.1.2 For Ubuntu 16.04:

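Follow the link above to the download page and download the appropriate version. For the installation procedure, see the following two links:
http://blog.csdn.net/tianrolin/article/details/52830422
http://blog.csdn.net/u012581999/article/details/52433609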

2. Requirements to run TensorFlow with GPU support

2.1 Install CUDA 8.0

Download CUDA 8.0:
https://developer.nvidia.com/cuda-downloads
Installing from a deb File:

# 1.Install kernel headers and development packages for the currently running kernel
$ sudo apt-get install linux-headers-$(uname -r)
# 2.Install repository meta-data
$ sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
# 3.Update the Apt repository cache
$ sudo apt-get update
# 4.Install CUDA
$ sudo apt-get install cuda

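To test whether CUDA 8.0 was installed successfully, see the official CUDA 8.0 guide:
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#recommended-post
Run this test after configuring the environment variables in section 2.3.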
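
Besides the official post-installation steps, a short script can confirm which CUDA release nvcc reports once it is on PATH. The sketch below is only an illustration: the helper names are made up, and it assumes the nvcc --version output contains a phrase of the form "release 8.0", which is how nvcc normally prints its version.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release (e.g. '8.0') found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA release:", release or "nvcc not found on PATH")
```

If the script reports that nvcc is missing, the PATH setting from section 2.3 has most likely not taken effect in the current shell.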

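2.1.1 Downgrade gcc

The default gcc compiler on Ubuntu 16.04 is 5.4.0. This guide assumes that CUDA 8.0 does not work with compilers newer than 5.0 (NVIDIA's supported-compiler table for your exact release is the authoritative source), so gcc is downgraded to 4.9. The commands are as follows: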

sudo apt-get install g++-4.9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 10
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
sudo update-alternatives --set cc /usr/bin/gcc
sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
sudo update-alternatives --set c++ /usr/bin/g++
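
To confirm that the alternatives switch took effect, the version reported by the default gcc can be checked from a script. This is a minimal sketch with made-up helper names; it assumes gcc -dumpversion prints a dotted version string such as 4.9.4, and the threshold of major version 4 simply reflects the target chosen in this guide.

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Extract the major version from text such as '4.9.4' or '5.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def is_downgraded(version_text, max_major=4):
    """True when the major version is at or below the target used in this guide."""
    major = gcc_major(version_text)
    return major is not None and major <= max_major


if __name__ == "__main__" and shutil.which("gcc"):
    result = subprocess.run(["gcc", "-dumpversion"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    version = result.stdout.strip()
    print(version, "OK" if is_downgraded(version) else "is still a 5.x or newer compiler")
```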

As an aside, here is how to check the Kernel, GCC, and GLIBC versions on Ubuntu 16.04.

# Kernel
$ uname -sr  # or
$ uname -r
# GCC
$ gcc -v  # or
$ gcc --version
# GLIBC
$ ldd --version

2.2 Install cuDNN v5.1

Download cuDNN v5.1 (this requires registering and filling in a short survey):
https://developer.nvidia.com/cudnn
Installing from a Tar File:

# 1. Navigate to your <installpath> directory containing cuDNN.
# 2. Unzip the cuDNN package (cuDNN v5.1 for CUDA 8.0).
$ tar -xzvf cudnn-8.0-linux-x64-v5.1.tgz
# 3. Copy the following files into the CUDA Toolkit directory.
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
# 4. Update the symbolic links
$ cd /usr/local/cuda/lib64/
$ sudo rm -rf libcudnn.so libcudnn.so.5  # remove the existing library links
# Use the libcudnn.so.5.1.x file name actually present in this directory.
$ sudo ln -s libcudnn.so.5.1.5 libcudnn.so.5  # create the symbolic link
$ sudo ln -s libcudnn.so.5 libcudnn.so  # create the symbolic link
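
One way to double-check which cuDNN release ended up in the CUDA directory is to read the version macros from the copied header. The sketch below assumes that cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL (as the v5.x headers do) and that the header sits at the path used in the copy step; the function name is made up for illustration.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Path assumed from the copy step above.
    try:
        with open("/usr/local/cuda/include/cudnn.h") as header:
            print("cuDNN version:", cudnn_version(header.read()))
    except OSError:
        print("cudnn.h not found under /usr/local/cuda/include")
```

The patch number printed here should match the libcudnn.so.5.1.x file that the symbolic link points to.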

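2.3 Configure environment variables

In a terminal, enter:
$ gedit ~/.bashrc  # open the .bashrc file (sudo is not needed for a file in your own home directory)
Append the following environment variable settings to the end of the file:

export CUDA_HOME=/usr/local/cuda-8.0
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
# ~ is not expanded inside double quotes, so $HOME is used; replace cuDNN_installpath with the directory where cuDNN was unpacked.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$HOME/cuDNN_installpath"

To make the settings take effect immediately, run in the terminal:
$ source ~/.bashrc
Alternatively, restart the computer.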
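
After sourcing .bashrc, it can be worth verifying that the new shell really sees the CUDA directories. The following sketch checks LD_LIBRARY_PATH for the two library directories named in the export lines above; the helper name and the list constant are made up for illustration.

```python
import os

# Library directories taken from the export lines above.
EXPECTED_LIB_DIRS = [
    "/usr/local/cuda-8.0/lib64",
    "/usr/local/cuda-8.0/extras/CUPTI/lib64",
]


def missing_lib_dirs(env, expected=EXPECTED_LIB_DIRS):
    """Return the expected directories that are absent from LD_LIBRARY_PATH."""
    raw = env.get("LD_LIBRARY_PATH", "")
    entries = [path.rstrip("/") for path in raw.split(":") if path]
    return [directory for directory in expected if directory not in entries]


if __name__ == "__main__":
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("Missing from LD_LIBRARY_PATH:", missing_lib_dirs(os.environ) or "nothing")
```

Run it from a freshly opened terminal; a terminal that was open before .bashrc was edited still has the old environment.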

2.4 Install Bazel

If bazel is not installed on your system, install it now by following these directions.

# 1. Install JDK 8
$ sudo apt-get install openjdk-8-jdk
# 2. Add Bazel distribution URI as a package source (one time setup)
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
# 3.Install and update Bazel
$ sudo apt-get update && sudo apt-get install bazel
# Once installed, you can upgrade to a newer version of Bazel with:
$ sudo apt-get upgrade bazel

Reference link:
https://bazel.build/versions/master/docs/install.html
If curl is not installed yet, install it as prompted:
$ sudo apt-get install curl
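
Before moving on, a quick check that the tools from this section are reachable can save a failed build later. The sketch below only looks the commands up on PATH; the helper name is made up, and the list of commands (java from openjdk-8-jdk, curl, bazel) follows the steps above.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # java comes from openjdk-8-jdk; curl is used to fetch the Bazel signing key.
    print("Missing:", missing_tools(["java", "curl", "bazel"]) or "nothing")
```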

2.5 Install TensorFlow Python dependencies and libcupti-dev

To install these packages for Python 3.n, issue the following command:
$ sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
You must also install libcupti-dev by invoking the following command:
$ sudo apt-get install libcupti-dev

On Ubuntu 16.04 the python command points to Python 2.* by default. For how to switch from Python 2.* to 3.*, please look it up yourself.
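
Because two Python versions can coexist on the same machine, it helps to confirm that the interpreter you plan to build for can actually see the packages installed above. This is a small sketch with a made-up helper name; run it with the same interpreter (python or python3) that you will later give to the configure script.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that the running interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    print("Missing modules:", missing_modules(["numpy", "pip", "wheel"]) or "nothing")
```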

3. Install TensorFlow (GPU support)

3.1 Clone the TensorFlow repository

To clone the latest TensorFlow repository, issue the following command:
$ git clone https://github.com/tensorflow/tensorflow

For how to use git, see Git Study Notes.

3.2 Configure the installation

The configuration commands are as follows:

$ cd tensorflow  # cd to the top-level directory created
$ ./configure

Here is an example execution of the configure script. Note that your own input will likely differ from our sample input:

Please specify the location of python. [Default is /usr/bin/python]:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n]
jemalloc enabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N]
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N]
No XLA JIT support will be enabled for TensorFlow
Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/dist-packages]
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] Y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 6.1
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Setting up CUPTI include
Setting up CUPTI lib64
Configuration finished

You can run bazel clean before configuring to avoid conflicts with other configurations.
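
The answers given in the sample run can also be written down as environment variables, which makes it easier to repeat the configuration later. The sketch below is an assumption-laden illustration: the variable names are the ones I recall the TensorFlow 1.x configure script reading, so check them against the configure script in your checkout before relying on them. The values mirror the sample answers above.

```python
import os


def configure_env(cuda_version="8.0", cudnn_version="5",
                  cuda_path="/usr/local/cuda", capabilities="6.1"):
    """Build environment overrides mirroring the answers in the sample run.

    The variable names are assumed from the TensorFlow 1.x configure script.
    """
    return {
        "TF_NEED_CUDA": "1",
        "TF_CUDA_VERSION": cuda_version,
        "CUDA_TOOLKIT_PATH": cuda_path,
        "TF_CUDNN_VERSION": cudnn_version,
        "CUDNN_INSTALL_PATH": cuda_path,
        "TF_CUDA_COMPUTE_CAPABILITIES": capabilities,
    }


if __name__ == "__main__":
    env = dict(os.environ, **configure_env())
    for key in sorted(configure_env()):
        print("export %s=%s" % (key, env[key]))
```

The printed export lines can be pasted into a shell before running ./configure; any prompt whose variable is not recognized by your TensorFlow version will simply be asked interactively as before.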

3.3 Build the pip package

To build a pip package for TensorFlow with GPU support, invoke the following command:
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

If an error load.... error appears, add sudo before bazel build. This step takes patience. If required packages fail to download because of network problems, run the command again.

The bazel build command builds a script named build_pip_package. Running this script as follows will build a .whl file within the /tmp/tensorflow_pkg directory:
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

3.4 Install the pip package

Invoke pip install to install that pip package. The filename of the .whl file depends on your platform. For example, the following command will install the pip package.

# for TensorFlow 1.x.x on Linux:
$ sudo pip install /tmp/tensorflow_pkg/tensorflow-1.x.x-xxx....whl
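
Since the exact .whl file name depends on the platform and the TensorFlow version, a short script can look it up instead of typing it by hand. This is a minimal sketch with a made-up helper name; it assumes the wheel was written to /tmp/tensorflow_pkg as in section 3.3 and that its file name starts with tensorflow-.

```python
import glob
import os


def newest_wheel(directory="/tmp/tensorflow_pkg"):
    """Return the most recently modified tensorflow .whl in `directory`, or None."""
    wheels = glob.glob(os.path.join(directory, "tensorflow-*.whl"))
    return max(wheels, key=os.path.getmtime) if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    print("pip install", wheel if wheel else "<no wheel found yet>")
```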

3.5 Validate your installation

Validate your TensorFlow installation by doing the following:
Start a terminal. Change directory (cd) to any directory on your system other than the tensorflow subdirectory from which you invoked the configure command.
Invoke python:
$ python
Enter the following short program inside the python interactive shell:

# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

If the system outputs the following, then you are ready to begin writing TensorFlow programs:
Hello, TensorFlow!
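When you run the commands above, the terminal also shows some information about the graphics card, which indicates that TensorFlow (GPU support) is installed correctly.

Here is one more verification program. Save it as xxx.py and run it in a terminal with python xxx.py: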

import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

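Here, the log_device_placement parameter of the session tells TensorFlow to log the name of the device each op runs on. On success the program prints the 2x2 matrix product, whose rows are [22, 28] and [49, 64]. If error messages appear instead, something went wrong during installation. At that point you need a tenacious, brave heart. Take a break, sort out your thoughts, and try again.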
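
For a more direct answer than reading the placement log, TensorFlow can be asked which devices it sees. The sketch below uses device_lib.list_local_devices() from tensorflow.python.client, which is available in TensorFlow 1.x; the two helper names are made up, and the import is wrapped so the script still runs where TensorFlow is not installed.

```python
def gpu_names(devices):
    """Return the names of entries whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def local_gpus():
    """List GPUs visible to TensorFlow, or None when TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_names(device_lib.list_local_devices())


if __name__ == "__main__":
    gpus = local_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this interpreter")
    else:
        print("GPUs visible to TensorFlow:", gpus or "none (CPU-only build or driver problem)")
```

An empty list on a machine with a working driver usually points back to the CUDA, cuDNN, or environment variable steps in section 2.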

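4. Summary

This brings the article to a close. Looking back, the main steps in configuring this environment were: installing the dual-boot system and the graphics driver, installing CUDA 8.0 and cuDNN v5.1, configuring the environment variables, installing Bazel, installing and configuring TensorFlow, and running the bazel build command. With these steps done correctly, the required environment is ready.

If you run into problems, leave a comment below or send the author a message. ^_^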

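Having compiled all this so earnestly, I'm going to let loose and improvise a little. ^_^ Honestly, setting up this environment for the first time drains a lot of inner energy (yes, I know kung fu too). You hit plenty of pitfalls and all kinds of red Errors that no web search turns up. At some point I even started to like the yellow Warnings, because at least they are not Errors. Green Info messages are of course the best, since they mean something installed successfully.

Over these few days I broke down, went speechless, and nearly lost it countless times; everything felt like a hopeless tangle. First thing in the morning I was already thinking about which step I had reached and where the problem might be, and late at night I lay awake still thinking about it. (laughing through tears) I read the official tutorials until I was nearly blind, and the whole world looked blurry. So I'm planning to spend thirty thousand yuan to fix my nearsightedness, just so I can spot you from far away and say hello (a line from a nearsighted person who refuses to wear glasses). Hahaha, I'm wandering further and further off topic; stop, stop, stop right there (snicker)...

Fortunately, I finally got the environment configured and learned a lot along the way. In a word: persevere, stay tenacious, hold firm to your convictions, and keep striving.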

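5. Acknowledgments

First, thanks to the teachers in the lab for their moral, material, and other support, especially Teacher Wang (salute).
Teacher Wang once said: Life is surely colorful. Don't always envy others. Whether you attend Tsinghua or Peking University or an ordinary school, have a plan, and don't drift through your days not knowing what you should be doing. The first thing to overcome is yourself: don't envy others, don't mock others, and keep an open mind. University students should have these qualities.
See how poetic, how atmospheric that is. (applause, warm applause)

Second, thanks to my senior labmates; I miss you all. After this parting, who knows when we will meet again. Whenever I think of it, my eyes brim with tears and my collar is soaked. (sobbing) I wish you well...
Hey, hey, don't get too into character; camera, cut back. Hahaha. It turns out I have a knack for acting, and I'd actually love to develop in that direction.
Also, dear seniors, I have taken over all of your desks (evil grin): one for napping, one for eating, one for assembling machines, one for storing parts... Haha, but once the new semester starts and the new junior students arrive, my free space will be gone.

Next, thanks to all my labmates for their tolerance and care during this time, especially junior labmate yiqi. Without the mutual help between yiqi and me, the required hardware and software environment would not have been set up so smoothly.

Finally, thanks to myself for persisting and working hard. The same words again: persevere, stay tenacious, hold firm to your convictions, and keep striving.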

6. References

1.https://www.tensorflow.org/install/install_sources
2.http://docs.nvidia.com/cuda/cuda-installation-guide-linux/
3.http://blog.csdn.net/zhaoyu106/article/details/52793183/
4.https://developer.nvidia.com/cudnn
5.http://www.cnblogs.com/xujianqing/p/6142963.html
6.https://docs.bazel.build/versions/master/install-ubuntu.html
7.https://www.tensorflow.org/tutorials/using_gpu

(Updated 2017-08-16)
