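I have reinstalled so many times! Nobody knows reinstalling better than I do (just kidding).
From here on I assume you have just finished a fresh install of Ubuntu 18.04, so let's get to it! Please note: this guide installs the graphics driver as part of the CUDA installation. If you want to install the driver on its own (for example with the .run file from the NVIDIA website, or with the version that ubuntu-drivers devices recommends), please look for another tutorial. But (◔◡◔) after many reinstalls, my view is that since you have to install CUDA anyway, letting CUDA install the NVIDIA driver is the simplest route.
Note: sudo only grants root privileges for a single command, so we switch to a root shell right at the start.
Here is the overall flow:
CUDA (which also installs the graphics driver) --> cuDNN --> Anaconda3 --> create the environment --> install tensorflow-gpu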
-
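Switch the package source (to speed up downloads)
Get a root shell:
sudo -s
Back up the source list:
cp /etc/apt/sources.list /etc/apt/sources.list.bak
Replace the contents of the source list:
gedit /etc/apt/sources.list
Once the list is open, replace the original contents with the following: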
# Source-package mirrors are commented out by default to speed up apt update; uncomment them if you need them
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
# Pre-release software source, not recommended
# deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
# deb-src https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
Remember to click Save.
Update the package lists:
apt-get update
OK, the source has been switched!
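If you reinstall as often as I do, pasting the list by hand gets old. Here is a minimal Python sketch that prints the same four active deb lines (plus the commented deb-src lines) for the Tsinghua mirror. The function name build_sources and its parameters are made up for illustration, and the sketch only builds text; it does not write to /etc/apt/sources.list.

```python
# Sketch: generate the active entries of the sources.list shown above.
# build_sources, MIRROR and COMPONENTS are illustrative names, not an apt API.
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/ubuntu/"
COMPONENTS = "main restricted universe multiverse"


def build_sources(mirror=MIRROR, codename="bionic"):
    """Return sources.list text for the release and its update suites."""
    lines = []
    for suffix in ("", "-updates", "-backports", "-security"):
        suite = codename + suffix
        lines.append(f"deb {mirror} {suite} {COMPONENTS}")
        # Source packages stay commented out, as in the list above.
        lines.append(f"# deb-src {mirror} {suite} {COMPONENTS}")
    return "\n".join(lines) + "\n"


print(build_sources(), end="")
```

You could redirect the output into a file and compare it with your edited list before running apt-get update.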
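-
Disable the system's built-in graphics driver
Open the system blacklist file:
gedit /etc/modprobe.d/blacklist.conf
Add the following lines to put nouveau on the blacklist. Hmph, we are not playing with it!
blacklist nouveau
options nouveau modeset=0
Then regenerate the initramfs so the change takes effect:
update-initramfs -u
Reboot:
reboot
Now see whether it still dares to show up:
lsmod | grep nouveau
OK, no output at all (it's scared, it's scared, haha).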
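The same check can be scripted. The sketch below takes the text that lsmod prints and reports whether a module name appears in the first column. The helper name is_module_loaded and the sample text are invented for illustration; on a real machine you would feed it the actual lsmod output.

```python
# Sketch: decide from lsmod-style text whether a kernel module is loaded.
# is_module_loaded is a hypothetical helper; SAMPLE is made-up example text.
def is_module_loaded(lsmod_output, name):
    """Return True if `name` is the first field of any line."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


SAMPLE = """Module                  Size  Used by
nvidia_drm             40960  2
nvidia              16592896  1 nvidia_drm
"""

print("nouveau loaded:", is_module_loaded(SAMPLE, "nouveau"))
```

Unlike grep, this only matches the module column, so a module that merely lists nouveau under "Used by" would not count.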
-
Install the dependencies
Install gcc (remember to be in the root shell):
apt install build-essential
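-
Install CUDA (and the graphics driver that ships with it)
Be good and download it from the official site:
--> This way: http://developer.nvidia.com/cuda-downloads
Run the .run file from the directory that contains it (beginner tip: type cd followed by a space, drag the folder that holds the .run file into the terminal, then press Enter to enter that directory; after that, ls lists the files in it):
sh cuda_10.0.130_410.48_linux.run
Friendly reminder: replace this with the name of your own CUDA file.
During installation, type accept.
If you have not installed a graphics driver before, you can install it here as part of the CUDA installation (that was my case):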
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48? (y)es/(n)o/(q)uit: y
Do not install the OpenGL libraries!
Do you want to install the OpenGL libraries? (y)es/(n)o/(q)uit [ default is yes ]: n
As for this one, either y or n is fine:
Do you want to run nvidia-xconfig? This will update the system X configuration file so that the NVIDIA X driver is used. The pre-existing X configuration file will be backed up. This option should not be used on systems that require a custom X configuration, such as systems with multiple GPU vendors. (y)es/(n)o/(q)uit [ default is no ]: n
Answer y or press Enter for the default on the remaining questions, then look at the result:
===========
= Summary =
===========
Driver: Installed
Toolkit: Installed in /usr/local/cuda-10.0
Samples: Installed in /home/yy, but missing recommended libraries
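After installation, add the environment variables:
gedit ~/.bashrc
Append the following lines to the end of the file (remember to change them to your own CUDA version):
export PATH="/usr/local/cuda-10.0/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH"
Save the file, then reload it so the changes take effect:
source ~/.bashrc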
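A single misspelled directory in these two lines is enough to make CUDA libraries unfindable later, so a quick sanity check does not hurt. The sketch below lists entries of a colon-separated path variable that are not real directories. The name missing_entries is made up for this post, and the is_dir parameter exists only so the function can be tried without touching the file system.

```python
# Sketch: find entries in a PATH-style variable that do not exist on disk.
# missing_entries is a hypothetical helper, not part of any CUDA tooling.
import os


def missing_entries(path_value, is_dir=os.path.isdir):
    """Return the non-empty entries of `path_value` that are not directories."""
    return [entry for entry in path_value.split(":")
            if entry and not is_dir(entry)]


for variable in ("PATH", "LD_LIBRARY_PATH"):
    bad = missing_entries(os.environ.get(variable, ""))
    print(variable, "entries that do not exist:", bad)
```

Run it in a terminal opened after source ~/.bashrc; an entry under /usr/local that shows up as missing usually points to a typo.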
Finally, check the CUDA version and the NVIDIA driver information:
nvcc -V
The CUDA version information looks like this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
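If you want a script to confirm the toolkit version instead of reading it by eye, the release number can be pulled out of that text with a regular expression. This is only a sketch: cuda_release is a name I made up, and it assumes the output contains "release X.Y" as in the listing above.

```python
# Sketch: extract the CUDA release number from `nvcc -V` text.
# cuda_release is a hypothetical helper; SAMPLE copies the output shown above.
import re


def cuda_release(nvcc_output):
    """Return (major, minor) from 'release X.Y', or None if it is absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130"""

print(cuda_release(SAMPLE))
```

To use it on a live system you could pass in the text captured from running nvcc -V, for example via subprocess.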
Query the NVIDIA driver information:
nvidia-smi
The result looks like this:
Wed Aug 12 15:59:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Graphics Device     Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   41C    P0    N/A /  N/A |      0MiB /  3020MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
-
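Install cuDNN
Download the archive from the official site:
--> This way: https://developer.nvidia.com/rdp/cudnn-archive
Once it has downloaded, extract it (at this point the archive is in your Downloads directory).
First change into the Downloads directory, then extract:
tar -zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
The extraction output looks like this:
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.4.2
cuda/lib64/libcudnn_static.a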
Next, copy the cuDNN files into the CUDA directory:
cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/
Give all users read permission (remember to change this to your own version number!):
chmod a+r /usr/local/cuda-10.0/include/cudnn.h
chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
Check the cuDNN version:
cat /usr/local/cuda-10.0/include/cudnn.h | grep CUDNN_MAJOR -A 2
The result (mine is 7.4.2):
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
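The three defines are all you need to rebuild the version string. Below is a small sketch that reads them out of header text; cudnn_version is a name invented for this post, and it assumes the header uses the same three #define lines as the cuDNN 7.4.2 header shown above.

```python
# Sketch: recover the cuDNN version from the text of cudnn.h.
# cudnn_version is a hypothetical helper; SAMPLE copies the grep output above.
import re


def cudnn_version(header_text):
    """Return (major, minor, patch), or None if any define is missing."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + key + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


SAMPLE = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)"""

print(".".join(str(part) for part in cudnn_version(SAMPLE)))
```

On a real install you would read /usr/local/cuda-10.0/include/cudnn.h and pass its contents to the function.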
-
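Install Anaconda3
If you have not downloaded it yet, get it from the Tsinghua mirror (it is really fast):
--> This way: https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
Change into the directory that holds the downloaded file and run:
bash Anaconda3-2020.02-Linux-x86_64.sh
Add Anaconda to the environment variables:
gedit ~/.bashrc
Append this to the end of .bashrc (remember to change it to your own user name):
export PATH="/home/yy/anaconda3/bin:$PATH"
Finally, don't forget to reload it:
source ~/.bashrc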
-
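Create the environment
Make sure you are in the root shell! Create the environment (tf is the name I picked; choose whatever you like):
conda create -n tf python=3.7
Activate the environment we just created:
source activate tf
Once it is activated, the prompt starts with the environment name, which shows that we are now inside the tf environment:
root@yy:~# source activate tf
(tf) root@yy:~#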
-
Install tensorflow-gpu
Inside the activated environment, run the following (plain pip is too slow, so I append the Tsinghua mirror URL):
pip install tensorflow-gpu==1.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
When the network is bad, the output may turn all red and fail with Read timed out, as shown below. Don't worry, just run the install a few more times; sooner or later the network will cooperate:
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 541, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 442, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='pypi.tuna.tsinghua.edu.cn', port=443): Read timed out.
Once the installation finishes, start
python
and enter import tensorflow as tf
Let's test it:
(tf) root@yy:~# python
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/yy/anaconda3/envs/tf/lib/python3.7/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Whoa, an error. Don't panic! It means the dynamic loader cannot find the CUDA 10.0 libraries. First enter quit() to leave Python,
then run this on the command line to register the CUDA library directory with the loader:
ldconfig /usr/local/cuda-10.0/lib64
The result:
>>> quit()
(tf) root@yy:~# ldconfig /usr/local/cuda-10.0/lib64
(tf) root@yy:~# python
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>
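You can test whether the loader finds a shared library without importing TensorFlow at all. The sketch below tries to open a library by name with ctypes from the Python standard library. The helper name can_load is made up, and the two library names are simply the ones this install uses (CUDA 10.0 and cuDNN 7); change them for other versions.

```python
# Sketch: ask the dynamic loader whether it can open a shared library.
# can_load is a hypothetical helper; the library names match this post's versions.
import ctypes


def can_load(library_name):
    """Return True if the loader can open `library_name`, else False."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


for name in ("libcublas.so.10.0", "libcudnn.so.7"):
    print(name, "found" if can_load(name) else "NOT found")
```

If libcublas.so.10.0 shows up as NOT found, import tensorflow will most likely fail with the same ImportError as above.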
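Phew, the error is gone! Now check the NumPy version:
>>> import numpy
>>> numpy.__version__
'1.19.1'
That looks too new for TensorFlow 1.13.1, so let's downgrade it (quit Python first and run this in the shell):
pip install -U numpy==1.16.0 -i https://pypi.tuna.tsinghua.edu.cn/simple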
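A tiny sketch of the "is my version newer than the one I want" check. Both helper names are invented, the 1.16.0 threshold is just the version I chose to pin above (not an official compatibility limit), and the parsing assumes plain numeric versions such as 1.19.1.

```python
# Sketch: compare dotted version strings numerically.
# version_tuple and is_newer_than are hypothetical helpers for this post.
def version_tuple(version):
    """Turn '1.19.1' into (1, 19, 1); assumes purely numeric parts."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_newer_than(installed, pinned):
    """Return True if `installed` is a later version than `pinned`."""
    return version_tuple(installed) > version_tuple(pinned)


print(is_newer_than("1.19.1", "1.16.0"))
```

Comparing tuples rather than raw strings matters here: as text, "1.9.0" sorts after "1.16.0", which is the wrong answer.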
And that's everything!
Let me run PointNet++ to see how it goes.
A quick test, running just one epoch (I lowered the max_epoch and batch_size defaults):
parser.add_argument('--num_point', type=int, default=1024, help='Point Number [default: 1024]')
parser.add_argument('--max_epoch', type=int, default=1, help='Epoch to run [default: 251]')
parser.add_argument('--batch_size', type=int, default=8, help='Batch Size during training [default: 16]')
Very good! No problems at all!
(tf) root@yy:/media/yy/Data/ipython_jupyter/pointnet2123# python train.py
**** EPOCH 000 ****
2020-08-12 17:13:44.277590
 ---- batch: 050 ----
mean loss: 3.805058
accuracy: 0.127500
 ---- batch: 100 ----
mean loss: 3.299858
accuracy: 0.205000
....... (far too many batches, omitted here) .........
 ---- batch: 1200 ----
mean loss: 1.797384
accuracy: 0.492500
2020-08-12 17:18:01.698818
---- EPOCH 000 EVALUATION ----
eval mean loss: 1.345066
eval accuracy: 0.606969
eval avg class acc: 0.502087
Model saved in file: log/model.ckpt
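For anyone who would rather not edit the defaults in train.py, here is a self-contained sketch of the same three arguments, with the values passed on the command line instead. It is not the actual PointNet++ script: build_parser is a name I made up, only these three flags are reproduced, and the help texts here state the upstream defaults (251 epochs, batch size 16) that the original help strings mention.

```python
# Sketch: the three training flags from above, isolated in a tiny parser.
# build_parser is a hypothetical wrapper; the real train.py defines more flags.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Quick-test flags (sketch)")
    parser.add_argument('--num_point', type=int, default=1024,
                        help='Point Number [default: 1024]')
    parser.add_argument('--max_epoch', type=int, default=251,
                        help='Epoch to run [default: 251]')
    parser.add_argument('--batch_size', type=int, default=16,
                        help='Batch Size during training [default: 16]')
    return parser


# Equivalent of: python train.py --max_epoch 1 --batch_size 8
flags = build_parser().parse_args(["--max_epoch", "1", "--batch_size", "8"])
print(flags.num_point, flags.max_epoch, flags.batch_size)
```

Passing the flags explicitly keeps the script's defaults intact, so a later full training run does not silently stop after one epoch.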