Installing TensorFlow-GPU 1.13.1 on Ubuntu 18.04 with an NVIDIA 2080
https://www.tensorflow.org/install
https://www.tensorflow.org/install/gpu
Installing TensorFlow-GPU on Ubuntu 16.04 with an NVIDIA 1080Ti
First, work out which versions of each component are compatible with one another:
https://www.tensorflow.org/install/source
Tested build configurations --> Linux --> shows the matching versions for the CPU and GPU builds respectively:
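The main thing to check is which CUDA version your TensorFlow release supports. Next comes the choice of CUDA version itself (9.0 or 10.0 is recommended), and then the GPU driver and cuDNN are chosen to match that CUDA version.
When picking versions, do not install the very latest release; step back one or two stable releases, and pay attention to compatibility between the components.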
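The selection order above (TensorFlow release first, then CUDA, then cuDNN) can be sketched as a small lookup. This is only an illustration: the two rows are a hand-copied subset of the tested build configurations table and should be re-checked against that page, and `TESTED_GPU_BUILDS` and `required_stack` are hypothetical names, not part of any TensorFlow API.

```python
# Hand-copied subset of the Linux GPU "tested build configurations" table.
# Assumption: re-check these rows against the official page before relying on them.
TESTED_GPU_BUILDS = {
    "1.12.0": {"cuda": "9", "cudnn": "7"},
    "1.13.1": {"cuda": "10.0", "cudnn": "7.4"},
}


def required_stack(tf_version):
    """Return the CUDA/cuDNN versions recorded for a tensorflow-gpu release."""
    try:
        return TESTED_GPU_BUILDS[tf_version]
    except KeyError:
        raise ValueError("no tested configuration recorded for " + tf_version)


print(required_stack("1.13.1"))
```

For the setup in this post, the lookup returns CUDA 10.0 and cuDNN 7.4, which is why the CUDA 10.0 installer is used.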
netc@gpu-2:~$ nvidia-smi
Mon Mar 25 23:16:33 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   40C    P0     1W / 225W |      0MiB /  7949MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
netc@gpu-2:~$
netc@gpu-2:/data/tools/GeForce-RTX-2080$ sudo sh cuda_10.0.130_410.48_linux.run
-----------------
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: n

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: n

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-10.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
 [ default is /home/netc ]:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Installing the CUDA Samples in /home/netc ...
Copying samples to /home/netc/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /home/netc

Please make sure that
 -   PATH includes /usr/local/cuda-10.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_13131.log
netc@gpu-2:/data/tools/GeForce-RTX-2080$
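The installer summary asks for `/usr/local/cuda-10.0/bin` on `PATH` and `/usr/local/cuda-10.0/lib64` on `LD_LIBRARY_PATH`. A minimal sketch of a check for those two entries follows; `missing_cuda_paths` is a hypothetical helper written for this post, and it assumes the default toolkit location was accepted. It does not cover the alternative of registering the directory in `/etc/ld.so.conf`.

```python
import os

# Default toolkit location offered by the installer (assumed unchanged).
CUDA_HOME = "/usr/local/cuda-10.0"


def missing_cuda_paths(environ=None):
    """Return the names of the variables that lack the CUDA 10.0 directories."""
    environ = os.environ if environ is None else environ
    wanted = {
        "PATH": CUDA_HOME + "/bin",
        "LD_LIBRARY_PATH": CUDA_HOME + "/lib64",
    }
    missing = []
    for name, directory in wanted.items():
        # Split the variable into its entries and ignore trailing slashes.
        entries = [e.rstrip("/") for e in environ.get(name, "").split(os.pathsep)]
        if directory not in entries:
            missing.append(name)
    return missing


print(missing_cuda_paths())
```

An empty list means both variables already contain the CUDA directories; any name in the list still needs to be exported, for example in `~/.bashrc`.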
netc@gpu-2:/data/tools/GeForce-RTX-2080$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
netc@gpu-2:/data/tools/GeForce-RTX-2080$
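When this check needs to be scripted, the release number can be pulled out of the `nvcc -V` text with a regular expression. The sketch below works on the output line shown above; `cuda_release` is a hypothetical helper name.

```python
import re


def cuda_release(nvcc_output):
    """Extract the release number (such as '10.0') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Last line of the nvcc output from this installation.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(cuda_release(sample))
```

On a live system the text could come from `subprocess.run(["nvcc", "-V"], stdout=subprocess.PIPE, universal_newlines=True).stdout` instead of the hard-coded sample.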
netc@gpu-2:~/cudnn_samples_v7/mnistCUDNN$ sudo pip3 install --upgrade pip
Successfully installed pip-19.0.3
netc@gpu-2:~/cudnn_samples_v7/mnistCUDNN$ sudo pip3 install --index-url https://mirrors.aliyun.com/pypi/simple tensorflow-gpu
Successfully installed absl-py-0.7.1 astor-0.7.1 gast-0.2.2 grpcio-1.19.0 h5py-2.9.0 keras-applications-1.0.7 keras-preprocessing-1.0.9 markdown-3.0.1 mock-2.0.0 numpy-1.16.2 pbr-5.1.3 protobuf-3.7.0 tensorboard-1.13.1 tensorflow-estimator-1.13.0 tensorflow-gpu-1.13.1 termcolor-1.1.0 werkzeug-0.15.1
netc@gpu-2:~/cudnn_samples_v7/mnistCUDNN$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-03-25 23:32:23.967770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-25 23:32:23.968691: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2ce8960 executing computations on platform CUDA. Devices:
2019-03-25 23:32:23.968749: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-03-25 23:32:23.992261: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200065000 Hz
2019-03-25 23:32:23.994027: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x33acc10 executing computations on platform Host. Devices:
2019-03-25 23:32:23.994073: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-25 23:32:23.994507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.8
pciBusID: 0000:03:00.0
totalMemory: 7.76GiB freeMemory: 7.62GiB
2019-03-25 23:32:23.994558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-25 23:32:23.995840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-25 23:32:23.995878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-25 23:32:23.995900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-25 23:32:23.996310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7413 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:03:00.0, compute capability: 7.5)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
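Instead of reading through the session log, the same verification can be reduced to a yes/no answer. The sketch below relies on `tf.test.is_gpu_available()`, which is available in TensorFlow 1.13; `gpu_status` and `describe` are hypothetical helper names, and the import is guarded so that a broken installation is reported rather than crashing the script.

```python
def gpu_status():
    """Return True/False for GPU visibility, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        # Also covers a missing CUDA shared library, which surfaces as ImportError.
        return None
    return tf.test.is_gpu_available()


def describe(status):
    """Turn the result of gpu_status() into a short message."""
    if status is None:
        return "TensorFlow could not be imported"
    return "GPU is visible to TensorFlow" if status else "TensorFlow sees no GPU"
```

Calling `print(describe(gpu_status()))` on the machine in this post should report that the GPU is visible, matching the "Created TensorFlow device" line in the log.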
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Cause:
The TensorFlow version does not match the installed CUDA version; this TensorFlow build requires CUDA 10.0.
Version compatibility table: https://tensorflow.google.cn/install/source
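Before reinstalling anything, it can help to confirm whether the dynamic loader can find the library named in the error. A minimal sketch using the standard `ctypes` module follows; `can_load` is a hypothetical helper name, and a `False` result only says the library was not found or could not be opened, not why.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# The library name is taken from the ImportError message.
print(can_load("libcublas.so.10.0"))
```

If this prints `False` while CUDA 10.0 is installed, the likely gap is `LD_LIBRARY_PATH` (or `/etc/ld.so.conf`) not including `/usr/local/cuda-10.0/lib64`; if a different CUDA release is installed, the TensorFlow version has to be changed to match it.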
Check the CUDA version:
cat /usr/local/cuda/version.txt
Check the cuDNN version:
cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
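The `grep` above prints three `#define` lines that together form the cuDNN version. A sketch that assembles them into one string is shown below; `cudnn_version` is a hypothetical helper, and the header excerpt uses placeholder numbers rather than values read from this machine.

```python
import re


def cudnn_version(header_text):
    """Build 'major.minor.patch' from the CUDNN_* defines in cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


# Hypothetical header excerpt; the numbers are placeholders for illustration.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
"""
print(cudnn_version(sample))
```

To read the real header, pass `open("/usr/include/cudnn.h").read()` to the function instead of the sample text.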