Problem log | Reinstalling the NVIDIA driver and CUDA on deepin 15.10

Date: 2020-04-24



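Problem description:

nvidia-smi produces output, so the GPU driver is present, and the CUDA version reported by nvcc is 9.0, which is correct (not 9.1). I could not find the cause of the problem, so I simply reinstalled everything.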

sudo tee /proc/acpi/bbswitch <<<ON
# ON
nvidia-smi

Output:

Tue May 28 22:21:07 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 950M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0    N/A /  N/A |      0MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
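When comparing versions later, it can help to pull the driver version out of the nvidia-smi header programmatically. The following is a minimal sketch, assuming the header keeps the "Driver Version: X.Y" wording shown above; the function name `parse_driver_version` is made up for illustration.

```python
import re

def parse_driver_version(smi_text):
    """Return the driver version string from nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", smi_text)
    return match.group(1) if match else None

# Header line copied from the output above.
sample = "| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |"

print(parse_driver_version(sample))  # 390.67
```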

nvcc --version

Output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
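To confirm the toolkit release without reading the banner by eye, the last line of the nvcc output can be parsed. This is only a sketch, assuming the "release X.Y, VX.Y.Z" wording shown above; `parse_nvcc_release` is a hypothetical helper name.

```python
import re

def parse_nvcc_release(text):
    """Return (release, build) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Output copied from the nvcc run above.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

print(parse_nvcc_release(sample))  # ('9.0', '9.0.176')
```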

lspci | grep -i nvidia

Output:

01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2)

Check whether PyTorch can use CUDA:

python -c 'import torch; print(torch.cuda.is_available())'

Output:

False
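A bare False says little about the cause. The sketch below gathers a few more facts in one place, such as which CUDA version the PyTorch build targets. It assumes a PyTorch build that exposes `torch.version.cuda`; the function name `torch_cuda_report` is made up for illustration.

```python
def torch_cuda_report():
    """Collect facts that may help explain a False from is_available()."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this interpreter at all.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }

for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `built_for_cuda` is None, the installed wheel is CPU-only and reinstalling the driver or toolkit would not change the result.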

Uninstall CUDA

sudo /usr/local/cuda-9.0/bin/uninstall_cuda_9.0.pl  
# After this, only the cuDNN files are left; they can be deleted completely as well.
sudo rm -rf /usr/local/cuda-9.0/

Uninstall the NVIDIA driver and Bumblebee

sudo apt-get remove --purge nvidia-cuda-dev nvidia-cuda-toolkit nvidia-nsight nvidia-visual-profiler
sudo apt autoremove --purge bumblebee-nvidia nvidia-driver nvidia-settings

Install the GPU driver and Bumblebee

sudo apt-get install nvidia-smi
sudo apt-get install bumblebee-nvidia nvidia-driver nvidia-settings

Install the GPU driver test programs

sudo apt-get install mesa-utils

Show information about the NVIDIA GPU:

optirun glxinfo|grep NVIDIA

Run the test program:

optirun glxgears -info

The GPU driver is invoked successfully, with the following information:

GL_RENDERER   = GeForce GTX 950M/PCIe/SSE2
GL_VERSION    = 4.6.0 NVIDIA 390.67
GL_VENDOR     = NVIDIA Corporation

Install CUDA

https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
Download the runfile from the page above, then run it:

sudo ./cuda_9.0.176_384.81_linux.run

During installation, answer no only to this prompt, since the driver was already installed through apt:

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
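Skipping the bundled driver only makes sense if the driver already on the system is at least as new. A small sketch of that comparison follows. It treats the 384.81 in the runfile name as the baseline, which is an assumption made for illustration, and the helper names are hypothetical.

```python
def version_tuple(version):
    """Turn '390.67' into (390, 67) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def driver_is_new_enough(installed, bundled):
    """True if the installed driver is at least as new as the bundled one."""
    return version_tuple(installed) >= version_tuple(bundled)

# 390.67 is the apt-installed driver reported by nvidia-smi;
# 384.81 is the driver version in the runfile name.
print(driver_is_new_enough("390.67", "384.81"))  # True
```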

Download and install cuDNN

<https://developer.nvidia.com/rdp/cudnn-archive>

Log in and download the version that matches your CUDA release. I chose this one:

cudnn-9.0-linux-x64-v7.5.0.56

Copy the additional cuDNN libraries into the corresponding CUDA locations:

sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/
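After copying, the installed cuDNN version can be read back from the header. The sketch below assumes the cuDNN 7.x layout, where the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros are defined in `cudnn.h`; other releases may keep them in a different header. `parse_cudnn_version` is a hypothetical helper name, and the sample text stands in for the real file.

```python
import re

def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from cudnn.h contents, or None if missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)

# Stand-in for the relevant lines of /usr/local/cuda/include/cudnn.h.
sample_header = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
"""

print(parse_cudnn_version(sample_header))  # 7.5.0
```

On a real system, pass the contents of `/usr/local/cuda/include/cudnn.h` instead of `sample_header`.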

Then check the environment variables and power on the NVIDIA GPU

# Check LD_LIBRARY_PATH and PATH
vim ~/.bashrc

# Power on the NVIDIA GPU through bbswitch (used by Bumblebee)
sudo tee /proc/acpi/bbswitch<<<ON
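The environment check can also be done without opening an editor. This is a minimal sketch, assuming CUDA lives under /usr/local/cuda (or a versioned sibling such as /usr/local/cuda-9.0); the function name `cuda_path_entries` and the example values are made up for illustration.

```python
import os

def cuda_path_entries(env, cuda_home="/usr/local/cuda"):
    """Return the PATH and LD_LIBRARY_PATH entries that start with cuda_home."""
    found = {}
    for var in ("PATH", "LD_LIBRARY_PATH"):
        # Linux uses ':' as the separator; skip empty entries.
        entries = [p for p in env.get(var, "").split(":") if p]
        found[var] = [p for p in entries if p.startswith(cuda_home)]
    return found

# Illustrative environment; not taken from the original machine.
example_env = {
    "PATH": "/usr/local/cuda/bin:/usr/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
}
print(cuda_path_entries(example_env))

# The same check against the current shell environment.
print(cuda_path_entries(os.environ))
```

An empty list for either variable suggests the corresponding export line is missing from ~/.bashrc or the file has not been re-sourced.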

Check again whether PyTorch can use CUDA

python -c "import torch;print(torch.cuda.is_available())"

Output:

True

Check whether TensorFlow can use the GPU

python3 -c "import tensorflow as tf;print(tf.test.is_gpu_available());print(tf.test.gpu_device_name())"

Output:

2019-05-28 22:52:25.862539: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-28 22:52:26.319239: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-05-28 22:52:26.319674: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.92GiB
2019-05-28 22:52:26.319696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0

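Everything works now. It probably does not get more involved than this: a full uninstall followed by a reinstall, with both the removal and the installation steps recorded above.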
