How to install the GPU driver on an Alibaba Cloud ECS GPU instance (OS: CentOS 7.3, GPU: NVIDIA P100)

1. Configure DNS and yum
(1) Configure DNS

[root@gpu-test-01 ~]# vim /etc/resolv.conf 
nameserver 223.5.5.5
nameserver 114.114.114.114
options timeout:2 attempts:3 rotate single-request-reopen
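Note: two external DNS servers are configured here, 223.5.5.5 and 114.114.114.114. The file is then marked immutable so that it is not overwritten.
[root@gpu-test-01 ~]# chattr +i /etc/resolv.conf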
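
The DNS step above can also be scripted. The following is a minimal sketch, not the author's method: the function names write_resolv_conf and count_nameservers are hypothetical, and the target path is a parameter so the helper can be tried on a scratch file before touching /etc/resolv.conf.

```shell
# Hypothetical helper: write the nameservers and options used in this
# article to a resolv.conf-style file. Defaults to /etc/resolv.conf.
write_resolv_conf() {
    target="${1:-/etc/resolv.conf}"
    cat > "$target" <<'EOF'
nameserver 223.5.5.5
nameserver 114.114.114.114
options timeout:2 attempts:3 rotate single-request-reopen
EOF
}

# Hypothetical helper: count the nameserver entries in such a file.
count_nameservers() {
    grep -c '^nameserver ' "$1"
}
```

On a real instance this would be followed by chattr +i /etc/resolv.conf, as in the transcript. If the file already carries the immutable flag, it has to be cleared with chattr -i before the file can be rewritten.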

(2) Configure yum

[root@gpu-test-01 ~]# cd /etc/yum.repos.d/
[root@gpu-test-01 yum.repos.d]# mv /etc/yum.repos.d/* /tmp
[root@gpu-test-01 yum.repos.d]# wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
[root@gpu-test-01 yum.repos.d]# wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
[root@gpu-test-01 yum.repos.d]# wget https://us.download.nvidia.cn/tesla/418.67/nvidia-diag-driver-local-repo-rhel7-418.67-1.0-1.x86_64.rpm
[root@gpu-test-01 yum.repos.d]# yum install nvidia-diag-driver-local-repo-rhel7-418.67-1.0-1.x86_64.rpm -y
[root@gpu-test-01 yum.repos.d]# mv nvidia-diag-driver-local-repo-rhel7-418.67-1.0-1.x86_64.rpm /tmp/
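
Before replacing the repo definitions, it can be safer to move the existing .repo files into a dedicated backup directory instead of deleting them. The sketch below is one possible way to do that; the function name backup_repo_files and the backup path in the usage note are assumptions for illustration, not part of the original procedure.

```shell
# Hypothetical helper: move every *.repo file from a source directory
# into a backup directory, creating the backup directory if needed.
backup_repo_files() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    for f in "$src"/*.repo; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$f" ] || continue
        mv "$f" "$dst"/ || return 1
    done
}
```

For example, backup_repo_files /etc/yum.repos.d /tmp/yum.repos.d.bak would keep the old definitions available for a later restore.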

2. Download the driver packages

(1) Download the P100/P4 driver:

[root@gpu-test-01 ~]# wget http://us.download.nvidia.com/tesla/396.44/NVIDIA-Linux-x86_64-396.44.run

(2) Download the kernel development package:

[root@gpu-test-01 ~]# wget https://buildlogs.centos.org/c7.1611.u/kernel/20170620132051/3.10.0-514.21.2.el7.x86_64/kernel-devel-3.10.0-514.21.2.el7.x86_64.rpm

(3) Download the CUDA package (this step can be skipped if you install cuda-drivers with yum):

[root@gpu-test-01 ~]# wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux

3. Configuration

Download and install the kernel-devel and kernel-headers packages that match the running kernel version:

[root@gpu-test-01 ~]# rpm -ivh kernel-devel-3.10.0-514.21.2.el7.x86_64.rpm 
Preparing...                          ################################# [100%]
Updating / installing...
   1:kernel-devel-3.10.0-514.21.2.el7 ################################# [100%]
[root@gpu-test-01 ~]# sudo rpm -qa | grep $(uname -r)
kernel-headers-3.10.0-514.21.2.el7.x86_64
kernel-3.10.0-514.21.2.el7.x86_64
kernel-devel-3.10.0-514.21.2.el7.x86_64

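Note: if the kernel-devel and kernel versions do not match, the driver fails to compile while the driver rpm is being installed. You can run rpm -qa | grep kernel on the instance to check that the versions match. Once the versions are confirmed, reinstall the driver.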
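
The version check described above can be automated. The following is a minimal sketch under the assumption that rpm -qa prints package names in the default name-version-release.arch form, as in the transcript; the function name has_matching_kernel_devel is hypothetical.

```shell
# Hypothetical helper: read package names on stdin (one per line, as
# printed by `rpm -qa`) and succeed only if the kernel-devel package
# for the given kernel release is among them.
has_matching_kernel_devel() {
    release="$1"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq "kernel-devel-${release}"
}
```

On the instance it could be used as rpm -qa | has_matching_kernel_devel "$(uname -r)" || echo "kernel-devel does not match the running kernel", run before starting the driver installer.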

[root@gpu-test-01 ~]# sh NVIDIA-Linux-x86_64-396.44.run
Follow the installer prompts and continue through each step:

Verify that the installation succeeded:

[root@gpu-test-01 ~]# nvidia-smi 
Sat Jun 22 18:39:14 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:08.0 Off |                    0 |
| N/A   33C    P0    27W / 250W |      0MiB / 16280MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
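
For scripted verification, the driver version can be pulled out of the nvidia-smi banner shown above. This is a rough sketch that assumes the banner keeps the "Driver Version: X.Y" wording seen in this output; the function name driver_version_from_banner is hypothetical.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# first driver version found in a "Driver Version: X.Y" banner line.
driver_version_from_banner() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

It would be used as nvidia-smi | driver_version_from_banner and compared against the expected version, 396.44 in this walkthrough. nvidia-smi also offers a machine-readable form, nvidia-smi --query-gpu=driver_version --format=csv,noheader, which avoids parsing the banner.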

The driver installation is now complete.
