Ubuntu 16.04
CUDA 10.0
2080 Ti GPU
First, clone the code:
git clone https://github.com/torch/distro.git ~/torch --recursive
Downloads are usually slow, and the submodules fetched via --recursive often fail, so the following commands may need to be run several times:
cd ~/torch
git submodule update --init --recursive
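Since the submodule download may need several attempts, the repetition can be automated. The following is only a minimal sketch: the `retry` helper, its attempt count, and the `RETRY_DELAY` variable are illustrative names introduced here, not part of git or torch.

```shell
# retry MAX CMD...: run CMD up to MAX times, stopping at the first success.
# RETRY_DELAY (seconds between attempts, default 5) is a hypothetical knob.
retry() {
  local max="$1"; shift
  local attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max failed: $*" >&2
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Usage (run inside the clone):
#   cd ~/torch && retry 10 git submodule update --init --recursive
```

`git submodule update` picks up where earlier attempts left off, so repeating it is safe.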
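1. Dependencies explicitly specified in the torch source
Installing them requires sudo privileges. If the current account is neither a sudo account nor root, ask your administrator to install them: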
bash install-deps;
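2. Dependencies not mentioned in the torch source but required in practice
This build uses stock Lua 5.2 rather than LuaJIT, so Lua needs to be installed with apt: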
sudo apt install lua5.2
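Configure CUDA/cuDNN: the system administrator had already installed cuda-8.0, cuda-9.0, and cuda-10.0 under directories such as /usr/local/cuda-10.0, so it is enough to set PATH and LD_LIBRARY_PATH in ~/.bashrc.
Configure CMake: a recent CMake is required. The cmake 3.5.1 that apt installs on Ubuntu 16.04 is too old and has problems with FindCUDA.cmake.
Manually install cmake-3.15-rc1.
Copy the Modules directory from the CMake-3.15-rc1 install path to ~/torch/cmake/3.15/.
Copy ~/torch/cmake/3.6/CMakeLists.txt to the ~/torch/cmake/3.15 directory.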
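The two copy steps above can be sketched as a small shell function. This is an illustration under assumptions: `stage_cmake_modules` is a hypothetical helper name, and the Modules path in the usage comment is a guess at where a CMake installed under /usr/local/cmake-3.15 keeps its modules, so check it against your own install.

```shell
# stage_cmake_modules MODULES_DIR TORCH_DIR
# Copies a CMake Modules directory into TORCH_DIR/cmake/3.15 and reuses the
# CMakeLists.txt that torch ships under cmake/3.6.
stage_cmake_modules() {
  local modules_dir="$1" torch_dir="$2"
  local dest="$torch_dir/cmake/3.15"
  local lists="$torch_dir/cmake/3.6/CMakeLists.txt"

  [ -d "$modules_dir" ] || { echo "no such directory: $modules_dir" >&2; return 1; }
  [ -f "$lists" ] || { echo "missing file: $lists" >&2; return 1; }

  mkdir -p "$dest"
  cp -r "$modules_dir" "$dest/"   # ends up as $dest/Modules
  cp "$lists" "$dest/"
}

# Usage (the Modules path is an assumption; adjust it to your install):
#   stage_cmake_modules /usr/local/cmake-3.15/share/cmake-3.15/Modules ~/torch
```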
The 2080 Ti has compute capability 7.5 (compute_75), but compilation fails unless CUDA 10 is configured. Edit ~/.bashrc and add:
export CUDA_HOME=/usr/local/cuda-10.0
export PATH=/usr/local/cmake-3.15/bin:/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
export TORCH_NVCC_FLAGS="-Xcompiler -D__CUDA_NO_HALF_OPERATORS__"
Install the cuDNN bindings from the R7 branch of cudnn.torch:
git clone https://github.com/soumith/cudnn.torch.git -b R7 && cd cudnn.torch && luarocks make cudnn-scm-1.rockspec
(from: https://github.com/soumith/cudnn.torch/issues/383)
Then add to ~/.bashrc:
export CUDNN_PATH="/usr/local/cuda-10.0/lib64/libcudnn.so.7"
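Before starting a long build, it can help to confirm that the variables set in ~/.bashrc actually point at existing files. The check below is a rough sketch; `check_cuda_env` is a hypothetical helper, and it only tests for the presence of files, not whether the CUDA and cuDNN versions are compatible.

```shell
# check_cuda_env: sanity-check CUDA_HOME, PATH, LD_LIBRARY_PATH and CUDNN_PATH.
# Returns 0 when everything looks consistent, 1 otherwise.
check_cuda_env() {
  local bad=0
  local home="${CUDA_HOME:-}"

  if [ ! -x "$home/bin/nvcc" ]; then
    echo "nvcc not found under CUDA_HOME='$home'" >&2
    bad=1
  fi
  case ":${PATH}:" in
    *":$home/bin:"*) ;;
    *) echo "PATH does not contain $home/bin" >&2; bad=1 ;;
  esac
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$home/lib64:"*) ;;
    *) echo "LD_LIBRARY_PATH does not contain $home/lib64" >&2; bad=1 ;;
  esac
  if [ ! -e "${CUDNN_PATH:-}" ]; then
    echo "CUDNN_PATH='${CUDNN_PATH:-}' does not exist" >&2
    bad=1
  fi
  return "$bad"
}

# Usage: source ~/.bashrc && check_cuda_env && echo "environment looks OK"
```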
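3. Modifications to the torch source
A few pitfalls came up. Specifically:
Because torch does not support CUDA 10.0 by default, extra/cutorch/lib/THC/cmake/select_compute_arch.cmake has to be edited.
Find
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "6.0" "6.1" "6.1+PTX")
and change it to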
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "6.0" "6.1" "6.1+PTX" "7.5")
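The same one-line edit can be scripted with sed instead of done by hand. This is a sketch assuming GNU sed and assuming the line in your checkout matches the text quoted above exactly; `add_turing_arch` is a hypothetical helper name.

```shell
# add_turing_arch FILE: append "7.5" to the common architecture list.
# Keeps a one-time backup in FILE.bak and is safe to run more than once.
add_turing_arch() {
  local file="$1"
  [ -e "$file.bak" ] || cp "$file" "$file.bak"
  sed -i 's/"6\.0" "6\.1" "6\.1+PTX")/"6.0" "6.1" "6.1+PTX" "7.5")/' "$file"
  # Succeed only if the edited line is now present.
  grep -q '"6\.1+PTX" "7\.5")' "$file"
}

# Usage:
#   add_turing_arch ~/torch/extra/cutorch/lib/THC/cmake/select_compute_arch.cmake
```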
Find
if(nvcc_res EQUAL 0)
  # only keep the last line of nvcc_out
  STRING(REGEX REPLACE ";" "\\\\;" nvcc_out "${nvcc_out}")
  STRING(REGEX REPLACE "\n" ";" nvcc_out "${nvcc_out}")
  list(GET nvcc_out -1 nvcc_out)
  string(REPLACE "2.1" "2.1(2.0)" nvcc_out "${nvcc_out}")
  set(CUDA_GPU_DETECT_OUTPUT ${nvcc_out} CACHE INTERNAL "Returned GPU architetures from detect_gpus tool" FORCE)
endif()
and change it to:
#if(nvcc_res EQUAL 0)
#  # only keep the last line of nvcc_out
#  STRING(REGEX REPLACE ";" "\\\\;" nvcc_out "${nvcc_out}")
#  STRING(REGEX REPLACE "\n" ";" nvcc_out "${nvcc_out}")
#  list(GET nvcc_out -1 nvcc_out)
#  string(REPLACE "2.1" "2.1(2.0)" nvcc_out "${nvcc_out}")
#  set(CUDA_GPU_DETECT_OUTPUT ${nvcc_out} CACHE INTERNAL "Returned GPU architetures from detect_gpus tool" FORCE)
#endif()
set(__nvcc_out "7.5")
Edit torch/install.sh and change every occurrence of 3.6 to 3.15.
Remove the FP16-related macros, because they cause the build to fail. To do so, run:
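The 3.6 to 3.15 substitution in install.sh can be done with sed. This is a blunt global replace, offered only as a sketch: it assumes GNU sed and assumes every literal "3.6" in the file refers to the CMake directory, so review the result afterwards. `bump_cmake_version` is a hypothetical helper name.

```shell
# bump_cmake_version FILE: replace every literal "3.6" with "3.15".
# Keeps a one-time backup in FILE.bak for review or rollback.
bump_cmake_version() {
  local file="$1"
  [ -e "$file.bak" ] || cp "$file" "$file.bak"
  sed -i 's/3\.6/3.15/g' "$file"
  # Succeed only if no literal "3.6" remains.
  ! grep -q '3\.6' "$file"
}

# Usage:
#   bump_cmake_version ~/torch/install.sh && diff ~/torch/install.sh.bak ~/torch/install.sh
```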
ag 'CUDA_HAS_FP16'
This finds two files, extra/cutorch/lib/THC/CMakeLists.txt and extra/cutorch/CMakeLists.txt. In both, remove the CUDA_HAS_FP16-related entries from the FLAGS.
extra/cutorch/lib/THC/THCHalf.h: remove #define CUDA_HAS_FP16 1
extra/cutorch/lib/THC/THCTensorMode.cuh: find the code containing
extra/cutorch/lib/THC/THCGeneral.c: wrap the last two functions in the file, half2float and float2half, between #ifdef CUDA_HAS_FP16 and #endif
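If ag (The Silver Searcher) is not installed, plain grep can produce the same list of files to edit. A minimal sketch; `list_fp16_refs` is a hypothetical helper name.

```shell
# list_fp16_refs [DIR]: print, sorted, every file under DIR (default: the
# current directory) that mentions CUDA_HAS_FP16.
list_fp16_refs() {
  local root="${1:-.}"
  grep -rlF 'CUDA_HAS_FP16' "$root" | sort
}

# Usage:
#   list_fp16_refs ~/torch/extra/cutorch
```

Running it again after the edits is a quick way to see which files still reference the macro.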
error: cannot overload functions distinguished by return type alone
An nvcc flag needs to be added. Open the file with vim ~/torch/extra/cutorch/lib/THC/CMakeLists.txt +65 and add:
-Xcompiler -D__CORRECT_ISO_CPP11_MATH_H_PROTO
error: more than one operator "==" matches these operands
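The cause is that the CUDA headers and the torch headers both provide the same overloaded operator, so the compiler cannot tell which one to use. Run the following shell command to disable the half-precision operators from the CUDA headers when compiling torch: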
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
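A plain `export` replaces whatever TORCH_NVCC_FLAGS already held, for example a value set earlier in ~/.bashrc. If you would rather keep existing flags, one option is to append only what is missing. This is a sketch; `add_nvcc_flag` is a hypothetical helper, not something torch provides.

```shell
# add_nvcc_flag FLAG: append FLAG to TORCH_NVCC_FLAGS unless it is already
# present as a whole word, then export the variable.
add_nvcc_flag() {
  case " ${TORCH_NVCC_FLAGS:-} " in
    *" $1 "*) ;;  # already present, nothing to do
    *) TORCH_NVCC_FLAGS="${TORCH_NVCC_FLAGS:+$TORCH_NVCC_FLAGS }$1" ;;
  esac
  export TORCH_NVCC_FLAGS
}

# Usage:
#   add_nvcc_flag -D__CUDA_NO_HALF_OPERATORS__
```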
Then rerun the torch build and install:
(from: https://blog.csdn.net/u013066730/article/details/80936627)
./clean.sh
TORCH_LUA_VERSION=LUA52 ./install.sh 2>&1
The build succeeded; screenshot: