Tensorflow 1.8 with GPU on macOS High Sierra 10.13.6

Date: 2019-12-12

老徐
Thursday, 26 July 2018

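The TensorFlow team announced that it would stop supporting GPU builds of TensorFlow for macOS after version 1.2.
The GPU version therefore cannot be installed directly and has to be compiled from source.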

Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.6

Running TensorFlow on the CPU did not feel fast enough, so I wanted to try GPU acceleration. I happen to have a graphics card that supports CUDA.

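Versions

This is important enough to say three times: the drivers and the build tools must be matching versions, otherwise the build will fail!

Versions:

TensorFlow r1.8 source code; the latest 1.9 still seems to have problems
macOS 10.13.6; the exact version probably matters little
GPU driver 387.10.10.10.40.105, which supports CUDA 9.1
CUDA 9.2; this is the CUDA driver (CUDA Driver 9.2), which may be newer than the CUDA version listed for the GPU driver above
cuDNN 7.2, matching the CUDA version above; just install the latest release
Xcode 8.2.1; this is critical. Downgrade to this version, otherwise the build fails or a Segmentation Fault occurs at run time
bazel 0.14.0; this is critical. Downgrade to this version
Python 3.6; this is critical. Do not use the latest Python 3.7, which currently has build problems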

Preparation

Downloads needed (some of the files are large, so start the downloads before reading on to save time):

Xcode 8.2.1
https://developer.apple.com/d...
Xcode_8.2.1.xip
bazel-0.14.0
https://github.com/bazelbuild...
CUDA Toolkit 9.2
https://developer.nvidia.com/...
cuDNN v7.2.1
https://developer.nvidia.com/...

TensorFlow source code, 333 MB

$ git clone https://github.com/tensorflow/tensorflow -b r1.8

Python 3.6.5_1

Python 3.7 is currently installed, so downgrade it

$ brew unlink python
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ pip3 install --upgrade pip setuptools wheel
# $ brew switch python 3.6.5_1

Do not use Python 3.7.0, otherwise the build will run into problems

You can switch back once the build is finished

$ brew switch python 3.7.0

Xcode 8.2.1

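Xcode needs to be downgraded to 8.2.1

Download the package from the Apple developer site, https://developer.apple.com/d...

Unpack it, copy it to /Applications/Xcode.app, and then point the toolchain at it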

$ sudo xcode-select -s /Applications/Xcode.app

Check that the installation is correct

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Command Line Tools: cc is clang
This matters a lot. Otherwise the build may succeed, but running anything moderately complex ends in a Segmentation Fault

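Environment variables

The CUDA libraries are not in a system directory, so environment variables are needed to point to them.
On macOS LD_LIBRARY_PATH has no effect; DYLD_LIBRARY_PATH is used instead

Set the environment variables by editing ~/.bash_profile or ~/.zshrc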

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
export PATH=$CUDA_HOME/bin:$PATH
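As a sanity check on the exports above, the same path can be assembled in Python and each directory tested for existence. This is a sketch only; `cuda_library_path` and `missing_directories` are hypothetical helper names, and the default `/usr/local/cuda` simply mirrors the `CUDA_HOME` value used in this post.

```python
import os


def cuda_library_path(cuda_home="/usr/local/cuda"):
    """Build the DYLD_LIBRARY_PATH value that the exports above produce."""
    return ":".join([
        os.path.join(cuda_home, "lib"),
        os.path.join(cuda_home, "extras", "CUPTI", "lib"),
    ])


def missing_directories(path_value):
    """Return the entries of a colon-separated path that are not existing directories."""
    return [p for p in path_value.split(":") if p and not os.path.isdir(p)]


if __name__ == "__main__":
    value = cuda_library_path(os.environ.get("CUDA_HOME", "/usr/local/cuda"))
    print(value)
    print("missing:", missing_directories(value))
```

An empty "missing" list suggests that both library directories are in place; it does not prove that the libraries inside them are the right versions.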

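Installing CUDA

CUDA is NVIDIA's parallel computing framework for its own GPUs. In other words, CUDA runs only on NVIDIA GPUs, and it pays off only when the problem to be solved can be computed with a large degree of parallelism.

Step 1: check whether the graphics card supports GPU computing

Find your card model here and see whether it is supported
https://developer.nvidia.com/...

My graphics card is an NVIDIA GeForce GTX 750 Ti: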

GPU	Compute Capability
GeForce GTX 750 Ti	5.0

Step 2: install CUDA

If another version of CUDA is installed and needs to be removed, run

$ sudo /usr/local/bin/uninstall_cuda_drv.pl
$ sudo /usr/local/cuda/bin/uninstall_cuda_9.1.pl
$ sudo rm -rf /Developer/NVIDIA/CUDA-9.1/
$ sudo rm -rf /Library/Frameworks/CUDA.framework
$ sudo rm -rf /usr/local/cuda/

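To be on the safe side, it is best to reboot

First of all, note that the CUDA Driver and the GPU Driver must be matching releases (the CUDA Driver release notes name the display driver it supports), otherwise CUDA cannot find the graphics card.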

GPU Driver, that is, the display driver
- http://www.macvidcards.com/dr...
- My macOS is 10.13.6, and the matching driver is already installed at the latest version, 387.10.10.10.40.105
  
  https://www.nvidia.com/downlo...
```
Version:    387.10.10.10.40.105
Release Date:    2018.7.10
Operating System:    macOS High Sierra 10.13.6
CUDA Toolkit:    9.1
```

CUDA Driver

http://www.nvidia.com/object/...
Install the CUDA Driver separately first. You can pick the latest version; check which display driver it supports

cudadriver_396.148_macos.dmg

New Release 396.148
CUDA driver update to support CUDA Toolkit 9.2, macOS 10.13.6 and NVIDIA display driver 387.10.10.10.40.105
Recommended CUDA version(s): CUDA 9.2
Supported macOS 10.13

CUDA Toolkit
- https://developer.nvidia.com/...
- You can pick the latest version; 9.2 is used here
- cuda_9.2.148_mac.dmg, cuda_9.2.148.1_mac.dmg

Check after the installation:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Check that the driver is loaded

$ kextstat | grep -i cuda.
  149    0 0xffffff7f838d3000 0x2000     0x2000     com.nvidia.CUDA (1.1.0) E13478CB-B251-3C0A-86E9-A6B56F528FE8 <4 1>

Test whether CUDA runs correctly:

$ cd /usr/local/cuda/samples
$ sudo make -C 1_Utilities/deviceQuery
$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147155968 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1254 MHz (1.25 GHz)
  Memory Clock rate:                             2700 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS

If the last line shows Result = PASS, CUDA is working
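When the check has to be repeated after every driver or Xcode change, the two summary lines at the end of the deviceQuery output can be parsed instead of read by eye. A minimal sketch, assuming the output keeps the format shown above; `parse_device_query_summary` is a hypothetical helper.

```python
def parse_device_query_summary(output):
    """Parse the closing lines of deviceQuery output into a dict.

    Reads the 'deviceQuery, key = value, ...' line and the 'Result = ...' line.
    """
    summary = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("deviceQuery,"):
            for field in line.split(",")[1:]:
                key, _, value = field.partition("=")
                summary[key.strip()] = value.strip()
        elif line.startswith("Result ="):
            summary["Result"] = line.split("=", 1)[1].strip()
    return summary


sample = """deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS"""

info = parse_device_query_summary(sample)
print(info["Result"], info["CUDA Driver Version"], info["CUDA Runtime Version"])
```

The sample text is copied from the run above; in practice the full output of the deviceQuery binary would be passed in.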

If the following error appears

The version ('9.1') of the host compiler ('Apple clang') is not supported

the Xcode version is too new and must be downgraded

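Step 3: install cuDNN

cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. cuDNN is not strictly required for training models on a GPU, but it is normally used.

cuDNN

https://developer.nvidia.com/...
Download the latest cuDNN v7.2.1 for CUDA 9.2
cudnn-9.2-osx-x64-v7.2.1.38.tgz

After downloading, unpack the archive and merge its contents into the CUDA directory /usr/local/cuda/: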

$ tar -xzvf cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
$ rm -rf cuda
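To confirm which cuDNN version ended up in /usr/local/cuda/include, the version macros in cudnn.h can be read back. This sketch assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`, as cuDNN 7 headers do; the `cudnn_version` helper and the sample excerpt are hypothetical.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hypothetical excerpt standing in for /usr/local/cuda/include/cudnn.h
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 2\n#define CUDNN_PATCHLEVEL 1\n"
print(cudnn_version(sample))
```

On a real installation the text would come from `open("/usr/local/cuda/include/cudnn.h").read()`.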

Step 4: install CUDA-Z

Used to inspect how CUDA is running

$ brew cask install cuda-z

You can then launch CUDA-Z from Applications to inspect how CUDA is running

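Building

If you already have a compiled build, skip this chapter and go straight to the "Installation" part

Below, the TensorFlow GPU version is compiled from source

CUDA preparation

See the earlier sections

Build environment preparation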

Python

$ python3 --version
Python 3.6.5

Do not use Python 3.7.0, otherwise the build will run into problems

Python dependencies

$ pip3 install six numpy wheel

Coreutils, llvm, OpenMP

$ brew install coreutils llvm cliutils/apple/libomp

Bazel

Note that this must be version 0.14.0; both newer and older versions can make the build fail. Download version 0.14.0 from the bazel releases page

$ curl -LO https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
$ chmod +x bazel-0.14.0-installer-darwin-x86_64.sh
$ ./bazel-0.14.0-installer-darwin-x86_64.sh
$ bazel version
Build label: 0.14.0

A version that is too old may fail to pick up the environment variables, which leads to Library not loaded

Check the NVIDIA development environment

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Check the clang version

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
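The nvcc and clang checks above can also be scripted, for example to fail early before a 1.5-hour build. A minimal sketch, assuming the two tools print their versions in the format shown in this post; `nvcc_release` and `apple_llvm_version` are hypothetical helpers.

```python
import re


def nvcc_release(nvcc_output):
    """Return the CUDA release (for example '9.2') from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def apple_llvm_version(cc_output):
    """Return the version after 'Apple LLVM version' in `cc -v` output, or None."""
    match = re.search(r"Apple LLVM version (\d+(?:\.\d+)*)", cc_output)
    return match.group(1) if match else None


nvcc_sample = "Cuda compilation tools, release 9.2, V9.2.148"
cc_sample = "Apple LLVM version 8.0.0 (clang-800.0.42.1)"

print(nvcc_release(nvcc_sample), apple_llvm_version(cc_sample))
```

In a real script the two strings would come from `subprocess.run(["nvcc", "-V"], ...)` and `subprocess.run(["cc", "-v"], ...)`; note that `cc -v` writes to standard error.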

Source preparation

Fetch the TensorFlow release 1.8 branch and patch it so that it is compatible with macOS

The already patched source can be downloaded directly here

$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/tensorflow-macos-gpu-r1.8-src.tar.gz

or apply the patch by hand

$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/tensorflow-macos-gpu-r1.8.patch
$ git apply tensorflow-macos-gpu-r1.8.patch
$ curl -o third_party/nccl/nccl.h https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/nccl.h

Build

Configure

$ which python3
/usr/local/bin/python3

$ ./configure

Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: /usr/local/bin/python3

Found possible Python library paths:
  /usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2

Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0,3.5,5.0,5.2,6.0,6.1

Do you want to use clang as CUDA compiler? [y/N]:n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
Configuration finished

Be sure to enter the correct versions

/usr/local/bin/python3

CUDA 9.2

cuDNN 7.2

compute capability 3.0,3.5,5.0,5.2,6.0,6.1: be sure to look up the versions your graphics card supports; several can be entered
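Before pasting the compute capability list into ./configure, it can be worth checking that it is well formed and that it contains the value of your own card (5.0 for the GTX 750 Ti here). The helpers below are a hypothetical sketch, not part of TensorFlow's configure script.

```python
def parse_compute_capabilities(text):
    """Turn '3.0,3.5,5.0' into [(3, 0), (3, 5), (5, 0)], rejecting malformed entries."""
    result = []
    for item in text.split(","):
        major, dot, minor = item.strip().partition(".")
        if not (dot and major.isdigit() and minor.isdigit()):
            raise ValueError("not a compute capability: %r" % item)
        result.append((int(major), int(minor)))
    return result


def lists_capability(text, device_capability):
    """True when the device's (major, minor) pair appears in the list."""
    return device_capability in parse_compute_capabilities(text)


print(lists_capability("3.0,3.5,5.0,5.2,6.0,6.1", (5, 0)))
```

This only checks for an exact entry; it does not model how CUDA handles capabilities that are missing from the list.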

The steps above actually generate the build configuration file .tf_configure.bazelrc

Start the build

$ bazel clean --expunge
$ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

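During the build, downloads may fail because of network problems; retry a few times
If the bazel version is wrong, DYLD_LIBRARY_PATH may not be passed through, which leads to Library not loaded

Build notes

--config=opt presumably expands to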

build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true

-march=native means compiling with the optimized instructions that the current CPU supports

Check the instruction sets supported by the current CPU

$ sysctl machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

$ gcc -march=native -dM -E -x c++ /dev/null | egrep "AVX|SSE"

#define __AVX2__ 1
#define __AVX__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
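The long `machdep.cpu.features` line is easier to query once it is split into a set. A small sketch, assuming the `name: FLAG FLAG ...` layout shown above; `parse_cpu_features` is a hypothetical helper and the sample line is shortened from the real output.

```python
def parse_cpu_features(sysctl_line):
    """Split 'machdep.cpu.features: FPU VME ...' into a set of feature names."""
    _, _, features = sysctl_line.partition(":")
    return set(features.split())


# Shortened copy of the sysctl output shown above.
line = "machdep.cpu.features: FPU SSE SSE2 SSE3 SSSE3 FMA SSE4.1 SSE4.2 AVX1.0 F16C"

features = parse_cpu_features(line)
report = {name: name in features for name in ("SSE4.2", "AVX1.0", "AVX2", "FMA")}
print(report)
```

AVX2 comes out as absent for this sample because the features line shown above does not list it, even though the gcc check reports `__AVX2__`.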

Build error: dyld: Library not loaded: @rpath/libcudart.9.2.dylib

ERROR: /Users/c/Downloads/tensorflow-macos-gpu-r1.8/src/tensorflow/python/BUILD:1590:1: Executing genrule //tensorflow/python:string_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/host/genfiles/tensorflow/python/string_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.2.dylib
  Referenced from: /private/var/tmp/_bazel_c/ea0f1e868907c49391ddb6d2fb9d5630/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_string_ops_py_wrappers_cc
  Reason: image not found

This is caused by a bazel bug that prevents the environment variable DYLD_LIBRARY_PATH from being passed through

Solution: install the correct version of bazel
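To tell a bazel problem apart from a library that is simply not installed, it helps to check whether the named dylib can be found in any DYLD_LIBRARY_PATH directory at all. The sketch below uses a hypothetical `find_library` helper and demonstrates it on a throwaway directory, so it runs on any machine.

```python
import os
import tempfile


def find_library(filename, path_value):
    """Return the first existing path to `filename` in a colon-separated
    directory list, or None when no directory contains it."""
    for directory in path_value.split(":"):
        candidate = os.path.join(directory, filename)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with a throwaway directory standing in for /usr/local/cuda/lib.
with tempfile.TemporaryDirectory() as fake_lib_dir:
    open(os.path.join(fake_lib_dir, "libcudart.9.2.dylib"), "w").close()
    search_path = "/nonexistent:" + fake_lib_dir
    found = find_library("libcudart.9.2.dylib", search_path)
    missing = find_library("libcublas.9.2.dylib", search_path)

print(found is not None, missing)
```

On the build machine the call would be `find_library("libcudart.9.2.dylib", os.environ.get("DYLD_LIBRARY_PATH", ""))`. If that finds the file in your shell but the build still fails, the variable is probably not reaching the build step.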

Build error: PyString_AsStringAndSize

external/protobuf_archive/python/google/protobuf/pyext/descriptor_pool.cc:169:7: error: assigning to 'char *' from incompatible type 'const char *'
  if (PyString_AsStringAndSize(arg, &name, &name_size) < 0) {

This happens because protobuf_python has a bug under Python 3.7. Switch to Python 3.6 and rebuild
https://github.com/google/pro...

The build takes up to 1.5 hours, so be patient

Building the pip package

Recompile and replace _nccl_ops.so

$ gcc -march=native -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so bazel-out/darwin-py3-opt/bin/tensorflow/contrib/nccl/python/ops
$ rm _nccl_ops.o

Package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/Downloads/

Clean up

$ bazel clean --expunge

Installation

$ pip3 uninstall tensorflow
$ pip3 install ~/Downloads/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

It can also be installed directly over HTTP

$ pip3 install https://github.com/SixQuant/tensorflow-macos-gpu/releases/download/v1.8.0/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

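If you install the prebuilt wheel directly, make sure the related versions are the same as, or newer than, the ones used for the build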

cudadriver_396.148_macos.dmg

cuda_9.2.148_mac.dmg

cuda_9.2.148.1_mac.dmg

cudnn-9.2-osx-x64-v7.2.1.38.tgz
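The wheel file name used in the install commands already states which Python and platform it was built for, which is a quick way to see why pip might reject it under Python 3.7. A sketch, assuming the standard five-field wheel name without a build tag; `parse_wheel_filename` is a hypothetical helper.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its five name components (no build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected number of fields: %r" % filename)
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


info = parse_wheel_filename("tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl")
print(info["python_tag"], info["platform_tag"])
```

Here `cp36` means CPython 3.6 and `macosx_10_13_x86_64` means the wheel targets macOS 10.13 on x86_64.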

Verification

Verify that TensorFlow GPU works correctly

Check the environment variable

Check that Python code can read the correct DYLD_LIBRARY_PATH environment variable

$ nano tensorflow-gpu-01-env.py

#!/usr/bin/env python

import os

print(os.environ["DYLD_LIBRARY_PATH"])

$ python3 tensorflow-gpu-01-env.py
/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib

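Check that the GPU is enabled

If a TensorFlow operation has both CPU and GPU implementations, the GPU device is given priority when the operation is assigned to a device. For example, if matmul has both CPU and GPU kernels, then on a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul. To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.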

$ nano tensorflow-gpu-02-hello.py

#!/usr/bin/env python

import tensorflow as tf

config = tf.ConfigProto()
config.log_device_placement = True

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
with tf.Session(config=config) as sess:
    # Runs the op.
    print(sess.run(c))

$ python3 tensorflow-gpu-02-hello.py
2018-08-26 14:13:45.987276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 706.66MiB
2018-08-26 14:13:45.987303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:13:46.245132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
2018-08-26 14:13:46.253938: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254415: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254421: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
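For larger graphs the placement log gets long, and a few lines of Python can summarize which op landed on which device. This is a sketch that assumes the `name: (Type): /job:...` line format seen in the log above; `parse_placements` is a hypothetical helper and does not call TensorFlow.

```python
import re

# Matches lines such as "MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0"
PLACEMENT = re.compile(r"^(?P<op>\S+): \((?P<type>\w+)\): (?P<device>/job:\S+)$")


def parse_placements(log_text):
    """Map op name -> device for every placement line in the log text."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            placements[match.group("op")] = match.group("device")
    return placements


sample = """MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0"""

placements = parse_placements(sample)
on_gpu = all(device.endswith("device:GPU:0") for device in placements.values())
print(sorted(placements), on_gpu)
```

The timestamped duplicates are skipped on purpose, so each op is counted once.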

I commented out, directly in the source, some harmless log messages that look alarming, for example:
OS X does not support NUMA - returning NUMA node zero

Not found: TF GPU device with id 0 was not registered

Running something more complex

$ nano tensorflow-gpu-04-cnn-gpu.py

#!/usr/bin/env python

from __future__ import absolute_import, division, print_function
import os
import time
import numpy as np
import tflearn
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

from tensorflow.python.client import device_lib
def print_gpu_info():
    for device in device_lib.list_local_devices():
        print(device.name, 'memory_limit', str(round(device.memory_limit/1024/1024))+'M', 
            device.physical_device_desc)
    print('=======================')

print_gpu_info()


DATA_PATH = "/Volumes/Cloud/DataSet"

mnist = tflearn.datasets.mnist.read_data_sets(DATA_PATH+"/mnist", one_hot=True)

config = tf.ConfigProto()
config.log_device_placement = True
config.allow_soft_placement = True

config.gpu_options.allocator_type = 'BFC'
config.gpu_options.allow_growth = True
#config.gpu_options.per_process_gpu_memory_fraction = 0.3

# Building convolutional network
net = tflearn.input_data(shape=[None, 28, 28, 1], name='input') 
net = tflearn.conv_2d(net, 32, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.conv_2d(net, 64, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.fully_connected(net, 10, activation='softmax') 
net = tflearn.regression(net,
                         optimizer='adam',                  
                         learning_rate=0.01,
                         loss='categorical_crossentropy', 
                         name='target')

# Training
model = tflearn.DNN(net, tensorboard_verbose=3)

start_time = time.time()
model.fit(mnist.train.images.reshape([-1, 28, 28, 1]),
          mnist.train.labels.astype(np.int32),
          validation_set=(
              mnist.test.images.reshape([-1, 28, 28, 1]),
              mnist.test.labels.astype(np.int32)
          ),
          n_epoch=1,
          batch_size=128,
          shuffle=True,
          show_metric=True,
          run_id='cnn_mnist_tflearn')

duration = time.time() - start_time
print('Training Duration %.3f sec' % (duration))

$ python3 tensorflow-gpu-04-cnn-gpu.py
2018-08-26 14:11:00.463212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 258.06MiB
2018-08-26 14:11:00.463235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:00.717963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
/device:CPU:0 memory_limit 256M
/device:GPU:0 memory_limit 204M device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
=======================
Extracting /Volumes/Cloud/DataSet/mnist/train-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/train-labels-idx1-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-labels-idx1-ubyte.gz
2018-08-26 14:11:01.158727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.158843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-08-26 14:11:01.487530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.487630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
---------------------------------
Run id: cnn_mnist_tflearn
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 55000
Validation samples: 10000
--
Training Step: 430  | total loss: 0.16522 | time: 45.764s
| Adam | epoch: 001 | loss: 0.16522 - acc: 0.9660 | val_loss: 0.06837 - val_acc: 0.9780 -- iter: 55000/55000
--
Training Duration 45.898 sec

The speedup is significant:
CPU build without AVX2 FMA, time: 168.151s

CPU build with AVX2 FMA, time: 147.697s

GPU build with AVX2 FMA, time: 45.898s
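Expressed as ratios, the three timings above work out to roughly 3.7x (GPU against the plain CPU build) and 3.2x (GPU against the AVX2/FMA CPU build). The short calculation below only restates the measured numbers; the labels are mine.

```python
# Timings in seconds, copied from the measurements above.
timings = {
    "CPU without AVX2/FMA": 168.151,
    "CPU with AVX2/FMA": 147.697,
    "GPU with AVX2/FMA": 45.898,
}


def speedup(baseline_seconds, seconds):
    """How many times faster `seconds` is than `baseline_seconds`."""
    return baseline_seconds / seconds


gpu = timings["GPU with AVX2/FMA"]
for name, seconds in timings.items():
    print("%-22s %8.3fs  GPU is %.2fx faster" % (name, seconds, speedup(seconds, gpu)))
```

These are single runs of one epoch on a GTX 750 Ti, so the ratios are indicative rather than a benchmark.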

cuda-smi

cuda-smi stands in for nvidia-smi on the Mac

nvidia-smi is used to check GPU memory usage.

After downloading it, put it in the /usr/local/bin/ directory

$ sudo cp cuda-smi /usr/local/bin/
$ sudo chmod 755 /usr/local/bin/cuda-smi
$ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
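The single line that cuda-smi prints can be turned into numbers, for example to log free GPU memory during training. A minimal sketch, assuming the output format shown above; the regular expression and `parse_cuda_smi` are hypothetical and may need adjusting for other cuda-smi versions.

```python
import re

LINE = re.compile(
    r"^Device (?P<index>\d+) \[(?P<bus>[^\]]+)\]: (?P<name>.+) \(CC (?P<cc>[\d.]+)\): "
    r"(?P<free>[\d.]+) of (?P<total>[\d.]+) MB"
)


def parse_cuda_smi(line):
    """Return device index, name, compute capability and memory figures, or None."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return {
        "index": int(match.group("index")),
        "name": match.group("name"),
        "compute_capability": match.group("cc"),
        "free_mb": float(match.group("free")),
        "total_mb": float(match.group("total")),
    }


sample = "Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free"
info = parse_cuda_smi(sample)
print(info["name"], round(100 * info["free_mb"] / info["total_mb"], 3))
```

The recomputed percentage agrees with the 0.245% that cuda-smi itself reports.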

Problems

Error: _ncclAllReduce

Recompile _nccl_ops.so and copy it into place

$ gcc -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/
$ rm _nccl_ops.o

Library not loaded: @rpath/libcublas.9.2.dylib

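This happens because the DYLD_LIBRARY_PATH environment variable is lost inside Jupyter
More precisely, newer versions of macOS block arbitrary changes to insecure settings such as DYLD_LIBRARY_PATH unless SIP is turned off

Reproduce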

import os
os.environ['DYLD_LIBRARY_PATH']

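The code above fails in Jupyter: because of SIP the environment variable DYLD_LIBRARY_PATH is not available there, so the lookup raises KeyError

Solution: see the earlier "Environment variables" section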
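When diagnosing this inside a notebook, reading the variable with `os.environ.get` avoids the KeyError and makes the missing value explicit. A small sketch; `dyld_library_path_status` is a hypothetical helper and it only reports the state, it does not work around SIP.

```python
import os


def dyld_library_path_status(environ=os.environ):
    """Describe DYLD_LIBRARY_PATH without raising when it is absent."""
    value = environ.get("DYLD_LIBRARY_PATH")
    if not value:
        return "DYLD_LIBRARY_PATH is not set in this process"
    return "DYLD_LIBRARY_PATH=" + value


print(dyld_library_path_status())
```

Running it once in a terminal Python and once in a Jupyter cell shows whether the variable survived the launch.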

Segmentation Fault

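A segmentation fault means that the program accessed memory outside the address space the system gave it

Solution: double-check that the correct versions and build options were used, especially Xcode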

Not found: TF GPU device with id 0 was not registered

Just ignore this warning

GPU memory leak???

No idea how to fix this :(
