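老徐, Thursday, 26 July 2018
The TensorFlow team announced that GPU support for TensorFlow on macOS ended after version 1.2, so there is no GPU package to install directly and the only option is to build one from source.
Tensorflow 1.8 with CUDA on macOS High Sierra 10.13.6
Running TensorFlow on the CPU did not feel fast enough, and I wanted to try GPU acceleration. I happen to have a graphics card that supports CUDA.
This is important enough to say three times: the drivers and the build tools must be versions that match one another, otherwise the build will fail!
Versions: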
Segmentation Fault
Downloads required (some of the files are large, so start downloading them before you read on to save time):
https://developer.apple.com/d...
Xcode_8.2.1.xip
TensorFlow source code, 333 MB
$ git clone https://github.com/tensorflow/tensorflow -b r1.8
The currently installed Python is 3.7, so downgrade it:
$ brew unlink python
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ pip3 install --upgrade pip setuptools wheel
# $ brew switch python 3.6.5_1
Do not use Python 3.7.0, or the build will run into problems.
Once the build is finished you can switch back:
$ brew switch python 3.7.0
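Xcode must be downgraded to 8.2.1.
Download the package from the Apple developer site: https://developer.apple.com/d...
After extracting it, copy it to /Applications/Xcode.app, then point xcode-select at it: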
$ sudo xcode-select -s /Applications/Xcode.app
Confirm that the installation is correct:
$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Command Line Tools: cc is clang. This matters a great deal. With the wrong version the build may still succeed, but running anything slightly complex ends in a Segmentation Fault.
The CUDA libraries are not in a system directory, so an environment variable has to point to them. On a Mac, LD_LIBRARY_PATH has no effect; DYLD_LIBRARY_PATH is used instead.
Set the environment variables by editing ~/.bash_profile or ~/.zshrc:
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
export PATH=$CUDA_HOME/bin:$PATH
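To check that a new shell actually picked up these exports, a small script can compare DYLD_LIBRARY_PATH against the two directories named above. This is only a sketch: the helper name `missing_cuda_paths` is made up for illustration, and the fallback to /usr/local/cuda when CUDA_HOME is unset is an assumption taken from the exports in this post.

```python
import os


def missing_cuda_paths(environ):
    """Return the CUDA library directories that DYLD_LIBRARY_PATH does not list.

    `environ` is any mapping shaped like os.environ (hypothetical helper).
    """
    # Assumption: fall back to the install location used in this post.
    cuda_home = environ.get("CUDA_HOME", "/usr/local/cuda")
    wanted = [
        cuda_home + "/lib",
        cuda_home + "/extras/CUPTI/lib",
    ]
    listed = [p for p in environ.get("DYLD_LIBRARY_PATH", "").split(":") if p]
    return [p for p in wanted if p not in listed]


# An empty list means both directories are on DYLD_LIBRARY_PATH.
print(missing_cuda_paths(os.environ))
```

Passing the environment in as an argument keeps the check easy to try with a plain dict before relying on the real shell environment.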
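CUDA is the parallel computing framework NVIDIA provides for its own GPUs. In other words, CUDA runs only on NVIDIA GPUs, and it pays off only when the problem being solved can be computed with massive parallelism.
Look up your graphics card model here to see whether it is supported.
My graphics card is an NVIDIA GeForce GTX 750 Ti: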
GPU | Compute Capability |
---|---|
GeForce GTX 750 Ti | 5.0 |
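If another version of CUDA is installed and needs to be removed, run:
$ sudo /usr/local/bin/uninstall_cuda_drv.pl
$ sudo /usr/local/cuda/bin/uninstall_cuda_9.1.pl
$ sudo rm -rf /Developer/NVIDIA/CUDA-9.1/
$ sudo rm -rf /Library/Frameworks/CUDA.framework
$ sudo rm -rf /usr/local/cuda/
To be on the safe side, reboot afterwards.
First, note that the CUDA Driver and the GPU Driver must be matching versions (the CUDA Driver release has to support the installed GPU Driver), otherwise CUDA cannot find the graphics card.
GPU Driver: the graphics card driver
My macOS is 10.13.6, and the latest driver for it, 387.10.10.10.40.105, is already installed.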
https://www.nvidia.com/downlo...
Version: 387.10.10.10.40.105
Release Date: 2018.7.10
Operating System: macOS High Sierra 10.13.6
CUDA Toolkit: 9.1
CUDA Driver
cudadriver_396.148_macos.dmg
New Release 396.148
CUDA driver update to support CUDA Toolkit 9.2, macOS 10.13.6 and NVIDIA display driver 387.10.10.10.40.105
Recommended CUDA version(s): CUDA 9.2
Supported macOS 10.13
CUDA Toolkit
After installation, check:
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
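Since everything in this guide depends on matching versions, it can help to pull the release number out of the `nvcc -V` text with a script rather than by eye. The following is a minimal sketch; the function name `nvcc_release` is hypothetical, and the regular expression assumes the "release X.Y" wording shown in the output above.

```python
import re


def nvcc_release(text):
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Last line of the output shown above.
sample = "Cuda compilation tools, release 9.2, V9.2.148"
print(nvcc_release(sample))  # (9, 2)
```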
Confirm that the driver is loaded:
$ kextstat | grep -i cuda
149 0 0xffffff7f838d3000 0x2000 0x2000 com.nvidia.CUDA (1.1.0) E13478CB-B251-3C0A-86E9-A6B56F528FE8 <4 1>
Test whether CUDA runs correctly:
$ cd /usr/local/cuda/samples
$ sudo make -C 1_Utilities/deviceQuery
$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
CUDA Driver Version / Runtime Version 9.2 / 9.2
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147155968 bytes)
( 5) Multiprocessors, (128) CUDA Cores/MP: 640 CUDA Cores
GPU Max Clock rate: 1254 MHz (1.25 GHz)
Memory Clock rate: 2700 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS
If the last line reads Result = PASS, CUDA is working correctly.
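The two things worth reading off the deviceQuery output are the final result and the driver/runtime version pair. As a rough sketch (the helper name `device_query_summary` is invented here, and the patterns assume the exact wording of the output above), both can be extracted like this:

```python
import re


def device_query_summary(text):
    """Return (passed, (driver_version, runtime_version)) from deviceQuery output."""
    passed = "Result = PASS" in text
    match = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\S+) / (\S+)", text
    )
    versions = (match.group(1), match.group(2)) if match else None
    return passed, versions


# Two lines copied from the output above.
sample = (
    "CUDA Driver Version / Runtime Version 9.2 / 9.2\n"
    "Result = PASS\n"
)
print(device_query_summary(sample))  # (True, ('9.2', '9.2'))
```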
If the following error appears:
The version ('9.1') of the host compiler ('Apple clang') is not supported
the Xcode version is too new and Xcode must be downgraded.
cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. It is not strictly required for training models on the GPU, but it is normally used.
cuDNN
Once it is downloaded, extract it and merge the files into the CUDA directory /usr/local/cuda/:
$ tar -xzvf cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
$ rm -rf cuda
CUDA-Z is a tool for inspecting how CUDA is running.
$ brew cask install cuda-z
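You can then launch CUDA-Z from Applications to check on CUDA.
If you already have a prebuilt package, skip this chapter and go straight to the "Install" part.
The following builds the TensorFlow GPU version from source.
See the earlier sections.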
Python
$ python3 --version
Python 3.6.5
Do not use Python 3.7.0, or the build will run into problems.
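A guard at the top of a build script can catch the wrong interpreter before an hour-long build starts. The sketch below treats only the 3.6 series as acceptable; that is an assumption drawn from this post (3.6.5 worked, 3.7.0 failed), not a statement about what TensorFlow 1.8 officially supports, and `is_supported_python` is a hypothetical name.

```python
import sys


def is_supported_python(version_info):
    """True when the interpreter is in the 3.6 series used for this build."""
    # Assumption: only 3.6.x is accepted, because 3.7.0 broke the build here.
    return tuple(version_info[:2]) == (3, 6)


if not is_supported_python(sys.version_info):
    print("Warning: this post built TensorFlow 1.8 with Python 3.6.5")
```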
Python dependencies
$ pip3 install six numpy wheel
Coreutils, llvm, OpenMP
$ brew install coreutils llvm cliutils/apple/libomp
Bazel
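Note that this must be version 0.14.0; a newer or an older release can make the build fail. Download version 0.14.0 from the Bazel releases page.
$ curl -LO https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
$ chmod +x bazel-0.14.0-installer-darwin-x86_64.sh
$ ./bazel-0.14.0-installer-darwin-x86_64.sh
$ bazel version
Build label: 0.14.0
A version that is too old may fail to pass the environment variables through, which leads to "Library not loaded".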
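Because the build is this sensitive to the Bazel release, it is worth comparing the reported label with 0.14.0 before starting. A minimal sketch, assuming the "Build label:" line format shown above (the function name `bazel_label` is made up for illustration):

```python
def bazel_label(text):
    """Return the value of the 'Build label:' line from `bazel version` output."""
    for line in text.splitlines():
        if line.startswith("Build label:"):
            return line.split(":", 1)[1].strip()
    return None


sample = "Build label: 0.14.0\n"
print(bazel_label(sample) == "0.14.0")  # True
```

In practice the text would come from running `bazel version` and capturing its standard output.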
Check the NVIDIA development environment
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
Check the clang version
$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Fetch the TensorFlow source from the release 1.8 branch and patch it so that it is compatible with macOS.
You can download the already patched source directly:
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/tensorflow-macos-gpu-r1.8-src.tar.gz
Or patch it by hand:
$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/tensorflow-macos-gpu-r1.8.patch
$ git apply tensorflow-macos-gpu-r1.8.patch
$ curl -o third_party/nccl/nccl.h https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/nccl.h
Configure
$ which python3
/usr/local/bin/python3
$ ./configure
Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: /usr/local/bin/python3
Found possible Python library paths:
/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0,3.5,5.0,5.2,6.0,6.1
Do you want to use clang as CUDA compiler? [y/N]:n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
Be sure to enter the correct versions:
- /usr/local/bin/python3
- CUDA 9.2
- cuDNN 7.2
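- compute capability 3.0,3.5,5.0,5.2,6.0,6.1 (be sure to look up the value your graphics card supports; several values can be entered)
What the configure step actually does is generate the build configuration file .tf_configure.bazelrc.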
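Before committing to a long build, the compute capability list typed into ./configure can be checked against the card's own value from the table earlier (5.0 for the GTX 750 Ti). This is a sketch with hypothetical helper names, and it only tests for an exact match in the list; it makes no claim about how TensorFlow or CUDA treat capabilities that are not listed.

```python
def parse_capabilities(answer):
    """Turn a configure answer such as '3.5,5.2' into [(3, 5), (5, 2)]."""
    return [
        tuple(int(part) for part in item.split("."))
        for item in answer.split(",")
        if item.strip()
    ]


def covers(answer, device_cc):
    """True when device_cc, e.g. (5, 0), is listed explicitly in the answer."""
    return device_cc in parse_capabilities(answer)


print(covers("3.0,3.5,5.0,5.2,6.0,6.1", (5, 0)))  # True
print(covers("3.5,5.2", (5, 0)))                  # False: the default omits 5.0
```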
Start the build
$ bazel clean --expunge
$ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
Downloads may fail during the build because of network problems; just retry a few times. If the bazel version is wrong, DYLD_LIBRARY_PATH may not be passed through, which results in "Library not loaded".
--config=opt should mean:
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
-march=native means compiling with the optimized instructions that the current CPU supports.
Check which instruction sets the current CPU supports:
$ sysctl machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ gcc -march=native -dM -E -x c++ /dev/null | egrep "AVX|SSE"
#define __AVX2__ 1
#define __AVX__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
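The sysctl line is easier to query once it is split into a set of flags. A minimal sketch, assuming the "name: FLAG FLAG ..." layout shown above; `cpu_flags` is a hypothetical helper and the sample is a shortened copy of that output.

```python
def cpu_flags(sysctl_line):
    """Split 'machdep.cpu.features: A B C' into the set {'A', 'B', 'C'}."""
    _, _, flags = sysctl_line.partition(":")
    return set(flags.split())


# Shortened copy of the sysctl output above.
sample = "machdep.cpu.features: FPU SSE SSE2 SSSE3 FMA SSE4.1 SSE4.2 AVX1.0 F16C"
flags = cpu_flags(sample)
print("FMA" in flags, "AVX1.0" in flags)  # True True
```

Note that in the output above AVX2 shows up only in the compiler's macro list, not in machdep.cpu.features, so a check based on this one sysctl key alone would not see it.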
ERROR: /Users/c/Downloads/tensorflow-macos-gpu-r1.8/src/tensorflow/python/BUILD:1590:1: Executing genrule //tensorflow/python:string_ops_pygenrule failed (Aborted): bash failed: error executing command
/bin/bash bazel-out/host/genfiles/tensorflow/python/string_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.2.dylib
Referenced from: /private/var/tmp/_bazel_c/ea0f1e868907c49391ddb6d2fb9d5630/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_string_ops_py_wrappers_cc
Reason: image not found
This is caused by a bazel bug that prevents the DYLD_LIBRARY_PATH environment variable from being passed through.
Fix: install the correct version of bazel.
external/protobuf_archive/python/google/protobuf/pyext/descriptor_pool.cc:169:7: error: assigning to 'char *' from incompatible type 'const char *'
if (PyString_AsStringAndSize(arg, &name, &name_size) < 0) {
This is because protobuf_python has a bug with Python 3.7. Switch to Python 3.6 and rebuild.
The build takes as long as 1.5 hours, so please be patient.
Recompile and replace _nccl_ops.so
$ gcc -march=native -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so bazel-out/darwin-py3-opt/bin/tensorflow/contrib/nccl/python/ops
$ rm _nccl_ops.o
Package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/Downloads/
Clean up
$ bazel clean --expunge
$ pip3 uninstall tensorflow
$ pip3 install ~/Downloads/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
You can also install directly over HTTP:
$ pip3 install https://github.com/SixQuant/tensorflow-macos-gpu/releases/download/v1.8.0/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
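If you install the prebuilt package directly, make sure the related components are the same versions as the ones used for the build, or newer: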
- cudadriver_396.148_macos.dmg
- cuda_9.2.148_mac.dmg
- cuda_9.2.148.1_mac.dmg
- cudnn-9.2-osx-x64-v7.2.1.38.tgz
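"The same or newer" is easy to get wrong when version strings are compared as text, since "9.10" sorts before "9.2" alphabetically. A small sketch that compares dotted versions numerically instead; the helper names are hypothetical, and it assumes purely numeric components as in the file names above.

```python
def version_tuple(version):
    """'9.2.148' -> (9, 2, 148); assumes only numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))


def at_least(installed, built_with):
    """True when `installed` is the same as or newer than `built_with`."""
    return version_tuple(installed) >= version_tuple(built_with)


print(at_least("9.2.148.1", "9.2.148"))  # True: the patched toolkit is newer
print(at_least("9.1.128", "9.2.148"))    # False: an older toolkit
```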
Confirm that TensorFlow GPU works correctly
Confirm that Python code can read the correct DYLD_LIBRARY_PATH environment variable:
$ nano tensorflow-gpu-01-env.py
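#!/usr/bin/env python
import os
print(os.environ["DYLD_LIBRARY_PATH"])
$ python3 tensorflow-gpu-01-env.py
/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
If a TensorFlow op has both a CPU and a GPU implementation, the GPU device takes priority when the op is assigned to a device. For example, if matmul has both CPU and GPU kernels, then on a system with the devices cpu:0 and gpu:0, gpu:0 is chosen to run matmul. To find out which device your ops and tensors are assigned to, create the session with the log_device_placement configuration option set to True.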
$ nano tensorflow-gpu-02-hello.py
#!/usr/bin/env python
import tensorflow as tf

config = tf.ConfigProto()
config.log_device_placement = True

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True.
with tf.Session(config=config) as sess:
    # Runs the op.
    print(sess.run(c))
$ python3 tensorflow-gpu-02-hello.py
2018-08-26 14:13:45.987276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 706.66MiB
2018-08-26 14:13:45.987303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:13:46.245132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
2018-08-26 14:13:46.253938: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254415: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254421: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]
Some useless log messages that look alarming I simply commented out in the source, for example: OS X does not support NUMA - returning NUMA node zero
Not found: TF GPU device with id 0 was not registered
$ nano tensorflow-gpu-04-cnn-gpu.py
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function

import os
import time

import numpy as np
import tflearn
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

from tensorflow.python.client import device_lib


def print_gpu_info():
    for device in device_lib.list_local_devices():
        print(device.name, 'memory_limit', str(round(device.memory_limit/1024/1024))+'M',
              device.physical_device_desc)
    print('=======================')


print_gpu_info()

DATA_PATH = "/Volumes/Cloud/DataSet"
mnist = tflearn.datasets.mnist.read_data_sets(DATA_PATH+"/mnist", one_hot=True)

config = tf.ConfigProto()
config.log_device_placement = True
config.allow_soft_placement = True
config.gpu_options.allocator_type = 'BFC'
config.gpu_options.allow_growth = True
#config.gpu_options.per_process_gpu_memory_fraction = 0.3

# Building convolutional network
net = tflearn.input_data(shape=[None, 28, 28, 1], name='input')
net = tflearn.conv_2d(net, 32, 5, weights_init='variance_scaling', activation='relu', regularizer="L2")
net = tflearn.conv_2d(net, 64, 5, weights_init='variance_scaling', activation='relu', regularizer="L2")
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net,
                         optimizer='adam',
                         learning_rate=0.01,
                         loss='categorical_crossentropy',
                         name='target')

# Training
model = tflearn.DNN(net, tensorboard_verbose=3)

start_time = time.time()
model.fit(mnist.train.images.reshape([-1, 28, 28, 1]),
          mnist.train.labels.astype(np.int32),
          validation_set=(
              mnist.test.images.reshape([-1, 28, 28, 1]),
              mnist.test.labels.astype(np.int32)
          ),
          n_epoch=1,
          batch_size=128,
          shuffle=True,
          show_metric=True,
          run_id='cnn_mnist_tflearn')
duration = time.time() - start_time
print('Training Duration %.3f sec' % (duration))
$ python3 tensorflow-gpu-04-cnn-gpu.py
2018-08-26 14:11:00.463212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 258.06MiB
2018-08-26 14:11:00.463235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:00.717963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
/device:CPU:0 memory_limit 256M
/device:GPU:0 memory_limit 204M device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
=======================
Extracting /Volumes/Cloud/DataSet/mnist/train-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/train-labels-idx1-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-labels-idx1-ubyte.gz
2018-08-26 14:11:01.158727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.158843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-08-26 14:11:01.487530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.487630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
---------------------------------
Run id: cnn_mnist_tflearn
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 55000
Validation samples: 10000
--
Training Step: 430 | total loss: 0.16522 | time: 45.764s
| Adam | epoch: 001 | loss: 0.16522 - acc: 0.9660 | val_loss: 0.06837 - val_acc: 0.9780 -- iter: 55000/55000
--
Training Duration 45.898 sec
The speedup is obvious:
CPU build without AVX2 FMA, time: 168.151s
CPU build with AVX2 FMA, time: 147.697s
GPU build with AVX2 FMA, time: 45.898s
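To put a number on "obvious", the three timings above can be turned into speedup ratios relative to the GPU run. The snippet below is plain arithmetic on the figures reported in this post; the `speedup` helper is only an illustrative name.

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run is, rounded to two decimals."""
    return round(baseline_seconds / new_seconds, 2)


gpu = 45.898  # GPU build, seconds
print(speedup(168.151, gpu))  # 3.66 (vs CPU build without AVX2 FMA)
print(speedup(147.697, gpu))  # 3.22 (vs CPU build with AVX2 FMA)
```

So for this one-epoch MNIST run the GPU build is a bit more than three times faster than either CPU build.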
cuda-smi is used on the Mac as a replacement for nvidia-smi.
nvidia-smi is the tool for checking GPU memory usage.
After downloading it, put it in the /usr/local/bin/ directory:
$ sudo scp cuda-smi /usr/local/bin/
$ sudo chmod 755 /usr/local/bin/cuda-smi
$ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
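If you want to watch free GPU memory from a script, the cuda-smi line can be parsed with one regular expression. This is a sketch that assumes the exact "X of Y MB" layout of the line shown above; `free_fraction` is a made-up helper name.

```python
import re


def free_fraction(line):
    """Return free/total GPU memory as a fraction, or None if the line differs."""
    match = re.search(r"([\d.]+) of ([\d.]+) MB", line)
    if match is None:
        return None
    return float(match.group(1)) / float(match.group(2))


sample = ("Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): "
          "5.0234 of 2047.7 MB (i.e. 0.245%) Free")
print("%.3f%%" % (100 * free_fraction(sample)))  # 0.245%
```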
Recompile a _nccl_ops.so and copy it over:
$ gcc -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/
$ rm _nccl_ops.o
This happens because the DYLD_LIBRARY_PATH environment variable is lost inside Jupyter. Put differently, newer versions of macOS forbid arbitrary changes to unsafe settings such as DYLD_LIBRARY_PATH unless you turn off SIP.
Reproduce
import os
os.environ['DYLD_LIBRARY_PATH']
The code above fails in Jupyter because, under SIP, the DYLD_LIBRARY_PATH environment variable cannot be modified.
Fix: see the earlier "environment variables" setup section.
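For diagnosing this from inside a notebook, a lookup that does not raise makes it clearer whether the variable was dropped. This is only a diagnostic sketch, not a fix: `cuda_lib_path` is a hypothetical helper, and it simply reports what the current process can see.

```python
import os


def cuda_lib_path(environ=os.environ):
    """Return DYLD_LIBRARY_PATH, or None when the process never received it."""
    return environ.get("DYLD_LIBRARY_PATH")


value = cuda_lib_path()
if value is None:
    print("DYLD_LIBRARY_PATH is not visible to this process")
else:
    print(value)
```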
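A segmentation fault means that the program accessed memory outside the memory space the system gave it.
Fix: double-check that the correct versions and build flags were used, Xcode in particular.
Just ignore this warning.
No idea how to fix this :(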