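1. Simple
• Simple installation and deployment: the installation files can be downloaded online, and going from bare metal to a running cloud platform takes about 30 minutes.
• Simple cloud platform setup: batch operations on cloud hosts (creation, deletion, and so on) are supported, with list views and slide-out detail panels.
• Simple, practical operation: a detailed user manual, ample help information, a good community, and a standard API.
• Friendly UI: a well-designed professional interface that delivers powerful functions through streamlined operations.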
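2. Strong
• Stable and efficient system architecture: a fully asynchronous back end, in-process microservices, a lock-free design, stateless services, and a consistent hashing ring keep the system efficient and stable. Achieved so far: a single management node manages more than ten thousand physical hosts and hundreds of thousands of cloud hosts; a cluster of management nodes sharing one database and one message bus can manage a hundred thousand physical hosts and millions of cloud hosts while handling tens of thousands of concurrent API calls.
• High-concurrency API support: a single ZStack management node can easily handle more than ten thousand concurrent API calls per second.
• Strict HA requirements: when a network or node fails, business cloud hosts automatically switch to other healthy nodes. Virtualizing the management node gives a single management node high availability, and the management node can be migrated dynamically on failure.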
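3. Scalable
• No limit on scale: a single management node can manage anywhere from one to more than ten thousand physical hosts and hundreds of thousands of cloud hosts.
• Everything delivered through APIs: ZStack provides a complete set of IaaS APIs, which users can use to build new cross-region availability zones, ...
• Resources allocated on demand: key resources such as cloud hosts and cloud storage can be scaled up or down as users require. ZStack not only supports cloud hosts ...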
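4. Smart
• Automated operations and maintenance: in a ZStack environment everything is managed through APIs. ZStack uses the Ansible library for fully automated deployment and ...
• Seamless online upgrade: a one-click seamless upgrade in 5 minutes; users only need to upgrade the management node. Compute nodes, storage nodes, and network nodes ...
• Intelligent UI: real-time resource calculation helps prevent user mistakes.
• Real-time global monitoring: track the current system resource consumption of the whole cloud platform in real time; through real-time monitoring and intelligent allocation, ...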
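2.1 Component overview
TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture lets users deploy computation to a variety of platforms (CPU, GPU, TPU) and devices (desktops, server clusters, mobile devices, edge devices, and so on). TensorFlow was originally developed by researchers and engineers on the Google Brain team. It provides strong support for machine learning and deep learning, and its flexible numerical core is widely used in many other scientific fields.
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
Jupyter is an interactive notebook that makes it easy to create and share literate-programming documents, with support for live code, mathematical equations, visualizations, and markdown. It is typically used for data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and similar work.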
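2.2 Cloud platform environment preparation
Physical server configuration | GPU model | Cloud host configuration | Cloud host OS | IP address | Hostname
Intel(R) i5-3470, DDR3 24G | NVIDIA Quadro P2000 | 8 vCPU, 16 GB | CentOS 7.4 | |
This deployment runs the ZStack cloud platform on an ordinary PC and uses the platform's GPU passthrough feature to pass one NVIDIA Quadro P2000 card through to a CentOS 7.4 virtual machine, on which the rest of the platform is built.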
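2.2.1 Create the cloud host
1. Choose how to add the cloud host. The platform supports creating a single cloud host or several at once; choose according to your needs.
2.2.2 Pass the GPU through
0x3 Start the deployment
3.1 Runtime environment preparation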
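Install pip:
# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
# python get-pip.py
# pip --version
pip 18.1 from /usr/lib/python2.7/site-packages/pip (python 2.7)
# python --version
Python 2.7.5
Install GCC and G++:
# yum install gcc gcc-c++
# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Install some required packages:
# yum -y install zlib*
# yum install openssl-devel -y
# yum install sqlite* -y
Upgrade the default CentOS Python from 2.7.5 to 3.6.5. Download the Python source package:
# wget -c https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
Extract the source package:
# tar -zvxf Python-3.6.5.tgz
Enter the source directory and configure the build:
# cd Python-3.6.5/
# ./configure --with-ssl
Compile and install:
# make && make install
Check where the newly installed python3 files are:
# ll /usr/local/bin/python*
Make 3.x the default python by moving the old binary aside and linking the new one in its place:
# mv /usr/bin/python /usr/bin/python.bak
# ln -s /usr/local/bin/python3 /usr/bin/python
Check where the 2.x files are:
# ll /usr/bin/python*
So that yum keeps working, the Python it is configured to use must still point to 2.x:
# vim /usr/bin/yum
# vim /usr/libexec/urlgrabber-ext-down
Change the interpreter line at the top of both files back to the old version (for example #!/usr/bin/python2.7).
Install the Python development headers and pip (on CentOS the development package is named python-devel):
# yum install python-devel python-pip -y
Disable the bundled Nouveau driver. First check whether Nouveau is in use:
# lsmod | grep nouveau
nouveau 1662531 0
mxm_wmi 13021 1 nouveau
wmi 19086 2 mxm_wmi,nouveau
video 24538 1 nouveau
i2c_algo_bit 13413 1 nouveau
drm_kms_helper 176920 2 qxl,nouveau
ttm 99555 2 qxl,nouveau
drm 397988 5 qxl,ttm,drm_kms_helper,nouveau
i2c_core 63151 5 drm,i2c_piix4,drm_kms_helper,i2c_algo_bit,nouveau
Add the following lines to the blacklist file, then save and exit with :wq:
# vim /usr/lib/modprobe.d/dist-blacklist.conf
# nouveau
blacklist nouveau
options nouveau modeset=0
Back up the boot image:
# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Rebuild the boot image:
# dracut /boot/initramfs-$(uname -r).img $(uname -r)
# reboot
After the reboot, verify that the driver is disabled (the command should print nothing):
# lsmod | grep nouveau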
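The Nouveau check before and after the reboot can also be scripted. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it works on `lsmod` text passed in as a string rather than calling the command itself. Unlike `grep`, it only reports modules whose own name is nouveau, not modules that merely list it as a dependency.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names listed in lsmod-style output."""
    modules = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        modules.add(fields[0])
    return modules


def nouveau_is_loaded(lsmod_text):
    """True if the nouveau kernel module itself appears as loaded."""
    return "nouveau" in loaded_modules(lsmod_text)


# Sample lines taken from the `lsmod | grep nouveau` output shown above.
before_reboot = """\
nouveau 1662531 0
mxm_wmi 13021 1 nouveau
wmi 19086 2 mxm_wmi,nouveau
video 24538 1 nouveau
"""

print(nouveau_is_loaded(before_reboot))  # True
print(nouveau_is_loaded(""))             # False
```

On a live system the text could come from `subprocess.run(["lsmod"], capture_output=True, text=True).stdout`; an empty result after the reboot means the blacklist took effect.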
3.2 Install CUDA
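Upgrade the kernel:
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# yum -y --enablerepo=elrepo-kernel install kernel-ml.x86_64 kernel-ml-devel.x86_64
List the installed kernels in their default boot order:
# awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (4.20.0-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux (3.10.0-862.el7.x86_64) 7 (Core)
CentOS Linux (0-rescue-c4581dac5b734c11a1881c8eb10d6b09) 7 (Core)
Edit /etc/default/grub and change GRUB_DEFAULT=saved to GRUB_DEFAULT=0, so that the first entry in the list above (the new 4.20 kernel) boots by default:
# vim /etc/default/grub
Run grub2-mkconfig to regenerate the boot configuration:
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
After the reboot, verify the kernel version:
# uname -r
4.20.0-1.el7.elrepo.x86_64
The CUDA Toolkit can be installed in two ways: from distribution-specific packages (RPM or Deb) or from the distribution-independent runfile.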
This guide installs from the runfile.
Installer download: https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
Select the installer that matches your operating system, copy its download link, and fetch it with wget -c:
# wget -c https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
# chmod +x cuda_10.0.130_410.48_linux
# ./cuda_10.0.130_410.48_linux
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /root ]:
Configure the CUDA environment variables by adding the following lines to /etc/profile:
# vim /etc/profile
# CUDA
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# source /etc/profile
Check the version:
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
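The export lines added to /etc/profile use the shell expansion `${VAR:+:${VAR}}`, which appends a colon and the old value only when the variable is already set and non-empty. The sketch below mirrors that rule in Python to make the behaviour explicit; `prepend_dir` is a hypothetical helper written for illustration, not part of any CUDA tooling.

```python
def prepend_dir(new_dir, current):
    """Mimic the shell expression NEW${VAR:+:${VAR}}.

    `current` is the existing value of the variable, or None if unset.
    The separator and old value are appended only when the variable is
    set and non-empty, so no stray trailing ':' is produced.
    """
    if current:
        return new_dir + ":" + current
    return new_dir


print(prepend_dir("/usr/local/cuda-10.0/bin", "/usr/bin:/bin"))
# /usr/local/cuda-10.0/bin:/usr/bin:/bin
print(prepend_dir("/usr/local/cuda-10.0/lib64", None))
# /usr/local/cuda-10.0/lib64
```

Avoiding the trailing colon matters because an empty entry in PATH or LD_LIBRARY_PATH is generally treated as the current directory, which is rarely what is intended for a root shell.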
Verify that CUDA works by building and running a sample:
# cd /root/NVIDIA_CUDA-10.0_Samples/1_Utilities/deviceQuery
# make
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
"/usr/local/cuda-10.0"/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
# cd ../../bin/x86_64/linux/release/
# ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P2000"
  CUDA Driver Version / Runtime Version 10.0 / 10.0
  CUDA Capability Major/Minor version number: 6.1
  Total amount of global memory: 5059 MBytes (5304745984 bytes)
  ( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores
  GPU Max Clock rate: 1481 MHz (1.48 GHz)
  Memory Clock rate: 3504 Mhz
  Memory Bus Width: 160-bit
  L2 Cache Size: 1310720 bytes
  Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 49152 bytes
  Total number of registers available per block: 65536
  Warp size: 32
  Maximum number of threads per multiprocessor: 2048
  Maximum number of threads per block: 1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch: 2147483647 bytes
  Texture alignment: 512 bytes
  Concurrent copy and kernel execution: Yes with 2 copy engine(s)
  Run time limit on kernels: No
  Integrated GPU sharing Host Memory: No
  Support host page-locked memory mapping: Yes
  Alignment requirement for Surfaces: Yes
  Device has ECC support: Disabled
  Device supports Unified Addressing (UVA): Yes
  Device supports Compute Preemption: Yes
  Supports Cooperative Kernel Launch: Yes
  Supports MultiDevice Co-op Kernel Launch: Yes
  Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 11
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
Result = PASS
If the output ends with Result = PASS and no errors were reported along the way, the test has passed.
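The pass/fail check on the deviceQuery output can be automated. The sketch below is an illustration only: `summarize_device_query` is a hypothetical helper, and the field patterns are assumptions based on the output shown above, so other CUDA versions may format these lines differently. It also confirms that the reported core count equals multiprocessors times cores per multiprocessor (8 × 128 = 1024 here).

```python
import re


def summarize_device_query(output):
    """Extract a few fields from deviceQuery text output."""
    cores = re.search(
        r"\(\s*(\d+)\)\s*Multiprocessors,\s*\(\s*(\d+)\)\s*CUDA Cores/MP:"
        r"\s*(\d+)\s*CUDA Cores",
        output,
    )
    devices = re.search(r"Detected (\d+) CUDA Capable device", output)
    summary = {
        "passed": "Result = PASS" in output,
        "devices": int(devices.group(1)) if devices else 0,
    }
    if cores:
        sm_count, cores_per_sm, total = (int(g) for g in cores.groups())
        summary["cuda_cores"] = total
        # The reported total should equal SM count times cores per SM.
        summary["cores_consistent"] = sm_count * cores_per_sm == total
    return summary


# Abbreviated sample built from the output shown above.
sample = """\
Detected 1 CUDA Capable device(s)
Device 0: "Quadro P2000"
  ( 8) Multiprocessors, (128) CUDA Cores/MP: 1024 CUDA Cores
Result = PASS
"""

print(summarize_device_query(sample))
# {'passed': True, 'devices': 1, 'cuda_cores': 1024, 'cores_consistent': True}
```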
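3.3 Install cuDNN
cuDNN, short for NVIDIA CUDA® Deep Neural Network library, is a GPU-accelerated library that NVIDIA designed specifically for the basic operations of deep neural networks. It provides highly optimized implementations of the standard routines used in deep neural networks.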
Download the installation packages: https://developer.nvidia.com/rdp/cudnn-download
Note: you must register for the NVIDIA Developer Program before you can download. Choose the version that matches your environment. Because the download requires authentication, it has to be fetched in a browser and then uploaded to the cloud host.
Install:
# rpm -ivh libcudnn7- libcudnn7-devel- libcudnn7-doc-
Preparing... ################################# [100%]
Updating / installing...
1:libcudnn7- ################################# [ 33%]
2:libcudnn7-devel- [ 67%]
3:libcudnn7-doc- [100%]
Verify cuDNN:
# cp -r /usr/src/cudnn_samples_v7/ $HOME
# cd $HOME/cudnn_samples_v7/mnistCUDNN
# make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
# ./mnistCUDNN
cudnnGetVersion() : 7402 , CUDNN_VERSION from cudnn.h : 7402 (7.4.2)
Host compiler version : GCC 4.8.5
There are 1 CUDA capable devices on your machine :
device 0 : sms 8 Capabilities 6.1, SmClock 1480.5 Mhz, MemSize (Mb) 5059, MemClock 3504.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036864 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.044032 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.053248 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.116544 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.181248 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.032896 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.036448 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.044000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.115488 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.180224 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
If the output ends with Test passed! and no errors were reported along the way, the test has passed.
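The mnistCUDNN output reports the library version as the integer 7402 alongside the readable form 7.4.2. A small sketch of that decoding follows, assuming the cuDNN 7.x encoding of major × 1000 + minor × 100 + patch; `decode_cudnn_version` is a hypothetical helper name.

```python
def decode_cudnn_version(version):
    """Split a cuDNN 7.x version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7402))  # (7, 4, 2)
```

Comparing the decoded tuple against the version expected by the installed TensorFlow build is a quick way to catch a mismatched cuDNN package before debugging import errors.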
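Install TensorFlow with pip:
# pip3 install --upgrade setuptools==30.1.0
# pip3 install tf-nightly-gpu
Verification: enter the following short program in an interactive Python shell:
# python
If the system prints the following, you are ready to start writing TensorFlow programs:
Hello, TensorFlow!
While the program runs, the nvidia-smi command shows the tasks the GPU is currently processing.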
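TensorBoard can display the TensorFlow graph, plot quantitative metrics about the graph's execution, and show additional data such as the images that pass through it. Installing TensorFlow with pip also installs TensorBoard automatically: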
# pip3 show tensorboard
Name: tensorboard
Version: 1.12.2
Summary: TensorBoard lets you watch Tensors Flow
Home-page: https://github.com/tensorflow/tensorboard
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /usr/lib/python2.7/site-packages
Requires: protobuf, numpy, futures, grpcio, wheel, markdown, werkzeug, six
# tensorboard --logdir /var/log/tensorboard.log
TensorBoard 1.13.0a20190107 at http://GPU-TF:6006 (Press CTRL+C to quit)
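Jupyter is an interactive notebook that makes it easy to create and share literate-programming documents, with support for live code, mathematical equations, visualizations, and markdown. It is typically used for data cleaning and transformation, numerical simulation, statistical modeling, machine learning, and similar work.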
# sudo pip3 install jupyter
# jupyter notebook --generate-config
Writing default config to: /root/.jupyter/jupyter_notebook_config.py
# python
Python 3.6.5 (default, Jan 15 2019, 02:51:51)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from notebook.auth import passwd;
>>> passwd()
Enter password:
Verify password:
# vim /root/.jupyter/jupyter_notebook_config.py
# jupyter notebook --allow-root --ip=''
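The string that `passwd()` prints is a salted hash, which is typically pasted into jupyter_notebook_config.py as the value of `c.NotebookApp.password`. The sketch below illustrates how such a string is built and checked, assuming the `algorithm:salt:digest` layout with SHA-1 used by notebook 5.x; `make_hash` and `check_hash` are hypothetical stand-ins written with the standard library, not the notebook package's own functions.

```python
import hashlib
import secrets


def make_hash(passphrase, algorithm="sha1"):
    """Build an 'algorithm:salt:digest' string (assumed notebook 5.x layout)."""
    salt = secrets.token_hex(6)  # 12 hexadecimal characters
    data = (passphrase + salt).encode("utf-8")
    digest = hashlib.new(algorithm, data).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(hashed, passphrase):
    """Return True if `passphrase` matches a string built by make_hash."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        # Not in the three-part form, so it cannot match.
        return False
    data = (passphrase + salt).encode("utf-8")
    candidate = hashlib.new(algorithm, data).hexdigest()
    return secrets.compare_digest(candidate, digest)


stored = make_hash("example-password")
print(check_hash(stored, "example-password"))  # True
print(check_hash(stored, "wrong-password"))    # False
```

Because the salt is random, two runs of `make_hash` on the same password give different strings; only the stored string needs to go in the config file, never the password itself.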
Run the TensorFlow demo
Create a new HelloWorld example in Jupyter with the following code:
import tensorflow as tf
# Simple hello world using TensorFlow
# Create a Constant op
# The op is added as a node to the default graph.
# The value returned by the constructor represents the output
# of the Constant op.
hello = tf.constant('Hello, TensorFlow!')
# Start tf session
sess = tf.Session()
# Run the op