Abstract: This series introduces how to run Kubeflow on Alibaba Cloud Container Service. This article covers how to use `TensorFlow Serving` to load a trained model and serve predictions from it.
TensorFlow Serving is a flexible, high-performance machine learning model serving system open-sourced by Google that simplifies and accelerates the path from trained model to production application. Besides natively supporting TensorFlow models, it can be extended to serve other types of machine learning models.
The previous articles covered single-node and distributed model training and exporting the trained model to distributed storage. This article shows how that model is published to a TensorFlow Serving server, and how a gRPC client submits requests and receives prediction results from the server.
In the previous article we exported the trained model to NAS, so let's first inspect the export. The folder under `serving` gives the model name, `mnist`; the level below `mnist` is the model version.
```
mkdir -p /nfs
mount -t nfs -o vers=4.0 0fc844b526-rqx39.cn-hangzhou.nas.aliyuncs.com:/ /nfs
cd /nfs
tree serving
serving
└── mnist
    └── 1
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index
```
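This `serving/<model name>/<version>` layout is the directory convention TensorFlow Serving expects, and it is what the TF 1.x `SavedModelBuilder` API writes out. For reference, here is a minimal, self-contained sketch of such an export; the toy softmax graph is a stand-in for the real model from the previous article's training script, and the `images`/`scores` tensor keys are assumptions (though `scores` matches the output key seen in the prediction response later).

```python
import tensorflow as tf

# Toy graph standing in for the trained MNIST model.
images = tf.placeholder(tf.float32, [None, 784], name="images")
weights = tf.Variable(tf.zeros([784, 10]))
scores = tf.nn.softmax(tf.matmul(images, weights), name="scores")

# <model name>/<version> layout expected by TensorFlow Serving.
export_path = "serving/mnist/1"
builder = tf.saved_model.builder.SavedModelBuilder(export_path)

# Declare the serving signature: which tensors are inputs and outputs.
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"images": images}, outputs={"scores": scores})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature})
    # Writes saved_model.pb plus the variables/ directory shown above.
    builder.save()
```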
In the model-export step, the corresponding PV `tf-serving-pv` and PVC `tf-serving-pvc` were already created on this NAS storage, and `TensorFlow Serving` will load the model from the PVC.
```
kubectl get pv tf-serving-pv
NAME            CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                     STORAGECLASS   REASON    AGE
tf-serving-pv   10Gi       RWX            Retain           Bound     default/tf-serving-pvc   nas                      2d

kubectl get pvc tf-serving-pvc
NAME             STATUS    VOLUME          CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tf-serving-pvc   Bound     tf-serving-pv   10Gi       RWX            nas            2d
```
### Deploying TensorFlow Serving

Use ksonnet to generate and deploy the TensorFlow Serving component:
```
# Set the namespace used for TensorFlow Serving
export NAMESPACE=default
# Specify the Kubeflow version
VERSION=v0.2.0-rc.0
APP_NAME=tf-serving
# Initialize the Kubeflow application and point its environment at the default namespace
ks init ${APP_NAME} --api-spec=version:v1.9.3
cd ${APP_NAME}
ks env add ack
ks env set ack --namespace ${NAMESPACE}
# Install the Kubeflow packages
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/tf-serving@${VERSION}
# Set the environment variables required by TensorFlow Serving
MODEL_COMPONENT=mnist-serving
MODEL_NAME=mnist
MODEL_PATH=/mnt/mnist
MODEL_STORAGE_TYPE=nfs
SERVING_PVC_NAME=tf-serving-pvc
MODEL_SERVER_IMAGE=registry.aliyuncs.com/kubeflow-images-public/tensorflow-serving-1.7:v20180604-0da89b8a
# Generate the TensorFlow Serving component from the template
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks param set ${MODEL_COMPONENT} modelStorageType ${MODEL_STORAGE_TYPE}
ks param set ${MODEL_COMPONENT} nfsPVC ${SERVING_PVC_NAME}
ks param set ${MODEL_COMPONENT} modelServerImage $MODEL_SERVER_IMAGE
# Configure tf-serving for the ack environment
ks param set ${MODEL_COMPONENT} cloud ack
# If the service needs to be exposed to external systems
ks param set ${MODEL_COMPONENT} serviceType LoadBalancer
# If using GPUs, use the following configuration instead
NUMGPUS=1
ks param set ${MODEL_COMPONENT} numGpus ${NUMGPUS}
MODEL_GPU_SERVER_IMAGE=registry.aliyuncs.com/kubeflow-images-public/tensorflow-serving-1.6gpu:v20180604-0da89b8a
ks param set ${MODEL_COMPONENT} modelServerImage $MODEL_GPU_SERVER_IMAGE

ks apply ack -c mnist-serving
```
Once the deployment completes, you can check the running state of TensorFlow Serving with `kubectl get deploy`:
```
# kubectl get deploy -lapp=$MODEL_NAME
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
mnist-v1   1         1         1            1           4m
```
Check the TensorFlow Serving logs to confirm that the model has been loaded:
```
2018-06-19 06:50:19.185785: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-06-19 06:50:19.202907: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:161] Restoring SavedModel bundle.
2018-06-19 06:50:19.418625: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:196] Running LegacyInitOp on SavedModel bundle.
2018-06-19 06:50:19.425357: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: success. Took 550707 microseconds.
2018-06-19 06:50:19.430435: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: mnist version: 1}
```
And look up the externally exposed service IP and ports:
```
kubectl get svc -lapp=$MODEL_NAME
NAME      TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
mnist     LoadBalancer   172.19.4.241   xx.xx.xx.xx   9000:32697/TCP,8000:32166/TCP   7m
```
Here you can see that the gRPC service is exposed externally at IP xx.xx.xx.xx on port 9000.
### Accessing TensorFlow Serving
Start a gRPC client with `kubectl run`, then press Enter to get a shell inside the Pod:
```
kubectl run -i --tty mnist-client --image=registry.cn-hangzhou.aliyuncs.com/tensorflow-samples/tf-mnist-client-demo --restart=Never --command -- /bin/bash
If you don't see a command prompt, try pressing enter.
```
Run the client Python code:
```
# export TF_MNIST_IMAGE_PATH=1.png
# export TF_MODEL_SERVER_HOST=172.19.4.241
# python mnist_client.py
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
outputs {
  key: "scores"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 10
      }
    }
    float_val: 1.0
    float_val: 0.0
    float_val: 9.85347854001e-34
    float_val: 1.00954509814e-35
    float_val: 0.0
    float_val: 0.0
    float_val: 1.5053762612e-14
    float_val: 0.0
    float_val: 5.21842267799e-22
    float_val: 0.0
  }
}
............................
............................
............................
............................
.............@@.............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@............
.............@@@@...........
.............@@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
..............@@@...........
............................
............................
............................
............................
Your model says the above number is... 1!
```
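For reference, the gist of a client like the demo image's `mnist_client.py` can be sketched with the `tensorflow-serving-api` gRPC bindings. This is an illustrative sketch rather than the demo's actual source: the host and port come from the service above, the `mnist` model name matches `MODEL_NAME`, the `images`/`scores` keys follow the export convention assumed earlier, and the zero-filled array is a placeholder for a real 28x28 grayscale digit.

```python
import grpc
import numpy
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Connect to the gRPC port of the service; from outside the cluster,
# use the EXTERNAL-IP instead of the cluster IP.
channel = grpc.insecure_channel("172.19.4.241:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the prediction request against the "mnist" servable.
request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"

# Placeholder input: replace with a real 28x28 image flattened to 784 floats.
image = numpy.zeros((1, 784), dtype=numpy.float32)
request.inputs["images"].CopyFrom(
    tf.contrib.util.make_tensor_proto(image, shape=[1, 784]))

# The response carries the 10 class scores under the "scores" output key,
# as seen in the printed PredictResponse above.
result = stub.Predict(request, timeout=10.0)
scores = result.outputs["scores"].float_val
print("Your model says the above number is... %d!" % numpy.argmax(scores))
```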
The model we trained and exported can now be reached directly through a gRPC client, giving us online prediction. Together with the earlier articles, this completes the full path from deep learning model training and model export to deploying the model as a live service.
### Deleting TensorFlow Serving

When you are done, remove the component:
```
ks delete ack -c mnist-serving
```
This example showed how to deploy TensorFlow Serving through Kubeflow, load a model stored on Alibaba Cloud NAS, and serve predictions from it.
Deploying machine learning applications with Kubeflow is quite simple, but simplicity at the application layer alone is not enough; automated integration with the cloud infrastructure matters just as much, for example seamless use of GPU, NAS, OSS, and load balancers. This is where Alibaba Cloud Kubernetes Container Service can significantly reduce the effort data scientists need to deliver a model.