Machine Learning in Practice on 360's Private Cloud Container Service

Date: 2019-11-20

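Recently, the 360 private cloud container service team and the AI Research Institute team worked together on research and hands-on work to improve machine-learning performance in the cloud. The result is a range of deep-learning services for business teams, including face detection, corrupted-screen detection, pornography detection, pet detection, image stylization, text recognition, and smart image cropping.

This article covers the two main technologies behind that work: TensorFlow Serving, and the microservice gateway together with the container service.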

TensorFlow Serving

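Introduction

TensorFlow Serving is a flexible, high-performance serving system for machine-learning models, released and open-sourced in February 2016. It makes deploying new models easier while keeping the same server architecture and APIs. It integrates with TensorFlow models out of the box, and it can readily be extended to serve other types of models and data. TensorFlow Serving is designed for production environments, and it standardizes how models are defined and published (in effect, CI/CD for models).

Machine-learning and deep-learning services have a number of requirements when it comes to publishing models:

Model versioning
Serving multiple models in parallel (for experiments such as A/B tests)
High throughput and low latency for concurrently served models on hardware accelerators (GPU and TPU)
Dynamic model loading
External interfaces for models (RPC, RESTful, and so on)
Batch-job support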

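How It Works

TensorFlow Serving treats each model as a servable. It periodically scans the local file system, and loads and unloads models according to the state of the file system and the model version policy. This makes it easy to hot-deploy a trained model while TensorFlow Serving keeps running, simply by copying the exported model to the configured file path.
The workflow and principles are shown in the figure below:

RESTful API support was added starting with version 1.8.0; the figure above shows only the gRPC interface.
Both the gRPC API server and the RESTful API server are implemented in C++. Their main job is to expose the models' capabilities through an interface.
The RESTful API implementation: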

https://github.com/tensorflow/serving/blob/master/tensorflow_serving/model_servers/http_server.cc

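Beyond basic RPC-server functionality, the highlights are the following features:

Standardized model format
Multi-model management: going from one model to several concurrently served models introduces a few performance obstacles. These are addressed by:
1. Loading multiple models in isolated thread pools, to avoid latency spikes in other models;
2. Accelerating the initial loading of all models at server startup;
3. Batching across models to multiplex hardware accelerators (GPU/TPU).
Multi-version management:
1. Multiple versions of a model can be loaded at the same time, and clients can request a specific version.
2. Hot loading: when a new version of a model is published, it is loaded automatically.
3. The version-management policy is customizable. Two are implemented by default: Availability Preserving Policy and Resource Preserving Policy.
Loading models from multiple storage back ends:
1. Local storage, HDFS, and S3 are supported by default (although S3 has to be enabled at build time).
2. Support for other storage types can be added through plugins.
Batching of client requests: this feature also supports custom policies.
Flexible extensibility: you can implement your own plugins to customize and extend the server.

TensorFlow Serving is designed to be very flexible and extensible; custom plugins can add support for new functionality. For example, you can add a data-source plugin that watches cloud storage instead of local storage, add a version-policy plugin that controls how versions are switched, or even add support for non-TensorFlow models through a plugin. For details, see custom source and custom servable.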

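Model Management Policy

When a new version of a model is added, AspiredVersionsManager loads the new version and, by default, unloads the old one.
This behavior is now configurable; three types are currently supported:

Load all versions of the model
Load the latest few versions of the model
Load specific versions of the model

Model management and loading policies are controlled through a configuration file, in the following format: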

model_config_list: {
    config: {
        name: "mnist",
        base_path: "/tmp/monitored/mnist_model",
        model_platform: "tensorflow",
        model_version_policy: {
           all: {}
        }
    },
    config: {
        name: "inception",
        base_path: "/tmp/monitored/inception_model",
        model_platform: "tensorflow",
        model_version_policy: {
           latest: {
            num_versions: 2
           }
        }
    },
    config: {
        name: "mxnet",
        base_path: "/tmp/monitored/mxnet_model",
        model_platform: "tensorflow",
        model_version_policy: {
           specific: {
            versions: 1
           }
        }
    }
}
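
To make the three policies concrete, here is a minimal Python sketch of how each one might narrow a set of discovered version numbers. It only illustrates the semantics described in this section and is not TensorFlow Serving's actual implementation; `select_versions` and its parameters are hypothetical names introduced for this example.

```python
from typing import Iterable, List, Optional


def select_versions(available: Iterable[int],
                    policy: str,
                    num_versions: int = 1,
                    versions: Optional[Iterable[int]] = None) -> List[int]:
    """Return the version numbers a policy would keep loaded (illustration only)."""
    found = sorted(set(available))
    if policy == "all":
        # all: {} keeps every version that was discovered.
        return found
    if policy == "latest":
        # latest: { num_versions: n } keeps the n highest version numbers.
        return found[-num_versions:] if num_versions > 0 else []
    if policy == "specific":
        # specific: { versions: ... } keeps only requested versions that exist.
        wanted = set(versions or [])
        return [v for v in found if v in wanted]
    raise ValueError(f"unknown policy: {policy!r}")


print(select_versions([123, 124, 125], "latest", num_versions=2))
```

With versions 123, 124, and 125 on disk, the `latest` policy with `num_versions=2` keeps 124 and 125, which mirrors the `inception` entry in the configuration.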

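Model Storage Support

Local storage is the default.

Loading models from HDFS is now supported:

--model_base_path=hdfs://xx.xx.xx.xx:zz/data/serving_model

S3 support has also been added, but you have to enable it yourself when building tensorflow_model_server. For details, see: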
https://github.com/tensorflow/serving/issues/669
https://github.com/tensorflow/serving/issues/615

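API Implementation

Starting with 1.8.0, two APIs are provided: gRPC and RESTful. Both are implemented in C++. Their responsibility is simple and clear: expose the functionality of ServerCore for external use. Only the protocol differs; the core logic is shared.
Do the interfaces support asynchronous calls? Currently, batching is supported. For the batching design, see: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/batching/README.md

Model Training and Export

Training follows the usual TensorFlow workflow, so the focus here is on exporting a trained model in the standard format that TF Serving expects. Models in TF Serving are standardized: a model must be exported according to the official specification before TF Serving can recognize and load it. As the official example mnist_saved_model.py shows, after loading the model you need to build a signature_def_map and then export with it to produce a format TF Serving can use.
Take care to distinguish the map definitions for prediction, classification, and regression. Later RESTful API calls must use the same signature_name and keys as defined here.
Once the export is complete, the directory looks like this: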

assets/
assets.extra/
variables/
    variables.data-?????-of-?????
    variables.index
saved_model.pb
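
Before handing an export to the server, it can be useful to sanity-check that the directory has the expected shape. The sketch below is a hypothetical helper (`looks_like_saved_model` is not part of TensorFlow). It assumes, for illustration, that `saved_model.pb` and `variables/variables.index` are the files to look for, and it exercises the check against placeholder files in a temporary directory.

```python
import os
import tempfile


def looks_like_saved_model(version_dir: str) -> bool:
    """Check for the files an exported version directory is expected to contain."""
    has_graph = os.path.isfile(os.path.join(version_dir, "saved_model.pb"))
    has_index = os.path.isfile(
        os.path.join(version_dir, "variables", "variables.index"))
    return has_graph and has_index


# Build a throwaway directory with empty placeholder files to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    empty_ok = looks_like_saved_model(tmp)  # nothing exported yet
    os.makedirs(os.path.join(tmp, "variables"))
    for rel in ("saved_model.pb", os.path.join("variables", "variables.index")):
        open(os.path.join(tmp, rel), "w").close()
    full_ok = looks_like_saved_model(tmp)  # placeholders now in place

print(empty_ok, full_ok)
```

The check only looks at file names, not at their contents, so it catches an incomplete copy but not a corrupt model.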

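Model Deployment

With a model in hand, the next step is to load it with TensorFlow Serving and serve it. Here we run TensorFlow Serving in a container; official prebuilt images are available.
First, a look at how tensorflow_model_server is used: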

/usr/local/bin/tensorflow_model_server --help
usage: tensorflow_model_server
Flags:
--port=8500 int32 Port to listen on for gRPC API
--rest_api_port=0 int32 Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
--rest_api_num_threads=128 int32 Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
--rest_api_timeout_in_ms=30000 int32 Timeout for HTTP/REST API calls.
--enable_batching=false bool enable batching
--batching_parameters_file="" string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file="" string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_name="default" string name of model (ignored if --model_config_file flag is set
--model_base_path="" string path to export (ignored if --model_config_file flag is set, otherwise required)
--file_system_poll_wait_seconds=1 int32 interval in seconds between each poll of the file system for new model version
--flush_filesystem_caches=true bool If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
--tensorflow_session_parallelism=0 int64 Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--platform_config_file="" string If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
--per_process_gpu_memory_fraction=0.000000 float Fraction that each process occupies of the GPU memory space the value is between 0.0 and 1.0 (with 0.0 as the default) If 1.0, the server will allocate all the memory when the server starts, If 0.0, Tensorflow will automatically select a value.
--saved_model_tags="serve" string Comma-separated set of tags corresponding to the meta graph def to load from SavedModel.
--grpc_channel_arguments="" string A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000)

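Images

Note that the images come in CPU and GPU versions; choose whichever you need. If the official images do not meet your needs, you can build your own from the official Dockerfile. The default functionality covers most requirements, though, and there is rarely a reason to add features to TensorFlow Serving itself, so the official prebuilt images are sufficient.

A pitfall with the official image
With the official default image, loading a model fails with a message that no usable GPU device can be found. The problem lies in how the LD_LIBRARY_PATH environment variable is set.
The official image's default is:
LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
It needs to be changed to:
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64/stubs:/usr/local/cuda/extras/CUPTI/lib64
The search path is searched in order, and the search stops at the first match. The libcuda.so found in the CUDA directories (/usr/local/cuda/lib64/stubs and /usr/local/cuda/extras/CUPTI/lib64) cannot drive the GPU device, so the /usr/local/nvidia directories have to come first.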
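
The reordering itself is mechanical. As a minimal sketch, the hypothetical helper `prioritize` below moves every entry under a given prefix to the front of a colon-separated search path while keeping the relative order of the rest. Applied to the image's default value, it reproduces the corrected value given above.

```python
def prioritize(path_value: str, preferred_prefix: str) -> str:
    """Move entries under preferred_prefix to the front, keeping relative order."""
    entries = [e for e in path_value.split(":") if e]
    first = [e for e in entries if e.startswith(preferred_prefix)]
    rest = [e for e in entries if not e.startswith(preferred_prefix)]
    return ":".join(first + rest)


# Default value taken from the official image, as quoted in the article.
default = ("/usr/local/cuda/lib64/stubs:/usr/local/cuda/extras/CUPTI/lib64:"
           "/usr/local/nvidia/lib:/usr/local/nvidia/lib64")
fixed = prioritize(default, "/usr/local/nvidia/")
print(fixed)
```

The resulting string can then be set as the container's LD_LIBRARY_PATH, for example in a derived Dockerfile or as an environment variable when the container starts.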

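Model Storage and Policy Configuration

The model storage location and version policy can be specified with the --model_config_file flag.
model_version_policy currently supports three options:

all: {} loads every version that is found;
latest: { num_versions: n } loads only the n most recent versions, and is the default;
specific: { versions: m } loads only the specified versions, and is typically used for testing.

A sample model_config_file: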

model_config_list: {
    config: {
        name: "mnist",
        base_path: "/tmp/monitored/mnist_model",
        model_platform: "tensorflow",
        model_version_policy: {
           all: {}
        }
    },
    config: {
        name: "inception",
        base_path: "/tmp/monitored/inception_model",
        model_platform: "tensorflow",
        model_version_policy: {
           latest: {
            num_versions: 2
           }
        }
    },
    config: {
        name: "mxnet",
        base_path: "/tmp/monitored/mxnet_model",
        model_platform: "tensorflow",
        model_version_policy: {
           specific: {
            versions: 1
           }
        }
    }
}

The configuration is largely self-explanatory.
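
When many models are deployed, a file like this is often easier to generate than to write by hand. The sketch below renders the same text layout from a list of dictionaries. `render_model_config` and the dictionary keys are hypothetical names for this example, and the policy is passed through as a ready-made string rather than validated.

```python
from typing import Dict, List


def render_model_config(models: List[Dict[str, str]]) -> str:
    """Render a model_config_list in the text layout used in the sample."""
    lines = ["model_config_list: {"]
    for m in models:
        lines += [
            "    config: {",
            f'        name: "{m["name"]}",',
            f'        base_path: "{m["base_path"]}",',
            '        model_platform: "tensorflow",',
            f'        model_version_policy: {{ {m["policy"]} }}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


config_text = render_model_config([
    {"name": "inception", "base_path": "/tmp/monitored/inception_model",
     "policy": "latest: { num_versions: 2 }"},
    {"name": "mxnet", "base_path": "/tmp/monitored/mxnet_model",
     "policy": "specific: { versions: 1 }"},
])
print(config_text)
```

The generated text can be written to a file and passed to the server with --model_config_file.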

Starting the Service

Here we start the server as a local container.
The model files are stored under /data/, with the following contents:

-rw-r--r-- 1 root root 235 Jul 10 17:00 model-config.json
drwxr-xr-x 1 root root  15 Jul 10 11:15 models

model-config.json is the model config file to use.
The models directory contains some of the official example models.

[root /mnt/zzzc]# ll models/
total 12
drwxr-xr-x 1 root root    1 Jul  6 12:15 bad_half_plus_two
-rw-r--r-- 1 root root   25 Jul  6 12:15 bad_model_config.txt
-rw-r--r-- 1 root root  135 Jul  6 12:15 batching_config.txt
-rw-r--r-- 1 root root 2205 Jul  6 12:15 BUILD
-rw-r--r-- 1 root root 1988 Jul  6 12:15 export_bad_half_plus_two.py
-rw-r--r-- 1 root root 3831 Jul  6 12:15 export_counter.py
-rw-r--r-- 1 root root 1863 Jul  6 12:15 export_half_plus_two.py
-rw-r--r-- 1 root root  268 Jul  6 12:15 good_model_config.txt
drwxr-xr-x 1 root root    1 Jul  6 12:15 half_plus_two
drwxr-xr-x 1 root root    2 Jul  6 12:15 half_plus_two_2_versions
drwxr-xr-x 1 root root    1 Jul 10 11:22 huaping
drwxr-xr-x 1 root root    1 Jul  9 18:31 porn
drwxr-xr-x 1 root root    1 Jul  6 12:15 saved_model_counter
drwxr-xr-x 1 root root    1 Jul  6 12:15 saved_model_half_plus_three
drwxr-xr-x 1 root root    2 Jul  6 12:15 saved_model_half_plus_two_2_versions

The contents of model-config.json are as follows (I gave the file a .json suffix here, but any suffix works):

# cat model-config.json
model_config_list: {
    config: {
        name: "half_plus_three",
        base_path: "/data/models/saved_model_half_plus_two_2_versions",
        model_platform: "tensorflow",
        model_version_policy: {
            all: {}
        }
    }
}

saved_model_half_plus_two_2_versions contains two versions of the model (123 and 124), so this also demonstrates loading multiple versions.

[root@dlgpu12 /mnt/zzzc/models/saved_model_half_plus_two_2_versions]# tree
.
├── 00000123
│   ├── assets
│   │   └── foo.txt
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── 00000124
    ├── assets
    │   └── foo.txt
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
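
A layout like this is what the periodic file-system scan works from: each numeric subdirectory of the base path is one version. The following minimal sketch shows such a scan. It is not TensorFlow Serving's code; `discover_versions` is a hypothetical name, and it assumes that only all-digit directory names count as versions.

```python
import os
import tempfile
from typing import List


def discover_versions(base_path: str) -> List[int]:
    """List numeric version subdirectories under a model's base path."""
    versions = []
    for entry in os.listdir(base_path):
        full = os.path.join(base_path, entry)
        # Version directories are named with digits only, e.g. 00000123.
        if os.path.isdir(full) and entry.isdigit():
            versions.append(int(entry))
    return sorted(versions)


# Recreate the two version directories from the tree, plus an unrelated one.
with tempfile.TemporaryDirectory() as base:
    for name in ("00000123", "00000124", "notes"):
        os.makedirs(os.path.join(base, name))
    found = discover_versions(base)

print(found)
```

The zero-padded names 00000123 and 00000124 are read as versions 123 and 124, and the non-numeric `notes` directory is skipped.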

We mount the /data directory into the container (in production, this can be done with a CephFS data volume):

nvidia-docker run -it --rm --entrypoint="/usr/local/bin/tensorflow_model_server" \
                        -v /data:/data \
                        tensorflow-serving:1.8.0-devel-gpu \
                        --model_config_file=/data/model-config.json

With that, the model is up and running under TensorFlow Serving.
Only model_config_file is specified here; the other options can be set as needed.

Client Access

Clients can connect in two ways: gRPC and RESTful.
For details, see:
https://www.tensorflow.org/serving/serving_inception
https://www.tensorflow.org/serving/api_rest
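
As a rough sketch of the RESTful path, the snippet below builds a predict request with only the Python standard library. It assumes the server was started with the REST port enabled (for example `--rest_api_port=8501`, which the docker command above does not set) and that the port is reachable from the client. The host, port, and input values are placeholders, and `predict_url` and `build_request` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, List, Optional


def predict_url(host: str, port: int, model: str,
                version: Optional[int] = None) -> str:
    """Build the REST predict URL, optionally pinned to one model version."""
    path = f"/v1/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    return f"http://{host}:{port}{path}:predict"


def build_request(url: str, instances: List[Any]) -> urllib.request.Request:
    """Wrap the inputs in the JSON body used by the predict endpoint."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")


url = predict_url("localhost", 8501, "half_plus_three", version=123)
request = build_request(url, [1.0, 2.0, 5.0])
print(request.full_url)

# To actually send it (requires a running server with the REST port enabled):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```

Leaving out the version segment lets the server choose which loaded version answers the request. The model name must match the `name` field in the model config file.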

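Dynamic Model Updates and Publishing

Dynamic model updates and publishing are driven by the model config file. By default, you only need to place the new version of the model in the configured directory; the system scans for it and loads it automatically.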
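
One practical detail is that the scanner may run while a copy is still in progress. The sketch below is one possible way to publish a version safely, not something the article prescribes: it copies the export under a temporary non-numeric name and then renames it into place. It assumes non-numeric directory names are not picked up as versions; `publish_version` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def publish_version(export_dir: str, base_path: str, version: int) -> str:
    """Copy an exported model into base_path/<version> via a temporary name."""
    target = os.path.join(base_path, str(version))
    if os.path.exists(target):
        raise FileExistsError(target)
    # Copy under a non-numeric staging name first, then rename, so a scanner
    # never sees a half-copied version directory (design choice of this sketch).
    staging = os.path.join(base_path, f".tmp-{version}")
    shutil.copytree(export_dir, staging)
    os.rename(staging, target)
    return target


# Exercise the helper with a placeholder export in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    export_dir = os.path.join(root, "export")
    base_path = os.path.join(root, "models", "demo")
    os.makedirs(export_dir)
    os.makedirs(base_path)
    open(os.path.join(export_dir, "saved_model.pb"), "w").close()
    published = publish_version(export_dir, base_path, 125)
    published_files = sorted(os.listdir(published))

print(published_files)
```

The rename is only atomic when the staging directory and the target are on the same file system, which is why the staging directory is created inside the base path.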

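Microservice Gateway (Kong) and the Container Service

Background

Some teams in the company want to deploy and deliver their services quickly as containers. For simple monolithic applications, the company's load balancer is enough. But when several small applications (such as face detection and image detection) need to be grouped into one larger application that is exposed externally, a gateway is required. With that in mind, we evaluated Kong, a currently popular microservice gateway.

What Is a Microservice Architecture?

Microservices are an architecture and an approach to building software. A former monolithic application is split into multiple small components that are independent of one another. Unlike the traditional monolithic approach, in which all components are built into a single architecture, every part of a microservice architecture is independent (different service modules can be written in different languages by different teams), and the parts cooperate to accomplish the same task. Each of these components or processes is a microservice. In short, microservices are smaller, faster, and stronger.

If that description is still not very intuitive, comparing the traditional monolithic architecture with the microservice architecture makes it clearer.

1. Monolithic architecture

In early web development (Java, for example), the whole program was typically packaged into a single WAR file and deployed directly to a server.

A monolithic architecture is easy to test and deploy, but it falls short of microservices in scalability, reliability, system iteration, multi-language support, and team collaboration.

2. Microservice architecture

To address these drawbacks of the monolithic architecture (which is not to say that it is bad; the right architecture depends on the business scenario), a monolithic application can be split into multiple small, independent components.
Each team can then implement its component with its own technology stack and iterate on it independently, without affecting the application as a whole.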