Serving BERT with bert-as-service

bert-as-service uses BERT as a sentence encoder and hosts it as a service over ZeroMQ, so two lines of code are enough to map sentences to fixed-length vector representations.

Prerequisites

Windows 10 + Python 3.5 + TensorFlow 1.2.1

Installation

  1. Install TensorFlow (see reference)
  2. Install bert-as-service

bert-as-service requires Python ≥ 3.5 and TensorFlow ≥ 1.10:

pip install bert-serving-server
pip install bert-serving-client
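
The version requirement above can be checked before starting the server. The following is a minimal sketch, not part of bert-as-service: `parse_version` and `meets_minimum` are hypothetical helper names, and the check only covers the lower bound stated above. Versions are compared numerically, because a plain string comparison would rank "1.2.1" above "1.10".

```python
import sys


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so that '1.2.1' counts as older than '1.10'."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    python_version = "{}.{}".format(*sys.version_info[:2])
    print("Python", python_version, "ok:", meets_minimum(python_version, "3.5"))
    try:
        import tensorflow as tf
        print("TensorFlow", tf.__version__, "ok:", meets_minimum(tf.__version__, "1.10"))
    except ImportError:
        print("TensorFlow is not installed")
```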
  3. Download a pretrained BERT model (this walkthrough uses BERT-Base, Chinese)

    BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
    BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
    BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
    BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
    BERT-Base, Multilingual Cased (New): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
    BERT-Base, Multilingual Cased (Old): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
    BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
  4. Start the bert-as-service server

bert-serving-start -model_dir /tmp/english_L-12_H-768_A-12/ -num_worker=2  # replace the model path with your own
usage: xxxx\Anaconda3\envs\py35\Scripts\bert-serving-start -model_dir D:\env\bert\chinese_L-12_H-768_A-12 -num_worker=2
                 ARG   VALUE
__________________________________________________
           ckpt_name = bert_model.ckpt
         config_name = bert_config.json
                cors = *
                 cpu = False
          device_map = []
       do_lower_case = True
  fixed_embed_length = False
                fp16 = False
 gpu_memory_fraction = 0.5
       graph_tmp_dir = None
    http_max_connect = 10
           http_port = None
        mask_cls_sep = False
      max_batch_size = 256
         max_seq_len = 25
           model_dir = D:\env\bert\chinese_L-12_H-768_A-12
no_position_embeddings = False
    no_special_token = False
          num_worker = 2
       pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556
       prefetch_size = 10
 priority_batch_size = 16
show_tokens_to_client = False
     tuned_model_dir = None
             verbose = False
                 xla = False

I:VENTILATOR:freeze, optimize and export graph, could take a while...
I:GRAPHOPT:model config: D:\env\bert\chinese_L-12_H-768_A-12\bert_config.json
I:GRAPHOPT:checkpoint: D:\env\bert\chinese_L-12_H-768_A-12\bert_model.ckpt
I:GRAPHOPT:build graph...
I:GRAPHOPT:load parameters from checkpoint...
I:GRAPHOPT:optimize...
I:GRAPHOPT:freeze...
I:GRAPHOPT:write graph to a tmp file: C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:VENTILATOR:bind all sockets
I:VENTILATOR:open 8 ventilator-worker sockets
I:VENTILATOR:start the sink
I:SINK:ready
I:VENTILATOR:get devices
W:VENTILATOR:no GPU available, fall back to CPU
I:VENTILATOR:device map:
                worker  0 -> cpu
                worker  1 -> cpu
I:WORKER-0:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-1:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-0:ready and listening!
I:WORKER-1:ready and listening!
I:VENTILATOR:all set, ready to serve request!
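
The server can also be started from a Python script instead of the command line. The sketch below assumes the `BertServer` class and `get_args_parser` helper shipped with the bert-serving-server package. `build_server_args` and `start_server` are hypothetical names introduced here for illustration, and only the flags that appear in this article are used.

```python
def build_server_args(model_dir, num_worker=2, http_port=None):
    """Build the same argument list that bert-serving-start takes on the command line."""
    args = ["-model_dir", model_dir, "-num_worker", str(num_worker)]
    if http_port is not None:
        # -http_port enables the HTTP status endpoint used for monitoring
        args += ["-http_port", str(http_port)]
    return args


def start_server(model_dir, **kwargs):
    """Parse the arguments and start the server (hypothetical wrapper)."""
    # Imported lazily so build_server_args works without the server package installed
    from bert_serving.server import BertServer
    from bert_serving.server.helper import get_args_parser

    args = get_args_parser().parse_args(build_server_args(model_dir, **kwargs))
    server = BertServer(args)
    server.start()
    return server


# Usage (model path taken from the log above; replace it with your own):
# start_server(r"D:\env\bert\chinese_L-12_H-768_A-12", num_worker=2)
```

On Windows the call to `start_server` most likely needs to sit under an `if __name__ == "__main__":` guard, since the server spawns worker processes.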
  5. Call the bert-as-service server from Python
from bert_serving.client import BertClient

bc = BertClient(ip="localhost", check_version=False, check_length=False)
# returns one fixed-length vector per input sentence
vec = bc.encode(['你好', '你好呀', '我很好'])
print(vec)

Output:

[[ 0.2894022  -0.13572647  0.07591158 ... -0.14091237  0.54630077
  -0.30118054]
 [ 0.4535432  -0.03180456  0.3459639  ... -0.3121457   0.42606848
  -0.50814617]
 [ 0.6313594  -0.22302179  0.16799903 ... -0.1614125   0.23098437
  -0.5840646 ]]
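
Because every sentence maps to a vector of the same length, two sentences can be compared directly. One common choice, assumed here rather than prescribed by bert-as-service, is cosine similarity. The sketch below is written in plain Python so that it works on lists as well as on the rows of the returned array; `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Usage with the vectors returned by the client (requires a running server):
# vec = bc.encode(['你好', '你好呀', '我很好'])
# print(cosine_similarity(vec[0], vec[1]))
```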

Highlights

  • 🔭 State-of-the-art: build on pretrained 12/24-layer BERT models released by Google AI, which is considered as a milestone in the NLP community.
  • 🐣 Easy-to-use: require only two lines of code to get sentence/token-level encodes.
  • Fast: 900 sentences/s on a single Tesla M40 24GB. Low latency, optimized for speed. See benchmark.
  • 🐙 Scalable: scale nicely and smoothly on multiple GPUs and multiple clients without worrying about concurrency. See benchmark.
  • 💎 Reliable: tested on multi-billion sentences; days of running without a break or OOM or any nasty exceptions.

Monitoring dashboard

Starting the server with the -http_port 8081 option exposes an HTTP query endpoint on port 8081.

Requesting http://localhost:8081/status/server returns the server status:

{
    "ckpt_name": "bert_model.ckpt",
    "client": "7a033047-f177-45fd-9ef5-45781b10d322",
    "config_name": "bert_config.json",
    "cors": "*",
    "cpu": false,
    "device_map": [],
    "do_lower_case": true,
    "fixed_embed_length": false,
    "fp16": false,
    "gpu_memory_fraction": 0.5,
    "graph_tmp_dir": null,
    "http_max_connect": 10,
    "http_port": 8081,
    "mask_cls_sep": false,
    "max_batch_size": 256,
    "max_seq_len": 25,
    "model_dir": "D:\\env\\bert\\chinese_L-12_H-768_A-12",
    "no_position_embeddings": false,
    "no_special_token": false,
    "num_concurrent_socket": 8,
    "num_process": 3,
    "num_worker": 1,
    "pooling_layer": [
        -2
    ],
    "pooling_strategy": 2,
    "port": 5555,
    "port_out": 5556,
    "prefetch_size": 10,
    "priority_batch_size": 16,
    "python_version": "3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:05:27) [MSC v.1900 64 bit (AMD64)]",
    "pyzmq_version": "20.0.0",
    "server_current_time": "2021-03-03 15:53:03.859211",
    "server_start_time": "2021-03-03 10:00:21.128310",
    "server_version": "1.10.0",
    "show_tokens_to_client": false,
    "statistic": {
        "avg_last_two_interval": 1665.306127225,
        "avg_request_per_client": 8.333333333333334,
        "avg_request_per_second": 0.09246377980293276,
        "avg_size_per_request": 102.58333333333333,
        "max_last_two_interval": 17484.7365829,
        "max_request_per_client": 53,
        "max_request_per_second": 0.9194538223647459,
        "max_size_per_request": 601,
        "min_last_two_interval": 1.087602199997491,
        "min_request_per_client": 2,
        "min_request_per_second": 0.00005719274038008647,
        "min_size_per_request": 1,
        "num_active_client": 0,
        "num_data_request": 12,
        "num_max_last_two_interval": 1,
        "num_max_request_per_client": 1,
        "num_max_request_per_second": 1,
        "num_max_size_per_request": 1,
        "num_min_last_two_interval": 1,
        "num_min_request_per_client": 6,
        "num_min_request_per_second": 1,
        "num_min_size_per_request": 1,
        "num_sys_request": 63,
        "num_total_client": 9,
        "num_total_request": 75,
        "num_total_seq": 1231
    },
    "status": 200,
    "tensorflow_version": [
        "1",
        "10",
        "0"
    ],
    "tuned_model_dir": null,
    "ventilator -> worker": [
        "tcp://127.0.0.1:52440",
        "tcp://127.0.0.1:52441",
        "tcp://127.0.0.1:52442",
        "tcp://127.0.0.1:52443",
        "tcp://127.0.0.1:52444",
        "tcp://127.0.0.1:52445",
        "tcp://127.0.0.1:52446",
        "tcp://127.0.0.1:52447"
    ],
    "ventilator <-> sink": "tcp://127.0.0.1:52439",
    "verbose": false,
    "worker -> sink": "tcp://127.0.0.1:52467",
    "xla": false,
    "zmq_version": "4.3.3"
}
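
The status document can be polled from a script as well. The following sketch uses only the Python standard library; `fetch_status` and `summarize_status` are hypothetical helper names, and the fields picked out are the ones visible in the JSON above.

```python
import json
from urllib.request import urlopen


def fetch_status(url="http://localhost:8081/status/server", timeout=5):
    """GET the status endpoint and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_status(status):
    """Pick a few fields out of the status document."""
    stats = status.get("statistic", {})
    return {
        "server_version": status.get("server_version"),
        "num_worker": status.get("num_worker"),
        "num_total_request": stats.get("num_total_request"),
        "num_active_client": stats.get("num_active_client"),
    }


# Usage (requires a server started with -http_port 8081):
# print(summarize_status(fetch_status()))
```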

You can then build a front end to visualize this data, or simply use plugin/dashboard from the bert-as-service project.

[Image: bert-as-service monitor]

References:

  1. https://github.com/hanxiao/bert-as-service#monitoring-the-service-status-in-a-dashboard
  2. https://bert-as-service.readthedocs.io/en/latest/tutorial/add-monitor.html

QA

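Q: Starting the bert-as-service server reports that cudart64_100.dll is missing

A: Download the DLL (cudart64_100.dll is the CUDA 10.0 runtime library, shipped with the NVIDIA CUDA Toolkit 10.0), place it in C:\Windows\System32, then reopen the command prompt and run the command again.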

Q: fail to optimize the graph!, TypeError: cannot unpack non-iterable NoneType object

A: Downgrade TensorFlow to 1.10.0, and make sure the model path is an absolute path:

pip uninstall tensorflow
pip uninstall tensorflow-estimator
conda install --channel https://conda.anaconda.org/aaronzs tensorflow

References:

  1. https://github.com/hanxiao/bert-as-service/issues/467
  2. https://blog.csdn.net/cktcrawl/article/details/103028725
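
The absolute-path advice in the answer above can be verified before launching the server. This is a minimal sketch: `check_model_dir` is a hypothetical helper, and the file names it looks for are the ckpt_name and config_name values printed in the startup log.

```python
import glob
import os


def check_model_dir(model_dir):
    """Return a list of problems found; an empty list means the directory looks usable."""
    problems = []
    if not os.path.isabs(model_dir):
        problems.append("not an absolute path: " + model_dir)
    if not os.path.isdir(model_dir):
        problems.append("directory does not exist: " + model_dir)
        return problems
    if not os.path.isfile(os.path.join(model_dir, "bert_config.json")):
        problems.append("missing bert_config.json")
    # A TensorFlow checkpoint is stored as several files sharing this prefix
    pattern = os.path.join(glob.escape(model_dir), "bert_model.ckpt*")
    if not glob.glob(pattern):
        problems.append("missing bert_model.ckpt files")
    return problems


# Usage:
# print(check_model_dir(r"D:\env\bert\chinese_L-12_H-768_A-12"))
```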
