Using the bulk API in Elasticsearch

Date: 2019-11-17


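The previous article covered mget, the method for reading data from es in batches. In this one we look at bulk, the method for writing in batches.

The bulk API can perform multiple index, create, update, or delete operations in a single request, which can greatly improve indexing performance.

The syntax of a bulk request is: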

action and meta_data \n
optional source \n

action and meta_data \n
optional source \n

action and meta_data \n
optional source \n

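As shown above, one operation consists of up to two lines. The first line gives the operation type, which can be index, create, update, or delete. The second line is the optional data body. When sending a bulk request this way, set the Content-Type header to application/x-ndjson (application/json is also accepted).

What goes on the optional second line depends on the operation type:

(1) index and create: the second line is the source document
(2) delete: there is no second line
(3) update: the second line can be a partial doc, an upsert, or a script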
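
The rules above can be sketched in a few lines of Python. This is only an illustration: the helper name build_bulk_body and the (action, metadata, body) tuple layout are our own, not part of any Elasticsearch client.

```python
import json

# Actions that are followed by a second line; "delete" has none.
ACTIONS_WITH_BODY = {"index", "create", "update"}


def build_bulk_body(operations):
    """Serialize (action, metadata, body) tuples into a bulk payload."""
    lines = []
    for action, metadata, body in operations:
        if action not in ACTIONS_WITH_BODY and action != "delete":
            raise ValueError(f"unknown bulk action: {action}")
        # First line: the action and its metadata as compact, single-line JSON.
        lines.append(json.dumps({action: metadata}))
        if action in ACTIONS_WITH_BODY:
            if body is None:
                raise ValueError(f"{action} requires a second line")
            # Second line: source document, partial doc, upsert, or script.
            lines.append(json.dumps(body))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


payload = build_bulk_body([
    ("index", {"_index": "test", "_type": "_doc", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "_doc", "_id": "2"}, None),
    ("update", {"_index": "test", "_type": "_doc", "_id": "1"}, {"doc": {"field2": "value2"}}),
])
print(payload, end="")
```

json.dumps without an indent argument never emits a newline, so each object stays on one line, which is what the bulk format needs.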

We can write the operations straight into a text file and then send it to the server with curl.

A file named requests contains the following:

{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }

The command to send it is:

curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo

The response looks like this:

{"took":7, "errors": false, "items":[{"index":{"_index":"test","_type":"_doc","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
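
For readers who prefer a script over curl, the same request can be assembled with Python's standard library. This is a rough equivalent, not a replacement for an official client. The helper name make_bulk_request is our own, and the host is assumed to be the same localhost:9200 used in the curl command.

```python
import urllib.request


def make_bulk_request(body, host="http://localhost:9200"):
    """Build (but do not send) a POST request for the _bulk endpoint."""
    return urllib.request.Request(
        url=f"{host}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Same content as the requests file above; note the newline after each line.
body = (
    '{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
)
request = make_bulk_request(body)
print(request.get_method(), request.full_url)

# To actually send it to a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The sending part is left commented out so the snippet can be run without a cluster.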

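Note that because every line must end with a newline character, each JSON object has to sit on a single line, and pretty-printed JSON cannot be used. Here is a correct body for a bulk POST request: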

{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
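
A small local check can catch a pretty-printed or truncated body before it is sent. The sketch below is our own helper (count_bulk_operations is a hypothetical name). It only checks the line structure described in this article, not everything the server validates.

```python
import json


def count_bulk_operations(body):
    """Count the operations in a bulk body; raise ValueError if it is malformed."""
    if not body.endswith("\n"):
        raise ValueError("the last line must also end with a newline")
    lines = body.splitlines()
    count = 0
    i = 0
    while i < len(lines):
        # json.loads raises json.JSONDecodeError (a ValueError subclass) when
        # a line is only a fragment of a pretty-printed object.
        action_line = json.loads(lines[i])
        if not isinstance(action_line, dict) or len(action_line) != 1:
            raise ValueError(f"line {i + 1} is not an action line")
        action = next(iter(action_line))
        i += 1
        if action != "delete":
            # index, create and update are followed by a second line.
            if i >= len(lines):
                raise ValueError(f"{action} is missing its second line")
            json.loads(lines[i])
            i += 1
        count += 1
    return count


good = (
    '{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
    '{ "delete" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }\n'
    '{ "create" : { "_index" : "test", "_type" : "_doc", "_id" : "3" } }\n'
    '{ "field1" : "value3" }\n'
    '{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "test"} }\n'
    '{ "doc" : {"field2" : "value2"} }\n'
)
print(count_bulk_operations(good))

# A pretty-printed action line spans several physical lines and is rejected.
pretty = json.dumps({"index": {"_index": "test", "_id": "1"}}, indent=2) + "\n"
try:
    count_bulk_operations(pretty)
except ValueError as exc:
    print("rejected:", exc)
```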

The response to a bulk request is batched as well: every action gets its own entry, which tells you whether that action succeeded or failed:

{
   "took": 30,
   "errors": false,
   "items": [
      {
         "index": {
            "_index": "test",
            "_type": "_doc",
            "_id": "1",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 201,
            "_seq_no" : 0,
            "_primary_term": 1
         }
      },
      {
         "delete": {
            "_index": "test",
            "_type": "_doc",
            "_id": "2",
            "_version": 1,
            "result": "not_found",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 404,
            "_seq_no" : 1,
            "_primary_term" : 2
         }
      },
      {
         "create": {
            "_index": "test",
            "_type": "_doc",
            "_id": "3",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 201,
            "_seq_no" : 2,
            "_primary_term" : 3
         }
      },
      {
         "update": {
            "_index": "test",
            "_type": "_doc",
            "_id": "1",
            "_version": 2,
            "result": "updated",
            "_shards": {
                "total": 2,
                "successful": 1,
                "failed": 0
            },
            "status": 200,
            "_seq_no" : 3,
            "_primary_term" : 4
         }
      }
   ]
}
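
Since one bulk response mixes the results of many actions, it is usually worth walking the items array rather than looking only at the HTTP status. The sketch below is one way to do that. The function name failed_items and the sample response are made up for illustration, and the sample assumes failed items carry an "error" object with a "type" field. Note that in the response above the delete returned 404 with "result": "not_found" while "errors" stayed false, so the sketch keys on the presence of "error" rather than on the status code.

```python
def failed_items(response):
    """Return (position, action, id, status, error type) for each failed item."""
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item is a single-key object such as {"index": {...}}.
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((
                position,
                action,
                result.get("_id"),
                result.get("status"),
                result["error"].get("type"),
            ))
    return failures


# Hypothetical response: the create fails because document 3 already exists.
sample = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "test", "_type": "_doc", "_id": "1",
                   "result": "created", "status": 201}},
        {"delete": {"_index": "test", "_type": "_doc", "_id": "2",
                    "result": "not_found", "status": 404}},
        {"create": {"_index": "test", "_type": "_doc", "_id": "3", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
    ],
}

if sample["errors"]:
    for failure in failed_items(sample):
        print(failure)
```

The position in items matches the order of the actions in the request, so a failure can be traced back to the line that caused it.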

As with the mget request covered earlier, a bulk request can be sent to three paths:

(1) /_bulk

(2) /{index}/_bulk

(3) /{index}/{type}/_bulk

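With these three forms, if the index and type are given in the URL, the action lines in the body can leave them out. Likewise, if the URL gives the index but not the type, you have to add the type to each action line yourself.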
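
As a quick illustration of the three path forms, here is a small helper (bulk_path is our own name) together with a body that relies on the index and type given in the path:

```python
def bulk_path(index=None, doc_type=None):
    """Return /_bulk, /{index}/_bulk or /{index}/{type}/_bulk."""
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    parts = [part for part in (index, doc_type) if part is not None]
    return "/" + "/".join(parts + ["_bulk"])


print(bulk_path())                # /_bulk
print(bulk_path("test"))          # /test/_bulk
print(bulk_path("test", "_doc"))  # /test/_doc/_bulk

# With the index and type in the path, the action lines only need the id.
body_for_typed_path = (
    '{ "index" : { "_id" : "1" } }\n'
    '{ "field1" : "value1" }\n'
    '{ "delete" : { "_id" : "2" } }\n'
)
```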

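In addition, a few parameters control how the operations are carried out:

(1) An action line can carry a _version field

(2) An action line can carry a _routing field

(3) wait_for_active_shards can be set so that the bulk operation only proceeds once the required number of shard copies are active

(4) refresh controls when the changes made by the request become visible to search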
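
The last two parameters are passed on the URL. A minimal sketch of building such a URL follows. The helper name bulk_url is our own, and the values shown (refresh=wait_for, wait_for_active_shards=2) are only example settings, not recommendations.

```python
from urllib.parse import urlencode


def bulk_url(host, refresh=None, wait_for_active_shards=None):
    """Build the _bulk URL, adding only the query parameters that were set."""
    params = {}
    if refresh is not None:
        params["refresh"] = refresh
    if wait_for_active_shards is not None:
        params["wait_for_active_shards"] = wait_for_active_shards
    query = urlencode(params)
    return f"{host}/_bulk" + (f"?{query}" if query else "")


print(bulk_url("http://localhost:9200"))
print(bulk_url("http://localhost:9200", refresh="wait_for", wait_for_active_shards=2))
```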

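Finally, let us look more closely at the update operation. It was also covered in an earlier article. es offers several ways to update data, such as:

(1) doc
(2) upsert
(3) doc_as_upsert
(4) script
(5) params, lang, source

Using update inside bulk is similar to using it from the Java API, which earlier articles described in detail. Here is how it looks in bulk: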

POST _bulk
{ "update" : {"_id" : "1", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "_doc", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "_doc", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "_doc", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}

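These are simply unformatted bodies, each placed on a single line and submitted. The difference is that the earlier articles sent one operation per request, whereas with bulk many operations go out in a single request.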
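
When the update bodies are generated by a program instead of typed by hand, small helpers keep the three flavours apart. The following is a sketch under our own naming (partial_doc, scripted, and update_lines are hypothetical helpers). It reproduces the first three updates from the request above.

```python
import json


def partial_doc(doc, doc_as_upsert=False):
    """Second line for a partial-document update."""
    body = {"doc": doc}
    if doc_as_upsert:
        body["doc_as_upsert"] = True
    return body


def scripted(source, params, upsert=None):
    """Second line for a scripted update, optionally with an upsert document."""
    body = {"script": {"source": source, "lang": "painless", "params": params}}
    if upsert is not None:
        body["upsert"] = upsert
    return body


def update_lines(index, doc_id, body, retry_on_conflict=3):
    """Return the action line and the body line, each ending with a newline."""
    meta = {"_id": doc_id, "_type": "_doc", "_index": index,
            "retry_on_conflict": retry_on_conflict}
    return json.dumps({"update": meta}) + "\n" + json.dumps(body) + "\n"


bulk_body = "".join([
    update_lines("index1", "1", partial_doc({"field": "value"})),
    update_lines("index1", "0", scripted("ctx._source.counter += params.param1",
                                         {"param1": 1}, upsert={"counter": 1})),
    update_lines("index1", "2", partial_doc({"field": "value"}, doc_as_upsert=True)),
])
print(bulk_body, end="")
```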

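Summary:

This article covered how to use the bulk operation in es. With bulk we can insert data in batches to improve write performance. Keep in mind that the format of the data body differs from action to action, and that every line must end with a newline character, otherwise es cannot parse the request correctly.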