Document CRUD Operations in Elasticsearch

Date: 2019-12-02

Tags: elastic search, document, crud. Category: web development


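1. Creating a Document
Creating a Document adds it to an index.
ES resolves the target automatically. If the index named in the request does not exist, ES creates it. If the index exists but the type does not, ES creates the type. If both exist, ES uses them as they are.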

1.1 PUT syntax
This form creates a Document with an id that you specify yourself.
Syntax: PUT /index_name/type_name/id {field_name: field_value}
For example:

PUT /test_index/my_type/1
{
    "name":"test_doc_01",
    "remark":"first test elastic search",
    "order_no":1
}

PUT /test_index/my_type/2
{
    "name":"test_doc_02",
    "remark":"second test elastic search",
    "order_no":2
}

PUT /test_index/my_type/3
{
    "name":"test_doc_03",
    "remark":"third test elastic search",
    "order_no":3
}

Result:

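{
    "_index": "test_index",  // the index the new Document was written to
    "_type": "my_type",  // the type within that index
    "_id": "1",  // the id that was specified
    "_version": 1,  // Document version; starts at 1 and increases by 1 on every write
    "result": "created",  // outcome of this operation: created, updated, or deleted
    "_shards": {  // shard information
        "total": 2,  // number of shard copies (primary plus replicas) the write should be applied to
        "successful": 1,  // number of shard copies on which the write succeeded; a Document lives in exactly one primary shard of the index
        "failed": 0
    },
    "_seq_no": 0,  // sequence number assigned to this operation
    "_primary_term": 1  // primary term of the shard; it increases each time a new primary is promoted
}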

1.2 POST syntax
This form creates a Document and lets ES generate the id.
Syntax: POST /index_name/type_name {field_name: field_value}
For example:

POST /test_index/my_type
{
    "name":"test_doc_04",
    "remark":"fourth test elastic search",
    "order_no":4
}

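Note: in ES, the Documents of every type in an index are stored together. If the fields of the types in one index differ widely, both the on-disk layout and the space used suffer. For example, suppose test_index has two types, test_type1 and test_type2. A Document in test_type1 looks like {"_id":"1","f1":"v1","f2":"v2"} and a Document in test_type2 looks like {"_id":"2","f3":"v3","f4":"v4"}. ES then stores them in one unified layout: {"_id":"1","f1":"v1","f2":"v2","f3":"","f4":""} and {"_id":"2","f1":"","f2":"","f3":"v3","f4":"v4"}. The recommendation is to keep the Document structures in one index similar, with the differing fields held to about 10% of the total number of fields.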

2. Querying Documents
2.1 GET query
Syntax: GET /index_name/type_name/id
For example:

GET /test_index/my_type/1

Result:

{
    "_index": "test_index",
    "_type": "my_type",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {  // the content of the Document that was found
        "name": "test_doc_01",
        "remark": "first test elastic search",
        "order_no":1
    }
}

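2.2 Batch query with GET _mget
A batch query fetches several Documents in one request, which is more efficient than querying them one at a time, so it is the recommended approach.
Syntax: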

GET /_mget
{
    "docs" : [
        {
            "_index" : "value",
            "_type" : "value",
            "_id" : "value"
        },{}, {}
    ]
}

GET /index_name/_mget
{
    "docs" : [
        {
            "_type" : "value",
            "_id" : "value"
        }, {}, {}
    ]
}

GET /index_name/type_name/_mget
{
    "docs" : [
        {
            "_id" : "value1"
        },
        {
            "_id" : "value2"
        }
    ]
}
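
The three request shapes above differ only in how much of the address sits in the URL. As a minimal sketch, assuming a hypothetical helper named `build_mget` (not part of any ES client), the following Python builds the path and body from (index, type, id) triples and moves a shared index or type into the URL. It only constructs strings and does not contact a cluster.

```python
import json

def build_mget(refs):
    """Build (path, body) for _mget from (index, type, id) triples.

    Hypothetical helper: when every triple shares an index (and type),
    that part moves into the URL and is dropped from each entry.
    """
    indices = {r[0] for r in refs}
    types = {r[1] for r in refs}
    shared_index = indices.pop() if len(indices) == 1 else None
    shared_type = types.pop() if shared_index and len(types) == 1 else None

    docs = []
    for index, doc_type, doc_id in refs:
        entry = {"_id": doc_id}
        if shared_type is None:
            entry["_type"] = doc_type
        if shared_index is None:
            entry["_index"] = index
        docs.append(entry)

    path = "/" + "/".join(p for p in (shared_index, shared_type, "_mget") if p)
    return path, json.dumps({"docs": docs})

path, body = build_mget([("test_index", "my_type", "1"),
                         ("test_index", "my_type", "2")])
print(path)  # /test_index/my_type/_mget
print(body)  # {"docs": [{"_id": "1"}, {"_id": "2"}]}
```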

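3. Updating a Document
3.1 Replacing a Document (full replacement)
The syntax is the same as the PUT form used to create a Document.
PUT /index_name/type_name/id {field_name: new_field_value}
The request body should carry every field of the Document, not only the changed ones, because this operation overwrites the Document as a whole and any field left out is lost. During a full replacement ES does not modify the stored Document in place. It marks the existing Document as deleted and creates a new Document to hold the data. When the volume of data grows too large, ES reclaims the Documents marked as deleted in the background (this is a working explanation for now; later lessons cover it in detail).
For example: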

PUT /test_index/my_type/1
{
    "name":"new_test_doc_01",
    "remark":"first test elastic search",
    "order_no":1
}

Result:

{
    "_index": "test_index",
    "_type": "my_type",
    "_id": "1",
    "_version": 2,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 1,
    "_primary_term": 1
}

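3.2 Forced create with PUT
Running PUT repeatedly against the same Document is a full replacement every time. To have ES check whether the Document already exists, use the forced-create syntax. With it, the request fails if the Document id already exists in ES (version conflict, document already exists).
Syntax:
PUT /index_name/type_name/id/_create
or
PUT /index_name/type_name/id?op_type=create
For example: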

PUT /test_index/my_type/1/_create
{
    "name":"new_test_doc_01",
    "remark":"first test elastic search",
    "order_no":1
}
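
The difference between an ordinary PUT and a forced create can be modelled in a few lines of Python. This is a toy in-memory sketch, not how ES stores data: `ToyIndex` and `VersionConflict` are hypothetical names, and the version counter only mimics the `_version` field shown in the responses above.

```python
class VersionConflict(Exception):
    """Raised when a forced create hits an id that already exists."""

class ToyIndex:
    """In-memory stand-in for one index (illustration only)."""

    def __init__(self):
        self.docs = {}      # id -> source dict
        self.versions = {}  # id -> version number

    def put(self, doc_id, source, op_type="index"):
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            raise VersionConflict(f"[{doc_id}]: document already exists")
        self.docs[doc_id] = dict(source)  # full replacement of the body
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return "updated" if exists else "created"

idx = ToyIndex()
print(idx.put("1", {"name": "test_doc_01"}))      # created
print(idx.put("1", {"name": "new_test_doc_01"}))  # updated
try:
    idx.put("1", {"name": "new_test_doc_01"}, op_type="create")
except VersionConflict as err:
    print("conflict:", err)  # conflict: [1]: document already exists
```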

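3.3 Updating a Document (partial update)
POST /index_name/type_name/id/_update {"doc": {field_name: field_value_for_update}}
A partial update changes only some of the fields of a Document. Like a full replacement, it marks the existing data as deleted and creates a new Document, built from the updated fields plus the unchanged original fields. Compared with a full replacement it is mainly a convenience for the caller; the underlying execution is almost the same.
For example: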

POST /test_index/my_type/1/_update
{
    "doc":{
        "name":" test_doc_01_for_update"
    }
}

Result:

{
    "_index": "test_index",
    "_type": "my_type",
    "_id": "1",
    "_version": 5,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 2,
    "_primary_term": 1
}
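
What the two update styles do to the stored fields can be sketched with plain dictionaries. The function names below are hypothetical, and the merge is deliberately shallow: it only illustrates flat fields such as the ones in this article and does not attempt to merge nested objects.

```python
def full_replace(stored, new_source):
    """PUT semantics: the new body becomes the whole Document.

    `stored` is ignored on purpose; nothing from it survives.
    """
    return dict(new_source)

def partial_update(stored, doc):
    """_update semantics with a "doc" body: changed fields are laid
    over a copy of the old source, and a new Document is produced."""
    merged = dict(stored)
    merged.update(doc)
    return merged

stored = {"name": "test_doc_01",
          "remark": "first test elastic search",
          "order_no": 1}

print(partial_update(stored, {"name": "test_doc_01_for_update"}))
# {'name': 'test_doc_01_for_update', 'remark': 'first test elastic search', 'order_no': 1}
print(full_replace(stored, {"name": "new_test_doc_01"}))
# {'name': 'new_test_doc_01'}
```

Both functions return a new dictionary and leave `stored` untouched, mirroring the point that the old Document is marked as deleted rather than edited in place.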

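4. Deleting a Document
When a delete runs, ES first marks the Document as deleted rather than removing it physically. Physical removal happens later, when ES is short of storage space or otherwise idle. Documents marked as deleted are not returned by queries or searches.
Deleting an index in ES is likewise a mark first, with physical removal later. All of this marking exists to support NRT (near real time) behavior.
Syntax: DELETE /index_name/type_name/id
For example:
DELETE /test_index/my_type/1
Result: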

{
    "_index": "test_index",
    "_type": "my_type",
    "_id": "1",
    "_version": 6,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 5,
    "_primary_term": 1
}
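
The mark-first, remove-later behavior described in this section can be illustrated with a toy model. `ToyShard` and its method names are hypothetical and say nothing about the real storage engine; the sketch only shows that a marked Document disappears from reads while its data still occupies space until a later purge.

```python
class ToyShard:
    """Toy model of mark-then-purge deletion (illustration only)."""

    def __init__(self):
        self.docs = {}           # id -> source, as physically stored
        self.tombstones = set()  # ids marked as deleted

    def index(self, doc_id, source):
        self.docs[doc_id] = dict(source)
        self.tombstones.discard(doc_id)

    def delete(self, doc_id):
        if doc_id not in self.docs or doc_id in self.tombstones:
            return "not_found"
        self.tombstones.add(doc_id)  # mark only; the data is still stored
        return "deleted"

    def get(self, doc_id):
        if doc_id in self.tombstones:
            return None              # marked Documents are invisible to reads
        return self.docs.get(doc_id)

    def purge(self):
        """Physically remove everything that was marked as deleted."""
        for doc_id in self.tombstones:
            del self.docs[doc_id]
        self.tombstones.clear()

shard = ToyShard()
shard.index("1", {"name": "test_doc_01"})
print(shard.delete("1"))  # deleted
print(shard.get("1"))     # None
print(len(shard.docs))    # 1
shard.purge()
print(len(shard.docs))    # 0
```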

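5. Bulk create, update, and delete
The bulk syntax runs creates, updates, and deletes in batches. The format is:
POST /_bulk
{ "action_type" : { "metadata_name" : "metadata_value" } }
{ document data | action data }
The possible values of action_type are:
create: forced create, equivalent to PUT /index_name/type_name/id/_create
index: an ordinary PUT, which creates the Document or fully replaces it
update: a partial update, equivalent to POST /index_name/type_name/id/_update
delete: a delete
Examples follow. The first four requests show each action on its own; the last request shows that a single bulk request can carry creates, updates, and deletes together.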

POST /_bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "1" } }
{ "field_name" : "field value" }

POST /_bulk
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "2" } }
{ "field_name" : "field value 2" }

POST /_bulk
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "2", "_retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }

POST /_bulk
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }

POST /_bulk
{ "create" : { "_index" : "test_index" , "_type" : "my_type", "_id" : "10" } }
{ "field_name" : "field value" }
{ "index" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "20" } }
{ "field_name" : "field value 2" }
{ "update" : { "_index" : "test_index", "_type" : "my_type" , "_id" : "20", "_retry_on_conflict" : 3 } }
{ "doc" : { "field_name" : "partial update field value" } }
{ "delete" : { "_index" : "test_index", "_type" : "my_type", "_id" : "2" } }

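Note: in the bulk syntax, a single JSON object must not contain line breaks, and separate JSON objects must be separated by a line break. If one of the operations fails, the others are not affected; the failure is only flagged in the bulk response. A bulk request is loaded into memory in one piece, so an oversized request lowers performance because of memory pressure. Finding a good bulk request size takes repeated trials: a common starting point is 1000 to 5000 Documents, increased gradually. Measured in bytes, 5 to 15 MB per request generally works well.
Explanation: the bulk format is constrained so that memory is easier to manage and memory pressure stays as low as possible. If the JSON could take any shape, ES would have to parse arbitrary JSON and convert the whole bulk request into a JSON object or JSON array. Memory use would at least double, and under heavy request volume the memory pressure would climb steeply and the JVM garbage collector would have to run frequently, which hurts ES performance.
In production the bulk API is used heavily, usually driven from a loop in Java code. A single bulk request normally performs one kind of operation, for example indexing 10000 Documents in a batch.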
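
The formatting rules above (one JSON object per line, a line break between objects) are easy to get wrong by hand, so the body is usually generated. Here is a minimal sketch in Python; `bulk_body` and `chunks` are hypothetical helper names, and the batch size of 2 in the usage lines is only for demonstration. The code builds the request text and does not send it anywhere.

```python
import json

def bulk_body(actions):
    """Serialize (action_type, metadata, source) triples as bulk lines.

    json.dumps without indent never emits a raw line break, so each
    object stays on one line. A delete has no source line.
    """
    lines = []
    for action_type, metadata, source in actions:
        lines.append(json.dumps({action_type: metadata}))
        if action_type != "delete":
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body ends with a line break

def chunks(items, size):
    """Split a list into batches so each bulk request stays small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

actions = [
    ("index", {"_index": "test_index", "_type": "my_type", "_id": "20"},
     {"field_name": "field value 2"}),
    ("update", {"_index": "test_index", "_type": "my_type", "_id": "20"},
     {"doc": {"field_name": "partial update field value"}}),
    ("delete", {"_index": "test_index", "_type": "my_type", "_id": "2"}, None),
]
for batch in chunks(actions, 2):
    print(bulk_body(batch), end="")
```

Batching by count is the simplest choice; a variant could instead cut a batch when the accumulated body length approaches a byte budget.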

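6. Document routing mechanism
ES manages Documents with a routing algorithm that decides which primary shard a Document is stored in: primary shard = hash(routing) % number_of_primary_shards. By default routing is the Document's _id metadata, but it can also be set by hand: PUT /index_name/type_name/id?routing=xxx. Setting routing by hand is very useful with large data volumes. Related Documents end up in the same shard, which makes application-level load balancing easier and can speed up retrieval. For example, when storing products for an e-commerce site with the product category id as the routing value, ES stores all Documents of one category in the same shard, so a query for products of one category can be answered from a single shard, which is the most efficient case.
For a write, the routing result determines which primary shard receives the operation; once the primary shard has written successfully, the change is synchronized to its replica shards automatically. For a read, the routing result determines which primary shard, or one of its replica shards, serves the request. This balances the read load: the more replica shards there are, the higher the concurrent read capacity.
PUT /test_index/my_type/10?routing=type_id {}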
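
The routing formula can be tried out directly. The sketch below is an illustration under one loud assumption: `zlib.crc32` stands in for the hash function, which is not the hash ES uses internally, so the shard numbers it prints will not match a real cluster. The function name `pick_shard` and the routing values are hypothetical.

```python
import zlib

def pick_shard(routing, number_of_primary_shards):
    """primary shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in hash for this illustration.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Documents that share a routing value always land on the same shard.
for doc_id, category in [("10", "type_7"), ("11", "type_7"), ("12", "type_9")]:
    print(doc_id, category, "-> shard", pick_shard(category, 5))

# With the default routing, the Document's own _id is hashed instead.
print("10 -> shard", pick_shard("10", 5))
```

Because the result depends on the modulus, the same routing value can map to a different shard if the number of primary shards changes.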

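7. How Document creates, updates, and deletes work (diagram)

Explanation:
1: The client sends a request to create, update, or delete a Document. All such operations are handled directly by a primary shard; replica shards only receive copies of the data. This request arrives at node 2 (the receiving node is chosen at random), which becomes the coordinating node.
2: The coordinating node uses the routing algorithm to work out which shard holds the Document. Assume it is primary shard 0. The coordinating node then forwards the request to node 1.
3: After primary shard 0 on node 1 has processed the request, it synchronizes the change to the matching replica shard 0, that is, it sends a synchronization request to node 3.
4: After replica shard 0 has applied the change, it reports success back to the requester, which is primary shard 0 (node 1).
5: Once primary shard 0 (node 1) has received the success response from replica shard 0, it tells its own requester, the coordinating node (node 2), that the operation is complete.
6: The coordinating node returns the response to the client with the outcome of the operation.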
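
The six steps above can be traced with a toy simulation. Everything in this sketch is hypothetical: shard copies are plain dictionaries, the function name `replicate_write` is invented, and real concerns such as failures and retries are left out. It only shows the order of the hops and that primary and replica hold the same data afterwards.

```python
def replicate_write(doc_id, source, primary, replicas):
    """Apply a write on the primary copy, then on each replica copy.

    Returns the hops in order, as a trace of the steps.
    """
    trace = ["client -> coordinating node",
             "coordinating node -> primary shard"]
    primary[doc_id] = dict(source)      # the primary handles the write
    for n, replica in enumerate(replicas):
        replica[doc_id] = dict(source)  # the change is copied to the replica
        trace.append(f"primary shard -> replica {n} (ack)")
    trace.append("primary shard -> coordinating node")
    trace.append("coordinating node -> client")
    return trace

primary, replica = {}, {}
for hop in replicate_write("1", {"name": "test_doc_01"}, primary, [replica]):
    print(hop)
print(primary == replica)  # True
```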

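8. How Document queries work (diagram)

Explanation:
1: The client sends a query request. Queries are served by primary shards and replica shards together. This request arrives at node 2 (the receiving node is chosen at random), which becomes the coordinating node.
2: The coordinating node uses the routing algorithm to work out which shard holds the Document. Assume it is shard 0. The coordinating node then forwards the request to node 1 or node 3. The choice between them is made by a random algorithm, and ES ensures that with enough requests the primary shard and the replica shard handle roughly equal numbers of queries (though not exactly the same).
3: Primary shard 0 on node 1 or replica shard 0 on node 3 processes the request and returns the result to the coordinating node (node 2).
4: The coordinating node passes the result back to the client.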
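
The claim that random selection evens out the read load over many requests can be checked with a short simulation. This is a sketch under assumptions: the copy names are taken from the explanation above, `pick_copy` is a hypothetical function, and uniform random choice is used as the selection rule without any claim about how ES implements it.

```python
import random
from collections import Counter

def pick_copy(copies, rng):
    """Choose which shard copy serves a read, uniformly at random."""
    return rng.choice(copies)

copies = ["primary shard 0 (node 1)", "replica shard 0 (node 3)"]
rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(pick_copy(copies, rng) for _ in range(10_000))
for name, hits in counts.items():
    print(name, hits)
```

Each copy ends up with close to half of the 10,000 reads, close but not identical, which matches the description of the shares being roughly equal rather than exactly equal.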
