ES--03

Date: 2019-11-09


Lecture 21

1. Hands-on: optimistic concurrency control based on _version

(1) First, create a document

PUT /test_index/test_type/7
{
"test_field": "test test"
}

(2) Simulate two clients that both fetch the same document

GET /test_index/test_type/7

{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_version": 1,
"found": true,
"_source": {
"test_field": "test test"
}
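}

(3) One of the clients updates the document first

The request carries the document's version number, so the update succeeds only if the version stored in ES matches the version the client holds.

PUT /test_index/test_type/7?version=1
{
"test_field": "test client 1"
}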

{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

(4) The other client tries to update based on the version=1 data, also sending the version number for optimistic concurrency control

PUT /test_index/test_type/7?version=1
{
"test_field": "test client 2"
}

{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "3",
"index": "test_index"
}
],
"type": "version_conflict_engine_exception",
"reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "3",
"index": "test_index"
},
"status": 409
}

(5) After optimistic locking has prevented the concurrency problem, retry the update correctly

GET /test_index/test_type/7

{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_version": 2,
"found": true,
"_source": {
"test_field": "test client 1"
}
}

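Re-read the latest data and version number, apply the change, and send the update with the latest version number. This step may have to be repeated several times before it succeeds, especially when many threads update the same document frequently.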

PUT /test_index/test_type/7?version=2
{
"test_field": "test client 2"
}

{
"_index": "test_index",
"_type": "test_type",
"_id": "7",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

Update with the version number attached

Optimistic locking takes effect

Update again based on the latest data and version number
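The five steps above can be replayed without a cluster. The following is a minimal sketch only: `InMemoryStore` and `VersionConflict` are hypothetical stand-ins for an index and for the 409 response, not part of any Elasticsearch client, and the store only imitates the exact-match rule of the internal _version.

```python
class VersionConflict(Exception):
    """Stand-in for the 409 version_conflict_engine_exception response."""


class InMemoryStore:
    """Hypothetical stand-in for an index: maps id -> (version, source)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, version=None):
        current = self._docs.get(doc_id, (0, None))[0]
        # Internal versioning: accept the write only on an exact match.
        if version is not None and version != current:
            raise VersionConflict(
                f"current version [{current}] is different than the one provided [{version}]"
            )
        self._docs[doc_id] = (current + 1, source)
        return current + 1


store = InMemoryStore()
store.put(7, {"test_field": "test test"})        # step 1: version becomes 1
seen_by_both, _ = store.get(7)                   # step 2: both clients read version 1
store.put(7, {"test_field": "test client 1"}, version=seen_by_both)  # step 3

try:                                             # step 4: stale version is rejected
    store.put(7, {"test_field": "test client 2"}, version=seen_by_both)
    conflicted = False
except VersionConflict:
    conflicted = True

latest, _ = store.get(7)                         # step 5: re-read, then retry
final_version = store.put(7, {"test_field": "test client 2"}, version=latest)
```

The second client only succeeds after re-reading, which is the behaviour the walkthrough demonstrates.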

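Lecture 22

Course outline

1. Hands-on: optimistic concurrency control based on an external version

external version

ES provides a feature that lets you do concurrency control without its internal _version number, using a version number you maintain yourself instead. For example, suppose your data also lives in MySQL and your application already maintains its own version number, however it is generated. In that case you may not want to use ES's internal _version for optimistic concurrency control, but the version you maintain yourself.

?version=1
?version=1&version_type=external

With version_type=external the only difference is this: with the internal _version, the update succeeds only when the version you supply is exactly equal to the _version in ES, and any mismatch is an error; with version_type=external, the update succeeds only when the version you supply is greater than the _version in ES.

ES has _version=1: ?version=1 is required for the update to succeed
ES has _version=1: ?version>1&version_type=external is required, for example ?version=2&version_type=external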
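The two comparison rules can be written down as a single predicate. This is an illustrative sketch of the rule as described above; `write_allowed` is a hypothetical helper, not an Elasticsearch API.

```python
def write_allowed(current_version, provided_version, version_type="internal"):
    """Return True if a write carrying provided_version would be accepted."""
    if version_type == "external":
        # External versioning: the supplied version must be strictly greater.
        return provided_version > current_version
    # Internal versioning: the supplied version must match exactly.
    return provided_version == current_version
```

With `current_version=2`, a second external write that also carries version 2 is rejected, which matches the "higher or equal" conflict message in the walkthrough that follows.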

(1) First, create a document

PUT /test_index/test_type/8
{
"test_field": "test"
}

{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}

(2) Simulate two clients querying this document at the same time

GET /test_index/test_type/8

{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_version": 1,
"found": true,
"_source": {
"test_field": "test"
}
}

(3) The first client updates first; the client program has read the latest version number for this document from its own database, say 2

PUT /test_index/test_type/8?version=2&version_type=external
{
"test_field": "test client 1"
}

{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

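(4) Simulate the second client, which has read the version number maintained in its own database at the same time, also 2, and issues an update based on version=2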

PUT /test_index/test_type/8?version=2&version_type=external
{
"test_field": "test client 2"
}

{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "1",
"index": "test_index"
}
],
"type": "version_conflict_engine_exception",
"reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "1",
"index": "test_index"
},
"status": 409
}

(5) After concurrency control has done its job, issue the update again based on the latest version number

GET /test_index/test_type/8

{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_version": 2,
"found": true,
"_source": {
"test_field": "test client 1"
}
}

PUT /test_index/test_type/8?version=3&version_type=external
{
"test_field": "test client 2"
}

{
"_index": "test_index",
"_type": "test_type",
"_id": "8",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

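Lecture 23

Course outline

1. What is partial update?

PUT /index/type/id creates a document and replaces a document with the same syntax.

In an application, each full replacement typically goes through these steps:

(1) The application issues a GET request, fetches the document, and displays it in the front end for the user to view and edit
(2) The user edits the data in the front end and submits it to the back end
(3) The back-end code applies the user's changes in memory and assembles the complete modified document
(4) It then sends a PUT request to ES for a full replacement
(5) ES marks the old document as deleted and creates a new document

partial update

POST /index/type/id/_update
{
"doc": {
"only the few fields to change; the full document is not needed"
}
}

This looks more convenient: each request carries only the few fields that changed, rather than the full document.

2. Illustrated: how partial update works and its advantages

Partial update looks like a convenient operation, but how does it work internally, and what are its advantages?

3. Hands-on: partial update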

PUT /test_index/test_type/10
{
"test_field1": "test1",
"test_field2": "test2"
}

POST /test_index/test_type/10/_update
{
"doc": {
"test_field2": "updated test2"
}
}

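The read, the modification, and the write-back all happen inside the shard, which avoids the network round trips of the application-side read-modify-write flow.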
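Conceptually, the shard takes the stored _source, overlays the fields from "doc", and writes the result back as a new full document. A minimal sketch of that merge, assuming a flat document (the helper `partial_update` is hypothetical and does a shallow merge only):

```python
def partial_update(stored_source, doc):
    """Return a new full document: stored fields overlaid with the fields in doc."""
    merged = dict(stored_source)   # copy, so the stored version stays untouched
    merged.update(doc)             # only the supplied fields change
    return merged


stored = {"test_field1": "test1", "test_field2": "test2"}
updated = partial_update(stored, {"test_field2": "updated test2"})
```

The result is still a complete document, which is why the stored document is replaced as a whole even though the request carried a single field.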

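Lecture 24

Course outline

ES has built-in scripting support, so all kinds of complex operations can be implemented with Groovy scripts
How to perform a partial update with a Groovy script
The ES scripting module is covered in the advanced part of the course; this is only a first look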

PUT /test_index/test_type/11
{
"num": 0,
"tags": []
}

(1) Inline script

POST /test_index/test_type/11/_update
{
"script" : "ctx._source.num+=1"
}

{
"_index": "test_index",
"_type": "test_type",
"_id": "11",
"_version": 2,
"found": true,
"_source": {
"num": 1,
"tags": []
}
}

(2) External script

ctx._source.tags+=new_tag

POST /test_index/test_type/11/_update
{
"script": {
"lang": "groovy",
"file": "test-add-tags",
"params": {
"new_tag": "tag1"
}
}
}

(3) Delete a document with a script

ctx.op = ctx._source.num == count ? 'delete' : 'none'

POST /test_index/test_type/11/_update
{
"script": {
"lang": "groovy",
"file": "test-delete-document",
"params": {
"count": 1
}
}
}

(4) upsert operation

POST /test_index/test_type/11/_update
{
"doc": {
"num": 1
}
}

{
"error": {
"root_cause": [
{
"type": "document_missing_exception",
"reason": "[test_type][11]: document missing",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "4",
"index": "test_index"
}
],
"type": "document_missing_exception",
"reason": "[test_type][11]: document missing",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "4",
"index": "test_index"
},
"status": 404
}

If the specified document does not exist, the initialization given in upsert is performed; if it exists, the partial update specified by doc or script is performed

POST /test_index/test_type/11/_update
{
"script" : "ctx._source.num+=1",
"upsert": {
"num": 0,
"tags": []
}
}
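The upsert branch logic can be sketched in a few lines. This is an in-memory illustration only: `update_with_upsert` and `increment_num` are hypothetical names, and a plain dict stands in for the index.

```python
import copy


def update_with_upsert(docs, doc_id, script, upsert):
    """Apply script to an existing document, or initialise it from upsert."""
    if doc_id not in docs:
        # Document missing: store the upsert body; the script is not run.
        docs[doc_id] = copy.deepcopy(upsert)
    else:
        # Document exists: run the partial update.
        script(docs[doc_id])
    return docs[doc_id]


def increment_num(source):
    """Plays the role of the script ctx._source.num+=1."""
    source["num"] += 1


docs = {}
initial = {"num": 0, "tags": []}
first = copy.deepcopy(update_with_upsert(docs, 11, increment_num, initial))
second = copy.deepcopy(update_with_upsert(docs, 11, increment_num, initial))
```

The first call only initialises the document, so num is still 0; the increment shows up on the second call.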

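Lecture 25

Course outline

(1) Partial update has built-in optimistic concurrency control
(2) retry_on_conflict
(3) _version

POST /index/type/id/_update?retry_on_conflict=5&version=6

Retry strategy:

Fetch the document data and the latest version number again, then retry the update, up to retry_on_conflict times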
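The retry strategy amounts to a bounded read-modify-write loop. The sketch below is a simulation under stated assumptions: `Store`, `VersionConflict`, `update_with_retry`, and the `interfere` hook (which imitates a concurrent writer) are all hypothetical and only illustrate the control flow.

```python
class VersionConflict(Exception):
    pass


class Store:
    """Hypothetical single-document store with a version counter."""

    def __init__(self, source):
        self.version, self.source = 1, dict(source)

    def read(self):
        return self.version, dict(self.source)

    def write(self, source, expected_version):
        if expected_version != self.version:
            raise VersionConflict()
        self.version += 1
        self.source = dict(source)


def update_with_retry(store, change, retry_on_conflict, interfere=None):
    """Return how many retries were needed; re-raise once retries are exhausted."""
    retries = 0
    while True:
        version, source = store.read()      # fetch data and latest version again
        if interfere:
            interfere(store, retries)       # simulated concurrent writer
        source.update(change)
        try:
            store.write(source, version)
            return retries
        except VersionConflict:
            if retries >= retry_on_conflict:
                raise
            retries += 1


def two_conflicts(store, attempt):
    """Sneak in a competing write during the first two attempts."""
    if attempt < 2:
        version, source = store.read()
        source["other"] = attempt
        store.write(source, version)


store = Store({"num": 0})
retries_used = update_with_retry(store, {"num": 1}, retry_on_conflict=5,
                                 interfere=two_conflicts)
```

Because every retry starts from a fresh read, the competing writer's field survives alongside the retried change.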

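Lecture 26

Course outline

1. Benefits of batch queries

Querying one document at a time means that fetching 100 documents takes 100 network requests, which is a significant overhead
With a batch query, fetching 100 documents takes only 1 network request, cutting the number of network round trips by a factor of 100

2. mget syntax

(1) Querying one document at a time

GET /test_index/test_type/1
GET /test_index/test_type/2

(2) Batch query with mget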

GET /_mget
{
"docs" : [
{
"_index" : "test_index",
"_type" : "test_type",
"_id" : 1
},
{
"_index" : "test_index",
"_type" : "test_type",
"_id" : 2
}
]
}

{
"docs": [
{
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"test_field1": "test field1",
"test_field2": "test field2"
}
},
{
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"test_content": "my test"
}
}
]
}

(3) If the documents being queried are in different types under the same index

GET /test_index/_mget
{
"docs" : [
{
"_type" : "test_type",
"_id" : 1
},
{
"_type" : "test_type",
"_id" : 2
}
]
}

(4) If the documents are all in the same type under the same index, the syntax is simplest

GET /test_index/test_type/_mget
{
"ids": [1, 2]
}

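3. Why mget matters

mget is very important. In general, whenever a query needs to fetch several documents at once, use the batch API
Minimizing the number of network round trips can improve performance severalfold, possibly by an order of magnitude or more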

Lecture 27

Course outline

1. bulk syntax

POST /_bulk
{ "delete": { "_index": "test_index", "_type": "test_type", "_id": "3" }}
{ "create": { "_index": "test_index", "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_index": "test_index", "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }

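Each operation takes two JSON strings, with the following syntax:

{"action": {"metadata"}}
{"data"}

For example, to create a document inside a bulk request, it looks like this:

{"index": {"_index": "test_index", "_type": "test_type", "_id": "1"}}
{"test_field1": "test1", "test_field2": "test2"}

Which operation types are available?
(1) delete: deletes a document; only 1 JSON string is needed
(2) create: equivalent to PUT /index/type/id/_create, a forced create
(3) index: an ordinary PUT, which either creates a document or fully replaces one
(4) update: performs a partial update

The bulk API is strict about JSON syntax: each JSON string must stay on a single line with no line breaks inside it, and consecutive JSON strings must be separated by a newline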
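Generating the body programmatically is an easy way to satisfy the one-JSON-string-per-line rule. A minimal sketch, assuming operations are given as (action, metadata, data) tuples; `bulk_body` is a hypothetical helper, and `json.dumps` with default settings never emits line breaks.

```python
import json


def bulk_body(operations):
    """Serialise (action, metadata, data_or_None) tuples into a bulk body."""
    lines = []
    for action, metadata, data in operations:
        lines.append(json.dumps({action: metadata}))
        if data is not None:            # delete has no data line
            lines.append(json.dumps(data))
    # One JSON string per line, with a newline after the last line too.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("delete", {"_index": "test_index", "_type": "test_type", "_id": "3"}, None),
    ("index", {"_index": "test_index", "_type": "test_type", "_id": "2"},
     {"test_field": "replaced test2"}),
])
```

The delete contributes one line and the index operation two, so this body has three lines.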

{
"error": {
"root_cause": [
{
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 3]"
}
],
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 3]"
},
"status": 500
}

{
"took": 41,
"errors": true,
"items": [
{
"delete": {
"found": true,
"_index": "test_index",
"_type": "test_type",
"_id": "10",
"_version": 3,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200
}
},
{
"create": {
"_index": "test_index",
"_type": "test_type",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
},
{
"create": {
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"status": 409,
"error": {
"type": "version_conflict_engine_exception",
"reason": "[test_type][2]: version conflict, document already exists (current version [1])",
"index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
"shard": "2",
"index": "test_index"
}
}
},
{
"index": {
"_index": "test_index",
"_type": "test_type",
"_id": "4",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
},
{
"index": {
"_index": "test_index",
"_type": "test_type",
"_id": "2",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false,
"status": 200
}
},
{
"update": {
"_index": "test_index",
"_type": "test_type",
"_id": "1",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200
}
}
]
}

In a bulk request, the failure of any one operation does not affect the others; the response reports the error for each failed item
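Since a bulk response can mix successes and failures, callers typically scan the items for an "error" entry. A sketch based on the response shape shown above; `failed_items` is a hypothetical helper and the sample response is trimmed down from that output.

```python
def failed_items(bulk_response):
    """Collect (action, id, status, error type) for every failed item."""
    failures = []
    for item in bulk_response["items"]:
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (action, result["_id"], result["status"], result["error"]["type"])
                )
    return failures


sample = {
    "errors": True,
    "items": [
        {"delete": {"_id": "10", "status": 200}},
        {"create": {"_id": "2", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
        {"index": {"_id": "4", "status": 201}},
    ],
}
failures = failed_items(sample)
```

The top-level "errors" flag says whether any scan is needed at all; the per-item entries say which operations to retry or report.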

POST /test_index/_bulk
{ "delete": { "_type": "test_type", "_id": "3" }}
{ "create": { "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_type": "test_type" }}
{ "test_field": "auto-generate id test" }
{ "index": { "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }

POST /test_index/test_type/_bulk
{ "delete": { "_id": "3" }}
{ "create": { "_id": "12" }}
{ "test_field": "test12" }
{ "index": { }}
{ "test_field": "auto-generate id test" }
{ "index": { "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }

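2. Optimal bulk size

A bulk request is loaded into memory, so if it is too large, performance actually drops. The best bulk size has to be found by experiment. A common starting point is 1000 to 5000 documents, increasing gradually. In terms of size, 5 to 15 MB is a good range.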
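One way to experiment with bulk size is to split the documents into batches capped by both count and serialized size. A minimal sketch; `chunk_for_bulk` and its default limits are assumptions chosen from the ranges mentioned above, and the byte count covers only the data lines, not the action lines.

```python
import json


def chunk_for_bulk(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of documents, each within the count and byte limits."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


batch_sizes = [len(b) for b in chunk_for_bulk([{"n": i} for i in range(2500)])]
```

Raising `max_docs` step by step while measuring throughput is the kind of experiment the paragraph above suggests.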

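Lecture 28

Course outline

1. Review so far

Lectures 1 to 8: a quick introduction to the most basic principles and operations
Lectures 9 to 13: after the introduction, a somewhat deeper look at the basic principles of ES as a distributed system
Lectures 14 to 27: operations on documents, with explanation and analysis

2. What is a distributed document store?

What have we been learning so far? To give an intuitive picture: we know that ES is distributed, along with some of the basic principles, and we have spent a good deal of time on document operations: create, read, update, delete. To sum it up in one sentence, we have now covered one of the most central functions of ES fairly completely.

Once Elasticsearch is running, its first and most central function is to act as a distributed document data store. ES is distributed, it handles document data, and it is a storage system.
Document data: ES can store and operate on JSON documents, and this is also the core data structure of ES.
Storage system: ES can store, query, create, update, and delete JSON documents. With these capabilities, ES can already be regarded as a NoSQL storage system.

Working with documents means treating ES as a NoSQL storage engine, a system that stores document-type data, and operating on the documents inside it.

Since ES can serve as a distributed document store, applications can be built on that basis.

What kinds of applications?

(1) Large data volumes: the distributed nature of ES helps you scale out quickly to hold large amounts of data
(2) Flexible data structures that may change at any time, with complex relationships between structures; a traditional database would be painful here because it would need a large number of tables
(3) Relatively simple operations on the data, such as basic CRUD, which the document operations covered earlier can handle
(4) NoSQL databases suit scenarios like the ones above

For example, consider some websites, ordinary e-commerce systems, or blog systems: the object model is fairly complex, but as an end-user site the functionality is not, just simple CRUD operations, and the data volume may be fairly large. In that case a NoSQL-style store like ES can be a better fit than a traditional, complex, feature-rich SQL relational database. Both performance and throughput may be better.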

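Lecture 29

Course outline

(1) What does routing a document to a shard mean?

Data routing: when a client creates a document, ES has to decide which shard of the index the document goes to. This process is called document routing.

(2) Routing algorithm: shard = hash(routing) % number_of_primary_shards

For example, an index has 3 primary shards: P0, P1, P2

Every create, read, update, or delete on a document carries a routing value, which defaults to the document's _id (either specified manually or auto-generated)
routing = _id; suppose _id=1

The routing value is passed to a hash function, which produces a hash of the routing value, say hash(routing) = 21
The hash is then taken modulo the number of primary shards of the index: 21 % 3 = 0
So this document is placed on P0.

The most important value in deciding which shard a document lives on is the routing value. It defaults to _id and can also be specified manually. The same routing value always produces the same hash value from the hash function.

Whatever the hash value is, the remainder modulo number_of_primary_shards always falls in the range 0 to number_of_primary_shards-1. Here: 0, 1, 2.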
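The formula is short enough to try out directly. In this sketch `zlib.crc32` is only a stand-in hash chosen because it ships with Python; it is an assumption for illustration and not the hash function ES uses, and `route_to_shard` is a hypothetical name.

```python
import zlib


def route_to_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with a stand-in hash."""
    hashed = zlib.crc32(str(routing).encode("utf-8"))   # deterministic, non-negative
    return hashed % number_of_primary_shards


shards_for_ids = {doc_id: route_to_shard(doc_id, 3) for doc_id in range(1, 11)}
worked_example = 21 % 3   # the hash value from the text, reduced to a shard number
```

Calling `route_to_shard` twice with the same routing value always gives the same shard, which is what lets a later read find the document that an earlier write stored.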

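(3) _id or custom routing value

The default routing value is _id
A routing value can also be specified manually when sending a request, for example PUT /index/type/id?routing=user_id

Specifying the routing value manually is useful: it guarantees that a given class of documents is always routed to the same shard, which helps with application-level load balancing and with improving batch read performance later on

(4) Why the number of primary shards cannot be changed

The number of primary shards cannot change because the routing algorithm depends on it. If the number of primary shards changed, the same routing value could map to a different shard, and a lookup could fail to find the document.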
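The effect is easy to see by routing the same ids with 3 and then 4 primary shards. As a sketch under the same kind of assumption, `zlib.crc32` is a stand-in hash rather than the one ES uses, and `route` is a hypothetical helper.

```python
import zlib


def route(routing, shards):
    """Stand-in routing: hash the routing value, then take the remainder."""
    return zlib.crc32(str(routing).encode("utf-8")) % shards


ids = range(1, 101)
# Documents whose computed shard differs once the shard count goes from 3 to 4.
moved = [doc_id for doc_id in ids if route(doc_id, 3) != route(doc_id, 4)]
```

Every id in `moved` was written to one shard under the old count, but a read under the new count would look on a different shard and come back empty.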

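Lecture 30

How document CRUD works internally

Course outline

(Create, update, and delete operations can only be handled by a primary shard, not by a replica shard. The primary shard processes the operation first and then syncs it to its replica shards.)

Keep the difference between node and shard clear

(1) The client picks a node and sends the request to it; this node is the coordinating node
(2) The coordinating node routes the document and forwards the request to the node that holds the corresponding primary shard
(3) The primary shard on that node processes the request and then syncs the data to the replica node
(4) Once the coordinating node sees that the primary node and all replica nodes have finished, it returns the response to the client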
