Dynamically Increasing the Number of Shards of an Index in Elasticsearch 7.x

Date: 2021-01-19

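Background

In older Elasticsearch versions (2.3, for example), the number of shards of an index was fixed once the index had been created. The only way to change it was to rebuild the data into a new index.

Starting with ES 6.1, the number of shards can be increased online with the split index API. (Note: the index must be write-blocked for the duration of the operation.)

Starting with ES 7.0, a split no longer requires the index.number_of_routing_shards setting.

For details, see the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html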

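The split process:

1. Create a new target index with the same definition as the source index, but with more primary shards.

2. Hard-link the segments from the source index into the target index. (If the file system does not support hard links, all segments are copied into the new index instead, which is a much more time-consuming process.)

3. Once the low-level files are in place, hash all documents again and delete the documents that belong to a different shard.

4. Recover the target index as though it were a closed index that had just been reopened.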
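To see why step 3 only has to delete documents locally, here is a minimal sketch of the routing rule described in the Elasticsearch documentation: shard = (hash(_routing) % number_of_routing_shards) / (number_of_routing_shards / number_of_primary_shards). Several things here are assumptions made for illustration: `cksum` stands in for the real hash function (Elasticsearch uses Murmur3), 1024 is an assumed number of routing shards, and `hash_id` and `shard_for` are hypothetical helper names.

```shell
#!/bin/sh
# Stand-in hash: the CRC printed by POSIX cksum. Elasticsearch itself uses
# Murmur3; any integer hash is enough to illustrate the arithmetic.
hash_id() {
  printf '%s' "$1" | cksum | cut -d ' ' -f 1
}

# shard_for ID PRIMARY_SHARDS [ROUTING_SHARDS]
# ROUTING_SHARDS defaults to 1024, an assumed value for this sketch.
shard_for() {
  routing_shards=${3:-1024}
  factor=$((routing_shards / $2))
  h=$(hash_id "$1")
  echo $(( (h % routing_shards) / factor ))
}

# Splitting 2 -> 8 shards (factor 4): a document in source shard s can only
# end up in one of the target shards 4*s .. 4*s+3.
for id in 11 22 33 44; do
  src=$(shard_for "$id" 2)
  dst=$(shard_for "$id" 8)
  echo "doc $id: source shard $src -> target shard $dst (parent $((dst / 4)))"
done
```

Under this rule every target shard draws its documents from exactly one source shard, so each target shard can start from hard-linked copies of its parent's segments and simply drop the documents that now belong to a sibling.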

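Why doesn't ES support incremental resharding?

Going from N shards to N + 1 shards, known as incremental resharding, is indeed a feature that many key-value stores support. Simply adding a new shard and pushing new data only to that shard is not an option: it would likely become an indexing bottleneck, and working out which shard a document belongs to from its _id, which is required for get, delete and update requests, would become quite complex. This means that existing data has to be rebalanced using a different hashing scheme.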

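The most common way for key-value stores to do this efficiently is consistent hashing. When the number of shards grows from N to N + 1, consistent hashing only needs to relocate 1/N of the keys. However, the unit of storage in Elasticsearch, the shard, is a Lucene index. Because of its search-oriented data structure, taking a significant portion of a Lucene index, even just 5% of the documents, deleting it and indexing it on another shard usually costs far more than the equivalent operation in a key-value store. As described in the previous section, this cost stays reasonable when the number of shards grows by a multiplicative factor: Elasticsearch can then perform the split locally, which in turn means the split can be done at the index level, rather than by reindexing the documents that need to move, and can use hard links for efficient file copying.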
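For a sense of scale, the sketch below counts how many hash values would change shard if placement were a plain `hash % N` and N grew by one. It illustrates naive modulo placement only, not Elasticsearch's routing and not consistent hashing; `moved_by_modulo` is a hypothetical helper.

```shell
#!/bin/sh
# moved_by_modulo N
# Walk one full period of hash values (N * (N + 1) of them) and count those
# whose shard differs between "hash % N" and "hash % (N + 1)".
moved_by_modulo() {
  n=$1
  total=$((n * (n + 1)))
  moved=0
  x=0
  while [ "$x" -lt "$total" ]; do
    if [ $((x % n)) -ne $((x % (n + 1))) ]; then
      moved=$((moved + 1))
    fi
    x=$((x + 1))
  done
  echo "$moved/$total"
}

moved_by_modulo 2   # prints 4/6
moved_by_modulo 4   # prints 16/20
```

With naive modulo placement, N out of every N + 1 hash values change shard, which is why either a scheme that moves far fewer keys or a local split by a multiplicative factor is needed.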

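For append-only data, more flexibility is available by creating a new index and pushing new data into it, while adding an alias that covers both the old and the new index for read operations. Assuming the old and new indices have M and N shards respectively, this has no overhead compared with searching a single index that has M + N shards.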
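The append-only pattern above could be set up roughly as follows. This is a sketch under stated assumptions: the index names logs-v1 and logs-v2, the alias name logs, and the helper `read_alias_body` are all hypothetical, and the cluster is assumed to listen on localhost:9200 as in the rest of this article.

```shell
#!/bin/sh
# read_alias_body OLD_INDEX NEW_INDEX ALIAS
# Print an _aliases request body that points ALIAS at both indices, so that
# searches through the alias cover old and new data.
read_alias_body() {
  cat <<EOF
{
  "actions": [
    { "add": { "index": "$1", "alias": "$3" } },
    { "add": { "index": "$2", "alias": "$3" } }
  ]
}
EOF
}

# Hypothetical names: logs-v1 (old, M shards) and logs-v2 (new, N shards).
read_alias_body logs-v1 logs-v2 logs

# To apply it against a running cluster:
# read_alias_body logs-v1 logs-v2 logs |
#   curl -s -X POST "http://localhost:9200/_aliases?pretty" \
#        -H 'Content-Type: application/json' -d @-
```

New documents are then indexed directly into the new index, because an alias that points at two indices without a designated write index rejects write requests.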

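Prerequisites for splitting an index:

1. The target index must not exist.

2. The source index must have fewer primary shards than the target index.

3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

4. The node handling the split must have enough free disk space to hold a second copy of the existing index.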
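Prerequisites 2 and 3 are simple arithmetic and can be checked before sending the request. The following is a minimal sketch with a hypothetical helper, `can_split`; it does not check whether the target index exists or whether enough disk space is available.

```shell
#!/bin/sh
# can_split SOURCE_SHARDS TARGET_SHARDS
# Succeeds (exit status 0) when TARGET_SHARDS is larger than SOURCE_SHARDS
# and an exact multiple of it, matching prerequisites 2 and 3.
can_split() {
  [ "$2" -gt "$1" ] && [ $(( $2 % $1 )) -eq 0 ]
}

for target in 3 4 8; do
  if can_split 2 "$target"; then
    echo "2 -> $target: allowed"
  else
    echo "2 -> $target: rejected"
  fi
done
```

For the 2-shard index used in the experiment below, a target of 3 is rejected, while 4 and 8 are allowed.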

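Procedure

The hands-on experiment follows.

Tip: the test machine has limited resources, so every index here is created with 0 replicas. In production, use at least 1 replica.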

Create an index with 2 primary shards and no replicas:

curl -s -X PUT "http://localhost:9200/twitter?pretty" -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0
  },
  "aliases": {
    "my_search_indices": {}
  }
}'

# index.number_of_shards: number of primary shards
# index.number_of_replicas: number of replica shards; one replica amounts to one additional full copy of the index
# aliases: defines the index alias "my_search_indices"

# Write a few test documents

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{
  "id": 11,
  "name": "lee",
  "age": "23"
}'
curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{
  "id": 22,
  "name": "amd",
  "age": "22"
}'

# Query the data

curl -s -X GET "http://localhost:9200/my_search_indices/_search" | jq .

Block writes on the index so that the split can be performed:

curl -s -X PUT "http://localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": true
  }
}'

# index.blocks.write: blocks writes; the index can still be read

# Try to write a document to confirm that the write block is in effect

curl -s -X PUT "http://localhost:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name": "amd",
  "age": "33"
}'

# The write is rejected

# Remove the alias from the twitter index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "twitter", "alias": "my_search_indices" } }
  ]
}'

curl -s -X GET "http://localhost:9200/_cat/aliases"

Alternative (use one approach or the other):

# Remove the index alias
curl -s -X DELETE "http://localhost:9200/twitter/_alias/my_search_indices"

curl -s -X GET "http://localhost:9200/_cat/aliases"

Run the split. The resulting index is named new_twitter and has 8 primary shards:

curl -s -X POST "http://localhost:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas": 0
  }
}'

# Add the alias to the new index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
  "actions": [
    { "add": { "index": "new_twitter", "alias": "my_search_indices" } }
  ]
}'

Alternative (use one approach or the other):

# Create the index alias
curl -s -X PUT "http://localhost:9200/new_twitter/_alias/my_search_indices"

Result (the response returned by the _split request):

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "new_twitter"
}

Additional notes:

Check the data in the new index; it can be read as usual:

curl -s -X GET "http://localhost:9200/my_search_indices/_search" | jq .

The progress of the split can be checked with the _cat/recovery API or in the cerebro UI:

curl -s -X GET "http://localhost:9200/_cat/recovery"
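Besides polling _cat/recovery, the cluster health API can block until the primary shards of the new index have been allocated, using its wait_for_status parameter. The sketch below only builds the URL; the `health_url` helper, the `ES_URL` variable and the 30s timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Base URL of the cluster; defaults to the address used in this article.
ES_URL=${ES_URL:-http://localhost:9200}

# health_url INDEX [STATUS] [TIMEOUT]
# Build a cluster health URL that blocks until INDEX reaches STATUS
# (default yellow: all primary shards allocated) or TIMEOUT expires.
health_url() {
  echo "$ES_URL/_cluster/health/$1?wait_for_status=${2:-yellow}&timeout=${3:-30s}"
}

health_url new_twitter

# Against a running cluster:
# curl -s "$(health_url new_twitter)"
# curl -s "$ES_URL/_cat/recovery/new_twitter?v"
```

The second commented command limits the _cat/recovery output to the new index and adds a header row, which is easier to read on a cluster with many indices.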

# Try writing to the new index; the write fails

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name": "amd",
  "age": "33"
}'
# The write is rejected: the new index is still write-blocked

# Re-enable writes on the index

curl -s -X PUT "http://localhost:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": false
  }
}'

# Write to the new index again; this time the write succeeds

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name": "amd",
  "age": "33"
}'

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name": "intel",
  "age": "4"
}'

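# At this point the old index is still read-only. Once the new index is confirmed to be working, the old twitter index can be closed or deleted.

Test writing new data through the alias:

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name": "amd",
  "age": "44"
}'

The write succeeds as well.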

Delete the index

curl -s -X DELETE "http://localhost:9200/new_twitter"

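Summary

In a screenshot of the index taken after running this in production, each shard of the new index is only about half the size of a shard in the old index, which spreads the load on the index.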
