1. Handling cluster health problems
When the cluster is in yellow or red status, the overall handling steps are as follows:
(1) First, check the cluster status:
localhost:9200/_cluster/health?pretty
{
"cluster_name": "elasticsearch",
"status": "yellow",
"timed_out": false,
"number_of_nodes": 1,
"number_of_data_nodes": 1,
"active_primary_shards": 278,
"active_shards": 278,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 278,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 50
}
The main metric to watch is unassigned_shards: these are shards that exist in the cluster state but cannot actually be found in the cluster. The usual source of unassigned shards is unassigned replicas. For example, an index with 5 shards and 1 replica has 5 unassigned replica shards on a single-node cluster (a minimal sketch reproducing this case follows the metric list below). If your cluster is red, it will also hold unassigned shards long-term, because its primary shards are missing. The other metrics are:
(1) initializing_shards is the number of shards that have just been created. For example, when you first create an index, its shards briefly sit in the initializing state. This is normally a transient event; shards should not stay in initializing for long. You may also see initializing shards right after a node restarts: as shards are loaded from disk, they start out in the initializing state.
(2) number_of_nodes and number_of_data_nodes are entirely self-descriptive.
(3) active_primary_shards is the number of primary shards in the cluster, summed across all indices.
(4) active_shards is the total count of all shards across all indices, including replica shards.
(5) relocating_shards shows the number of shards currently moving from one node to another. It should normally be 0, but it rises when Elasticsearch decides the cluster is unbalanced, for example after a new node is added or a node is taken offline.
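To make the single-node replica case mentioned above concrete, here is a minimal sketch. It assumes a one-node cluster and uses a made-up index name my-test; on Elasticsearch 6.x and later the curl calls also need -H 'Content-Type: application/json'.
# Create an index with 5 primary shards and 1 replica (my-test is only an example name)
curl -XPUT 'localhost:9200/my-test?pretty' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'
# With only one node the 5 replica copies have nowhere to be placed,
# so cluster health reports status=yellow and unassigned_shards=5
curl -XGET 'localhost:9200/_cluster/health?pretty'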
(2) Find the problem index
curl -XGET 'localhost:9200/_cluster/health?level=indices'
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 278,
  "active_shards": 278,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 278,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50,
  "indices": {
    "gaczrk": {
      "status": "yellow", "number_of_shards": 5, "number_of_replicas": 1,
      "active_primary_shards": 5, "active_shards": 5,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 5
    },
    "special-sms-extractor_zhuanche_20200204": {
      "status": "yellow", "number_of_shards": 5, "number_of_replicas": 1,
      "active_primary_shards": 5, "active_shards": 5,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 5
    },
    "specialhtl201905": {
      "status": "yellow", "number_of_shards": 1, "number_of_replicas": 1,
      "active_primary_shards": 1, "active_shards": 1,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1
    },
    "v2": {
      "status": "red", "number_of_shards": 10, "number_of_replicas": 1,
      "active_primary_shards": 0, "active_shards": 0,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 20
    },
    "sms20181009": {
      "status": "yellow", "number_of_shards": 5, "number_of_replicas": 1,
      "active_primary_shards": 5, "active_shards": 5,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 5
    },
    ......
This parameter makes the cluster-health API add a list of indices to the cluster information, along with details about each index (status, shard counts, unassigned shard count, and so on). Once we ask for per-index output, it is immediately clear which index has the problem: the v2 index. We can also see that this index once had 10 primary shards and one replica, and that all 20 of those shards are now gone. Presumably these 20 shards were on two nodes that have dropped out of our cluster. In general, Elasticsearch allocates shards to nodes on its own; first check whether that feature is enabled:
curl -XGET 'localhost:9200/_cluster/settings?pretty'
{
  "persistent": {},
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "all"
        }
      }
    }
  }
}
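If the output shows allocation switched off (for example "enable": "none"), it can be re-enabled. A minimal sketch using the standard cluster.routing.allocation.enable setting (on newer versions add -H 'Content-Type: application/json'):
# Re-enable shard allocation for all shard types
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'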
The level parameter also accepts other options:
localhost:9200/_cluster/health?level=shards
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 278,
  "active_shards": 278,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 278,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 50,
  "indices": {
    "gaczrk": {
      "status": "yellow", "number_of_shards": 5, "number_of_replicas": 1,
      "active_primary_shards": 5, "active_shards": 5,
      "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 5,
      "shards": {
        "0": { "status": "yellow", "primary_active": true, "active_shards": 1,
               "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1 },
        "1": { "status": "yellow", "primary_active": true, "active_shards": 1,
               "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1 },
        "2": { "status": "yellow", "primary_active": true, "active_shards": 1,
               "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1 },
        "3": { "status": "yellow", "primary_active": true, "active_shards": 1,
               "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1 },
        "4": { "status": "yellow", "primary_active": true, "active_shards": 1,
               "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 1 }
      }
    },
    ......
The shards option produces a far more detailed output, listing the status and location of every shard of every index. This output is sometimes useful, but it is so verbose that it can be hard to work with.
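One way to keep the shard-level view manageable is to request health for a single index only; the health API accepts an index name in the path. A small sketch against the gaczrk index from the examples above:
curl -XGET 'localhost:9200/_cluster/health/gaczrk?level=shards&pretty'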
(3) Manually allocate unassigned shards
List the unassigned shards, the nodes involved, and the reason they are unassigned:
localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
index  shard prirep state      unassigned.reason
gaczrk 4     p      STARTED
gaczrk 4     r      UNASSIGNED CLUSTER_RECOVERED
gaczrk 2     p      STARTED
gaczrk 2     r      UNASSIGNED CLUSTER_RECOVERED
gaczrk 1     p      STARTED
The unassigned reasons are:
INDEX_CREATED: unassigned as a result of the index-creation API.
CLUSTER_RECOVERED: unassigned as a result of a full cluster recovery.
INDEX_REOPENED: unassigned as a result of opening or closing an index.
DANGLING_INDEX_IMPORTED: unassigned as a result of importing a dangling index.
NEW_INDEX_RESTORED: unassigned as a result of restoring into a new index.
EXISTING_INDEX_RESTORED: unassigned as a result of restoring into a closed index.
REPLICA_ADDED: unassigned as a result of explicitly adding a replica.
ALLOCATION_FAILED: unassigned because the shard allocation failed.
NODE_LEFT: unassigned because the node hosting the shard left the cluster.
REINITIALIZED: unassigned when a shard moves from started back to initializing (for example, with shadow replicas).
REROUTE_CANCELLED: allocation cancelled as a result of an explicit cancel reroute command.
REALLOCATED_REPLICA: a better replica location was identified, so the existing replica allocation was cancelled and the shard became unassigned.
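On Elasticsearch 5.0 and later, the allocation explain API gives a much more detailed account of why a particular shard stays unassigned; older versions without this API can rely on the unassigned.reason column above. A sketch with example index and shard values (add -H 'Content-Type: application/json' on 6.x and later):
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -d '{
  "index": "gaczrk",
  "shard": 4,
  "primary": false
}'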
Then run a reroute command to allocate the shard manually (here gaczrk is the index name, 4 is the shard number, and node should be the id of another node):
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [{
    "allocate": {
      "index": "gaczrk",
      "shard": 4,
      "node": "<id-of-another-node>",
      "allow_primary": true
    }
  }]
}'
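Note that on Elasticsearch 5.x and later the plain allocate command no longer exists; it was split into allocate_replica, allocate_stale_primary, and allocate_empty_primary. A hedged sketch for a newer cluster (allocate_empty_primary discards whatever data the shard still had, hence accept_data_loss; the node name is an example):
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [{
    "allocate_empty_primary": {
      "index": "gaczrk",
      "shard": 4,
      "node": "node-1",
      "accept_data_loss": true
    }
  }]
}'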
If there are many unassigned shards, a script like the following can assign them automatically:
#!/bin/bash
# Round-robin unassigned shards across the nodes listed in the array
array=( node1 node2 node3 )
node_counter=0
length=${#array[@]}
IFS=$'\n'
for line in $(curl -s 'http://127.0.0.1:9200/_cat/shards' | fgrep UNASSIGNED); do
  INDEX=$(echo $line | awk '{print $1}')
  SHARD=$(echo $line | awk '{print $2}')
  NODE=${array[$node_counter]}
  echo $NODE
  curl -XPOST 'http://127.0.0.1:9200/_cluster/reroute' -d '{
    "commands": [{
      "allocate": {
        "index": "'$INDEX'",
        "shard": '$SHARD',
        "node": "'$NODE'",
        "allow_primary": true
      }
    }]
  }'
  node_counter=$(( (node_counter + 1) % length ))
done
(4) Quickly recover replica shards
In the output of the commands above, if all primary shards are healthy but all replica shards have problems, there is a quick way to recover: forcibly remove the replica shards and let Elasticsearch regenerate them on its own. First set the replica count of the problem index to 0:
curl -XPUT 'localhost:9200/<problem-index>/_settings?pretty' -d '{
  "index": {
    "number_of_replicas": 0
  }
}'
Then watch the cluster status, and finally restore the index's replica count with:
curl -XPUT 'localhost:9200/<problem-index>/_settings?pretty' -d '{
  "index": {
    "number_of_replicas": 1
  }
}'
After the nodes finish allocating the shards automatically, the cluster recovers to green.
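Rather than polling by hand, the health API can block until the desired status is reached; a small sketch using the standard wait_for_status and timeout parameters:
curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=60s&pretty'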
(5) A cluster shard stuck in the INITIALIZING state
curl -XGET 'localhost:9200/_cat/shards/7a_cool?v&pretty'
7a_cool 5  r STARTED      4583018 759.4mb 10.2.4.21 pt01-pte-10-2-4-21
7a_cool 17 r INITIALIZING                           10.2.4.22 pt01-pte-10-2-4-22   <== abnormal shard
Solution:
1) First stop the es service on the host holding the abnormal shard:
log in to host pt01-pte-10-2-4-22 and run /etc/init.d/elasticsearch stop
If the shard then migrates to another host automatically and the status recovers, the cluster is healthy again. If the shard is still stuck in the initializing state, the problem persists; in that case run the manual allocation command from above. If the problem still remains, set the replica count of the problem index to 0 and let the cluster redistribute the shards on its own; once the adjustment finishes, the cluster status becomes green.
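When a shard sits in INITIALIZING (or a restarted node is recovering) for a long time, the cat recovery API shows how far along each shard copy is and where it is pulling data from. A sketch against the index from the example above:
curl -XGET 'localhost:9200/_cat/recovery/7a_cool?v'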