1. Checklist for troubleshooting a cluster that is not green
1.1 What the cluster health colors mean
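- Red: at least one primary shard is unassigned;
- Yellow: all primary shards are assigned, but at least one replica shard is unassigned;
- Green: all primary and replica shards are assigned.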
1.2 Troubleshooting in practice
1.2.1 Check the cluster health
GET _cluster/health
Example response: "status" : "red" means the cluster is red, i.e. at least one primary shard is unassigned.
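To make the color-to-meaning mapping concrete, here is a minimal Python sketch that summarizes a `_cluster/health` response. `status` and `unassigned_shards` are real response fields; the sample body is a hypothetical, trimmed response, and `explain_health` is an illustrative helper, not part of any client library.

```python
import json

# Meaning of each health color, as listed in 1.1.
MEANINGS = {
    "green": "all primary and replica shards are assigned",
    "yellow": "all primaries assigned, at least one replica unassigned",
    "red": "at least one primary shard is unassigned",
}


def explain_health(body: str) -> str:
    """Return a one-line summary of a GET _cluster/health JSON response."""
    health = json.loads(body)
    status = health["status"]
    return f"{status}: {MEANINGS[status]} (unassigned_shards={health['unassigned_shards']})"


# Hypothetical response, trimmed to the fields used above.
sample = '{"cluster_name": "demo", "status": "red", "unassigned_shards": 2}'
print(explain_health(sample))
```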
1.2.2 Which index is red or yellow?
GET _cluster/health?level=indices
A more direct way is to filter the index list by health:
GET /_cat/indices?v&health=yellow
GET /_cat/indices?v&health=red
This pinpoints the affected indices.
1.2.3 Which shard of the index is red or yellow?
GET _cluster/health?level=shards
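With level=shards the response nests an `indices` object keyed by index name, each holding a `shards` object keyed by shard number with its own `status`. The Python sketch below walks that structure and lists every shard that is not green; the sample body is hypothetical and trimmed, and `non_green_shards` is an illustrative helper.

```python
import json


def non_green_shards(body: str) -> list:
    """List (index, shard number, status) for every non-green shard in a
    GET _cluster/health?level=shards response."""
    health = json.loads(body)
    result = []
    for index_name, index_info in health.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            if shard_info["status"] != "green":
                result.append((index_name, int(shard_id), shard_info["status"]))
    return sorted(result)


# Hypothetical, trimmed response with one red and one yellow shard.
sample = json.dumps({
    "status": "red",
    "indices": {
        "logs": {"status": "red", "shards": {
            "0": {"status": "green", "primary_active": True},
            "1": {"status": "red", "primary_active": False},
        }},
        "users": {"status": "yellow", "shards": {
            "0": {"status": "yellow", "primary_active": True},
        }},
    },
})
print(non_green_shards(sample))
```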
1.2.4 What caused the cluster to turn red or yellow?
GET _cluster/allocation/explain
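Key parts of an example response (abridged):
"current_state" : "unassigned",
"unassigned_info" : {
  "reason" : "INDEX_CREATED",
  "at" : "2020-01-29T07:32:39.041Z",
  "last_allocation_status" : "no"
},
"explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"hot"]"""
Here current_state "unassigned" means the shard is not allocated, and unassigned_info.reason "INDEX_CREATED" means it became unassigned when the index was created.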
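Root cause: the shard's allocation filter (index.routing.allocation.require with box_type "hot") matches no node's box_type attribute. Once the root cause is found, the fix follows directly.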
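Called without a body, the allocation explain API explains the first unassigned shard it finds; a body with `index`, `shard` and `primary` targets one specific shard. The Python sketch below builds such a body and pulls every "NO" decider explanation out of a response. The response shape (`node_allocation_decisions`, `deciders`) follows the 7.x API as far as I know, but the sample body, index name and node name are hypothetical, and both helpers are illustrative.

```python
import json


def explain_request(index: str, shard: int, primary: bool) -> str:
    """Body for GET _cluster/allocation/explain targeting one shard."""
    return json.dumps({"index": index, "shard": shard, "primary": primary})


def blocking_reasons(body: str) -> list:
    """Collect 'node: decider: explanation' for every NO decision."""
    resp = json.loads(body)
    reasons = []
    for node in resp.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                reasons.append(f"{node['node_name']}: {decider['decider']}: {decider['explanation']}")
    return reasons


# Hypothetical response mirroring the example above.
sample_response = json.dumps({
    "index": "my-index",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {"reason": "INDEX_CREATED", "last_allocation_status": "no"},
    "can_allocate": "no",
    "node_allocation_decisions": [{
        "node_name": "node-1",
        "deciders": [{
            "decider": "filter",
            "decision": "NO",
            "explanation": 'node does not match index setting [index.routing.allocation.require] filters [box_type:"hot"]',
        }],
    }],
})

print(explain_request("my-index", 0, True))
for line in blocking_reasons(sample_response):
    print(line)
```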
1.3 Further thought: besides this case, what other reasons can leave a shard in "current_state" : "unassigned"?
In practice:
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/cat-shards.html
Unassigned reasons explained:
(1) INDEX_CREATED: Unassigned as a result of an API creation of an index.
(2) CLUSTER_RECOVERED: Unassigned as a result of a full cluster recovery.
(3) INDEX_REOPENED: Unassigned as a result of opening a closed index.
(4) DANGLING_INDEX_IMPORTED: Unassigned as a result of importing a dangling index.
(5) NEW_INDEX_RESTORED: Unassigned as a result of restoring into a new index.
(6) EXISTING_INDEX_RESTORED: Unassigned as a result of restoring into a closed index.
(7) REPLICA_ADDED: Unassigned as a result of explicit addition of a replica.
(8) ALLOCATION_FAILED: Unassigned as a result of a failed allocation of the shard.
(9) NODE_LEFT: Unassigned as a result of the node hosting it leaving the cluster.
(10) REROUTE_CANCELLED: Unassigned as a result of explicit cancel reroute command.
(11) REINITIALIZED: When a shard moves from started back to initializing, for example, with shadow replicas.
(12) REALLOCATED_REPLICA: A better replica location is identified and causes the existing replica allocation to be cancelled.
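On a large cluster it helps to tally the reasons rather than read the `_cat/shards` output line by line. A minimal Python sketch, assuming the request from above without `?v` (so there is no header row) and a hypothetical sample of its output; started shards simply have an empty `unassigned.reason` column.

```python
from collections import Counter


def unassigned_by_reason(cat_output: str) -> Counter:
    """Count UNASSIGNED shards per unassigned.reason in the text output of
    GET _cat/shards?h=index,shard,prirep,state,unassigned.reason"""
    counts = Counter()
    for line in cat_output.splitlines():
        cols = line.split()
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            reason = cols[4] if len(cols) > 4 else "UNKNOWN"
            counts[reason] += 1
    return counts


# Hypothetical output: one started shard and three unassigned ones.
sample = """\
logs  0 p STARTED
logs  0 r UNASSIGNED NODE_LEFT
logs  1 p UNASSIGNED INDEX_CREATED
users 0 r UNASSIGNED NODE_LEFT
"""
print(unassigned_by_reason(sample).most_common())
```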
2. Moving shards between nodes
Use case: manually relocating a shard, i.e. moving a started shard from one node to another.
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "indexname",
        "shard": 1,
        "from_node": "source_node",
        "to_node": "target_node"
      }
    }
  ]
}
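When several shards need to move, generating the `commands` array is less error-prone than hand-editing JSON. A minimal Python sketch; the node names are hypothetical. Appending `?dry_run` to the reroute request should let you validate the commands without applying them, and the move can still be refused by the allocation deciders.

```python
import json


def move_commands(moves):
    """Build a POST /_cluster/reroute body from
    (index, shard, from_node, to_node) tuples, one "move" command each."""
    return {
        "commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
            for index, shard, from_node, to_node in moves
        ]
    }


# "node-1" and "node-2" are hypothetical node names.
body = move_commands([("indexname", 1, "node-1", "node-2")])
print(json.dumps(body, indent=2))
```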
3. Gracefully decommissioning a node
Use case: take a node out of service gracefully while keeping the cluster green.
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "122.5.3.55"
  }
}
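The exclude setting only starts the draining; the node is safe to stop once no shards remain on it. A minimal Python sketch, assuming you poll `GET _cat/shards?h=index,shard,prirep,state,ip` (no `?v` header) and count the rows still on the excluded IP; the sample output is hypothetical. Setting the value back to null after the node is gone resets the exclusion.

```python
import json

EXCLUDE_KEY = "cluster.routing.allocation.exclude._ip"


def exclude_body(ip):
    """Settings body that moves all shards off `ip`.
    Passing None produces a JSON null, which resets the setting."""
    return {"transient": {EXCLUDE_KEY: ip}}


def shards_on_ip(cat_output, ip):
    """Count shards still on `ip` in the text output of
    GET _cat/shards?h=index,shard,prirep,state,ip"""
    count = 0
    for line in cat_output.splitlines():
        cols = line.split()
        # Unassigned shards have no ip column and are skipped here.
        if len(cols) >= 5 and cols[4] == ip:
            count += 1
    return count


# Hypothetical output while the node is still draining.
sample = """\
logs  0 p STARTED    122.5.3.55
logs  0 r STARTED    122.5.3.56
logs  1 p STARTED    122.5.3.55
users 0 r UNASSIGNED
"""
print(shards_on_ip(sample, "122.5.3.55"))
print(json.dumps(exclude_body(None)))
```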
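4. Forcing a flush
Use case: flushing an index ensures that any data currently stored only in the transaction log (translog) is also permanently stored in the Lucene index.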
POST /_flush
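Note: starting with 7.6, a normal flush has the same effect as the synced flush of earlier versions; synced flush is deprecated and is slated for removal in 8.x.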
POST /_flush/synced
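5. Changing the number of concurrent shard rebalances
Use case:
Controls how many shards may be rebalanced concurrently across the whole cluster. The default is 2.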
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}
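6. Changing the number of shards each node recovers concurrently
Use case:
When a node disconnects from the cluster, all of its shards become unassigned. After a certain delay, they are allocated elsewhere. This setting determines how many shards each node recovers concurrently.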
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 6
  }
}
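7. Adjusting recovery speed
Use case:
To avoid overloading the cluster, Elasticsearch throttles the bandwidth available to recovery (indices.recovery.max_bytes_per_sec, 40mb per node by default). You can raise this setting, carefully, to make recovery faster.
If the value is set too high, ongoing recoveries may consume too much bandwidth and other resources, which can destabilize the cluster.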
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "80mb"
  }
}
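A quick calculation shows what the throttle means in wall-clock time. A minimal Python sketch, assuming Elasticsearch's binary byte units (1kb = 1024 bytes) and integer sizes only; it gives a lower bound for copying one shard, because the limit is per node and is shared by all recoveries running on it. The 50gb shard size is a hypothetical example.

```python
import re

# Elasticsearch byte-size units are powers of 1024.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_byte_size(value: str) -> int:
    """Parse an integer size string such as '80mb' into bytes."""
    m = re.fullmatch(r"(\d+)\s*(b|kb|mb|gb|tb)", value.strip().lower())
    if not m:
        raise ValueError(f"unsupported size: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def min_recovery_seconds(shard_size: str, max_bytes_per_sec: str) -> float:
    """Lower bound on the time to copy one shard at the throttled rate."""
    return parse_byte_size(shard_size) / parse_byte_size(max_bytes_per_sec)


print(min_recovery_seconds("50gb", "40mb"))  # default rate
print(min_recovery_seconds("50gb", "80mb"))  # rate set above
```

Doubling the rate from 40mb to 80mb halves the floor for a 50gb shard from 1280 to 640 seconds.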
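8. Clearing caches
Use case: if a node's JVM heap usage is high, you can call this API to make Elasticsearch clear its index caches.
This degrades performance, since the caches have to be rebuilt, but it can get you out of OOM (out-of-memory) trouble.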
POST /_cache/clear
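9. Tuning circuit breakers
Use case: to avoid OOM in Elasticsearch, you can tune the circuit breaker settings. This caps the memory searches may use and rejects any search estimated to need more memory than the allowed limit.
Note: this is a very delicate setting that needs careful calibration.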
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "40%"
  }
}
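A percentage limit is taken relative to the JVM heap, so it helps to work out the absolute figure before calibrating. A minimal Python sketch; the 10 GiB heap is a hypothetical example, and the helper only handles percentage values.

```python
def breaker_limit_bytes(heap_bytes: int, limit: str) -> int:
    """Convert a percentage breaker setting such as '40%' into bytes of heap."""
    if not limit.endswith("%"):
        raise ValueError("this sketch only handles percentage values")
    return int(heap_bytes * float(limit[:-1]) / 100)


heap = 10 * 1024**3  # hypothetical 10 GiB heap
print(breaker_limit_bytes(heap, "40%") / 1024**3)  # limit in GiB
```

With a 10 GiB heap, "40%" leaves about 4 GiB before the total breaker starts rejecting requests.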
10. Cluster migration
Use case: migrating cluster data, index data, and so on.
Option 1: reindex part or all of an index's data
POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
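To copy only part of an index, `source` accepts a `query`; to pull from another cluster, it accepts a `remote` block, which requires the remote host to be listed in `reindex.remote.whitelist` in the destination cluster's elasticsearch.yml. A minimal Python sketch that builds such a body; the `@timestamp` field and the `old-cluster` host are hypothetical.

```python
import json
from typing import Optional


def reindex_body(src_index: str, dest_index: str,
                 query: Optional[dict] = None,
                 remote_host: Optional[str] = None) -> dict:
    """Build a POST _reindex body.
    query: optional Query DSL to copy only part of the source index.
    remote_host: optional source cluster URL for reindex-from-remote."""
    source = {"index": src_index}
    if query is not None:
        source["query"] = query
    if remote_host is not None:
        source["remote"] = {"host": remote_host}
    return {"source": source, "dest": {"index": dest_index}}


# Copy only the last 7 days from an older cluster (hypothetical field and host).
body = reindex_body(
    "my-index-000001", "my-new-index-000001",
    query={"range": {"@timestamp": {"gte": "now-7d/d"}}},
    remote_host="http://old-cluster:9200",
)
print(json.dumps(body, indent=2))
```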
Option 2: migrate indices or a whole cluster with third-party tools
- elasticdump
- elasticsearch-migration
Under the hood, these tools are implemented with scroll + bulk.
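The scroll + bulk pattern reads the source page by page (POST /my-index/_search?scroll=1m, then POST /_search/scroll with the returned scroll_id) and writes each page to the target with _bulk. The Python sketch below shows only the conversion step, turning one page of scroll hits into an NDJSON _bulk body; the hits and the index name are hypothetical, and no network calls are made.

```python
import json


def hits_to_bulk(hits, dest_index):
    """Turn scroll hits (dicts with _id and _source) into an NDJSON _bulk body."""
    if not hits:
        return ""
    lines = []
    for hit in hits:
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


page = [{"_id": "1", "_source": {"title": "a"}},
        {"_id": "2", "_source": {"title": "b"}}]
print(hits_to_bulk(page, "new-index"), end="")
```

Keeping each document's `_id` makes the copy idempotent: re-sending a page overwrites rather than duplicates.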
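11. Cluster backup and restore
Use case: high-availability workloads that need regular incremental and full backups to fall back on in an emergency.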
Create a snapshot:
PUT /_snapshot/my_backup/snapshot_hamlet_index?wait_for_completion=true
{
  "indices": "hamlet_*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "mingyi",
    "taken_because": "backup before upgrading"
  }
}
Restore from it:
POST /_snapshot/my_backup/snapshot_hamlet_index/_restore
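Two practical caveats: the repository (my_backup) must be registered first, e.g. a shared-filesystem repository whose location is listed in path.repo; and a restore fails if an open index with the same name already exists. One way around the second is to restore under new names with `rename_pattern` / `rename_replacement`. A minimal Python sketch that builds such a restore body and previews the renaming locally; the `restored_hamlet_` prefix and index names are hypothetical, and the Java-style `$1` group references are converted to Python syntax only for the preview.

```python
import re


def restore_body(indices, rename_pattern, rename_replacement):
    """Body for POST /_snapshot/<repo>/<snapshot>/_restore that restores
    indices under new names so they do not collide with open indices."""
    return {
        "indices": indices,
        "include_global_state": False,
        "rename_pattern": rename_pattern,
        "rename_replacement": rename_replacement,
    }


def preview_rename(names, pattern, replacement):
    """Preview the renaming locally: convert Java-style "$1" group
    references into Python's "\\g<1>" form, then apply re.sub."""
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return [re.sub(pattern, py_replacement, name) for name in names]


body = restore_body("hamlet_*", "hamlet_(.+)", "restored_hamlet_$1")
print(preview_rename(["hamlet_act1", "hamlet_act2"],
                     body["rename_pattern"], body["rename_replacement"]))
```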