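1. Checklist for troubleshooting a non-green cluster

1.1 What the cluster status means

Red: at least one primary shard is unassigned;
Yellow: at least one replica shard is unassigned;
Green: all primary and replica shards are assigned.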
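
The three rules above can be written down as a small function. This is only a sketch of the definition, not Elasticsearch's own implementation: the helper name `cluster_color` and the `(kind, state)` input shape are assumptions made for illustration, and only the `UNASSIGNED` state is considered.

```python
def cluster_color(shards):
    """Derive a health colour from (kind, state) pairs.

    kind is "p" for a primary or "r" for a replica; state is a shard
    state such as "STARTED" or "UNASSIGNED". Hypothetical helper that
    mirrors the red/yellow/green definition in the text.
    """
    unassigned_primary = any(k == "p" and s == "UNASSIGNED" for k, s in shards)
    unassigned_replica = any(k == "r" and s == "UNASSIGNED" for k, s in shards)
    if unassigned_primary:
        return "red"      # at least one primary shard is unassigned
    if unassigned_replica:
        return "yellow"   # primaries are fine, but a replica is unassigned
    return "green"        # every primary and replica is assigned
```

Red takes precedence over yellow, so a cluster with both an unassigned primary and an unassigned replica reports red.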

1.2 Troubleshooting in practice

1.2.1 Check the cluster status

GET _cluster/health

Example response: "status" : "red" means red, that is, at least one primary shard is unassigned.

1.2.2 Which index has the red or yellow problem?

GET _cluster/health?level=indices

The following approach is quicker and more direct:

GET /_cat/indices?v&health=yellow
GET /_cat/indices?v&health=red

This gives you the affected index.

1.2.3 Which shard of the index has the red or yellow problem?

GET _cluster/health?level=shards
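
The shard-level response is deeply nested, so it can help to flatten it. The sketch below assumes the response has already been parsed into a Python dict with the usual `indices` -> `shards` -> `status` layout; the function name `non_green_shards` is hypothetical.

```python
def non_green_shards(health):
    """List (index, shard number, status) for every yellow or red shard.

    `health` is assumed to be the parsed JSON body returned by
    GET _cluster/health?level=shards.
    """
    problems = []
    for index_name, index_info in health.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            status = shard_info.get("status")
            if status in ("yellow", "red"):
                problems.append((index_name, int(shard_id), status))
    return sorted(problems)
```

For example, a response in which shard 1 of an index called `logs` is yellow yields `[("logs", 1, "yellow")]`.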

1.2.4 What caused the cluster to turn red or yellow?

GET _cluster/allocation/explain

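Reading the key fields of the response, for example:

"current_state" : "unassigned",  // the shard is not allocated
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",  // reason: raised while the index was being created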
    "at" : "2020-01-29T07:32:39.041Z",
    "last_allocation_status" : "no"
  },
  "explanation" : """node does not match index setting [index.routing.allocation.require] filters [box_type:"hot"]"""
        }

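Root cause: the index requires nodes tagged box_type: hot through index.routing.allocation.require, and the node does not match that filter. Once the root cause is known, the fix follows directly.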
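
To make the mismatch concrete, here is a deliberately simplified model of a `require` filter check. It is not how Elasticsearch implements the decider: it only does exact string comparison, whereas real filters also accept wildcards and comma-separated values. The function name and the dict shapes are assumptions.

```python
def unmet_require_filters(node_attrs, require):
    """Return the required attributes that a node does not satisfy.

    node_attrs: custom attributes of one node, e.g. {"box_type": "warm"}
    require:    the index.routing.allocation.require.* settings of the
                index, e.g. {"box_type": "hot"}
    Simplified illustration: exact matches only.
    """
    return {
        attr: wanted
        for attr, wanted in require.items()
        if node_attrs.get(attr) != wanted
    }
```

An empty result means the node passes the filter. In the case above the result would be `{"box_type": "hot"}`, which points at the two usual ways out: tag a node with the required attribute, or relax the index setting.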

1.3 Further thought: besides the case above, what other reasons lead to "current_state" : "unassigned"?

In practice:

GET _cat/shards?h=index,shard,prirep,state,unassigned.reason

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/cat-shards.html

The unassigned reasons and what they mean:

(1) INDEX_CREATED
Unassigned as a result of an API creation of an index.
(2) CLUSTER_RECOVERED
Unassigned as a result of a full cluster recovery.
(3) INDEX_REOPENED
Unassigned as a result of opening a closed index.
(4) DANGLING_INDEX_IMPORTED
Unassigned as a result of importing a dangling index.
(5) NEW_INDEX_RESTORED
Unassigned as a result of restoring into a new index.
(6) EXISTING_INDEX_RESTORED
Unassigned as a result of restoring into a closed index.
(7) REPLICA_ADDED
Unassigned as a result of explicit addition of a replica.
(8) ALLOCATION_FAILED
Unassigned as a result of a failed allocation of the shard.
(9) NODE_LEFT
Unassigned as a result of the node hosting it leaving the cluster.
(10) REROUTE_CANCELLED
Unassigned as a result of explicit cancel reroute command.
(11) REINITIALIZED
When a shard moves from started back to initializing, for example, with shadow replicas.
(12) REALLOCATED_REPLICA
A better replica location is identified and causes the existing replica allocation to be cancelled.
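
On a cluster with many shards it is handy to tally these reasons. The sketch below assumes the plain-text output of the `_cat/shards` request shown earlier (columns index, shard, prirep, state, unassigned.reason, without a header row); the helper name is hypothetical.

```python
from collections import Counter


def count_unassigned_reasons(cat_shards_text):
    """Count unassigned shards per reason from _cat/shards text output."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Assigned shards have an empty reason column, so they yield
        # only four fields and are skipped here.
        if len(fields) == 5 and fields[3] == "UNASSIGNED":
            counts[fields[4]] += 1
    return dict(counts)
```

A dominant `NODE_LEFT` count usually points at a lost node, while `ALLOCATION_FAILED` is worth following up with the allocation explain API.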

2. Moving shards between nodes

Use case: manually relocating a shard, that is, moving a started shard from one node to another.

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "indexname",
        "shard": 1,
        "from_node": "source-nodename",
        "to_node": "target-nodename"
      }
    }
  ]
}

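3. Gracefully taking a cluster node offline

Use case: take a node offline gracefully while keeping the cluster green. Excluding the node's IP makes Elasticsearch move its shards to other nodes; shut the node down once no shards remain on it.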

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "122.5.3.55"
  }
}
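
Before shutting the node down, it is worth confirming that it no longer holds any shards. The sketch below assumes you have fetched `GET _cat/shards?h=index,shard,prirep,state,ip` as plain text without a header row; the helper name is hypothetical, and the IP in the usage note is the one from the example above.

```python
def shards_remaining_on(cat_shards_text, ip):
    """Return (index, shard, prirep) for shards still located on `ip`."""
    remaining = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Unassigned shards have no ip column and are skipped.
        if len(fields) >= 5 and fields[4] == ip:
            remaining.append((fields[0], int(fields[1]), fields[2]))
    return remaining
```

When `shards_remaining_on(text, "122.5.3.55")` returns an empty list, the node has been drained and can be stopped. Remember to reset the exclude setting afterwards if the same IP will rejoin the cluster.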

4. Forcing a flush

Use case: a flush makes sure that data currently stored only in the transaction log is also permanently stored in the Lucene index.

POST /_flush

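Note: on 7.6 and later, a plain flush has the same effect as the older synced flush shown below, which is deprecated and slated for removal in 8.0.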

POST /_flush/synced

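5. Changing the number of concurrent shard rebalances

Use case:

Controls how many shard rebalancing operations may run concurrently across the cluster. The default is 2.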

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}

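6. Changing the number of shards each node recovers concurrently

Use case:

If a node disconnects from the cluster, all of its shards become unassigned. After a certain delay, those shards are allocated elsewhere. This setting determines how many shards each node may recover concurrently.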

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 6
  }
}

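7. Adjusting recovery speed

Use case:

To avoid overloading the cluster, Elasticsearch throttles the bandwidth given to recovery. You can raise this setting carefully to make recovery faster.

If the value is set too high, ongoing recoveries may consume too much bandwidth and other resources, which can destabilize the cluster.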

PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "80mb"
  }
}
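
A rough back-of-the-envelope estimate shows what the throttle means in practice. The sketch assumes the throttle is the only bottleneck and that `mb` means MiB (1024 * 1024 bytes), as Elasticsearch byte units do; the function name is hypothetical, and real recoveries are also limited by disk, network and concurrency.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimated_recovery_seconds(data_bytes, max_mib_per_sec):
    """Lower bound on recovery time when only the throttle limits it."""
    return data_bytes / (max_mib_per_sec * MIB)


# 100 GiB of shard data at the 80mb setting used above.
seconds_at_80 = estimated_recovery_seconds(100 * GIB, 80)
# The same data at 40mb, the documented default in the 7.x line.
seconds_at_40 = estimated_recovery_seconds(100 * GIB, 40)
```

Under these assumptions, 100 GiB takes about 1280 seconds (roughly 21 minutes) at 80mb and twice that at 40mb.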

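8. Clearing the caches on nodes

Use case: if a node's JVM heap usage gets high, you can call this API to make Elasticsearch clear its caches.

This lowers performance, but it can get you out of OOM (out of memory) trouble.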

POST /_cache/clear

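9. Adjusting the circuit breakers

Use case: to keep Elasticsearch from running into OOM, you can adjust the circuit breaker settings. This caps memory use and rejects any request, searches in particular, that is estimated to consume more memory than the configured limit.

Note: this is a very delicate setting that you need to calibrate carefully.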

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "40%"
  }
}
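
To see what a percentage limit means in bytes, the arithmetic can be sketched as follows. This is a simplified model of a breaker check, not Elasticsearch's implementation; both function names and the 10 GiB heap in the usage note are assumptions.

```python
GIB = 1024 ** 3


def breaker_limit_bytes(heap_bytes, limit_percent):
    """Convert a percentage limit such as "40%" into bytes of heap."""
    return heap_bytes * limit_percent // 100


def would_trip(current_bytes, requested_bytes, limit_bytes):
    """Simplified check: reject when the new estimate exceeds the limit."""
    return current_bytes + requested_bytes > limit_bytes
```

With a 10 GiB heap, a 40% limit is 4 GiB: a request estimated at 1 GiB is accepted while 3 GiB is in use, but rejected once usage has reached 3.5 GiB.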

10. Cluster migration

Use case: migrating cluster data, migrating index data, and similar tasks.

Option 1: reindex, for part or all of an index's data

POST _reindex
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
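
The request above copies the whole index. To copy only part of it, the reindex API accepts a `query` inside `source`. The sketch below builds such a body; the helper name is hypothetical, and the `user.id` field in the usage note is only an illustrative assumption.

```python
def reindex_body(source_index, dest_index, query=None):
    """Build a _reindex request body, optionally limited by a query."""
    source = {"index": source_index}
    if query is not None:
        # Only documents matching the query are copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}
```

For instance, `reindex_body("my-index-000001", "my-new-index-000001", {"term": {"user.id": "kimchy"}})` produces a body that copies only the documents matching the term query.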

Option 2: use third-party tools to migrate an index or a cluster

elasticdump
elasticsearch-migration

What these tools do under the hood: scroll + bulk.
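
The bulk half of that pattern is easy to illustrate: each page of documents read with scroll is turned into a newline-delimited body for `POST _bulk`. The sketch below shows only that conversion, with a hypothetical helper name; it does not reproduce what elasticdump or any other tool actually does internally.

```python
import json


def to_bulk_ndjson(index, docs):
    """Turn (doc_id, source) pairs into a _bulk request body.

    Each document becomes two lines: an action line and the document
    source. The bulk API requires the body to end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"
```

A migration loop would call this once per scroll page and send the result to the target cluster, repeating until the scroll returns no more hits.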

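11. Backing up and restoring cluster data

Use case: high-availability business scenarios that need regular incremental and full backups for emergencies.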

PUT /_snapshot/my_backup/snapshot_hamlet_index?wait_for_completion=true
{
  "indices": "hamlet_*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "mingyi",
    "taken_because": "backup before upgrading"
  }
}

POST /_snapshot/my_backup/snapshot_hamlet_index/_restore
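
The restore request above has no body, so it restores every index in the snapshot under its original name, which fails if an open index with that name already exists. One way around this is to restore under a new name. The sketch below builds such a body from the `indices`, `rename_pattern` and `rename_replacement` options of the restore API; the helper name and the `restored_` prefix are assumptions.

```python
def restore_body(indices, rename_prefix=None):
    """Build a _restore request body, optionally renaming the indices."""
    body = {"indices": indices, "include_global_state": False}
    if rename_prefix:
        # Capture the whole index name and put the prefix in front of it.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = rename_prefix + "$1"
    return body
```

With `restore_body("hamlet_*", "restored_")`, an index named `hamlet_1` in the snapshot would be restored as `restored_hamlet_1`, leaving the live index untouched.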

Elasticsearch operations in practice: a checklist of commonly used commands