Handling Common Elasticsearch Problems

Date: 2020-04-06


1. Checking cluster status

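From these two screens you can tell:
1.1 If heap is red, queries in the UI are slow. This is usually because many indices are open. You can enable automatic index closing in the Enterprise edition's log policy. Typically 7 days of indices are kept open; depending on available memory you can keep a few more days open, e.g. 10 or 15, but not too many.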
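1.2 If CPU and load are red, both UI queries and writes slow down. Collectors are hit especially hard, and their logs show timeout or bulk reject errors. Possible causes:
a) Shards of the event index currently receiving heavy writes (usually today's) are unevenly distributed, so one node holds many of them and is under heavy load
b) Disk write speed is too low
c) Shards are recovering or rebalancing
d) Too few CPU cores
e) Other CPU-intensive applications run on the same host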
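For cause a), you can count how many shards of the hot event index sit on each node using the `node` column of `_cat/shards`. This is a minimal sketch. `count_shards_per_node` is a hypothetical helper name, and the index name in the usage comment is only an example.

```shell
# Count shards per node from `_cat/shards?h=node` output on stdin, busiest node first.
# Unassigned shards have an empty node column and are skipped.
count_shards_per_node() {
  awk 'NF > 0 { count[$1]++ } END { for (n in count) print count[n], n }' |
    sort -k1,1nr -k2,2
}

# Usage (example index name):
# curl -s 'localhost:9200/_cat/shards/event_20190313?h=node' | count_shards_per_node
```

A node whose count is well above the others is the likely hotspot. Section 4 shows how to cap it.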
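1.3 If disk usage reaches 85%, index shards may no longer be allocated to that node. This puts more load on the other nodes and degrades cluster performance.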
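1.4 The node list shows whether any nodes are missing. If nodes are missing, there are two likely causes:
a) The node process died or failed to start
b) The cluster has split brain: one large cluster has become several smaller ones
1.5 The node marked with a solid star in the node list is the current master. When you need to check ES logs, start with this node's logs. Only if they reveal nothing, check the other nodes' logs.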
2. Viewing cluster settings
curl 'http://localhost:9200/_cluster/settings?pretty'
Or open this URL directly in a browser.
3. Preventing split brain
Configure 3 master-eligible nodes: node.master: true
Require at least 2 visible master-eligible nodes before a cluster can form: discovery.zen.minimum_master_nodes: 2
Configuration file: /opt/hansight/enterprise/elasticsearch/config/elasticsearch.yml
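The value 2 comes from the usual majority rule: with N master-eligible nodes, minimum_master_nodes should be floor(N / 2) + 1. Two halves of a partitioned cluster can then never both elect a master. Below is a minimal sketch of that arithmetic; `min_master_nodes` is a hypothetical helper name.

```shell
# Quorum of master-eligible nodes needed to avoid split brain:
# a strict majority, floor(N / 2) + 1 (shell integer division floors).
min_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

# min_master_nodes 3   # the 3-node setup above needs 2
```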

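4. Forcing event index shards to spread evenly across nodes
For example, take 6 data nodes and an event index of 12 primary shards plus 1 replica (24 shards in total). You can set the maximum number of shards per node to 5 or 4. Choosing 5 leaves room for all shards to be allocated even if one node dies.
a) Add the parameter "index.routing.allocation.total_shards_per_node": 5 to the event template.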

b) Apply it to existing indices (optional; if cluster rebalancing is enabled, this may trigger automatic shard rebalancing):
curl -XPUT '172.16.106.66:9200/event*/_settings' -d '{"index.routing.allocation.total_shards_per_node":5}'
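Both values in the example follow from one calculation. Divide the total number of shard copies by the number of nodes that must be able to hold them, and round up. This is a minimal sketch; `suggest_total_shards_per_node` and its optional "nodes allowed to fail" argument are hypothetical.

```shell
# Suggest index.routing.allocation.total_shards_per_node.
# Args: primaries replicas data_nodes [nodes_allowed_to_fail, default 0]
suggest_total_shards_per_node() {
  local primaries=$1 replicas=$2 nodes=$3 spare=${4:-0}
  local total=$(( primaries * (replicas + 1) ))   # primaries plus all replica copies
  local remaining=$(( nodes - spare ))            # nodes left after tolerated failures
  if [ "$remaining" -le 0 ]; then
    echo "need more data nodes than tolerated failures" >&2
    return 1
  fi
  echo $(( (total + remaining - 1) / remaining )) # ceiling division
}

# suggest_total_shards_per_node 12 1 6     # 24 copies on 6 nodes
# suggest_total_shards_per_node 12 1 6 1   # same, but survive losing one node
```

With 12 primaries and 1 replica on 6 nodes this yields 4, or 5 when one node may fail. Those are the two values given above.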

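5. ES shards fail to allocate
The kopf plugin shows unassigned shards but no shards being allocated. This means ES is not currently performing any shard allocation.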

Use the following command to see why each shard became unassigned:
curl -s -XGET 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
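With many unassigned shards, a per-reason summary of the same output is easier to read than the raw list. Typical reasons are NODE_LEFT and ALLOCATION_FAILED. This is a minimal sketch assuming the exact column order of the command above; `unassigned_by_reason` is a hypothetical helper name.

```shell
# Summarise UNASSIGNED shards by reason, most frequent first.
# Expects `_cat/shards?h=index,shard,prirep,state,unassigned.reason` output on stdin.
unassigned_by_reason() {
  awk '$4 == "UNASSIGNED" { count[$5]++ } END { for (r in count) print count[r], r }' |
    sort -k1,1nr -k2,2
}

# curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | unassigned_by_reason
```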
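To see why shards cannot be allocated:
On ES 5.x, run:
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
In the output, focus on the part that names a node and explains the allocation decision. The IP address shown there belongs to the problematic ES node.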

On ES 2.3.5, the reason a shard cannot be allocated can only be found in the ES logs.

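Solutions:
First try the following to see whether shard allocation can be triggered:
5.1 Click the lock button in the kopf UI to disable and then re-enable shard allocation, or do the same with the commands below. Then see whether shards start allocating.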
curl -XPUT http://127.0.0.1:9200/_cluster/settings -d '{"transient" : {"cluster.routing.allocation.enable" : "none"}}'
curl -XPUT http://127.0.0.1:9200/_cluster/settings -d '{"transient" : {"cluster.routing.allocation.enable" : "all"}}'
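5.2 (ES 5.6) Run a reroute and see whether shards start allocating:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=false'
Some shards may have already failed allocation index.allocation.max_retries times (default 5). A plain reroute does not retry those; use retry_failed=true to retry them.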

5.3 Check the cluster and index settings to see whether a setting is preventing allocation:
curl http://172.16.106.63:9200/_cluster/settings?pretty
curl http://172.16.106.63:9200/event_20190313/_settings?pretty
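The following settings can prevent shards from being allocated:
cluster.routing.allocation.enable: none disables allocation; all enables it

cluster.routing.allocation.total_shards_per_node: cluster level, the maximum number of shards allocated to each node
index.routing.allocation.total_shards_per_node: index level, the maximum number of shards of this index allocated to each node

cluster.routing.allocation.disk.threshold_enabled: default true; shard allocation takes disk usage into account
cluster.routing.allocation.disk.watermark.low: default 85%; no new shards are allocated to a node whose disk usage exceeds 85% (on a single-node cluster, shards can still be allocated)
cluster.routing.allocation.disk.watermark.high: default 90%; when disk usage exceeds 90%, ES tries to relocate shards from the node to other nodes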

Shard allocation filtering parameters:
cluster.routing.allocation.include
cluster.routing.allocation.require
cluster.routing.allocation.exclude
index.routing.allocation.include
index.routing.allocation.require
index.routing.allocation.exclude
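Instead of reading the settings responses by eye, you can pull out only the allocation-related keys listed above. This is a minimal sketch assuming the settings are fetched with `flat_settings=true`, so that each key is one dotted string and values are returned as strings. `allocation_settings` is a hypothetical helper name, and the index name in the usage comment is only an example.

```shell
# Print every "cluster.routing.allocation.*" or "index.routing.allocation.*"
# key/value pair found in a flat-settings JSON response on stdin.
allocation_settings() {
  grep -oE '"(cluster|index)\.routing\.allocation\.[^"]+"[[:space:]]*:[[:space:]]*"[^"]*"'
}

# curl -s 'localhost:9200/_cluster/settings?flat_settings=true' | allocation_settings
# curl -s 'localhost:9200/event_20190313/_settings?flat_settings=true' | allocation_settings
```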

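5.4 If you have confirmed that the shard data is corrupted or lost, you can use the forced-allocation commands below to get the shard allocated.
Force-allocating a shard (causes data loss; use with caution)
ES 2.3: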
curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{ "commands" : [ { "allocate" : { "index" : "event_test", "shard" : 0, "node": "node123", "allow_primary": "true" } }] }'

ES 5.6 and later:
Allocate a replica shard:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{
"commands" : [
{
"allocate_replica" : {"index" : "event_test", "shard" : 5, "node" : "node-5"}
}
]
}'
Allocate a primary shard from an existing stale copy, such as a replica:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{
"commands" : [
{
"allocate_stale_primary" : {"index" : "event_test", "shard" : 5, "node" : "node-5", "accept_data_loss":true}
}
]
}'
Allocate an empty primary shard (all data in that shard is lost):
curl -XPOST 'localhost:9200/_cluster/reroute' -d '
{
"commands" : [
{
"allocate_empty_primary" : {"index" : "event_test", "shard" : 4, "node" : "node-5", "accept_data_loss":true}
}
]
}'

6. Speeding up shard recovery
curl -XPUT localhost:9200/_cluster/settings -d '
{
"persistent" : {
"cluster.routing.allocation.node_concurrent_recoveries": 4,
"indices.recovery.max_bytes_per_sec" : "400mb"
}
}
'
7. Raising rebalancing concurrency when migrating many shards
curl -XPUT localhost:9200/_cluster/settings -d '
{
"persistent" : {
"cluster.routing.allocation.cluster_concurrent_rebalance": 10
}
}
'

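8. Checking bulk rejections per node
ES 5.x: curl 'localhost:9200/_cat/thread_pool/bulk?v'
ES 2.x: curl 'localhost:9200/_cat/thread_pool?v&h=ip,bulk.*'
Nodes with many rejections are most likely the ones whose disks are too slow or whose CPUs are insufficient.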
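To list only the nodes that actually reject bulk requests, you can filter the 5.x output by its `rejected` column. This is a minimal sketch assuming the 5.x column names `node_name` and `rejected`, requested explicitly with `h=` as in the usage comment. The 2.x output uses different column names such as `bulk.rejected`. `bulk_rejections` is a hypothetical helper name.

```shell
# Print "<rejected> <node_name>" for nodes with bulk rejections, most first.
# Reads `_cat/thread_pool` output whose first line is the header (?v).
bulk_rejections() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> field number
      if (!("rejected" in col) || !("node_name" in col)) {
        print "need node_name and rejected columns" > "/dev/stderr"
        exit 1
      }
      next
    }
    $col["rejected"] > 0 { print $col["rejected"], $col["node_name"] }
  ' | sort -k1,1nr
}

# curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected' | bulk_rejections
```

The `rejected` counter is cumulative since node start, so compare two runs a few minutes apart to see which nodes are rejecting now.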

9. Adjusting disk-based allocation settings to use more disk space
Disable disk-based shard allocation:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent": {"cluster.routing.allocation.disk.threshold_enabled":false}}'
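Or adjust the thresholds to fit disk size and daily log volume, either as absolute sizes or as percentages. Absolute sizes specify the free space that must remain, so the low watermark must be larger than the high one: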
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.threshold_enabled":true}}'
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.low": "500g"}}'
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.high": "100g"}}'
Or:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.threshold_enabled":true}}'
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.low": "95%"}}'
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -d '{"persistent": {"cluster.routing.allocation.disk.watermark.high": "98%"}}'
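To see where each node stands against percentage watermarks, compare the `disk.percent` column of `_cat/allocation` with them. This is a minimal sketch. `disk_watermark_status` is a hypothetical helper, and it handles only integer percentage thresholds, not the absolute-size form. It treats a threshold as crossed only when usage is strictly above it.

```shell
# Classify a node's disk usage. Args: used_percent low_percent high_percent (integers).
disk_watermark_status() {
  local used=$1 low=$2 high=$3
  if [ "$used" -gt "$high" ]; then
    echo above-high   # ES tries to relocate shards away from this node
  elif [ "$used" -gt "$low" ]; then
    echo above-low    # no new shards are allocated to this node
  else
    echo ok
  fi
}

# Usage with the default 85%/90% thresholds (the UNASSIGNED row has no disk.percent):
# curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' |
#   while read -r node pct; do
#     [ -n "$pct" ] && echo "$node $(disk_watermark_status "$pct" 85 90)"
#   done
```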

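10. Setting the refresh interval (how often index data is written to the filesystem cache) to 60 seconds to improve write performance
Newer releases already default this parameter to 60 seconds; release 3.7 defaults to 1 second.
The index settings API only affects existing indices. Setting the parameter in the event template affects newly created indices.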
curl -XPUT 'localhost:9200/event*/_settings' -d '{"index.refresh_interval" : "60s"}'

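11. Keeping index shards off a problem node
Prevent event indices from being allocated to node_205 (index.routing.allocation.exclude._name).
Setting this parameter in the event template affects newly created indices.
The command below applies it to existing indices; index names support wildcards.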
curl -XPUT 'localhost:9200/event_20181218/_settings' -d '{"index.routing.allocation.exclude._name": "node_205"}'
Once set, shards on node_205 migrate to other nodes automatically (closed indices are not migrated).
Exclude node_205 for all indices in the cluster:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent":{"cluster.routing.allocation.exclude._name": "node_205"}}'
Once set, shards on node_205 migrate to other nodes automatically.

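Note: on ES 2.3.5, shards migrate automatically only if rebalancing is enabled. On ES 5.6.8 they migrate whether or not rebalancing is enabled.
12. Keeping primary and replica copies off the same host when running multiple instances on one machine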
curl -XPUT 'localhost:9200/_cluster/settings' -d '{"persistent": {"cluster.routing.allocation.same_shard.host":true}}'

13. Disabling automatic rebalancing when the cluster keeps rebalancing
Disable:
curl -XPUT http://localhost:9200/_cluster/settings?pretty -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "none"
  }
}'
Enable:
curl -XPUT http://localhost:9200/_cluster/settings?pretty -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}'

14. Moving shards manually
curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
"commands": [
{
"move": {
"index": "event_20181019",
"shard": 1,
"from_node": "node119",
"to_node": "node96"
}
}
]
}'
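When many shards have to be moved one by one, generating the reroute body from arguments avoids hand-editing JSON for each call. This is a minimal sketch; `reroute_move_body` is a hypothetical helper that emits the same `move` command as above for one shard.

```shell
# Build the JSON body of a _cluster/reroute "move" command.
# Args: index shard from_node to_node
reroute_move_body() {
  printf '{"commands":[{"move":{"index":"%s","shard":%d,"from_node":"%s","to_node":"%s"}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# curl -XPOST 'http://localhost:9200/_cluster/reroute' \
#   -d "$(reroute_move_body event_20181019 1 node119 node96)"
```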
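15. Fixing degraded performance when uneven disk sizes push shards onto the larger nodes
For example, nodes A and B have 20 TB of disk and node C has 10 TB. After a while, shards stop being allocated to C, so A and B carry more load and no longer meet performance requirements. You can free disk space on C by migrating the shards of its oldest indices to A and B.
Steps:
1. Run the command below on the indices of the earliest 5 days.
2. Open those indices (older indices may already be closed) so that their shards move from the smaller node to the larger ones.
3. When the migration is complete, close these 5 days of indices again.
Do not open too many indices at once; it can bring the cluster down.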
curl -XPUT 'localhost:9200/event_20181218/_settings' -d '{"index.routing.allocation.exclude._name": "node_205"}'
If the shards do not move, check whether automatic rebalancing is enabled.
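Picking the "earliest 5 days" can be scripted. This assumes indices are named event_YYYYMMDD as in the examples, so a plain lexical sort is also chronological. The sketch is minimal: `earliest_event_indices` is a hypothetical helper name, and node_205 in the usage loop is the example node from above.

```shell
# Print the N oldest event_YYYYMMDD index names from a list on stdin.
# awk trims the column padding that _cat output may carry.
earliest_event_indices() {
  awk '{ print $1 }' | grep -E '^event_[0-9]{8}$' | sort | head -n "$1"
}

# Usage sketch: exclude node_205 on the 5 oldest event indices, then open them.
# for idx in $(curl -s 'localhost:9200/_cat/indices/event_*?h=index' | earliest_event_indices 5); do
#   curl -XPUT "localhost:9200/$idx/_settings" -d '{"index.routing.allocation.exclude._name": "node_205"}'
#   curl -XPOST "localhost:9200/$idx/_open"
# done
```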

16. Quickly closing, opening, and deleting indices
Delete an index:
curl -XDELETE 'localhost:9200/index_name'
Close an index:
curl -XPOST 'localhost:9200/index_name/_close'
Open an index:
curl -XPOST 'localhost:9200/index_name/_open'
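17. ES fails to start
ES runs as the elasticsearch user, not root, so every directory from the root down to the ES installation and data directories must be accessible to that user. For example, if the installation directory is /opt/hansight/enterprise and the data directory is /data01, these 4 directories must grant rx permission to the elasticsearch user: /opt, /opt/hansight, /opt/hansight/enterprise and /data01. The elasticsearch directory under them, including all subdirectories and files, must be owned by the elasticsearch user.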
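To check the rule above, list every directory from / down to a given path and test each one as the elasticsearch user. On Linux, `namei -l <path>` from util-linux shows the same chain with permissions. This is a minimal sketch; `path_ancestors` is a hypothetical helper, and it accepts absolute paths only.

```shell
# Print every directory from / down to an absolute path, one per line.
path_ancestors() {
  case $1 in
    /*) ;;
    *) echo "absolute path required" >&2; return 1 ;;
  esac
  local p=${1%/} out=""
  while [ -n "$p" ]; do
    out="$p
$out"
    p=${p%/*}        # strip the last path component
  done
  printf '/\n%s' "$out"
}

# Usage: report directories the elasticsearch user cannot read and traverse.
# for d in $(path_ancestors /opt/hansight/enterprise) $(path_ancestors /data01); do
#   sudo -u elasticsearch test -r "$d" -a -x "$d" || echo "missing rx: $d"
# done
```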
