Bulk write timeouts in Elasticsearch

Date: 2019-11-16

Tags: bulk, timeout problem


Background

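I maintain a production EFK cluster. Every morning at 8:00, when the new daily indices are created, entries like the following show up in the logs: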

failed to process cluster event (cluster_update_settings) within 30s

[2019-04-13T08:00:38,213][DEBUG][o.e.a.b.TransportShardBulkAction] [logstash-felice-query-2019.04.13][2] failed to execute bulk item (index) BulkShardRequest [[logstash-felice-query-2019.04.13][2]] containing [7] requests
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:124) ~[elasticsearch-6.3.0.jar:6.3.0]
        at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_152]
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:123) ~[elasticsearch-6.3.0.jar:6.3.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) ~[elasticsearch-6.3.0.jar:6.3.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_152]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_152]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]

Root cause

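In short: the Elasticsearch master node cannot keep up, and segment merging in Elasticsearch is a very time-consuming operation.
The timeout defaults to 30s, which is why the bulk items above fail with "within 30s" while their put-mapping task is still waiting in the master's queue.
You can list the master's pending tasks with the following command: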

curl 'localhost:9200/_cat/pending_tasks?v'
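
To see which kind of task is piling up, the same endpoint can be read as JSON and grouped by task source. The following is only a minimal sketch: the function names and the localhost URL are assumptions for illustration, and it relies on the cat API's `format=json` option.

```python
import json
import urllib.request
from collections import Counter


def fetch_pending_tasks(base_url="http://localhost:9200"):
    """Fetch the master's pending cluster tasks as a list of dicts."""
    url = base_url + "/_cat/pending_tasks?format=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def count_by_source(tasks):
    """Count pending tasks per 'source' column (for example put-mapping)."""
    return Counter(task.get("source", "unknown") for task in tasks)


# Usage against a live cluster (not executed here):
#   for source, n in count_by_source(fetch_pending_tasks()).most_common():
#       print(n, source)
```

If put-mapping dominates the counts around 8:00, that matches the exception in the log above.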

Explanations found online vary, but the most likely causes can be summarized as follows:

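Too many shards; as a rule of thumb, keep each node under 1000 shards;
Indices created automatically by incoming writes are the most prone to this;
The refresh interval is too short for large bulk writes;
The index has too many fields (dozens to hundreds).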
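
The first item in the list can be checked with a short script. This is a rough sketch under assumptions: the helper names are made up for illustration, the URL is a placeholder, and the 1000 limit is just the rule of thumb quoted above, not an Elasticsearch constant.

```python
import json
import urllib.request
from collections import Counter

SHARD_LIMIT = 1000  # rule-of-thumb ceiling from the list above, not an ES setting


def fetch_shards(base_url="http://localhost:9200"):
    """Fetch one row per shard copy from the cat shards API as JSON."""
    url = base_url + "/_cat/shards?format=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def shards_per_node(rows):
    """Count shard copies per node, skipping rows with no node (unassigned)."""
    return Counter(row["node"] for row in rows if row.get("node"))


def overloaded_nodes(rows, limit=SHARD_LIMIT):
    """Return {node: shard_count} for nodes holding more than `limit` shards."""
    return {node: n for node, n in shards_per_node(rows).items() if n > limit}


# Usage against a live cluster (not executed here):
#   print(overloaded_nodes(fetch_shards()))
```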

Solutions

Option 1: pre-create the indices every day

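An ordinary EFK logging setup can let Logstash's default template create indices on first write, but with many indices and many fields per index the timeout becomes very likely. The usual approach is to create the indices ahead of time, with an explicit mapping for each one.
I took a lazy shortcut: a scheduled job reads the mapping of the current day's index each day and uses it directly to create the new index. This of course assumes you already have basic management of your indices in place.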
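
The scheduled job described above might look roughly like the following. It is a sketch, not the author's actual script: the function names, the localhost URL, and the `prefix-YYYY.MM.DD` naming (inferred from the index name in the log) are all assumptions, and it copies only the mapping, leaving shard and replica settings to whatever index templates already apply.

```python
import json
import urllib.request
from datetime import date, timedelta


def index_name(prefix, day):
    """Build a daily index name such as logstash-felice-query-2019.04.13."""
    return "{}-{:%Y.%m.%d}".format(prefix, day)


def create_body(mapping_response, source_index):
    """Turn a GET <index>/_mapping response into a create-index request body."""
    return {"mappings": mapping_response[source_index]["mappings"]}


def precreate_next_index(prefix, today, base_url="http://localhost:9200"):
    """Copy today's mapping into tomorrow's index before any writes arrive."""
    source = index_name(prefix, today)
    target = index_name(prefix, today + timedelta(days=1))

    # Read the mapping of the current day's index.
    mapping_url = "{}/{}/_mapping".format(base_url, source)
    with urllib.request.urlopen(mapping_url, timeout=30) as resp:
        mapping_response = json.loads(resp.read().decode("utf-8"))

    # Create tomorrow's index with that mapping (PUT /<target>).
    body = json.dumps(create_body(mapping_response, source)).encode("utf-8")
    request = urllib.request.Request(
        "{}/{}".format(base_url, target),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage from a daily cron job (not executed here):
#   precreate_next_index("logstash-felice-query", date.today())
```

Running it well before 8:00 keeps the index creation and mapping updates out of the morning write peak. If the target index already exists, the PUT fails and `urlopen` raises `urllib.error.HTTPError`, so a real job would want to catch that case.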