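Current architecture:
n Filebeat clients ship the logs from each application server to Kafka. Three Kafka nodes form a cluster that serves as the log queue. Four ES nodes form a cluster: the first two hold the hot data (logs from the last two days), and the other two hold the historical logs older than two days. Data is retained for one month; the current total is 4.4 billion documents, about 6 TB. Logstash and Kibana run on the same machine as ES, and the Kibana domain name points to three backend Kibana instances in round-robin.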
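For a rough sense of scale, the figures above can be turned into per-day and per-tier estimates. This is only back-of-the-envelope arithmetic under assumptions that are not in the original: a uniform ingest rate, a 30-day month, decimal terabytes, and the 6 TB figure being the total on-disk size.

```python
# Back-of-the-envelope sizing from the figures quoted in the text.
# Assumptions (not from the original): uniform ingest, 30-day month,
# decimal units, and 6 TB being the total on-disk size.
TOTAL_DOCS = 4_400_000_000   # 4.4 billion documents
TOTAL_BYTES = 6 * 10**12     # 6 TB
RETENTION_DAYS = 30          # "one month"
HOT_DAYS = 2                 # hot nodes keep the last two days

bytes_per_doc = TOTAL_BYTES / TOTAL_DOCS
docs_per_day = TOTAL_DOCS / RETENTION_DAYS
gb_per_day = TOTAL_BYTES / RETENTION_DAYS / 10**9
hot_gb = gb_per_day * HOT_DAYS
cold_gb = gb_per_day * (RETENTION_DAYS - HOT_DAYS)

if __name__ == "__main__":
    print(f"average document size: {bytes_per_doc:.0f} bytes")
    print(f"documents per day:     {docs_per_day:,.0f}")
    print(f"data per day:          {gb_per_day:.0f} GB")
    print(f"hot tier (2 nodes):    {hot_gb:.0f} GB")
    print(f"cold tier (2 nodes):   {cold_gb:.0f} GB")
```

Under these assumptions the two hot nodes hold only a small fraction of the data but take all of the indexing traffic, while the two historical nodes hold most of the volume.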
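Performance problems encountered:
1. Only the first node in the cluster carries a very high load. The load on the other nodes stays low; occasionally the second node, which is also a hot-data node, rises slightly.
2. The queue frequently backs up. In Kafka, the topics for the three environments (uat, pet, and prd) all share one default Logstash consumer group. As soon as one environment's queue builds up a backlog, the queues of the other environments can no longer be consumed.
3. After logging in to Kibana, the home page takes at least half a minute to open. Log queries are also slow, taking at least several minutes to return results.
4. Sometimes an ES node leaves the cluster because of high load, which triggers shard reallocation across the cluster nodes. The cluster status turns RED, and the Kibana page shows a Red error when opened. Kibana was intermittently unreachable for about one to two weeks.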
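Some index queries in the current ELK setup were found to be slow, so the ES search slow log was enabled to record slow queries, and the slow log was then analyzed to locate the problem. The slow log content is as follows: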
[2017-08-28T11:21:02,377][WARN ][index.search.slowlog.query] [node-3] [logstash-nginx-2017.08.01][4] took[15s], took_millis[15029], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[140], source[{"size":0,"query":{"bool":{"filter":[{"match_none":{"boost":1.0}},{"query_string":{"query":"NOT status:200 OR NOT status:304","fields":[],"use_dis_max":true,"tie_breaker":0.0,"default_operator":"or","auto_generate_phrase_queries":false,"max_determined_states":10000,"enable_position_increment":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"analyze_wildcard":true,"escape":false,"split_on_whitespace":true,"boost":1.0}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"3":{"terms":{"field":"status","size":5,"min_doc_count":0,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_term":"asc"}]},"aggregations":{"2":{"date_histogram":{"field":"@timestamp","format":"epoch_millis","interval":"20m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":"1503886846372","max":"1503890446372"}}}}}}}],
[2017-08-28T11:21:02,377][WARN ][index.search.slowlog.query] [node-3] [logstash-nginx-2017.08.01][2] took[15.7s], took_millis[15787], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[140], source[{"size":0,"query":{"bool":{"filter":[{"match_none":{"boost":1.0}},{"query_string":{"query":"NOT status:200 OR NOT status:304","fields":[],"use_dis_max":true,"tie_breaker":0.0,"default_operator":"or","auto_generate_phrase_queries":false,"max_determined_states":10000,"enable_position_increment":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"analyze_wildcard":true,"escape":false,"split_on_whitespace":true,"boost":1.0}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"3":{"terms":{"field":"status","size":5,"min_doc_count":0,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_term":"asc"}]},"aggregations":{"2":{"date_histogram":{"field":"@timestamp","format":"epoch_millis","interval":"20m","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":"1503886846372","max":"1503890446372"}}}}}}}],
The analysis follows:
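As a possible starting point, the slow log entries can be broken into fields before looking at them more closely. The following is a minimal Python sketch, not part of the original setup. The names `SLOWLOG_RE`, `parse_slowlog_line`, `find_key`, `query_window`, and `SAMPLE` are hypothetical; the regular expression only assumes the line layout seen in the entries above, and `SAMPLE` is an abbreviated copy of the first entry with most of the query body left out.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Header layout assumed from the slow log entries shown above.
SLOWLOG_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[index\.search\.slowlog\.query\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s*"
    r"took\[[^\]]+\],\s*took_millis\[(?P<took_millis>\d+)\].*?"
    r"total_shards\[(?P<total_shards>\d+)\],\s*source\[(?P<source>\{.*\})\]"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_slowlog_line(line):
    """Split one slow log line into its fields; return None if it does not match."""
    m = SLOWLOG_RE.search(line)
    if m is None:
        return None
    return {
        "node": m.group("node"),
        "index": m.group("index"),
        "shard": int(m.group("shard")),
        "took_millis": int(m.group("took_millis")),
        "total_shards": int(m.group("total_shards")),
        "source": json.loads(m.group("source")),
    }


def find_key(obj, key):
    """Return the first value stored under `key` anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def query_window(source):
    """Decode the date_histogram extended_bounds (epoch millis) as UTC datetimes."""
    bounds = find_key(source, "extended_bounds")
    if bounds is None:
        return None
    return tuple(EPOCH + timedelta(milliseconds=int(bounds[k])) for k in ("min", "max"))


# Abbreviated copy of the first entry above (query body shortened for readability).
SAMPLE = (
    '[2017-08-28T11:21:02,377][WARN ][index.search.slowlog.query] [node-3] '
    '[logstash-nginx-2017.08.01][4] took[15s], took_millis[15029], types[], stats[], '
    'search_type[QUERY_THEN_FETCH], total_shards[140], '
    'source[{"size":0,"aggregations":{"2":{"date_histogram":{"field":"@timestamp",'
    '"interval":"20m","extended_bounds":{"min":"1503886846372","max":"1503890446372"}}}}}], '
)

if __name__ == "__main__":
    entry = parse_slowlog_line(SAMPLE)
    start, end = query_window(entry["source"])
    print(entry["index"], entry["shard"], entry["took_millis"], entry["total_shards"])
    print(start.isoformat(), end.isoformat())
```

For the sample entry, the decoded `extended_bounds` span exactly one hour on 2017-08-28, while the slow shard belongs to the index `logstash-nginx-2017.08.01` and `total_shards` is 140. That may indicate the search fans out over far more indices than the time window needs, but it would have to be checked against the Kibana index pattern before drawing a conclusion.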
To be continued