ELK Tips (Issue 2)

Date: 2019-11-16

Tags: elk, tips

ELK Tips covers small tricks picked up while using ELK; most of the content comes from the Elastic Chinese community.

I. Logstash

1. Filebeat: Non-zero metrics in the last 30s

Symptom: Filebeat cannot send log data to Elasticsearch.
Log message: INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s
Community suggestion: add enabled: true under both input and output.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
  enabled: true

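The enabled property under both input and output already defaults to true, so the real cause is probably something else. Note that the message itself is an INFO-level periodic metrics report, not an error.

2. Generating monthly indices in Logstash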

output {
	if [type] == "typeA"{
		elasticsearch {
			hosts  => "127.0.0.1:9200"
			index => "log_%{+YYYY_MM}"
		}
	}
}

Daily indices work the same way: %{+YYYY.MM.dd}
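As a rough illustration of what these patterns produce, the Python sketch below mimics the substitution for a given event timestamp. Logstash formats the date from the event's @timestamp in UTC. The helper name render_index_name, the log_ prefix on the daily pattern, and the strftime equivalents of the Joda-style patterns are assumptions made for this example; none of this is Logstash code.

```python
from datetime import datetime, timezone

# Hypothetical strftime stand-ins for the Joda-style patterns used in the config.
PATTERNS = {
    "monthly": "log_%Y_%m",    # stands in for log_%{+YYYY_MM}
    "daily": "log_%Y.%m.%d",   # stands in for a daily pattern such as log_%{+YYYY.MM.dd}
}

def render_index_name(timestamp: datetime, granularity: str = "monthly") -> str:
    """Return the index name an event with this timestamp would be written to."""
    utc = timestamp.astimezone(timezone.utc)  # index dates are computed in UTC
    return utc.strftime(PATTERNS[granularity])

event_time = datetime(2019, 11, 16, 8, 30, tzinfo=timezone.utc)
print(render_index_name(event_time))           # log_2019_11
print(render_index_name(event_time, "daily"))  # log_2019.11.16
```

One consequence worth keeping in mind: an event logged shortly after local midnight in a zone ahead of UTC can still land in the previous day's (or month's) index.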

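3. Dropping specific fields via Filebeat configuration

Filebeat offers functionality similar to Logstash filters, called processors. There are not many kinds of processors; the aim is to provide the most commonly used functions while keeping Filebeat lightweight.

Some commonly used processors:

add_cloud_metadata: adds cloud server metadata;
add_locale: adds the local time zone;
decode_json_fields: parses and processes fields containing JSON strings;
drop_event: drops events that match a condition;
drop_fields: removes the listed fields, optionally only when a condition matches;
include_fields: keeps only the listed fields;
rename: renames fields;
add_kubernetes_metadata: adds Kubernetes metadata;
add_docker_metadata: adds container metadata;
add_host_metadata: adds operating system (host) metadata;
dissect: extracts fields from a string, similar to grok pattern matching;
dns: configures Filebeat's own DNS resolution;
add_process_metadata: adds process metadata.

How processors are used: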

- type: <input_type>
  processors:
  - <processor_name>:
      when:
        <condition>
      <parameters>
...

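4. Collecting log files over FTP with Logstash

input {
    exec {
        codec => plain { }
        # Fetch the remote log file over FTP
        command => "curl ftp://server/logs.log"
        # Run the command every 3000 seconds
        interval => 3000
    }
}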

5. Logstash fails to start under docker-compose (Permission denied)

Setting the user option in docker-compose so that the container runs as root resolves the permission problem.

$ cat docker-compose.yml

version: '2'
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:6.4.2
    user: root
    command: id

6. Metricize filter plugin

Splits one event into multiple events, one per metric.

# Original event
{
    type => "type A"
    metric1 => "value1"
    metric2 => "value2"
}

# Configuration
filter {
  metricize {
    metrics => [ "metric1", "metric2" ]
  }
}

# Final output
{                               {
    type => "type A"                type => "type A"
    metric => "metric1"             metric => "metric2"
    value => "value1"               value => "value2"
}                               }
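To make the transformation concrete, here is a minimal Python sketch of the same splitting logic. It is an illustration of the behaviour shown above, not the plugin's implementation; the function name and its metric_field and value_field parameters are names chosen for this example.

```python
from typing import Any, Dict, List

def metricize(event: Dict[str, Any], metrics: List[str],
              metric_field: str = "metric",
              value_field: str = "value") -> List[Dict[str, Any]]:
    """Split one event into one event per listed metric field."""
    # Fields that are not metrics are copied into every output event.
    shared = {k: v for k, v in event.items() if k not in metrics}
    result = []
    for name in metrics:
        if name not in event:
            continue  # skip metrics this event does not carry
        new_event = dict(shared)
        new_event[metric_field] = name
        new_event[value_field] = event[name]
        result.append(new_event)
    return result

original = {"type": "type A", "metric1": "value1", "metric2": "value2"}
for out in metricize(original, ["metric1", "metric2"]):
    print(out)
```

Each output event keeps the shared type field and carries exactly one metric name and value pair.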

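II. Elasticsearch

1. Internal structure of the ES inverted index

Lucene stores its inverted index per field: each field keeps its own mapping from terms to documents. If the term "apple" occurs in both docName and docContent, there are two separate posting lists, as shown below:

docName:
"apple" -> "doc1, doc2, doc3..."

docContent:
"apple" -> "doc2, doc4, doc6..."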
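The per-field layout can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (whitespace tokenization, in-memory dictionaries, made-up documents) and says nothing about how Lucene actually encodes its postings.

```python
from collections import defaultdict
from typing import Dict, List

def build_inverted_index(docs: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Map field -> term -> sorted list of ids of documents containing the term in that field."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for term in text.lower().split():  # naive whitespace tokenizer
                index[field][term].add(doc_id)
    return {f: {t: sorted(ids) for t, ids in terms.items()} for f, terms in index.items()}

docs = {
    "doc1": {"docName": "apple pie", "docContent": "a dessert"},
    "doc2": {"docName": "apple juice", "docContent": "pressed apple"},
    "doc4": {"docName": "orange", "docContent": "not an apple"},
}
index = build_inverted_index(docs)
print(index["docName"]["apple"])     # ['doc1', 'doc2']
print(index["docContent"]["apple"])  # ['doc2', 'doc4']
```

The same term yields a different document list in each field, which is why a query has to say which field it searches.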

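2. Jest or RestHighLevelClient: which is easier to use?

RestHighLevelClient is an official component, so it will keep receiving official support and stay in step with ES releases. The official high-level API is the recommended choice.

Jest is community-maintained, so its updates lag somewhat. At the time of writing, the latest version targets ES 6.3.1 and only four issues were opened in the past month, which suggests low overall activity, so it is not recommended.

There is also a Chinese manual for TransportClient that is well translated: github.com/jackiehff/e….

3. Duplicate results when paging with From/Size on a single shard

Normally, From/Size paging on a single shard does not return duplicates. Possible causes of duplicates are:

no sort was specified;
the sort is by score, but the query consists entirely of filter clauses (so every document has the same score);
a sort was specified, but documents in the index were added, updated, or deleted between page requests.

For multiple shards, adding the preference parameter is recommended to keep paged results consistent.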
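The sketch below shows one way to assemble a page request that avoids the first two causes: it sorts explicitly, adds a unique tie-breaker, and reuses one preference string for every page. The function name, the example query, and the price and doc_id fields are assumptions for illustration; doc_id stands for any field that is unique per document and sortable.

```python
from typing import Any, Dict, Tuple

def build_page_request(page: int, size: int, session_id: str) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (query-string parameters, request body) for one From/Size page."""
    # Reusing the same preference string sends every page to the same shard copies.
    params = {"preference": session_id}
    body = {
        "from": page * size,
        "size": size,
        "query": {"bool": {"filter": [{"term": {"product": "chocolate"}}]}},
        # Explicit sort plus a unique tie-breaker keeps equal sort values in a stable order.
        "sort": [{"price": "asc"}, {"doc_id": "asc"}],
    }
    return params, body

params, body = build_page_request(page=2, size=10, session_id="user-42")
print(params)        # {'preference': 'user-42'}
print(body["from"])  # 20
```

This does not help with the third cause: if documents change between requests, pages can still shift.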

4. The number of object passed must be even but was [1]

Passing a JSON object to setSource raises the error: The number of object passed must be even but was [1]. Convert the JSON object to a Map, or to a JSON string; when passing a string, the content type must be specified as well.

IndexRequest indexRequest = new IndexRequest("index", "type", "id");
JSONObject doc = new JSONObject();
//indexRequest.source(doc); // incorrect usage
// Convert to a Map
indexRequest.source(JSONObject.parseObject((String) doc.get("json"), Map.class));
// Convert to a JSON string (and declare the content type)
indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);

5. Cross-cluster search

ES 6.X supports cross-cluster search natively. For configuration details see: www.elastic.co/guide/en/ki…

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

ES 6.5 introduced a new feature, cross-cluster replication, which is worth a look if you are interested.

6. Controlling where missing values sort in ES

GET /_search
{
    "sort" : [
        { "price" : {"missing" : "_last"} }
    ],
    "query" : {
        "term" : { "product" : "chocolate" }
    }
}

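7. Handling cold, archived data in ES

Configure relatively low-spec machines with large disks as ES warm nodes, then use index.routing.allocation.require.box_type to mark an index as hot or cold data. If an index is rarely used, close it and open it again only when it needs to be searched.

8. Detecting similar articles in ES

For deduplicating large texts, consider the SimHash algorithm. SimHash produces a 64-bit fingerprint per document; computing the Hamming distance between the fingerprints of two articles tells you whether they are duplicates. The Hamming distance can be computed with a plugin: github.com/joway/elast…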
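For readers unfamiliar with SimHash, here is a minimal Python sketch of the fingerprint and the Hamming distance. It assumes whitespace tokens with equal weight and uses truncated MD5 as the per-token hash; the threshold of 3 bits is a commonly used value and also an assumption here. It is unrelated to the plugin linked above.

```python
import hashlib
from typing import Iterable

BITS = 64  # fingerprint width

def simhash(tokens: Iterable[str]) -> int:
    """Compute a 64-bit SimHash fingerprint from an iterable of tokens."""
    weights = [0] * BITS
    for token in tokens:
        # Stable 64-bit hash per token (first 8 bytes of MD5; any uniform hash would do).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:  # bit is set when more tokens voted 1 than 0
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two fingerprints differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(text_a: str, text_b: str, threshold: int = 3) -> bool:
    """Treat two texts as duplicates when their fingerprints differ in few bits."""
    return hamming_distance(simhash(text_a.split()), simhash(text_b.split())) <= threshold

text = "elasticsearch tips for detecting similar articles"
print(hamming_distance(simhash(text.split()), simhash(text.split())))  # 0
```

In practice the tokens would come from a proper analyzer and be weighted (for example by term frequency), and the fingerprint would be stored in the index so candidates can be compared without re-reading the text.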

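9. Optimizing Terms aggregations

If only the top N buckets are needed, add "collect_mode": "breadth_first" to the Terms aggregation;
set "min_doc_count": 10 to require a minimum number of matching documents per bucket;
if only certain terms are of interest, filter them with include and exclude;
if all terms are needed but there are too many, fetch the aggregation in several batches (Filtering Values with partitions).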
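The options above can be combined in one request body. The Python sketch below builds such a body as a plain dictionary; the function name, the aggregation name by_term, and the example field are assumptions for illustration, while the option names are the ones listed above.

```python
from typing import Any, Dict, Optional

def terms_agg_body(field: str, size: int = 10, min_doc_count: int = 10,
                   partition: Optional[int] = None,
                   num_partitions: Optional[int] = None) -> Dict[str, Any]:
    """Build a search body containing a single tuned Terms aggregation."""
    terms: Dict[str, Any] = {
        "field": field,
        "size": size,
        "collect_mode": "breadth_first",
        "min_doc_count": min_doc_count,
    }
    if partition is not None and num_partitions is not None:
        # Fetch only one slice of the terms; loop partition from 0 to num_partitions - 1.
        terms["include"] = {"partition": partition, "num_partitions": num_partitions}
    # "size": 0 skips the hits, since only the buckets are wanted.
    return {"size": 0, "aggs": {"by_term": {"terms": terms}}}

first_batch = terms_agg_body("account_id", size=1000, partition=0, num_partitions=20)
print(first_batch["aggs"]["by_term"]["terms"]["include"])
```

To retrieve every term, the same body would be sent once per partition and the buckets merged on the client side.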

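10. ES queries returning nothing because of Tomcat character encoding

Two systems connected to the same ES service with identical configuration and code. For the same search, one returned results and the other returned nothing. The cause turned out to be a misconfigured Tomcat in one system, which garbled the request text, so nothing matched.

11. Setting the default analyzer for an ES index

By default, if no analyzer is specified for a field, ES uses the standard analyzer. The default can be changed with the settings below.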

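2.X supported setting a default index-time analyzer (default_index) and a default search-time analyzer (default_search). In 6.X default_index is no longer supported; use default instead.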

PUT /index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_max_word",
          "tokenizer": "ik_max_word"
        }
      }
    }
  }
}

12. Magic parameters in ES

Index name: _index
Type name: _type
Document ID: _id
Score: _score
Index order: _doc

If the sort order does not matter to you, sort by _doc, for example when running a Scroll.
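As a small illustration, the Python sketch below builds the initial request of a Scroll that sorts by _doc. The function name and the default keep-alive and batch size are assumptions; sending the request and following up with the returned scroll id are left out.

```python
from typing import Any, Dict, Tuple

def build_scroll_request(index: str, keep_alive: str = "1m",
                         batch_size: int = 1000) -> Tuple[str, Dict[str, Any]]:
    """Return (path, body) for the initial request of a Scroll."""
    path = f"/{index}/_search?scroll={keep_alive}"
    body = {
        "size": batch_size,
        "query": {"match_all": {}},
        "sort": ["_doc"],  # index order, so no scoring or sorting work is needed
    }
    return path, body

path, body = build_scroll_request("logs")
print(path)  # /logs/_search?scroll=1m
```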

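13. Delaying rollup jobs in ES

A rollup job has a delay parameter that controls how long the job waits before processing. By default there is no delay, which means that once the data for an interval has been aggregated, late-arriving data for that interval is not processed.

Fortunately, the rollup API can search raw indices and rolled-up indices together, so if data often arrives late, consider setting a suitable delay such as 1h, 6h, or even 24h. The rollup index will lag behind, but late data is guaranteed to be processed.

In terms of use cases, rollup typically aggregates historical data to save storage space, so a delay of several hours or even days is reasonable. At search time, querying the recent raw indices together with the historical rollup indices combines both data sets, giving correct aggregation results while keeping performance in check.

Rollup is an experimental feature, but a very useful one, especially when ES is used as a data warehouse.

14. Getting all aggregation results in ES 6.x

In ES 2.x, calling setSize(0) on an aggregation returned all buckets. In ES 6.x, calling setSize(Integer.MAX_VALUE) is the equivalent of setting 0 in 2.x.

15. ES JAR conflicts

JAR conflicts often come up when integrating ES with business code. The recommended fix is maven-shade-plugin, which resolves the conflict by relocating the conflicting JARs into a different namespace. For usage details see: www.jianshu.com/p/d9fb7afa6….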

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
            <createDependencyReducedPom>false</createDependencyReducedPom>
        </configuration>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <relocation>
                            <pattern>com.google.guava</pattern>
                            <shadedPattern>net.luculent.elasticsearch.guava</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.fasterxml.jackson</pattern>
                            <shadedPattern>net.luculent.elasticsearch.jackson</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>org.joda</pattern>
                            <shadedPattern>net.luculent.elasticsearch.joda</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>net.luculent.elasticsearch.common</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.google.thirdparty</pattern>
                            <shadedPattern>net.luculent.elasticsearch.thirdparty</shadedPattern>
                        </relocation>
                    </relocations>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer" />
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>

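16. How does ES choose the shard that stores a document?

ES hashes the routing value of the document being indexed, which defaults to its _id (user-specified or auto-generated), and takes the result modulo the number of primary shards n: hash(_id) % n. The remainder determines which shard stores the document. Early ES versions used the DJB2 hash for this; since 2.0 the hash function is Murmur3.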
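The formula can be tried out with a short Python sketch. The DJB2 function below is only a stand-in hash chosen for simplicity, so the shard numbers it prints will not match what a real cluster computes; the function names are likewise made up for this example.

```python
def djb2(key: str) -> int:
    """Classic DJB2 string hash, truncated to 32 bits (illustrative stand-in)."""
    h = 5381
    for byte in key.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Apply hash(routing) % n to choose a shard number in [0, n)."""
    return djb2(routing) % num_primary_shards

for doc_id in ["a", "order-1001", "order-1002"]:
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on n, changing the number of primary shards would send existing ids to different shards, which is why that number is fixed when the index is created.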

III. Kibana

1. Showing custom fields on Kibana's Discover page

Kibana's Discover page shows only the Time and _source fields by default. The left-hand panel lists many fields under Popular; click add next to any field you want and it is added to the Discover view.

2. What the Filebeat monitoring metrics mean

Total: 'All events newly created in the publishing pipeline'
Emitted: 'Events processed by the output (including retries)'
Acknowledged: 'Events acknowledged by the output (includes events dropped by the output)'
Queued: 'Events added to the event pipeline queue'