ELK 6.2.4 Study Notes for Busy People: Logstash and Filebeat Parsing (multi-line configuration for Java exception stacks)

Following on from the previous post, "Complete Guide to Building the Latest (6.2.4) ELK + Filebeat + Log4j Log Integration Environment on CentOS 7", this article continues with ELK.

The latest official Logstash documentation is at https://www.elastic.co/guide/en/logstash/current/index.html.
Imagine a few dozen servers, each of which needs its system log (syslog), Tomcat logs, Nginx logs, MySQL logs and so on monitored for OOM events, processes killed under memory pressure, Nginx errors, MySQL exceptions, and more. Doing this by hand is clearly enormously time-consuming.
Logstash uses a plugin-based architecture: almost every concrete feature is implemented as a plugin. The installed plugins can be listed with bin/logstash-plugin list --verbose, or you can browse https://www.elastic.co/guide/en/logstash/current/input-plugins.html and https://www.elastic.co/guide/en/logstash/current/output-plugins.html.

Logstash configuration file format

A configuration file is divided into three sections: input, filter, and output. Except for POC purposes, practically every real deployment needs a filter section to preprocess the logs, whether they are Nginx logs or Log4j logs. The same goes for the stdout output.

input {
    log4j {
        port => "5400"
    }
    beats {
        port => "5044"
    }
}
filter {  # multiple filters are applied in the order in which they are declared
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        action => "index"
        hosts  => "127.0.0.1:9200" # or ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"] to balance writes across several ES nodes, usually non-master nodes
        index  => "logstash-%{+YYYY-MM}"
    }
    stdout {
        codec => rubydebug
    }
    file {
        path => "/path/to/target/file"
    }
}

Commonly used Logstash inputs include syslog (see RFC 3164), stdin (console), file, redis, and beats.
Commonly used outputs include elasticsearch, stdout (console), and file.
Commonly used filters include grok, mutate, drop, clone, and geoip.
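
As a quick illustration of how these filters combine (a minimal sketch; the clientip and loglevel field names are assumptions, not taken from any particular log format):

filter {
    mutate {
        rename => { "clientip" => "client_ip" }   # rename a previously parsed field
    }
    if [loglevel] == "DEBUG" {
        drop { }                                  # discard debug-level events entirely
    }
    geoip {
        source => "client_ip"                     # enrich the event with geo information for this IP
    }
}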

Viewing Logstash's command-line options:

[root@elk1 bin]# ./logstash --help
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Usage:
    bin/logstash [OPTIONS]

Options:
    -n, --node.name NAME          Specify the name of this logstash instance, if no value is given
                                  it will default to the current hostname.
                                   (default: "elk1")
    -f, --path.config CONFIG_PATH Load the logstash config from a specific file
                                  or directory.  If a directory is given, all
                                  files in that directory will be concatenated
                                  in lexicographical order and then parsed as a
                                  single config file. You can also specify
                                  wildcards (globs) and any matched files will
                                  be loaded in the order described above.
    -e, --config.string CONFIG_STRING Use the given string as the configuration
                                  data. Same syntax as the config file. If no
                                  input is specified, then the following is
                                  used as the default input:
                                  "input { stdin { type => stdin } }"
                                  and if no output is specified, then the
                                  following is used as the default output:
                                  "output { stdout { codec => rubydebug } }"
                                  If you wish to use both defaults, please use
                                  the empty string for the '-e' flag.
                                   (default: nil)
    --modules MODULES             Load Logstash modules.
                                  Modules can be defined using multiple instances
                                  '--modules module1 --modules module2',
                                     or comma-separated syntax
                                  '--modules=module1,module2'
                                  Cannot be used in conjunction with '-e' or '-f'
                                  Use of '--modules' will override modules declared
                                  in the 'logstash.yml' file.
    -M, --modules.variable MODULES_VARIABLE Load variables for module template.
                                  Multiple instances of '-M' or
                                  '--modules.variable' are supported.
                                  Ignored if '--modules' flag is not used.
                                  Should be in the format of
                                  '-M "MODULE_NAME.var.PLUGIN_TYPE.PLUGIN_NAME.VARIABLE_NAME=VALUE"'
                                  as in
                                  '-M "example.var.filter.mutate.fieldname=fieldvalue"'
    --setup                       Load index template into Elasticsearch, and saved searches, 
                                  index-pattern, visualizations, and dashboards into Kibana when
                                  running modules.
                                   (default: false)
    --cloud.id CLOUD_ID           Sets the elasticsearch and kibana host settings for
                                  module connections in Elastic Cloud.
                                  Your Elastic Cloud User interface or the Cloud support
                                  team should provide this.
                                  Add an optional label prefix '<label>:' to help you
                                  identify multiple cloud.ids.
                                  e.g. 'staging:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy'
    --cloud.auth CLOUD_AUTH       Sets the elasticsearch and kibana username and password
                                  for module connections in Elastic Cloud
                                  e.g. 'username:<password>'
    --pipeline.id ID              Sets the ID of the pipeline.
                                   (default: "main")
    -w, --pipeline.workers COUNT  Sets the number of pipeline workers to run.
                                   (default: 1)
    --experimental-java-execution (Experimental) Use new Java execution engine.
                                   (default: false)
    -b, --pipeline.batch.size SIZE Size of batches the pipeline is to work in.
                                   (default: 125)
    -u, --pipeline.batch.delay DELAY_IN_MS When creating pipeline batches, how long to wait while polling
                                  for the next event.
                                   (default: 50)
    --pipeline.unsafe_shutdown    Force logstash to exit during shutdown even
                                  if there are still inflight events in memory.
                                  By default, logstash will refuse to quit until all
                                  received events have been pushed to the outputs.
                                   (default: false)
    --path.data PATH              This should point to a writable directory. Logstash
                                  will use this directory whenever it needs to store
                                  data. Plugins will also have access to this path.
                                   (default: "/usr/local/app/logstash-6.2.4/data")
    -p, --path.plugins PATH       A path of where to find plugins. This flag
                                  can be given multiple times to include
                                  multiple paths. Plugins are expected to be
                                  in a specific directory hierarchy:
                                  'PATH/logstash/TYPE/NAME.rb' where TYPE is
                                  'inputs' 'filters', 'outputs' or 'codecs'
                                  and NAME is the name of the plugin.
                                   (default: [])
    -l, --path.logs PATH          Write logstash internal logs to the given
                                  file. Without this flag, logstash will emit
                                  logs to standard output.
                                   (default: "/usr/local/app/logstash-6.2.4/logs")
    --log.level LEVEL             Set the log level for logstash. Possible values are:
                                    - fatal
                                    - error
                                    - warn
                                    - info
                                    - debug
                                    - trace
                                   (default: "info")
    --config.debug                Print the compiled config ruby code out as a debug log (you must also have --log.level=debug enabled).
                                  WARNING: This will include any 'password' options passed to plugin configs as plaintext, and may result
                                  in plaintext passwords appearing in your logs!
                                   (default: false)
    -i, --interactive SHELL       Drop to shell instead of running as normal.
                                  Valid shells are "irb" and "pry"
    -V, --version                 Emit the version of logstash and its friends,
                                  then exit.
    -t, --config.test_and_exit    Check configuration for valid syntax and then exit.
                                   (default: false)
    -r, --config.reload.automatic Monitor configuration changes and reload
                                  whenever it is changed.
                                  NOTE: use SIGHUP to manually reload the config
                                   (default: false)
    --config.reload.interval RELOAD_INTERVAL How frequently to poll the configuration location
                                  for changes, in seconds.
                                   (default: 3000000000)
    --http.host HTTP_HOST         Web API binding host (default: "127.0.0.1")
    --http.port HTTP_PORT         Web API http port (default: 9600..9700)
    --log.format FORMAT           Specify if Logstash should write its own logs in JSON form (one
                                  event per line) or in plain text (using Ruby's Object#inspect)
                                   (default: "plain")
    --path.settings SETTINGS_DIR  Directory containing logstash.yml file. This can also be
                                  set through the LS_SETTINGS_DIR environment variable.
                                   (default: "/usr/local/app/logstash-6.2.4/config")
    --verbose                     Set the log level to info.
                                  DEPRECATED: use --log.level=info instead.
    --debug                       Set the log level to debug.
                                  DEPRECATED: use --log.level=debug instead.
    --quiet                       Set the log level to info.
                                  DEPRECATED: use --log.level=quiet instead.
    -h, --help                    print help

The meaning of each setting is also documented at https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html.
The most practically useful options are:
-f filename.conf: specify the configuration file
--config.test_and_exit: check the configuration file for syntax errors and exit
--config.reload.automatic: watch the configuration for changes and reload without a restart, much like nginx -s reload; very handy
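
For example (first-pipeline.conf is a hypothetical file name):

# check a configuration file for syntax errors, then exit
bin/logstash -f first-pipeline.conf --config.test_and_exit

# run with the configuration watched and reloaded automatically on change
bin/logstash -f first-pipeline.conf --config.reload.automatic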

The ELK components all use YAML (https://baike.baidu.com/item/YAML/1067697?fr=aladdin) for their configuration files.

YAML has the following basic rules (see the short example after this list):
1. It is case sensitive.
2. Indentation expresses hierarchy.
3. Tabs are not allowed for indentation; only spaces may be used.
4. The amount of indentation does not matter, as long as elements at the same level are aligned.
5. # starts a comment.
6. Strings do not need to be quoted.
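
A tiny illustration of these rules, using a Filebeat-style snippet (the values are placeholders):

# '#' starts a comment; indentation uses spaces (never tabs) to express nesting
filebeat.prospectors:
- type: log                  # an unquoted scalar string
  paths:
    - /var/log/*.log         # a list item, aligned by two-space indentation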

JVM parameters are set in config/jvm.options.

Both the filter and output sections of a configuration file support the usual conditional expressions such as if / else if, along with comparisons and regular-expression matching.
Configuration files can also reference environment variables using ${HOME}-style syntax; see https://www.elastic.co/guide/en/logstash/current/environment-variables.html for details.
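
A minimal sketch of both features (the type field value and the output path are assumptions):

filter {
    if [type] == "nginx_access" {                        # field comparison
        grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
    } else if [message] =~ /DEBUG/ {                     # regular-expression match
        drop { }
    }
}
output {
    file {
        path => "${HOME}/logstash-debug.out"             # resolved from the environment at startup
    }
}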

The Beats input plugin

Before looking at specific input plugins, let's look at the options that every plugin supports.
The most important of these is id: if a single Logstash instance runs several plugins of the same type, id can be used to tell them apart.

Loading data through the Beats plugin is the main recommended approach in ELK 6.x, so let's look at its configuration in detail (https://www.elastic.co/guide/en/logstash/current/plugins-inputs-beats.html).

input {
  beats {
    port => 5044
  }
}

The port option is required and has no default value. Apart from the ssl settings, nearly everything else is optional.
host defaults to "0.0.0.0", i.e. listen on all interfaces; unless there are special security requirements, that is also the recommended setting.
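
Putting the common id option together with host and port (the id value is just an illustrative label):

input {
    beats {
        id   => "beats_5044"    # distinguishes this plugin instance in logs and monitoring output
        host => "0.0.0.0"       # the default: listen on all interfaces
        port => 5044            # required, no default
    }
}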

The core parsing plugin: Grok Filter

Log formats tend to be flexible and messy (Nginx access logs, for instance), or not strictly one event per line (Java exception stacks), and they are not necessarily friendly to most developers or operators. If logs can be parsed and organized into named fields before they are finally displayed, usability improves a great deal. The grok filter plugin does exactly this. Like the beats plugin, grok is available by default.
Setting the log producers themselves aside, how good a logging setup is depends largely on how well this filtering step is done, so although it is tedious, it has to be mastered, much like Nginx rewrite rules.
Logstash ships with about 120 patterns; see https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns.
The grok syntax is %{SYNTAX:SEMANTIC}.
This is analogous to the following Java code:

import java.util.regex.Pattern;  // needed for Pattern.matches

String pattern = ".*runoob.*";
boolean isMatch = Pattern.matches(pattern, content);  // content: the text being tested

Here pattern corresponds to SYNTAX and content corresponds to SEMANTIC, except that when a log line is parsed there is no field name yet, so SEMANTIC is the field name assigned to the piece of text that matched the pattern; these fields are appended to the event.
For example, take the following HTTP request log line:
55.3.244.1 GET /index.html 15824 0.043
matching it against %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration} adds the following fields to the event, in addition to the original message:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
A complete grok example looks like this:

input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}

Note: how does Logstash know where it left off in http.log after a restart? We will come back to this question in the Filebeat section.
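For the Logstash file input itself, the read position is tracked in a so-called sincedb file; a minimal sketch, where the sincedb_path value is an assumption:

input {
  file {
    path           => "/var/log/http.log"
    start_position => "beginning"                        # only applies to files not seen before
    sincedb_path   => "/var/lib/logstash/http.sincedb"   # where the current read offset is persisted
  }
}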
grok's main options are match and overwrite. The former parses message into the corresponding fields; the latter overwrites a field such as message, so the raw message does not have to be stored a second time, which for many logs is pointless. See https://www.elastic.co/guide/en/logstash/6.2/plugins-filters-grok.html#plugins-filters-grok-overwrite.
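
A typical use of overwrite, loosely modeled on the linked documentation (a sketch): re-parse message and keep only the interesting remainder:

filter {
  grok {
    match     => { "message" => "%{SYSLOGBASE} %{DATA:message}" }
    overwrite => [ "message" ]   # the trimmed text replaces the original message field
  }
}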

Although the grok filter can do this formatting, multi-line events should not be assembled in Logstash's filter or input stage (the multiline codec; if you do want to handle multi-line events inside Logstash, see https://www.elastic.co/guide/en/logstash/current/multiline.html). On an ELK platform, logs usually arrive through the beats input plugin, and merging multi-line events inside Logstash at that point can scramble the data stream. The events therefore need to be assembled before they are sent to Logstash, i.e. the pre-processing should happen in Filebeat.

For data produced by Filebeat modules, Logstash ships with ready-made parsing configurations; see https://www.elastic.co/guide/en/logstash/current/logstash-config-for-filebeat-modules.html. More on this when we get to Filebeat.

The ES output plugin

Its main options are:
action: defaults to index, i.e. index the document (the Logstash event) (see the ES architecture and core concepts reference).
hosts: the address(es) and port(s) of the ES servers.
index: the ES index the events are written to; the default is logstash-%{+YYYY.MM.dd}, i.e. one index per day. Indices are generally split by time; for the date format see http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html.
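
For instance, to write to a couple of data nodes with one index per day (the node addresses are placeholders):

output {
  elasticsearch {
    hosts => ["10.0.0.11:9200", "10.0.0.12:9200"]   # balance writes across several non-master nodes
    index => "logstash-%{+YYYY.MM.dd}"              # the default daily index naming
  }
}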

Filebeat

Since ELK 6.x the log4j input plugin is no longer recommended; the recommended replacement is Filebeat.

How Filebeat works

See https://www.elastic.co/guide/en/beats/filebeat/6.2/how-filebeat-works.html.
Filebeat consists of two main components, prospectors and harvesters, which work together to tail files and send events to the configured output. A harvester reads a single file line by line and sends the lines to the output; each file is read by its own harvester. A prospector manages the harvesters and finds the files to read.
Filebeat currently supports two prospector types, log and stdin, and each type can be defined multiple times.
Filebeat records the state of every file in a registry (set with filebeat.registry_file, default ${path.data}/registry); the state includes the harvester's last read offset. The prospector keeps the state of every file it has found. Filebeat guarantees that every event is delivered at least once.
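
If you want the registry somewhere other than the default, it can be relocated in filebeat.yml (the path below is an assumption):

filebeat.registry_file: /var/lib/filebeat/registry   # default: ${path.data}/registry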

Filebeat's configuration file likewise uses the YAML format. For example:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/*.log  # the absolute path(s) of the log files to read
  fields:
    type: syslog  # adds a custom field named type with value syslog (nested under fields unless fields_under_root is set)
output.logstash:
  hosts: ["localhost:5044"]

Filebeat can ship its output to Elasticsearch or to Logstash; the usual practice is to send it to Logstash, so the ES-related output settings are skipped here.
Filebeat's command-line options are described at https://www.elastic.co/guide/en/beats/filebeat/6.2/command-line-options.html, and every configuration option at https://www.elastic.co/guide/en/beats/filebeat/6.2/filebeat-reference-yml.html.

By default Filebeat runs in the background; to run it in the foreground, start it with ./filebeat -e.

To use Filebeat, declare one or more prospectors under filebeat.prospectors in filebeat.yml; you are not limited to a single prospector. For example:

filebeat.prospectors:
- type: log
  paths:
    - /var/log/apache/httpd-*.log

- type: log
  paths:
    - /var/log/messages
    - /var/log/*.log

Other useful options include include_lines (only read matching lines), exclude_lines (skip matching lines), exclude_files (exclude certain files entirely), tags, fields, fields_under_root, close_inactive (how long a log file may go without changes before its harvester is closed, default 5 minutes), and scan_frequency (how often the prospector scans for new files; note that files whose harvesters were closed by close_inactive count as new again; default 10s, and do not set it below 1s). A combined sketch follows below.
The full list is at https://www.elastic.co/guide/en/beats/filebeat/6.2/configuration-filebeat-options.html.
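
A sketch combining several of these options (the paths and values are illustrative):

filebeat.prospectors:
- type: log
  paths:
    - /var/log/app/*.log            # hypothetical application log directory
  exclude_lines: ['^DEBUG']         # skip lines matching this regex
  exclude_files: ['\.gz$']          # never harvest rotated, compressed files
  tags: ["app"]
  fields:
    env: production                 # custom field, nested under fields unless fields_under_root: true
  close_inactive: 5m                # close the harvester after 5 minutes without new data
  scan_frequency: 10s               # how often the prospector looks for new files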

Parsing multi-line messages

When ELK is used for application logs, presenting multi-line messages properly is essential; otherwise much of ELK's value is lost. To handle multi-line messages correctly, configure multiline rules in filebeat.yml that declare which lines belong to a single event. This is governed mainly by three settings: multiline.pattern, multiline.negate, and multiline.match.
For Java logs, for example, you can use:

multiline.pattern: '^\['
multiline.negate: true
multiline.match: after

or:

multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'
multiline.negate: false
multiline.match: after

With either of these, the log excerpt below is treated as a single event.

[beat-logstash-some-name-832-2015.11.28] IndexNotFoundException[no such index]
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:566)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:133)
    at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:77)
    at org.elasticsearch.action.admin.indices.delete.TransportDeleteIndexAction.checkBlock(TransportDeleteIndexAction.java:75)

Detailed configuration options are described at https://www.elastic.co/guide/en/beats/filebeat/6.2/multiline-examples.html.
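
To show where these settings live, here is a sketch of a complete prospector using the first rule above (the log path is an assumption):

filebeat.prospectors:
- type: log
  paths:
    - /var/log/app/app.log          # hypothetical Java application log
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after            # lines not starting with '[' are appended to the previous event
output.logstash:
  hosts: ["localhost:5044"]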

Filebeat's supported outputs include Elasticsearch, Logstash, Kafka, Redis, File, and Console. They are all straightforward; see, for example, https://www.elastic.co/guide/en/beats/filebeat/6.2/kafka-output.html.

Filebeat modules offer a more convenient way to handle common log formats such as apache2 and mysql. In spirit they are like Spring Boot: convention over configuration. See https://www.elastic.co/guide/en/beats/filebeat/6.2/filebeat-modules-overview.html. Filebeat modules require Elasticsearch 5.2 or later.
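
Modules are managed from the command line; for example (apache2 is one of the bundled modules):

./filebeat modules list             # show available and enabled modules
./filebeat modules enable apache2   # enable the apache2 module (its config lives under modules.d/)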
