The following configuration uses Logstash to split a Tomcat catalina.out log.
Before splitting, test your grok expression at http://grok.qiexun.net/ to confirm it parses the log the way you want.
input {
  file {
    type => "01-catalina"
    path => ["/usr/local/tomcat-1/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^2018"
      negate => true
      what => "previous"
    }
  }

  file {
    type => "02-catalina"
    path => ["/usr/local/tomcat-2/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^2018"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    match => {
      "message" => "%{DATESTAMP:date} \|-%{LOGLEVEL:level} \[%{DATA:class}\] %{DATA:code_info} -\| %{GREEDYDATA:log_info}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.1.1:9200"]
    index => "tomcat-%{type}"
  }
  stdout {
    codec => rubydebug
  }
}
Multi-line matching, for example to keep Java stack traces together:
input {
  file {
    type => "10.139.32.68"
    path => ["/data1/application/api/apache-tomcat/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }
}
codec => multiline — applies the multiline codec plugin.
pattern — the regex ^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} matches lines that start with a timestamp such as 2018-10-10 10:10:10.
negate => true — inverts the pattern, so the lines that do NOT match it are the ones handled by what.
what => "previous" — unmatched lines belong to (are merged into) the previous matching line.
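The combined effect of negate => true and what => "previous" can be sketched in Python: any line that does not start with a timestamp is glued onto the preceding event. This is a simplified simulation of the codec's behavior, not the plugin itself.

```python
import re

# Same pattern as in the config above: a line is a new event only if it
# starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")

def merge_multiline(lines):
    """Simulate codec => multiline with negate => true, what => "previous":
    lines NOT matching the pattern are appended to the previous event."""
    events = []
    for line in lines:
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events

log = [
    "2018-10-10 10:10:10 ERROR boom",
    "java.lang.NullPointerException",          # stack trace: no timestamp
    "    at com.example.Foo.bar(Foo.java:42)", # continuation line
    "2018-10-10 10:10:11 INFO recovered",
]
print(merge_multiline(log))  # two events: the stack trace joins the first
```

This is why a whole Java stack trace arrives in Elasticsearch as a single event instead of one event per line.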
Custom regular expressions
The (?<name>(...)) syntax starts a named capture: the text inside <> is the field name for the match, and the parentheses hold the regular expression itself.
The regex (shown in a screenshot in the original post, not reproduced here) has four parts:
(?<date>(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})) — matches the timestamp
\s{1,2} — matches one or two whitespace characters
(?<loglevel>(\w{4,5})) — matches four or five word characters (the log level)
(?<log_info>(.*)) — matches the rest of the line
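The four pieces above can be joined and tried out in Python. Note that Python spells named groups (?P<name>...) where Logstash's Oniguruma engine uses (?<name>...); the sample line here is made up for illustration.

```python
import re

# The four parts listed above, concatenated. Since the original lists no
# separator between loglevel and log_info, log_info keeps its leading space.
pattern = re.compile(
    r"(?P<date>(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}))"
    r"\s{1,2}"
    r"(?P<loglevel>(\w{4,5}))"
    r"(?P<log_info>(.*))"
)

# Made-up sample line matching that shape:
line = "2018-10-10 10:10:10.123  ERROR Connection refused"
m = pattern.match(line)
print(m.group("date"))      # 2018-10-10 10:10:10.123
print(m.group("loglevel"))  # ERROR
```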
Here is another example. For the line:
2019-01-04 17:29:56.479 |-ERROR 31593 --- [DubboServerHandler-10.139.32.94:20885-thread-50] c.v.g.risk.service.CreditReportService : shuJuMoHeMessage TelRelativize error_null
%{DATESTAMP:date} \|-%{LOGLEVEL:level} \d{3,5} --- (?<xxx>(\[\w+-\d+.\d+.\d+.\d+:\d+-\w+-\d+\])) (?<file>(\S+))\s+: %{GREEDYDATA:log_info}
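The grok expression above can be approximated with plain regex and checked against that very line in Python. %{DATESTAMP}, %{LOGLEVEL} and %{GREEDYDATA} are replaced here with hand-written equivalents, so this is a sketch of the field layout rather than exact grok semantics.

```python
import re

# Hand-written stand-ins for the grok macros; dots in the IP portion are
# escaped here, unlike in the original expression.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\|-(?P<level>[A-Z]+) \d{3,5} --- "
    r"(?P<xxx>\[\w+-\d+\.\d+\.\d+\.\d+:\d+-\w+-\d+\]) "
    r"(?P<file>\S+)\s+: "
    r"(?P<log_info>.*)"
)

line = ("2019-01-04 17:29:56.479 |-ERROR 31593 --- "
        "[DubboServerHandler-10.139.32.94:20885-thread-50] "
        "c.v.g.risk.service.CreditReportService : "
        "shuJuMoHeMessage TelRelativize error_null")

m = pattern.match(line)
print(m.group("level"))     # ERROR
print(m.group("file"))      # c.v.g.risk.service.CreditReportService
print(m.group("log_info"))  # shuJuMoHeMessage TelRelativize error_null
```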
Starting Logstash with multiple configuration files
To start Logstash with several configuration files, for example cs.conf and server.conf in the conf directory, run:
./logstash -f ../conf/
Do not append * after conf/ (as in ./logstash -f ../conf/*); that form reads only one of the configuration files in the directory.
Also, although several configuration files can be started together, Logstash actually concatenates them into a single configuration, so the input, filter and output sections of the different files are not independent of each other.
For example, given the two configuration files cs.conf and server.conf below:
cs.conf:
input {
  file {
    type => "192.168.1.1"
    path => ["/data1/application/cs/tomcat-1/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    remove_tag => ["multiline"]  # multiline events sometimes fail to parse because an extra "multiline" tag is added, causing the error shown in note 1 at the end
    match => {
      "message" => "%{DATESTAMP:date} \|-%{LOGLEVEL:level} %{GREEDYDATA:log_info}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.1:9200"]
    index => "qwe-cs-tomcat"
  }
  stdout {
    codec => rubydebug
  }
}
server.conf:
input {
  file {
    type => "192.168.1.1"
    path => ["/data1/application/server/tomcat-2/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    match => {
      "message" => "%{DATESTAMP:date} \|-%{LOGLEVEL:level} %{GREEDYDATA:log_info}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.1:9200"]
    index => "qwe-server-tomcat"
  }
  stdout {
    codec => rubydebug
  }
}
Even if the two configurations are started together, they are in effect merged into one: every input event is sent to both outputs, so the data in ELK looks duplicated, with each line indexed twice.
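A small Python sketch of that merging behavior: all inputs feed one shared pipeline, and without conditionals every event reaches every output. The file names and events here are just the ones from the example above, modeled as plain lists.

```python
# Simplified model of Logstash concatenating multiple config files:
# every event from any input is delivered to every output.
inputs = {
    "cs.conf": ["cs log line"],
    "server.conf": ["server log line"],
}
outputs = {
    "qwe-cs-tomcat": [],
    "qwe-server-tomcat": [],
}

for lines in inputs.values():
    for event in lines:
        for index in outputs:        # no conditionals -> all outputs get it
            outputs[index].append(event)

print(outputs)  # both indices contain both lines: the "duplicated" data
```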

In this situation it is recommended to use the special fields type or tags in the input: add an identifier (or define a variable) for each file as it is read, as in the following example.
input {
  file {
    type => "192.168.1.1"
    tags => "cs"
    path => ["/data1/application/cs/tomcat-1/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }

  file {
    type => "192.168.1.1"
    tags => "server"
    path => ["/data1/application/server/tomcat-2/logs/catalina.out"]
    start_position => "beginning"
    ignore_older => 3
    codec => multiline {
      pattern => "^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    remove_tag => ["multiline"]
    match => {
      "message" => "%{DATESTAMP:date} \|-%{LOGLEVEL:level} %{GREEDYDATA:log_info}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.1:9200"]
    index => "ulh-%{tags}-tomcat"
  }
  stdout {
    codec => rubydebug
  }
}
This way we can use type and tags to tell which server and which application each log entry came from.
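The index => "ulh-%{tags}-tomcat" routing can be sketched in Python: the event's tags value is substituted into the index name, so the cs and server logs land in separate indices. The events below are hypothetical, and the field substitution is done by hand here rather than by Logstash's sprintf mechanism.

```python
def index_for(event):
    """Mimic index => "ulh-%{tags}-tomcat": substitute the event's tags
    field into the index name."""
    return "ulh-{}-tomcat".format(event["tags"])

cs_event = {"tags": "cs", "message": "2018-10-10 10:10:10 INFO started"}
server_event = {"tags": "server", "message": "2018-10-10 10:10:10 INFO started"}

print(index_for(cs_event))      # ulh-cs-tomcat
print(index_for(server_event))  # ulh-server-tomcat
```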
Note 1: the original post included a screenshot (not reproduced here) of the parse failures caused by the extra multiline tag during multi-line parsing; the fix is to add remove_tag => ["multiline"] to the grok filter.
Note 2: more multiline codec options in Logstash
input {
  stdin {
    codec => multiline {
      charset       => ...  # optional; string; character encoding
      max_bytes     => ...  # optional; bytes; maximum size of a merged event
      max_lines     => ...  # optional; number; maximum lines per event, default 500
      multiline_tag => ...  # optional; string; tag added to merged events, default "multiline"
      pattern       => ...  # required; string; the regex to match
      patterns_dir  => ...  # optional; array; directories with additional pattern files
      negate        => ...  # optional; boolean; when true, lines NOT matching the pattern are merged; default false
      what          => ...  # required; "previous" or "next": merge unmatched lines with the previous or the next line
    }
  }
}