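Chinese-language material on Logstash is hard to find online, I have never studied Ruby, and reading the official documentation is heavy going. My needs are modest, though: being able to use Logstash to extract the fields I want is enough.
What follows is purely my own understanding, based on assumption rather than verification: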
Logstash configuration format
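# Official documentation: http://www.logstash.net/docs/1.4.2/
input {
  ...  # Read data. Logstash already ships many plugins; for example, it can read from file, redis, syslog, and so on.
}
filter {
  ...  # Extract the data you care about from irregular logs here. grok and mutate are the most commonly used.
}
output {
  ...  # Write the data processed above to file, elasticsearch, and so on.
}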
How Logstash processes data:
1. Read data through the plugins in input, processing it line by line (as awk does).
file {
  path => "/var/log/maillog"
  start_position => "beginning"
}
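2. Process the data in filter.
The first line is read and its content is stored in the message field (message is similar to $0 in awk).
grok{} extracts the needed data from message, mainly using regular expressions.
mutate{} mainly modifies data; for example, once you have a field's value, you can use mutate to transform it.
3. Send the processed data to the output plugins.
After one line has been processed, the steps above repeat until all of the data has been processed.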
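The three steps above can be modelled in a few lines of Python. This is only a sketch of the flow, not how Logstash is implemented: the function names (read_input, grok_filter, run_pipeline), the sample lines, and the pattern are all made up for illustration. Note that Python writes named groups as (?P<name>...), whereas grok uses (?<name>...).

```python
import re


def read_input(lines):
    """input stage: turn each raw line into an event with a message field."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def grok_filter(event, pattern):
    """filter stage: copy named regex groups into the event as new fields."""
    match = re.search(pattern, event["message"])
    if match:
        event.update(match.groupdict())
    else:
        # Logstash tags events that grok could not parse in a similar way.
        event.setdefault("tags", []).append("_grokparsefailure")
    return event


def run_pipeline(lines, pattern):
    """Process events one at a time: input -> filter -> output (a list here)."""
    return [grok_filter(event, pattern) for event in read_input(lines)]


# Made-up sample lines, not taken from a real maillog.
SAMPLE_LINES = [
    "Dec 13 10:50:46 mail postfix/smtpd[1234]: connect from unknown\n",
    "not a syslog line\n",
]
# Hypothetical pattern for the leading syslog fields.
PATTERN = r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d+) (?P<time>[\d:]+) (?P<host>\S+)"

events = run_pipeline(SAMPLE_LINES, PATTERN)
for event in events:
    print(event)
```

The first sample line gains month, day, time, and host fields next to message; the second does not match and only receives the _grokparsefailure tag.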
Logstash configuration language
URL: http://www.logstash.net/docs/1.4.2/configuration
#: comment

Boolean: true or false
  Examples:
  debug => true

String
  name => "Hello world"   # strings go inside double quotes
  abc => "%{name}"        # abc takes the value of the name field

Number
  port => 33

Array
  path => [ "/var/log/messages", "/var/log/*.log" ]
  path => "/data/mysql/mysql.log"
  # path now contains three paths.

Hash
  match => {
    "field1" => "value1"
    "field2" => "value2"
    ...
  }
  # Put multiple fields inside {}; each one is written as "key" => "value".

Field References
  {
    "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
    "ip": "192.168.24.44",
    "request": "/index.html",
    "response": {
      "status": 200,
      "bytes": 52353
    },
    "ua": {
      "os": "Windows 7"
    }
  }
  # Field references use []. For example, to test status, which is nested under response:
  #   if [response][status] == 200 {}
  # To insert a field's value into a string, use %{ip}
  # To get the value of os, write [ua][os]; you can think of ua as the array name and os as the index.

Conditionals
  if EXPRESSION {
    ...
  } else if EXPRESSION {
    ...
  } else {
    ...
  }

  equality, etc: ==, !=, <, >, <=, >=
  regexp: =~, !~
  inclusion: in, not in
  boolean: and, or, nand, xor
  unary: !

# Examples:
filter {
  if [action] == "login" {
    mutate { remove => "secret" }
  }
}

output {
  if [type] == "apache" {
    if [status] =~ /^5\d\d/ {
      nagios { ... }
    } else if [status] =~ /^4\d\d/ {
      elasticsearch { ... }
    }
    statsd { increment => "apache.%{status}" }
  }
}

output {
  # Send production errors to pagerduty
  if [loglevel] == "ERROR" and [deployment] == "production" {
    pagerduty {
      ...
    }
  }
}

filter {
  if [foo] in [foobar] {
    mutate { add_tag => "field in field" }
  }
  if [foo] in "foo" {
    mutate { add_tag => "field in string" }
  }
  if "hello" in [greeting] {
    mutate { add_tag => "string in field" }
  }
  if [foo] in ["hello", "world", "foo"] {
    mutate { add_tag => "field in list" }
  }
  if [missing] in [alsomissing] {
    mutate { add_tag => "shouldnotexist" }
  }
  if !("foo" in ["hello", "world"]) {
    mutate { add_tag => "shouldexist" }
  }
}

Or, to test if grok was successful:

output {
  if "_grokparsefailure" not in [tags] {
    elasticsearch { ... }
  }
}
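To make the field reference rules concrete, here is a small Python model of how a reference such as [ua][os] walks into a nested event, and how a %{...} placeholder is expanded inside a string. It is a sketch for illustration, not Logstash's own code: get_field and sprintf are hypothetical helper names, and leaving an unknown placeholder unchanged is an assumption of this model.

```python
import re

# The sample event from the field reference section above.
EVENT = {
    "agent": "Mozilla/5.0 (compatible; MSIE 9.0)",
    "ip": "192.168.24.44",
    "request": "/index.html",
    "response": {"status": 200, "bytes": 52353},
    "ua": {"os": "Windows 7"},
}


def get_field(event, reference):
    """Resolve a reference such as "[ua][os]" or a bare top-level name such as "ip"."""
    keys = re.findall(r"\[([^\[\]]+)\]", reference) or [reference]
    value = event
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def sprintf(event, template):
    """Expand %{field} placeholders; placeholders for unknown fields are left as they are."""
    def expand(match):
        value = get_field(event, match.group(1))
        return match.group(0) if value is None else str(value)

    return re.sub(r"%\{([^}]+)\}", expand, template)


print(get_field(EVENT, "[ua][os]"))
print(get_field(EVENT, "[response][status]"))
print(sprintf(EVENT, "client %{ip} got %{[response][status]}"))
```

The point of the model is that status is only reachable through its parent, as [response][status], while ip sits at the top level and can be referenced by its bare name.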
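The earlier approach of processing the alert log with mutate had quite a few problems. For example, when the original string contains more than one ":" character, the description comes out incomplete. Here is the same job done with grok: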
input {
  stdin {
    type => "hxwtest"
  }
}
filter {
  grok {
    match => ["message", "(?<ORAERR_ID>^O[A-Z]{2}-[0-9]{5}):(?<ORA_DESC>.*)"]
  }
  grok {
    # (?<name>regex) stores whatever regex captures under name, and name becomes a field.
    # (?<=:) is a lookbehind.
    match => ["message", "(?<TEST>(?<=:).*)"]
  }
  if "_grokparsefailure" not in [tags] {
    mutate {
      add_field => {"NGSUBTEST" => "%{TEST}"}
    }
  }
  # Remove the spaces from TEST
  mutate { gsub => ["TEST", " ", ""] }
}
output {
  stdout {
    codec => rubydebug
  }
}
The result is as follows:
ORA-01589: alter database oracle lkjldkfjdkf
{
       "message" => "ORA-01589: alter database oracle lkjldkfjdkf\r",
      "@version" => "1",
    "@timestamp" => "2014-12-13T02:50:46.671Z",
          "type" => "hxwtest",
          "host" => "huangwen",
     "ORAERR_ID" => "ORA-01589",
      "ORA_DESC" => " alter database oracle lkjldkfjdkf\r",
          "TEST" => "alterdatabaseoraclelkjldkfjdkf\r",
     "NGSUBTEST" => " alter database oracle lkjldkfjdkf\r"
}
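The same extraction can be replayed in Python to see why TEST and NGSUBTEST end up different. This is a rough model under stated assumptions: extract is a hypothetical helper, the second input line is made up to show the multiple-colon case, and the regular expressions are rewritten with Python's (?P<name>...) spelling in place of grok's (?<name>...).

```python
import re

# The two grok expressions from the config, in Python's named-group syntax.
FIRST = r"(?P<ORAERR_ID>^O[A-Z]{2}-[0-9]{5}):(?P<ORA_DESC>.*)"
SECOND = r"(?P<TEST>(?<=:).*)"


def extract(message):
    """Apply both patterns, then mimic the add_field and gsub steps in order."""
    event = {"message": message}
    for pattern in (FIRST, SECOND):
        match = re.search(pattern, message)
        if match:
            event.update(match.groupdict())
    if "TEST" in event:
        # add_field runs first, so NGSUBTEST keeps the spaces ...
        event["NGSUBTEST"] = event["TEST"]
        # ... and only afterwards are the spaces stripped from TEST.
        event["TEST"] = event["TEST"].replace(" ", "")
    return event


event = extract("ORA-01589: alter database oracle lkjldkfjdkf\r")
multi = extract("ORA-00001: a: b: c")  # made-up line with several colons

print(event)
print(multi)
```

Because the lookbehind only requires a ":" immediately before the match and .* is greedy, everything after the first colon is kept, later colons included, which is the case that caused trouble with the mutate-based approach.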