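National Day is almost here and nobody feels like working; we just want to celebrate the motherland's birthday as soon as possible. While waiting, here is a write-up of how I worked around the Logstash date filter.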
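Logstash stores @timestamp in UTC, which is 8 hours behind local time for users in China. As a result we cannot query by our own local time; we first have to subtract 8 hours from it. This is quite inconvenient. I tried setting the time zone, but the value of this field still could not be changed, so the only option was a do-it-yourself workaround.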
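To make the eight-hour gap concrete, here is a small Python sketch. It is not part of the Logstash pipeline; the sample instant is borrowed from the nginx record used later in this post, and the name CST is just a label chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# China Standard Time is a fixed UTC+8 offset (no daylight saving).
CST = timezone(timedelta(hours=8), "CST")

# One instant, as nginx logs it locally: 30 Sep 2016 14:18:33 +0800.
local_time = datetime(2016, 9, 30, 14, 18, 33, tzinfo=CST)

# The same instant expressed in UTC, which is how @timestamp is stored.
utc_time = local_time.astimezone(timezone.utc)

print(local_time.isoformat())  # 2016-09-30T14:18:33+08:00
print(utc_time.isoformat())    # 2016-09-30T06:18:33+00:00
```

Both values describe the same moment; only the representation differs. Any local time before 08:00 also falls on the previous calendar day in UTC, which is why a date-based query against @timestamp can miss records.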
Since that is the case, I stopped relying on @timestamp and added fields of my own instead, taking the indirect route.
The nginx log format is configured as:
log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" $http_x_forwarded_for';
Here is a typical nginx log record:
127.0.0.1 - - [30/Sep/2016:14:18:33 +0800] "GET / HTTP/1.1" 200 396 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"
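As you can see, 30/Sep/2016:14:18:33 +0800 is the actual access time, i.e. the value of $time_local. This is the Common Log Format time layout (dd/MMM/yyyy:HH:mm:ss Z), not ISO 8601, whereas what we usually want is yyyy-MM-dd. The layout of $time_local cannot be changed through configuration; changing it would mean modifying the nginx source, then rebuilding and reinstalling nginx, which is a hassle. So instead we parse this string by hand, assemble it into the format we need, and output that.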
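For comparison, the target transformation can be sketched outside Logstash in a few lines of Python. This only illustrates the input and the desired output; it is not the method used in this post, and the function name to_access_date is made up for the example.

```python
from datetime import datetime

# Layout of nginx's $time_local, e.g. "30/Sep/2016:14:18:33 +0800".
# %b expects English month abbreviations, which holds under Python's default C locale.
TIME_LOCAL_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def to_access_date(time_local: str) -> str:
    """Return the calendar date of a $time_local value as yyyy-MM-dd."""
    parsed = datetime.strptime(time_local, TIME_LOCAL_FORMAT)
    # No timezone conversion: the date stays in the log's own +0800 offset.
    return parsed.strftime("%Y-%m-%d")


print(to_access_date("30/Sep/2016:14:18:33 +0800"))  # 2016-09-30
```

The Logstash steps that follow reach the same result with string manipulation instead of a date parser.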
First add three fields for year, month, and day, all initialized to %{timestamp}:
add_field => {"access_year" => "%{timestamp}"}
add_field => {"access_month" => "%{timestamp}"}
add_field => {"access_day" => "%{timestamp}"}
These three fields can be added in any filter, as long as it comes after the grok filter. For example, I put them in urldecode:
urldecode {
  add_field => {"access_year" => "%{timestamp}"}
  add_field => {"access_month" => "%{timestamp}"}
  add_field => {"access_day" => "%{timestamp}"}
  all_fields => true
}
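Then use regular expressions to strip each field down to the year, month, or day:
mutate {
  # gsub takes triples of: field name, pattern to remove, replacement
  gsub => [
    "access_year", "[\W\w]*/|:[\s\S]*", "",
    "access_month", "[(\d+/)|(/\d+)]|:[\s\S]*", "",
    "access_day", "/[\s\S]*|:[\s\S]*", ""
  ]
}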
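In effect, these regular expressions delete everything except the year, month, or day from the string 30/Sep/2016:14:18:33 +0800:
Year: removing 30/Sep/ and :14:18:33 +0800 leaves 2016
Month: removing the digits, the slashes, and everything from the first colon onward leaves Sep
Day: removing everything from the first slash onward leaves 30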
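The three gsub patterns can be checked outside Logstash. The sketch below replays them with Python's re module, on the assumption that these particular patterns behave the same there as in the Ruby regex engine Logstash uses; the constant names are made up for the example.

```python
import re

timestamp = "30/Sep/2016:14:18:33 +0800"

# The three patterns from the mutate/gsub step, unchanged.
YEAR_STRIP = r"[\W\w]*/|:[\s\S]*"
MONTH_STRIP = r"[(\d+/)|(/\d+)]|:[\s\S]*"
DAY_STRIP = r"/[\s\S]*|:[\s\S]*"

# re.sub with an empty replacement deletes every match, like gsub with "".
access_year = re.sub(YEAR_STRIP, "", timestamp)
access_month = re.sub(MONTH_STRIP, "", timestamp)
access_day = re.sub(DAY_STRIP, "", timestamp)

print(access_year, access_month, access_day)  # 2016 Sep 30
```

Note that the month pattern is a character class, so it deletes individual digits and slashes rather than the groups its parentheses suggest. The result is still Sep.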
At this point, however, the month is an English abbreviation rather than a number. The translate filter can convert it:
translate {
  exact => true
  regex => true
  dictionary => [
    "Jan","01",
    "Feb","02",
    "Mar","03",
    "Apr","04",
    "May","05",
    "Jun","06",
    "Jul","07",
    "Aug","08",
    "Sep","09",
    "Oct","10",
    "Nov","11",
    "Dec","12"
  ]
  field => "access_month"
  destination => "access_month_temp"
}
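Conceptually, the translate step is a dictionary lookup. A minimal Python sketch of the same mapping follows; it uses plain exact-key lookup and does not model the regex matching option, and the names MONTHS and month_number are hypothetical.

```python
from typing import Optional

# Same pairs as the translate dictionary above.
MONTHS = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}


def month_number(abbreviation: str) -> Optional[str]:
    """Return the two-digit month for an English abbreviation, or None."""
    # None for unknown input, so a failed lookup is visible to the caller.
    return MONTHS.get(abbreviation)


print(month_number("Sep"))  # 09
```

Keeping the values as zero-padded strings matters here, because they are pasted straight into the final yyyy-MM-dd text.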
Finally, add the field we actually want to display by assembling the temporary fields in whatever order we like:
alter {
  add_field => {"access_date" => "%{access_year}-%{access_month_temp}-%{access_day}"}
  remove_field => ["access_year","access_month","access_day","access_month_temp","bytes","ident","auth"]
  remove_tag => ["tags"]
}
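This adds an access_date field in yyyy-MM-dd format, and remove_field then deletes the intermediate temporary fields. The access_date field that Logstash writes to redis or mongodb is now in the format we want, and the format can be defined and assembled however our needs dictate.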
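Because the final value is plain string assembly, a missing piece goes unnoticed: if a referenced field does not exist, Logstash leaves the literal %{...} reference in the output. The Python sketch below mirrors the assembly and adds a validity check; the function name assemble_access_date is hypothetical, and the check is an addition for illustration rather than something the configuration above performs.

```python
from datetime import datetime


def assemble_access_date(year: str, month_number: str, day: str) -> str:
    """Join the three parts as yyyy-MM-dd and verify the result is a real date."""
    access_date = f"{year}-{month_number}-{day}"
    # strptime raises ValueError when the text is not a valid yyyy-MM-dd date,
    # for example when an unresolved "%{access_month_temp}" slipped through.
    datetime.strptime(access_date, "%Y-%m-%d")
    return access_date


print(assemble_access_date("2016", "09", "30"))  # 2016-09-30
```

A similar sanity check on the consuming side is a cheap way to spot records where one of the intermediate steps did not match.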
Part of the complete configuration file I use is shown below:
input {
  stdin { }
  file {
    path => "/usr/local/nginx/logs/gateway_access.log"
    start_position => beginning
  }
}
filter {
  grok {
    # Use grok to parse the Apache-style log format automatically
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  #date {
  #  locale => "en"
  #  timezone => "Asia/Shanghai"
  #  match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  #}
  kv {
    source => "request"
    field_split => "&?"
    value_split => "="
  }
  urldecode {
    add_field => {"access_year" => "%{timestamp}"}
    add_field => {"access_month" => "%{timestamp}"}
    add_field => {"access_day" => "%{timestamp}"}
    all_fields => true
  }
  mutate {
    # gsub takes triples of: field name, pattern to remove, replacement
    gsub => [
      "access_year", "[\W\w]*/|:[\s\S]*", "",
      "access_month", "[(\d+/)|(/\d+)]|:[\s\S]*", "",
      "access_day", "/[\s\S]*|:[\s\S]*", ""
    ]
  }
  translate {
    exact => true
    regex => true
    dictionary => [
      "Jan","01",
      "Feb","02",
      "Mar","03",
      "Apr","04",
      "May","05",
      "Jun","06",
      "Jul","07",
      "Aug","08",
      "Sep","09",
      "Oct","10",
      "Nov","11",
      "Dec","12"
    ]
    field => "access_month"
    destination => "access_month_temp"
  }
  alter {
    add_field => {"access_date" => "%{access_year}-%{access_month_temp}-%{access_day}"}
    remove_field => ["access_year","access_month","access_day","access_month_temp","bytes","ident","auth"]
    remove_tag => ["tags"]
  }
}
output {
  stdout { codec => rubydebug }
  mongodb {
    collection => "pagelog"
    database => "statistics"
    uri => "mongodb://192.168.1.52:27017"
  }
}