nginx
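Grok **regular expressions are a very important part of Logstash**; with grok it is easy to split data into fields and index it.
Syntax:
(?<name>pattern)
?<name> gives the name of the field that will hold the captured value, and pattern is the regular expression to match.
Example: read console input and extract the time from it: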
input {stdin{}}
filter {
grok {
match => {
"message" => "(?<date>\d+\.\d+)\s+"
}
}
}
output {stdout{codec => rubydebug}}
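To see what the named capture does outside Logstash, the same expression can be tried in Python. This is only a minimal sketch: Python spells a named group `(?P<name>...)` rather than `(?<name>...)`, and `extract_date` is a hypothetical helper name, not part of Logstash.

```python
import re

# Same expression as in the grok filter above, in Python's named-group syntax.
DATE_PATTERN = re.compile(r"(?P<date>\d+\.\d+)\s+")


def extract_date(message):
    """Return {'date': ...} if the pattern occurs anywhere in message, else {}."""
    match = DATE_PATTERN.search(message)  # unanchored search, like grok
    return match.groupdict() if match else {}


print(extract_date("4.19 is luck day"))
print(extract_date("no number here"))
```

Note that the trailing `\s+` means a bare `4.19` with nothing after it does not match; in Logstash an event that fails to match is tagged `_grokparsefailure` by default.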
Continuing the example above: for the input 4.19 is luck day, extract every field:
input {stdin{}}
filter {
grok {
match => {
"message" => "(?<date>\d+\.\d+)\s+(?<is>\w+)\s+(?<luck>\w+)\s+(?<day>\w+)"
}
}
}
output {stdout{codec => rubydebug}}
By default grok loads its predefined patterns from this directory: /logstash-5.5.2/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.1.1/patterns
Using those predefined patterns, the example above can be written as:
input {stdin{}}
filter {
grok {
match => {
"message" => "%{NUMBER:date:float} %{WORD:is} %{WORD:luck} %{WORD:day}"
}
}
}
output {stdout{codec => rubydebug}}
Result screenshot:
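The `%{SYNTAX:field:type}` notation is essentially shorthand for a named group plus an optional type conversion. The sketch below illustrates that idea in Python. It rests on assumptions: the two entries in `PATTERNS` are simplified stand-ins (the real definitions in logstash-patterns-core are more elaborate), and `compile_grok` and `grok` are hypothetical helper names.

```python
import re

# Simplified stand-ins for two library patterns (assumption, not the real definitions).
PATTERNS = {
    "NUMBER": r"\d+(?:\.\d+)?",
    "WORD": r"\b\w+\b",
}
CASTS = {"int": int, "float": float}

# Matches %{SYNTAX:field} or %{SYNTAX:field:type}.
GROK_REF = re.compile(r"%\{(\w+):(\w+)(?::(\w+))?\}")


def compile_grok(expression):
    """Expand every %{SYNTAX:field[:type]} reference into a named group."""
    types = {}

    def expand(ref):
        syntax, field, cast = ref.groups()
        if cast:
            types[field] = CASTS[cast]
        return "(?P<{}>{})".format(field, PATTERNS[syntax])

    return re.compile(GROK_REF.sub(expand, expression)), types


def grok(expression, message):
    """Return the extracted fields as a dict, or None if the message does not match."""
    regex, types = compile_grok(expression)
    match = regex.search(message)
    if match is None:
        return None
    return {name: types.get(name, str)(value)
            for name, value in match.groupdict().items()}


event = grok("%{NUMBER:date:float} %{WORD:is} %{WORD:luck} %{WORD:day}",
             "4.19 is luck day")
print(event)
```

With the `:float` suffix the `date` field comes back as a number instead of a string, which is the point of the third part of the reference.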
Nginx access logs typically look like this:
192.168.77.1 - - [10/May/2018:12:12:40 +0800] "GET /plugins/ml/ml.svg HTTP/1.1" 304 0 "http://hadoop01/app/kibana" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" "-"
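Nginx logs like this are unstructured. Usually, after collecting them, we still have to run a cleaning job with MapReduce or Spark, that is, turn the unstructured logs into structured ones.
When the volume of log data is large, that cleaning step takes a noticeable amount of time.
So we can use Logstash's grok filter to turn the unstructured nginx data into structured data at collection time:
Install the grok plugin (only needed if it is missing; grok normally ships with Logstash): bin/logstash-plugin install logstash-filter-grok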
input {stdin{}}
filter {
grok {
match => {
"message" => "%{IPORHOST:clientip} - - \[%{HTTPDATE:time_local}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:status} %{NUMBER:body_bytes_sent} %{QS:http_referer} %{QS:agent} %{NOTSPACE:http_x_forwarded_for}"
}
}
}
output {stdout{codec => rubydebug}}
[Note:] different nginx log formats need correspondingly different patterns
Start:
bin/logstash -f /home/angel/logstash-5.5.2/logstash_conf/filter_4.conf
Enter a log line on the console:
192.168.77.1 - - [10/May/2018:12:12:40 +0800] "GET /plugins/ml/ml.svg HTTP/1.1" 304 0 "http://hadoop01/app/kibana" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" "-"
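For comparison, this is roughly the MapReduce- or Spark-style cleaning step that grok replaces, sketched in plain Python. The regular expression is a hand-written approximation of the grok pattern, not what grok generates. Two deliberate differences: the request method is captured as `verb` (a name chosen here, since Python does not allow two groups with the same name), and the quoted fields are captured without their surrounding quotes, whereas grok's QS keeps them.

```python
import re

# Hand-written approximation of the nginx grok pattern (assumption, not grok's own regex).
NGINX_LINE = re.compile(
    r'(?P<clientip>\S+) - - \[(?P<time_local>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<agent>[^"]*)" (?P<http_x_forwarded_for>\S+)'
)


def parse_nginx_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = NGINX_LINE.match(line)
    return match.groupdict() if match else None


sample = (
    '192.168.77.1 - - [10/May/2018:12:12:40 +0800] '
    '"GET /plugins/ml/ml.svg HTTP/1.1" 304 0 "http://hadoop01/app/kibana" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" "-"'
)

fields = parse_nginx_line(sample)
for name, value in fields.items():
    print(name, "=>", value)
```

As with grok, a log format that differs from this one (extra fields, a different order) needs its own expression.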
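As shown above, Logstash can turn unstructured nginx logs into structured ones. Those logs contain an IP address, which is often used to determine the client's geographic location. Logstash ships with the logstash-filter-geoip plugin installed by default.
The result can then be displayed in Kibana on a Gaode (AMap) map.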
vim /conf/template/geoip.conf
input {stdin{}}
filter {
grok {
match => {
"message" => "%{IPORHOST:clientip} - - \[%{HTTPDATE:time_local}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:status} %{NUMBER:body_bytes_sent} %{QS:http_referer} %{QS:agent} %{NOTSPACE:http_x_forwarded_for}"
}
}
geoip{
source => "clientip" # the field that holds the IP address to look up
target => "geoip" # store the lookup result under this single field
}
}
output {stdout{codec => rubydebug}}
Start: bin/logstash -f /usr/local/elk/logstash-5.5.2/conf/template/geoip.conf
Enter an nginx log line on the console:
119.151.192.24 - - [10/May/2018:12:12:40 +0800] "GET /plugins/ml/ml.svg HTTP/1.1" 304 0 "http://hadoop01/app/kibana" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36" "-"
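Screenshot:
Download address for the GeoLite2 City database: http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz (provided with the course)
Then, when writing the config, point the filter at the downloaded IP-to-coordinates database. You will also notice that the lookup returns far more information than needed, so you can list only the fields you want: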
input {stdin{}}
filter {
grok {
match => {
"message" => "%{IPORHOST:clientip} - - \[%{HTTPDATE:time_local}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:status} %{NUMBER:body_bytes_sent} %{QS:http_referer} %{QS:agent} %{NOTSPACE:http_x_forwarded_for}"
}
}
geoip{
source => "clientip"
database => "/home/angel/logstash-5.5.2/conf/GeoLite2-City.mmdb"
target => "geoip"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
fields => ["country_name", "region_name", "city_name", "latitude", "longitude"]
# remove_field => [ "[geoip][longitude]", "[geoip][latitude]" ]
}
}
output {stdout{codec => rubydebug}}
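The effect of `fields` together with the two `add_field` lines can be pictured as follows: keep only the listed keys, then append longitude and latitude, in that order, to a `coordinates` array (longitude first is the order used for array-style geo points). This is a sketch of the resulting shape only; it does no real lookup. `shape_geoip` is a hypothetical helper, and the record is filled with made-up placeholder values, not GeoLite2 output.

```python
WANTED_FIELDS = ["country_name", "region_name", "city_name", "latitude", "longitude"]


def shape_geoip(lookup, wanted=WANTED_FIELDS):
    """Keep only the wanted keys and add coordinates as [longitude, latitude]."""
    geoip = {key: lookup[key] for key in wanted if key in lookup}
    if "longitude" in geoip and "latitude" in geoip:
        geoip["coordinates"] = [geoip["longitude"], geoip["latitude"]]
    return geoip


# Placeholder record with made-up values, NOT a real GeoLite2 lookup result.
sample_lookup = {
    "country_name": "Exampleland",
    "region_name": "Example Region",
    "city_name": "Example City",
    "latitude": 12.5,
    "longitude": 34.5,
    "timezone": "Etc/UTC",
    "continent_code": "XX",
}

shaped = shape_geoip(sample_lookup)
print(shaped)
```

One difference worth knowing: in Logstash the `%{[geoip][longitude]}` interpolation produces strings, so configs of this kind are often followed by a mutate filter that converts `[geoip][coordinates]` to float. The sketch keeps the values numeric for simplicity.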
Collected logs often contain URLs like this:
https://mbd.baidu.com/newspage/data/landingsuper?context=%7B%22nid%22%3A%22news_6858188417104403771%22%7D&n_type=0&p_from=1
In URLs like this, the fields are joined with &, so the URL has to be split into its individual key/value pairs:
vim k_v_split.conf
input {
stdin {
}
}
filter {
kv {
prefix => "key_"
source => "message"
field_split => "&"
value_split => "="
}
}
output {
stdout{codec=>rubydebug}
}
Start: bin/logstash -f /usr/local/elk/logstash-5.5.2/conf/template/k_v_split.conf
Enter on the console:
https://www.baidu.com/s?wd=哈哈,這就是測試&a=1&b=2&c=3&d=4&e=5
Result screenshot:
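The splitting controlled by field_split, value_split and prefix can be approximated in a few lines of Python. This is a simplified sketch and may differ in detail from the real kv filter; `kv_split` is a hypothetical helper name. It also shows one consequence of using the whole message as the source: the first key carries everything in front of the first `=`, including the scheme, host and path. Isolating the query string first (here with `urllib.parse.urlsplit`, an addition not in the config above) avoids that.

```python
from urllib.parse import urlsplit


def kv_split(text, field_split="&", value_split="=", prefix="key_"):
    """Split text into prefixed key/value pairs, skipping pieces without a value."""
    result = {}
    for piece in text.split(field_split):
        key, sep, value = piece.partition(value_split)
        if sep and key and value:
            result[prefix + key] = value
    return result


url = "https://www.baidu.com/s?wd=哈哈,這就是測試&a=1&b=2&c=3&d=4&e=5"

whole = kv_split(url)                        # split the raw message
query_only = kv_split(urlsplit(url).query)   # split only the part after '?'

print(sorted(whole))
print(sorted(query_only))
```

In the first result the search term sits under a key that starts with `key_https://`, while in the second it sits under `key_wd`.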