Let's start with a screenshot of the overall result:

The image above shows the data obtained by parsing nginx logs with ELK, rendered through kibana's features. With the logs parsed like this, isn't everything you want to know visible at a glance? What follows is a record of how it was done.

The previous article, the ELK deployment document, already covered installing ELK + filebeat and pulling in nginx logs in detail, so installation is not the focus here. The content below is mainly about writing the logstash config file and configuring the kibana web interface.

To restate the host information, it is the same as in the previous article:

Before writing the logstash config file, we need a standard input/output format, and the universal choice is JSON.

First, consider how to get JSON-formatted logs. The most direct way is to change the nginx log format itself, so that is where we start. If the logs you receive cannot be switched to JSON, they can be matched with regular expressions instead.

Add the following log format to the nginx configuration file:
http {
    …
    log_format main_json '{"domain":"$server_name",'
        '"http_x_forwarded_for":"$http_x_forwarded_for",'
        '"time_local":"$time_iso8601",'
        '"request":"$request",'
        '"request_body":"$request_body",'
        '"status":$status,'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"http_referer":"$http_referer",'
        '"upstream_response_time":"$upstream_response_time",'
        '"request_time":"$request_time",'
        '"http_user_agent":"$http_user_agent",'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_status":"$upstream_status"}';
    …
}
This nginx log format is named main_json, and the configuration below can reference it by that name. Besides the built-in nginx log variables, you can also add custom variables of your own, for example to capture the user's real IP.

So write a configuration file that defines a custom variable:
[root@192.168.118.16 ~]#vim /etc/nginx/location.conf

#set $real_ip $remote_addr;
if ( $http_x_forwarded_for ~ "^(\d+\.\d+\.\d+\.\d+)" ) {
    set $real_ip $1;
}
This configuration file does nothing but capture the user's real IP into a variable named real_ip. It must be included from nginx.conf, and the variable is also added to the log format defined earlier. The complete log format is now:
log_format main_json '{"domain":"$server_name",'
'"real_ip":"$real_ip",'
'"http_x_forwarded_for":"$http_x_forwarded_for",'
'"time_local":"$time_iso8601",'
'"request":"$request",'
'"request_body":"$request_body",'
'"status":$status,'
'"body_bytes_sent":"$body_bytes_sent",'
'"http_referer":"$http_referer",'
'"upstream_response_time":"$upstream_response_time",'
'"request_time":"$request_time",'
'"http_user_agent":"$http_user_agent",'
'"upstream_addr":"$upstream_addr",'
'"upstream_status":"$upstream_status"}';
Comment out this line:
#access_log /var/log/nginx/access.log main;
Next, write an nginx config file listening on port 9527 for testing:
[root@192.168.118.16 ~]#vim /etc/nginx/conf.d/server_9527.conf

server {
    listen 9527;
    server_name localhost;
    include location.conf;
    location / {
        root /www/9527/;
        index index.html;
        access_log /www/log/access.log main_json;
        error_log /www/log/error.log;
    }
    location /shop {
        root /www/9527;
        access_log /www/log/shop_access.log main_json;
        error_log /www/log/shop_error.log;
    }
}

[root@192.168.118.16 ~]#mkdir -p /www/{9527,log}
[root@192.168.118.16 ~]#cd /www/9527/
[root@192.168.118.16 /www/9527]#vim index.html
hello, 9527
[root@192.168.118.16 /www/9527]#mkdir -pv /www/9527/shop
[root@192.168.118.16 /www/9527]#vim /www/9527/shop/index.html
出售9527
[root@192.168.118.16 /www/9527]#nginx -t
[root@192.168.118.16 /www/9527]#nginx -s reload
Nginx is configured; reload it and test access:
[root@192.168.118.16 ~]#curl http://192.168.118.16:9527/index.html
hello, 9527
[root@192.168.118.16 ~]#curl http://192.168.118.16:9527/shop/index.html
出售9527
The pages are served correctly; check the logs:
[root@192.168.118.16 ~]#ll -tsh /www/log/
total 8.0K
4.0K -rw-r--r--  1 root root 346 Sep 14 14:35 shop_access.log
4.0K -rw-r--r--. 1 root root 341 Sep 14 14:35 access.log
   0 -rw-r--r--. 1 root root   0 Sep 14 14:35 error.log
   0 -rw-r--r--  1 root root   0 Sep 14 14:34 shop_error.log
The log files have been created; check the log format:
[root@192.168.118.16 ~]#cat /www/log/access.log
{"domain":"localhost","real_ip":"","http_x_forwarded_for":"-","time_local":"2019-09-14T14:35:11+08:00","request":"GET /index.html HTTP/1.1","request_body":"-","status":200,"body_bytes_sent":"12","http_referer":"-","upstream_response_time":"-","request_time":"0.000","http_user_agent":"curl/7.29.0","upstream_addr":"-","upstream_status":"-"}
The JSON format we defined is now in use, so the nginx log format configuration is complete. Next, ship the nginx logs to logstash through filebeat.

Building on the previous article, modify the filebeat config file directly:
[root@192.168.118.16 ~]#vim /etc/filebeat/modules.d/nginx.yml
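The file content was shown as an image in the original; a minimal sketch of what this nginx module config would look like is below, where the var.paths values are my assumption based on the log paths configured earlier:

- module: nginx
  access:
    enabled: true
    var.paths: ["/www/log/access.log", "/www/log/shop_access.log"]
  error:
    enabled: true
    var.paths: ["/www/log/error.log", "/www/log/shop_error.log"]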
Restart the filebeat service:
[root@192.168.118.16 ~]#systemctl restart filebeat
With the steps above, filebeat is already shipping the nginx logs. Next comes how logstash receives the data, again working step by step.

First, just print the log data to the screen to make sure it is correct; a minimal nginx.conf for this first step is sketched below.
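The config at this step is nothing more than a beats input and a stdout output (a sketch; the port matches the 5044 used in the final config later):

input {
    beats {
        port => "5044"
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
}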
Start logstash with this nginx.conf, then access nginx on port 9527 through a browser to generate log data.

When starting logstash you can also enable automatic config reloading, so that after changing nginx.conf there is no need to stop and restart it over and over:
[root@192.168.118.15 /etc/logstash/conf.d]#logstash -f nginx.conf --config.reload.automatic
Grab a chunk of the JSON output and analyze it:
{ "@timestamp" => 2019-09-14T06:52:16.056Z, "@version" => "1", "source" => "/www/log/access.log", "input" => { "type" => "log" }, "beat" => { "name" => "web-node1", "version" => "6.8.2", "hostname" => "web-node1" }, "host" => { "name" => "web-node1", "architecture" => "x86_64", "id" => "4b3b32a1db0343458c4942a10c79acef", "os" => { "name" => "CentOS Linux", "codename" => "Core", "family" => "redhat", "platform" => "centos", "version" => "7 (Core)" }, "containerized" => false }, "log" => { "file" => { "path" => "/www/log/access.log" } }, "tags" => [ [0] "beats_input_codec_plain_applied" ], "prospector" => { "type" => "log" }, "fileset" => { "module" => "nginx", "name" => "access" }, "offset" => 9350, "event" => { "dataset" => "nginx.access" }, "message" => "{\"domain\":\"localhost\",\"real_ip\":\"\",\"http_x_forwarded_for\":\"-\",\"time_local\":\"2019-09-14T14:52:15+08:00\",\"request\":\"GET / HTTP/1.1\",\"request_body\":\"-\",\"status\":304,\"body_bytes_sent\":\"0\",\"http_referer\":\"-\",\"upstream_response_time\":\"-\",\"request_time\":\"0.000\",\"http_user_agent\":\"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36\",\"upstream_addr\":\"-\",\"upstream_status\":\"-\"}" }
There is a lot of data here, but not all of it is necessary. Keep what is needed and strip the rest so the JSON reads more cleanly.

Looking at this JSON, the real nginx log data all lives in message, and everything else is host and service metadata. But message itself is an escaped jumble that is barely readable. Since it is JSON, though, it can be parsed into proper fields.

Modify the config file accordingly.
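The modified config was shown as an image in the original; judging from the final config later in this article, the change at this step is a json filter that parses message into top-level fields and then drops it:

filter {
    json {
        source => "message"
        remove_field => "message"
    }
}

With that filter in place, the output becomes: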
{ "@version" => "1", "host" => { "os" => { "name" => "CentOS Linux", "version" => "7 (Core)", "family" => "redhat", "platform" => "centos", "codename" => "Core" }, "name" => "web-node1", "id" => "4b3b32a1db0343458c4942a10c79acef", "architecture" => "x86_64", "containerized" => false }, "upstream_response_time" => "-", "beat" => { "name" => "web-node1", "version" => "6.8.2", "hostname" => "web-node1" }, "domain" => "localhost", "request_body" => "-", "log" => { "file" => { "path" => "/www/log/access.log" } }, "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "prospector" => { "type" => "log" }, "http_referer" => "-", "real_ip" => "", "fileset" => { "module" => "nginx", "name" => "access" }, "upstream_status" => "-", "body_bytes_sent" => "0", "@timestamp" => 2019-09-14T07:03:36.087Z, "http_x_forwarded_for" => "-", "status" => 304, "source" => "/www/log/access.log", "input" => { "type" => "log" }, "time_local" => "2019-09-14T15:03:28+08:00", "request_time" => "0.000", "upstream_addr" => "-", "tags" => [ [0] "beats_input_codec_plain_applied" ], "offset" => 11066, "event" => { "dataset" => "nginx.access" }, "request" => "GET / HTTP/1.1" }
Comparing the two captures: in the second one message is gone, but every field that was inside message now stands on its own. The benefits:

(1) The log is clearer, and any individual field can be located precisely;

(2) It lays the groundwork for querying and filtering once the data is stored in elasticsearch.

This operation effectively splits the original message out into columns.

The JSON above also shows two timestamps:

@timestamp - GMT - the time logstash picked up the event

time_local - UTC+8 - the time nginx wrote the log entry

The minutes and seconds of these two differ, yet later filtering uses @timestamp, i.e. the logstash time, which would make the nginx log times inaccurate. The two therefore need to be brought into agreement.
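Reconstructing again from the final config, the fix is a date filter that parses time_local (an ISO8601 string) and writes it into @timestamp:

date {
    match => ["time_local", "ISO8601"]
    target => "@timestamp"
}

@timestamp stays in UTC, but it now points at the same instant nginx recorded. The output after this change: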
{ "@version" => "1", "host" => { "name" => "web-node1", "os" => { "name" => "CentOS Linux", "version" => "7 (Core)", "family" => "redhat", "platform" => "centos", "codename" => "Core" }, "id" => "4b3b32a1db0343458c4942a10c79acef", "architecture" => "x86_64", "containerized" => false }, "upstream_response_time" => "-", "beat" => { "name" => "web-node1", "version" => "6.8.2", "hostname" => "web-node1" }, "domain" => "localhost", "request_body" => "-", "log" => { "file" => { "path" => "/www/log/access.log" } }, "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "prospector" => { "type" => "log" }, "http_referer" => "-", "real_ip" => "", "fileset" => { "module" => "nginx", "name" => "access" }, "upstream_status" => "-", "body_bytes_sent" => "0", "status" => 304, "http_x_forwarded_for" => "-", "@timestamp" => 2019-09-14T07:14:46.000Z, "source" => "/www/log/access.log", "input" => { "type" => "log" }, "time_local" => "2019-09-14T15:14:46+08:00", "request_time" => "0.000", "upstream_addr" => "-", "tags" => [ [0] "beats_input_codec_plain_applied" ], "offset" => 11495, "event" => { "dataset" => "nginx.access" }, "request" => "GET / HTTP/1.1" }
Now compare the minutes and seconds of the two timestamps: they match exactly. Next, remove some unneeded fields and rename a few others by modifying the config file again.
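The field cleanup, again matching what appears in the final config, is done with a mutate filter (a sketch; the exact config at this step was not captured):

mutate {
    remove_field => ["host","event","input","request","offset","prospector","source","type","tags","beat"]
    rename => {"http_user_agent" => "agent"}
    rename => {"upstream_response_time" => "response_time"}
    rename => {"http_x_forwarded_for" => "x_forwarded_for"}
    split => {"x_forwarded_for" => ", "}
    split => {"response_time" => ", "}
}

The resulting event: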
{ "@version" => "1", "agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "domain" => "localhost", "request_body" => "-", "log" => { "file" => { "path" => "/www/log/access.log" } }, "http_referer" => "-", "response_time" => [ [0] "-" ], "real_ip" => "", "fileset" => { "module" => "nginx", "name" => "access" }, "upstream_status" => "-", "body_bytes_sent" => "0", "status" => 304, "@timestamp" => 2019-09-14T07:22:14.000Z, "time_local" => "2019-09-14T15:22:14+08:00", "request_time" => "0.000", "upstream_addr" => "-", "x_forwarded_for" => [ [0] "-" ], "request" => "GET / HTTP/1.1" }
After the renames and removals the JSON is far leaner, and the storage it consumes in elasticsearch shrinks accordingly.

Now the data could be written into elasticsearch. But everything so far has dealt only with access.log; error.log has not been considered at all, since nginx offers no way to customize its error log format.

Try requesting a nonexistent URI and look at the data that comes through:
[WARN ] 2019-09-14 15:25:34.300 [[main]>worker3] json - Error parsing json {:source=>"message", :raw=>"2019/09/14 15:25:29 [error] 2122#0: *33 open() \"/www/9527/123.html\" failed (2: No such file or directory), client: 192.168.118.41, server: localhost, request: \"GET /123.html HTTP/1.1\", host: \"192.168.118.16:9527\"", :exception=>#<LogStash::Json::ParserError: Unexpected character ('/' (code 47)): Expected space separating root-level values
 at [Source: (byte[])"2019/09/14 15:25:29 [error] 2122#0: *33 open() "/www/9527/123.html" failed (2: No such file or directory), client: 192.168.118.41, server: localhost, request: "GET /123.html HTTP/1.1", host: "192.168.118.16:9527""; line: 1, column: 6]>}
{
    "@timestamp" => 2019-09-14T07:25:33.173Z,
    "@version" => "1",
    "log" => { "file" => { "path" => "/www/log/error.log" } },
    "fileset" => { "module" => "nginx", "name" => "error" },
    "message" => "2019/09/14 15:25:29 [error] 2122#0: *33 open() \"/www/9527/123.html\" failed (2: No such file or directory), client: 192.168.118.41, server: localhost, request: \"GET /123.html HTTP/1.1\", host: \"192.168.118.16:9527\""
}
This is what error.log data turns into. That is a problem: ELK exists both to analyze data and to speed up troubleshooting, and if it cannot handle error logs it is of little use.

The format above is a mess again: the entire nginx error line sits in message. Although nginx's error log format cannot be customized, logstash can convert it to JSON with regular expressions. Before that, though, note that access.log and error.log have different formats and cannot be matched the same way. So how do we tell whether an event came from access.log or error.log?

The syntax is surely the first thing that comes to mind:

if … {
    # handle access.log
} else if … {
    # handle error.log
}

Right, the syntax is fine, but what condition do we branch on? Looking at the events above, it is easy to see that every one of them carries a field like this:

access.log event data:
    "fileset" => { "module" => "nginx", "name" => "access" }

error.log event data:
    "fileset" => { "module" => "nginx", "name" => "error" }

That gives us the condition to branch on. Writing it with logstash configuration syntax:
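The skeleton of the branch (the bodies are filled in by the full config just below):

filter {
    if [fileset][name] == "access" {
        # parse the JSON-formatted access log
    }
    if [fileset][name] == "error" {
        # grok the fixed-format error log
    }
}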
So far, the logstash config for collecting and filtering nginx logs looks like this:

Config file name: nginx.conf
input {
    beats {
        port => "5044"
    }
}

filter {
    if [fileset][name] == "access" {
        json {
            source => "message"
            remove_field => "message"
            remove_field => "@timestamp"
        }
        date {
            match => ["time_local", "ISO8601"]
            target => "@timestamp"
        }
        grok {
            match => { "request" => "%{WORD:method} (?<url>.* )" }
        }
        mutate {
            remove_field => ["host","event","input","request","offset","prospector","source","type","tags","beat"]
            rename => {"http_user_agent" => "agent"}
            rename => {"upstream_response_time" => "response_time"}
            rename => {"http_x_forwarded_for" => "x_forwarded_for"}
            split => {"x_forwarded_for" => ", "}
            split => {"response_time" => ", "}
        }
        geoip {
            source => "real_ip"
        }
    }
    if [fileset][name] == "error" {
        mutate {
            remove_field => ["@timestamp"]
        }
        grok {
            match => {"message" => "(?<datetime>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<real_ip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:domain}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"}
        }
        date {
            match => ["datetime", "yyyy/MM/dd HH:mm:ss"]
            target => "@timestamp"
        }
        mutate {
            remove_field => ["message","request","http_referer","host","event","input","offset","prospector","source","type","tags","beat"]
        }
    }
}

output {
    stdout {
        codec => "rubydebug"
    }
}
Test with access.log data:
{ "@version" => "1", "agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "domain" => "localhost", "request_body" => "-", "log" => { "file" => { "path" => "/www/log/access.log" } }, "http_referer" => "-", "response_time" => [ [0] "-" ], "real_ip" => "", "fileset" => { "module" => "nginx", "name" => "access" }, "upstream_status" => "-", "body_bytes_sent" => "0", "status" => 304, "@timestamp" => 2019-09-14T07:39:50.000Z, "time_local" => "2019-09-14T15:39:50+08:00", "request_time" => "0.000", "upstream_addr" => "-", "x_forwarded_for" => [ [0] "-" ], "request" => "GET / HTTP/1.1" }
Test with error.log data:
{ "@version" => "1", "agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36", "domain" => "localhost", "request_body" => "-", "log" => { "file" => { "path" => "/www/log/access.log" } }, "http_referer" => "-", "response_time" => [ [0] "-" ], "real_ip" => "", "fileset" => { "module" => "nginx", "name" => "access" }, "upstream_status" => "-", "body_bytes_sent" => "571", "status" => 404, "@timestamp" => 2019-09-14T07:41:48.000Z, "time_local" => "2019-09-14T15:41:48+08:00", "request_time" => "0.000", "upstream_addr" => "-", "x_forwarded_for" => [ [0] "-" ], "request" => "GET /123.html HTTP/1.1" }
No problems now; both formats are being parsed. Next, write the data into elasticsearch.

At this point the logstash config file nginx.conf reads:
input {
    beats {
        port => "5044"
    }
}

filter {
    if [fileset][name] == "access" {
        json {
            source => "message"
            remove_field => "message"
            remove_field => "@timestamp"
        }
        date {
            match => ["time_local", "ISO8601"]
            target => "@timestamp"
        }
        grok {
            match => { "request" => "%{WORD:method} (?<url>.* )" }
        }
        mutate {
            remove_field => ["host","event","input","request","offset","prospector","source","type","tags","beat"]
            rename => {"http_user_agent" => "agent"}
            rename => {"upstream_response_time" => "response_time"}
            rename => {"http_x_forwarded_for" => "x_forwarded_for"}
            split => {"x_forwarded_for" => ", "}
            split => {"response_time" => ", "}
        }
        geoip {
            source => "real_ip"
        }
    }
    if [fileset][name] == "error" {
        mutate {
            remove_field => ["@timestamp"]
        }
        grok {
            match => {"message" => "(?<datetime>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<real_ip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:domain}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"}
        }
        date {
            match => ["datetime", "yyyy/MM/dd HH:mm:ss"]
            target => "@timestamp"
        }
        mutate {
            remove_field => ["message","request","http_referer","host","event","input","offset","prospector","source","type","tags","beat"]
        }
    }
}

#output {
#    stdout {
#        codec => "rubydebug"
#    }
#}

output {
    elasticsearch {
        hosts => ["192.168.118.14"]
        index => "logstash-nginx-%{+YYYY.MM.dd}"
    }
}
This counts as the final nginx pipeline config for this article.

Use a browser to hit nginx on port 9527 several times, then switch to elasticsearch-head and check whether the index was created.

OK, today's index has been created; inspect the data.

The data looks fine too; switch to kibana and add the index pattern.

At this point the data is stored in elasticsearch and surfaced through kibana, but analyzing and viewing it clearly still takes some work on the kibana side.

First, Discover. To get an at-a-glance view of the logs every time you come in, make the following settings:

With the two settings above, after logging in you only need to click Open and pick the saved search to see clean log data.

Next, draw the dashboard shown at the very top.

Before drawing there must be data to back it, and since this is a test environment with no real user traffic, a batch of fake data has to be manufactured.

The method: copy a line straight out of access.log, change its real_ip to a public IP, and append it back to the log, as sketched below.
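One way to do this from the shell, with 114.114.114.114 standing in as an arbitrary public address for illustration:

[root@192.168.118.16 ~]#tail -1 /www/log/access.log | sed 's/"real_ip":"[^"]*"/"real_ip":"114.114.114.114"/' >> /www/log/access.log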
Once the fake data is in place, configure the charts: click Visualize.

First: top 5 provincial-capital cities by visits (pie chart)

Choose a pie chart, then select the logstash-nginx-* index.
When finished, click Save.

Second: access distribution map (coordinate map)

When finished, click Save.

Third: top 5 domains (data table)

When finished, click Save.

Fourth: top 5 backend services (data table)

When finished, click Save.

Fifth: top 5 URIs (data table)

When finished, click Save.

Sixth: top 5 real_ip (horizontal bar chart)

When finished, click Save.

Seventh: top 5 HTTP statuses (pie chart)

When finished, click Save.
Good: seven visualizations have now been created under Visualize. Open Dashboard and add them to display the charts.

Arrange the charts nicely, and the job is done.