nginx 日誌分析

時間 2019-11-05

標籤 nginx 日誌分析欄目 Nginx 简体版

原文原文鏈接

一、日誌簡介php

1. nginx日誌主要有兩種：訪問日誌和錯誤日誌。訪問日誌主要記錄客戶端訪問nginx的每個請求，格式能夠自定義；錯誤日誌主要記錄客戶端訪問nginx出錯時的日誌，格式不支持自定義。兩種日誌均可以選擇性關閉。
1. 經過訪問日誌，你能夠獲得用戶地域來源、跳轉來源、使用終端、某個URL訪問量等相關信息；經過錯誤日誌，你能夠獲得系統某個服務或server的性能瓶頸等。所以，將日誌好好利用，你能夠獲得不少有價值的信息。

二、訪問日誌html

[Access.log]

log_format  main  '$remote_addr $remote_user [$time_local] "$request" $http_host '
              '$status $upstream_status $body_bytes_sent "$http_referer" '
              '"$http_user_agent" $ssl_protocol $ssl_cipher $upstream_addr '
              '$request_time $upstream_response_time'; 

變量名稱    變量描述    舉例說明
$remote_addr    客戶端地址    例：10.140.15.91
$remote_user    客戶端用戶名稱    -
$time_local    訪問時間和時區    例：18/Jul/2016:17:00:01 +0800
$request    請求的URI和HTTP協議"    例：GET /pa/img/home/logo-alipay-t.png HTTP/1.1"
$http_host    請求地址，即瀏覽器中你輸入的地址(IP或域名)    例：img.xxxxx.com 10.253.70.113
$status    HTTP請求狀態    例：200
$upstream_status    upstream狀態    例：200
$body_bytes_sent    發送給客戶端文件內容大小    例：547
$http_referer    跳轉來源     例："https://cashier.xxxxx.com.../"
$http_user_agent    用戶終端代理    例："Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; GTB7.0; .NET4.0C;
$ssl_protocol    SSL協議版本    例：TLSv1
$ssl_cipher    交換數據中的算法    例：RC4-SHA
$upstream_addr    後臺upstream的地址，即真正提供服務的主機地址    例：10.238.35.247:80
$request_time    整個請求的總時間    例：0.205
$upstream_response_time    請求過程當中，upstream響應時間    例：0.002

實例：nginx

10.9.137.91 - [02/Aug/2016:14:47:12 +0800] "GET /images/XX/20160324752729.png HTTP/1.1"img.xxxxx.com 200 200 2038 https://cashier.xxxxx.com/XX/PaymentResult.htm?payNo=XX&outBizNo=2012XX "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Tablet PC 2.0; 360SE)" TLSv1 AES128-SHA 10.228.21.237:80 0.198 0.001

線下測試（$http_referer）：算法

10.14.21.107 - - [14/Aug/2016:17:28:22 +0800] "GET /spanner/watch/v1?--db=ztg-1&--mode=compare&--index=status&--option=&--cluster=whole&-F=2016/8/12-00:00:00&-T=+2880&-i=1&-n=0&_=1344936501292 HTTP/1.1" 200 94193 "http://spanner.xxxxx.net/optionFrame/history.html" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.60 Safari/537.1"

備註：$http_referer和重定向有關。

三、錯誤日誌後端

錯誤信息                                            錯誤說明
"upstream prematurely closed connection"        請求uri的時候出現的異常，是因爲upstream還未返回應答給用戶時用戶斷掉鏈接形成的，對系統沒有影響，能夠忽略
"recv() failed (104: Connection reset by peer)"
        （1）服務器的併發鏈接數超過了其承載量，服務器會將其中一些鏈接Down掉； 
        （2）客戶關掉了瀏覽器，而服務器還在給客戶端發送數據； 
        （3）瀏覽器端按了Stop
"(111: Connection refused) while connecting to upstream"    用戶在鏈接時，若遇到後端upstream掛掉或者不通，會收到該錯誤
"(111: Connection refused) while reading response header from upstream"    用戶在鏈接成功後讀取數據時，若遇到後端upstream掛掉或者不通，會收到該錯誤
"(111: Connection refused) while sending request to upstream"    Nginx和upstream鏈接成功後發送數據時，若遇到後端upstream掛掉或者不通，會收到該錯誤
"(110: Connection timed out) while connecting to upstream"    nginx鏈接後面的upstream時超時
"(110: Connection timed out) while reading upstream"    nginx讀取來自upstream的響應時超時
"(110: Connection timed out) while reading response header from upstream"    nginx讀取來自upstream的響應頭時超時
"(110: Connection timed out) while reading upstream"    nginx讀取來自upstream的響應時超時
"(104: Connection reset by peer) while connecting to upstream"    upstream發送了RST，將鏈接重置
"upstream sent invalid header while reading response header from upstream"    upstream發送的響應頭無效
"upstream sent no valid HTTP/1.0 header while reading response header from upstream"    upstream發送的響應頭無效
"client intended to send too large body"    用於設置容許接受的客戶端請求內容的最大值，默認值是1M，client發送的body超過了設置值
"reopening logs"    用戶發送kill  -USR1命令
"gracefully shutting down",    用戶發送kill  -WINCH命令
"no servers are inside upstream"    upstream下未配置server
"no live upstreams while connecting to upstream"    upstream下的server全都掛了
"SSL_do_handshake() failed"    SSL握手失敗
"SSL_write() failed (SSL:) while sending to client" 
"(13: Permission denied) while reading upstream" 
"(98: Address already in use) while connecting to upstream" 
"(99: Cannot assign requested address) while connecting to upstream" 
"ngx_slab_alloc() failed: no memory in SSL session shared cache"                ssl_session_cache大小不夠等緣由形成
"could not add new SSL session to the session cache while SSL handshaking"        ssl_session_cache大小不夠等緣由形成
"send() failed (111: Connection refused)"

例： connect() failed (111: Connection refused) while connecting to upstream. 後端的upstream中指定的是fast-cgi的php服務器，因此緣由有多是php-fpm.conf的進程數量過少致使。解決方法：php-fpm.conf配置文件pm.max_children修改大一點,重啓php-fpm瀏覽器

除了上述方法，還須要注意os層面的資源限制，整體個人以下：tomcat

ulimit -a  
    core file size              (blocks, -c) 0  
    data seg size               (kbytes, -d) unlimited  
    scheduling priority                 (-e) 0  
    file size               (blocks, -f) unlimited  
    pending signals                 (-i) 32063  
    max locked memory       (kbytes, -l) 64  
    max memory size         (kbytes, -m) unlimited  
    open files                      (-n) 65536  
    pipe size            (512 bytes, -p) 8  
    POSIX message queues     (bytes, -q) 819200  
    real-time priority              (-r) 0   
    stack size              (kbytes, -s) 8192  
    cpu time               (seconds, -t) unlimited  
    max user processes              (-u) 32063  
    virtual memory          (kbytes, -v) unlimited  
    file locks                      (-x) unlimited

例： connect() failed (110: Connection refused) while connecting to upstream. 後端的upstream中指定的是tomcat server ，因此緣由有多是tomcat 的鏈接數過少，tomcat沒法處理高併發致使。解決辦法：建議修改tomcat 配置文件中子進程的最大個數、併發鏈接數、內存大小等。另外，還建議調整nginx 到後端upstream的timeout 時長，以下所示：服務器

worker_processes 4;
    pid /run/nginx.pid;
    worker_rlimit_nofile 16384;
    events {
            worker_connections 10000;
            # multi_accept on;
    }
    http {
            proxy_connect_timeout 300s;
            proxy_send_timeout   600s;
            proxy_read_timeout   600s;
            send_timeout 600s;

}

例：一臺nginx是用來作轉發數據的,也就是反向代理。當鏈接數達到8000以上時，就會出現NGINX 502 Bad Gateway錯誤。nginx日誌所有都是一些110錯誤代碼：*7300911 connect() failed (110: Connection timed out) while connecting to upstream。(nginx的併發鏈接數五六萬都沒問題的吧，怎麼會出現這種現象呢)cookie

分析：Nginx 502錯誤的緣由比較多，大部分是由於在反向代理模式下後端服務器出現問題引發的。這些錯誤通常都不是nginx自己的問題，因此必定要從後端找緣由！但nginx拋出的這些出錯信息很是簡單，着實讓nginx的用戶備受置疑，畢竟從字眼上理解，bad gateway？不就是bad nginx嗎？讓不瞭解的人看到，會直接把責任推在nginx身上。502錯誤最一般的出現狀況就是後端主機當機，固然，後端主機併發數過高也可能引發偶爾的502，後端主機優化配置不得當，也可能形成502。若是一直502，得檢查後端主機地址或端口是否配置錯誤。

例：nginx 配置以下:session

upstream tomcat{
server tom1:8901 max_fails=1 fail_timeout=60s;
server tom2:8901 max_fails=1 fail_timeout=60s;
}

日誌裏面每隔一兩分鐘就會記錄一條相似 379803415 no live upstreams while connecting to upstream 的日誌，此外，還有大量的「upstream prematurely closed connection while reading response header from upstream」的日誌。
· ·
·分析，nginx在鏈接upstream的其中一個server時，maxfail爲1，表示有任何一次失敗，nginx就認爲該機器失敗，而後就去嘗試連下一個；若是全部的都失敗了，在底層就會進行quick recovery 把每一個peer的失敗次數都重置爲0，而後再返回一個NGX_BUSY，而後nginx就會打印一條no live upstreams ,最後又回到原始狀態，接着進行轉發了。這就解釋了no live upstreams以後還能正常訪問。

·
從新看配置文件，若是其中一臺有一次失敗，nginx就會認爲它已經死掉，而後就會把之後的流量全都打到另外一臺上面，當另一臺也有一次失敗的時候，就認爲兩個都死掉了，而後quick recovery，而後打印一條日誌。這樣帶來的另外一個問題是，若是幾臺nginx同時認定一臺後端已經死掉的時候，會形成流量的不均衡。

初步的解決方法：把max_fails從1改爲5，效果很明顯，「no live upstreams」出現的機率變少了不少，但卻沒有徹底消失。

至於日誌中的「upstream prematurely closed connection while reading response header from upstream」抓包分析後發現是由於nginx 給後端服務器發送一個請求後，後端服務器直接回復[fin,ack]，而後nginx將502的結果轉發給請求者。這種狀況極可能是因爲後端應用引發

關於概念

1.併發鏈接數

客戶端向服務器發起請求，並創建了TCP鏈接。每秒鐘服務器連接的總TCP數量，就是併發鏈接數。

2.請求數

請求數指的是客戶端在創建完鏈接後，向http服務發出GET/POST/HEAD數據包。
拓展：服務器返回了請求結果後有兩種狀況：
http數據包頭包含Close字樣，關閉本次TCP鏈接；
http數據包頭包含Keep-Alive字樣，本次鏈接不關閉，可繼續經過該鏈接繼續向http服務發 送請求，用於減小TCP併發鏈接數。

PV（page view）

訪問量,即頁面瀏覽量或者點擊量,用戶每次對網站的訪問均被記錄1次。用戶對同一頁面的屢次訪問，訪問量值累計

UV（unique visitor）

獨立訪客量。將每一個獨立設備（以cookie爲依據）視爲一位訪客，一天以內（00:00-24:00）的訪客數量。一天以內相同cookie的訪問只被計算1次。

獨立IP

00:00-24:00內相同IP地址只被計算一次

注：關於併發鏈接數、請求數、併發用戶數的概念參考日誌http://www.4wei.cn/archives/1002399

經常使用日誌分析命令

一、總請求數

wc -l access.log |awk '{print $1}'

二、獨立IP數

awk '{print $1}' access.log|sort |uniq |wc -l

三、每秒客戶端請求數 TOP5

awk  -F'[ []' '{print $5}' access.log|sort|uniq -c|sort -rn|head -5

四、訪問最頻繁IP Top5

awk '{print $1}' access.log|sort |uniq -c | sort -rn |head -5

五、訪問最頻繁的URL TOP5

awk '{print $7}' access.log|sort |uniq -c | sort -rn |head -5

六、響應大於10秒的URL TOP5

awk '{if ($12 > 10){print $7}}' access.log|sort|uniq -c|sort -rn |head -5

七、HTTP狀態碼(非200)統計 Top5

awk '{if ($13 != 200){print $13}}' access.log|sort|uniq -c|sort -rn|head -5

八、分析請求數大於50000的源IP的行爲

awk '{print $1}' access.log|sort |uniq -c |sort -rn|awk '{if ($1 > 50000){print $2}}' > tmp.txt
for i in $(cat tmp.txt)
do
     echo $i  >> analysis.txt
     echo "訪問行爲統計" >> analysis.txt
     grep $i  access.log|awk '{print $6}' |sort |uniq -c | sort -rn |head -5 >> analysis.txt
     echo "訪問接口統計" >> analysis.txt
     grep $i  access.log|awk '{print $7}' |sort |uniq -c | sort -rn |head -5 >> analysis.txt
     echo -e "\n"  >> /root/analysis/$Ydate.txt
done
注：若是源IP來自代理服務器，應將第一條命令過濾地址改成$http_x_forwarded_for地址

awk '{print $NF}' access.log|sort |uniq -c |sort -rn|awk '{if ($1 > 50000){print $2}}' > tmp.txt

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。