Troubleshooting "104: Connection reset by peer" errors in Nginx

Symptoms

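1. The error log was almost as large as the access log (199M vs 191M), a ratio of nearly 1:1.
2. Every entry in the error log was "(104: Connection reset by peer) while reading upstream".
3. The access log showed no unusual HTTP error status codes; almost all requests returned 200.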

[root@VM_0_22_centos logs]# ls -lh
total 389M
-rw-r--r-- 1 work work 191M Oct 30 17:30 ttt.minminmsn.com_access.log
-rw-r--r-- 1 work work 199M Oct 30 17:30 ttt.minminmsn.com_error.log
[root@VM_0_22_centos logs]# tail -n 1  ttt.minminmsn.com_error.log
2020/10/30 17:30:27 [error] 14063#0: *807476828 readv() failed (104: Connection reset by peer) while reading upstream, client: 117.61.242.104, server: ttt.minminmsn.com, request: "POST /yycp-launcherSnapshot/launcherSnapshot/querySnapshotSync HTTP/1.1", upstream: "http://192.168.88.31:8081/ttt", host: "ttt.minminmsn.com"
[root@VM_0_22_centos logs]# cat ttt.minminmsn.com_access.log |awk '{print $9}'|sort |uniq -dc
1081274 200
      6 304
    125 400
  27482 404
    145 429
    106 499
      8 500
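To repeat these counts without a shell pipeline, here is a minimal Python sketch. It assumes the access log uses nginx's default combined format, where the status code is the 9th whitespace-separated field (the same `$9` the awk command reads); the sample lines are made up. Note that `uniq -dc` only prints values that occur more than once, so a status code seen exactly once would be missing from the output above, whereas `uniq -c` or a counter like the one below includes it. The hypothetical `reset_ratio` helper counts error-log lines containing the 104 error per logged request, a line-based version of the roughly 1:1 ratio that the file sizes suggest.

```python
from collections import Counter
from typing import Iterable

RESET_MARKER = "104: Connection reset by peer"


def status_counts(access_lines: Iterable[str]) -> Counter:
    """Count HTTP status codes, assuming nginx's combined log format.

    The status is the 9th whitespace-separated field, i.e. awk's $9.
    Lines with fewer fields are skipped.
    """
    counts: Counter = Counter()
    for line in access_lines:
        fields = line.split()
        if len(fields) >= 9:
            counts[fields[8]] += 1
    return counts


def reset_ratio(error_lines: Iterable[str], request_count: int) -> float:
    """Hypothetical helper: upstream resets per logged request."""
    resets = sum(1 for line in error_lines if RESET_MARKER in line)
    return resets / request_count if request_count else 0.0


if __name__ == "__main__":
    # Made-up sample lines in combined format, for illustration only.
    access = [
        '117.61.242.104 - - [30/Oct/2020:17:30:27 +0800] "POST /a HTTP/1.1" 200 15 "-" "okhttp"',
        '117.61.242.104 - - [30/Oct/2020:17:30:28 +0800] "POST /a HTTP/1.1" 200 15 "-" "okhttp"',
        '117.61.242.104 - - [30/Oct/2020:17:30:29 +0800] "GET /x HTTP/1.1" 404 0 "-" "okhttp"',
    ]
    errors = ["readv() failed (104: Connection reset by peer) while reading upstream"]
    for status, n in status_counts(access).most_common():
        print(n, status)
    print(f"resets per request: {reset_ratio(errors, len(access)):.2f}")
```

On the server itself you would pass the opened log files instead of the sample lists, since iterating a file object yields its lines.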

Analysis

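1. After contacting the service owners about the business scenario, we found that almost all client requests were POSTs. At first I suspected that large file uploads were timing out, but the developers confirmed that POST was used for security reasons and that there were no large uploads.
2. Since the error says the upstream reset the connection, the backend was the side that dropped it. I then noticed a large number of connections in TIME-WAIT. At high QPS, connections are opened and closed quickly, so many of them are always in the middle of closing and the TIME-WAIT count looks large.
3. As a reverse proxy, nginx acts both as a client (to the backend) and as a server (to users). By default it does not use persistent (keepalive) connections to the backend, so enabling them should improve performance considerably.
4. Persistent connections to the upstream also require the keepalive directive in the upstream block. I found the official description of keepalive (quoted below) hard to follow at first, but one of the reference links explains it clearly: it activates a cache of idle connections to the upstream servers, which is the performance optimization that makes persistent connections pay off.
5. The keepalive value should be related to QPS and generally does not need to be very large. If 5XX errors show up in the access log, tune it to the actual workload to get the best result.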
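Item 5 can be made a little more concrete with Little's law: the average number of upstream requests in flight is roughly QPS multiplied by the mean upstream response time, and because nginx keeps a separate idle cache in each worker process, that figure is divided by the number of workers. The sketch below is a rough starting point under those assumptions, not an nginx rule; the function name and the example inputs (2000 QPS, 50 ms, 4 workers) are hypothetical. The nginx documentation also advises keeping the value small enough that upstream servers can still accept new connections.

```python
import math


def suggest_keepalive(qps: float, upstream_latency_ms: float, workers: int) -> int:
    """Rough per-worker idle pool size for the upstream `keepalive` directive.

    Little's law: requests in flight = arrival rate * time in system.
    All inputs are assumptions to be measured on the real system.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    in_flight = qps * upstream_latency_ms / 1000.0  # average concurrent upstream requests
    per_worker = in_flight / workers                # the cache is kept per worker process
    return max(1, math.ceil(per_worker))


if __name__ == "__main__":
    # Hypothetical inputs: 2000 requests/s, 50 ms upstream latency, 4 workers.
    print(suggest_keepalive(2000, 50, 4))  # prints 25
```

Rounding up and keeping a floor of 1 errs on the side of reuse; bursts above the average simply open extra connections, because keepalive does not cap the total number of upstream connections.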

The nginx documentation describes the keepalive directive as follows:
Syntax: keepalive connections;
Default: —
Context: upstream
This directive appeared in version 1.1.4.

Activates the cache for connections to upstream servers.

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.

It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.
When using load balancing methods other than the default round-robin method, it is necessary to activate them before the keepalive directive.
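To make the quoted behaviour concrete, here is a toy Python model of one worker's idle connection cache. It is not nginx's implementation: the class and method names are hypothetical, and it only mimics the documented rules: at most `max_idle` idle connections are kept, the least recently used one is closed when the limit is exceeded, and nothing limits how many new connections are opened when no idle one is available.

```python
from collections import OrderedDict


class IdleConnectionCache:
    """Toy model of one worker's idle upstream connection cache (not nginx code)."""

    def __init__(self, max_idle: int):
        self.max_idle = max_idle
        self.idle = OrderedDict()  # conn_id -> upstream address, least recently used first
        self.closed = []           # conn_ids closed because the cache was full

    def release(self, conn_id, upstream):
        """A request finished: park its connection in the cache."""
        self.idle[conn_id] = upstream
        self.idle.move_to_end(conn_id)
        while len(self.idle) > self.max_idle:
            old_id, _ = self.idle.popitem(last=False)  # evict the least recently used
            self.closed.append(old_id)

    def acquire(self, upstream):
        """Reuse an idle connection to `upstream` if there is one, else return None."""
        for conn_id, addr in reversed(self.idle.items()):
            if addr == upstream:
                del self.idle[conn_id]
                return conn_id
        return None  # the caller opens a new TCP connection; no limit applies


if __name__ == "__main__":
    cache = IdleConnectionCache(max_idle=2)
    cache.release("c1", "192.168.88.31:8081")
    cache.release("c2", "192.168.88.44:8081")
    cache.release("c3", "192.168.88.31:8081")  # third idle connection evicts c1
    print(cache.closed)                         # ['c1']
    print(cache.acquire("192.168.88.31:8081"))  # c3
    print(cache.acquire("192.168.88.31:8081"))  # None
```

The last call shows the point the documentation stresses: running out of idle connections does not block anything, it only means a fresh connection (and later a fresh close) is needed.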

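Solution

1. Modify the nginx configuration to enable persistent connections to the upstream together with the connection cache.
2. Restart nginx.
The main configuration changes are: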

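upstream gateway {
    server 192.168.88.31:8081;
    server 192.168.88.44:8081;
    server 192.168.88.115:8081;
    server 192.168.88.80:8081;
    # new: keep up to 100 idle connections per worker process in the cache
    keepalive 100;
}

location / {
    proxy_pass http://gateway;
    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
    # new: upstream timeouts in seconds (nginx notes the connect timeout usually cannot exceed 75s)
    proxy_connect_timeout      120;
    proxy_send_timeout         300;
    proxy_read_timeout         300;
    # new: HTTP/1.1 plus an empty Connection header are required for upstream keepalive
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}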
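Why the last two lines matter: according to the nginx documentation, nginx talks to upstreams over HTTP/1.0 by default (`proxy_http_version 1.0`) and sends `Connection: close`, so each upstream connection is closed after one request even when `keepalive` is set. The Python sketch below illustrates the same effect outside nginx, using the standard library's `http.server` as a stand-in backend (an assumption for demonstration, not the author's setup). It counts how many TCP connections the backend accepts for three requests sent through one client object, with and without a `Connection: close` request header; `CountingHandler` and `count_connections` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Toy backend that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # http.server defaults to HTTP/1.0, which closes after each response
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:
            # The client sent "Connection: close": announce it and hang up afterwards.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def count_connections(requests, headers):
    """Send `requests` GETs through one HTTPConnection; return the backend's connection count."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        for _ in range(requests):
            conn.request("GET", "/", headers=headers)
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("reused connection:", count_connections(3, {}))
    print("Connection: close:", count_connections(3, {"Connection": "close"}))
```

The first run is expected to report 1 connection and the second 3: `http.client` reopens a socket whenever the server answers with `Connection: close`, which mirrors what nginx does to upstreams unless the Connection header is cleared.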

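Verification

1. Check the error log
After being cleared, the error log has barely grown: it is only 446 bytes and was last written at 18:10.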

[root@VM_0_22_centos logs]# ls -lh
total 389M
-rw-r--r-- 1 work work 389M Oct 30 18:50 ttt.minminmsn.com_access.log
-rw-r--r-- 1 work work  446 Oct 30 18:10 ttt.minminmsn.com_error.log

2. Check connection states
Before enabling keepalive, TIME-WAIT dominated:

[root@VM_0_22_centos logs]# ss -an |awk '{print $2}'|sort |uniq -dc |sort -rn
   5045 TIME-WAIT
    156 ESTAB
     62 UNCONN
     21 LISTEN

After enabling keepalive, ESTAB dominated:

[root@VM_0_22_centos ~]# ss -an |awk '{print $2}'|sort |uniq -dc |sort -rn
    511 ESTAB
     62 UNCONN
     52 TIME-WAIT
     21 LISTEN
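The drop in TIME-WAIT can be turned into a rough connection-churn figure with Little's law: if every socket stays in TIME-WAIT for a fixed time, then count ≈ rate × duration. The sketch assumes Linux's fixed 60-second TIME-WAIT period and a steady load when each snapshot was taken; a single `ss` snapshot is only an approximation, and `timewait_rate` is a hypothetical helper.

```python
# Assumption: Linux keeps a socket in TIME-WAIT for a fixed 60 seconds.
TIME_WAIT_SECONDS = 60.0


def timewait_rate(timewait_count: int, linger: float = TIME_WAIT_SECONDS) -> float:
    """Average connections/second entering TIME-WAIT implied by one snapshot (L = lambda * W)."""
    return timewait_count / linger


if __name__ == "__main__":
    # Snapshot counts taken from the two `ss -an` outputs above.
    before = timewait_rate(5045)
    after = timewait_rate(52)
    print(f"before keepalive: ~{before:.1f} connections/s entering TIME-WAIT")  # ~84.1
    print(f"after keepalive:  ~{after:.1f} connections/s entering TIME-WAIT")   # ~0.9
```

So the host went from roughly 84 connections per second entering TIME-WAIT to fewer than one, which is consistent with connections now being reused rather than opened and closed per request.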

References

http://nginx.org/en/docs/http/ngx_http_upstream_module.html#keepalive
https://www.cnblogs.com/sunsky303/p/10648861.html
http://blog.51yip.com/apachenginx/2203.html
