A pitfall: OpenResty causing large numbers of TIME_WAIT connections

    The background: this is a CDN purge/refresh system in which an agent sends large numbers of GET / PURGE requests to OpenResty, which then forwards them to a backend ATS (Apache Traffic Server). During load testing, OpenResty's performance degraded, and inspection revealed a huge number of sockets in TIME_WAIT.

This article is still being updated and revised; please see the original at http://dmwan.cc

    First, our OpenResty configuration looked like this:

lua_package_path "/usr/local/openresty/nginx/sp_lua/?.lua;/usr/local/openresty/nginx/sp_lua/?.sp;?.lua;/usr/local/openresty/lualib/?.lua";
lua_code_cache on;
lua_shared_dict refresh_db 16m;
  

upstream ats{
    server 127.0.0.1:8080;
}

    server {
        listen       80;
        server_name  localhost default.com;

        location / {
                set $module_conf "/usr/local/openresty/nginx/conf/lua_modules_conf/module_conf";
                include "lua_include_conf/include_location.conf";
                proxy_set_header Host $host;
                proxy_pass http://ats;
               
        }

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
}

    OpenResty acts as a proxy, forwarding requests to ATS on port 8080. My assumption was that it would, quite reasonably, keep persistent (keep-alive) connections to the upstream. It turned out that it does not.

    ss -s showed the TIME_WAIT count reaching 50,000 within a short period, which clearly pointed to a problem.

    Check the socket states on port 8080:

netstat -an |grep 8080

    The output looked like this:

tcp        0      0 127.0.0.1:58009             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57931             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60167             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58079             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58149             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60375             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57657             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:59569             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:56999             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63087             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61483             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61461             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:62133             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63053             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:62125             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:61197             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:63139             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:57719             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:60967             127.0.0.1:8080              TIME_WAIT   -
tcp        0      0 127.0.0.1:58575             127.0.0.1:8080              TIME_WAIT   -

    Since the TIME_WAIT sockets sit on OpenResty's side of the connections (the ephemeral ports), it must be OpenResty that actively closes them.

    Let's capture some packets to pin it down:

ngrep -W byline port 8080 -d lo -X

    The captured traffic:

##
T 127.0.0.1:31278 -> 127.0.0.1:8080 [AP]
GET /test.html HTTP/1.0.
Host: www.default.com.
Connection: close.
User-Agent: refresh_fetcher.
Accept-Encoding: gzip.
.

##
T 127.0.0.1:8080 -> 127.0.0.1:31278 [AP]
HTTP/1.0 200 OK.
Server: ATS/6.2.3.
Date: Tue, 27 Mar 2018 13:34:37 GMT.
Content-Type: text/html.
Content-Length: 4.
Last-Modified: Tue, 05 Sep 2017 05:11:20 GMT.
ETag: "59ae31f8-4".
Expires: Tue, 27 Mar 2018 14:34:37 GMT.
Cache-Control: max-age=3600.
Accept-Ranges: bytes.
Age: 489.
Ws-Cache-Key: http://www.xxx.com/test.html.
Ws-Milestone: UA-BEGIN=1522158166703713, UA-FIRST-READ=1522158166703713, UA-READ-HEADER-DONE=1522158166703713, UA-BEGIN-WRITE=1522158166703758, CACHE-OPEN-READ-BEGIN=1522158166703733, CACHE-OPEN-READ-END=1522158166703733, PLUGIN-ACTIVE=1522158166703715, PLUGIN-TOTAL=152215816670371.
Ws-Hit-Miss-Code: TCP_MEM_HIT.
Ws-Is-Hit: 1.

    Two important points stand out in the capture: first, the request itself is HTTP/1.0, meaning the sending side deliberately forwards over short-lived HTTP/1.0 connections (nginx's proxy_http_version defaults to 1.0); second, the Connection header is close.

    A look at the official nginx/OpenResty documentation turns up the relevant description of the upstream keepalive directive:

The connections parameter sets the maximum number of idle keepalive connections to upstream servers that are preserved in the cache of each worker process. When this number is exceeded, the least recently used connections are closed.

It should be particularly noted that the keepalive directive does not limit the total number of connections to upstream servers that an nginx worker process can open. The connections parameter should be set to a number small enough to let upstream servers process new incoming connections as well.
Example configuration of memcached upstream with keepalive connections:

upstream memcached_backend {
    server 127.0.0.1:11211;
    server 10.0.0.2:11211;

    keepalive 32;
}

server {
    ...

    location /memcached/ {
        set $memcached_key $uri;
        memcached_pass memcached_backend;
    }

}
For HTTP, the proxy_http_version directive should be set to "1.1" and the "Connection" header field should be cleared:

upstream http_backend {
    server 127.0.0.1:8080;

    keepalive 16;
}

server {
    ...

    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        ...
    }
}
Alternatively, HTTP/1.0 persistent connections can be used by passing the "Connection: Keep-Alive" header field to an upstream server, though this method is not recommended.

    簡單來講,就是須要設置keepalive 數量,設置http_version, 設置 Connection 三個參數。當修改後,問題獲得解決。

    Summary: I had assumed that a project as performance-focused as OpenResty would support upstream keep-alive out of the box. That turned out to be wishful thinking.
