An In-Depth Look at the TCP SYN Queue and Accept Queue

李樂

  Do not blindly trust books; better to have no books at all.

  What you learn on paper is always shallow; to truly understand, you must practice.

  The behavior observed in the experiments depends on the system (shown below) and on the kernel parameters (see the appendix); the experimental results are the final word.

cat /proc/version
Linux version 3.10.0-693.el7.x86_64

Background

  When our online service (written in Golang) calls an internal API service (forwarded through the internal gateway / Nginx), we occasionally see "connection reset by peer" alerts. To investigate, here are the situations that can produce a TCP RST packet:

  • the destination host's firewall intercepts the packet;
  • data is sent to a socket that has already been closed;
  • the full connection queue (Accept Queue) overflows;
  • data is sent to a connection that has already "vanished".

  Setup: the Golang service is the client, the internal gateway Nginx is the server, and HTTP requests use persistent connections (a connection pool) by default.

  Case 1 is easy to understand, and within the same data center's internal network it can essentially be ruled out, so it is not discussed further. Cases 2, 3 and 4 are examined in detail below.

Nginx Closing the Connection

  The Golang service issues requests to the gateway Nginx over persistent connections. If Nginx actively closes a connection and, by unlucky timing, Golang issues an HTTP request at that very moment on the reused connection, case 2 occurs. So when does Nginx actively close a persistent connection? (A config sketch follows the two items below.)

  1) keepalive_timeout: the maximum time Nginx keeps each TCP persistent connection open; default 75 seconds;

  2) keepalive_requests: the maximum number of requests each TCP persistent connection may serve; default 100.
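
  For reference, a minimal sketch of where these two directives live in an nginx configuration (the values shown are the documented defaults; either directive can cause Nginx to close a persistent connection the Golang client still holds):

http {
    keepalive_timeout  75s;   # close a client connection that has been idle for 75s
    keepalive_requests 100;   # close a client connection after it has served 100 requests
}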

  Golang currently copes with connection closure in two ways: 1) the transport detects the connection-close event at a lower level and marks the connection unusable; 2) on an ECONNRESET error it retries some requests, e.g. GET requests, or requests whose headers carry {X-,}Idempotency-Key. The actual retry decision is of course more involved (a usage sketch follows the source excerpt below):

+Transport.roundTrip
    +persistConn.shouldRetryRequest
        +Request.isReplayable
        
func (r *Request) isReplayable() bool {
    if r.Body == nil || r.Body == NoBody || r.GetBody != nil {
        switch valueOrDefault(r.Method, "GET") {
        case "GET", "HEAD", "OPTIONS", "TRACE":
            return true
        }
        
        if r.Header.has("Idempotency-Key") || r.Header.has("X-Idempotency-Key") {
            return true
        }
    }
    return false
}
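
  For example, a hedged sketch (the URL, body, and key value are placeholders, not from the original service) of marking an otherwise non-idempotent request replayable via the Idempotency-Key header; note that isReplayable() also requires GetBody != nil, which http.NewRequest sets automatically for *strings.Reader bodies:

package main

import (
    "net/http"
    "strings"
)

func main() {
    req, err := http.NewRequest("POST", "http://gateway.internal/user/login",
        strings.NewReader(`{"name":"demo"}`))
    if err != nil {
        panic(err)
    }
    // Opt this request into net/http's retry-on-dead-connection logic.
    req.Header.Set("Idempotency-Key", "req-12345")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    resp.Body.Close()
}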

  Transport.IdleConnTimeout configures the idle-connection timeout. Its meaning, however, differs from the Nginx keepalive_timeout setting, so it cannot guarantee that the Golang client is the one to close the connection first.

  Alternatively, the problem can be avoided altogether with short-lived connections. A sketch of both options follows.
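
  A minimal sketch of both mitigations, assuming the Nginx side keeps its 75-second default (the 50-second value below is an arbitrary choice safely under it):

package main

import (
    "net/http"
    "time"
)

func main() {
    // Mitigation 1: retire idle connections before Nginx's keepalive_timeout
    // can fire. This narrows, but does not close, the race window: Nginx may
    // still drop the connection for other reasons (e.g. keepalive_requests).
    pooled := &http.Client{
        Transport: &http.Transport{
            IdleConnTimeout: 50 * time.Second,
        },
    }

    // Mitigation 2: short-lived connections; no reuse, no race, at the cost
    // of a TCP handshake per request.
    short := &http.Client{
        Transport: &http.Transport{
            DisableKeepAlives: true,
        },
    }

    _, _ = pooled, short
}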

  The Golang net/http library still deserves deeper study.


An Introduction to the SYN Queue and Accept Queue

  As shown in the figure below (taken from the web): 1) on receiving a SYN request, the server creates a socket, stores it in the SYN Queue (half-open connection queue), and returns SYN+ACK to the client; 2) on receiving the ACK of the third handshake step, the server updates the socket state to ESTABLISHED and moves the socket to the Accept Queue (full connection queue), where it waits for the application to call accept().

syn-accept.png

  Both the SYN Queue and the Accept Queue have a maximum length. When the limit is exceeded, the kernel either silently drops the packet or returns an RST. The queue sizes are computed as follows (a runnable sketch of the whole computation follows this list):

  Note: below, backlog refers to the second argument of the listen(fd, backlog) system call.

  • Accept Queue:

  min(backlog, net.core.somaxconn)

  The check for a full Accept Queue is shown below (note that only strictly greater-than returns true, so the queue effectively holds one more socket than its nominal size):

return sk->sk_ack_backlog > sk->sk_max_ack_backlog
  • SYN Queue:
nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
nr_table_entries = max_t(u32, nr_table_entries, 8);
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
// round up to the next power of two; e.g. 10 => 16

for (lopt->max_qlen_log = 3;
     (1 << lopt->max_qlen_log) < nr_table_entries;
     lopt->max_qlen_log++);

  The initial value of nr_table_entries in this code is min(backlog, net.core.somaxconn); sysctl_max_syn_backlog is the kernel parameter net.ipv4.tcp_max_syn_backlog; the variable lopt->max_qlen_log caps the SYN Queue size.

  Note that lopt->max_qlen_log has type u8 (an 8-bit unsigned integer). The final SYN Queue size is 2^(lopt->max_qlen_log), with an upper bound of roundup_pow_of_two(sysctl_max_syn_backlog + 1) and a lower bound of 16.

  The check for a full SYN Queue is shown below (qlen is the current SYN Queue length; the test uses a right shift):

return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
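
  To make the computation concrete, here is a hedged Go sketch of the sizing logic above (mirroring, not reproducing, the 3.10 kernel code):

package main

import "fmt"

// roundupPowOfTwo mimics the kernel's roundup_pow_of_two: the smallest
// power of two >= n.
func roundupPowOfTwo(n uint32) uint32 {
    p := uint32(1)
    for p < n {
        p <<= 1
    }
    return p
}

// synQueueSize follows the snippet above: clamp by somaxconn and
// tcp_max_syn_backlog, floor at 8, round (n+1) up to a power of two,
// then derive max_qlen_log (starting at 3).
func synQueueSize(backlog, somaxconn, maxSynBacklog uint32) uint32 {
    n := backlog
    if somaxconn < n {
        n = somaxconn // nr_table_entries = min(backlog, somaxconn)
    }
    if maxSynBacklog < n {
        n = maxSynBacklog
    }
    if n < 8 {
        n = 8
    }
    n = roundupPowOfTwo(n + 1)
    maxQlenLog := uint32(3)
    for (uint32(1) << maxQlenLog) < n {
        maxQlenLog++
    }
    return 1 << maxQlenLog // SYN Queue capacity = 2^max_qlen_log
}

func main() {
    // listen(fd, 1) under this article's sysctls (somaxconn=65535,
    // tcp_max_syn_backlog=81920): prints 16, matching the experiments below.
    fmt.Println(synQueueSize(1, 65535, 81920))

    // Accept Queue: min(backlog, somaxconn), and since the fullness test
    // is strictly greater-than, it effectively holds one extra socket: 2.
    acceptMax := uint32(1) // min(1, 65535)
    fmt.Println(acceptMax + 1)
}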

  Side note

  Socket information can be inspected with netstat or ss. For a socket in the LISTEN state, Send-Q is the maximum Accept Queue length and Recv-Q is the number of sockets accumulated in the Accept Queue waiting for the application to call accept(). (For a socket in the ESTABLISHED state, Send-Q and Recv-Q instead show the amount of data in the socket's send and receive buffers.)

# ss -lnt
State       Recv-Q Send-Q   Local Address:Port  Peer Address:Port
LISTEN      0      128          *:10088            *:*

SYN Queue

  So what does the server do when the SYN Queue overflows: drop the packet, or reply with an RST? We will look at it from two angles: experimental verification and source code analysis.

SYN Queue Overflow Experiment

  We use hping3 to generate SYN packets. (Note: the client replies RST when it receives a SYN+ACK answering hping3 traffic; we intercept the server's replies on the client with iptables -A INPUT -s $ip -j DROP to eliminate the effect of those client RSTs.) The server starts listening (the SYN Queue limit is then 16):

from socket import socket, AF_INET, SOCK_STREAM
sock = socket(AF_INET, SOCK_STREAM)
sock.bind(('', 8888))
sock.listen(1)

  First check the initial TCP statistics with netstat:

# netstat -s |grep -E 'listen| resets sent| LISTEN'
    5236 resets sent // number of RST packets sent
    438 times the listen queue of a socket overflowed // Accept Queue overflow count
    2900 SYNs to LISTEN sockets dropped // drops during the three-way handshake

  The client starts sending: -S sets the SYN flag, -p selects the destination port, and -i u1000 sets a 1000-microsecond interval (one SYN packet per millisecond). Meanwhile, tcpdump captures on the server.

hping3 -S -p 8888 -i u1000  $ip

542 packets transmitted

  542 packets were sent in total; check the server's TCP statistics again:

# netstat -s |grep -E 'listen| resets sent| LISTEN'
    5236 resets sent
    438 times the listen queue of a socket overflowed
    3426 SYNs to LISTEN sockets dropped

  The SYN drop count increased by 526 = 542 - 16 (16 being the SYN Queue limit), while the server's RST-sent count did not change.

  Looking at the tail of the tcpdump capture, we see only the client's SYN requests; the server never returned SYN+ACK to the client.

13:40:48.230881 IP xxxx.ms-sql-s > xxxx.8888: Flags [S], seq 340595037, win 512, length 0
13:40:48.231880 IP xxxx.ms-sql-m > xxxx.8888: Flags [S], seq 580674513, win 512, length 0
13:40:48.232920 IP xxxx.ibm-cics > xxxx.8888: Flags [S], seq 1559804617, win 512, length 0
13:40:48.233896 IP xxxx.saism > xxxx.8888: Flags [S], seq 2102270179, win 512, length 0

  So when the SYN Queue overflows, the server simply drops the client's SYN packets.

tcp_syncookies

  There is in fact another kernel parameter, tcp_syncookies, that affects SYN Queue behavior.

tcp_syncookies (Boolean; since Linux 2.2)
              Enable TCP syncookies.  The kernel must be compiled with CONFIG_SYN_COOKIES.  Send out  syncookies  when  the  syn  backlog
              queue  of a socket overflows.  The syncookies feature attempts to protect a socket from a SYN flood attack.  This should be
              used as a last resort, if at all.  This is a violation of the TCP protocol, and conflicts with other areas of TCP  such  as
              TCP  extensions.   It  can  cause problems for clients and relays.  It is not recommended as a tuning mechanism for heavily
              loaded servers to help with overloaded or misconfigured conditions.  For recommended alternatives see  tcp_max_syn_backlog,
              tcp_synack_retries, and tcp_abort_on_overflow

  tcp_syncookies is a mechanism built specifically to defend against SYN flood attacks. From the connection information (source address, source port, destination address, destination port, etc.) plus a secret seed (such as the system boot time), it computes a hash value (SHA1), called the cookie.

  The cookie is used as the TCP initial sequence number in the SYN+ACK reply, after which the connection state is released. When the client sends the final ACK of the three-way handshake, the server recomputes the hash and only then, having confirmed the ACK answers the SYN+ACK it previously sent, transitions into the connected state.

  In other words, with SYN Cookies enabled the server no longer needs to keep half-open connection state, so SYN Queue overflow should no longer be possible. (A conceptual sketch of the cookie derivation follows.)
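
  A purely conceptual sketch of the cookie idea (the real algorithm in net/ipv4/syncookies.c also encodes the MSS and a timestamp; this only illustrates deriving the ISN statelessly from the connection tuple plus a secret seed):

package main

import (
    "crypto/sha1"
    "encoding/binary"
    "fmt"
)

// synCookieISN hashes the 4-tuple together with a secret seed; the server
// can recompute it on the final ACK instead of storing per-connection state.
func synCookieISN(saddr, daddr uint32, sport, dport uint16, seed uint64) uint32 {
    var buf [20]byte
    binary.BigEndian.PutUint32(buf[0:], saddr)
    binary.BigEndian.PutUint32(buf[4:], daddr)
    binary.BigEndian.PutUint16(buf[8:], sport)
    binary.BigEndian.PutUint16(buf[10:], dport)
    binary.BigEndian.PutUint64(buf[12:], seed)
    sum := sha1.Sum(buf[:])
    return binary.BigEndian.Uint32(sum[:4])
}

func main() {
    isn := synCookieISN(0x0a5a6506, 0x0a5a6507, 35453, 8888, 0xdeadbeef)
    fmt.Printf("cookie ISN: %d\n", isn)
    // On the third-handshake ACK the server recomputes the hash and checks
    // that (ack - 1) matches this ISN before creating the full socket.
}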

  Is that really the case? Let's verify with an experiment.

  Change the kernel parameter:

# sysctl -w net.ipv4.tcp_syncookies=1
net.ipv4.tcp_syncookies = 1

  Record the initial TCP statistics:

# netstat -s |grep -E 'listen| resets sent| LISTEN'
    5236 resets sent
    438 times the listen queue of a socket overflowed
    4473 SYNs to LISTEN sockets dropped

  The client starts hping3 to send SYN packets while the server captures with tcpdump:

hping3 -S -p 8888 -i u1000  10.90.101.6

282 packets transmitted

  282 packets were sent in total; check the server's TCP statistics again:

# netstat -s |grep -E 'listen| resets sent| LISTEN'
    5236 resets sent
    438 times the listen queue of a socket overflowed
    4739 SYNs to LISTEN sockets dropped

  Clearly the SYN drop count still changed, increasing by 266 = 282 - 16. What's going on? Why is the server still dropping SYN packets? Does tcp_syncookies not behave the way we understood?

  However, the server-side tcpdump capture shows that at the end the server was still returning SYN+ACK to the client (these are not SYN+ACK retransmissions: the whole experiment was very short, and the initial SYN+ACK retransmission interval is 1 second):

15:10:27.895666 IP xxxx.8888 > xxxx.pxc-sapxom: Flags [S.], seq 2552938291, ack 277155377, win 29200, options [mss 1460], length 0
15:10:27.895670 IP xxxx.8888 > xxxx.syncserverssl: Flags [S.], seq 132109634, ack 1320827335, win 29200, options [mss 1460], length 0
15:10:28.095641 IP xxxx.8888 > xxxx.md-cg-http: Flags [S.], seq 571952037, ack 1550190463, win 29200, options [mss 1460], length 0
15:10:28.095680 IP xxxx.8888 > xxxx.ncdloadbalance: Flags [S.], seq 3043329827, ack 1288412213, win 29200, options [mss 1460], length 0

  The server keeps answering with SYN+ACK, which shows these connection requests were not actually discarded; they are still effective. (You can also initiate normal connection requests while hping3 floods SYNs, to verify that connections can still be established.)

Source Code Analysis

  The experiments above show that even with tcp_syncookies enabled, SYN drops are still recorded, yet the server keeps replying with SYN+ACK. Let's look at the source.

  When a SYN request arrives, the server-side call path is:

+tcp_v4_do_rcv
    +tcp_v4_hnd_req
    +tcp_rcv_state_process
        +tcp_v4_conn_request

  The function tcp_v4_conn_request handles the client's connection request; its SYN Queue check is:

if (inet_csk_reqsk_queue_is_full(sk)) {
    want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
    if (!want_cookie)
        goto drop;
}

drop:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
    //increment ListenDrops

  When the SYN Queue overflows, handling depends on want_cookie: with tcp_syncookies=1, want_cookie=true and processing continues (the SYN Queue length limit no longer applies); otherwise the drop path is taken and the SYN packet is discarded.

//with tcp_syncookies=1 and the SYN Queue full, want_cookie=true

skb_synack = tcp_make_synack(sk, dst, req,
        fastopen_cookie_present(&valid_foc) ? &valid_foc : NULL);

err = ip_build_and_send_pkt(skb_synack, sk, ireq->loc_addr,
             ireq->rmt_addr, ireq->opt);
if (err || want_cookie)
    goto drop_and_free;

//add the request socket to the SYN Queue
inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
    
drop_and_free:
    reqsk_free(req);
drop:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
    return 0;

  So when want_cookie is set (tcp_syncookies=1), the code jumps to drop_and_free (the request socket is never added to the SYN Queue), and the drop label increments ListenDrops as well.

  The numbers reported earlier by netstat -s | grep -E 'LISTEN' come from /proc/net/netstat and correspond to this ListenDrops counter.

  These two code paths explain how tcp_syncookies behaves: 1) while the SYN Queue is not full, processing is identical to the normal flow; 2) only when the SYN Queue overflows does the SYN Cookie mechanism actually take effect, after which the server still answers SYN+ACK (carrying the cookie) but frees the request instead of queuing it, incrementing ListenDrops for every such SYN.

  Additional note:

  The function tcp_syn_flood_action also keeps statistics worth knowing about:

bool tcp_syn_flood_action(struct sock *sk,
             const struct sk_buff *skb,
             const char *proto)
{
    bool want_cookie = false;
    
    if (sysctl_tcp_syncookies) {
        want_cookie = true;
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPREQQFULLDOCOOKIES);
        //TCPReqQFullDoCookies: a cookie was sent
    } else
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPREQQFULLDROP);
        //TCPReqQFullDrop: the SYN was dropped
    return want_cookie;
}

  We run the experiment two more times:

  • with tcp_syncookies enabled:
# netstat -s | grep -E 'TCPReqQFullDrop|TCPReqQFullDoCookies'
    TCPReqQFullDoCookies: 1194
    TCPReqQFullDrop: 3265
    
455 packets transmitted

# netstat -s | grep -E 'TCPReqQFullDrop|TCPReqQFullDoCookies'
    TCPReqQFullDoCookies: 1633
    TCPReqQFullDrop: 3265
    
//drops +0; cookies +439 = 455 - 16
  • with tcp_syncookies disabled:
# netstat -s | grep -E 'TCPReqQFullDrop|TCPReqQFullDoCookies'
    TCPReqQFullDoCookies: 1633
    TCPReqQFullDrop: 3265
    
358 packets transmitted

# netstat -s | grep -E 'TCPReqQFullDrop|TCPReqQFullDoCookies'
    TCPReqQFullDoCookies: 1633
    TCPReqQFullDrop: 3607
    
//drops +342 = 358 - 16; cookies +0

Accept Queue

  And what does the server do when the Accept Queue overflows: drop or RST? Again we examine both experiment and source code.

Accept Queue Overflow Experiment

  • The server listens as follows; the Accept Queue maximum length is then 2 (the fullness check is strictly greater-than, hence backlog + 1):
from socket import socket, AF_INET, SOCK_STREAM
sock = socket(AF_INET, SOCK_STREAM)
sock.bind(('', 8888))
sock.listen(1)
  • Send 2 connection requests, then inspect the Accept Queue with ss; it has reached its maximum length:
# ss -lnt
State       Recv-Q Send-Q   Local Address:Port  Peer Address:Port
LISTEN      2      1          *:8888             *:*
  • Check the initial TCP statistics with netstat:
# netstat -s |grep -E 'listen| resets sent| LISTEN | Cookies'
    5244 resets sent
    448 times the listen queue of a socket overflowed
    5896 SYNs to LISTEN sockets dropped
  • Initiate a new connection request while capturing with tcpdump.

  Check the TCP statistics again with netstat:

# netstat -s |grep -E 'listen| resets sent| LISTEN | Cookies'
    5244 resets sent
    451 times the listen queue of a socket overflowed
    5899 SYNs to LISTEN sockets dropped

  The RST-sent count did not increase, while the listen-overflow and listen-drop counters both increased, each by 3. Here's a puzzle for later: why 3, when only one request was made?

  ss shows the new connection in the SYN-RECV state:

ss -nat | grep -E 'State|8888'
State      Recv-Q Send-Q Local Address:Port      Peer Address:Port
LISTEN     2      1            *:8888            *:*                   
SYN-RECV   0      0      xxxx:8888        xxxx:35453

  The tcpdump capture:

16:52:33.942358 IP xxxx.35453 > xxxx.8888: Flags [S], seq 2051886524, win 29200, length 0
16:52:33.942588 IP xxxx.8888 > xxxx.35453: Flags [S.], seq 3268637378, ack 2051886525, win 28960, length 0
16:52:33.942916 IP xxxx.35453 > xxxx.8888: Flags [.], ack 3268637379, win 58, length 0
16:52:35.345579 IP xxxx.8888 > xxxx.35453: Flags [S.], seq 3268637378, ack 2051886525, win 28960, length 0
16:52:35.345953 IP xxxx.35453 > xxxx.8888: Flags [.], ack 3268637379, win 58, length 0
16:52:37.345598 IP xxxx.8888 > xxxx.35453: Flags [S.], seq 3268637378, ack 2051886525, win 28960, length 0
16:52:37.346078 IP xxxx.35453 > xxxx.8888: Flags [.], ack 3268637379, win 58, length 0

  After the server received the client's third-handshake ACK (see the ss output: because the Accept Queue had overflowed, the ACK was dropped and the connection stayed in SYN-RECV), the server timed out and retransmitted SYN+ACK twice more, and the client answered each with an ACK.

  The tcpdump capture thus shows that, thanks to this retransmission mechanism, the server received the client's third-handshake ACK three times in total, and all three were dropped because of the Accept Queue overflow; hence the listen-overflow and listen-drop counters each increased by 3. (Why the two move together is covered in the source analysis below.)

  Side note

  The number of server-side SYN+ACK retransmissions is governed by the kernel parameter tcp_synack_retries.

tcp_synack_retries (integer; default: 5; since Linux 2.2)
              The  maximum  number of times a SYN/ACK segment for a passive TCP connection will be retransmitted.  This number should not
              be higher than 255.

Verifying with an HTTP Request

  In the experiment above we only initiated a connection. With an actual HTTP request, the server drops the third-handshake ACK and stays in SYN-RECV, while the client is already ESTABLISHED. When the client then transmits the HTTP request data, will that trigger an RST?
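
  This can also be reproduced without curl; a hedged Go sketch (the address is this article's test server): Dial succeeds because the client side is ESTABLISHED, Write succeeds into the local send buffer, and the Read fails only once retransmission gives up or an RST arrives:

package main

import (
    "fmt"
    "net"
)

func main() {
    conn, err := net.Dial("tcp", "10.90.101.6:8888")
    if err != nil {
        panic(err) // the three-way handshake completed from the client's view
    }
    defer conn.Close()

    // The server is stuck in SYN-RECV and silently drops this data.
    _, _ = conn.Write([]byte("GET /user/login HTTP/1.1\r\nHost: x\r\n\r\n"))

    buf := make([]byte, 4096)
    _, err = conn.Read(buf) // blocks until the retransmissions give up
    fmt.Println("read error:", err)
}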

  Everything else is as in the experiment above; the curl request:

curl http://xxxx:8888/user/login
curl: (56) Recv failure: Connection timed out

  The tcpdump capture:

17:18:27.820307 IP xxxx.35515 > xxxx.8888: Flags [S], seq 3685002386, win 29200, length 0
17:18:27.820378 IP xxxx.8888 > xxxx.35515: Flags [S.], seq 1886008256, ack 3685002387, win 28960, length 0
17:18:27.820672 IP xxxx.35515 > xxxx.8888: Flags [.], ack 1886008257, win 58, length 0
//HTTP request sent
17:18:27.820680 IP xxxx.35515 > xxxx.8888: Flags [P.], seq 3685002387:3685002477, ack 1886008257, win 58, length 90
//HTTP request retransmission
17:18:28.020543 IP xxxx.35515 > xxxx.8888: Flags [P.], seq 3685002387:3685002477, ack 1886008257, win 58, length 90
//HTTP request retransmission
17:18:28.220471 IP xxxx.35515 > xxxx.8888: Flags [P.], seq 3685002387:3685002477, ack 1886008257, win 58, length 90
//HTTP request retransmission
17:18:28.621487 IP xxxx.35515 > xxxx.8888: Flags [P.], seq 3685002387:3685002477, ack 1886008257, win 58, length 90
//SYN+ACK retransmission
17:18:29.021763 IP xxxx.8888 > xxxx.35515: Flags [S.], seq 1886008256, ack 3685002387, win 28960, length 0
17:18:29.022193 IP xxxx.35515 > xxxx.8888: Flags [.], ack 1886008257, win 58, length 0
//HTTP request retransmission
17:18:29.424432 IP xxxx.35515 > xxxx.8888: Flags [P.], seq 3685002387:3685002477, ack 1886008257, win 58, length 90
//SYN+ACK retransmission
17:18:31.221631 IP xxxx.8888 > xxxx.35515: Flags [S.], seq 1886008256, ack 3685002387, win 28960, length 0
//client times out and sends RST
17:18:31.221942 IP xxxx.35515 > xxxx.8888: Flags [R], seq 3685002387, win 0, length 0

  The server never responded to the client's HTTP request data (it simply dropped it). The client's connection state was ESTABLISHED, the server's SYN-RECV; the client kept retransmitting the HTTP request while the server kept retransmitting SYN+ACK. Eventually the client's HTTP transmission timed out (TCP retransmission gave up) and the client sent an RST. When TCP retransmission fails, the error surfaced to the upper layer is Connection timed out, matching the curl error above.

  Side note

  The number of TCP data retransmissions is governed by the kernel parameter tcp_retries2.

tcp_retries2 (integer; default: 15; since Linux 2.2)
              The  maximum number of times a TCP packet is retransmitted in established state before giving up.  The default value is 15,
              which corresponds to a duration of approximately between 13 to 30 minutes, depending on the  retransmission  timeout.   The
              RFC 1122 specified minimum limit of 100 seconds is typically deemed too short.

tcp_abort_on_overflow

  In fact, the server's behavior on Accept Queue overflow is also governed by the kernel parameter tcp_abort_on_overflow; our system is configured with tcp_abort_on_overflow=0.

tcp_abort_on_overflow (Boolean; default: disabled; since Linux 2.4)
              Enable resetting connections if the listening service is too slow and unable to keep up and accept them.  It means that  if
              overflow  occurred  due  to  a burst, the connection will recover.  Enable this option only if you are really sure that the
              listening daemon cannot be tuned to accept connections faster.  Enabling this option can harm the clients of your server.

  Set tcp_abort_on_overflow=1 and repeat the experiment above:

# sysctl -w net.ipv4.tcp_abort_on_overflow=1
net.ipv4.tcp_abort_on_overflow = 1

  The client's curl request now fails immediately:

time curl http://10.90.101.6:8888/user/login
curl: (56) Recv failure: Connection reset by peer

real    0m0.005s

  The tcpdump capture shows the server returning an RST immediately upon receiving the third-handshake ACK:

17:35:02.063694 IP xxxx.35547 > xxxx.8888: Flags [S], seq 1965671248, win 29200, length 0
17:35:02.063804 IP xxxx.8888 > xxxx.35547: Flags [S.], seq 3965903705, ack 1965671249, win 28960, length 0
17:35:02.064200 IP xxxx.35547 > xxxx.8888: Flags [.], ack 3965903706, win 58, length 0
17:35:02.064228 IP xxxx.8888 > xxxx.35547: Flags [R], seq 3965903706, win 0, length 0

Source Code Analysis

  When the third-handshake ACK arrives, the server-side call path is:

+tcp_v4_do_rcv
    +tcp_v4_hnd_req
        +tcp_check_req
            +tcp_v4_syn_recv_sock

  The function tcp_v4_syn_recv_sock checks whether the Accept Queue is full:

if (sk_acceptq_is_full(sk))
    goto exit_overflow;

exit_overflow:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
    //increment ListenOverflows
exit_nonewsk:
    dst_release(dst);
exit:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
    //increment ListenDrops
    return NULL;

  On overflow, both ListenOverflows and ListenDrops are incremented, matching the simultaneous increase of 3 observed in the experiment above.

  The function tcp_check_req then decides, based on the return value of tcp_v4_syn_recv_sock and the tcp_abort_on_overflow setting, whether to send an RST; the listen_overflow label is where tcp_abort_on_overflow takes effect:

child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
if (child == NULL)
    goto listen_overflow;

listen_overflow:
    if (!sysctl_tcp_abort_on_overflow) {
        inet_rsk(req)->acked = 1;
        //tcp_abort_on_overflow=0: just remember that the ACK arrived; no RST
        return NULL;
    }

embryonic_reset:
    req->rsk_ops->send_reset(sk, skb);
    //implemented by tcp_v4_send_reset

    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_EMBRYONICRSTS);
    //increment EmbryonicRsts
    return NULL;

Revisiting the Accept Queue

  Why do some people, experimenting with Accept Queue overflow, see the client receive an RST even with tcp_abort_on_overflow=0? Again, it comes down to the system configuration.

  Recall also the case mentioned at the very beginning: sending data to a connection that has already "vanished" likewise triggers an RST.

  When tcp_synack_retries is very small, the server's SYN-RECV connection (kept out of the Accept Queue by the overflow) times out quickly and is released; if the client's tcp_retries2 is comparatively large, the client is still retransmitting the HTTP request at that point, and the server replies to it with an RST.

  Change tcp_retries2:

# sysctl -w net.ipv4.tcp_retries2=15
net.ipv4.tcp_retries2 = 15

  Issue the curl request again:

# time curl http://10.90.101.6:8888/user/login
curl: (56) Recv failure: Connection reset by peer

real    0m12.844s

  The tcpdump capture:

17:58:33.522067 IP xxxx.35603 > xxxx.8888: Flags [S], seq 2997388295, win 29200, length 0
17:58:33.522182 IP xxxx.8888 > xxxx.35603: Flags [S.], seq 2883911494, ack 2997388296, win 28960, length 0
17:58:33.522463 IP xxxx.35603 > xxxx.8888: Flags [.], ack 2883911495, win 58, length 0
//HTTP request sent
17:58:33.522583 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//HTTP request retransmission
17:58:33.723351 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//HTTP request retransmission
17:58:33.924422 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//HTTP request retransmission
17:58:34.327366 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//server retransmits SYN+ACK
17:58:34.523613 IP xxxx.8888 > xxxx.35603: Flags [S.], seq 2883911494, ack 2997388296, win 28960, length 0
17:58:34.523916 IP xxxx.35603 > xxxx.8888: Flags [.], ack 2883911495, win 58, length 0
//HTTP request retransmission
17:58:35.133451 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//server retransmits SYN+ACK
17:58:36.723600 IP xxxx.8888 > xxxx.35603: Flags [S.], seq 2883911494, ack 2997388296, win 28960, length 0
17:58:36.723987 IP xxxx.35603 > xxxx.8888: Flags [.], ack 2883911495, win 58, length 0
//HTTP request retransmission
17:58:36.743318 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//HTTP request retransmission
17:58:39.967405 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//HTTP request retransmission
17:58:46.423467 IP xxxx.35603 > xxxx.8888: Flags [P.], seq 2997388296:2997388386, ack 2883911495, win 58, length 90
//server returns RST
17:58:46.423716 IP xxxx.8888 > xxxx.35603: Flags [R], seq 2883911495, win 0, length 0

  During the client's repeated TCP retransmissions, the server's SYN-RECV connection had already timed out and been released, which is why the server eventually returned an RST.

Summary

  To repeat the opening line: do not blindly trust books; better to have no books at all.

  Many published experimental observations are tightly coupled to the author's system and kernel parameters. One cannot simply assume that a TCP queue overflow will, or will not, produce an RST.

  At least under this article's configuration, the HTTP error "connection reset by peer" (a server-side RST) is not caused by TCP queue overflow.

  To sidestep "connection reset by peer", a Golang service can currently use short-lived connections, or retry on failure. This article focused on the TCP SYN Queue and Accept Queue; the Golang persistent-connection RST case still awaits deeper investigation.

Appendix

net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_nonlocal_bind = 0
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_tw_buckets = 1600000
net.ipv4.ip_dynaddr = 0
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 2
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_max_syn_backlog = 81920
net.core.somaxconn = 65535
