Socket Kernel Parameters


Setting kernel parameters

Using somaxconn as an example:

1. Temporary change (lost after a reboot)

step 1

echo 2048 >   /proc/sys/net/core/somaxconn

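step 2 (alternative to step 1): sysctl -w makes the same change. Either way it takes effect at once, so sysctl -p (which reloads /etc/sysctl.conf) is not needed for a temporary change.

sysctl -w net.core.somaxconn=2048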

2. Permanent change: add the following line to /etc/sysctl.conf

step 1

net.core.somaxconn = 2048

step 2: reload /etc/sysctl.conf so the setting takes effect without a reboot

sysctl -p
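Both methods work because every sysctl name corresponds to a file under /proc/sys: replacing the dots with slashes gives the path. The Python sketch below shows that mapping. The helper names sysctl_path, read_sysctl and write_sysctl are hypothetical; writing needs root and, like the echo command in the temporary method, does not survive a reboot.

```python
from pathlib import Path


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name such as net.core.somaxconn to its /proc/sys file."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> str:
    """Return the current value as text (roughly what `sysctl -n name` prints)."""
    return sysctl_path(name).read_text().strip()


def write_sysctl(name: str, value: str) -> None:
    """Temporary change, like `echo value > /proc/sys/...`: needs root, lost on reboot."""
    sysctl_path(name).write_text(f"{value}\n")


if __name__ == "__main__":
    path = sysctl_path("net.core.somaxconn")
    print(path)  # /proc/sys/net/core/somaxconn
    if path.exists():  # only present on Linux
        print("current value:", read_sysctl("net.core.somaxconn"))
```

The plain dot-to-slash replacement breaks for names that embed an interface name containing a dot (for example a VLAN interface such as eth0.100); sysctl(8) also accepts '/' as the separator for that case.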

Kernel socket parameters

The files below live in /proc/sys/net/ipv4 or /proc/sys/net/core/ (CentOS Linux release 7.2.1511).

tcp_retries1

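[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 464]
reference
Once the number of retransmissions exceeds the tcp_retries1 threshold, the main action taken is to update the route cache.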

tcp_retries2

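[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 464]
Once the number of retransmissions exceeds tcp_retries2, TCP gives up and aborts the connection (default 15).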

tcp_syn_retries & tcp_synack_retries

[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 464]
For SYN segments, net.ipv4.tcp_syn_retries and net.ipv4.tcp_synack_retries bound the number of retransmissions of SYN segments; their default value is 5 (roughly 180s).
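The timeouts behind tcp_retries2 and the SYN retry limits come from exponential backoff: each wait doubles the previous RTO, up to a cap. The Python sketch below adds up those waits under stated assumptions: a fixed starting RTO, pure doubling, and a 120 s cap (Linux's TCP_RTO_MAX). It ignores the adaptive, RTT-based RTO, so the results are estimates; backoff_timeout_ms is a hypothetical name.

```python
def backoff_timeout_ms(retries: int, initial_rto_ms: int, max_rto_ms: int = 120_000) -> int:
    """Estimate how long TCP waits before giving up after `retries` retransmissions.

    Assumptions: the i-th wait is initial_rto_ms * 2**i, capped at max_rto_ms.
    There are retries + 1 waits: one after the original transmission and one
    after each retransmission.
    """
    return sum(min(initial_rto_ms * 2**i, max_rto_ms) for i in range(retries + 1))


if __name__ == "__main__":
    print(backoff_timeout_ms(5, 3000) / 1000)   # 189.0: SYN, older 3 s initial RTO
    print(backoff_timeout_ms(6, 1000) / 1000)   # 127.0: SYN, 1 s initial RTO (RFC 6298)
    print(backoff_timeout_ms(15, 200) / 1000)   # 924.6: tcp_retries2 = 15, 200 ms RTO
```

With a 3 s initial RTO and 5 SYN retries the sum is 189 s, which matches the "roughly 180s" above; with the 1 s initial RTO of RFC 6298 and 6 retries it is 127 s. For tcp_retries2 = 15 and a 200 ms starting RTO the sum is 924.6 s, the figure the kernel's ip-sysctl documentation gives as a lower bound for that timeout.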

tcp_fin_timeout

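[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 446]
Related to FIN_WAIT_2: it is how long an orphaned connection (one no longer referenced by any application) stays in FIN_WAIT_2 before the kernel aborts it (default 60 s).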

tcp_abort_on_overflow

[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 455]
If there is not enough room on the queue for the new connection, the TCP delays responding to the SYN, to give the application a chance to catch up. Linux is somewhat unique in this behavior—it persists in not ignoring incoming connections if it possibly can. If the net.ipv4.tcp_abort_on_overflow system control variable is set, new incoming connections are reset with a reset segment.

tcp_max_syn_backlog

[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 458]
When a connection request arrives (i.e., the SYN segment), the system-wide parameter tcp_max_syn_backlog is checked (default 1000). If the number of connections in the SYN_RCVD state would exceed this threshold, the incoming connection is rejected.

tcp_timestamps

TCP Timestamps Option (TSopt):
Layout:

  +-------+-------+---------------------+---------------------+
  |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
  +-------+-------+---------------------+---------------------+
      1       1              4                     4

 The Timestamps option carries two four-byte timestamp fields.
 The Timestamp Value field (TSval) contains the current value of
 the timestamp clock of the TCP sending the option.

 The Timestamp Echo Reply field (TSecr) is only valid if the ACK
 bit is set in the TCP header; if it is valid, it echos a
 timestamp value that was sent by the remote TCP in the TSval field
 of a Timestamps option.  When TSecr is not valid, its value
 must be zero.  The TSecr value will generally be from the most
 recent Timestamp option that was received; however, there are
 exceptions that are explained below.
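The 10-byte layout above maps directly onto a fixed binary format. This Python sketch packs and unpacks the option with the standard struct module; build_tsopt and parse_tsopt are hypothetical helper names, and the sketch handles only the option bytes, not a full TCP header.

```python
import struct
from typing import Tuple

TSOPT_KIND = 8        # option kind for Timestamps
TSOPT_LEN = 10        # total option length in bytes
_TSOPT_FMT = "!BBII"  # network byte order: kind, length, TSval, TSecr


def build_tsopt(tsval: int, tsecr: int = 0) -> bytes:
    """Encode a Timestamps option; TSecr should be 0 when the ACK bit is not set."""
    return struct.pack(_TSOPT_FMT, TSOPT_KIND, TSOPT_LEN,
                       tsval & 0xFFFFFFFF, tsecr & 0xFFFFFFFF)


def parse_tsopt(data: bytes) -> Tuple[int, int]:
    """Decode the first 10 bytes of `data` and return (TSval, TSecr)."""
    if len(data) < TSOPT_LEN:
        raise ValueError("option too short")
    kind, length, tsval, tsecr = struct.unpack(_TSOPT_FMT, data[:TSOPT_LEN])
    if kind != TSOPT_KIND or length != TSOPT_LEN:
        raise ValueError("not a Timestamps option")
    return tsval, tsecr


if __name__ == "__main__":
    syn_opt = build_tsopt(tsval=1000)              # client SYN: no ACK yet, so TSecr = 0
    ack_opt = build_tsopt(tsval=5000, tsecr=1000)  # SYN-ACK echoes the client's TSval
    print(syn_opt.hex())
    print(parse_tsopt(ack_opt))  # (5000, 1000)
```

In real segments the option is usually preceded by two NOP bytes (kind 1) so that the timestamp fields stay 4-byte aligned, as RFC 7323 suggests.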

Enabled by default. Purpose: 1. more accurate RTT measurement; 2. protection against wrapped sequence numbers (PAWS).
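Both uses fit in a few lines. The Python sketch below is a simplified model, assuming 32-bit timestamps compared with serial-number (wraparound) arithmetic: the RTT sample is the current timestamp clock minus the echoed TSecr, and PAWS rejects a segment whose TSval is older than the most recently accepted one (ts_recent). It leaves out RFC 7323 details such as invalidating ts_recent after a long idle period; the function names are hypothetical.

```python
MOD = 1 << 32   # timestamps are 32-bit and wrap around
HALF = 1 << 31


def ts_before(a: int, b: int) -> bool:
    """True if timestamp a is earlier than b, using serial-number arithmetic."""
    return (a - b) % MOD >= HALF


def paws_reject(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS test: drop a segment whose TSval is older than ts_recent."""
    return ts_before(seg_tsval, ts_recent)


def rtt_sample(now_ts: int, seg_tsecr: int) -> int:
    """RTT in timestamp-clock ticks: current clock minus the echoed TSecr."""
    return (now_ts - seg_tsecr) % MOD


if __name__ == "__main__":
    print(paws_reject(99, 100))        # True: an old duplicate segment
    print(paws_reject(5, 0xFFFFFFF0))  # False: the clock wrapped, 5 is newer
    print(rtt_sample(10, 0xFFFFFFF6))  # 20 ticks, measured across the wrap
```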

reference

tcp_tw_reuse && tcp_tw_recycle

reference (an excellent article)

tcp_tw_reuse
By enabling net.ipv4.tcp_tw_reuse, Linux will reuse an existing connection in the TIME-WAIT state for a new outgoing connection if the new timestamp is strictly bigger than the most recent timestamp recorded for the previous connection: an outgoing connection in the TIME-WAIT state can be reused after just one second.

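Q : What is reused?
A : The connection, i.e. the related socket data structures in the kernel.
Q : Who reuses these data structures?
A : The side in the TIME_WAIT state, when it initiates the same connection again (the same TCP four-tuple).
Q : What exactly happens, and why does it depend on tcp_timestamps?
A : See the analysis below.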
Once a new connection replaces the TIME-WAIT entry [time 1], the SYN segment of the new connection is ignored (thanks to the timestamps) [time 2] and won’t be answered by a RST [time 3] but only by a retransmission of the FIN and ACK segment [time 3]. The FIN segment will then be answered with a RST (because the local connection is in the SYN-SENT state)[time 4] which will allow the transition out of the LAST-ACK state. The initial SYN segment will eventually be resent (after one second) because there was no answer and the connection will be established without apparent error, except a slight delay:

[Figure: packet exchange when tcp_tw_reuse replaces a TIME-WAIT entry]
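The reuse rule in the tcp_tw_reuse quote can be modelled as a small predicate. In this Python sketch, TimeWaitEntry and can_reuse are hypothetical names, not kernel structures. Reuse is allowed only when tcp_tw_reuse is enabled, the new outgoing connection has the same four-tuple, timestamps were in use on the old connection (which is why the option depends on tcp_timestamps), and the seconds counter has advanced since the peer's last timestamp was recorded.

```python
from dataclasses import dataclass
from typing import Tuple

FourTuple = Tuple[str, int, str, int]  # (local IP, local port, remote IP, remote port)


@dataclass
class TimeWaitEntry:
    """Hypothetical stand-in for a socket sitting in TIME_WAIT."""
    four_tuple: FourTuple
    ts_recent: int        # last TSval received from the peer
    ts_recent_stamp: int  # local time in seconds when ts_recent was recorded; 0 = no timestamps


def can_reuse(tw: TimeWaitEntry, new_tuple: FourTuple, now_s: int, tw_reuse: bool) -> bool:
    """Decide whether a new outgoing connection may take over this TIME_WAIT entry."""
    if not tw_reuse or tw.four_tuple != new_tuple:
        return False
    if tw.ts_recent_stamp == 0:
        # Without timestamps, delayed segments of the old connection could not
        # be told apart from segments of the new one, so reuse is unsafe.
        return False
    # The seconds counter must have moved on since ts_recent was recorded
    # (the "after just one second" in the quote).
    return now_s > tw.ts_recent_stamp
```

For example, an entry whose last timestamp was recorded at second 100 can be reused at second 101 but not during second 100 itself.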

tcp_tw_recycle
Enabling this option is not recommended.
Starting from Linux 4.10 (commit 95a22caee396), Linux will randomize timestamp offsets for each connection, making this option completely broken, with or without NAT.
The option was removed entirely in Linux 4.12.

TODO: learn the kernel's socket data structures.

net.ipv4.tcp_syncookies

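[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 455]
Setting net.ipv4.tcp_syncookies = 1 enables SYN cookies: when the SYN queue overflows, cookies are used to handle new connection requests, which protects against SYN flood attacks. A value of 0 disables them; older kernels defaulted to 0, while newer kernels default to 1.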
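The idea behind SYN cookies is that the server keeps no state for a half-open connection; it encodes what it needs into its initial sequence number and verifies it when the final ACK comes back. The Python sketch below follows the classic layout described by D. J. Bernstein (5 bits of a time counter that advances every 64 seconds, 3 bits selecting an MSS from a table, 24 bits of a keyed hash of the four-tuple and the counter). Linux's real implementation differs in its details; SECRET, MSS_TABLE and the helper names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Tuple

FourTuple = Tuple[str, int, str, int]
SECRET = b"hypothetical-server-secret"  # assumption: a random per-boot key in practice
MSS_TABLE = [536, 1220, 1440, 1460]     # hypothetical table; 3 bits allow up to 8 entries


def _hash24(four_tuple: FourTuple, t: int) -> int:
    """Keyed 24-bit hash of the connection's addresses, ports and time counter."""
    msg = repr((four_tuple, t)).encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(four_tuple: FourTuple, client_mss: int, t: int) -> int:
    """Build the 32-bit ISN sent in the SYN-ACK; t is the slow time counter."""
    # Largest table MSS not above the client's MSS (falls back to entry 0).
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return ((t % 32) << 27) | (idx << 24) | _hash24(four_tuple, t)


def check_cookie(four_tuple: FourTuple, cookie: int, now_t: int,
                 max_age: int = 1) -> Optional[int]:
    """Validate (ACK number - 1) from the final ACK; return the encoded MSS or None."""
    t_low, idx, h = cookie >> 27, (cookie >> 24) & 0x7, cookie & 0xFFFFFF
    for age in range(max_age + 1):  # accept cookies from the last few periods only
        t = now_t - age
        if t % 32 == t_low and idx < len(MSS_TABLE) and _hash24(four_tuple, t) == h:
            return MSS_TABLE[idx]
    return None
```

The trade-off is visible in the layout: only 3 bits are left for the MSS, so the server can offer just a few MSS values, and other SYN options that do not fit in the cookie are lost.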

tcp_dsack

[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 482]

tcp_sack

Enabled by default.
[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 478]

somaxconn

[TCP/IP Illustrated, Volume 1 (Chinese translation, 2nd ed.), p. 455]
Each listening endpoint has a fixed-length queue of connections that have been completely accepted by TCP (i.e., the three-way handshake is complete) but not yet accepted by the application. The application specifies a limit to this queue, commonly called the backlog. This backlog must be between 0 and a system-specific maximum called net.core.somaxconn, inclusive (default 128).
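In practice this means listen() silently clamps the requested backlog to somaxconn. The kernel compares the value as an unsigned integer, so a negative backlog also ends up as somaxconn. A Python sketch of that rule follows; the helper names are hypothetical, and the fallback of 128 is the default quoted above (Linux 5.4 raised the default to 4096).

```python
def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog the kernel actually uses for listen(fd, requested)."""
    # The kernel checks (unsigned int)backlog > somaxconn, so negative values
    # wrap to huge numbers and are clamped as well.
    if requested < 0 or requested > somaxconn:
        return somaxconn
    return requested


def read_somaxconn(default: int = 128) -> int:
    """Read net.core.somaxconn, falling back to `default` where /proc is unavailable."""
    try:
        with open("/proc/sys/net/core/somaxconn") as f:
            return int(f.read())
    except OSError:
        return default


if __name__ == "__main__":
    limit = read_somaxconn()
    for req in (50, 4096, 65535, -1):
        print(f"listen(fd, {req}) -> backlog {effective_backlog(req, limit)}")
```

This is why raising the backlog argument in an application has no effect unless somaxconn is raised too, as in the sysctl examples at the top.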

netdev_max_backlog

TODO

rmem_max && wmem_max && rmem_default && wmem_default

reference

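# Receive buffer of a single connection (the receive buffer actually changes dynamically; this is an upper bound)
net.core.rmem_default = 262144
# A receive buffer set with setsockopt cannot exceed rmem_max
net.core.rmem_max = 16777216
net.core.wmem_default = 262144
net.core.wmem_max = 16777216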
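The rmem_max cap is easy to observe from user space. According to socket(7), the kernel caps an SO_RCVBUF request at rmem_max and then doubles it to leave room for bookkeeping overhead, and getsockopt reports the doubled value. The Python sketch below models that rule and also asks the local kernel; modeled_rcvbuf and actual_rcvbuf are hypothetical names, and the model ignores the small minimum the kernel enforces.

```python
import socket


def modeled_rcvbuf(requested: int, rmem_max: int) -> int:
    """What getsockopt(SO_RCVBUF) should report after setsockopt, per socket(7)."""
    # Capped at rmem_max, then doubled for kernel bookkeeping overhead.
    return min(requested, rmem_max) * 2


def actual_rcvbuf(requested: int) -> int:
    """Ask the local kernel; setting SO_RCVBUF also disables auto-tuning for this socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    rmem_max = 16777216  # net.core.rmem_max from the configuration above
    print(modeled_rcvbuf(65536, rmem_max))    # 131072
    print(modeled_rcvbuf(1 << 30, rmem_max))  # 33554432: capped, then doubled
    try:
        print("kernel reports:", actual_rcvbuf(65536))  # on Linux, typically 131072
    except OSError as exc:
        print("could not create a socket:", exc)
```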

tcp_moderate_rcvbuf && tcp_rmem && tcp_wmem && tcp_mem

reference

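Once the maximum buffer limits are set, can we stop worrying? A single TCP connection may already be making full use of the network, keeping a large window and large buffers to sustain high throughput. On a long fat network, for example, the buffer limit may be set to tens of megabytes, but total system memory is finite: if every connection ran at full speed with its maximum window, 10,000 connections would use hundreds of gigabytes of memory. That rules out high-concurrency workloads, and fairness cannot be guaranteed either. What we want is this: when there are few concurrent connections, raise the buffer limits so that each TCP connection can run at full power; when there are many concurrent connections and system memory runs short, lower the limits so that each connection's buffers stay as small as possible, leaving room for more connections.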
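A quick worst-case estimate makes the point concrete. Assuming, for illustration, that both the read and the write buffer of every connection reach a 16 MB limit (the maximum used in the tcp_rmem/tcp_wmem settings below), 10,000 connections would need about 312 GiB. worst_case_bytes is a hypothetical helper.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def worst_case_bytes(connections: int, rbuf_max: int, wbuf_max: int) -> int:
    """Memory needed if every connection fills both buffers to their limits at once."""
    return connections * (rbuf_max + wbuf_max)


if __name__ == "__main__":
    total = worst_case_bytes(10_000, 16 * MiB, 16 * MiB)
    print(total / GiB)  # 312.5
```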

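To support this, Linux introduced automatic tuning of memory allocation, controlled by tcp_moderate_rcvbuf:
net.ipv4.tcp_moderate_rcvbuf = 1
tcp_moderate_rcvbuf defaults to 1, which enables TCP receive-buffer auto-tuning. If it is set to 0, the feature is disabled (use with caution).
If a program sets SO_SNDBUF or SO_RCVBUF on a connection, the Linux kernel no longer auto-tunes that connection!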

net.ipv4.tcp_rmem = 8192 87380 16777216  
net.ipv4.tcp_wmem = 8192 65536 16777216  
net.ipv4.tcp_mem = 8388608 12582912 16777216

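The three-element array tcp_rmem[3] sets the receive-buffer limits for any TCP connection. tcp_rmem[0] is the minimum limit (for example, if setsockopt is used to set the maximum receive buffer to a value below 8192, the maximum receive buffer is set to 8192); tcp_rmem[1] is the initial limit (note that for TCP it overrides rmem_default, which applies to all protocols); tcp_rmem[2] is the maximum limit.
The tcp_wmem[3] array sets the send-buffer limits in the same way as tcp_rmem[3], so it is not described again.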

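The tcp_mem[3] array governs overall TCP memory usage, which is why its values are so large (its unit is not bytes but pages, such as 4K or 8K!). The three values define the no-pressure level, the threshold for entering pressure mode, and the maximum usage for all TCP memory. Using these three values as markers, there are four cases:

1. As soon as total TCP memory exceeds tcp_mem[2], every new allocation fails.
2. tcp_rmem[0] and tcp_wmem[0] also have high priority: as long as condition 1 is not hit, an allocation for a connection whose memory is below these values is guaranteed to succeed.
3. As long as total memory does not exceed tcp_mem[0], a new allocation that stays within the connection's buffer limit is also guaranteed to succeed.
4. tcp_mem[1] and tcp_mem[0] act as the switch that turns memory-pressure mode on and off. In pressure mode, the per-connection buffer limit may shrink; outside pressure mode, it may grow, up to tcp_rmem[2] or tcp_wmem[2].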
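The rules above can be written down as a simplified model. The Python sketch below is not the kernel's code: it assumes 4 KiB pages, takes total TCP memory in pages and per-connection memory in bytes, and returns None when rules 1 to 3 alone do not decide an allocation. The function names are hypothetical.

```python
from typing import Optional, Tuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages; check with `getconf PAGESIZE`


def pages_to_gib(pages: int) -> float:
    return pages * PAGE_SIZE / 1024 ** 3


def update_pressure(in_pressure: bool, total_pages: int,
                    tcp_mem: Tuple[int, int, int]) -> bool:
    """Rule 4: enter pressure mode above tcp_mem[1], leave it below tcp_mem[0]."""
    low, pressure, _ = tcp_mem
    if total_pages > pressure:
        return True
    if total_pages < low:
        return False
    return in_pressure  # between the two thresholds the mode does not change


def allocation_allowed(total_pages: int, tcp_mem: Tuple[int, int, int], conn_bytes: int,
                       conn_min: int, conn_limit: int, request: int) -> Optional[bool]:
    """Rules 1-3; None means these rules alone do not decide the outcome."""
    low, _, high = tcp_mem
    if total_pages > high:  # rule 1: over tcp_mem[2], always fail
        return False
    if conn_bytes < conn_min:  # rule 2: below tcp_rmem[0] / tcp_wmem[0]
        return True
    if total_pages <= low and conn_bytes + request <= conn_limit:  # rule 3
        return True
    return None


if __name__ == "__main__":
    tcp_mem = (8388608, 12582912, 16777216)  # values from the configuration above
    print([pages_to_gib(p) for p in tcp_mem])  # [32.0, 48.0, 64.0]
```

With 4 KiB pages, the example tcp_mem values correspond to 32 GiB, 48 GiB and 64 GiB of TCP memory in total.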

TODO

tcp_adv_win_scale
tcp_allowed_congestion_control
tcp_app_win
tcp_autocorking
tcp_available_congestion_control
tcp_base_mss
tcp_challenge_ack_limit
tcp_congestion_control

tcp_early_retrans
tcp_ecn
tcp_fack
tcp_fastopen
tcp_fastopen_key

tcp_frto
tcp_invalid_ratelimit
tcp_keepalive_intvl
tcp_keepalive_probes
tcp_keepalive_time
tcp_limit_output_bytes
tcp_low_latency
tcp_max_orphans
tcp_max_ssthresh

tcp_max_tw_buckets
tcp_mem
tcp_min_tso_segs
tcp_moderate_rcvbuf
tcp_mtu_probing
tcp_no_metrics_save
tcp_notsent_lowat
tcp_orphan_retries
tcp_reordering
tcp_retrans_collapse
tcp_rfc1337
tcp_rmem

tcp_slow_start_after_idle
tcp_stdurg
tcp_thin_dupack
tcp_thin_linear_timeouts

tcp_tso_win_divisor
tcp_tw_recycle
tcp_window_scaling
tcp_wmem
tcp_workaround_signed_windows
udp_mem
udp_rmem_min
udp_wmem_min
xfrm4_gc_thresh
