SO_SNDBUF and SO_RCVBUF in TCP and UDP Sockets (repost)

1. Background

Winsock kernel buffer

To optimize performance at the application layer, Winsock copies data buffers from application send calls to a Winsock kernel buffer. Then, the stack uses its own heuristics (such as the Nagle algorithm) to determine when to actually put the packet on the wire.
You can change the amount of Winsock kernel buffer allocated to the socket using the SO_SNDBUF option (it is 8 KB by default). If necessary, Winsock can buffer significantly more than the SO_SNDBUF buffer size.

send completion in most cases

In most cases, the send completion in the application only indicates the data buffer in an application send call is copied to the Winsock kernel buffer and does not indicate that the data has hit the network medium.
The only exception is when you disable the Winsock buffering by setting SO_SNDBUF to 0.

rules to indicate a send completion

Winsock uses the following rules to indicate a send completion to the application (depending on how the send is invoked, the completion notification could be the function returning from a blocking call, signaling an event or calling a notification function, and so forth):

  • If the socket is still within SO_SNDBUF quota, Winsock copies the data from the application send and indicates the send completion to the application.
  • If the socket is beyond SO_SNDBUF quota and there is only one previously buffered send still in the stack kernel buffer, Winsock copies the data from the application send and indicates the send completion to the application.
  • If the socket is beyond SO_SNDBUF quota and there is more than one previously buffered send in the stack kernel buffer, Winsock copies the data from the application send. Winsock does not indicate the send completion to the application until the stack completes enough sends to bring the socket back within the SO_SNDBUF quota, or down to only one outstanding buffered send.

https://support.microsoft.com/en-us/kb/214397

2. SO_SNDBUF & SO_RCVBUF

2.1 Basic description

SO_SNDBUF
Sets the send buffer size. This option takes an int value (it is 8 KB by default).
SO_RCVBUF
Sets the receive buffer size. This option takes an int value.

Note: SO stands for Socket Option.

Every socket has a send buffer and a receive buffer; SO_SNDBUF and SO_RCVBUF can be used to change the default buffer sizes.

For a client, the SO_RCVBUF option must be set before calling connect.
For a server, the SO_RCVBUF option must be set before calling listen (the listening socket's buffer size is inherited by accepted connections, and the size can affect the window advertised during connection establishment).
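
For the client case, a minimal sketch in C (POSIX-style sockets; the loopback address, port 9000 and the 256 KB size are illustrative only, and error handling is kept short):

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Set the receive buffer BEFORE connect(), so the size can influence
     * the window advertised during the TCP handshake. */
    int rcvbuf = 256 * 1024;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        perror("setsockopt(SO_RCVBUF)");

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);                 /* example port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("connect");

    close(fd);
    return 0;
}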

2.2 Using in C/C++

int setsockopt(SOCKET s, int level, int optname, const char* optval, int optlen);

SOCKET s = ...;
int nRcvBufferLen = 64 * 1024;
int nSndBufferLen = 4 * 1024 * 1024;
int nLen = sizeof(int);
setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&nSndBufferLen, nLen);
setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char*)&nRcvBufferLen, nLen);
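
As a follow-up, the value actually in effect can be read back with getsockopt (continuing from the snippet above, using the same Windows-style signatures as the prototype; note that some systems adjust the requested value, e.g. Linux doubles it, so reading it back is a useful sanity check):

/* Read back the effective buffer sizes; printf needs <stdio.h>. */
int sndbuf = 0, rcvbuf = 0;
int optlen = sizeof(int);
getsockopt(s, SOL_SOCKET, SO_SNDBUF, (char*)&sndbuf, &optlen);
optlen = sizeof(int);
getsockopt(s, SOL_SOCKET, SO_RCVBUF, (char*)&rcvbuf, &optlen);
printf("effective SO_SNDBUF=%d, SO_RCVBUF=%d\n", sndbuf, rcvbuf);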

TCP reliability

TCP's outstanding feature is its reliability. How is that achieved?
Being reliable does not mean errors never occur; it means the protocol tolerates them well.
Good fault tolerance requires keeping a backup of the data, i.e. buffering, which is what makes features such as retransmission possible.
Every socket has its own Send Buffer and Receive Buffer.
When send and recv are called they return immediately: the data has not actually been transmitted; it is merely stored in the corresponding Send Buffer or Receive Buffer, and the call reports success right away.

A note on the send buffer from the literature

 
UDP send buffer

"We show the socket send buffer as a dashed box because it doesn't really exist.
A UDP socket has a send buffer size (which we can change with the SO_SNDBUF socket option, Section 7.5), but this is simply an upper limit on the maximum-sized UDP datagram that can be written to the socket.
If an application writes a datagram larger than the socket send buffer size, EMSGSIZE is returned.
Since UDP is unreliable, it does not need to keep a copy of the application's data and does not need an actual send buffer.
(The application data is normally copied into a kernel buffer of some form as it passes down the protocol stack, but this copy is discarded by the datalink layer after the data is transmitted.)"
(UNIX® Network Programming Volume 1, Third Edition: The Sockets Networking API, Pub Date: November 21, 2003)

According to the passage above from UNIX Network Programming Volume 1 (this edition was published in 2003; no other authoritative reference was found), for UDP the value set with SO_SNDBUF is the maximum size of a UDP datagram that can be written to that socket; if the program writes a datagram larger than the send buffer size, EMSGSIZE is returned.
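
A minimal sketch of that behavior (POSIX-style sockets; the 4 KB buffer, 32 KB datagram, port and loopback address are illustrative, and the exact error can vary by platform since the quoted description refers to BSD-derived stacks):

#include <arpa/inet.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    int sndbuf = 4 * 1024;                        /* deliberately small */
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);                   /* example port */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    char payload[32 * 1024];                      /* larger than the send buffer */
    memset(payload, 'x', sizeof(payload));

    if (sendto(fd, payload, sizeof(payload), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0 && errno == EMSGSIZE)
        printf("sendto failed with EMSGSIZE, as described above\n");

    close(fd);
    return 0;
}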

Purpose and meaning

Receive buffer

How the receive buffer is used

The receive buffer caches incoming data in the kernel; if the application process never calls read, the data stays in the receive buffer of that socket.
To put it another way: whether or not the process reads from the socket, data sent by the peer is received by the kernel and cached in the socket's kernel receive buffer.
All that read does is copy the data from the kernel buffer into the application's user-space buffer, nothing more.
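
A minimal sketch of that copy step (POSIX-style C; fd is assumed to be an already connected socket, and recv/perror need <sys/socket.h> and <stdio.h>):

char buf[4096];
ssize_t n = recv(fd, buf, sizeof(buf), 0);   /* copies from the kernel receive buffer into buf */
if (n > 0) {
    /* n bytes are now in the application buffer */
} else if (n == 0) {
    /* the peer closed the connection */
} else {
    perror("recv");
}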

What happens when the receive buffer is full

The receive buffer is used by TCP and UDP to hold data arriving from the network until the application process reads it.

  • TCP
    For TCP, if the application process never reads and the buffer fills up, the receiving TCP advertises a closed (zero) window to the peer. This is exactly how the sliding window is realized.
    It guarantees that the TCP socket receive buffer cannot overflow, which is part of what makes TCP reliable, because the peer is not allowed to send more data than the advertised window. This is TCP flow control; if the peer ignores the window size and sends beyond it anyway, the receiving TCP discards that data.
  • UDP
    When the socket receive buffer is full, newly arriving datagrams cannot enter it and are simply dropped. UDP has no flow control: a fast sender can easily overwhelm a slow receiver, causing the receiver's UDP to discard datagrams (a small demonstration follows this list).
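
A minimal sketch that makes the UDP case visible (POSIX-style sockets on the loopback interface; the port, the 8 KB buffer and the 1000-datagram count are illustrative, and the exact count received depends on the platform):

#include <arpa/inet.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);                 /* example port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    int rcvbuf = 8 * 1024;                       /* deliberately small receive buffer */
    setsockopt(rx, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));
    bind(rx, (struct sockaddr *)&addr, sizeof(addr));

    char msg[1024];
    memset(msg, 'x', sizeof(msg));
    for (int i = 0; i < 1000; i++)               /* send far more than the buffer can hold */
        sendto(tx, msg, sizeof(msg), 0, (struct sockaddr *)&addr, sizeof(addr));

    /* Drain whatever actually made it into the receive buffer. */
    fcntl(rx, F_SETFL, O_NONBLOCK);
    int received = 0;
    while (recv(rx, msg, sizeof(msg), 0) > 0)
        received++;

    printf("sent 1000 datagrams, receive buffer held %d\n", received);
    close(rx);
    close(tx);
    return 0;
}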

Send buffer

How the send buffer is used

When a process calls send, in the simplest (and most common) case the data is copied into the socket's kernel send buffer, and send then returns to the caller.
In other words, when send returns the data has not necessarily been delivered to the peer (much like write on a file); send merely copies the data from the application-level buffer into the socket's kernel send buffer (see the sketch below).
Each UDP socket has a receive buffer but no real send buffer: conceptually, whatever data there is gets sent straight away, regardless of whether the peer can receive it correctly, so nothing needs to be held back and no send buffer is required.
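
A minimal sketch of the TCP side (POSIX-style C; fd is assumed to be a connected socket, and send/memset need <sys/socket.h> and <string.h>):

char buf[4096];
memset(buf, 'x', sizeof(buf));
ssize_t n = send(fd, buf, sizeof(buf), 0);
/* n > 0 only means n bytes were copied into the kernel send buffer;
 * it says nothing about whether the peer has received them yet. */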

How large should SO_SNDBUF be

To achieve maximum network throughput, the socket send buffer size (SO_SNDBUF) should be no smaller than the product of the bandwidth and the delay (the bandwidth-delay product).
I have previously run into two performance problems, both caused by SO_SNDBUF being set too small.
However, when writing a program you may not know what a suitable SO_SNDBUF value is, and SO_SNDBUF should not be set too large either, since that wastes memory (or does it?).
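
As a worked example (the numbers are illustrative): on a 100 Mbit/s path with a 100 ms round-trip time, the bandwidth-delay product is 100 Mbit/s × 0.1 s = 10 Mbit ≈ 1.25 MB, so SO_SNDBUF should be at least roughly 1.25 MB to keep the link fully utilized.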

Dynamic adjustment of SO_SNDBUF by the operating system

Consequently, some operating systems provide dynamic adjustment of the buffer sizes, so the application no longer has to tune SO_SNDBUF itself. (The receive buffer SO_RCVBUF raises the analogous issue: it should not be smaller than the bandwidth-delay product either.)

Dynamic send buffering for TCP was added on Windows 7 and Windows Server 2008 R2. By default, dynamic send buffering for TCP is enabled unless an application sets the SO_SNDBUF socket option on the stream socket.

Newer operating systems support automatic tuning of the socket buffers, so applications do not need to tune them. But on Windows releases before Windows 2012 (and Windows 8), the application still has to take care of SO_SNDBUF itself to reach maximum network throughput.

In addition,

note that if the application sets SO_SNDBUF, dynamic send buffering is disabled: https://msdn.microsoft.com/en-us/library/windows/desktop/bb736549(v=vs.85).aspx

Setting SO_RCVBUF / SO_SNDBUF to 0 brings little benefit

Let’s look at how the system handles a typical send call when the send buffer size is non-zero.
When an application makes a send call, if there is sufficient buffer space, the data is copied into the socket’s send buffers, the call completes immediately with success, and the completion is posted.
On the other hand, if the socket’s send buffer is full, then the application’s send buffer is locked and the send call fails with WSA_IO_PENDING. After the data in the send buffer is processed (for example, handed down to TCP for processing), then Winsock will process the locked buffer directly. That is, the data is handed directly to TCP from the application’s buffer and the socket’s send buffer is completely bypassed.

As can be seen, when sending data, if the socket's (kernel-level) send buffer is already full, the application's (user-level) send buffer is locked and the send call returns WSA_IO_PENDING.
Once the data in the socket's send buffer has been processed, Winsock works on the locked (application-level) send buffer directly. In other words, the socket's send buffer is bypassed and the data is handed to TCP straight from the application's buffer.
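
A minimal sketch of that pattern (Winsock overlapped I/O; s is assumed to be an already connected stream socket with WSAStartup done, and error handling is trimmed):

#include <winsock2.h>
#include <stdio.h>
#include <string.h>

void overlapped_send(SOCKET s, char *data, int len)
{
    int zero = 0;                                  /* disable the socket send buffer */
    setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&zero, sizeof(zero));

    WSABUF wbuf;
    wbuf.buf = data;
    wbuf.len = (u_long)len;

    WSAOVERLAPPED ov;
    memset(&ov, 0, sizeof(ov));
    ov.hEvent = WSACreateEvent();

    DWORD sent = 0;
    int rc = WSASend(s, &wbuf, 1, &sent, 0, &ov, NULL);
    if (rc == SOCKET_ERROR && WSAGetLastError() == WSA_IO_PENDING) {
        /* The application buffer is locked until TCP has consumed it,
         * so 'data' must stay valid until the operation completes. */
        DWORD flags = 0;
        WSAGetOverlappedResult(s, &ov, &sent, TRUE, &flags);
    }
    WSACloseEvent(ov.hEvent);
    printf("%lu bytes handed to TCP\n", (unsigned long)sent);
}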

The opposite is true for receiving data. When an overlapped receive call is performed, if data has already been received on the connection, it will be buffered in the socket’s receive buffer. This data will be copied directly into the application’s buffer (as much as will fit), the receive call returns success, and a completion is posted. However, if the socket’s receive buffer is empty, when the overlapped receive call is made, the application’s buffer is locked and the call fails with WSA_IO_PENDING. Once data arrives on the connection, it will be copied directly into the application’s buffer, bypassing the socket’s receive buffer altogether.

The receive buffer is handled in the same way.

Setting the per-socket buffers to zero generally will not increase performance because the extra memory copy can be avoided as long as there are always enough overlapped send and receive operations posted. Disabling the socket’s send buffer has less of a performance impact than disabling the receive buffer because the application’s send buffer will always be locked until it can be passed down to TCP for processing. However, if the receive buffer is set to zero and there are no outstanding overlapped receive calls, any incoming data can be buffered only at the TCP level. The TCP driver will buffer only up to the receive window size, which is 17 KB—TCP will increase these buffers as needed to this limit; normally the buffers are much smaller.
These TCP buffers (one per connection) are allocated out of non-paged pool, which means if the server has 1000 connections and no receives posted at all, 17 MB of the non-paged pool will be consumed!
The non-paged pool is a limited resource, and unless the server can guarantee there are always receives posted for a connection, the per-socket receive buffer should be left intact.
Only in a few specific cases will leaving the receive buffer intact lead to decreased performance. Consider the situation in which a server handles many thousands of connections and cannot have a receive posted on each connection (this can become very expensive, as you’ll see in the next section). In addition, the clients send data sporadically. Incoming data will be buffered in the per-socket receive buffer and when the server does issue an overlapped receive, it is performing unnecessary work. The overlapped operation issues an I/O request packet (IRP) that completes, immediately after which notification is sent to the completion port. In this case, the server cannot keep enough receives posted, so it is better off performing simple non-blocking receive calls.

References:
http://pubs.opengroup.org/onlinepubs/009695399/functions/setsockopt.html
UNIX® Network Programming Volume 1, Third Edition: The Sockets Networking API, November 21, 2003
http://blog.csdn.net/xiaokaige198747/article/details/75388458
http://www.cnblogs.com/kex1n/p/7801343.html
http://blog.csdn.net/summerhust/article/details/6726337



Author: FlyingPenguin
Link: https://www.jianshu.com/p/755da54807cd
Source: Jianshu (简书)
The copyright belongs to the author; for any form of reproduction, please contact the author for authorization and cite the source.
 