A Summary of gen_tcp Options

Motivation

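When writing an RPC server/client in Elixir, the options passed to gen_tcp need some thought. Some options, such as sndbuf and recbuf, should stay configurable so that users can tune them for their workload; others should be hidden to reduce the cost of understanding and using the library.
So I dug into the gen_tcp options.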

Code version

The file names and line numbers cited in this article come from the following code version:

  • erlang: OTP-21.0.9

options

Available options for tcp:connect

inet.erl:723

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Available options for tcp:connect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
connect_options() ->
    [tos, tclass, priority, reuseaddr, keepalive, linger, sndbuf, recbuf, nodelay,
     header, active, packet, packet_size, buffer, mode, deliver, line_delimiter,
     exit_on_close, high_watermark, low_watermark, high_msgq_watermark,
     low_msgq_watermark, send_timeout, send_timeout_close, delay_send, raw,
     show_econnreset, bind_to_device].
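
The split described in the motivation (a few tunable options, everything else fixed) can be sketched as a small filter. This is only an illustration: the module name, the whitelist and the fixed option set are all hypothetical choices, not something taken from gen_tcp or from any existing RPC library.

```erlang
-module(rpc_opts).
-export([filter/1]).

%% Hypothetical whitelist: only the kernel buffer sizes may be tuned by users.
-define(USER_TUNABLE, [sndbuf, recbuf]).
%% Hypothetical options the library always applies itself.
-define(FIXED, [binary, {packet, 4}, {active, false}, {nodelay, true}]).

%% Keep the whitelisted {Key, Value} options and silently drop everything
%% else, including bare atoms such as `binary` or `list`.
filter(UserOpts) ->
    Allowed = [Opt || {Key, _} = Opt <- UserOpts,
                      lists:member(Key, ?USER_TUNABLE)],
    ?FIXED ++ Allowed.
```

For example, rpc_opts:filter([{sndbuf, 65536}, {active, true}]) keeps {sndbuf, 65536} and discards the attempt to override active.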

tos

Type of service.
The figure that appeared here (clipboard.png) was taken from TCP/IP Illustrated, Volume 1.

tclass

IPV6_TCLASS
{tclass, Integer}
Sets IPV6_TCLASS IP level options on platforms where this is implemented.
The behavior and allowed range varies between different systems.
The option is ignored on platforms where it is not implemented. Use with caution.
I don't know what this means in practice, so I am skipping it.

priority

SO_PRIORITY
          Set the protocol-defined priority for all packets to be sent
          on this socket.  Linux uses this value to order the networking
          queues: packets with a higher priority may be processed first
          depending on the selected device queueing discipline.  Setting
          a priority outside the range 0 to 6 requires the CAP_NET_ADMIN
          capability.

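reuseaddr

{reuseaddr, Boolean} sets SO_REUSEADDR, which relaxes the address checks done by bind(2) so that a local address can be reused. The man page excerpt below describes the related but distinct SO_REUSEPORT option, which is not what reuseaddr sets.

SO_REUSEPORT (since Linux 3.9)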
          Permits multiple AF_INET or AF_INET6 sockets to be bound to an
          identical socket address.  This option must be set on each
          socket (including the first socket) prior to calling bind(2)
          on the socket.  To prevent port hijacking, all of the
          processes binding to the same address must have the same
          effective UID.  This option can be employed with both TCP and
          UDP sockets.

          For TCP sockets, this option allows accept(2) load
          distribution in a multi-threaded server to be improved by
          using a distinct listener socket for each thread.  This
          provides improved load distribution as compared to traditional
          techniques such as using a single accept(2)ing thread that
          distributes connections, or having multiple threads that
          compete to accept(2) from the same socket.

          For UDP sockets, the use of this option can provide better
          distribution of incoming datagrams to multiple processes (or
          threads) as compared to the traditional technique of having
          multiple processes compete to receive datagrams on the same
          socket.
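
A minimal sketch of how reuseaddr is typically used: a listener sets it so that the same port can be bound again right after a restart. The module name is hypothetical, and the sketch only checks that the flag is reported back by inet:getopts/2; it does not try to reproduce a TIME_WAIT situation.

```erlang
-module(reuse_demo).
-export([run/0]).

run() ->
    %% Port 0 lets the OS pick a free port for the first listener.
    {ok, L1} = gen_tcp:listen(0, [{reuseaddr, true}]),
    {ok, Port} = inet:port(L1),
    {ok, [{reuseaddr, Flag}]} = inet:getopts(L1, [reuseaddr]),
    ok = gen_tcp:close(L1),
    %% Bind the same port again, as a restarted server would.
    {ok, L2} = gen_tcp:listen(Port, [{reuseaddr, true}]),
    ok = gen_tcp:close(L2),
    Flag.
```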

keepalive

SO_KEEPALIVE
          Enable sending of keep-alive messages on connection-oriented
          sockets.  Expects an integer boolean flag.

Keepalive tunables and their meaning

root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_time
1800
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
the number of unacknowledged probes to send before considering the connection dead and notifying the application layer

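Main problems:

  1. Keepalive probes do not pass through a load balancer.
  2. Detection is too slow: with the values shown above, a dead peer is declared only after roughly 1800 + 9 × 75 = 2475 seconds.
  3. It says nothing about the state of the application.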
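
The slow detection can be addressed per socket instead of system-wide. The following is a sketch under loud assumptions: it is Linux-only, it relies on the Linux option numbers IPPROTO_TCP = 6, TCP_KEEPIDLE = 4, TCP_KEEPINTVL = 5 and TCP_KEEPCNT = 6, and it passes them through gen_tcp's {raw, Protocol, OptionNum, ValueBin} option. The module and function names are hypothetical.

```erlang
-module(keepalive_opts).
-export([linux_opts/3]).

%% Linux-specific constants (assumption: these differ on other platforms).
-define(IPPROTO_TCP, 6).
-define(TCP_KEEPIDLE, 4).   % seconds of idle time before the first probe
-define(TCP_KEEPINTVL, 5).  % seconds between probes
-define(TCP_KEEPCNT, 6).    % unanswered probes before the peer is declared dead

%% Build an option list that enables keepalive and overrides the three
%% kernel tunables for this socket only. Values are native-endian 32-bit ints.
linux_opts(IdleSecs, IntvlSecs, Count) ->
    [{keepalive, true},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPIDLE,  <<IdleSecs:32/native>>},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPINTVL, <<IntvlSecs:32/native>>},
     {raw, ?IPPROTO_TCP, ?TCP_KEEPCNT,   <<Count:32/native>>}].
```

With keepalive_opts:linux_opts(30, 5, 3) appended to the connect options, a dead peer would be detected after about 30 + 3 × 5 = 45 seconds. This still does not solve the load balancer and application-state problems, which need an application-level heartbeat.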

linger

SO_LINGER
          Sets or gets the SO_LINGER option.  The argument is a linger
          structure.

              struct linger {
                  int l_onoff;    /* linger active */
                  int l_linger;   /* how many seconds to linger for */
              };

          When enabled, a close(2) or shutdown(2) will not return until
          all queued messages for the socket have been successfully sent
          or the linger timeout has been reached.  Otherwise, the call
          returns immediately and the closing is done in the background.
          When the socket is closed as part of exit(2), it always
          lingers in the background.

In short: whether close/shutdown waits until all queued data has been sent.
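
In gen_tcp the struct above is written as {linger, {OnOff, Seconds}}. A minimal loopback sketch, with a hypothetical module name, that sets a 5 second linger on the client side, reads the option back and closes after sending:

```erlang
-module(linger_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    %% {true, 5}: linger is on, close may wait up to 5 seconds for queued data.
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false}, {linger, {true, 5}}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    {ok, [{linger, Linger}]} = inet:getopts(C, [linger]),
    ok = gen_tcp:send(C, <<"bye">>),
    ok = gen_tcp:close(C),
    %% The peer still receives the data that was queued before the close.
    {ok, Data} = gen_tcp:recv(S, 3, 1000),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    {Linger, Data}.
```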

sndbuf recbuf buffer

SO_SNDBUF
          Sets or gets the maximum socket send buffer in bytes.  The
          kernel doubles this value (to allow space for bookkeeping
          overhead) when it is set using setsockopt(2), and this doubled
          value is returned by getsockopt(2).  The default value is set
          by the /proc/sys/net/core/wmem_default file and the maximum
          allowed value is set by the /proc/sys/net/core/wmem_max file.
          The minimum (doubled) value for this option is 2048.
   SO_RCVBUF
          Sets or gets the maximum socket receive buffer in bytes.  The
          kernel doubles this value (to allow space for bookkeeping
          overhead) when it is set using setsockopt(2), and this doubled
          value is returned by getsockopt(2).  The default value is set
          by the /proc/sys/net/core/rmem_default file, and the maximum
          allowed value is set by the /proc/sys/net/core/rmem_max file.
          The minimum (doubled) value for this option is 256.

inet_drv.c:6708

case INET_OPT_SNDBUF:
    {
        arg.ival= get_int32 (curr);      curr += 4;
        proto   = SOL_SOCKET;
        type    = SO_SNDBUF;
        arg_ptr = (char*) (&arg.ival);
        arg_sz  = sizeof  ( arg.ival);

        /* Adjust the size of the user-level recv buffer, so it's not
           smaller than the kernel one: */
        if (desc->bufsz <= arg.ival)
            desc->bufsz  = arg.ival;
        break;
    }

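As the code shows, buffer is the user-level buffer and is meant to be no smaller than the kernel buffer. However, the buffer value I actually observed was smaller than recbuf and sndbuf.
My guess: buffer is only adjusted when recbuf or sndbuf is set explicitly.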
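
One way to check this guess is to read the three values back with and without an explicit recbuf. The sketch below uses a hypothetical module name; the numbers it returns depend on the OS and kernel settings, so no expected values are given.

```erlang
-module(buf_probe).
-export([report/1]).

%% Open a throwaway listen socket with the given options and return the
%% kernel buffer sizes (sndbuf, recbuf) next to the driver's user-level buffer.
report(Opts) ->
    {ok, L} = gen_tcp:listen(0, Opts),
    {ok, Values} = inet:getopts(L, [sndbuf, recbuf, buffer]),
    ok = gen_tcp:close(L),
    Values.
```

Comparing buf_probe:report([]) with buf_probe:report([{recbuf, 262144}]) shows whether buffer follows an explicitly set recbuf.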

nodelay

TCP_NODELAY

DISCUSSION:
             The Nagle algorithm is generally as follows:

                  If there is unacknowledged data (i.e., SND.NXT >
                  SND.UNA), then the sending TCP buffers all user
                  data (regardless of the PSH bit), until the
                  outstanding data has been acknowledged or until
                  the TCP can send a full-sized segment (Eff.snd.MSS
                  bytes; see Section 4.2.2.6).

             Some applications (e.g., real-time display window
             updates) require that the Nagle algorithm be turned
             off, so small data segments can be streamed out at the
             maximum rate.

As this shows, the Nagle algorithm combined with delayed ACKs can add significant latency.
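
For request/response traffic made of small messages, the usual remedy is to disable Nagle with {nodelay, true}. A minimal loopback sketch with a hypothetical module name; it only confirms that the option is set and does not measure latency.

```erlang
-module(nodelay_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false}, {nodelay, true}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    {ok, [{nodelay, NoDelay}]} = inet:getopts(C, [nodelay]),
    %% Two small writes: with nodelay the second one is not held back
    %% while the first is still unacknowledged.
    ok = gen_tcp:send(C, <<1>>),
    ok = gen_tcp:send(C, <<2>>),
    {ok, <<1, 2>>} = gen_tcp:recv(S, 2, 1000),
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    NoDelay.
```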

header

http://erlang.org/doc/man/ine...
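A fixed-length header. With binary mode, the first Size bytes of received data are delivered as list elements and the rest as a binary tail, which is handy for protocols with a fixed-length header.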
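
A sketch of what {header, 2} delivers, assuming a loopback connection, binary mode and {packet, 4} framing so that the whole message arrives in one delivery. The module name is hypothetical.

```erlang
-module(header_demo).
-export([run/0]).

run() ->
    %% The accepted socket inherits binary, packet and header from the listener.
    {ok, L} = gen_tcp:listen(0, [binary, {packet, 4}, {header, 2},
                                 {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {packet, 4}, {active, false}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    ok = gen_tcp:send(C, <<1, 2, "abc">>),
    %% Ask for exactly one message to be delivered to this process.
    ok = inet:setopts(S, [{active, once}]),
    Result = receive
                 {tcp, S, Data} -> Data
             after 1000 ->
                 timeout
             end,
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    Result.
```

The result should have the shape [Byte1, Byte2 | Binary], so a receiver can pattern match on the two header bytes without splitting the binary itself.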

active

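Chooses between active mode, where received data is delivered asynchronously as messages to the controlling process, and passive mode, where data is read explicitly with gen_tcp:recv. I use passive mode.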
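
A common middle ground is {active, once}: the socket delivers one message and then falls back to passive until it is re-armed, which gives asynchronous delivery with flow control. A minimal loopback sketch with a hypothetical module name; {packet, 4} is used only so that each send arrives as exactly one message.

```erlang
-module(active_once_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {packet, 4}, {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {packet, 4}, {active, false}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    ok = gen_tcp:send(C, <<"one">>),
    ok = gen_tcp:send(C, <<"two">>),
    ok = gen_tcp:close(C),
    Msgs = loop(S, []),
    ok = gen_tcp:close(L),
    Msgs.

%% Re-arm the socket before every receive so that at most one message
%% is pushed into the mailbox at a time.
loop(S, Acc) ->
    ok = inet:setopts(S, [{active, once}]),
    receive
        {tcp, S, Data}          -> loop(S, [Data | Acc]);
        {tcp_closed, S}         -> lists:reverse(Acc);
        {tcp_error, S, _Reason} -> lists:reverse(Acc)
    after 1000 ->
        lists:reverse(Acc)
    end.
```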

packet, raw

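packet is the length of the packet header, that is, how many bytes are used to encode the packet length. {packet, raw} is equivalent to {packet, 0}, meaning no framing. Note that the raw entry in the option list above is a separate option, {raw, Protocol, OptionNum, ValueBin}, which passes a socket option directly to the OS.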

packet_size

The maximum allowed length of a packet body.
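
To make the framing concrete, the sketch below connects a {packet, raw} client to a {packet, 4} server over loopback, so the 4-byte big-endian length header is visible on the raw side. The module name is hypothetical.

```erlang
-module(packet_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {packet, 4}, {active, false}]),
    {ok, Port} = inet:port(L),
    %% The client does no framing, so it sees and writes the wire bytes.
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {packet, raw}, {active, false}]),
    {ok, S} = gen_tcp:accept(L, 1000),
    %% Hand-built frame: 4-byte big-endian length followed by the payload.
    ok = gen_tcp:send(C, <<5:32/big, "hello">>),
    {ok, Payload} = gen_tcp:recv(S, 0, 1000),
    %% The server side adds the header automatically when sending.
    ok = gen_tcp:send(S, <<"ok">>),
    {ok, Wire} = gen_tcp:recv(C, 6, 1000),
    ok = gen_tcp:close(C),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    {Payload, Wire}.
```

packet_demo:run() returns {<<"hello">>, <<0,0,0,2,"ok">>}: the server receives the payload with the header stripped, and the raw client sees the header the server prepended.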

mode

{mode, Mode :: binary | list}
Received Packet is delivered as defined by Mode.

deliver

{deliver, port | term}
When {active, true}, data is delivered on the form port : {S, {data, [H1,..Hsz | Data]}} or term : {tcp, S, [H1..Hsz | Data]}.

line_delimiter

{line_delimiter, Char} (TCP/IP sockets)
Sets the line delimiting character for line-oriented protocols (line). Defaults to $\n.

exit_on_close

{exit_on_close, Boolean}
This option is set to true by default.
The only reason to set it to false is if you want to continue sending data to the socket after a close is detected, for example, if the peer uses gen_tcp:shutdown/2 to shut down the write side.

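high_watermark, low_watermark, high_msgq_watermark, low_msgq_watermark

These affect when the socket switches into and out of the busy state.
Some questions still need to be answered:
What exactly is the socket busy state? For example, what do send/receive calls return while the socket is busy?
How do msgq data size and socket data size differ, and is socket data size simply buffer?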

send_timeout

Send timeout. The default is to wait indefinitely.

send_timeout_close

Whether the socket is closed automatically when a send times out.
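
A sketch of how the two options might be combined in a client wrapper. The module name, function names and return shapes are hypothetical; only the options and the {error, timeout} return of gen_tcp:send/2 come from gen_tcp.

```erlang
-module(send_timeout_demo).
-export([connect_opts/1, send/2]).

%% Options for a connection whose sends give up after TimeoutMs milliseconds
%% and whose socket is closed automatically when that happens.
connect_opts(TimeoutMs) ->
    [binary, {active, false},
     {send_timeout, TimeoutMs},
     {send_timeout_close, true}].

%% Distinguish a send timeout from other errors so that the caller can
%% decide whether to reconnect.
send(Sock, Data) ->
    case gen_tcp:send(Sock, Data) of
        ok ->
            ok;
        {error, timeout} ->
            %% With send_timeout_close the socket is closed for us.
            {error, send_timeout};
        {error, Reason} ->
            {error, Reason}
    end.
```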

delay_send

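Batches writes at the application level: data is queued and sent in larger but fewer packets instead of being sent immediately. Off by default. Worth considering enabling.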

show_econnreset

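Whether an RST from the peer is treated as a normal close. With the default (false) it is; with true the reset is reported as econnreset.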

bind_to_device

Binds the socket to a specific device (network interface).

References

  1. http://erlang.org/doc/man/gen...
  2. http://man7.org/linux/man-pag...
  3. http://erlang.org/doc/man/ine...
  4. https://github.com/erlang/otp
  5. https://tools.ietf.org/html/r...
  6. https://www.ietf.org/rfc/rfc3...
  7. https://tools.ietf.org/html/r...