TCP Maximum Segment Size (MSS): Source Code Analysis

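Overview

This article analyzes the main MSS-related fields by walking through the Linux kernel source code paths that read and write them.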

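Field meanings

user_mss (tcp_options_received): the MSS configured by the user; it has the highest priority.

mss_clamp (tcp_options_received): the upper bound on the MSS negotiated at connection setup, that is, the smaller of the MSS advertised by the peer (the largest segment the peer can accept) and user_mss.

advmss (tcp_sock): the MSS value advertised to the peer, that is, the largest MSS the local end can accept.

mss_cache (tcp_sock): caches the sender's current effective MSS; it changes with the path MTU and never exceeds mss_clamp.

rcv_mss (inet_connection_sock): the peer's MSS as estimated from recently received segments; mainly used to decide whether to delay ACKs.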

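Configuring user_mss

user_mss is the MSS configured by the user. It has the highest priority: once it is set, no other MSS may exceed it. The code below shows how user_mss is set through setsockopt with the TCP_MAXSEG option. A non-zero value must not be smaller than the minimum MSS (TCP_MIN_MSS) and must not be larger than the maximum window value (MAX_TCP_WINDOW).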

static int do_tcp_setsockopt(struct sock *sk, int level,
        int optname, char __user *optval, unsigned int optlen)
{
    /* ... */
    switch (optname) {
    case TCP_MAXSEG:
        /* Values greater than interface MTU won't take effect. However
         * at the point when this call is done we typically don't yet
         * know which interface is going to be used */
        if (val && (val < TCP_MIN_MSS || val > MAX_TCP_WINDOW)) {
            err = -EINVAL;
            break;
        }
        tp->rx_opt.user_mss = val;
        break;
    /* ... */
    }
    /* ... */
}
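From user space, the option is set with an ordinary setsockopt call before connect() or listen(). The sketch below is illustrative only: `user_mss_check` and `set_user_mss` are hypothetical helper names, and the `MODEL_` constants restate the values of TCP_MIN_MSS and MAX_TCP_WINDOW from the kernel's include/net/tcp.h rather than importing them.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Restated kernel limits (assumed values from include/net/tcp.h). */
#define MODEL_TCP_MIN_MSS    88
#define MODEL_MAX_TCP_WINDOW 32767

/* Hypothetical helper mirroring the kernel-side range check:
 * 0 clears user_mss; any other value must lie within
 * [MODEL_TCP_MIN_MSS, MODEL_MAX_TCP_WINDOW]. */
int user_mss_check(int val)
{
    if (val && (val < MODEL_TCP_MIN_MSS || val > MODEL_MAX_TCP_WINDOW))
        return -EINVAL;
    return 0;
}

/* Hypothetical wrapper: request a user MSS on a TCP socket.
 * Returns 0 on success, -1 with errno set on failure. */
int set_user_mss(int fd, int mss)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(mss));
}
```

A caller would typically create the socket, call `set_user_mss(fd, 1200)`, and only then call connect(), so that the value is already in user_mss when the SYN is built.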

 

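Handshake and data-path code analysis

First handshake step
Client sends SYN

During connect initialization the MSS fields are set up as follows:

(1) If the user has configured user_mss, mss_clamp (the local maximum MSS) is set to user_mss.

(2) tcp_sync_mss is called to synchronize the MSS. It computes the current effective MSS from the device MTU, the maximum window and so on, and stores the result in tp->mss_cache. Because this function is fairly long, it is analyzed at the end of this article.

(3) advmss, the value advertised to the peer, is set: the MSS is looked up in the routing table (the route MTU is used here), compared with user_mss, and the smaller of the two becomes the advertised value.

(4) The peer's MSS is estimated: rcv_mss is derived from advmss, mss_cache, rcv_wnd, TCP_MSS_DEFAULT and TCP_MIN_MSS.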

static void tcp_connect_init(struct sock *sk)
{
    /* If user gave his TCP_MAXSEG, record it to clamp */
    /* (1) If user_mss is configured, set the maximum MSS to user_mss */
    if (tp->rx_opt.user_mss)
        tp->rx_opt.mss_clamp = tp->rx_opt.user_mss;
    tp->max_window = 0;
    tcp_mtup_init(sk);
    /* (2) Synchronize the MSS from the device MTU */
    tcp_sync_mss(sk, dst_mtu(dst));

    tcp_ca_dst_init(sk, dst);

    if (!tp->window_clamp)
        tp->window_clamp = dst_metric(dst, RTAX_WINDOW);

    /*
     * (3) Set the MSS advertised to the peer
     * dst_metric_advmss: look up the MSS in the routing table
     * tcp_mss_clamp: take the smaller of user_mss and the MSS found above
     */
    tp->advmss = tcp_mss_clamp(tp, dst_metric_advmss(dst));

    /* (4) Estimate the peer's MSS */
    tcp_initialize_rcv_mss(sk);
}
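Step (4) is not expanded in the listing above. As a rough model of what tcp_initialize_rcv_mss computes, assuming the kernel constants TCP_MSS_DEFAULT = 536 and TCP_MIN_MSS = 88, the estimate can be sketched in plain C as follows; `estimate_rcv_mss` is a hypothetical name, not a kernel function.

```c
#include <stdint.h>

#define TCP_MSS_DEFAULT 536U /* assumed kernel default MSS */
#define TCP_MIN_MSS     88U  /* assumed kernel minimum MSS */

/* Hypothetical model of the initial rcv_mss estimate:
 * start from min(advmss, mss_cache), cap it at half the receive
 * window and at the default MSS, then apply the minimum MSS floor. */
uint32_t estimate_rcv_mss(uint32_t advmss, uint32_t mss_cache,
                          uint32_t rcv_wnd)
{
    uint32_t hint = advmss < mss_cache ? advmss : mss_cache;

    if (hint > rcv_wnd / 2)
        hint = rcv_wnd / 2;
    if (hint > TCP_MSS_DEFAULT)
        hint = TCP_MSS_DEFAULT;
    if (hint < TCP_MIN_MSS)
        hint = TCP_MIN_MSS;
    return hint;
}
```

With typical Ethernet values (advmss 1460, mss_cache 1448, a receive window of a few tens of kilobytes) the estimate starts at the conservative default of 536 and is refined later as real segments arrive.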

 

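When the SYN is sent, advmss is written into the options of the TCP header. The call chain is tcp_transmit_skb -> tcp_syn_options -> tcp_advertise_mss. Note that the advmss computed earlier is not used directly; tcp_advertise_mss is called to obtain the value afresh.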

/* Compute TCP options for SYN packets. This is not the final
 * network wire format yet.
 */
static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
                struct tcp_out_options *opts,
                struct tcp_md5sig_key **md5)
{
    /* We always get an MSS option.  The option bytes which will be seen in
     * normal data packets should timestamps be used, must be in the MSS
     * advertised.  But we subtract them from tp->mss_cache so that
     * calculations in tcp_sendmsg are simpler etc.  So account for this
     * fact here if necessary.  If we don't do this correctly, as a
     * receiver we won't recognize data packets as being full sized when we
     * should, and thus we won't abide by the delayed ACK rules correctly.
     * SACKs don't matter, we never delay an ACK when we have any of those
     * going out.  */
    opts->mss = tcp_advertise_mss(sk);
    remaining -= TCPOLEN_MSS_ALIGNED;
}

 

tcp_advertise_mss queries the routing table again for the MSS and takes the smaller of that value and the advmss obtained earlier.

/* Calculate mss to advertise in SYN segment.
 * RFC1122, RFC1063, draft-ietf-tcpimpl-pmtud-01 state that:
 *
 * 1. It is independent of path mtu.
 * 2. Ideally, it is maximal possible segment size i.e. 65535-40.
 * 3. For IPv4 it is reasonable to calculate it from maximal MTU of
 *    attached devices, because some buggy hosts are confused by
 *    large MSS.
 * 4. We do not make 3, we advertise MSS, calculated from first
 *    hop device mtu, but allow to raise it to ip_rt_min_advmss.
 *    This may be overridden via information stored in routing table.
 * 5. Value 65535 for MSS is valid in IPv6 and means "as large as possible,
 *    probably even Jumbo".
 */
static __u16 tcp_advertise_mss(struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    const struct dst_entry *dst = __sk_dst_get(sk);
    int mss = tp->advmss;

    if (dst) {
        unsigned int metric = dst_metric_advmss(dst);

        if (metric < mss) {
            mss = metric;
            tp->advmss = mss;
        }
    }

    return (__u16)mss;
}
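The two stages that produce the advertised value (the clamp against user_mss in tcp_connect_init, then the re-check against the route metric in tcp_advertise_mss) can be modelled as two small functions. This is a simplified sketch: `mss_clamp_to_user` and `advertise_mss` are hypothetical names, and the route lookup is replaced by a plain parameter.

```c
#include <stdint.h>

/* Hypothetical model of the clamp used at connect time:
 * a non-zero user_mss caps the MSS taken from the route. */
uint16_t mss_clamp_to_user(uint16_t user_mss, uint16_t mss)
{
    return (user_mss && user_mss < mss) ? user_mss : mss;
}

/* Hypothetical model of the re-check at SYN time: if the route now
 * reports a smaller value, lower the stored advmss and advertise that. */
uint16_t advertise_mss(uint16_t *advmss, uint16_t route_advmss)
{
    if (route_advmss < *advmss)
        *advmss = route_advmss;
    return *advmss;
}
```

The second stage can only lower the stored value, so a user_mss applied in the first stage is never exceeded later.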

 

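Server receives SYN

The server is in the LISTEN state when the client's SYN arrives. While processing it, the server has to parse the options in the TCP header; the call chain is tcp_conn_request -> tcp_parse_options. The MSS part of the option parser is shown below: it reads the MSS option, compares it with user_mss, takes the smaller value, and stores it in mss_clamp (the maximum MSS).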

/* Look for tcp options. Normally only called on SYN and SYNACK packets.
 * But, this can also be called on packets in the established flow when
 * the fast version below fails.
 */
void tcp_parse_options(const struct sk_buff *skb,
               struct tcp_options_received *opt_rx, int estab,
               struct tcp_fastopen_cookie *foc)
{
    /* ... */
    switch (opcode) {
    case TCPOPT_MSS:
        if (opsize == TCPOLEN_MSS && th->syn && !estab) {
            u16 in_mss = get_unaligned_be16(ptr);
            if (in_mss) {
                /* user_mss, if set and smaller, overrides the peer's value */
                if (opt_rx->user_mss && opt_rx->user_mss < in_mss)
                    in_mss = opt_rx->user_mss;
                opt_rx->mss_clamp = in_mss;
            }
        }
        break;
    /* ... */
    }
    /* ... */
}
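To make the wire format concrete, here is a minimal user-space sketch that scans a raw TCP options area for the MSS option (kind 2, length 4, value in network byte order) and applies the same user_mss clamp. `parse_mss_option` and the `OPT_` constants are hypothetical names for illustration; the kernel parser handles many more options and conditions.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL     0 /* end of option list */
#define OPT_NOP     1 /* one-byte padding */
#define OPT_MSS     2 /* maximum segment size */
#define OPT_MSS_LEN 4 /* kind + length + 16-bit value */

/* Hypothetical helper: return the peer's MSS clamped to user_mss,
 * or 0 if the options area holds no usable MSS option. */
uint16_t parse_mss_option(const uint8_t *opts, size_t len, uint16_t user_mss)
{
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opts[i];
        uint8_t size;

        if (kind == OPT_EOL)
            break;
        if (kind == OPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= len)
            break;                      /* truncated: no length byte */
        size = opts[i + 1];
        if (size < 2 || i + size > len)
            break;                      /* malformed or truncated option */
        if (kind == OPT_MSS && size == OPT_MSS_LEN) {
            uint16_t in_mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);

            if (in_mss) {
                if (user_mss && user_mss < in_mss)
                    in_mss = user_mss;
                return in_mss;
            }
        }
        i += size;
    }
    return 0;
}
```

For the option bytes 02 04 05 b4 the peer's MSS is 0x05b4 = 1460; with user_mss set to 1200 the helper returns 1200 instead.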

 

After the request sock has been allocated, its mss field is initialized from the maximum MSS obtained from the options.

static void tcp_openreq_init(struct request_sock *req,
                 const struct tcp_options_received *rx_opt,
                 struct sk_buff *skb, const struct sock *sk)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    /* ... */
    req->mss = rx_opt->mss_clamp;
    /* ... */
}

 

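Second handshake step
Server sends SYN+ACK

After the request sock has been added to the connection queue, the server sends a SYN+ACK to the client. When the SYN+ACK is built, the local MSS has to be placed in the options. The call chain is tcp_v4_send_synack -> tcp_make_synack -> tcp_synack_options. The MSS is obtained the same way as on the client: it is looked up in the routing table and compared with the user-configured user_mss, and the smaller value is used. The option setup code then adds this MSS to the options.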

struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
                struct request_sock *req,
                struct tcp_fastopen_cookie *foc,
                enum tcp_synack_type synack_type)
{
    /* mss is the smaller of the MSS found in the routing table and user_mss */
    mss = tcp_mss_clamp(tp, dst_metric_advmss(dst));
    /* Set up the TCP options */
    tcp_header_size = tcp_synack_options(req, mss, skb, &opts, md5, foc) +
                      sizeof(*th);
}

 

/* Set up TCP options for SYN-ACKs. */
static unsigned int tcp_synack_options(struct request_sock *req,
                       unsigned int mss, struct sk_buff *skb,
                       struct tcp_out_options *opts,
                       const struct tcp_md5sig_key *md5,
                       struct tcp_fastopen_cookie *foc)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    unsigned int remaining = MAX_TCP_OPTION_SPACE;

    /* We always send an MSS option. */
    opts->mss = mss;
    remaining -= TCPOLEN_MSS_ALIGNED;
}
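The value stored in opts->mss is later serialized into the header as a four-byte option. A minimal sketch of that encoding, together with the option-space bookkeeping seen above, might look like this; `write_mss_option` is a hypothetical name, and the constants restate the kernel's MAX_TCP_OPTION_SPACE (40) and TCPOLEN_MSS_ALIGNED (4).

```c
#include <stdint.h>

#define MAX_TCP_OPTION_SPACE 40 /* TCP options may occupy at most 40 bytes */
#define TCPOLEN_MSS_ALIGNED  4  /* the MSS option is already 4-byte aligned */

/* Hypothetical helper: write the MSS option (kind 2, length 4, value in
 * network byte order) and return the option space left for other options. */
unsigned int write_mss_option(uint8_t out[4], uint16_t mss)
{
    out[0] = 2;                      /* kind: MSS */
    out[1] = 4;                      /* total option length */
    out[2] = (uint8_t)(mss >> 8);    /* high byte first */
    out[3] = (uint8_t)(mss & 0xff);
    return MAX_TCP_OPTION_SPACE - TCPOLEN_MSS_ALIGNED;
}
```

Because the MSS option is always present on SYN and SYN+ACK segments, the remaining options (timestamps, window scale, SACK-permitted and so on) share the other 36 bytes.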

 

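Client receives SYN+ACK

The client is in the SYN_SENT state when the server's SYN+ACK arrives. It then does the following: (1) parses the MSS in the segment's TCP options and stores it in opt_rx->mss_clamp; (2) recomputes the MSS from the latest path MTU; (3) estimates the peer's MSS; (4) if quick ACK mode has to be entered, computes the quick ACK quota from rcv_mss.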

static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
                     const struct tcphdr *th)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    struct tcp_fastopen_cookie foc = { .len = -1 };
    int saved_clamp = tp->rx_opt.mss_clamp;
    bool fastopen_fail;
    /* ... */
    /* (1) Parse the TCP options */
    tcp_parse_options(skb, &tp->rx_opt, 0, &foc);
    /* ... */
    /* (2) Compute the MSS */
    tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
    /* (3) Initialize rcv_mss */
    tcp_initialize_rcv_mss(sk);
    /* ... */
    /* (4) Enter quick ACK mode */
    tcp_enter_quickack_mode(sk);
}
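Step (4) depends on rcv_mss because the quick ACK quota is expressed in segments. The following is only an approximation of that calculation, written under the assumption that the quota is the receive window divided by twice rcv_mss, with a floor of 2 and a cap of TCP_MAX_QUICKACKS = 16 (exact details vary between kernel versions). `quickack_quota` is a hypothetical name.

```c
#include <stdint.h>

#define TCP_MAX_QUICKACKS 16U /* assumed cap on the quick ACK quota */

/* Hypothetical model: number of segments to acknowledge immediately.
 * rcv_mss must be non-zero (the initial estimate is at least the
 * minimum MSS). */
uint32_t quickack_quota(uint32_t rcv_wnd, uint32_t rcv_mss)
{
    uint32_t quickacks = rcv_wnd / (2 * rcv_mss);

    if (quickacks == 0)
        quickacks = 2;
    if (quickacks > TCP_MAX_QUICKACKS)
        quickacks = TCP_MAX_QUICKACKS;
    return quickacks;
}
```

The kernel additionally only raises an existing quota and never lowers it in this path; that state is omitted here to keep the sketch stateless.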

 

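Sending data in the established state

The TCP send system calls eventually reach tcp_sendmsg. Before sending any data, this function obtains the send MSS, which limits the size of the segments sent afterwards.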

int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
{
    /* ... */
    mss_now = tcp_send_mss(sk, &size_goal, flags);
    /* ... */
}

 

static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
{
    int mss_now;

    mss_now = tcp_current_mss(sk);
    *size_goal = tcp_xmit_size_goal(sk, mss_now, !(flags & MSG_OOB));

    return mss_now;
}

 

tcp_current_mss updates the MSS according to the current MTU and the actual length of the header options.

/* Compute the current effective MSS, taking SACKs and IP options,
 * and even PMTU discovery events into account.
 */
unsigned int tcp_current_mss(struct sock *sk)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    const struct dst_entry *dst = __sk_dst_get(sk);
    u32 mss_now;
    unsigned int header_len;
    struct tcp_out_options opts;
    struct tcp_md5sig_key *md5;

    /* Start from the current effective MSS */
    mss_now = tp->mss_cache;

    /* A cached route exists */
    if (dst) {
        /* Get the path MTU */
        u32 mtu = dst_mtu(dst);

        /* The MTUs differ: resync the MSS using the current MTU */
        if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
            mss_now = tcp_sync_mss(sk, mtu);
    }

    /* Get the header length */
    header_len = tcp_established_options(sk, NULL, &opts, &md5) +
             sizeof(struct tcphdr);
    /* The mss_cache is sized based on tp->tcp_header_len, which assumes
     * some common options. If this is an odd packet (because we have SACK
     * blocks etc) then our calculated header_len will be different, and
     * we have to adjust mss_now correspondingly */

    /* The header lengths differ, so the MSS has to be adjusted */
    if (header_len != tp->tcp_header_len) {
        int delta = (int) header_len - tp->tcp_header_len;
        mss_now -= delta;
    }

    /* Return the MSS */
    return mss_now;
}
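The header-length adjustment at the end is easiest to see with numbers. The sketch below isolates just that step; `current_mss` is a hypothetical name and the option lengths are passed in directly rather than computed from socket state.

```c
#include <stdint.h>

#define TCP_HDR_LEN 20 /* sizeof(struct tcphdr) */

/* Hypothetical model of the final adjustment in tcp_current_mss:
 * mss_cache was sized for tcp_header_len bytes of TCP header; if this
 * segment carries a different amount of options, shift the MSS by the
 * difference. */
uint32_t current_mss(uint32_t mss_cache, uint32_t tcp_header_len,
                     uint32_t options_len)
{
    uint32_t header_len = options_len + TCP_HDR_LEN;
    int delta = (int)header_len - (int)tcp_header_len;

    return (uint32_t)((int)mss_cache - delta);
}
```

For example, on a connection using timestamps (tcp_header_len = 32, mss_cache = 1448), a segment that also carries one SACK block has 12 + 12 = 24 bytes of options, so its header is 44 bytes and the usable MSS drops by 12 to 1436.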

 

The tcp_sync_mss function

This function is used by many of the code paths above, so it is analyzed here in one place.

/* This function synchronize snd mss to current pmtu/exthdr set.

   tp->rx_opt.user_mss is mss set by user by TCP_MAXSEG. It does NOT counts
   for TCP options, but includes only bare TCP header.

   tp->rx_opt.mss_clamp is mss negotiated at connection setup.
   It is minimum of user_mss and mss received with SYN.
   It also does not include TCP options.

   inet_csk(sk)->icsk_pmtu_cookie is last pmtu, seen by this function.

   tp->mss_cache is current effective sending mss, including
   all tcp options except for SACKs. It is evaluated,
   taking into account current pmtu, but never exceeds
   tp->rx_opt.mss_clamp.

   NOTE1. rfc1122 clearly states that advertised MSS
   DOES NOT include either tcp or ip options.

   NOTE2. inet_csk(sk)->icsk_pmtu_cookie and tp->mss_cache
   are READ ONLY outside this function.        --ANK (980731)
 */
/* Update the MSS */
unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct inet_connection_sock *icsk = inet_csk(sk);
    int mss_now;

    /* If the MTU probing upper bound exceeds the path MTU, reset it to the path MTU */
    if (icsk->icsk_mtup.search_high > pmtu)
        icsk->icsk_mtup.search_high = pmtu;

    /* Compute the current MSS */
    mss_now = tcp_mtu_to_mss(sk, pmtu);
    /* Adjust the MSS according to the largest window advertised by the peer */
    mss_now = tcp_bound_to_half_wnd(tp, mss_now);

    /* And store cached results */
    /* Record the latest path MTU */
    icsk->icsk_pmtu_cookie = pmtu;
    /* MTU probing is enabled */
    if (icsk->icsk_mtup.enabled)
        /* Take the smaller of the current MSS and the MSS derived from the probing lower bound */
        mss_now = min(mss_now, tcp_mtu_to_mss(sk, icsk->icsk_mtup.search_low));
    /* Cache the current MSS */
    tp->mss_cache = mss_now;

    return mss_now;
}

 

The following two functions compute the MSS from an MTU.

/* Compute the MSS, not accounting for SACK */
int tcp_mtu_to_mss(struct sock *sk, int pmtu)
{
    /* Subtract TCP options size, not including SACKs */
    /* Remove the length of the TCP options */
    return __tcp_mtu_to_mss(sk, pmtu) -
           (tcp_sk(sk)->tcp_header_len - sizeof(struct tcphdr));
}

 

/* Compute the MSS without taking TCP options into account */
static inline int __tcp_mtu_to_mss(struct sock *sk, int pmtu)
{
    const struct tcp_sock *tp = tcp_sk(sk);
    const struct inet_connection_sock *icsk = inet_csk(sk);
    int mss_now;

    /* Calculate base mss without TCP options:
       It is MMS_S - sizeof(tcphdr) of rfc1122
     */
    /* Current MSS = path MTU - network header - TCP header */
    mss_now = pmtu - icsk->icsk_af_ops->net_header_len - sizeof(struct tcphdr);

    /* IPv6 adds a frag_hdr in case RTAX_FEATURE_ALLFRAG is set */
    if (icsk->icsk_af_ops->net_frag_header_len) {
        const struct dst_entry *dst = __sk_dst_get(sk);

        if (dst && dst_allfrag(dst))
            mss_now -= icsk->icsk_af_ops->net_frag_header_len;
    }

    /* Clamp it (mss_clamp does not include tcp options) */
    /* If the current MSS exceeds the maximum, lower it to the maximum */
    if (mss_now > tp->rx_opt.mss_clamp)
        mss_now = tp->rx_opt.mss_clamp;

    /* Now subtract optional transport overhead */
    /* Subtract the length of IP options / extension headers */
    mss_now -= icsk->icsk_ext_hdr_len;

    /* Then reserve room for full set of TCP options and 8 bytes of data */
    /* Floor of 48 = 40 bytes (maximum TCP options) + 8 bytes of TCP data */
    if (mss_now < 48)
        mss_now = 48;

    /* Return the MSS */
    return mss_now;
}
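As a worked example of the two functions above, the following sketch reproduces the arithmetic with plain integers. It is a simplified model: the IPv6 fragment-header branch is left out, and `struct mss_inputs`, `base_mtu_to_mss` and `mtu_to_mss` are hypothetical names standing in for the socket fields and kernel functions.

```c
#include <stdint.h>

#define TCP_HDR_LEN 20 /* sizeof(struct tcphdr) */

/* Hypothetical container for the socket fields the calculation reads. */
struct mss_inputs {
    int net_header_len; /* 20 for IPv4, 40 for IPv6 */
    int ext_hdr_len;    /* IP options / extension headers */
    int tcp_header_len; /* 20 plus fixed options, e.g. 32 with timestamps */
    int mss_clamp;      /* negotiated upper bound */
};

/* Model of __tcp_mtu_to_mss without the IPv6 allfrag case. */
int base_mtu_to_mss(const struct mss_inputs *in, int pmtu)
{
    int mss = pmtu - in->net_header_len - TCP_HDR_LEN;

    if (mss > in->mss_clamp)
        mss = in->mss_clamp;
    mss -= in->ext_hdr_len;
    if (mss < 48)
        mss = 48; /* 40 bytes of options + 8 bytes of data */
    return mss;
}

/* Model of tcp_mtu_to_mss: also subtract the fixed TCP options. */
int mtu_to_mss(const struct mss_inputs *in, int pmtu)
{
    return base_mtu_to_mss(in, pmtu) - (in->tcp_header_len - TCP_HDR_LEN);
}
```

For IPv4 over Ethernet with timestamps enabled (net_header_len 20, tcp_header_len 32, mss_clamp 1460), a path MTU of 1500 gives 1500 - 20 - 20 = 1460 and then 1460 - 12 = 1448, which is the familiar mss_cache value seen on such connections.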

 

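tcp_bound_to_half_wnd adjusts the MSS according to the largest window the peer has advertised. If that maximum window is larger than the default MSS, the current MSS must not exceed half of the window; it must not become too small either, the lower limit being 68 - tcp_header_len.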

static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
{
    int cutoff;

    /* When peer uses tiny windows, there is no use in packetizing
     * to sub-MSS pieces for the sake of SWS or making sure there
     * are enough packets in the pipe for fast recovery.
     *
     * On the other hand, for extremely large MSS devices, handling
     * smaller than MSS windows in this way does make sense.
     */
    /* Peer's maximum advertised window > default MSS:
     * cutoff is half of the maximum window
     */
    if (tp->max_window > TCP_MSS_DEFAULT)
        cutoff = (tp->max_window >> 1);
    /* <= default MSS: cutoff is the maximum window itself */
    else
        cutoff = tp->max_window;

    /* The packet size is kept within 68 - header <= x <= cutoff */

    /* Packet size > cutoff: take the larger of cutoff and the minimum size */
    if (cutoff && pktsize > cutoff)
        return max_t(int, cutoff, 68U - tp->tcp_header_len);

    /* Packet size <= cutoff, or no window seen yet: keep the packet size */
    else
        return pktsize;
}
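The branches are easier to follow with concrete numbers. Below is a stand-alone model of the same logic, assuming TCP_MSS_DEFAULT = 536; `bound_to_half_wnd` is a hypothetical name and takes the relevant socket fields as plain parameters.

```c
#include <stdint.h>

#define TCP_MSS_DEFAULT 536U /* assumed kernel default MSS */

/* Hypothetical model of tcp_bound_to_half_wnd. */
int bound_to_half_wnd(uint32_t max_window, int tcp_header_len, int pktsize)
{
    int cutoff;

    if (max_window > TCP_MSS_DEFAULT)
        cutoff = (int)(max_window >> 1); /* half of a normal window */
    else
        cutoff = (int)max_window;        /* tiny window: use it whole */

    if (cutoff && pktsize > cutoff) {
        int floor = 68 - tcp_header_len; /* smallest allowed size */

        return cutoff > floor ? cutoff : floor;
    }
    return pktsize; /* fits, or no window has been seen yet */
}
```

With tcp_header_len = 32 and a candidate MSS of 1448: a maximum window of 0 (nothing received yet, as in tcp_connect_init) or 65535 leaves 1448 unchanged, a window of 1000 limits it to 500, a tiny window of 400 limits it to 400, and a window of 20 hits the floor of 68 - 32 = 36.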