UDT: UDP-based Data Transfer Protocol

draft-gg-udt-03

(Translator: Jack; first-draft translation)

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at

http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at

http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 15, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This document describes UDT, the UDP-based Data Transfer protocol. UDT is designed to be an alternative data transfer protocol for situations in which TCP does not work well. One of the most common cases, and also the original motivation for UDT, is to overcome TCP's inefficiency in high bandwidth-delay product (BDP) networks. Another important target use scenario is to allow networking researchers, students, and application developers to easily implement and deploy new data transfer algorithms and protocols. Furthermore, UDT can also be used to better support firewall traversal.

UDT is built entirely on top of UDP. However, UDT is connection-oriented, unicast, and duplex. It supports both reliable data streaming and partially reliable messaging. The congestion control module is an open framework that can be used to implement and/or deploy different control algorithms. UDT also has a native/default control algorithm based on AIMD rate control.

Table of Contents

   1. Introduction
   2. Packet Structures
   3. UDP Multiplexer
   4. Timers
   5. Connection Setup and Shutdown
      5.1 Client/Server Connection Setup
      5.2 Rendezvous Connection Setup
      5.3 Shutdown
   6. Data Sending and Receiving
      6.1 The Sender's Algorithm
      6.2 The Receiver's Algorithm
      6.3 Flow Control
      6.4 Loss Information Compression Scheme
   7. Configurable Congestion Control (CCC)
      7.1 CCC Interface
      7.2 UDT's Native Control Algorithm
   Security Considerations
   Normative References
   Informative References
   Author's Addresses

1. Introduction

The Transmission Control Protocol (TCP) [RFC5681] has been very successful and greatly contributes to the popularity of today's Internet. Today TCP still contributes the majority of the traffic on the Internet.

However, TCP is not perfect, and it was not designed for every specific application. In the last several years, with the rapid advance of optical networks and rich Internet applications, TCP has been found inefficient as the network bandwidth-delay product (BDP) increases. Its AIMD (additive increase multiplicative decrease) algorithm reduces the TCP congestion window drastically but fails to recover it to the available bandwidth quickly. Theoretical flow level analysis has shown that TCP becomes more vulnerable to packet loss as the BDP increases [LM97].

Overcoming TCP's inefficiency over high-speed wide area networks is the original motivation of UDT. Although there are new TCP variants deployed today (for example, BiC TCP [XHR04] on Linux and Compound TCP [TS06] on Windows), certain problems still exist. For example, none of the new TCP variants address RTT unfairness, the situation in which connections with shorter RTT consume more bandwidth.

Moreover, as the Internet continues to evolve, new challenges and requirements for the transport protocol will always emerge. Researchers need a platform to rapidly develop and test new algorithms and protocols. Network researchers and students can use UDT to easily implement their ideas on transport protocols, in particular congestion control algorithms, and conduct experiments over real networks.

Finally, there are other situations in which UDT can be more helpful than TCP. For example, a UDP-based protocol usually makes it easier to punch through NAT firewalls. As another example, TCP's congestion control and reliability control are not desirable in certain applications such as VoIP and wireless communication. Application developers can use UDT (with or without modification) to suit their requirements.

For all the reasons and motivations described above, we believe that it is necessary to design a well-defined and well-developed UDP-based data transfer protocol.

As its name suggests, UDT is built solely on top of UDP [RFC768]. Both data and control packets are transferred using UDP. UDT is connection-oriented in order to easily maintain congestion control, reliability, and security. It is a unicast protocol; multicast is not considered here. Finally, data can be transferred over UDT in duplex.

UDT supports both reliable data streaming and partially reliable messaging. The data streaming semantics are similar to those of TCP, while the messaging semantics can be regarded as a subset of SCTP [RFC4960].

This document defines UDT's protocol specification. The detailed description and performance analysis can be found in [GG07], and a fully functional reference implementation can be found at [UDT].

2. Packet Structures

UDT has two kinds of packets: the data packets and the control packets. They are distinguished by the 1st bit (flag bit) of the packet header.

The data packet header structure is as follows.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|                     Packet Sequence Number                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |FF |O|                     Message Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Time Stamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Socket ID                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The data packet header starts with 0. The packet sequence number uses the 31 bits that follow the flag bit. UDT uses packet-based sequencing, i.e., the sequence number is increased by 1 for each sent data packet in the order of packet sending. The sequence number wraps after it reaches the maximum number (2^31 - 1). (Translator's note: retransmitted packets do not increase the sequence number.)

The next 32-bit field in the header is for messaging. The first two bits "FF" flag the position of the packet in a message: "10" is the first packet, "01" is the last one, "11" is the only packet, and "00" is any packet in the middle. The third bit "O" indicates whether the message should be delivered in order (1) or not (0). A message to be delivered in order requires that all previous messages be either delivered or dropped. The remaining 29 bits are the message number, similar to the packet sequence number (but independent of it). A UDT message may contain multiple UDT packets.

Following are the 32-bit time stamp of when the packet is sent and the destination socket ID. The time stamp is a relative value starting from the time when the connection is set up. The time stamp information is not required by UDT or its native control algorithm. It is included only in case a user-defined control algorithm requires the information (see Section 7).

The Destination Socket ID is used by the UDP multiplexer. Multiple UDT sockets can be bound to the same UDP port, and this UDT socket ID is used to differentiate the UDT connections.
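The data packet header layout described above can be sketched as a pair of pack/unpack helpers. This is a minimal illustration, not the reference implementation: the function names are hypothetical, and network (big-endian) byte order is an assumption that the text above does not state.

```python
import struct

def pack_data_header(seq, position, in_order, msg_no, timestamp, dst_socket_id):
    """Build a 16-byte data packet header (big-endian is an assumption)."""
    word0 = seq & 0x7FFFFFFF  # flag bit 0 followed by the 31-bit sequence number
    word1 = ((position & 0x3) << 30) | ((1 if in_order else 0) << 29) | (msg_no & 0x1FFFFFFF)
    return struct.pack("!IIII", word0, word1,
                       timestamp & 0xFFFFFFFF, dst_socket_id & 0xFFFFFFFF)

def unpack_data_header(data):
    """Parse the first 16 bytes of a data packet into a dict of fields."""
    word0, word1, timestamp, dst_socket_id = struct.unpack("!IIII", data[:16])
    if word0 >> 31:
        raise ValueError("flag bit is 1: this is a control packet")
    return {
        "seq": word0 & 0x7FFFFFFF,
        "position": word1 >> 30,            # 0b10 first, 0b01 last, 0b11 only, 0b00 middle
        "in_order": bool((word1 >> 29) & 1),
        "msg_no": word1 & 0x1FFFFFFF,       # 29-bit message number
        "timestamp": timestamp,
        "dst_socket_id": dst_socket_id,
    }

def next_seq(seq):
    """Increase the sequence number by 1, wrapping after 2^31 - 1."""
    return (seq + 1) & 0x7FFFFFFF
```

For example, `unpack_data_header(pack_data_header(5, 0b11, True, 7, 1000, 42))` recovers the same six field values, and `next_seq(2**31 - 1)` wraps to 0.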

If the flag bit of a UDT packet is 1, then it is a control packet and parsed according to the following structure.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|             Type            |            Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     |                    Additional Info                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Time Stamp                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Socket ID                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                 Control Information Field                     ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

There are 8 types of control packets in UDT, and the type information is put in bits 1 - 15 of the header. The contents of the following fields depend on the packet type. The first 128 bits must exist in the packet header, whereas the control information field may be empty, depending on the packet type.

In particular, UDT uses sub-sequencing for ACK packets. Each ACK packet is assigned a unique increasing 16-bit sequence number, which is independent of the data packet sequence number. The ACK sequence number uses bits 32 - 63 ("Additional Info") in the control packet header. The ACK sequence number ranges from 0 to (2^31 - 1).
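The control packet header can be handled the same way. The sketch below is illustrative only: the function names and the `TYPE_ACK` constant are hypothetical, big-endian byte order is assumed, and the whole second word (bits 32 - 63) is treated as the Additional Info field, following the text rather than the narrower box drawn in the diagram.

```python
import struct

TYPE_ACK = 0x2  # illustrative constant for the ACK control type

def pack_control_header(pkt_type, additional_info, timestamp, dst_socket_id, reserved=0):
    """Build the mandatory 128-bit control packet header."""
    # Flag bit 1, then the 15-bit type (bits 1 - 15), then 16 reserved bits.
    word0 = (1 << 31) | ((pkt_type & 0x7FFF) << 16) | (reserved & 0xFFFF)
    return struct.pack("!IIII", word0, additional_info & 0xFFFFFFFF,
                       timestamp & 0xFFFFFFFF, dst_socket_id & 0xFFFFFFFF)

def is_control_packet(data):
    """The first bit of the packet distinguishes control (1) from data (0)."""
    return bool(data[0] & 0x80)

def unpack_control_header(data):
    """Return (type, reserved, additional_info, timestamp, dst_socket_id)."""
    word0, additional_info, timestamp, dst_socket_id = struct.unpack("!IIII", data[:16])
    if not word0 >> 31:
        raise ValueError("flag bit is 0: this is a data packet")
    return ((word0 >> 16) & 0x7FFF, word0 & 0xFFFF,
            additional_info, timestamp, dst_socket_id)
```

Any control information field would simply follow these 16 bytes in the same datagram.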

   TYPE 0x0:  Protocol Connection Handshake
              Additional Info: Undefined
              Control Info:
              1) 32 bits: UDT version
              2) 32 bits: Socket Type (STREAM or DGRAM)
              3) 32 bits: initial packet sequence number
              4) 32 bits: maximum packet size (including UDP/IP headers)
              5) 32 bits: maximum flow window size
              6) 32 bits: connection type (regular or rendezvous)
              7) 32 bits: socket ID
              8) 32 bits: SYN cookie
              9) 128 bits: the IP address of the peer's UDP socket

   TYPE 0x1:  Keep-alive
              Additional Info: Undefined
              Control Info: None

   TYPE 0x2:  Acknowledgement (ACK)
              Additional Info: ACK sequence number
              Control Info:
              1) 32 bits: The packet sequence number to which all the
                 previous packets have been received (exclusive)
              [The following fields are optional]
              2) 32 bits: RTT (in microseconds)
              3) 32 bits: RTT variance
              4) 32 bits: Available buffer size (in bytes)
              5) 32 bits: Packets receiving rate (in number of packets
                          per second)
              6) 32 bits: Estimated link capacity (in number of packets
                          per second)

   TYPE 0x3:  Negative Acknowledgement (NAK)
              Additional Info: Undefined
              Control Info:
              1) 32 bits integer array of compressed loss information
                 (see Section 6.4).

   TYPE 0x4:  Unused

   TYPE 0x5:  Shutdown
              Additional Info: Undefined
              Control Info: None

   TYPE 0x6:  Acknowledgement of Acknowledgement (ACK2)
              Additional Info: ACK sequence number
              Control Info: None

   TYPE 0x7:  Message Drop Request
              Additional Info: Message ID
              Control Info:
              1) 32 bits: First sequence number in the message
              2) 32 bits: Last sequence number in the message

   TYPE 0x7FFF: Explained by bits 16 - 31, reserved for user defined
              Control Packet

Finally, Time Stamp and Destination Socket ID also exist in the control packets.

3. UDP Multiplexer

A UDP multiplexer is used to handle concurrent UDT connections sharing the same UDP port. The multiplexer dispatches incoming UDT packets to the corresponding UDT sockets according to the destination socket ID in the packet header.

One multiplexer is used for all UDT connections bound to the same UDP port. That is, UDT sockets on different UDP ports will be handled by different multiplexers.

A multiplexer maintains two queues. The sending queue includes the sockets with at least one packet scheduled for sending. The UDT sockets in the sending queue are ordered by the next packet sending time. A high performance timer is maintained by the sending queue and when it is time for the first socket in the queue to send its packet, the packet will be sent and the socket will be removed. If there are more packets for that socket to be sent, the socket will be re-inserted to the queue.

The receiving queue reads incoming packets and dispatches them to the corresponding sockets. If the destination ID is 0, the packet will be sent to the listening socket (if there is any), or to a socket that is in rendezvous connection phase. (See Section 5.)

Similar to the sending queue, the receiving queue also maintains a list of sockets waiting for incoming packets. The receiving queue scans the list to check if any timer expires for each socket every SYN (SYN = 0.01 second, defined in Section 4).
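The two queues can be pictured with a small in-memory model. This is a rough sketch under stated assumptions: the class and method names are hypothetical, a plain list stands in for each UDT socket, the sending queue is a heap keyed on next sending time, and the rendezvous case for destination ID 0 is left out.

```python
import heapq

class Multiplexer:
    """Illustrative model of one multiplexer (one per UDP port)."""

    def __init__(self):
        self.inboxes = {}            # UDT socket ID -> packets delivered to that socket
        self.listener_inbox = None   # set to a list when a listening socket exists
        self.send_queue = []         # min-heap of (next_send_time, socket_id)

    def register(self, socket_id):
        self.inboxes[socket_id] = []

    def schedule(self, socket_id, next_send_time):
        """Insert (or re-insert) a socket that has a packet scheduled for sending."""
        heapq.heappush(self.send_queue, (next_send_time, socket_id))

    def pop_due(self, now):
        """Remove and return the first socket if its sending time has come, else None."""
        if self.send_queue and self.send_queue[0][0] <= now:
            return heapq.heappop(self.send_queue)[1]
        return None

    def dispatch(self, dst_socket_id, packet):
        """Deliver an incoming packet by destination socket ID; ID 0 goes to the listener."""
        if dst_socket_id == 0:
            inbox = self.listener_inbox
        else:
            inbox = self.inboxes.get(dst_socket_id)
        if inbox is None:
            return False             # no matching socket: the packet is dropped
        inbox.append(packet)
        return True
```

A heap keeps the socket with the earliest sending time at the front, which matches the ordering requirement of the sending queue without a full sort on every insertion.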

4. Timers

UDT uses four timers to trigger different periodical events. Each event has its own period and they are all independent. They use the system time as origins and should process wrapping if the system time wraps.

For a certain periodical event E in UDT, suppose the time variable is ET and its period is p. If E is set or reset at system time t0 (ET = t0), then at any time t1, (t1 - ET >= p) is the condition to check if E should be triggered.
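The trigger condition (t1 - ET >= p) can be written down directly. The class below is a minimal sketch with hypothetical names; times are plain integers in microseconds, and wrapping of the system clock is not handled.

```python
class PeriodicEvent:
    """A periodical event E with time variable ET and period p."""

    def __init__(self, period, now):
        self.period = period   # p
        self.et = now          # ET, set when the event is (re)set

    def due(self, now):
        """True when the event should be triggered at time `now` (t1 - ET >= p)."""
        return now - self.et >= self.period

    def reset(self, now):
        """Set ET = t0 after the event has been processed."""
        self.et = now
```

For instance, an event with a period of 10,000 microseconds that was set at time 0 is not due at 9,999 and becomes due at 10,000.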

The four timers are ACK, NAK, EXP and SND. SND is used in the sender only for rate-based packet sending (see Section 6.1), whereas the other three are used in the receiver only.

ACK is used to trigger an acknowledgement (ACK). Its period is set by the congestion control module. However, UDT will send an ACK at least once every 0.01 second, even if the congestion control does not need timer-based ACKs. Here, 0.01 second is defined as the SYN time, or synchronization time, and it affects many of the other timers used in UDT.

NAK is used to trigger a negative acknowledgement (NAK). Its period is dynamically updated to 4 * RTT + RTTVar + SYN, where RTTVar is the variance of RTT samples.

EXP is used to trigger data packet retransmission and to maintain connection status. Its period is dynamically updated to N * (4 * RTT + RTTVar + SYN), where N is the number of continuous timeouts. To avoid unnecessary timeouts, a minimum threshold (e.g., 0.5 second) should be used in the implementation.

The recommended granularity of their periods is microseconds. However, accurate time keeping is not necessary, except for SND.
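As a worked example of the NAK and EXP periods, the helpers below compute both in microseconds. The function and constant names are hypothetical, and applying the minimum threshold as a floor on the EXP period is one plausible reading of the text, not a confirmed detail.

```python
SYN = 10_000              # 0.01 second, in microseconds
MIN_EXP_PERIOD = 500_000  # example minimum threshold of 0.5 second

def nak_period(rtt, rtt_var):
    """NAK period: 4 * RTT + RTTVar + SYN."""
    return 4 * rtt + rtt_var + SYN

def exp_period(rtt, rtt_var, n, minimum=MIN_EXP_PERIOD):
    """EXP period: N * (4 * RTT + RTTVar + SYN), floored at `minimum` (assumed)."""
    return max(n * (4 * rtt + rtt_var + SYN), minimum)
```

With RTT = 100 ms and RTTVar = 20 ms, the NAK period is 4 * 100,000 + 20,000 + 10,000 = 430,000 microseconds. The EXP period for N = 1 is raised to the 500,000 microsecond floor, and for N = 2 it is 860,000 microseconds.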

In the rest of this document, a name of a time variable will be used to represent the associated event, the variable itself, or the value of its period, depending on the context. For example, ACK can mean either the ACK event or the value of ACK period.

5. Connection Setup and Shutdown

UDT supports two different connection setup methods, the traditional client/server mode and the rendezvous mode. In the latter mode, both UDT sockets connect to each other at (approximately) the same time.

The UDT client (in rendezvous mode, both peers are clients) sends a handshake request (type 0 control packet) to the server or the peer side. The handshake packet has the following information (suppose UDT socket A sends this handshake to B):

1) UDT version: this value is for compatibility purpose. The current version is 4.

2) Socket Type: STREAM (0) or DGRAM (1).

3) Initial Sequence Number: It is the sequence number for the first data packet that A will send out. This should be a random value.

4) Packet Size: the maximum size of a data packet (including all headers). This is usually the value of MTU.

5) Maximum Flow Window Size: This value may not be necessary; however, it is needed in the current reference implementation.

6) Connection Type. This information is used to differentiate the connection setup modes and request/response.

7) Socket ID. The client UDT socket ID.

8) Cookie. This is a cookie value used to avoid SYN flooding attack [RFC4987].

9) Peer IP address: B's IP address.
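The nine handshake fields map naturally onto a fixed 48-byte structure (eight 32-bit values plus a 128-bit address). The sketch below is illustrative: the names are hypothetical, big-endian byte order is assumed, the connection type is packed as a signed integer because later sections use negative values, and an IPv4 address is assumed to occupy the first 4 of the 16 address bytes with zero padding.

```python
import ipaddress
import struct
from collections import namedtuple

Handshake = namedtuple(
    "Handshake",
    "version sock_type init_seq packet_size window_size conn_type socket_id cookie peer_ip",
)

HANDSHAKE_FORMAT = "!IIIIIiII16s"                  # conn_type is signed (assumption)
HANDSHAKE_SIZE = struct.calcsize(HANDSHAKE_FORMAT)  # 8 * 4 + 16 bytes

def pack_handshake(hs):
    """Serialize the control information field of a type 0 packet."""
    ip = ipaddress.ip_address(hs.peer_ip).packed.ljust(16, b"\x00")
    return struct.pack(HANDSHAKE_FORMAT, hs.version, hs.sock_type, hs.init_seq,
                       hs.packet_size, hs.window_size, hs.conn_type,
                       hs.socket_id, hs.cookie, ip)

def unpack_handshake(data):
    """Parse the control information field; peer_ip is returned as 16 raw bytes."""
    return Handshake(*struct.unpack(HANDSHAKE_FORMAT, data[:HANDSHAKE_SIZE]))
```

A client request for a STREAM socket might be built as `Handshake(4, 0, 12345, 1500, 8192, 1, 77, 0, "192.0.2.1")`, where every value other than the version and socket type is made up for illustration.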

5.1 Client/Server Connection Setup

One UDT entity starts first as the server (listener). The server accepts and processes incoming connection requests, and creates a new UDT socket for each new connection. A client that wants to connect to the server will send a handshake packet first. The client should keep sending the handshake packet at a constant interval until it receives a response handshake from the server or a timeout timer expires.

When the server first receives the connection request from a client, it generates a cookie value according to the client address and a secret key and sends it back to the client. The client must then send back the same cookie to the server.

The server, when receiving a handshake packet and the correct cookie, compares the packet size and maximum window size with its own values and sets its own values to the smaller ones. The resulting values are also sent back to the client in a response handshake packet, together with the server's version and initial sequence number. The server is ready for sending/receiving data right after this step is finished. However, it must send back a response packet whenever it receives any further handshakes from the same client.

The client can start sending/receiving data once it gets a response handshake packet from the server. Further response handshake messages, if any are received, should be ignored. The connection type from the client should be set to 1 and the response from the server should be set to -1. The client should also check that the response is from the server to which the original request was sent.
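The server side of this exchange can be sketched as follows. The text only says the cookie is derived from the client address and a secret key; using HMAC-SHA256 truncated to 32 bits is an assumption made here for illustration, and all function names and the return format are hypothetical.

```python
import hashlib
import hmac

def make_cookie(secret, client_ip, client_port):
    """Derive a 32-bit cookie from the client address and a secret key (HMAC is an assumption)."""
    msg = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def handle_handshake(secret, client_addr, req_cookie, req_packet_size, req_window,
                     my_packet_size, my_window):
    """Return ("challenge", cookie) on first contact, or ("accept", negotiated values)."""
    expected = make_cookie(secret, *client_addr)
    if req_cookie != expected:
        # First request (or wrong cookie): send the cookie back; the client must echo it.
        return ("challenge", expected)
    # Correct cookie: take the smaller of the two values for each parameter.
    return ("accept", {
        "packet_size": min(req_packet_size, my_packet_size),
        "window_size": min(req_window, my_window),
        "conn_type": -1,   # response connection type in client/server mode
    })
```

Because the cookie is recomputed from the client address on every request, the server does not need to keep per-client state until the client has proven it can receive packets at that address.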

5.2 Rendezvous Connection Setup

In this mode, both clients send a connection request to each other at the same time. The initial connection type is set to 0. Once a peer receives a connection request, it sends back a response. If the connection type is 0, then the response sends back -1; if the connection type is -1, then the response sends back -2. No response is sent for a -2 request.

The rendezvous peer does the same check on the handshake messages (version, packet size, window size, etc.) as described in Section 5.1. In addition, the peer only processes connection requests from the address to which it has sent a connection request. Finally, a rendezvous connection should be rejected by a regular UDT server (listener).

A peer initializes the connection when it receives a -1 response. The rendezvous connection setup is useful when both peers are behind firewalls. It can also provide better security and usability when a listening server is not desirable.
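The connection type exchange in rendezvous mode amounts to a small lookup plus an address check. The function names below are hypothetical and the sketch omits the version and size checks of Section 5.1.

```python
def rendezvous_reply(conn_type):
    """Map a received connection type to the reply type; None means no reply is sent."""
    return {0: -1, -1: -2}.get(conn_type)

def handle_rendezvous_request(sender_addr, expected_peer_addr, conn_type):
    """Only process requests from the address this peer has itself sent a request to."""
    if sender_addr != expected_peer_addr:
        return None
    return rendezvous_reply(conn_type)
```

For example, a request with connection type 0 from the expected peer yields a -1 reply, while the same request from any other address is ignored.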

5.3 Shutdown

If one of the connected UDT entities is being closed, it will send a shutdown message to the peer side. The peer side, after receiving this message, will also be closed. This shutdown message, delivered using UDP, is sent only once and is not guaranteed to be received. If the message is not received, the peer side will be closed after 16 continuous EXP timeouts (see Section 4). However, the total timeout value should be between a minimum threshold and a maximum threshold. In our reference implementation, we use 3 seconds and 30 seconds, respectively.
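One way to read the timeout rule is shown below. This is an interpretation, not the reference implementation's logic: the function name is hypothetical, and the way the 16-timeout count is combined with the 3 second and 30 second bounds is an assumption.

```python
def connection_broken(timeouts, elapsed, min_total=3.0, max_total=30.0):
    """Decide whether to close a connection whose peer has gone silent.

    timeouts: number of continuous EXP timeouts so far
    elapsed:  seconds since the last packet was received from the peer
    """
    if elapsed >= max_total:
        return True                      # never wait longer than the maximum threshold
    return timeouts >= 16 and elapsed >= min_total
```

Under this reading, 16 timeouts within the first second do not close the connection, because the minimum threshold of 3 seconds has not yet passed.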

6. Data Sending and Receiving

Each UDT entity has two logical parts: the sender and the receiver. The sender sends (and retransmits) application data according to the flow control and congestion control. The receiver receives both data packets and control packets, and sends out control packets according to the received packets and the timers. The receiver is responsible for triggering and processing all control events, including congestion control and reliability control, and their related mechanisms.

UDT always tries to pack application data into fixed size packets (the maximum packet size negotiated during connection setup), unless there is not enough data to be sent. We explained the rationale of some of the UDT data sending/receiving schemes in [GHG04b].

6.1 The Sender's Algorithm

Data Structures and Variables:

1. Sender's Loss List: The sender's loss list is used to store the sequence numbers of the lost packets fed back by the receiver through NAK packets or inserted in a timeout event. The numbers are stored in increasing order.

Data Sending Algorithm:

1) If the sender's loss list is not empty, retransmit the first packet in the list and remove it from the list. Go to 5).

2) In messaging mode, if the packets have been in the loss list for longer than the application-specified TTL (time-to-live), send a message drop request and remove all related packets from the loss list. Go to 1).

3) Wait until there is application data to be sent.

4) a. If the number of unacknowledged packets exceeds the flow/congestion window size, wait until an ACK comes. Go to 1).

   b. Pack a new data packet and send it out.

5) If the sequence number of the current packet is 16n, where n is an integer, go to 2).

6) Wait (SND - t) time, where SND is the inter-packet interval updated by congestion control and t is the total time used by step 1 to step 5. Go to 1).
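The core of the sending loop (steps 1, 3, 4 and 6) can be sketched as a small state machine. This is a simplified illustration with hypothetical names: it leaves out the TTL check of step 2, returns None instead of blocking, and treats a full window as the stop condition.

```python
import bisect
from collections import deque

class Sender:
    """Illustrative model of the sender's packet selection."""

    def __init__(self, init_seq, window):
        self.loss_list = []       # lost sequence numbers, kept in increasing order
        self.app_data = deque()   # payloads waiting to be sent for the first time
        self.sent = {}            # seq -> payload, kept for retransmission
        self.next_seq = init_seq
        self.unacked = 0          # number of unacknowledged packets
        self.window = window      # flow/congestion window size, in packets

    def report_loss(self, seq):
        """Insert a sequence number fed back by a NAK or a timeout event."""
        bisect.insort(self.loss_list, seq)

    def next_packet(self):
        """Return (seq, payload, is_retransmission), or None if nothing can be sent now."""
        if self.loss_list:                       # step 1: retransmissions come first
            seq = self.loss_list.pop(0)
            return seq, self.sent[seq], True
        if not self.app_data:                    # step 3: no application data yet
            return None
        if self.unacked >= self.window:          # step 4a: window is full, wait for an ACK
            return None
        seq = self.next_seq                      # step 4b: pack a new data packet
        payload = self.app_data.popleft()
        self.sent[seq] = payload
        self.next_seq = (seq + 1) & 0x7FFFFFFF   # wrap after 2^31 - 1
        self.unacked += 1
        return seq, payload, False

def pacing_delay(snd, t):
    """Step 6: wait (SND - t), never a negative amount."""
    return max(0, snd - t)
```

Checking the loss list before new data means that a reported loss is repaired even while the window is full, which is the behavior step 1 asks for.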

6.2 The Receiver's Algorithm

Data Structures and Variables:

1) Receiver's Loss List:

It is a list of tuples whose values include:

the sequence numbers of detected lost data packets, the latest feedback time of each tuple, and a parameter k that is the number of times each one has been fed back in NAK. Values are stored in the increasing order of packet sequence numbers.

2) ACK History Window:

A circular array that records each sent ACK and the time it was sent out. The most recent value overwrites the oldest one if there is no more free space in the array.

3) PKT History Window:

A circular array that records the arrival time of each data packet.

4) Packet Pair Window:

A circular array that records the time interval between each probing packet pair.

5) LRSN:

A variable to record the largest received data packet sequence number. LRSN is initialized to the initial sequence number minus 1.

6) ExpCount:

A variable that records the number of consecutive EXP time-out events.

Data Receiving Algorithm:

1) Query the system time to check if ACK, NAK, or EXP timer has expired. If there is any, process the event (as described below in this section) and reset the associated time variables. For ACK, also check the ACK packet interval.

2) a. Start time-bounded UDP receiving. If no packet arrives, go to 1).

b. Reset the ExpCount to 1. If there is no unacknowledged data packet, or if this is an ACK or NAK control packet, reset the EXP timer.

3) Check the flag bit of the packet header. If it is a control packet, process it according to its type and go to 1).

4) If the sequence number of the current data packet is 16n + 1, where n is an integer, record the time interval between this packet and the last data packet in the Packet Pair Window.

5) Record the packet arrival time in PKT History Window.

6) a. If the sequence number of the current data packet is greater than LRSN + 1, put all the sequence numbers between (but excluding) these two values into the receiver's loss list and send them to the sender in an NAK packet.

b. If the sequence number is less than LRSN, remove it from the receiver's loss list.

7) Update LRSN. Go to 1).
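
Steps 6 and 7 amount to gap detection against LRSN. The sketch below is illustrative only: the function name is hypothetical, the loss list holds bare sequence numbers rather than the full tuples described above, and sequence-number wrap-around is ignored.

```python
def on_data_packet(seq, lrsn, loss_list):
    """Apply steps 6 and 7; return (new_lrsn, seqs_to_report_in_nak)."""
    to_nak = []
    if seq > lrsn + 1:
        # Step 6a: everything strictly between LRSN and seq is missing.
        to_nak = list(range(lrsn + 1, seq))
        # New entries are all larger than existing ones, so order is kept.
        loss_list.extend(to_nak)
    elif seq < lrsn and seq in loss_list:
        # Step 6b: a retransmitted packet fills an earlier gap.
        loss_list.remove(seq)
    # Step 7: LRSN only ever moves forward.
    return max(lrsn, seq), to_nak
```

For example, receiving packet 5 when LRSN is 2 reports 3 and 4 as lost; a later arrival of packet 3 removes it from the loss list without changing LRSN.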

ACK Event Processing:

1) Find the sequence number prior to which all the packets have been received by the receiver (ACK number) according to the following rule:

if the receiver's loss list is empty, the ACK number is LRSN + 1; otherwise it is the smallest sequence number in the receiver's loss list.

2) If (a) the ACK number equals the largest ACK number ever acknowledged by ACK2, or (b) it is equal to the ACK number in the last ACK and the time interval between these two ACK packets is less than 2 RTTs, stop (do not send this ACK).

3) Assign this ACK a unique increasing ACK sequence number. Pack the ACK packet with RTT, RTT Variance, and flow window size (available receiver buffer size). If this ACK is not triggered by ACK timers, send out this ACK and stop.

4) Calculate the packet arrival speed according to the following algorithm:

Calculate the median value of the last 16 packet arrival intervals (AI) using the values stored in the PKT History Window. Among these 16 values, remove those either greater than AI*8 or less than AI/8. If more than 8 values are left, calculate the average of the remaining values, AI', and the packet arrival speed is 1/AI' (number of packets per second). Otherwise, return 0.

5) Calculate the estimated link capacity according to the following algorithm:

Calculate the median value of the last 16 packet pair intervals (PI) using the values in Packet Pair Window, and the link capacity is 1/PI (number of packets per second).

6) Pack the packet arrival speed and estimated link capacity into the ACK packet and send it out.

7) Record the ACK sequence number, ACK number and the departure time of this ACK in the ACK History Window.
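
The median filtering in steps 4 and 5 can be sketched as below. The function names are hypothetical and the intervals are assumed to be given in seconds; a real implementation would read them from the PKT History Window and the Packet Pair Window.

```python
from statistics import median

def arrival_speed(intervals):
    """Step 4: packets per second from the last 16 arrival intervals."""
    ai = median(intervals)
    # Drop outliers more than 8x away from the median in either direction.
    kept = [x for x in intervals if ai / 8 <= x <= ai * 8]
    if len(kept) > 8:
        return 1.0 / (sum(kept) / len(kept))
    return 0.0

def link_capacity(pair_intervals):
    """Step 5: packets per second from the last 16 packet-pair intervals."""
    return 1.0 / median(pair_intervals)
```

The filter keeps a single delayed or back-to-back packet from distorting the estimate; when fewer than nine samples survive, the measurement is treated as unreliable and 0 is reported.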

NAK Event Processing:

Search the receiver's loss list, find out all those sequence numbers whose last feedback time is k*RTT before, where k is initialized as 2 and increased by 1 each time the number is fed back. Compress (according to section 6.4) and send these numbers back to the sender in an NAK packet.
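
A minimal sketch of the NAK selection rule, assuming each loss-list entry is a mutable `[seq, last_feedback_time, k]` triple with times in seconds. The function name and the entry layout are illustrative, and compression of the result (section 6.4) is left out.

```python
def nak_due(loss_list, now, rtt):
    """Return sequence numbers whose last feedback is at least k*RTT old."""
    due = []
    for entry in loss_list:
        seq, last_feedback, k = entry
        if now - last_feedback >= k * rtt:
            due.append(seq)
            entry[1] = now      # remember when it was reported again
            entry[2] = k + 1    # back off: wait one more RTT next time
    return due
```

Because k grows with every report, a packet that stays lost is re-reported at increasing intervals (2 RTTs, then 3 RTTs, and so on).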

EXP Event Processing:

1) Put all the unacknowledged packets into the sender's loss list.

2) If (ExpCount > 16) and at least 3 seconds have elapsed since the last time ExpCount was reset to 1, or if 3 minutes have elapsed, close the UDT connection and exit.

3) If the sender's loss list is empty, send a keep-alive packet to the peer side.

4) Increase ExpCount by 1.

On ACK packet received:

1) Update the largest acknowledged sequence number.

2) Send back an ACK2 with the same ACK sequence number in this ACK.

3) Update RTT and RTTVar.

4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.

5) Update flow window size.

6) If this is a Light ACK, stop.

7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the value carried in the ACK.

8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is the value carried in the ACK.

9) Update sender's buffer (by releasing the buffer that has been acknowledged).

10) Update sender's loss list (by removing all those that have been acknowledged).

On NAK packet received:

1) Add all sequence numbers carried in the NAK into the sender's loss list.

2) Update the SND period by rate control (see section 3.6).

3) Reset the EXP time variable.

On ACK2 packet received:

1) Locate the related ACK in the ACK History Window according to the ACK sequence number in this ACK2.

2) Update the largest ACK number ever acknowledged.

3) Calculate new rtt according to the ACK2 arrival time and the ACK departure time, and update the RTT value as: RTT = (RTT * 7 + rtt) / 8.

4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4.

5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.

On message drop request received:

1) Tag all packets belonging to the message in the receiver buffer so that they will not be read.

2) Remove all corresponding packets in the receiver's loss list.

On Keep-alive packet received:

Do nothing.

On Handshake/Shutdown packet received:

See Section 5.
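
The RTT bookkeeping done when an ACK2 arrives can be sketched as follows. The function name is hypothetical, times are in seconds, and SYN is assumed to be 0.01 s. The steps are applied in the order listed, so RTTVar is computed against the already updated RTT; that ordering is one reading of the text.

```python
SYN = 0.01  # assumed value of the SYN period, in seconds

def on_ack2(rtt, rtt_var, sample, syn=SYN):
    """Return (RTT, RTTVar, ACK/NAK period) after one new rtt sample."""
    new_rtt = (rtt * 7 + sample) / 8                     # step 3
    new_var = (rtt_var * 3 + abs(new_rtt - sample)) / 4  # step 4
    period = 4 * new_rtt + new_var + syn                 # step 5
    return new_rtt, new_var, period
```

With RTT = 0.1, RTTVar = 0.05 and a new sample of 0.18, the smoothed RTT moves only to 0.11, so a single slow round trip shifts the timers slightly rather than abruptly.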

6.3 Flow Control

The flow control window size is 16 initially.

On ACK packet received: The flow window size is updated to the receiver's available buffer size.

6.4 Loss Information Compression Scheme

The loss information carried in an NAK packet is an array of 32-bit integers. If an integer in the array is a normal sequence number (1st bit is 0), it means that the packet with this sequence number is lost; if the 1st bit is 1, it means all the packets starting from (including) this number to (including) the next number in the array (whose 1st bit must be 0) are lost.

For example, the following information carried in an NAK:

0x00000002, 0x80000006, 0x0000000B, 0x0000000E

means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are lost.
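
The scheme can be sketched as an encoder/decoder pair. The function names are hypothetical, the input to `compress` is assumed to be sorted in increasing order, and sequence numbers are assumed to fit in 31 bits.

```python
RANGE_FLAG = 0x80000000  # first (most significant) bit marks a range start
SEQ_MASK = 0x7FFFFFFF

def compress(seqs):
    """Encode sorted lost sequence numbers as 32-bit words."""
    out, i = [], 0
    while i < len(seqs):
        j = i
        while j + 1 < len(seqs) and seqs[j + 1] == seqs[j] + 1:
            j += 1                                   # extend the run
        if j > i:
            out += [seqs[i] | RANGE_FLAG, seqs[j]]   # range: start, end
        else:
            out.append(seqs[i])                      # single lost packet
        i = j + 1
    return out

def decompress(words):
    """Expand the words carried in a NAK into sequence numbers."""
    seqs, i = [], 0
    while i < len(words):
        if words[i] & RANGE_FLAG:
            start = words[i] & SEQ_MASK
            seqs.extend(range(start, words[i + 1] + 1))
            i += 2
        else:
            seqs.append(words[i])
            i += 1
    return seqs
```

Decoding the four words from the example above yields 2, 6, 7, 8, 9, 10, 11 and 14, and encoding that list reproduces the same four words.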

7. Configurable Congestion Control (CCC)

The congestion control in UDT is an open framework so that user-defined control algorithms can be easily implemented and switched. In particular, the native control algorithm is also implemented by this framework. The user-defined algorithm may redefine several control routines to read and adjust several UDT parameters. The routines will be called when certain events occur. For example, when an ACK is received, the control algorithm may increase the congestion window size.

7.1 CCC Interface

UDT allows users to access two congestion control parameters: the congestion window size and the inter-packet sending interval. Users may adjust these two parameters to realize window-based control, rate-based control, or a hybrid approach.

In addition, the following parameters should also be exposed.

   1) RTT
   2) Maximum Segment/Packet Size
   3) Estimated Bandwidth
   4) The latest packet sequence number that has been sent so far
   5) Packet arriving rate at the receiver side

A UDT implementation may expose additional parameters as well. This information can be used in user-defined congestion control algorithms to adjust the packet sending rate.

The following control events can be redefined via CCC (e.g., by a callback function).

   1) init: when the UDT socket is connected.
   2) close: when the UDT socket is closed.
   3) onACK: when ACK is received.
   4) onLOSS: when NAK is received.
   5) onTimeout: when timeout occurs.
   6) onPktSent: when a data packet is sent.
   7) onPktRecv: when a data packet is received.
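
One way to picture the callback interface is a base class whose methods are overridden by a user-defined algorithm. Everything below is a hypothetical sketch: the class names, the Python-style method names (standing in for init, close, onACK, onLOSS, onTimeout, onPktSent, onPktRecv) and the TCP-like example policy are illustrative and are not UDT's native algorithm or the API of any UDT library.

```python
class CCC:
    """Base class: holds the two adjustable parameters, callbacks do nothing."""

    def __init__(self):
        self.cwnd = 16.0        # congestion window size, in packets
        self.snd_period = 0.0   # inter-packet sending interval

    def init(self): pass
    def close(self): pass
    def on_ack(self, ack_seq): pass
    def on_loss(self, lost_seqs): pass
    def on_timeout(self): pass
    def on_pkt_sent(self, seq): pass
    def on_pkt_recv(self, seq): pass


class TcpLikeCC(CCC):
    """Window-based example: additive increase, multiplicative decrease."""

    def on_ack(self, ack_seq):
        self.cwnd += 1.0 / self.cwnd          # roughly +1 packet per RTT

    def on_loss(self, lost_seqs):
        self.cwnd = max(self.cwnd / 2, 1.0)   # halve, but keep at least 1
```

A rate-based algorithm would adjust `snd_period` in the same callbacks instead, and a hybrid one would adjust both.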

Users can also adjust the following parameters in the user-defined control algorithms.

1) ACK interval: An ACK may be sent every fixed number of packets. User may define this interval. If this value is -1, then it means no ACK will be sent based on packet interval.

2) ACK Timer: An ACK will also be sent every fixed time interval. This is mandatory in UDT. The maximum and default ACK time interval is SYN.

3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine this.

Detailed description and discussion of UDT/CCC can be found in [GG05].

7.2 UDT's Native Control Algorithm

UDT has a native and default control algorithm, which will be used if no user-defined algorithm is implemented and configured. The native UDT algorithm should be implemented using CCC.

UDT's native algorithm is a hybrid congestion control algorithm, hence it adjusts both the congestion window size and the inter-packet interval. The native algorithm uses timer-based ACK and the ACK interval is SYN.

The initial congestion window size is 16 packets and the initial inter-packet interval is 0. The algorithm starts with the Slow Start phase, which lasts until the first ACK or NAK arrives.

On ACK packet received:

1) If the current status is in the slow start phase, set the congestion window size to the product of packet arrival rate and (RTT + SYN). Slow Start ends. Stop.

2) Set the congestion window size (CWND) to: CWND = A * (RTT + SYN) + 16.

3) The number of sent packets to be increased in the next SYN period (inc) is calculated as:

   if (B <= C)
      inc = 1/PS;
   else
      inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS);

where B is the estimated link capacity and C is the current sending speed. All are counted as packets per second. PS is the fixed size of UDT packet counted in bytes. Beta is a constant value of 0.0000015.

4) The SND period is updated as:

   SND = (SND * SYN) / (SND * inc + SYN).

These four parameters are used in rate decrease, and their initial values are in the parentheses: AvgNAKNum (1), NAKCount (1), DecCount (1), LastDecSeq (initial sequence number - 1).

We define a congestion period as the period between two NAKs in which the first biggest lost packet sequence number is greater than LastDecSeq, which is the biggest sequence number at the time the packet sending rate was last decreased.

AvgNAKNum is the average number of NAKs in a congestion period. NAKCount is the current number of NAKs in the current period.

On NAK packet received:

1) If it is in slow start phase, set inter-packet interval to 1/recvrate. Slow start ends. Stop.

2) If this NAK starts a new congestion period, increase the inter-packet interval (snd) to snd = snd * 1.125; update AvgNAKNum, reset NAKCount to 1, and set DecRandom to a random (uniformly distributed) number between 1 and AvgNAKNum. Update LastDecSeq. Stop.

3) If DecCount <= 5, and NAKCount == DecCount * DecRandom:

a. Update SND period: SND = SND * 1.125;

b. Increase DecCount by 1;

c. Record the current largest sent sequence number (LastDecSeq).

The native UDT control algorithm is designed for bulk data transfer over high BDP networks [GHG04a].
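
The rate increase performed on each ACK can be sketched numerically as below. The function names are hypothetical, times are in seconds, and SYN is assumed to be 0.01 s; B and C are in packets per second and PS in bytes, as in the text.

```python
import math

SYN = 0.01          # assumed SYN period, in seconds
BETA = 0.0000015    # constant given in the text

def increase(b, c, ps):
    """Packets to add in the next SYN period (inc)."""
    if b <= c:
        return 1.0 / ps
    spare_bits = (b - c) * ps * 8   # unused capacity in bits per second
    return max(10 ** math.ceil(math.log10(spare_bits)) * BETA / ps, 1.0 / ps)

def update_snd(snd, inc, syn=SYN):
    """New inter-packet interval after adding inc packets per SYN."""
    return (snd * syn) / (snd * inc + syn)
```

With B = 10000, C = 5000 and PS = 1500, the spare capacity is 6 * 10^7 bits per second, which rounds up to 10^8 and gives inc = 0.1. Inverting the SND update shows that the sending rate 1/SND grows by exactly inc/SYN packets per second, here from 5000 to 5010.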

Security Considerations

UDT's security mechanism is similar to that of TCP. Most of TCP's approaches to countering security attacks should also be implemented in UDT.

IANA Considerations

This document has no actions for IANA.

Normative References

[RFC768] J. Postel, User Datagram Protocol, Aug. 1980.

Informative References

[RFC4987] W. Eddy, TCP SYN Flooding Attacks and Common Mitigations.

[GG07] Yunhong Gu and Robert L. Grossman, UDT: UDP-based Data Transfer for High-Speed Wide Area Networks, Computer Networks (Elsevier). Volume 51, Issue 7. May 2007.

[GG05] Yunhong Gu and Robert L. Grossman, Supporting Configurable Congestion Control in Data Transport Services, SC 2005, Nov 12 - 18, Seattle, WA, USA.

[GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences in Design and Implementation of a High Performance Transport Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA.

[GHG04a] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, An Analysis of AIMD Algorithms with Decreasing Increases, First Workshop on Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose, CA, USA.

[LM97] T. V. Lakshman and U. Madhow, The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss, IEEE/ACM Trans. on Networking, vol. 5 no 3, July 1997, pp. 336- 350.

[RFC5681] Allman, M., Paxson, V. and E. Blanton, TCP Congestion Control, September 2009.

[RFC4960] R. Stewart, Ed. Stream Control Transmission Protocol. September 2007.

[TS06] K. Tan, Jingmin Song, Qian Zhang, Murari Sridharan, A Compound TCP Approach for High-speed and Long Distance Networks, in IEEE Infocom, April 2006, Barcelona, Spain.

[UDT] UDT: UDP-based Data Transfer, URL http://udt.sf.net.

[XHR04] Lisong Xu, Khaled Harfoush, and Injong Rhee, Binary Increase Congestion Control for Fast Long-Distance Networks, INFOCOM 2004.

Author's Addresses

   Yunhong Gu
   National Center for Data Mining
   University of Illinois at Chicago
   713 SEO, M/C 249, 851 S Morgan St
   Chicago, IL 60607, USA
   Phone: +1 (312) 413-9576
   Email: yunhong@lac.uic.edu

Translator's note: my ability is limited and translation errors are unavoidable; corrections are welcome.
