Traffic control is the umbrella term for the packet receiving and transmitting mechanisms and the queueing systems on a router. It covers deciding which packets to accept, and at what rate, on an input interface, and which packets to transmit, at what rate and in what order, on an output interface.
Traditional traffic control involves shaping, scheduling, classifying, policing, dropping, and marking.
An important difference between packet-switched and circuit-switched networks is that a packet-switched network is stateless, whereas a circuit-switched network (the telephone network, for example) must maintain state. Packet-switched networks, like IP, are designed to be stateless; in fact, statelessness is a fundamental strength of IP.
The drawback of statelessness is that different types of flows cannot be distinguished. With traffic control, however, an administrator can queue and differentiate packets based on their attributes. It can even be used to simulate a circuit-switched network, making a stateless network behave like a stateful one.
There are many practical reasons to consider traffic control, and many meaningful scenarios in which to apply it. What follows are examples of problems that traffic control can solve or alleviate; the list is by no means exhaustive, it merely introduces a few classes of problems that traffic control can address.
Commonly used traffic control solutions
A traffic control system, depending on the functionality it needs to implement, broadly consists of the following components:
Of these, the classifier is not mandatory; classless traffic control systems, for example, do without one. The table below shows the corresponding components as implemented in Linux.
| traditional element | Linux component |
| --- | --- |
| shaping | The class offers shaping capabilities. |
| scheduling | A qdisc is a scheduler. Schedulers can be simple, such as the FIFO, or complex, containing classes and other qdiscs, such as HTB. |
| classifying | The filter object performs the classification through the agency of a classifier object. Strictly speaking, Linux classifiers cannot exist outside of a filter. |
| policing | A policer exists in the Linux traffic control implementation only as part of a filter. |
| dropping | To drop traffic requires a filter with a policer which uses "drop" as an action. |
| marking | The dsmark qdisc is used for marking. |
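To make the last three rows concrete, here is a minimal, hedged sketch of a policer: a u32 filter whose police action rate-limits matched traffic and drops the excess. It assumes an interface eth0 that already has a classful root qdisc at handle 1: with a class 1:1; the rates are illustrative.

```
# Police all IPv4 traffic on eth0 to 1mbit; excess packets are dropped.
# Assumes a classful qdisc with handle 1: (and class 1:1) already exists on eth0.
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip src 0.0.0.0/0 \
    police rate 1mbit burst 10k drop flowid 1:1
```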
Linux TC provides powerful functionality covering every aspect of traffic control. Before using it, let's briefly go through the logic behind it.
Glossary of Linux TC traffic control terms:
Queueing Discipline (qdisc)
An algorithm that manages the queue of a device, either incoming (ingress) or outgoing (egress).
root qdisc
The root qdisc is the qdisc attached to the device.
Classless qdisc
A qdisc with no configurable internal subdivisions.
Classful qdisc
A classful qdisc contains multiple classes. Some of these classes contain a further qdisc, which may again be classful, but need not be. According to the strict definition, pfifo_fast is classful, because it contains three bands which are, in fact, classes. However, from the user's configuration perspective, it is classless as the classes can't be touched with the tc tool.
Classes
A classful qdisc may have many classes, each of which is internal to the qdisc. A class, in turn, may have several classes added to it. So a class can have either a qdisc or another class as its parent. A leaf class is a class with no child classes. It has one qdisc attached to it, which is responsible for sending the data from that class. When you create a class, a fifo qdisc is attached to it; when you add a child class, this qdisc is removed. For a leaf class, the fifo qdisc can be replaced with another, more suitable qdisc. You can even replace it with a classful qdisc so you can add extra classes.
Classifier
Each classful qdisc needs to determine to which class it needs to send a packet. This is done using the classifier.
Filter
Classification can be performed using filters. A filter contains a number of conditions which if matched, make the filter match.
Scheduling
A qdisc may, with the help of a classifier, decide that some packets need to go out earlier than others. This process is called Scheduling, and is performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing.
Shaping
The process of delaying packets before they go out to make traffic conform to a configured maximum rate. Shaping is performed on egress. Colloquially, dropping packets to slow traffic down is also often called shaping.
Policing
Delaying or dropping packets in order to make traffic stay below a configured bandwidth. In Linux, policing can only drop a packet and not delay it - there is no 'ingress queue'.
Work-Conserving
A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to send one (in the case of an egress qdisc).
non-Work-Conserving
Some queues, like for example the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth. This means that they sometimes refuse to pass a packet, even though they have one available.
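A quick way to make these terms concrete is to list the corresponding objects on a live system; eth0 here is a placeholder:

```
tc qdisc show dev eth0     # the root qdisc and any inner qdiscs
tc class show dev eth0     # classes, if the root qdisc is classful
tc filter show dev eth0    # filters and the classifiers they employ
```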
The first thing to note is that Linux tc implements fine-grained control only in the egress direction; control of ingress is limited. In short: it controls sending, not receiving.
Now let's look at a few important concepts in the implementation:
Queues. The queue is the foundational concept of traffic control. Using queues together with other mechanisms, we can perform shaping, scheduling, and so on.
Token buckets. This is a crucial element. To control the dequeue rate, one approach is to directly count the packets or bytes leaving the queue, but achieving accuracy that way requires complex computation. The other approach, widely used in traffic control, is the token bucket: tokens are generated at a certain rate, and a packet or byte may be dequeued only after obtaining a token from the bucket.
To draw an analogy: a crowd of people is queueing to ride the tour cars at an amusement park. Imagine the cars arrive along a fixed route at a fixed rate, and everyone must wait for a car to arrive before boarding. The tour cars and the visitors correspond to tokens and packets; this mechanism is rate limiting, or traffic shaping: in any fixed period of time, only so many people get to ride.
Continuing the analogy, imagine a large number of tour cars parked at the station waiting for riders, but no visitors around. If a big crowd now arrives all at once, everyone can board immediately. Here the station is the bucket: a bucket holds a certain number of tokens, and all of them can be consumed at once, regardless of when packets arrive.
To complete the analogy: the tour cars arrive at the station at a fixed rate and, if nobody rides them, they fill the station up. That is, tokens enter the bucket at a fixed rate; if tokens go unused the bucket fills, and if they are consumed steadily it never fills. The token bucket is the key idea for handling applications that generate traffic bursts (such as HTTP).
The Token Bucket Filter queueing discipline (the TBF qdisc) is the classic example of traffic shaping (the TBF section has a diagram that helps the reader visualize the token bucket). TBF generates tokens at a given rate and sends data only when tokens are available in the bucket; the token is the basic idea behind shaping.
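As a minimal sketch of shaping with TBF (all values illustrative), the following caps eth0 at 1mbit, with a 10kB token bucket and at most 50ms of queueing latency:

```
# Shape egress traffic on eth0: 1mbit sustained rate, 10kB burst, 50ms latency bound
tc qdisc add dev eth0 root tbf rate 1mbit burst 10kb latency 50ms
```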
The main components of Linux tc are the qdisc, the class, and the filter.
Qdiscs come in two kinds, classful and classless. The difference is that a classful qdisc can contain multiple classes and therefore control traffic at a finer granularity.
Common classless qdiscs: choke, codel, p/bfifo, fq, fq_codel, gred, hhf, ingress, mqprio, multiq, netem, pfifo_fast, pie, red, rr, sfb, sfq, tbf. The Linux default is pfifo_fast.
Common classful qdiscs: ATM, CBQ, DRR, DSMARK, HFSC, HTB, PRIO, QFQ.
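Attaching a classless qdisc is a one-liner; for example, replacing the current root qdisc with sfq (a sketch, interface name illustrative):

```
# Replace whatever root qdisc eth0 currently has with SFQ
tc qdisc replace dev eth0 root sfq perturb 10
# Remove it again, restoring the kernel default
tc qdisc del dev eth0 root
```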
Classes exist only inside classful qdiscs (e.g. HTB and CBQ). A class can be quite elaborate: it may contain several child classes, or only a single child qdisc. In extremely complex traffic control setups, a class may even contain another classful qdisc.
Any class can have an arbitrary number of filters attached to it, which select a child class or apply a filter to reorder or drop the packets entering the class.
A leaf class is a terminal class in a qdisc: it contains a qdisc (pfifo by default) and has no child classes. Any class with child classes is an inner class, not a leaf class.
Linux filters let the user classify packets onto output queues using one or more filters. Each filter contains a classifier implementation; a common classifier is u32, which selects packets based on their attributes.
Every qdisc and every class needs a unique identifier, known as a handle. Handles are named in major:minor format; note that both parts are parsed as hexadecimal. Their use is shown concretely in the examples below.
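The hexadecimal parsing is easy to trip over: classid 1:10 names minor number 0x10 (decimal 16), not 10. A small sketch, assuming an HTB root qdisc at 1: already exists on eth0:

```
tc class add dev eth0 parent 1: classid 1:a  htb rate 1mbit   # minor 0xa  = decimal 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit   # minor 0x10 = decimal 16
```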
Next we focus on classful qdiscs and follow the flow of a packet through one.
flow within classful qdisc & class
When traffic enters a classful qdisc, it needs to be sent to any of the classes within - it needs to be 'classified'. To determine what to do with a packet, the so called 'filters' are consulted. It is important to know that the filters are called from within a qdisc, and not the other way around!
The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.
Besides containing other qdiscs, most classful qdiscs also perform shaping. This is useful to perform both packet scheduling (with SFQ, for example) and rate control. You need this in cases where you have a high speed interface (for example, ethernet) to a slower device (a cable modem).
How filters are used to classify traffic
Recapping, a typical hierarchy might look like this:
```
                     1:      root qdisc
                      |
                     1:1    child class
                   /  |  \
                  /   |   \
                 /    |    \
                /     |     \
             1:10   1:11   1:12   child classes
               |      |      |
               |     11:     |    leaf class
               |             |
              10:           12:   qdisc
             /   \         /   \
          10:1   10:2   12:1   12:2   leaf classes
```
But don't let this tree fool you! You should not imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.
A packet might get classified in a chain like this: 1: -> 1:1 -> 1:12 -> 12: -> 12:2
The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense. However, this is also possible: 1: -> 12:2
In this case, a filter attached to the root decided to send the packet directly to 12:2.
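A hedged sketch of that second case, using the hierarchy drawn above: a single u32 filter on the root qdisc sends SSH traffic straight to leaf class 12:2 (the port match is illustrative):

```
# Classify packets destined for port 22 directly into leaf class 12:2
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dport 22 0xffff flowid 12:2
```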
How packets are dequeued to the hardware
When the kernel decides that it needs to extract packets to send to the interface, the root qdisc 1: gets a dequeue request, which is passed to 1:1, which is in turn passed to 10:, 11: and 12:, each of which queries its siblings, and tries to dequeue() from them. In this case, the kernel needs to walk the entire tree, because only 12:2 contains a packet.
In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!
The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have a shaping outer qdisc, which does the shaping.
HTB is a classful qdisc that implements hierarchical, class-based traffic control; it is one of the most common traffic control configurations on Linux. Let's look at how to configure it:
Configuring HTB takes four steps: add the root qdisc, create the classes, attach qdiscs to the leaf classes, and add filters:
```
tc qdisc add dev eth0 root handle 1: htb default 30                    # add the root qdisc; "1:" is shorthand for "1:0"
tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k   # create a class under root 1:
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10            # attach a qdisc to each leaf class (the default is pfifo)
tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
# Add filters to direct traffic to the right classes:
U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
$U32 match ip dport 80 0xffff flowid 1:10                              # associate the filter with a class
$U32 match ip sport 25 0xffff flowid 1:20
```
The parameters used when creating the classes have the following meanings:
default
An optional parameter of the HTB qdisc, with a default value of 0. A value of 0 means unclassified traffic bypasses all of the classes attached to the root qdisc and is dequeued at full speed.
rate
Sets the minimum desired rate at which traffic is sent. This can be treated as the committed information rate (CIR), or the guaranteed bandwidth for a given leaf class.
ceil
Sets the maximum desired rate at which traffic is sent. The borrowing mechanism determines how this parameter is actually used; this rate can be called the "burst rate".
burst
The size of the rate bucket (see the token bucket section). HTB will dequeue burst bytes before more tokens arrive.
cburst
The size of the ceil bucket (see the token bucket section). HTB will dequeue cburst bytes before more ctokens arrive.
quantum
The key parameter by which HTB controls its borrowing mechanism. Normally HTB computes a sensible quantum itself rather than taking one from the user. Even slight adjustments to this value can have dramatic effects on borrowing and shaping, because HTB uses it both to distribute traffic among the child classes (at a rate above rate and below ceil) and to dequeue data from each child class.
r2q
Normally the quantum is computed by HTB itself; the user can set this parameter to help HTB compute an optimal quantum value for a class.
mtu
The maximum packet size HTB accounts for accurately; packets larger than this are counted as giants and the rates become inaccurate (see the statistics discussion below; the default is 1600 bytes).
prio
The priority of the class: when excess bandwidth is distributed, classes with a lower prio value are offered it first.
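A short sketch pulling several of these parameters together (interface and values are illustrative, not taken from the example above): default and r2q are set on the qdisc, rate/ceil/burst/prio on the class:

```
tc qdisc add dev eth1 root handle 1: htb default 20 r2q 5
tc class add dev eth1 parent 1: classid 1:20 htb rate 2mbit ceil 8mbit burst 15k prio 1
```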
The usual way to shape inbound traffic is to redirect the interface's ingress traffic to an ifb device and then shape on the ifb device's egress, thereby controlling the inbound direction indirectly. A simple example:
```
modprobe ifb                      # the ifb module must be loaded
ip link set dev ifb0 up txqueuelen 1000
tc qdisc add dev eth1 ingress     # add the ingress qdisc
tc filter add dev eth1 parent ffff: protocol ip u32 match u32 0 0 \
    flowid 1:1 action mirred egress redirect dev ifb0    # redirect traffic to ifb0
tc qdisc add dev ifb0 root netem delay 50ms loss 1%      # shape on ifb0; netem is used here, but qdiscs, classes, and filters can be configured just as on egress
```
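To check that the redirect is actually taking effect, watch the counters on ifb0 grow while traffic arrives on eth1 (interface names as in the example above):

```
tc -s qdisc show dev ifb0    # Sent bytes/pkts should increase with inbound traffic
```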
The tc tool allows you to gather statistics from queueing disciplines in Linux. Unfortunately, the statistics are not explained by the authors, so you often can't use them. Here I try to help you understand HTB's stats.

First, the stats for HTB as a whole. The snippet below was taken during the simulation from chapter 3:

```
# tc -s -d qdisc show dev eth0
qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0)
qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0)
qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0)
qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251)
```

The first three disciplines are HTB's children; let's ignore them, as pfifo stats are self-explanatory. overlimits tells you how many times the discipline delayed a packet. direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

```
# tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b
 cburst 2Kb/8 mpu 0b quantum 10240 level 3
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0)
 rate 70196bps 141pps
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b
 cburst 2Kb/8 mpu 0b quantum 4096 level 2
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0)
 rate 70196bps 141pps
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b
 cburst 2Kb/8 mpu 0b quantum 2867 level 0
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358)
 rate 14635bps 29pps
 lended: 2939 borrowed: 1599 giants: 0
```

I deleted the 1:11 and 1:12 classes to make the output shorter. As you can see, the parameters we set are all there, along with the level and the DRR quantum information. overlimits shows how many times the class was asked to send a packet but couldn't due to rate/ceil constraints (currently counted for leaves only). rate and pps tell you the actual (10-second averaged) rate going through the class; it is the same rate as used by gating. lended is the number of packets donated by this class (from its own rate), while borrowed counts packets for which we borrowed from the parent. Lends are always computed class-locally, while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented). giants is the number of packets larger than the mtu set in the tc command; HTB will work with these, but the rates will not be accurate at all. Add mtu to your tc command (it defaults to 1600 bytes).
On recent kernels, HTB's per-class rate estimator (the source of the rate/pps figures above) is disabled by default; it can be turned on with:

```
echo 1 > /sys/module/sch_htb/parameters/htb_rate_est
```
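To check the current value (a small sketch; the sysfs path is as above):

```
cat /sys/module/sch_htb/parameters/htb_rate_est    # 1 = estimator enabled, 0 = disabled
```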