Scratching the Surface of Linux Traffic Control (TC)

1.1 What is traffic control

Traffic control is the collective name for the packet receiving and transmitting mechanisms and queuing systems on a router. It covers deciding which packets to accept, and at what rate, on an input interface, and which packets to transmit, in what order and at what rate, on an output interface.

Traditional traffic control involves shaping, scheduling, classifying, policing, dropping, and marking.

  • Shaping. A shaper delays packets so that traffic stays at or below a given rate. Shaping means delaying packets in an output queue before transmission, then sending them at a configured rate, keeping network traffic below that rate; this is what most users mean by traffic control.
  • Scheduling. Scheduling is arranging the packets in input and output queues. The most common scheduler is the FIFO (first in, first out); more broadly, any traffic control on an output queue can be called scheduling, since packets are being arranged for output.
  • Classifying. Classifying means partitioning traffic for different treatment, for example splitting it into different output queues. A network device can classify packets in several ways while receiving, routing, and transmitting them. Classifying can include marking the packet, which can be done by a single control entity at the network edge or at every hop.
  • Policing. Policing, as an element of traffic control, is a mechanism for limiting traffic. It is commonly used on network edge devices so that a node cannot consume more than its allocated bandwidth. A policer accepts packets up to a certain rate and takes an action on traffic exceeding it. The harshest action is to drop the packet, although the packet could instead be reclassified (a minimal ingress-policing sketch follows this list).
  • Dropping. Dropping is selecting, by some mechanism, which packets to discard, e.g. RED.
  • Marking. Marking writes a DSCP value into the packet, which other routers in an administered network can recognize and act on (typically for DiffServ, Differentiated Services).
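
To make policing and dropping concrete, here is a minimal sketch that drops all inbound traffic above 1mbit by attaching a policer to a catch-all filter on the ingress qdisc (covered later in this article); the interface name and the rates are placeholders:

#tc qdisc add dev eth0 ingress
#tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 police rate 1mbit burst 100k drop flowid :1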

1.2 爲何須要流量控制

An important difference between packet-switched and circuit-switched networks is that a packet-switched network is stateless, while a circuit-switched network (such as the telephone network) must maintain state. IP networks are packet-switched and designed to be stateless; in fact, statelessness is a fundamental strength of IP.

The drawback of statelessness is that different types of flows cannot be distinguished. With traffic control, however, an administrator can queue packets and treat them differently based on their attributes. Traffic control can even be used to simulate circuit-switched behavior, emulating a stateful network on top of a stateless one.

There are many practical reasons to consider traffic control, and it has many worthwhile applications. Below are examples of problems that traffic control can solve or improve; the list is by no means complete, it merely introduces a few classes of problems.

Common traffic control solutions

  • Limit total bandwidth to a known rate with TBF, or with HTB plus child classes (see the one-line TBF sketch after this list).
  • Limit the bandwidth of a particular user, service, or client with HTB classes and classifying combined with filters.
  • Maximize TCP throughput on an asymmetric link by raising the priority of ACK packets, or by using wondershaper.
  • Reserve bandwidth for a particular application or user with HTB plus child classes and classifying.
  • Prefer latency-sensitive applications with the PRIO (priority) mechanism inside HTB classes.
  • Manage excess bandwidth with HTB's borrowing mechanism.
  • Allow an equitable distribution of all bandwidth with HTB borrowing.
  • Ensure that a particular type of traffic is dropped with a policer plus a filter with a drop action.
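
For instance, the first item in the list needs only a single command. A minimal sketch, assuming eth0 and illustrative numbers:

#tc qdisc add dev eth0 root tbf rate 1mbit burst 32kb latency 400ms   # cap egress at 1mbit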

1.3 How to do traffic control

1.3.1 General components of a traffic control system

Depending on the functionality it needs to implement, a traffic control system roughly comprises the following components:

  • Scheduler
  • Classifier (optional)
  • Policer
  • Filter

The classifier is not mandatory; some classless traffic control systems do without it. The table below maps these traditional elements to the corresponding Linux components.

traditional element | Linux component
shaping             | The class offers shaping capabilities.
scheduling          | A qdisc is a scheduler. Schedulers can be simple such as the FIFO, or complex, containing classes and other qdiscs, such as HTB.
classifying         | The filter object performs the classification through the agency of a classifier object. Strictly speaking, Linux classifiers cannot exist outside of a filter.
policing            | A policer exists in the Linux traffic control implementation only as part of a filter.
dropping            | To drop traffic requires a filter with a policer which uses "drop" as an action.
marking             | The dsmark qdisc is used for marking.

1.3.2 Linux TC

Linux TC provides powerful functionality covering every aspect of traffic control. Before using it, let's briefly go over the logic behind it.

Relevant terminology for Linux TC traffic control:

  • Queueing Discipline (qdisc)

    An algorithm that manages the queue of a device, either incoming (ingress) or outgoing (egress).

  • root qdisc

    The root qdisc is the qdisc attached to the device.

  • Classless qdisc

    A qdisc with no configurable internal subdivisions.

  • Classful qdisc

    A classful qdisc contains multiple classes. Some of these classes contain a further qdisc, which may again be classful, but need not be. According to the strict definition, pfifo_fast is classful, because it contains three bands which are, in fact, classes. However, from the user's configuration perspective, it is classless as the classes can't be touched with the tc tool.

  • Classes

    A classful qdisc may have many classes, each of which is internal to the qdisc. A class, in turn, may have several classes added to it. So a class can have either a qdisc or another class as its parent. A leaf class is a class with no child classes. It has one qdisc attached to it, and this qdisc is responsible for sending the data from that class. When you create a class, a fifo qdisc is attached to it; when you add a child class, this qdisc is removed. For a leaf class, the fifo qdisc can be replaced with another, more suitable qdisc. You can even replace it with a classful qdisc so you can add extra classes.

  • Classifier

    Each classful qdisc needs to determine to which class it needs to send a packet. This is done using the classifier.

  • Filter

    Classification can be performed using filters. A filter contains a number of conditions which if matched, make the filter match.

  • Scheduling

    A qdisc may, with the help of a classifier, decide that some packets need to go out earlier than others. This process is called Scheduling, and is performed for example by the pfifo_fast qdisc mentioned earlier. Scheduling is also called 'reordering', but this is confusing.

  • Shaping

    The process of delaying packets before they go out to make traffic conform to a configured maximum rate. Shaping is performed on egress. Colloquially, dropping packets to slow traffic down is also often called shaping.

  • Policing

    Delaying or dropping packets in order to make traffic stay below a configured bandwidth. In Linux, policing can only drop a packet and not delay it - there is no 'ingress queue'.

  • Work-Conserving

    A work-conserving qdisc always delivers a packet if one is available. In other words, it never delays a packet if the network adaptor is ready to send one (in the case of an egress qdisc).

  • non-Work-Conserving

    Some queues, like for example the Token Bucket Filter, may need to hold on to a packet for a certain time in order to limit the bandwidth. This means that they sometimes refuse to pass a packet, even though they have one available.
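
To tie these terms together, a minimal sketch (eth0 is a placeholder): replace the device's root qdisc with a classless pfifo, list it, then restore the default.

#tc qdisc add dev eth0 root pfifo limit 100   # attach a classless qdisc as the root qdisc
#tc qdisc show dev eth0                       # shows the pfifo and its auto-assigned handle
#tc qdisc del dev eth0 root                   # the device falls back to its default qdisc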

1.3.3 Linux TC in detail

The first thing to note: Linux tc implements thorough control only in the egress direction, while ingress control is limited. In short, it controls sending, not receiving.

Now for several important concepts in the implementation:

  • Queues. Queues are the fundamental concept of traffic control. Using queues together with other mechanisms, we can shape, schedule, and so on.

  • Token buckets. This is a very important element. To control the dequeue rate, one approach is to directly count the packets or bytes leaving the queue, but keeping that accurate requires complex calculation. The approach widely used in traffic control instead is the token bucket: tokens are generated at a fixed rate, and a packet or byte must take a token from the bucket before it can dequeue.

    An analogy: a crowd is queuing for an amusement park ride. Imagine the cars arrive along a fixed track at a fixed rate, and each person must wait for a car to arrive before riding. The cars and riders correspond to tokens and packets; this mechanism is rate limiting, or shaping: within a fixed period, only so many people get to ride.

    Extending the analogy, imagine many cars sitting at the station waiting for riders, but no riders at all right now. If a large crowd arrives at once, everyone can board immediately. The station is the bucket: a bucket holds a certain number of tokens, which can all be consumed at once regardless of when the packets arrive.

    To complete the analogy: cars arrive at the station at a fixed rate and pile up there if nobody rides; that is, tokens enter the bucket at a fixed rate, and if no tokens are used the bucket fills up, while if tokens are consumed continuously it never fills. Token buckets are the key idea for handling applications that generate bursty traffic, such as HTTP.

    The Token Bucket Filter queuing discipline (TBF qdisc) is the classic example of shaping (the TBF section has a diagram that helps visualize the token bucket). TBF generates tokens at a given rate and only transmits data when a token is available; the token is the fundamental idea of shaping. (A worked example of the bucket arithmetic follows this list.)
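
    As a worked example of the bucket arithmetic: with rate 1mbit and a bucket (burst) of 32 kbytes, a full bucket lets 32 × 8 = 256 kbit leave the queue immediately, about a quarter of a second's worth of traffic at the configured rate; once the bucket empties, sustained throughput falls back to 1mbit because every byte must wait for fresh tokens.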

The main components of Linux tc are the qdisc, the class, and the filter.

  • qdiscs are divided into classful and classless qdiscs. The difference is that a classful qdisc can contain multiple classes, giving finer-grained control over traffic.

    • Common classless qdiscs: choke, codel, p/bfifo, fq, fq_codel, gred, hhf, ingress, mqprio, multiq, netem, pfifo_fast, pie, red, rr, sfb, sfq, tbf. The Linux default is pfifo_fast.

    • Common classful qdiscs: ATM, CBQ, DRR, DSMARK, HFSC, HTB, PRIO, QFQ.

  • Classes exist only within a classful qdisc (e.g., HTB and CBQ). Classes can be complex: a class may contain several child classes, or only a single child qdisc. In extremely complex traffic control setups, a class may itself contain a classful qdisc.

    Any class can have any number of filters attached to it, which select a child class, or reorder or drop packets entering the class.

    A leaf class is a terminal class in a qdisc: it contains a qdisc (pfifo by default) and no child classes. Any class that contains child classes is an inner class rather than a leaf class.

  • Linux filters allow the user to classify packets onto output queues using one or more filters. A filter contains a classifier implementation; a common classifier is u32, which lets the user select packets based on their attributes.

Every qdisc and class needs a unique identifier, known as a handle. Handles use the major:minor naming format, and note that both numbers are parsed as hexadecimal. Their use is illustrated concretely in the examples below.
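
For example, because handles are parsed as hexadecimal, classid 1:10 means major 1, minor 0x10 (decimal 16), so 1:a and 1:10 are different classes. A short sketch (eth0 and the rate are placeholders):

#tc qdisc add dev eth0 root handle 1: htb                      # qdisc handle 1: (major 1, minor 0)
#tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit   # class 1:10 (major 1, minor 0x10)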

Next we focus on classful qdiscs and look at how a packet flows through them.

  • flow within classful qdisc & class

    When traffic enters a classful qdisc, it needs to be sent to any of the classes within - it needs to be 'classified'. To determine what to do with a packet, the so called 'filters' are consulted. It is important to know that the filters are called from within a qdisc, and not the other way around!

    The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

    Besides containing other qdiscs, most classful qdiscs also perform shaping. This is useful to perform both packet scheduling (with SFQ, for example) and rate control. You need this in cases where you have a high speed interface (for example, ethernet) to a slower device (a cable modem).

  • How filters are used to classify traffic

    Recapping, a typical hierarchy might look like this:

                 1:    root qdisc
                  |
                 1:1   child class
               /  |  \
              /   |   \
             /    |    \
            /     |     \
         1:10   1:11   1:12    child classes
           |      |      |
           |     11:     |     leaf class
           |             |
          10:           12:    qdisc
         /    \        /    \
      10:1   10:2   12:1   12:2   leaf classes

But don't let this tree fool you! You should not imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.

A packet might get classified in a chain like this: 1: -> 1:1 -> 1:12 -> 12: -> 12:2

The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense. However, this is also possible: 1: -> 12:2

In this case, a filter attached to the root decided to send the packet directly to 12:2.

  • How packets are dequeued to the hardware

    When the kernel decides that it needs to extract packets to send to the interface, the root qdisc 1: gets a dequeue request, which is passed to 1:1, which is in turn passed to 10:, 11: and 12:, each of which queries its siblings, and tries to dequeue() from them. In this case, the kernel needs to walk the entire tree, because only 12:2 contains a packet.

    In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!

    The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have a shaping outer qdisc, which does the shaping.

1.3.4 Configuring and using HTB

HTB is a classful qdisc providing hierarchical, class-based traffic control, and it is one of the most commonly used tc configurations on Linux. Let's look at how to configure it:

Configuring HTB takes four steps:

  • Create the root qdisc
  • Create classes
  • Create filters and attach them to classes
  • Add leaf class qdiscs (optional)
#tc qdisc add dev eth0 root handle 1: htb default 30   # add the root qdisc; 1: is shorthand for 1:0
#tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k   # create a class under the root 1:
#tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
#tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
#tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k
#tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10   # attach a qdisc to each leaf class (the default is pfifo)
#tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
#tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10
# Add filters to steer traffic directly into the appropriate classes:
#U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
#$U32 match ip dport 80 0xffff flowid 1:10   # attach the filter to a class
#$U32 match ip sport 25 0xffff flowid 1:20
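
Once the hierarchy is built, it can be inspected or removed as a whole; deleting the root qdisc tears down every class, filter, and leaf qdisc under it:

#tc -s qdisc show dev eth0   # inspect the hierarchy and its counters (see section 1.3.6)
#tc qdisc del dev eth0 root  # remove the entire hierarchy in one step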

The parameters used when creating classes have the following meanings:

default

An optional parameter of the HTB qdisc, defaulting to 0. A value of 0 means that unclassified traffic bypasses all the classes attached to the root qdisc and is dequeued at full hardware speed.

rate

Sets the minimum desired rate for the traffic. It can be treated as the committed information rate (CIR), i.e., the guaranteed bandwidth for a given leaf class.

ceil

Sets the maximum desired rate for the traffic. The borrowing mechanism determines how this parameter actually takes effect; this rate could be called the "burst bandwidth".

burst

The size of the rate bucket (see the token bucket section). HTB dequeues up to burst bytes before waiting for more tokens to arrive.

cburst

The size of the ceil bucket (see the token bucket section). HTB dequeues up to cburst bytes before waiting for more ctokens to arrive.

quantum

The key parameter by which HTB controls borrowing. Normally HTB computes a suitable quantum itself rather than having the user set it. Even slight changes to this value can have dramatic effects on borrowing and shaping, because HTB uses it both to distribute traffic among child classes (each should receive at least rate and at most ceil) and to dequeue data from each child class.

r2q

Normally quantum is computed by HTB itself; with this parameter the user supplies a value from which HTB derives an optimal quantum for each class (quantum = rate / r2q, in bytes).

mtu

The maximum packet size accounted for in the rate calculations; packets larger than this are counted as "giants" in the statistics and make the computed rates inaccurate (defaults to 1600 bytes).

prio

The priority of the class. When excess bandwidth is distributed, classes with lower prio values are offered it first.
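
As a sketch of the quantum/r2q relationship described above: HTB derives quantum as the class rate in bytes per second divided by r2q, so with the default r2q of 10 and rate 5mbit, quantum = 5,000,000 / 8 / 10 = 62,500 bytes. r2q is set on the qdisc itself:

#tc qdisc add dev eth0 root handle 1: htb r2q 10 default 30   # r2q 10 is the default value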

1.3.5 Ingress traffic control

The usual way to control ingress traffic is to redirect the interface's inbound traffic to an ifb device and then apply traffic control on the ifb device's egress, indirectly controlling the inbound direction. A simple example:

#modprobe ifb   # the ifb module must be loaded

#ip link set dev ifb0 up txqueuelen 1000

#tc qdisc add dev eth1 ingress   # add the ingress qdisc

#tc filter add dev eth1 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0   # redirect traffic to ifb

#tc qdisc add dev ifb0 root netem delay 50ms loss 1%   # apply traffic control on ifb; netem is used here, but you can configure qdisc, class, and filter just as in the egress direction
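
To confirm the redirection is working, check both devices (a sketch, assuming the commands above):

#tc filter show dev eth1 ingress   # shows the u32 match and its mirred redirect action
#tc -s qdisc show dev ifb0         # netem's sent/dropped counters grow as redirected traffic arrives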

1.3.6 Viewing statistics

  • Use tc qdisc show dev xx to view qdiscs
  • Use tc class show dev xx to view classes
  • Use tc filter show dev xx to view filters. Note that these commands show the root (egress) rules by default; to view the ingress rules, use tc filter show dev xx ingress.
The tc tool allows you to gather statistics on queuing disciplines in Linux. Unfortunately, the statistics are not explained by their authors, so you often can't make use of them. Here is some help in understanding HTB's stats.
First, the whole-HTB stats. The snippet below is taken during the simulation from chapter 3.

# tc -s -d qdisc show dev eth0
 qdisc pfifo 22: limit 5p
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc pfifo 21: limit 5p
 Sent 2891500 bytes 5783 pkts (dropped 820, overlimits 0) 

 qdisc pfifo 20: limit 5p
 Sent 1760000 bytes 3520 pkts (dropped 3320, overlimits 0) 

 qdisc htb 1: r2q 10 default 1 direct_packets_stat 0
 Sent 4651500 bytes 9303 pkts (dropped 4140, overlimits 34251) 

The first three disciplines are HTB's children. Let's ignore them, as the PFIFO stats are self-explanatory.
overlimits tells you how many times the discipline delayed a packet. direct_packets_stat tells you how many packets were sent through the direct queue. The other stats are self-explanatory. Let's look at the class stats:

# tc -s -d class show dev eth0
class htb 1:1 root prio 0 rate 800Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 10240 level 3 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 6872 borrowed: 0 giants: 0

class htb 1:2 parent 1:1 prio 0 rate 320Kbit ceil 4000Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 4096 level 2 
 Sent 5914000 bytes 11828 pkts (dropped 0, overlimits 0) 
 rate 70196bps 141pps 
 lended: 1017 borrowed: 6872 giants: 0

class htb 1:10 parent 1:2 leaf 20: prio 1 rate 224Kbit ceil 800Kbit burst 2Kb/8 mpu 0b 
    cburst 2Kb/8 mpu 0b quantum 2867 level 0 
 Sent 2269000 bytes 4538 pkts (dropped 4400, overlimits 36358) 
 rate 14635bps 29pps 
 lended: 2939 borrowed: 1599 giants: 0

The 1:11 and 1:12 classes were deleted to make the output shorter. As you can see, the parameters we set are all there, along with the level and DRR quantum information.
overlimits shows how many times the class was asked to send a packet but couldn't due to its rate/ceil constraints (currently counted for leaves only).
rate and pps tell you the actual (10-second averaged) rate going through the class. This is the same rate as used by gating.
lended is the number of packets donated by this class (from its rate), and borrowed counts packets for which we borrowed from the parent. Lends are always computed class-locally, while borrows are transitive (when 1:10 borrows from 1:2, which in turn borrows from 1:1, both the 1:10 and 1:2 borrow counters are incremented).
giants is the number of packets larger than the mtu set in the tc command. HTB will work with these, but the rates will not be accurate at all. Add mtu to your tc command (it defaults to 1600 bytes).

1.3.7 Miscellaneous notes

  • Can't see the rate and similar statistics when viewing stats? For performance reasons the kernel disables rate estimation by default; it can be enabled with echo 1 > /sys/module/sch_htb/parameters/htb_rate_est.
