A Look at Virtual Network Device qdiscs Through veth

Background

A while ago, while testing Docker's network performance, I ran into a veth performance problem. I later submitted a PR to Docker upstream (see "set tx_queuelen to 0 when create veth device"), which sparked some discussion. After that, Red Hat networking expert Jesper Brouer weighed in with a detailed analysis of the issue.

veth qdisc

As you can see, veth devices have a qdisc queue, whereas loopback and bridge devices do not; see the br_dev_setup function.

Kernel implementation

When a device is registered (created), its qdisc is set to noop_qdisc: register_netdevice -> dev_init_scheduler

void dev_init_scheduler(struct net_device *dev)
{
    dev->qdisc = &noop_qdisc;
    netdev_for_each_tx_queue(dev, dev_init_scheduler_queue, &noop_qdisc);
    dev_init_scheduler_queue(dev, &dev->rx_queue, &noop_qdisc);

    setup_timer(&dev->watchdog_timer, dev_watchdog, (unsigned long)dev);
}

When the device is brought up, if no qdisc has been configured, the default pfifo_fast queue is attached: dev_open -> dev_activate

void dev_activate(struct net_device *dev)
{
    int need_watchdog;

    /* No queueing discipline is attached to device;
       create default one i.e. pfifo_fast for devices,
       which need queueing and noqueue_qdisc for
       virtual interfaces
     */

    if (dev->qdisc == &noop_qdisc)
        attach_default_qdiscs(dev);
...
}

static void attach_default_qdiscs(struct net_device *dev)
{
    struct netdev_queue *txq;
    struct Qdisc *qdisc;

    txq = netdev_get_tx_queue(dev, 0);

    if (!netif_is_multiqueue(dev) || dev->tx_queue_len == 0) {
        netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
        dev->qdisc = txq->qdisc_sleeping;
        atomic_inc(&dev->qdisc->refcnt);
    } else { /* multiqueue */
        qdisc = qdisc_create_dflt(dev, txq, &mq_qdisc_ops, TC_H_ROOT);
        if (qdisc) {
            qdisc->ops->attach(qdisc);
            dev->qdisc = qdisc;
        }
    }
}

static void attach_one_default_qdisc(struct net_device *dev,
                     struct netdev_queue *dev_queue,
                     void *_unused)
{
    struct Qdisc *qdisc;

    if (dev->tx_queue_len) {
        qdisc = qdisc_create_dflt(dev, dev_queue,
                      &pfifo_fast_ops, TC_H_ROOT);
        if (!qdisc) {
            printk(KERN_INFO "%s: activation failed\n", dev->name);
            return;
        }

        /* Can by-pass the queue discipline for default qdisc */
        qdisc->flags |= TCQ_F_CAN_BYPASS;
    } else {
        qdisc =  &noqueue_qdisc;
    }
    dev_queue->qdisc_sleeping = qdisc;
}

Creating noqueue

I first tried deleting the device's default pfifo_fast queue directly, which fails:

# tc qdisc del dev vethd4ea root
RTNETLINK answers: No such file or directory
# tc  -s qdisc ls dev vethd4ea
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 29705382 bytes 441562 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0

Later I saw that Jesper Brouer had described a way to replace the default queue; I tried it and it worked.

Replace the default qdisc

# tc qdisc replace dev vethd4ea root pfifo limit 100
# tc  -s qdisc ls dev vethd4ea                      
qdisc pfifo 8001: root refcnt 2 limit 100p
 Sent 264 bytes 4 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
# ip link show vethd4ea
9: vethd4ea: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc pfifo master docker0 state UP mode DEFAULT qlen 1000
link/ether 3a:15:3b:e1:d7:6d brd ff:ff:ff:ff:ff:ff

Change the queue length

# ifconfig vethd4ea txqueuelen 0

Delete the qdisc

# tc qdisc del dev vethd4ea root                    
# ip link show vethd4ea                
9: vethd4ea: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT 
link/ether 3a:15:3b:e1:d7:6d brd ff:ff:ff:ff:ff:ff

As you can see, the UP veth device was successfully switched to noqueue.

Summary

In short, attaching a default qdisc to virtual network devices is not very reasonable: it makes the VM's (or container's) network bottleneck show up prematurely at the qdisc rather than at the real physical device (unless the application actually needs a qdisc). For more details, see here.

This article is reposted from https://hustcat.github.io/veth/
