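The low-level NIC driver underneath the network protocol stack is a topic that cannot be avoided. Using the Intel PRO/100 driver (e100) as an example, this article analyzes how a network card driver is implemented in Linux, and along the way discusses some aspects of the PCI bus. Note that the kernel's PCI framework only manages the PCI bus system as such; it does not concern itself with what functionality a particular PCI device provides.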
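PCI bus
The PCI bus specification sets the following design goals:
Device identification
Every device is uniquely identified by a 16-bit number split 8:5:3, that is, an 8-bit bus number, a 5-bit device (slot) number and a 3-bit function number. In Linux, each such device is represented by an instance of struct pci_dev.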
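Address spaces
PCI provides three address spaces: I/O space, memory space, and configuration space.
Configuration information
Every device has a configuration space that must be 256 bytes long, although only the first 64 bytes are standardized. The Vendor ID and Device ID uniquely identify the manufacturer and the device type. Together, the two IDs are usually called the device's signature. Two similarly named additional fields, Subsystem Vendor ID and Subsystem Device ID, can also be used to describe the device's generic interface more precisely. The Revision ID distinguishes between revision levels of the same device. The 24-bit Class Code field assigns the device to a functional group. Its top 8 bits give the base class, and the remaining 16 bits identify a subclass of that base class. In practice, these 16 bits are an 8-bit subclass followed by an 8-bit programming interface, matching the "(base,sub,prog-if)" comment on pci_dev.class.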
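When a PCI device is powered on, the hardware remains inactive. In other words, the device responds only to configuration transactions. At power-on, none of the device's memory or I/O ports are mapped into the computer's address space. Other device-specific functions, such as interrupt reporting, are also disabled. Fortunately, every PCI motherboard is equipped with PCI-aware firmware. This firmware provides access to each device's configuration address space by reading and writing registers in the PCI controller. At boot time, the firmware (or the Linux kernel) performs configuration transactions on every PCI peripheral to allocate a safe place for each address region it offers. By the time a driver accesses the device, its memory and I/O regions have already been mapped into the processor's address space. The driver can change this default configuration, but in practice it never needs to.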
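The 8:5:3 split and the configuration transactions described above can be made concrete with a small user-space sketch. It assumes an x86 machine that uses the legacy configuration mechanism #1 (ports 0xCF8/0xCFC); PCI Express systems normally use memory-mapped configuration (ECAM) instead. The helper names are hypothetical. The kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros apply the same shifts to devfn.

```c
#include <assert.h>
#include <stdint.h>

/* devfn packs the 5-bit device (slot) and 3-bit function numbers;
 * the 8-bit bus number is kept separately (pci_bus.number). */
static uint8_t devfn_pack(uint8_t slot, uint8_t func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

static uint8_t devfn_slot(uint8_t devfn) { return (devfn >> 3) & 0x1f; }
static uint8_t devfn_func(uint8_t devfn) { return devfn & 0x07; }

/* x86 configuration mechanism #1: this value is written to CONFIG_ADDRESS
 * (I/O port 0xCF8); the selected dword is then read or written through
 * CONFIG_DATA (port 0xCFC). The port I/O itself is omitted here. */
static uint32_t conf1_address(uint8_t bus, uint8_t devfn, uint8_t reg)
{
	return 0x80000000u		/* bit 31: enable configuration cycle */
	     | ((uint32_t)bus << 16)	/* bits 23..16: bus number            */
	     | ((uint32_t)devfn << 8)	/* bits 15..8: device and function    */
	     | (reg & 0xFCu);		/* bits 7..2: dword-aligned register  */
}
```

For example, lspci shows bus 2, slot 3, function 1 as "02:03.1". Its devfn is 0x19.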
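The kernel's PCI framework
The framework that the kernel provides for PCI drivers revolves around three data structures: the PCI bus, the PCI device, and the PCI driver.
PCI bus: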
struct pci_bus {
	struct list_head node; /* node in list of buses */
	struct pci_bus *parent; /* parent bus this bridge is on */
	struct list_head children; /* list of child buses */
	struct list_head devices; /* list of devices on this bus */
	struct pci_dev *self; /* bridge device as seen by parent */
	struct list_head slots; /* list of slots on this bus */
	struct resource *resource[PCI_BRIDGE_RESOURCE_NUM];
	struct list_head resources; /* address space routed to this bus */

	struct pci_ops *ops; /* configuration access functions */
	void *sysdata; /* hook for sys-specific extension */
	struct proc_dir_entry *procdir; /* directory entry in /proc/bus/pci */

	unsigned char number; /* bus number */
	unsigned char primary; /* number of primary bridge */
	unsigned char secondary; /* number of secondary bridge */
	unsigned char subordinate; /* max number of subordinate buses */
	unsigned char max_bus_speed; /* enum pci_bus_speed */
	unsigned char cur_bus_speed; /* enum pci_bus_speed */

	char name[48];

	unsigned short bridge_ctl; /* manage NO_ISA/FBB/et al behaviors */
	pci_bus_flags_t bus_flags; /* Inherited by child busses */
	struct device *bridge;
	struct device dev;
	struct bin_attribute *legacy_io; /* legacy I/O for this bus */
	struct bin_attribute *legacy_mem; /* legacy mem */
	unsigned int is_added:1;
};
extern struct list_head pci_root_buses; /* list of all known PCI buses */
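pci_root_buses is the head of the list of root buses, one per host bridge. Every other bus is linked into its parent's children list, so all known PCI buses can be reached starting from pci_root_buses.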
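struct pci_bus is divided into several functional parts. The first part contains all the members needed to link the bus with other PCI data structures:
- node is a list element that links the bus into a list: pci_root_buses for a root bus, or its parent's children list otherwise.
- parent points to the data structure of the next higher bus. Each bus can have only one parent.
- The subordinate (child) buses of a bus are managed in a list headed by children.
- All devices attached to the bus are managed in a list headed by devices.
- Except for bus 0, every bus in the system is addressed through a PCI bridge, which behaves like an ordinary PCI device. self points to the pci_dev instance of that bridge.
- The resource array holds the address ranges (I/O and memory windows) that the bus occupies.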
struct resource {
	resource_size_t start;
	resource_size_t end;
	const char *name;
	unsigned long flags;
	struct resource *parent, *sibling, *child;
};
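struct resource describes a range whose end is inclusive, and resources form a tree through parent, sibling and child. The sketch below shows, in user space, the two invariants that tree relies on: a child lies inside its parent, and siblings do not overlap. struct res and the helper names are hypothetical stand-ins. The size arithmetic mirrors the kernel's resource_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct resource (no name, flags or links). */
struct res {
	uint64_t start;
	uint64_t end;		/* inclusive, as in the kernel */
};

/* end is inclusive, hence the + 1. */
static uint64_t res_size(const struct res *r)
{
	assert(r->end >= r->start);
	return r->end - r->start + 1;
}

/* A child region (e.g. a device BAR) must lie entirely inside the
 * window of its parent (e.g. the bus behind a bridge). */
static bool res_contains(const struct res *parent, const struct res *child)
{
	return child->start >= parent->start && child->end <= parent->end;
}

/* Two siblings in the tree must not overlap. */
static bool res_overlap(const struct res *a, const struct res *b)
{
	return a->start <= b->end && b->start <= a->end;
}
```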
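The ops member holds a set of function pointers used to access configuration space. sysdata allows the bus structure to be associated with hardware-specific functions. procdir provides an interface to the proc filesystem, so that information about each bus can be exported to user space under /proc/bus/pci. number is a sequential number that uniquely identifies the bus (within its PCI domain). subordinate is the highest bus number reachable below this bus; the kernel comment calls it the "max number of subordinate buses". name holds a textual name for the bus.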
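The primary, secondary and subordinate fields describe a bridge's bus-number window. The bridge forwards configuration transactions for any bus in [secondary, subordinate] downstream. Below is a minimal sketch of that check. It uses a hypothetical struct that only mirrors the three fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bus-number registers of a PCI-to-PCI bridge. */
struct bridge_bus_range {
	uint8_t primary;	/* bus the bridge sits on                 */
	uint8_t secondary;	/* bus directly behind the bridge         */
	uint8_t subordinate;	/* highest bus number reachable behind it */
};

/* True when a configuration transaction for "bus" must be forwarded
 * through this bridge. */
static bool bridge_claims_bus(const struct bridge_bus_range *br, uint8_t bus)
{
	assert(br->secondary <= br->subordinate);
	return bus >= br->secondary && bus <= br->subordinate;
}
```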
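When the PCI subsystem is initialized, a list of all system buses is built. The buses are linked to each other in two different ways. The first is the linear list headed by the global variable pci_root_buses, which holds the root buses. The second uses the parent and children members to represent the two-dimensional topology of the PCI buses as a tree.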
PCI device
/*
 * The pci_dev structure is used to describe PCI devices.
 */
struct pci_dev {
	struct list_head bus_list; /* node in per-bus list */
	struct pci_bus *bus; /* bus this device is on */
	struct pci_bus *subordinate; /* bus this device bridges to */

	void *sysdata; /* hook for sys-specific extension */
	struct proc_dir_entry *procent; /* device entry in /proc/bus/pci */
	struct pci_slot *slot; /* Physical slot this device is in */

	unsigned int devfn; /* encoded device & function index */
	unsigned short vendor;
	unsigned short device;
	unsigned short subsystem_vendor;
	unsigned short subsystem_device;
	unsigned int class; /* 3 bytes: (base,sub,prog-if) */
	u8 revision; /* PCI revision, low byte of class word */
	u8 hdr_type; /* PCI header type (`multi' flag masked out) */
	u8 pcie_cap; /* PCI-E capability offset */
	u8 pcie_type:4; /* PCI-E device/port type */
	u8 pcie_mpss:3; /* PCI-E Max Payload Size Supported */
	u8 rom_base_reg; /* which config register controls the ROM */
	u8 pin; /* which interrupt pin this device uses */

	struct pci_driver *driver; /* which driver has allocated this device */
	u64 dma_mask; /* Mask of the bits of bus address this
			 device implements. Normally this is
			 0xffffffff. You only need to change
			 this if your device has broken DMA
			 or supports 64-bit transfers. */

	struct device_dma_parameters dma_parms;

	pci_power_t current_state; /* Current operating state. In ACPI-speak,
				      this is D0-D3, D0 being fully functional,
				      and D3 being off. */
	int pm_cap; /* PM capability offset in the
		       configuration space */
	unsigned int pme_support:5; /* Bitmask of states from which PME#
				       can be generated */
	unsigned int pme_interrupt:1;
	unsigned int pme_poll:1; /* Poll device's PME status bit */
	unsigned int d1_support:1; /* Low power state D1 is supported */
	unsigned int d2_support:1; /* Low power state D2 is supported */
	unsigned int no_d1d2:1; /* Only allow D0 and D3 */
	unsigned int mmio_always_on:1; /* disallow turning off io/mem
					  decoding during bar sizing */
	unsigned int wakeup_prepared:1;
	unsigned int d3_delay; /* D3->D0 transition time in ms */

#ifdef CONFIG_PCIEASPM
	struct pcie_link_state *link_state; /* ASPM link state. */
#endif

	pci_channel_state_t error_state; /* current connectivity state */
	struct device dev; /* Generic device interface */

	int cfg_size; /* Size of configuration space */

	/*
	 * Instead of touching interrupt line and base address registers
	 * directly, use the values stored here. They might be different!
	 */
	unsigned int irq;
	struct resource resource[DEVICE_COUNT_RESOURCE]; /* I/O and memory regions + expansion ROMs */
	resource_size_t fw_addr[DEVICE_COUNT_RESOURCE]; /* FW-assigned addr */

	/* These fields are used by common fixups */
	unsigned int transparent:1; /* Transparent PCI bridge */
	unsigned int multifunction:1; /* Part of multi-function device */
	/* keep track of device state */
	unsigned int is_added:1;
	unsigned int is_busmaster:1; /* device is busmaster */
	unsigned int no_msi:1; /* device may not use msi */
	unsigned int block_cfg_access:1; /* config space access is blocked */
	unsigned int broken_parity_status:1; /* Device generates false positive parity */
	unsigned int irq_reroute_variant:2; /* device needs IRQ rerouting variant */
	unsigned int msi_enabled:1;
	unsigned int msix_enabled:1;
	unsigned int ari_enabled:1; /* ARI forwarding */
	unsigned int is_managed:1;
	unsigned int is_pcie:1; /* Obsolete. Will be removed.
				   Use pci_is_pcie() instead */
	unsigned int needs_freset:1; /* Dev requires fundamental reset */
	unsigned int state_saved:1;
	unsigned int is_physfn:1;
	unsigned int is_virtfn:1;
	unsigned int reset_fn:1;
	unsigned int is_hotplug_bridge:1;
	unsigned int __aer_firmware_first_valid:1;
	unsigned int __aer_firmware_first:1;
	pci_dev_flags_t dev_flags;
	atomic_t enable_cnt; /* pci_enable_device has been called */

	u32 saved_config_space[16]; /* config space saved at suspend time */
	struct hlist_head saved_cap_space;
	struct bin_attribute *rom_attr; /* attribute descriptor for sysfs ROM entry */
	int rom_attr_enabled; /* has display of the rom attribute been enabled? */
	struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
	struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
#ifdef CONFIG_PCI_MSI
	struct list_head msi_list;
	struct kset *msi_kset;
#endif
	struct pci_vpd *vpd;
#ifdef CONFIG_PCI_ATS
	union {
		struct pci_sriov *sriov; /* SR-IOV capability related */
		struct pci_dev *physfn; /* the PF this VF is associated with */
	};
	struct pci_ats *ats; /* Address Translation Service */
#endif
};
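The main members of struct pci_dev are:
- bus_list places the device on its bus's device list.
- bus establishes the reverse association between the device and its bus. It points to the pci_bus instance of the bus the device sits on.
- subordinate is a further link to a bus. It holds a valid value only when the device is a PCI-to-PCI bridge connecting two PCI buses, and is a NULL pointer otherwise. For a bridge, subordinate points to the data structure of the "lower" PCI bus.
- Most of the remaining members store the contents of the device's configuration space. They are filled with data read from the hardware during system initialization.
- driver points to the driver that controls the device. Each PCI driver is uniquely identified by an instance of struct pci_driver.
- dev links the PCI device into the generic device model.
- irq holds the interrupt number assigned to the device.
- The resource array holds the I/O and memory regions (plus the expansion ROM) that the device decodes.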
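The resource[] entries are derived from the device's Base Address Registers (BARs) in configuration space. The sketch below decodes a raw 32-bit BAR. It also computes a memory region's size from the value read back after writing all ones, which is how BARs are sized. It is a simplification (no 64-bit BAR pairs, no I/O sizing), and all names are hypothetical. For e100, the probe routine expects BAR 0 to be a memory region (the IORESOURCE_MEM check) and uses BAR 1 for port I/O when use_io is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 32-bit Base Address Register. */
struct bar_info {
	bool     is_io;		/* bit 0 set: I/O space, else memory space */
	bool     is_64bit;	/* memory type bits 2..1 == 0b10           */
	bool     prefetchable;	/* memory bit 3                            */
	uint32_t base;		/* address with the flag bits masked off   */
};

static struct bar_info bar_decode(uint32_t bar)
{
	struct bar_info b = {0};

	b.is_io = bar & 0x1;
	if (b.is_io) {
		b.base = bar & ~0x3u;
	} else {
		b.is_64bit     = ((bar >> 1) & 0x3) == 0x2;
		b.prefetchable = (bar >> 3) & 0x1;
		b.base         = bar & ~0xFu;
	}
	return b;
}

/* After writing all ones, the device returns zeros in the address bits
 * it decodes; the size is the two's complement of the masked value
 * (32-bit memory BARs only). */
static uint32_t bar_mem_size(uint32_t readback_after_all_ones)
{
	uint32_t mask = readback_after_all_ones & ~0xFu;

	assert(mask != 0);
	return ~mask + 1;
}
```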
PCI driver
struct pci_driver {
	struct list_head node;
	const char *name;
	const struct pci_device_id *id_table; /* must be non-NULL for probe to be called */
	int (*probe) (struct pci_dev *dev, const struct pci_device_id *id); /* New device inserted */
	void (*remove) (struct pci_dev *dev); /* Device removed (NULL if not a hot-plug capable driver) */
	int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
	int (*suspend_late) (struct pci_dev *dev, pm_message_t state);
	int (*resume_early) (struct pci_dev *dev);
	int (*resume) (struct pci_dev *dev); /* Device woken up */
	void (*shutdown) (struct pci_dev *dev);
	struct pci_error_handlers *err_handler;
	struct device_driver driver;
	struct pci_dynids dynids;
};
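struct pci_driver implements a PCI driver. It represents the interface between generic kernel code and the low-level hardware driver of a device. Every PCI driver must fill its functions into this interface so that the kernel can control all available drivers uniformly. The most important tasks of a PCI driver are detecting, installing and removing devices, and two function pointers serve this purpose:
- probe is called for a device that matches the driver and sets it up, returning an error if it cannot handle it.
- remove removes a device. Removing a PCI device only makes sense if the system supports hot-plugging.

A driver must also know which devices it is responsible for. A struct pci_device_id entry identifies a supported device; it and pci_dev describe the same thing at different levels. The id_table array lists all devices the driver supports.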
struct pci_device_id {
	__u32 vendor, device; /* Vendor and device ID or PCI_ANY_ID*/
	__u32 subvendor, subdevice; /* Subsystem ID's or PCI_ANY_ID */
	__u32 class, class_mask; /* (class,subclass,prog-if) triplet */
	kernel_ulong_t driver_data; /* Data private to the driver */
};
static DEFINE_PCI_DEVICE_TABLE(e100_id_table) = {
	INTEL_8255X_ETHERNET_DEVICE(0x1029, 0),
	INTEL_8255X_ETHERNET_DEVICE(0x1030, 0),
	INTEL_8255X_ETHERNET_DEVICE(0x1031, 3),
	INTEL_8255X_ETHERNET_DEVICE(0x1032, 3),
	INTEL_8255X_ETHERNET_DEVICE(0x1033, 3),
	INTEL_8255X_ETHERNET_DEVICE(0x1034, 3),
	INTEL_8255X_ETHERNET_DEVICE(0x1038, 3),
	INTEL_8255X_ETHERNET_DEVICE(0x1039, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x103A, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x103B, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x103C, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x103D, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x103E, 4),
	INTEL_8255X_ETHERNET_DEVICE(0x1050, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1051, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1052, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1053, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1054, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1055, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1056, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1057, 5),
	INTEL_8255X_ETHERNET_DEVICE(0x1059, 0),
	INTEL_8255X_ETHERNET_DEVICE(0x1064, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1065, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1066, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1067, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1068, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1069, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x106A, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x106B, 6),
	INTEL_8255X_ETHERNET_DEVICE(0x1091, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x1092, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x1093, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x1094, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x1095, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x10fe, 7),
	INTEL_8255X_ETHERNET_DEVICE(0x1209, 0),
	INTEL_8255X_ETHERNET_DEVICE(0x1229, 0),
	INTEL_8255X_ETHERNET_DEVICE(0x2449, 2),
	INTEL_8255X_ETHERNET_DEVICE(0x2459, 2),
	INTEL_8255X_ETHERNET_DEVICE(0x245D, 2),
	INTEL_8255X_ETHERNET_DEVICE(0x27DC, 7),
	{ 0, }
};
The kernel provides pci_match_id() to compare a PCI device's data against the entries of an ID table:
const struct pci_device_id *pci_match_id(const struct pci_device_id *ids,
					 struct pci_dev *dev);
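pci_match_id() walks the table until it reaches the all-zero terminator and returns the first entry whose fields all match. PCI_ANY_ID acts as a wildcard, and class/class_mask compare only the masked bits of the class code. The user-space model below reproduces those rules with hypothetical types. E100_ENTRY assumes that INTEL_8255X_ETHERNET_DEVICE(dev, ich) expands to an entry with these fields:
- the Intel vendor ID;
- wildcard subsystem IDs;
- Ethernet class 0x0200xx under mask 0xFFFF00;
- ich as driver_data.

Check e100.c for the exact definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu	/* plays the role of PCI_ANY_ID (~0) */

struct dev_id {			/* simplified struct pci_device_id */
	uint32_t vendor, device, subvendor, subdevice;
	uint32_t class_code, class_mask;
	unsigned long driver_data;
};

struct dev_info {		/* the pci_dev fields used for matching */
	uint32_t vendor, device, subvendor, subdevice, class_code;
};

static int field_ok(uint32_t want, uint32_t have)
{
	return want == ANY_ID || want == have;
}

/* Return the first matching entry of a zero-terminated table, or NULL. */
static const struct dev_id *match_id(const struct dev_id *ids,
				     const struct dev_info *dev)
{
	for (; ids->vendor || ids->subvendor || ids->class_mask; ids++) {
		if (field_ok(ids->vendor, dev->vendor) &&
		    field_ok(ids->device, dev->device) &&
		    field_ok(ids->subvendor, dev->subvendor) &&
		    field_ok(ids->subdevice, dev->subdevice) &&
		    !((ids->class_code ^ dev->class_code) & ids->class_mask))
			return ids;
	}
	return NULL;
}

/* Hypothetical stand-in for INTEL_8255X_ETHERNET_DEVICE(). */
#define E100_ENTRY(dev, ich) \
	{ 0x8086, (dev), ANY_ID, ANY_ID, 0x020000, 0xFFFF00, (ich) }

static const struct dev_id e100_like_table[] = {
	E100_ENTRY(0x1229, 0),
	E100_ENTRY(0x1050, 5),
	{ 0 }
};
```

The driver_data of the matching entry is what e100_probe() later receives as ent->driver_data and uses to set the ich flag.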
Registering the driver
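/* pci_register_driver(drv) is a macro that expands to
 * __pci_register_driver(drv, THIS_MODULE, KBUILD_MODNAME). */
int __must_check __pci_register_driver(struct pci_driver *, struct module *,
				       const char *mod_name);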
static int __init e100_init_module(void)
{
	if (((1 << debug) - 1) & NETIF_MSG_DRV) {
		pr_info("%s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
		pr_info("%s\n", DRV_COPYRIGHT);
	}
	return pci_register_driver(&e100_driver);
}
Binding the driver to devices
/**
 * driver_attach - try to bind driver to devices.
 * @drv: driver.
 *
 * Walk the list of devices that the bus has on it and try to
 * match the driver with each one. If driver_probe_device()
 * returns 0 and the @dev->driver is set, we've found a
 * compatible pair.
 */
int driver_attach(struct device_driver *drv)
{
	return bus_for_each_dev(drv->bus, NULL, drv, __driver_attach);
}
EXPORT_SYMBOL_GPL(driver_attach);
static int __driver_attach(struct device *dev, void *data)
{
	struct device_driver *drv = data;

	/*
	 * Lock device and try to bind to it. We drop the error
	 * here and always return 0, because we need to keep trying
	 * to bind to devices and some drivers will return an error
	 * simply if it didn't support the device.
	 *
	 * driver_probe_device() will spit a warning if there
	 * is an error.
	 */

	if (!driver_match_device(drv, dev))
		return 0;

	if (dev->parent)	/* Needed for USB */
		device_lock(dev->parent);
	device_lock(dev);
	if (!dev->driver)
		driver_probe_device(drv, dev);
	device_unlock(dev);
	if (dev->parent)
		device_unlock(dev->parent);

	return 0;
}
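Stripped of locking, driver_attach() and __driver_attach() amount to a loop over the bus's devices. The loop binds the driver to each unbound, matching device whose probe succeeds. A toy user-space model of that loop follows. The types and demo callbacks are hypothetical. It deliberately simplifies the real driver core: locking, deferred probing and the exact point at which dev->driver is set are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_device;

struct model_driver {
	const char *name;
	int (*match)(const struct model_device *dev);	/* like driver_match_device() */
	int (*probe)(struct model_device *dev);		/* like pci_driver.probe      */
};

struct model_device {
	unsigned short vendor, device;
	const struct model_driver *driver;		/* NULL while unbound */
};

/* Counterpart of __driver_attach(): skip non-matching or already bound
 * devices and bind only when probe() returns 0. Errors are swallowed so
 * the walk continues over the remaining devices. */
static int try_attach(struct model_device *dev, const struct model_driver *drv)
{
	if (!drv->match(dev) || dev->driver)
		return 0;
	if (drv->probe(dev) == 0)
		dev->driver = drv;
	return 0;
}

/* Counterpart of driver_attach()/bus_for_each_dev(); returns how many
 * devices end up bound to drv. */
static int attach_all(struct model_device *devs, size_t n,
		      const struct model_driver *drv)
{
	int bound = 0;

	for (size_t i = 0; i < n; i++) {
		try_attach(&devs[i], drv);
		if (devs[i].driver == drv)
			bound++;
	}
	return bound;
}

/* Demo driver: matches Intel devices, but probe accepts only 0x1229. */
static int demo_match(const struct model_device *dev)
{
	return dev->vendor == 0x8086;
}

static int demo_probe(struct model_device *dev)
{
	return dev->device == 0x1229 ? 0 : -ENODEV;
}

static const struct model_driver demo_driver = { "demo", demo_match, demo_probe };
```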
Analysis of e100.c
static int __devinit e100_probe(struct pci_dev *pdev,
				const struct pci_device_id *ent)
{
	struct net_device *netdev;
	struct nic *nic;
	int err;

	if (!(netdev = alloc_etherdev(sizeof(struct nic)))) {
		if (((1 << debug) - 1) & NETIF_MSG_PROBE)
			pr_err("Etherdev alloc failed, aborting\n");
		return -ENOMEM;
	}

	netdev->netdev_ops = &e100_netdev_ops;
	SET_ETHTOOL_OPS(netdev, &e100_ethtool_ops);
	netdev->watchdog_timeo = E100_WATCHDOG_PERIOD;
	strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);

	nic = netdev_priv(netdev);
	netif_napi_add(netdev, &nic->napi, e100_poll, E100_NAPI_WEIGHT);
	nic->netdev = netdev;
	nic->pdev = pdev;
	nic->msg_enable = (1 << debug) - 1;
	nic->mdio_ctrl = mdio_ctrl_hw;
	pci_set_drvdata(pdev, netdev);

	if ((err = pci_enable_device(pdev))) {
		netif_err(nic, probe, nic->netdev, "Cannot enable PCI device, aborting\n");
		goto err_out_free_dev;
	}

	if (!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
		netif_err(nic, probe, nic->netdev, "Cannot find proper PCI device base address, aborting\n");
		err = -ENODEV;
		goto err_out_disable_pdev;
	}

	if ((err = pci_request_regions(pdev, DRV_NAME))) {
		netif_err(nic, probe, nic->netdev, "Cannot obtain PCI resources, aborting\n");
		goto err_out_disable_pdev;
	}

	if ((err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))) {
		netif_err(nic, probe, nic->netdev, "No usable DMA configuration, aborting\n");
		goto err_out_free_res;
	}

	SET_NETDEV_DEV(netdev, &pdev->dev);

	if (use_io)
		netif_info(nic, probe, nic->netdev, "using i/o access mode\n");

	nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
	if (!nic->csr) {
		netif_err(nic, probe, nic->netdev, "Cannot map device registers, aborting\n");
		err = -ENOMEM;
		goto err_out_free_res;
	}

	if (ent->driver_data)
		nic->flags |= ich;
	else
		nic->flags &= ~ich;

	e100_get_defaults(nic);

	/* D100 MAC doesn't allow rx of vlan packets with normal MTU */
	if (nic->mac < mac_82558_D101_A4)
		netdev->features |= NETIF_F_VLAN_CHALLENGED;

	/* locks must be initialized before calling hw_reset */
	spin_lock_init(&nic->cb_lock);
	spin_lock_init(&nic->cmd_lock);
	spin_lock_init(&nic->mdio_lock);

	/* Reset the device before pci_set_master() in case device is in some
	 * funky state and has an interrupt pending - hint: we don't have the
	 * interrupt handler registered yet. */
	e100_hw_reset(nic);

	pci_set_master(pdev);

	init_timer(&nic->watchdog);
	nic->watchdog.function = e100_watchdog;
	nic->watchdog.data = (unsigned long)nic;

	INIT_WORK(&nic->tx_timeout_task, e100_tx_timeout_task);

	if ((err = e100_alloc(nic))) {
		netif_err(nic, probe, nic->netdev, "Cannot alloc driver memory, aborting\n");
		goto err_out_iounmap;
	}

	if ((err = e100_eeprom_load(nic)))
		goto err_out_free;

	e100_phy_init(nic);

	memcpy(netdev->dev_addr, nic->eeprom, ETH_ALEN);
	memcpy(netdev->perm_addr, nic->eeprom, ETH_ALEN);
	if (!is_valid_ether_addr(netdev->perm_addr)) {
		if (!eeprom_bad_csum_allow) {
			netif_err(nic, probe, nic->netdev, "Invalid MAC address from EEPROM, aborting\n");
			err = -EAGAIN;
			goto err_out_free;
		} else {
			netif_err(nic, probe, nic->netdev, "Invalid MAC address from EEPROM, you MUST configure one.\n");
		}
	}

	/* Wol magic packet can be enabled from eeprom */
	if ((nic->mac >= mac_82558_D101_A4) &&
	    (nic->eeprom[eeprom_id] & eeprom_id_wol)) {
		nic->flags |= wol_magic;
		device_set_wakeup_enable(&pdev->dev, true);
	}

	/* ack any pending wake events, disable PME */
	pci_pme_active(pdev, false);

	strcpy(netdev->name, "eth%d");
	if ((err = register_netdev(netdev))) {
		netif_err(nic, probe, nic->netdev, "Cannot register net device, aborting\n");
		goto err_out_free;
	}
	nic->cbs_pool = pci_pool_create(netdev->name,
					nic->pdev,
					nic->params.cbs.max * sizeof(struct cb),
					sizeof(u32),
					0);
	netif_info(nic, probe, nic->netdev,
		   "addr 0x%llx, irq %d, MAC addr %pM\n",
		   (unsigned long long)pci_resource_start(pdev, use_io ? 1 : 0),
		   pdev->irq, netdev->dev_addr);

	return 0;

err_out_free:
	e100_free(nic);
err_out_iounmap:
	pci_iounmap(pdev, nic->csr);
err_out_free_res:
	pci_release_regions(pdev);
err_out_disable_pdev:
	pci_disable_device(pdev);
err_out_free_dev:
	pci_set_drvdata(pdev, NULL);
	free_netdev(netdev);
	return err;
}
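e100_probe() is reached from __driver_attach() through driver_probe_device() and the PCI bus's probe hook (pci_device_probe()). This happens whenever the PCI layer finds a device whose IDs match an entry in id_table. The function's job is to:
- enable the hardware;
- allocate the net_device structure;
- initialize and register the new device;
- allocate every data structure the driver needs to work correctly.

The steps are as follows:
1. Allocate the network device with alloc_etherdev(), reserving room for a struct nic as its private area, and fill in the netdev_ops and ethtool_ops function pointers.
2. Initialize the struct nic, obtained through netdev_priv(), and tie the structures together. nic->netdev and nic->pdev point back to the net_device and the pci_dev, and pci_set_drvdata() stores the net_device in the pci_dev.
3. Perform the actual PCI bring-up: enable the device, claim its regions, set the DMA mask, map the registers, reset the chip and enable bus mastering.
4. Register the netdev with the system.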
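e100_probe() uses the kernel's usual goto-based error unwinding. Each failure jumps to a label that releases only what has already been acquired, in reverse order, by falling through the labels below it. Below is a minimal user-space sketch of the same pattern. Three hypothetical steps stand in for pci_enable_device(), pci_request_regions() and pci_iomap().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resources; fail_at selects which step fails (0 = none). */
struct probe_state {
	bool enabled, regions, mapped;
	int  fail_at;
};

static int step(struct probe_state *s, int n, bool *flag)
{
	if (s->fail_at == n)
		return -1;
	*flag = true;
	return 0;
}

/* Acquire in order; on failure release in reverse order. */
static int probe_sketch(struct probe_state *s)
{
	int err;

	if ((err = step(s, 1, &s->enabled)))
		goto err_out;
	if ((err = step(s, 2, &s->regions)))
		goto err_disable;
	if ((err = step(s, 3, &s->mapped)))
		goto err_release;
	return 0;

err_release:
	s->regions = false;	/* like pci_release_regions() */
err_disable:
	s->enabled = false;	/* like pci_disable_device() */
err_out:
	return err;
}
```

The labels are ordered opposite to the acquisition order. A jump to any label therefore also runs every cleanup step below it, which is exactly the set of resources acquired before the failing step.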
static inline void e100_write_flush(struct nic *nic)
{
	/* Flush previous PCI writes through intermediate bridges
	 * by doing a benign read */
	(void)ioread8(&nic->csr->scb.status);
}
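e100_write_flush() performs a harmless read of a device register. PCI writes may be posted (buffered) in intermediate bridges, so this read forces all earlier writes to reach the device first.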
static void e100_enable_irq(struct nic *nic)
{
	unsigned long flags;

	/* take cmd_lock with local interrupts disabled */
	spin_lock_irqsave(&nic->cmd_lock, flags);
	iowrite8(irq_mask_none, &nic->csr->scb.cmd_hi);	/* unmask the NIC's interrupts */
	e100_write_flush(nic);				/* flush so the command takes effect */
	spin_unlock_irqrestore(&nic->cmd_lock, flags);
}
Setting the multicast list
static void e100_multi(struct nic *nic, struct cb *cb, struct sk_buff *skb)
{
	struct net_device *netdev = nic->netdev;
	struct netdev_hw_addr *ha;
	u16 i, count = min(netdev_mc_count(netdev), E100_MAX_MULTICAST_ADDRS);

	cb->command = cpu_to_le16(cb_multi);
	cb->u.multi.count = cpu_to_le16(count * ETH_ALEN);
	i = 0;
	netdev_for_each_mc_addr(ha, netdev) {
		if (i == count)
			break;
		memcpy(&cb->u.multi.addr[i++ * ETH_ALEN], &ha->addr,
		       ETH_ALEN);
	}
}

static void e100_set_multicast_list(struct net_device *netdev)
{
	struct nic *nic = netdev_priv(netdev);

	netif_printk(nic, hw, KERN_DEBUG, nic->netdev,
		     "mc_count=%d, flags=0x%04X\n",
		     netdev_mc_count(netdev), netdev->flags);

	if (netdev->flags & IFF_PROMISC)
		nic->flags |= promiscuous;
	else
		nic->flags &= ~promiscuous;

	if (netdev->flags & IFF_ALLMULTI ||
	    netdev_mc_count(netdev) > E100_MAX_MULTICAST_ADDRS)
		nic->flags |= multicast_all;
	else
		nic->flags &= ~multicast_all;

	e100_exec_cb(nic, NULL, e100_configure);
	e100_exec_cb(nic, NULL, e100_multi);
}
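e100_multi() copies at most E100_MAX_MULTICAST_ADDRS addresses back to back into the command block and stores the length in bytes. e100_set_multicast_list() falls back to accepting all multicast traffic when the list is longer. The sketch below models both decisions in user space. MAX_MC and the struct are hypothetical: the real limit is E100_MAX_MULTICAST_ADDRS, and the real driver also converts the count to little-endian with cpu_to_le16().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_MC   4	/* hypothetical; e100 uses E100_MAX_MULTICAST_ADDRS */

struct mc_cmd {
	uint16_t byte_count;		  /* count * ETH_ALEN, as in e100_multi() */
	uint8_t  addr[MAX_MC * ETH_ALEN]; /* addresses packed back to back      */
};

/* Pack up to MAX_MC addresses. Returns 1 when the list is too long and
 * the caller should accept all multicast frames (multicast_all). */
static int pack_mc_list(struct mc_cmd *cmd, const uint8_t (*list)[ETH_ALEN],
			unsigned n)
{
	unsigned count = n < MAX_MC ? n : MAX_MC;

	assert(n == 0 || list != NULL);
	memset(cmd, 0, sizeof(*cmd));
	cmd->byte_count = (uint16_t)(count * ETH_ALEN);
	for (unsigned i = 0; i < count; i++)
		memcpy(&cmd->addr[i * ETH_ALEN], list[i], ETH_ALEN);
	return n > MAX_MC;
}
```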
Updating the NIC statistics
static void e100_update_stats(struct nic *nic)
{
	struct net_device *dev = nic->netdev;
	struct net_device_stats *ns = &dev->stats;
	struct stats *s = &nic->mem->stats;
	__le32 *complete = (nic->mac < mac_82558_D101_A4) ? &s->fc_xmt_pause :
		(nic->mac < mac_82559_D101M) ? (__le32 *)&s->xmt_tco_frames :
		&s->complete;

	/* Device's stats reporting may take several microseconds to
	 * complete, so we're always waiting for results of the
	 * previous command. */

	if (*complete == cpu_to_le32(cuc_dump_reset_complete)) {
		*complete = 0;
		nic->tx_frames = le32_to_cpu(s->tx_good_frames);
		nic->tx_collisions = le32_to_cpu(s->tx_total_collisions);
		ns->tx_aborted_errors += le32_to_cpu(s->tx_max_collisions);
		ns->tx_window_errors += le32_to_cpu(s->tx_late_collisions);
		ns->tx_carrier_errors += le32_to_cpu(s->tx_lost_crs);
		ns->tx_fifo_errors += le32_to_cpu(s->tx_underruns);
		ns->collisions += nic->tx_collisions;
		ns->tx_errors += le32_to_cpu(s->tx_max_collisions) +
			le32_to_cpu(s->tx_lost_crs);
		ns->rx_length_errors += le32_to_cpu(s->rx_short_frame_errors) +
			nic->rx_over_length_errors;
		ns->rx_crc_errors += le32_to_cpu(s->rx_crc_errors);
		ns->rx_frame_errors += le32_to_cpu(s->rx_alignment_errors);
		ns->rx_over_errors += le32_to_cpu(s->rx_overrun_errors);
		ns->rx_fifo_errors += le32_to_cpu(s->rx_overrun_errors);
		ns->rx_missed_errors += le32_to_cpu(s->rx_resource_errors);
		ns->rx_errors += le32_to_cpu(s->rx_crc_errors) +
			le32_to_cpu(s->rx_alignment_errors) +
			le32_to_cpu(s->rx_short_frame_errors) +
			le32_to_cpu(s->rx_cdt_errors);
		nic->tx_deferred += le32_to_cpu(s->tx_deferred);
		nic->tx_single_collisions +=
			le32_to_cpu(s->tx_single_collisions);
		nic->tx_multiple_collisions +=
			le32_to_cpu(s->tx_multiple_collisions);
		if (nic->mac >= mac_82558_D101_A4) {
			nic->tx_fc_pause += le32_to_cpu(s->fc_xmt_pause);
			nic->rx_fc_pause += le32_to_cpu(s->fc_rcv_pause);
			nic->rx_fc_unsupported +=
				le32_to_cpu(s->fc_rcv_unsupported);
			if (nic->mac >= mac_82559_D101M) {
				nic->tx_tco_frames +=
					le16_to_cpu(s->xmt_tco_frames);
				nic->rx_tco_frames +=
					le16_to_cpu(s->rcv_tco_frames);
			}
		}
	}

	if (e100_exec_cmd(nic, cuc_dump_reset, 0))
		netif_printk(nic, tx_err, KERN_DEBUG, nic->netdev,
			     "exec cuc_dump_reset failed\n");
}
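Link monitoring (the watchdog timer): e100_watchdog() queries the link state through the MII library. If the link state has changed, it logs the change and sets the carrier up or down. It then updates the statistics, applies a few hardware workarounds, and re-arms itself.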
static void e100_watchdog(unsigned long data)
{
	struct nic *nic = (struct nic *)data;
	struct ethtool_cmd cmd = { .cmd = ETHTOOL_GSET };
	u32 speed;

	netif_printk(nic, timer, KERN_DEBUG, nic->netdev,
		     "right now = %ld\n", jiffies);

	/* mii library handles link maintenance tasks */

	mii_ethtool_gset(&nic->mii, &cmd);
	speed = ethtool_cmd_speed(&cmd);

	if (mii_link_ok(&nic->mii) && !netif_carrier_ok(nic->netdev)) {
		netdev_info(nic->netdev, "NIC Link is Up %u Mbps %s Duplex\n",
			    speed == SPEED_100 ? 100 : 10,
			    cmd.duplex == DUPLEX_FULL ? "Full" : "Half");
	} else if (!mii_link_ok(&nic->mii) && netif_carrier_ok(nic->netdev)) {
		netdev_info(nic->netdev, "NIC Link is Down\n");
	}

	mii_check_link(&nic->mii);

	/* Software generated interrupt to recover from (rare) Rx
	 * allocation failure.
	 * Unfortunately have to use a spinlock to not re-enable interrupts
	 * accidentally, due to hardware that shares a register between the
	 * interrupt mask bit and the SW Interrupt generation bit */
	spin_lock_irq(&nic->cmd_lock);
	iowrite8(ioread8(&nic->csr->scb.cmd_hi) | irq_sw_gen, &nic->csr->scb.cmd_hi);
	e100_write_flush(nic);
	spin_unlock_irq(&nic->cmd_lock);

	e100_update_stats(nic);
	e100_adjust_adaptive_ifs(nic, speed, cmd.duplex);

	if (nic->mac <= mac_82557_D100_C)
		/* Issue a multicast command to workaround a 557 lock up */
		e100_set_multicast_list(nic->netdev);

	if (nic->flags & ich && speed == SPEED_10 && cmd.duplex == DUPLEX_HALF)
		/* Need SW workaround for ICH[x] 10Mbps/half duplex Tx hang. */
		nic->flags |= ich_10h_workaround;
	else
		nic->flags &= ~ich_10h_workaround;

	mod_timer(&nic->watchdog,
		  round_jiffies(jiffies + E100_WATCHDOG_PERIOD));	/* schedule the next check */
}
static int e100_up(struct nic *nic)
{
	int err;

	if ((err = e100_rx_alloc_list(nic)))		/* allocate the receive (RFD) list */
		return err;
	if ((err = e100_alloc_cbs(nic)))		/* allocate the command block (CB) list */
		goto err_rx_clean_list;
	if ((err = e100_hw_init(nic)))			/* initialize the hardware */
		goto err_clean_cbs;
	e100_set_multicast_list(nic->netdev);		/* program the multicast list */
	e100_start_receiver(nic, NULL);			/* start the receive unit */
	mod_timer(&nic->watchdog, jiffies);		/* start the watchdog that checks the link */
	if ((err = request_irq(nic->pdev->irq, e100_intr, IRQF_SHARED,
			       nic->netdev->name, nic->netdev)))	/* request the shared IRQ */
		goto err_no_irq;
	netif_wake_queue(nic->netdev);			/* wake the TX queue: tell the stack the NIC is up */
	napi_enable(&nic->napi);			/* enable NAPI polling */
	/* enable ints _after_ enabling poll, preventing a race between
	 * disable ints+schedule */
	e100_enable_irq(nic);				/* enable interrupts */
	return 0;

err_no_irq:
	del_timer_sync(&nic->watchdog);
err_clean_cbs:
	e100_clean_cbs(nic);
err_rx_clean_list:
	e100_rx_clean_list(nic);
	return err;
}
This is the NIC bring-up function, e100_up(), which is called from e100_open().
static const struct net_device_ops e100_netdev_ops = {
	.ndo_open		= e100_open,
	.ndo_stop		= e100_close,
	.ndo_start_xmit		= e100_xmit_frame,
	.ndo_validate_addr	= eth_validate_addr,
	.ndo_set_rx_mode	= e100_set_multicast_list,
	.ndo_set_mac_address	= e100_set_mac_address,
	.ndo_change_mtu		= e100_change_mtu,
	.ndo_do_ioctl		= e100_do_ioctl,
	.ndo_tx_timeout		= e100_tx_timeout,
#ifdef CONFIG_NET_POLL_CONTROLLER
	.ndo_poll_controller	= e100_netpoll,
#endif
};
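These are the network device methods implemented by e100.c. Their basic roles are as follows:
ndo_open opens the interface. The interface is opened when it is activated, for example with ifconfig. open should register all system resources (I/O ports, IRQ, DMA and so on), turn on the hardware, and perform any other setup the device requires.
ndo_stop stops the interface when it is brought down. It reverses what was done at open time: it stops the transmit queue, releases hardware resources, and stops any timers used by the driver.
ndo_start_xmit (hard_start_xmit in older kernels) initiates the transmission of a packet. The complete packet (protocol headers and data) is contained in a socket buffer (struct sk_buff).
ndo_tx_timeout is called by the networking code when a packet transmission fails to complete within a reasonable time. The stack then assumes that an interrupt was missed or the interface has locked up. This method is responsible for fixing the problem and restarting transmission.
ndo_do_ioctl executes interface-specific ioctl commands. If the interface does not need any, it can be left NULL.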
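The net_device_ops table is the classic C pattern of a struct of function pointers: the core calls through the table without knowing which driver filled it in. A cut-down user-space model follows. fake_netdev and its ops are hypothetical, and core_set_up() only approximates the way dev_open()/dev_close() invoke ndo_open/ndo_stop.

```c
#include <assert.h>
#include <stddef.h>

struct fake_netdev;

/* A cut-down analogue of struct net_device_ops. */
struct fake_netdev_ops {
	int (*ndo_open)(struct fake_netdev *dev);
	int (*ndo_stop)(struct fake_netdev *dev);
};

struct fake_netdev {
	const char *name;
	const struct fake_netdev_ops *ops;
	int up;
};

/* "Driver" side: fills in the table. */
static int demo_open(struct fake_netdev *dev) { dev->up = 1; return 0; }
static int demo_stop(struct fake_netdev *dev) { dev->up = 0; return 0; }

static const struct fake_netdev_ops demo_ops = {
	.ndo_open = demo_open,
	.ndo_stop = demo_stop,
};

/* "Core" side: dispatches through the table; a NULL slot is skipped. */
static int core_set_up(struct fake_netdev *dev, int up)
{
	assert(dev->ops != NULL);
	if (up)
		return dev->ops->ndo_open ? dev->ops->ndo_open(dev) : 0;
	return dev->ops->ndo_stop ? dev->ops->ndo_stop(dev) : 0;
}
```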
static int e100_open(struct net_device *netdev)
{
	struct nic *nic = netdev_priv(netdev);
	int err = 0;

	netif_carrier_off(netdev);
	if ((err = e100_up(nic)))
		netif_err(nic, ifup, nic->netdev, "Cannot open interface, aborting\n");
	return err;
}

static int e100_close(struct net_device *netdev)
{
	e100_down(netdev_priv(netdev));
	return 0;
}

static void e100_down(struct nic *nic)
{
	/* wait here for poll to complete */
	napi_disable(&nic->napi);
	netif_stop_queue(nic->netdev);
	e100_hw_reset(nic);
	free_irq(nic->pdev->irq, nic->netdev);
	del_timer_sync(&nic->watchdog);
	netif_carrier_off(nic->netdev);
	e100_clean_cbs(nic);
	e100_rx_clean_list(nic);
}
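e100_close() simply calls e100_down(), which is essentially the reverse of e100_up() as called from e100_open(). It disables NAPI, stops the queue, resets the hardware, frees the IRQ, stops the watchdog, and releases the CB and RX lists.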
static int e100_tx_clean(struct nic *nic)	/* reclaim completed transmit command blocks */
{
	struct net_device *dev = nic->netdev;
	struct cb *cb;
	int tx_cleaned = 0;

	spin_lock(&nic->cb_lock);

	/* Clean CBs marked complete */
	for (cb = nic->cb_to_clean;
	     cb->status & cpu_to_le16(cb_complete);
	     cb = nic->cb_to_clean = cb->next) {
		rmb(); /* read skb after status */
		netif_printk(nic, tx_done, KERN_DEBUG, nic->netdev,
			     "cb[%d]->status = 0x%04X\n",
			     (int)(((void*)cb - (void*)nic->cbs)/sizeof(struct cb)),
			     cb->status);

		if (likely(cb->skb != NULL)) {
			dev->stats.tx_packets++;
			dev->stats.tx_bytes += cb->skb->len;

			pci_unmap_single(nic->pdev,
					 le32_to_cpu(cb->u.tcb.tbd.buf_addr),
					 le16_to_cpu(cb->u.tcb.tbd.size),
					 PCI_DMA_TODEVICE);	/* tear down the DMA mapping */
			dev_kfree_skb_any(cb->skb);		/* free the skb */
			cb->skb = NULL;
			tx_cleaned = 1;
		}
		cb->status = 0;
		nic->cbs_avail++;
	}

	spin_unlock(&nic->cb_lock);

	/* Recover from running out of Tx resources in xmit_frame */
	if (unlikely(tx_cleaned && netif_queue_stopped(nic->netdev)))
		netif_wake_queue(nic->netdev);		/* wake the stopped transmit queue */

	return tx_cleaned;
}
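e100_tx_clean() walks the command-block ring starting at cb_to_clean. It reclaims entries until it meets the first one the hardware has not marked complete. A simplified single-threaded model of that walk follows. The ring size and field names are hypothetical. The model omits the spinlock, the rmb() barrier and the DMA unmapping that the real driver needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical ring size */

struct tx_slot {
	bool complete;	/* set by "hardware" once the frame has been sent */
	int  len;	/* stand-in for cb->skb; 0 means no buffer        */
};

struct tx_ring {
	struct tx_slot slot[RING_SIZE];
	size_t to_clean;	/* like nic->cb_to_clean */
	int    avail;		/* like nic->cbs_avail   */
	long   tx_packets, tx_bytes;
};

/* Reclaim slots in order until the first unfinished one. Returns 1 if a
 * slot carrying a buffer was reclaimed (the cue to wake the queue). */
static int tx_clean(struct tx_ring *r)
{
	int cleaned = 0;

	while (r->slot[r->to_clean].complete) {
		struct tx_slot *s = &r->slot[r->to_clean];

		if (s->len) {
			r->tx_packets++;
			r->tx_bytes += s->len;
			s->len = 0;	/* "unmap and free the skb" */
			cleaned = 1;
		}
		s->complete = false;
		r->avail++;
		r->to_clean = (r->to_clean + 1) % RING_SIZE;
	}
	return cleaned;
}
```

The loop always terminates: every visited slot has its complete flag cleared, so at most RING_SIZE slots are processed per call.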
static int e100_rx_alloc_skb(struct nic *nic, struct rx *rx)
{
	if (!(rx->skb = netdev_alloc_skb_ip_align(nic->netdev, RFD_BUF_LEN)))
		return -ENOMEM;

	/* Init, and map the RFD. */
	skb_copy_to_linear_data(rx->skb, &nic->blank_rfd, sizeof(struct rfd));
	rx->dma_addr = pci_map_single(nic->pdev, rx->skb->data,
				      RFD_BUF_LEN, PCI_DMA_BIDIRECTIONAL);

	if (pci_dma_mapping_error(nic->pdev, rx->dma_addr)) {
		dev_kfree_skb_any(rx->skb);
		rx->skb = NULL;
		rx->dma_addr = 0;
		return -ENOMEM;
	}

	/* Link the RFD to end of RFA by linking previous RFD to
	 * this one. We are safe to touch the previous RFD because
	 * it is protected by the before last buffer's el bit being set */
	if (rx->prev->skb) {
		struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
		put_unaligned_le32(rx->dma_addr, &prev_rfd->link);
		pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
					       sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
	}

	return 0;
}
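e100_rx_alloc_skb() prepares one receive buffer. It is used to refill ring slots whose skb has been handed up the stack. The function proceeds as follows:
1. Allocate a new skb and copy a blank RFD (receive frame descriptor) into its head.
2. Map the buffer with pci_map_single() so that the NIC can DMA into it. If the mapping fails, the skb is freed again.
3. Link the new RFD to the end of the receive list by writing its bus address into the previous RFD's link field, then sync that RFD back to the device.

Once the NIC signals reception, the buffer holds the network packet. The frame is validated later, in e100_rx_indicate().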
static int e100_rx_indicate(struct nic *nic, struct rx *rx,
			    unsigned int *work_done, unsigned int work_to_do)
{
	struct net_device *dev = nic->netdev;
	struct sk_buff *skb = rx->skb;
	struct rfd *rfd = (struct rfd *)skb->data;
	u16 rfd_status, actual_size;

	if (unlikely(work_done && *work_done >= work_to_do))
		return -EAGAIN;

	/* Need to sync before taking a peek at cb_complete bit.
	 * This syncs only the status area, i.e. the first 16 bytes (the RFD)
	 * of the skb; rfd_status is then checked to see whether the frame
	 * has been received completely. */
	pci_dma_sync_single_for_cpu(nic->pdev, rx->dma_addr,
				    sizeof(struct rfd), PCI_DMA_BIDIRECTIONAL);
	rfd_status = le16_to_cpu(rfd->status);

	netif_printk(nic, rx_status, KERN_DEBUG, nic->netdev,
		     "status=0x%04X\n", rfd_status);
	rmb(); /* read size after status bit */

	/* If data isn't ready, nothing to indicate */
	if (unlikely(!(rfd_status & cb_complete))) {
		/* If the next buffer has the el bit, but we think the receiver
		 * is still running, check to see if it really stopped while
		 * we had interrupts off.
		 * This allows for a fast restart without re-enabling
		 * interrupts */
		if ((le16_to_cpu(rfd->command) & cb_el) &&
		    (RU_RUNNING == nic->ru_running))

			if (ioread8(&nic->csr->scb.status) & rus_no_res)
				nic->ru_running = RU_SUSPENDED;
		pci_dma_sync_single_for_device(nic->pdev, rx->dma_addr,
					       sizeof(struct rfd),
					       PCI_DMA_FROMDEVICE);
		return -ENODATA;
	}

	/* Get actual data size */
	actual_size = le16_to_cpu(rfd->actual_size) & 0x3FFF;
	if (unlikely(actual_size > RFD_BUF_LEN - sizeof(struct rfd)))
		actual_size = RFD_BUF_LEN - sizeof(struct rfd);

	/* Get data */
	pci_unmap_single(nic->pdev, rx->dma_addr,
			 RFD_BUF_LEN, PCI_DMA_BIDIRECTIONAL);	/* unmap; skb->data now belongs to the CPU */

	/* If this buffer has the el bit, but we think the receiver
	 * is still running, check to see if it really stopped while
	 * we had interrupts off.
	 * This allows for a fast restart without re-enabling interrupts.
	 * This can happen when the RU sees the size change but also sees
	 * the el bit set. */
	if ((le16_to_cpu(rfd->command) & cb_el) &&
	    (RU_RUNNING == nic->ru_running)) {

		if (ioread8(&nic->csr->scb.status) & rus_no_res)
			nic->ru_running = RU_SUSPENDED;
	}

	/* Pull off the RFD and put the actual data (minus eth hdr) */
	skb_reserve(skb, sizeof(struct rfd));
	skb_put(skb, actual_size);
	skb->protocol = eth_type_trans(skb, nic->netdev);

	if (unlikely(!(rfd_status & cb_ok))) {
		/* Don't indicate if hardware indicates errors */
		dev_kfree_skb_any(skb);
	} else if (actual_size > ETH_DATA_LEN + VLAN_ETH_HLEN) {
		/* Don't indicate oversized frames */
		nic->rx_over_length_errors++;
		dev_kfree_skb_any(skb);
	} else {
		dev->stats.rx_packets++;
		dev->stats.rx_bytes += actual_size;
		netif_receive_skb(skb);
		if (work_done)
			(*work_done)++;
	}

	rx->skb = NULL;

	return 0;
}
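This is the main receive path. After an interrupt, invoked from NAPI polling, the function does the following:
1. Check the RFD status.
2. Remove the PCI DMA mapping of the received buffer.
3. Discard frames with hardware errors, as well as oversized frames.
4. Pass good packets up to the protocol stack with netif_receive_skb().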
static int e100_poll(struct napi_struct *napi, int budget)
{
	struct nic *nic = container_of(napi, struct nic, napi);
	unsigned int work_done = 0;

	e100_rx_clean(nic, &work_done, budget);
	e100_tx_clean(nic);

	/* If budget not fully consumed, exit the polling mode */
	if (work_done < budget) {
		napi_complete(napi);
		e100_enable_irq(nic);
	}

	return work_done;
}
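e100_poll() follows the NAPI contract. It handles at most budget packets. Only when it finds fewer than budget does it call napi_complete() and re-enable the device interrupt; otherwise it returns budget and stays on the poll list. Below is a toy model of that contract, with hypothetical fields in place of the real RX ring and interrupt mask.

```c
#include <assert.h>

/* pending counts frames waiting in the RX ring. */
struct napi_model {
	int pending;
	int irq_enabled;
	int scheduled;
};

/* Handle at most budget frames; leave polling mode and re-enable the
 * interrupt only when the ring was drained within the budget. */
static int poll_once(struct napi_model *n, int budget)
{
	int work_done = 0;

	assert(budget > 0);
	while (n->pending > 0 && work_done < budget) {
		n->pending--;		/* "deliver" one frame */
		work_done++;
	}
	if (work_done < budget) {
		n->scheduled = 0;	/* like napi_complete()   */
		n->irq_enabled = 1;	/* like e100_enable_irq() */
	}
	return work_done;
}
```

Interrupts stay off during a burst, so the CPU is not interrupted once per frame. Meanwhile, the budget bounds how long a single device can hold the softirq in one round.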