The Kernel Module Mechanism in Linux

Date: 2019-11-20

2017-06-20

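Linux's kernel module mechanism lets developers add functionality to the kernel dynamically. Common components such as file systems and device drivers can be added to the kernel as modules without recompiling it, which greatly reduces the effort involved. Thanks to modules, the kernel does not have to build in lots of unrelated functionality up front: it can be kept as lean as possible, and features can be added later as needed. Drivers in particular deal with specific hardware, so they are hard to make generic, and they may contain vendors' private interfaces. Vendors are rarely willing to publish that source code, which conflicts with the Linux license. The module mechanism resolves this conflict nicely by allowing drivers to be added later without being merged into the kernel. With that, let's look at how the module mechanism is implemented, with reference to the source code.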

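Much like an ordinary program, a module is compiled into a .ko file, which is itself a relocatable object file (in ELF format), similar to the .o object file produced by gcc -c. For the concept of relocation, see the PE file format (it is the Windows format, but the principles are similar).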

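Since a module is a relocatable file, it must be relocated when it is loaded into the kernel. Recall how relocation works for user programs: an executable can usually be loaded at its preferred address, so it seldom needs relocation. Shared libraries are different: they all share the same format, but a process may load several of them, so some inevitably cannot be placed at their preferred addresses and must be relocated. A kernel module shares the single kernel address space with the kernel itself and cannot count on its preferred address being free, so modules generally need to be relocated as well. Another important job at load time is resolving dependencies between modules. If module A calls a function in another module, A does not know that function's address before it is loaded, so the reference can only be marked; when A is loaded into the kernel, the reference is resolved through the symbol tables. All of this is done by sys_init_module, the core system call for loading modules.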

Data structures in the kernel

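Each kernel module is represented in the kernel by a struct module, and all modules are kept on a single linked list. This is why some malicious modules try to hide themselves by unlinking their structure from that list. Some of its members are listed below: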

struct module
{
    enum module_state state;

    /* Member of list of modules */
    struct list_head list;//all modules form a doubly linked list whose head is the global variable modules

    /* Unique handle for this module */
    char name[MODULE_NAME_LEN];//module name, unique; normally the file name without the .ko suffix

    /* Sysfs stuff. */
    struct module_kobject mkobj;
    struct module_attribute *modinfo_attrs;
    const char *version;
    const char *srcversion;
    struct kobject *holders_dir;

    /* Exported symbols */
    const struct kernel_symbol *syms;//exported symbols: points to an array of num_syms struct kernel_symbol entries
    const unsigned long *crcs;//also num_syms entries, holding the checksum (CRC) of each symbol
    unsigned int num_syms;

    /* Kernel parameters. */
    struct kernel_param *kp;
    unsigned int num_kp;

    /* GPL-only exported symbols. */
    unsigned int num_gpl_syms;//same meaning as the symbols above, but usable only by GPL-compatible modules
    const struct kernel_symbol *gpl_syms;
    const unsigned long *gpl_crcs;

#ifdef CONFIG_UNUSED_SYMBOLS
    /* unused exported symbols. */
    const struct kernel_symbol *unused_syms;
    const unsigned long *unused_crcs;
    unsigned int num_unused_syms;

    /* GPL-only, unused exported symbols. */
    unsigned int num_unused_gpl_syms;
    const struct kernel_symbol *unused_gpl_syms;
    const unsigned long *unused_gpl_crcs;
#endif

#ifdef CONFIG_MODULE_SIG
    /* Signature was verified. */
    bool sig_ok;
#endif

    /* symbols that will be GPL-only in the near future. */
    const struct kernel_symbol *gpl_future_syms;
    const unsigned long *gpl_future_crcs;
    unsigned int num_gpl_future_syms;

    /* Exception table */
    unsigned int num_exentries;
    struct exception_table_entry *extable;

    /* Startup function. */
    int (*init)(void);//pointer to the module's initialization function

    /* If this is non-NULL, vfree after init() returns */
    void *module_init;//memory holding the init-only code and data; if non-NULL it is freed once init() returns

    /* Here is the actual code + data, vfree'd on unload. */
    void *module_core;//the core code and data, freed (vfree) when the module is unloaded

    /* Here are the sizes of the init and core sections */
    unsigned int init_size, core_size;//sizes of the init and core regions above

    /* The size of the executable code in each section.  */
    unsigned int init_text_size, core_text_size;

    /* Size of RO sections of the module (text+rodata) */
    unsigned int init_ro_size, core_ro_size;
    /* ... */

#ifdef CONFIG_MODULE_UNLOAD
    /* Dependencies between modules */
    /* What modules depend on me? */
    struct list_head source_list;
    /* What modules do I depend on? */
    struct list_head target_list;

    /* Who is waiting for us to be unloaded */
    struct task_struct *waiter;//the task (process) waiting for this module to be unloaded

    /* Destruction function. */
    void (*exit)(void);//unload function, i.e. the exit function defined in the module

    /* ... */
};

Module dependencies

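Dependencies between modules are recorded through two list heads, source_list and target_list: the former records which modules depend on this module, the latter which modules this module depends on. The list nodes are struct module_use, defined as follows: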

struct module_use {
    struct list_head source_list;
    struct list_head target_list;
    struct module *source, *target;
};

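Each module_use records one dependency. Note that source and target live in the same structure, because a dependency has to be recorded in both the source module and the target module. If module A depends on module B, one module_use is created: its source_list field is linked into the source_list of B's struct module, and its source pointer points to A's struct module; its target_list field is linked into A's target_list, and its target pointer points to B's struct module. See the code below: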

static int add_module_usage(struct module *a, struct module *b)
{
    struct module_use *use;

    pr_debug("Allocating new usage for %s.\n", a->name);
    use = kmalloc(sizeof(*use), GFP_ATOMIC);
    if (!use) {
        printk(KERN_WARNING "%s: out of memory loading\n", a->name);
        return -ENOMEM;
    }

    use->source = a;
    use->target = b;
    list_add(&use->source_list, &b->source_list);
    list_add(&use->target_list, &a->target_list);
    return 0;
}

Symbol information

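A kernel module is almost never completely self-contained: it needs to call functions in the kernel or in other modules, and the symbol mechanism is what makes this possible. Looking back at struct module above, the relevant members are: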

const struct kernel_symbol *syms;//exported symbols: points to an array of num_syms struct kernel_symbol entries

const unsigned long *crcs;//also num_syms entries, holding the checksum (CRC) of each symbol

unsigned int num_syms;

syms points to an array of symbols, which can also be called a symbol table, albeit a local (per-module) one. Here is struct kernel_symbol:

struct kernel_symbol
{
    unsigned long value;
    const char *name;
};

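The structure is simple: value holds the symbol's address and name, naturally, the symbol's name. The principle is just as simple. Let's use find_symbol to see how the kernel resolves a symbol that is referenced but not defined: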

const struct kernel_symbol *find_symbol(const char *name,
                    struct module **owner,
                    const unsigned long **crc,
                    bool gplok,
                    bool warn)
{
    struct find_symbol_arg fsa;

    fsa.name = name;
    fsa.gplok = gplok;
    fsa.warn = warn;

    if (each_symbol_section(find_symbol_in_section, &fsa)) {
        if (owner)
            *owner = fsa.owner;
        if (crc)
            *crc = fsa.crc;
        return fsa.sym;
    }

    pr_debug("Failed to find symbol %s\n", name);
    return NULL;
}

It first packs its arguments into a struct find_symbol_arg and then calls each_symbol_section, passing in find_symbol_in_section, the function that searches one section for the symbol:

bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
                    struct module *owner,
                    void *data),
             void *data)
{
    struct module *mod;
    static const struct symsearch arr[] = {
        { __start___ksymtab, __stop___ksymtab, __start___kcrctab,
          NOT_GPL_ONLY, false },
        { __start___ksymtab_gpl, __stop___ksymtab_gpl,
          __start___kcrctab_gpl,
          GPL_ONLY, false },
        { __start___ksymtab_gpl_future, __stop___ksymtab_gpl_future,
          __start___kcrctab_gpl_future,
          WILL_BE_GPL_ONLY, false },
#ifdef CONFIG_UNUSED_SYMBOLS
        { __start___ksymtab_unused, __stop___ksymtab_unused,
          __start___kcrctab_unused,
          NOT_GPL_ONLY, true },
        { __start___ksymtab_unused_gpl, __stop___ksymtab_unused_gpl,
          __start___kcrctab_unused_gpl,
          GPL_ONLY, true },
#endif
    };

    if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
        return true;

    list_for_each_entry_rcu(mod, &modules, list) {
        struct symsearch arr[] = {
            { mod->syms, mod->syms + mod->num_syms, mod->crcs,
              NOT_GPL_ONLY, false },
            { mod->gpl_syms, mod->gpl_syms + mod->num_gpl_syms,
              mod->gpl_crcs,
              GPL_ONLY, false },
            { mod->gpl_future_syms,
              mod->gpl_future_syms + mod->num_gpl_future_syms,
              mod->gpl_future_crcs,
              WILL_BE_GPL_ONLY, false },
#ifdef CONFIG_UNUSED_SYMBOLS
            { mod->unused_syms,
              mod->unused_syms + mod->num_unused_syms,
              mod->unused_crcs,
              NOT_GPL_ONLY, true },
            { mod->unused_gpl_syms,
              mod->unused_gpl_syms + mod->num_unused_gpl_syms,
              mod->unused_gpl_crcs,
              GPL_ONLY, true },
#endif
        };

        if (mod->state == MODULE_STATE_UNFORMED)
            continue;

        if (each_symbol_in_section(arr, ARRAY_SIZE(arr), mod, fn, data))
            return true;
    }
    return false;
}

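The kernel's own symbols naturally come first. An array is defined in priority order: the kernel's exported symbols live in global sections bounded by linker-generated start/stop symbols, searched in the order __start___ksymtab, __start___ksymtab_gpl, __start___ksymtab_gpl_future, __start___ksymtab_unused and __start___ksymtab_unused_gpl (the last two only with CONFIG_UNUSED_SYMBOLS). each_symbol_in_section then walks this array and calls find_symbol_in_section on each entry. If the kernel itself does not define the symbol, the symbol tables of the loaded modules, i.e. the local symbol tables, are searched next. The method is the same, except that the table pointers come from each struct module rather than from global sections, and modules still in MODULE_STATE_UNFORMED are skipped. We won't repeat the details. Here is find_symbol_in_section: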

static bool find_symbol_in_section(const struct symsearch *syms,
                   struct module *owner,
                   void *data)
{
    struct find_symbol_arg *fsa = data;
    struct kernel_symbol *sym;

    sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
            sizeof(struct kernel_symbol), cmp_name);

    if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
        return true;

    return false;
}

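find_symbol_in_section searches for the symbol between the start and end of one symbol table. The actual search is done by bsearch, a binary search on the key, i.e. the symbol name. The algorithm is simple; here it is: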

void *bsearch(const void *key, const void *base, size_t num, size_t size,
          int (*cmp)(const void *key, const void *elt))
{
    size_t start = 0, end = num;
    int result;

    while (start < end) {
        size_t mid = start + (end - start) / 2;

        result = cmp(key, base + mid * size);
        if (result < 0)
            end = mid;
        else if (result > 0)
            start = mid + 1;
        else
            return (void *)base + mid * size;
    }

    return NULL;
}

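At each step bsearch calls cmp, the comparison function passed in by the caller (here cmp_name), which essentially just calls strcmp. This shows that the symbols in a table must be sorted in strcmp order: by the first character, then by the second when the first ones are equal, and so on. Once a symbol is found, check_symbol validates it (for example against the GPL-only restrictions); if nothing is found, the function simply returns false.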

Using non-exported functions

1. Define a function pointer type

2. Declare a function pointer variable

3. Look up the address in the symbol table

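Take the kernel function lookup_swap_cache as an example. It is not exported, so a module cannot call it directly. By looking it up in the symbol table and casting its address to a function pointer, we can still call it.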

1. Define a function pointer type

The function is originally defined as: struct page *lookup_swap_cache(swp_entry_t swp)

typedef struct page* (*LOOKUPSWAPCACHE)(swp_entry_t);

2. Declare a function pointer variable

LOOKUPSWAPCACHE lookup_swap_cache_chen;

3. Assign it from the symbol table

lookup_swap_cache_chen=(LOOKUPSWAPCACHE)kallsyms_lookup_name("lookup_swap_cache");

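You can now call lookup_swap_cache_chen in your own module, after checking that the lookup did not return 0. You need to include the header #include <linux/kallsyms.h>. Note that since Linux 5.7, kallsyms_lookup_name itself is no longer exported to modules, so this technique only works as written on older kernels.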

User-space information

The following is excerpted from: http://www.blogbus.com/wanderer-zjhit-logs/172382425.html

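The kernel symbol table is made up of variable and function names. Each entry pairs a symbol with its address, much like a domain name and an IP address. Its format is as follows: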

[root@rx6600 boot]# head System.map
000000000479c4a0 A phys_start
a000000000000600 A __start_gate_mckinley_e9_patchlist
a000000000000604 A __end_gate_mckinley_e9_patchlist
a000000000000604 A __end_gate_vtop_patchlist
a000000000000604 A __start_gate_fsyscall_patchlist
a000000000000604 A __start_gate_vtop_patchlist
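When handling an oops message, or debugging with gdb, this table is needed to translate kernel addresses into the function names users know, which helps with locating faults and monitoring the running system.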
Where the kernel symbol table is stored
System.map
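A real file on disk that stores the addresses of the functions and variables statically compiled into the kernel. Every newly built kernel has its own System.map. When klogd outputs kernel messages, it can use /boot/System.map to translate function and variable addresses into names that are easier to understand.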
/proc/kallsyms
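Created by the kernel at boot and used, for example, to locate errors in an oops. Its reported file size is always 0 because the content is generated when the file is read. It lists the symbols (functions and variables) of the running kernel, including those of loaded modules.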
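Similarities: both are symbol tables of kernel functions and variables with the same layout, and for the same kernel build they give the same (virtual) addresses for the same kernel symbols, unless the addresses are randomized at boot by KASLR.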
Differences:
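The two have different focuses. System.map describes the kernel image: it also lists variables and functions that are not exported, such as the list head kthread_create_list, together with their kernel addresses. It is generally read-only and of fixed size, and it does not include the variables and functions of dynamically loaded modules. /proc/kallsyms, by contrast, is created while the kernel boots and is kept up to date, reflecting the current state of the system; it also contains the function and variable names of loaded modules. Its contents therefore differ from System.map, change at runtime, and have no fixed size.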

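Note: /proc/kmsg gives access to the kernel log buffer, which holds the messages the kernel has printed with printk since boot (the buffer has a fixed size, so the oldest messages are eventually overwritten).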
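If klogd is running, it reads /proc/kmsg and hands the messages to syslogd, which writes them to /var/log/messages; syslogd's behavior can be configured in /etc/syslog.conf. dmesg reads the same kernel log buffer (through the syslog(2) system call or /dev/kmsg rather than /proc/kmsg) and prints it to the terminal.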
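klogd can use the System.map file to translate kernel addresses into the corresponding function names for easier debugging; on kernels built with CONFIG_KALLSYMS, oops backtraces are already symbolized by the kernel itself.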
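When something goes wrong in the kernel, it is usually an oops caused by dereferencing an invalid pointer. In user space, an application generally cannot recover from a segmentation fault (an access to an invalid address), but the kernel tries to stay robust: it usually just kills the process that was running when the fault occurred and keeps the system in a stable state. In more serious cases the kernel panics, and the machine hangs or reboots.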