Nginx Source Code Analysis: hash

Date: 2019-11-06


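This analysis is based on Nginx-1.2.6. Older or future versions may differ in places, but the differences should be small, so it can still serve as a reference.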

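The previous post did not finish the analysis of hashing in nginx, because the structs and functions with wildcard or wc in their names were left unexplained. I am not sure where best to start. (My understanding in this post may well be wrong, since it touches code in the http module that I have not fully digested, so corrections are welcome.)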

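#Starting from the server_name directive in the configuration file#

This directive is documented in detail elsewhere, with an example. One thing it does is compare the HOST in the HTTP request header against the arguments of server_name and select the server configuration that matches best. Candidates are tried in this order: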

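1. Exact match: HOST is identical to one of the server_name arguments
2. Match against a name with a leading wildcard, e.g. sub.domain.com matches *.domain.com
3. Match against a name with a trailing wildcard, e.g. www.example.com matches www.example.*
4. Match against a regular expression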
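
The matching order above can be made concrete with a small sketch. The C code below is only an illustration under simplifying assumptions: it scans the patterns linearly instead of using hash tables, it leaves out regular expressions, and the names match_one and best_match are hypothetical, not nginx functions. When several patterns of the same class match, it simply keeps the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Match classes, ordered so that a larger value means higher priority. */
enum match_kind { MATCH_NONE = 0, MATCH_TAIL_WC, MATCH_HEAD_WC, MATCH_EXACT };

/* Classify how `pattern` (one server_name argument) matches `host`. */
enum match_kind match_one(const char *host, const char *pattern)
{
    size_t hl, pl, sl;

    assert(host != NULL && pattern != NULL);
    hl = strlen(host);
    pl = strlen(pattern);

    if (strcmp(host, pattern) == 0) {
        return MATCH_EXACT;
    }

    /* "*.domain.com": host must end with ".domain.com" and have a prefix */
    if (pl >= 2 && pattern[0] == '*' && pattern[1] == '.') {
        sl = pl - 1;                       /* length of ".domain.com" */
        if (hl > sl && strcmp(host + hl - sl, pattern + 1) == 0) {
            return MATCH_HEAD_WC;
        }
        return MATCH_NONE;
    }

    /* "www.example.*": host must start with "www.example." plus a suffix */
    if (pl >= 2 && pattern[pl - 1] == '*' && pattern[pl - 2] == '.') {
        sl = pl - 1;                       /* length of "www.example." */
        if (hl > sl && strncmp(host, pattern, sl) == 0) {
            return MATCH_TAIL_WC;
        }
    }

    return MATCH_NONE;
}

/* Return the index of the best pattern for `host`, or -1 if none match. */
int best_match(const char *host, const char *const *patterns, size_t n)
{
    int best = -1;
    enum match_kind best_kind = MATCH_NONE;
    size_t i;

    for (i = 0; i < n; i++) {
        enum match_kind k = match_one(host, patterns[i]);
        if (k > best_kind) {
            best_kind = k;
            best = (int) i;
        }
    }
    return best;
}
```

With the patterns www.example.*, *.example.com and www.example.com, best_match picks the exact name for www.example.com and the leading wildcard for sub.example.com, even though the trailing wildcard is listed first.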

For example, given the following configuration:

# The two configurations below are equivalent, because .example.com is equivalent to: example.com  *.example.com
server {
  server_name   example.com  *.example.com  www.example.*;
}

server {
  server_name  .example.com  www.example.*;
}

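Once the configuration has been read and loaded into memory, what data structure should be maintained so that, for every incoming HTTP request, the best match can be found quickly in the order above?

#The role of the hash tables#

Hash tables come to mind first, and nginx does in fact use three of them for this job: an ordinary hash table for server_name arguments without wildcards, and two more for arguments with wildcards.

###For convenience, the hash table that stores wildcard keys is called the wildcard hash below, and the ordinary hash table is called the exact hash.###

The processing flow is roughly: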

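1. Parse the HOST out of the request.
2. Call ngx_http_find_virtual_server(r, host, len), where r is the struct wrapping the HTTP request. This finds the best matching configuration cscf (of type ngx_http_core_srv_conf_t) and records it in r; the corresponding code is shown below.
3. Continue processing.

Code for step 2: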

<!-- lang: cpp -->
cscf = ngx_hash_find_combined(&r->virtual_names->names,
                              ngx_hash_key(host, len), host, len);
.....
r->srv_conf = cscf->ctx->srv_conf;
r->loc_conf = cscf->ctx->loc_conf;

The search for the best match calls a function defined in ngx_hash.c. This call deserves a closer look:

<!-- lang: cpp -->
 cscf = ngx_hash_find_combined(&r->virtual_names->names,
                              ngx_hash_key(host, len), host, len);

/* r->virtual_names has this type; compare it with the matching order of the server_name directive */
typedef struct {
     ngx_hash_combined_t              names;

     ngx_uint_t                       nregex;
     ngx_http_server_name_t          *regex;
} ngx_http_virtual_names_t;

typedef struct {
    ngx_hash_t            hash;
    ngx_hash_wildcard_t  *wc_head;
    ngx_hash_wildcard_t  *wc_tail;
} ngx_hash_combined_t;

Back to the main topic. ngx_hash_combined_t contains three hash tables, and a call to ngx_hash_find_combined(hash, key, name, len) looks the name up in each of them in turn.

<!-- lang: cpp -->
void *
ngx_hash_find_combined(ngx_hash_combined_t *hash, ngx_uint_t key, u_char *name,
    size_t len)
{
    void  *value;
    
    /* look for an exact match in the ordinary hash table */
    if (hash->hash.buckets) {
        value = ngx_hash_find(&hash->hash, key, name, len);

        if (value) {
            return value;
        }
    }

    if (len == 0) {
        return NULL;
    }
    /* look in the wc_head hash for an entry with a leading wildcard that matches */
    if (hash->wc_head && hash->wc_head->hash.buckets) {
        value = ngx_hash_find_wc_head(hash->wc_head, name, len);

        if (value) {
            return value;
        }
    }
    /* look in the wc_tail hash for an entry with a trailing wildcard that matches */
    if (hash->wc_tail && hash->wc_tail->hash.buckets) {
        value = ngx_hash_find_wc_tail(hash->wc_tail, name, len);

        if (value) {
            return value;
        }
    }

    return NULL;
}

#wildcard hash initialization#

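Initialization happens in ngx_hash_wildcard_init. All we can tell from its signature is that the second parameter is an array of ngx_hash_key_t, but the array does not actually hold the wildcard domain names themselves; it holds a converted form of them. The reason is that when matching sub.example.com, we may need to check whether a server_name such as *.example.com or sub.example.* was configured. How can sub.example.com be matched quickly to *.example.com rather than to *.example.org?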

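Moreover, the given keys may be a mix of names without wildcards (call them A), names with a leading wildcard (B), and names with a trailing wildcard (C). We want to sort them out so that A goes into the exact hash, B into the head wildcard hash, and C into the tail wildcard hash. That is the job of ngx_hash_keys_array_init and ngx_hash_add_key, both of which are commented in the source. Before describing these two functions, here is the data structure involved: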

<!-- lang: cpp -->

typedef struct {
    ngx_uint_t        hsize;

    ngx_pool_t       *pool;
    ngx_pool_t       *temp_pool;

    ngx_array_t       keys;
    ngx_array_t      *keys_hash;

    ngx_array_t       dns_wc_head;
    ngx_array_t      *dns_wc_head_hash;

    ngx_array_t       dns_wc_tail;
    ngx_array_t      *dns_wc_tail_hash;
} ngx_hash_keys_arrays_t;

A description of this data structure is available elsewhere; the following passage is copied from it.

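hsize: the number of buckets of the hash tables to be built. All three kinds of hash table built from the information in this structure use this parameter.

pool: the pool used to build these hash tables.

temp_pool: a temporary pool that may be used while building this structure and the three final hash tables. It can be destroyed once construction is complete; it only holds temporary allocations.

keys: the array holding all non-wildcard keys.

keys_hash: a two-dimensional array whose first dimension is the bucket index, so keys_hash[i] holds every key whose hash value modulo hsize equals i. Suppose there are three keys, key1, key2 and key3, whose hash values modulo hsize are all i; they are then stored in order in keys_hash[i][0], keys_hash[i][1] and keys_hash[i][2]. This array is used during construction to record keys and detect conflicts, that is, duplicates.

dns_wc_head: holds the processed form of leading-wildcard keys. For example, "*.abc.com" becomes "com.abc." and is stored in this array.

dns_wc_tail: holds the processed form of trailing-wildcard keys. For example, "mail.xxx.*" becomes "mail.xxx" and is stored in this array.

dns_wc_head_hash: used during construction to record leading-wildcard keys and detect conflicts, that is, duplicates.

dns_wc_tail_hash: used during construction to record trailing-wildcard keys and detect conflicts, that is, duplicates.

Note: the elements of the three arrays keys, dns_wc_head and dns_wc_tail are of type ngx_hash_key_t, while the elements of the three two-dimensional arrays keys_hash, dns_wc_head_hash and dns_wc_tail_hash are of type ngx_str_t.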
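
The duplicate check that keys_hash makes possible can be sketched as follows. This is a minimal illustration, not nginx code: the struct bucket type, the HSIZE value, the SKETCH_* return codes and the function names are all hypothetical, and malloc stands in for nginx pools. The hash recurrence key * 31 + c is the one used by nginx's ngx_hash macro.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_OK      0
#define SKETCH_ERROR  -1
#define SKETCH_BUSY   -2     /* plays the role of NGX_BUSY */

#define HSIZE 107            /* hypothetical bucket count, stands in for hsize */

struct bucket {
    char   **names;          /* keys whose hash % HSIZE equals this index */
    size_t   n, cap;
};

/* Same recurrence as nginx's ngx_hash(key, c) macro: key * 31 + c. */
unsigned long sketch_hash(const char *s)
{
    unsigned long key = 0;

    while (*s) {
        key = key * 31 + (unsigned char) *s++;
    }
    return key;
}

/* Record `name` in its bucket, or report SKETCH_BUSY if it is already there. */
int sketch_add_key(struct bucket *table, const char *name)
{
    struct bucket *b;
    size_t i;

    assert(table != NULL && name != NULL);
    b = &table[sketch_hash(name) % HSIZE];

    for (i = 0; i < b->n; i++) {
        if (strcmp(b->names[i], name) == 0) {
            return SKETCH_BUSY;            /* duplicate key */
        }
    }

    if (b->n == b->cap) {                  /* grow the bucket when it is full */
        size_t cap = b->cap ? b->cap * 2 : 4;
        char **p = realloc(b->names, cap * sizeof(char *));
        if (p == NULL) {
            return SKETCH_ERROR;
        }
        b->names = p;
        b->cap = cap;
    }

    b->names[b->n] = malloc(strlen(name) + 1);
    if (b->names[b->n] == NULL) {
        return SKETCH_ERROR;
    }
    strcpy(b->names[b->n], name);
    b->n++;
    return SKETCH_OK;
}
```

Only names that land in the same bucket are compared, which is why the first dimension of keys_hash is the bucket index.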

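ngx_hash_keys_array_init allocates space for the structure above.

ngx_hash_add_key converts a key, with or without a wildcard, and stores it in the structure above. It works as follows.

First it checks whether the flags passed as the fourth parameter mark the key as NGX_HASH_WILDCARD_KEY.
 * If not, it checks ha->keys_hash for a conflict. On a conflict it returns NGX_BUSY; otherwise it inserts the entry into ha->keys.
 * If so, it determines the wildcard type. Three forms are supported: "*.example.com", ".example.com" and "www.example.*". The first is converted to "com.example." and inserted into ha->dns_wc_head. The third is converted to "www.example" and inserted into ha->dns_wc_tail. The second is special because it is equivalent to "*.example.com" plus "example.com": it is converted to "com.example" (without the trailing dot) and inserted into ha->dns_wc_head, and "example.com" is also recorded in ha->keys_hash so that a clash with an exact name is detected. Every insertion is preceded by a conflict check.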
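
The key conversion can be sketched as below. The functions convert_head and convert_tail are hypothetical helpers written for this post, not the nginx implementation, and they skip lowercasing and conflict checks. One detail is an assumption based on my reading of the comments in ngx_hash_add_key: "*.example.com" keeps a trailing dot after reversal, while ".example.com" does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Reverse the dot-separated labels of a leading-wildcard name.
 *   "*.example.com" -> "com.example."   (trailing dot kept)
 *   ".example.com"  -> "com.example"    (no trailing dot)
 * `out` must have room for strlen(key) + 1 bytes.
 * Returns 0 on success, -1 if `key` is not a leading-wildcard name.
 */
int convert_head(const char *key, char *out)
{
    const char *body;
    size_t end, i, n = 0;
    int trailing_dot;

    assert(key != NULL && out != NULL);

    if (key[0] == '*' && key[1] == '.') {
        body = key + 2;
        trailing_dot = 1;
    } else if (key[0] == '.') {
        body = key + 1;
        trailing_dot = 0;
    } else {
        return -1;
    }

    end = strlen(body);                    /* one past the current label */
    if (end == 0) {
        return -1;
    }

    /* walk from the right; each dot closes off one label */
    for (i = end; i > 0; i--) {
        if (body[i - 1] == '.') {
            memcpy(out + n, body + i, end - i);
            n += end - i;
            out[n++] = '.';
            end = i - 1;
        }
    }

    memcpy(out + n, body, end);            /* leftmost label */
    n += end;

    if (trailing_dot) {
        out[n++] = '.';
    }
    out[n] = '\0';
    return 0;
}

/* "www.example.*" -> "www.example"; returns -1 for any other form. */
int convert_tail(const char *key, char *out)
{
    size_t len;

    assert(key != NULL && out != NULL);
    len = strlen(key);

    if (len < 3 || key[len - 1] != '*' || key[len - 2] != '.') {
        return -1;
    }
    memcpy(out, key, len - 2);
    out[len - 2] = '\0';
    return 0;
}
```

Reversing the labels is what lets a lookup start from the top-level domain and narrow down one label at a time, so sub.example.com reaches the entry for *.example.com and never visits anything under org.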

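After initialization by ngx_hash_wildcard_init, the resulting hash table structure (a decision tree, more or less) is shown in the figure below:

The comment from the source: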

<!-- lang: cpp -->    
    /*
     * the 2 low bits of value have the special meaning:
     *     00 - value is data pointer for both "example.com"
     *          and "*.example.com";
     *     01 - value is data pointer for "*.example.com" only;
     *     10 - value is pointer to wildcard hash allowing
     *          both "example.com" and "*.example.com";
     *     11 - value is pointer to wildcard hash allowing
     *          "*.example.com" only.
     */
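
The two low bits described in that comment are a form of pointer tagging. The sketch below shows the idea in isolation. The helper names and the TAG_* constants are hypothetical, and the code assumes that every tagged pointer is at least 4-byte aligned, so the two low bits are free to carry flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Meaning of the two low bits, following the source comment. */
enum {
    TAG_DATA_BOTH    = 0,   /* 00: data for "example.com" and "*.example.com" */
    TAG_DATA_WC_ONLY = 1,   /* 01: data for "*.example.com" only */
    TAG_HASH_BOTH    = 2,   /* 10: next-level hash, both forms allowed */
    TAG_HASH_WC_ONLY = 3    /* 11: next-level hash, "*.example.com" only */
};

/* Store `tag` in the two low bits of an aligned pointer. */
void *tag_pointer(void *p, unsigned tag)
{
    assert(((uintptr_t) p & 3) == 0);      /* pointer must be 4-byte aligned */
    assert(tag <= 3);
    return (void *) ((uintptr_t) p | tag);
}

/* Read the tag back. */
unsigned pointer_tag(void *v)
{
    return (unsigned) ((uintptr_t) v & 3);
}

/* Recover the original pointer, as in (uintptr_t) value & (uintptr_t) ~3. */
void *untag_pointer(void *v)
{
    return (void *) ((uintptr_t) v & ~(uintptr_t) 3);
}

/* Bit 1 set: the value points to a nested wildcard hash, not to data. */
int is_nested_hash(void *v)
{
    return ((uintptr_t) v & 2) != 0;
}

/* Bit 0 set: only the "*.example.com" form is allowed. */
int wildcard_only(void *v)
{
    return ((uintptr_t) v & 1) != 0;
}
```

Packing the flags into the pointer means each entry needs no extra field to say whether its value is data or another level of the tree.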

The initialization functions above are used in ngx_http_server_names:

<!-- lang: cpp -->    

 if (ngx_hash_keys_array_init(&ha, NGX_HASH_LARGE) != NGX_OK) {
    goto failed;
}
......
rc = ngx_hash_add_key(&ha, &name[n].name, name[n].server,
                              NGX_HASH_WILDCARD_KEY);

 if (ha.keys.nelts) {
    hash.hash = &addr->hash;
    hash.temp_pool = NULL;

    if (ngx_hash_init(&hash, ha.keys.elts, ha.keys.nelts) != NGX_OK) {
        goto failed;
    }
}

if (ha.dns_wc_head.nelts) {

    /* the keys are sorted here */
    ngx_qsort(ha.dns_wc_head.elts, (size_t) ha.dns_wc_head.nelts,
              sizeof(ngx_hash_key_t), ngx_http_cmp_dns_wildcards);

    hash.hash = NULL;
    hash.temp_pool = ha.temp_pool;

    if (ngx_hash_wildcard_init(&hash, ha.dns_wc_head.elts,
                               ha.dns_wc_head.nelts)
        != NGX_OK)
    {
        goto failed;
    }

    addr->wc_head = (ngx_hash_wildcard_t *) hash.hash;
}

if (ha.dns_wc_tail.nelts) {
    /* the keys are sorted here */
    ngx_qsort(ha.dns_wc_tail.elts, (size_t) ha.dns_wc_tail.nelts,
              sizeof(ngx_hash_key_t), ngx_http_cmp_dns_wildcards);

    hash.hash = NULL;
    hash.temp_pool = ha.temp_pool;

    if (ngx_hash_wildcard_init(&hash, ha.dns_wc_tail.elts,
                               ha.dns_wc_tail.nelts)
        != NGX_OK)
    {
        goto failed;
    }

    addr->wc_tail = (ngx_hash_wildcard_t *) hash.hash;
}

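#wildcard hash lookup#

With the tree diagram of the wildcard hash above in mind, the code is easy to follow.

ngx_hash_find_wc_head:
 * To look up sub.example.com, it first looks up com in the wildcard hash and inspects the two low bits of the value. They are 11, so it continues with example in the cascaded hash. There the low bits are 01 and the key still has sub left, so it returns ((uintptr_t) value & (uintptr_t) ~3); that is, sub.example.com matches *.example.com.
 * If you look up example.com instead, the entry for example in the second-level hash has low bits 01, but the key has already been consumed from back to front, so it returns NULL: example.com does not match *.example.com.
 * When looking up domain.com, it first looks up com, whose low bits are 11, and continues with domain in the cascaded hash. The low bits there are 10 and the key has been consumed, so the value is masked with ~3 to obtain the cascaded wildcard hash and that hash's own value is returned: domain.com matches .domain.com.

ngx_hash_find_wc_tail: trailing wildcards are handled much like leading ones, only more simply, because there is no distinction like the one between "*.example.com" and ".example.com".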
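
The two ingredients of the head lookup, peeling labels off from the right and branching on the two low bits, can be sketched separately. This is a paraphrase of the cases described above, not the nginx function: last_label_start, count_labels, decide and the outcome enum are hypothetical names, and no real hash table is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Index where the rightmost label of name[0..len) starts; 0 if no dot. */
size_t last_label_start(const char *name, size_t len)
{
    size_t n = len;

    while (n) {
        if (name[n - 1] == '.') {
            break;
        }
        n--;
    }
    return n;
}

/* Count labels by repeatedly peeling off the rightmost one. */
size_t count_labels(const char *host)
{
    size_t len, n, count = 0;

    assert(host != NULL);
    len = strlen(host);
    if (len == 0) {
        return 0;
    }

    for ( ;; ) {
        n = last_label_start(host, len);
        count++;
        if (n == 0) {
            break;
        }
        len = n - 1;                       /* drop the label and its dot */
    }
    return count;
}

enum outcome { RETURN_NULL, RETURN_DATA, DESCEND, RETURN_HASH_VALUE };

/*
 * Decide what to do with a found value, given its two low bits (`tag`)
 * and `n`, the start index of the label just looked up (0 means the
 * whole name has been consumed).
 */
enum outcome decide(unsigned tag, size_t n)
{
    if (tag & 2) {                         /* value is a nested hash */
        if (n == 0) {
            return (tag & 1) ? RETURN_NULL : RETURN_HASH_VALUE;
        }
        return DESCEND;
    }

    if (tag & 1) {                         /* data for "*.example.com" only */
        return n == 0 ? RETURN_NULL : RETURN_DATA;
    }

    return RETURN_DATA;                    /* data for both forms */
}
```

For sub.example.com the labels come out as com, example, sub. The three cases in the text correspond to decide(1, 4) for sub.example.com at example, decide(1, 0) for example.com, and decide(2, 0) for domain.com.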

This post came out rather disorganized... sigh.