redis source code analysis - basic data structures - dict

Date: 2019-11-16



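TL;DR

The redis dictionary is implemented as a hash table.

Keys are hashed with the division (remainder) method, and the hash value itself is computed with the SipHash algorithm.

Collisions are resolved by separate chaining.

The load factor is kept in a reasonable range by expanding the table (new length: the first power of two >= 2 * used) and shrinking it (new length: the first power of two >= used, and at least 4). Here used is the current number of key-value pairs.

While a persistence child process exists, the table expands only when the load factor exceeds 5, and it never shrinks. Without a child process, it expands when the load factor is >= 1 and shrinks when the load factor falls below 0.1.

Rehashing is incremental. Once triggered, the work is spread across the subsequent add, delete, update and lookup operations on that dictionary.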

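This analysis is based on redis 5.0.0. The source files involved are dict.c, dict.h and siphash.c, plus server.c for updateDictResizePolicy and htNeedsResize.

A dict (dictionary) stores data as key-value pairs and offers very fast lookups. Most common high-level languages have a built-in equivalent: dict in Python, HashMap in Java, std::unordered_map in C++.

Haven't worked much with high-level languages? No problem. Read on, and when you are done, write one yourself!

Definitions of the dict-related structures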

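#include <stdint.h>

// Hash table node
typedef struct dictEntry {
    // key
    void *key;
    // value: either a pointer or a number stored inline
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    // next node in the same bucket, used to resolve hash collisions
    struct dictEntry *next;
} dictEntry;

// Dictionary type: functions that hash, copy, compare and free keys/values
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

// Hash table
typedef struct dictht {
    // array of dictEntry* (the buckets)
    dictEntry **table;
    // hash table size (number of buckets, a power of two)
    unsigned long size;
    // always equal to size - 1, used to compute the bucket index.
    // Stored separately because it only changes when the table is resized,
    // so frequent lookups never have to recompute it (read-heavy workload)
    unsigned long sizemask;
    // number of key-value pairs currently stored
    unsigned long used;
} dictht;

// Dictionary
typedef struct dict {
    // type information: a set of functions operating on a specific key/value type
    dictType *type;
    // optional argument passed to those type-specific functions, e.g. key/value dup
    void *privdata;
    // an array of two dictht (dict hash tables); ht[1] is only used while rehashing
    dictht ht[2];
    // rehash marker
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    // number of safe iterators currently running
    unsigned long iterators; /* number of iterators currently running */
} dict;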

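The definitions above show that redis implements dict with a hash table. Three factors are well known to affect the lookup efficiency of a hash table:

1. whether the hash function distributes keys uniformly;

2. how collisions are handled;

3. the load factor of the table.

If you are not among those who "know well", the Wikipedia article on hash tables will fill you in.

This raises three questions:

1. How does the redis hash table hash keys?

2. How does the redis hash table resolve collisions?

3. How does redis keep the load factor of the hash table within a reasonable range?

How does the redis hash table hash keys?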

/* Returns the index of a free slot that can be populated with
 * a hash entry for the given 'key'.
 * If the key already exists, -1 is returned
 * and the optional output parameter may be filled. */
static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    for (table = 0; table <= 1; table++) {
        // hash & sizemask == hash % size, because size is a power of two
        idx = hash & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key)) {
                if (existing) *existing = he;
                return -1;
            }
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}

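There are six common ways to construct a hash function:

- direct addressing
- digit analysis
- mid-square
- folding
- random numbers
- the division (remainder) method

Redis uses the division method.

Division method

Divide the key by a number p that is not larger than the table length m, and use the remainder as the hash address: H(key) = key mod p.

Redis first turns the key into a 64-bit integer with the SipHash algorithm. It then takes that integer modulo the table length, i.e. p = m = size. Because size is always a power of two, the remainder is computed as hash & sizemask instead of with a division. That is why dictht stores sizemask = size - 1.

SipHash is a keyed hash function. Seeded with dict_hash_function_seed, which redis randomizes at startup, it spreads keys evenly and makes collisions hard to provoke on purpose. Python, Perl, Ruby and other languages have also used SipHash as their hash algorithm.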

/* The default hashing function uses SipHash implementation * in siphash.c. */

uint64_t siphash(const uint8_t *in, const size_t inlen, const uint8_t *k);
uint64_t siphash_nocase(const uint8_t *in, const size_t inlen, const uint8_t *k);

uint64_t dictGenHashFunction(const void *key, int len) {
    return siphash(key,len,dict_hash_function_seed);
}

uint64_t dictGenCaseHashFunction(const unsigned char *buf, int len) {
    return siphash_nocase(buf,len,dict_hash_function_seed);
}
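The remainder step can be checked in isolation. The sketch below is illustrative only: index_by_mod and index_by_mask are hypothetical helper names, not redis functions. It compares h % size with h & (size - 1), the form redis uses through sizemask. The two agree whenever size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Division (remainder) method written literally: h mod size. */
static unsigned long index_by_mod(uint64_t h, unsigned long size) {
    return (unsigned long)(h % size);
}

/* The form used by dict.c: h & sizemask, with sizemask == size - 1.
 * Only equivalent to h % size when size is a power of two. */
static unsigned long index_by_mask(uint64_t h, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0); /* exactly one bit set */
    unsigned long sizemask = size - 1;
    return (unsigned long)(h & sizemask);
}
```

For a size that is not a power of two (say 12), the mask would produce wrong indexes. This is one reason dictExpand always rounds sizes up with _dictNextPower.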

How does the redis hash table resolve collisions?

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing) {
    long index;
    dictEntry *entry;
    dictht *ht;
    
    // ...
    entry = zmalloc(sizeof(*entry));
    // put the new node at the head of the bucket's chain: the list is singly linked, so this is O(1)
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

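Common ways to resolve hash collisions are:

- linear probing
- quadratic probing
- pseudo-random probing
- separate chaining
- double hashing
- rehashing

The code above shows that redis uses separate chaining. On a collision, the new node becomes the head of the bucket's list, because the list is singly linked and reaching its tail would cost O(n).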
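Here is a minimal, self-contained sketch of separate chaining with head insertion. All names (toy_entry, toy_table, toy_add, ...) are hypothetical, and keys are C strings stored without copying. FNV-1a stands in for SipHash only to keep the example short; it is not how redis hashes keys.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy entry: the key pointer is stored as-is (no keyDup), the value is a long. */
typedef struct toy_entry {
    const char *key;
    long val;
    struct toy_entry *next; /* next entry in the same bucket */
} toy_entry;

typedef struct {
    toy_entry **table;      /* array of bucket heads */
    unsigned long size;     /* number of buckets, a power of two */
    unsigned long sizemask; /* size - 1 */
    unsigned long used;     /* number of stored entries */
} toy_table;

/* 64-bit FNV-1a, a stand-in for redis's seeded SipHash. */
static uint64_t toy_hash(const char *s) {
    uint64_t h = 0xcbf29ce484222325ULL;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

static int toy_init(toy_table *t, unsigned long size) {
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
    t->table = calloc(size, sizeof(*t->table));
    if (t->table == NULL) return -1;
    t->size = size;
    t->sizemask = size - 1;
    t->used = 0;
    return 0;
}

/* Walk the chain of the key's bucket. */
static toy_entry *toy_find(const toy_table *t, const char *key) {
    toy_entry *he = t->table[toy_hash(key) & t->sizemask];
    while (he) {
        if (strcmp(he->key, key) == 0) return he;
        he = he->next;
    }
    return NULL;
}

/* Adds key -> val; returns -1 if the key exists or allocation fails. */
static int toy_add(toy_table *t, const char *key, long val) {
    if (toy_find(t, key) != NULL) return -1;
    toy_entry *e = malloc(sizeof(*e));
    if (e == NULL) return -1;
    unsigned long idx = toy_hash(key) & t->sizemask;
    e->key = key;
    e->val = val;
    e->next = t->table[idx]; /* head insertion: O(1), no walk to the tail */
    t->table[idx] = e;
    t->used++;
    return 0;
}

static void toy_free(toy_table *t) {
    for (unsigned long i = 0; i < t->size; i++) {
        toy_entry *e = t->table[i];
        while (e != NULL) {
            toy_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->table);
    t->table = NULL;
    t->size = t->sizemask = t->used = 0;
}
```

With a one-bucket table (sizemask 0), every key collides. The most recently added key then sits at the head of the chain.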

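How does redis keep the load factor within a reasonable range?

load factor = number of entries in the table / hash table length

Consider three cases:

The load factor equals the number of key-value pairs, i.e. the table length is 1. The hash table degenerates into a singly linked list, and lookup costs O(n).
The load factor is 0.1, i.e. there are only a tenth as many entries as buckets. Lookup is O(1), but memory utilization is very low.
The load factor is 1, i.e. the number of entries equals the table length. This is the ideal state: on average there is one entry per bucket, so lookups stay fast without wasting buckets.

So a load factor that is too high hurts lookup speed, and one that is too low wastes memory. Keeping it at exactly 1 at all times is clearly impractical. Redis therefore checks it dynamically: it expands the table when the load factor gets too high and shrinks it when it gets too low, keeping the load factor within a reasonable range.

Expansion and shrinking strategy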

// Hash table expansion function (also used for shrinking)
int dictExpand(dict *d, unsigned long size)
{
    // ...
    dictht n; /* the new hash table */
    // actual size after expanding or shrinking:
    // the first power of two that is >= size
    unsigned long realsize = _dictNextPower(size);
    // ...
}

static unsigned long _dictNextPower(unsigned long size) {
    unsigned long i = DICT_HT_INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX;
    while(1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}
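The rounding rule is easy to try on a few inputs. This is a standalone copy of the _dictNextPower loop. It assumes DICT_HT_INITIAL_SIZE is 4, as defined in redis 5.0's dict.h; next_power and TOY_INITIAL_SIZE are illustrative names.

```c
#include <limits.h>

#define TOY_INITIAL_SIZE 4 /* assumed to mirror DICT_HT_INITIAL_SIZE */

/* Same loop as _dictNextPower: double from 4 until the value is >= size. */
static unsigned long next_power(unsigned long size) {
    unsigned long i = TOY_INITIAL_SIZE;
    if (size >= LONG_MAX) return LONG_MAX; /* guard copied from the excerpt */
    while (i < size) i *= 2;
    return i;
}
```

A request for 20 slots (for example 10 entries * 2) yields 32. Requests of 4 or less all yield the minimum of 4.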

Expansion

void updateDictResizePolicy(void) {
    // if no RDB-saving or AOF-rewriting child process exists, set the resize flag to 1
    if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)
        // dict_can_resize = 1;
        dictEnableResize();
    // otherwise set the resize flag to 0
    else
        // dict_can_resize = 0;
        dictDisableResize();
}

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    // ...
    // If used >= size (load factor >= 1) and the resize flag is 1, expand.
    // static unsigned int dict_force_resize_ratio = 5;
    // If the resize flag is 0, expand only when used / size > 5 (integer division).
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        // new table length: the first power of two >= 2 * used
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}


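When no persistence child process exists, the table expands once the load factor reaches 1. The new length is the first power of two >= used (the number of key-value pairs) * 2. For example, suppose the table has 8 slots and holds 10 key-value pairs. The new length is the first value in 4, 8, 16, 32, ... that is >= 10 * 2 = 20, namely 32.

While a persistence child process exists, the table expands only when the load factor exceeds 5. Persistence forks a child that shares the parent's memory pages copy-on-write, so every page the parent modifies gets duplicated. A rehash touches a large number of pages, so postponing it avoids unnecessary memory allocation.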
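Writing the expansion condition as a pure function shows where the thresholds actually fall. should_expand is a hypothetical name, and can_resize stands in for dict_can_resize. The parts of _dictExpandIfNeeded elided in the excerpt above are not modeled. Because used / size is integer division, a table with a child process running only grows once used reaches 6 * size.

```c
#include <assert.h>

#define TOY_FORCE_RESIZE_RATIO 5 /* mirrors dict_force_resize_ratio */

/* can_resize plays the role of dict_can_resize: 1 normally, 0 while an
 * RDB/AOF child process exists. Mirrors only the condition shown above. */
static int should_expand(unsigned long used, unsigned long size, int can_resize) {
    assert(size > 0); /* an empty, unallocated table is not modeled here */
    return used >= size &&
           (can_resize || used / size > TOY_FORCE_RESIZE_RATIO);
}
```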

Shrinking

#define HASHTABLE_MIN_FILL 10 /* Minimal hash table fill 10% */

int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    // shrink when the load factor is below 0.1 (integer arithmetic: used*100/size < 10)
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}

/* Resize the table to the minimal size that contains all the elements, * but with the invariant of a USED/BUCKETS ratio near to <= 1 */
int dictResize(dict *d) {
    int minimal;
    // resize only when the resize flag is 1 and no rehash is in progress
    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    // #define DICT_HT_INITIAL_SIZE 4
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    // new table length: the first power of two >= the larger of 4 and the current number of key-value pairs
    return dictExpand(d, minimal);
}

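The table shrinks when the load factor drops below 0.1. The new length is the first power of two >= the larger of 4 and used (the number of key-value pairs). This happens only when no persistence child process exists and no rehash is in progress. The second condition is easy to understand. But why does a child process merely raise the threshold for expansion, while forbidding shrinking outright?

The answer is copy-on-write: while the child is alive, any change to the parent's memory makes the kernel copy the affected pages. As noted above, expansion is needed because lookups have become too slow, and a performance drop is unacceptable for redis. Shrinking, by contrast, only recovers some wasted memory, and leaving that memory unreleased for a short while is acceptable.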
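The shrink check and the target size can be sketched as two small functions. should_shrink and shrink_target are hypothetical names, and the dict_can_resize and "not rehashing" conditions of dictResize are left out. The last test case shows why the lower bound is the larger of 4 and used: a table holding 100 entries must keep at least 128 slots.

```c
#define TOY_INITIAL_SIZE 4 /* mirrors DICT_HT_INITIAL_SIZE */
#define TOY_MIN_FILL 10    /* mirrors HASHTABLE_MIN_FILL, in percent */

/* Mirrors htNeedsResize: shrink when fewer than 10% of the slots are used. */
static int should_shrink(long long used, long long size) {
    return size > TOY_INITIAL_SIZE && used * 100 / size < TOY_MIN_FILL;
}

/* Simplified _dictNextPower (no overflow guard; fine for small inputs). */
static unsigned long next_power(unsigned long size) {
    unsigned long i = TOY_INITIAL_SIZE;
    while (i < size) i *= 2;
    return i;
}

/* Mirrors dictResize: the target length is next_power(max(used, 4)). */
static unsigned long shrink_target(unsigned long used) {
    unsigned long minimal = used < TOY_INITIAL_SIZE ? TOY_INITIAL_SIZE : used;
    return next_power(minimal);
}
```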

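How rehash is executed

With the expansion and shrinking policies covered, let's look at how a rehash is actually carried out. As mentioned earlier, the dict structure has two hash tables, and the extra one is used temporarily during rehashing. The steps are as follows.

When the policy above triggers an expansion or a shrink, memory of the new length is allocated for the spare table d->ht[1]. The rehash marker rehashidx is then set to 0 to indicate that rehashing has started. Its initial value, -1, means no rehash is in progress.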

int dictExpand(dict *d, unsigned long size) {
    // ...
    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

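While rehashidx is not -1, every add, delete, update or lookup on the dictionary performs one rehash step. Each step migrates one bucket and advances rehashidx past it.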

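// Perform one rehash step: move the first non-empty bucket (possibly a whole chain) at or after index rehashidx of d->ht[0] to the spare table; skipped while iterators are running (d->iterators != 0)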
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }
    // ...

    /* More to rehash... */
    return 1;
}

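Eventually, all key-value pairs of d->ht[0] have been migrated to the spare d->ht[1]. At that point the memory of d->ht[0] is freed and ht[0] takes all the data back from the spare. The spare, its job done, is reset to its original empty state (a NULL table pointer). Finally rehashidx is set to -1 to tell everyone that rehashing is over.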

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    // ...

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}
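The whole migration can be simulated end to end. The sketch below is a toy model, not redis code. Keys are unsigned integers, the hash is the identity function, and the names (toy_dict, toy_rehash_step, ...) are hypothetical. toy_rehash_step follows the dictRehash loop shown above, including the cap of n * 10 empty buckets per call and the final swap of ht[1] into ht[0].

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node { unsigned long key; struct node *next; } node;

typedef struct {
    node **table;
    unsigned long size, sizemask, used;
} table_t;

typedef struct {
    table_t ht[2];
    long rehashidx; /* -1 when no rehash is in progress */
} toy_dict;

/* Identity hash: an assumption that keeps bucket positions easy to follow. */
static unsigned long toy_hash(unsigned long key) { return key; }

/* size 0 gives an empty table with a NULL bucket array (like _dictReset).
 * Allocation failure is not handled in this sketch. */
static void table_init(table_t *t, unsigned long size) {
    t->table = size ? calloc(size, sizeof(node *)) : NULL;
    t->size = size;
    t->sizemask = size ? size - 1 : 0;
    t->used = 0;
}

/* Head insertion into one table (used here only to fill ht[0]). */
static void table_insert(table_t *t, node *n) {
    unsigned long idx = toy_hash(n->key) & t->sizemask;
    n->next = t->table[idx];
    t->table[idx] = n;
    t->used++;
}

/* Like the end of dictExpand: allocate ht[1] and set rehashidx = 0. */
static void toy_start_rehash(toy_dict *d, unsigned long newsize) {
    table_init(&d->ht[1], newsize);
    d->rehashidx = 0;
}

/* Simplified dictRehash: migrate up to n non-empty buckets. Returns 1 if
 * more work remains, 0 when rehashing is finished or not in progress. */
static int toy_rehash_step(toy_dict *d, int n) {
    int empty_visits = n * 10; /* cap on empty buckets skipped per call */
    if (d->rehashidx == -1) return 0;

    while (n-- && d->ht[0].used != 0) {
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        node *de = d->ht[0].table[d->rehashidx];
        while (de) { /* move the whole chain into ht[1] */
            node *nextde = de->next;
            unsigned long h = toy_hash(de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    if (d->ht[0].used == 0) { /* done: ht[1] becomes ht[0] */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        table_init(&d->ht[1], 0);
        d->rehashidx = -1;
        return 0;
    }
    return 1;
}
```

Suppose a 4-slot table is filled with keys 0 to 7 and then rehashed into 16 slots with toy_rehash_step(d, 1). This takes four calls, one per non-empty bucket, which matches the "one step per operation" pattern described above.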

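Why is rehash incremental?

When a dictionary is small, rehashing everything at once is fast and convenient, and the incremental approach can look redundant and cumbersome. With millions or even tens of millions of entries, however, just computing the hash values takes a huge amount of work. Doing it in one go would stop the server from serving requests normally. Even if the server survived, performance would suffer badly. So redis works like the Foolish Old Man who moved the mountains, a little at a time.

During a rehash, delete, update and lookup operations search for the key in d->ht[0] first. If it is not found there, they search the spare d->ht[1]. Take lookup as an example: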

dictEntry *dictFind(dict *d, const void *key) {
    dictEntry *he;
    uint64_t h, idx, table;

    if (d->ht[0].used + d->ht[1].used == 0) return NULL; /* dict is empty */
    // perform one rehash step
    if (dictIsRehashing(d)) _dictRehashStep(d);
    // compute the hash value
    h = dictHashKey(d, key);
    // search both hash tables
    for (table = 0; table <= 1; table++) {
        // bucket index = h mod size, computed with the mask
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        while(he) {
            if (key==he->key || dictCompareKeys(d, key, he->key))
                return he;
            he = he->next;
        }
        // when no rehash is in progress, only d->ht[0] is searched
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}
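The two-table lookup rule can also be sketched on its own. This toy model uses hypothetical names, an identity hash and integer keys. toy_find searches ht[0] first and falls back to ht[1] only while rehashidx != -1. Unlike dictFind, it does not perform a rehash step itself.

```c
#include <stddef.h>

typedef struct node { unsigned long key; struct node *next; } node;
typedef struct { node **table; unsigned long sizemask; } table_t;
typedef struct { table_t ht[2]; long rehashidx; } toy_dict;

static node *toy_find(const toy_dict *d, unsigned long key) {
    unsigned long h = key; /* identity hash (assumption of this sketch) */
    if (d->ht[0].table == NULL) return NULL; /* empty dict */
    for (int t = 0; t <= 1; t++) {
        node *he = d->ht[t].table[h & d->ht[t].sizemask];
        while (he) {
            if (he->key == key) return he;
            he = he->next;
        }
        /* not rehashing: ht[1] is empty, so stop after ht[0] */
        if (d->rehashidx == -1) return NULL;
    }
    return NULL;
}
```

In the test state below, rehashidx is 3: buckets 0 to 2 of the old 4-slot table have been migrated. Key 6 (old bucket 2) is therefore found in ht[1], while key 3 (old bucket 3) is still found in ht[0].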