Redis Study 3.2: The Associative Array (Dictionary) Data Structure

Date: 2019-11-10


All of the source code studied in this chapter lives in two files, dict.h and dict.c.

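In Java, or in any other language that supports associative arrays, the first thing we learn is that an associative array (dictionary) is an "array" of key-value pairs. So how does Redis build one, step by step? Let's break the statement down: if a dictionary is an "array" of key-value pairs, then the first thing we need is a key-value structure: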

// key-value structure
typedef struct dictEntry {
    
    // key
    void *key;

    // value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;

    // why is this needed? it is used to resolve key collisions
    struct dictEntry *next;

} dictEntry;

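In the structure defined above, key is the key, and the value can be a pointer, a uint64_t integer, or an int64_t integer. What exactly is next for? It links together the key-value pairs whose keys hash to the same bucket, which is how key collisions are resolved.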
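
To make the value union concrete, here is a minimal sketch. It assumes a stand-alone copy of the entry layout; the names demoEntry, demoSetSigned and demoSetPointer are hypothetical and do not come from dict.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-alone copy of the entry layout, for illustration only. */
typedef struct demoEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct demoEntry *next;
} demoEntry;

/* Store a signed integer directly in the entry, with no extra allocation. */
static void demoSetSigned(demoEntry *e, int64_t n) {
    assert(e != NULL);
    e->v.s64 = n;
}

/* Store a pointer to a caller-owned value in the same slot. */
static void demoSetPointer(demoEntry *e, void *p) {
    assert(e != NULL);
    e->v.val = p;
}
```

Because the three members share storage, only the member written last holds a meaningful value, so the caller has to remember which kind of value it stored.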

The next question is how to build the "array". Redis defines it as follows:

typedef struct dictht {
    
    // array of buckets
    dictEntry **table;

    // number of buckets
    unsigned long size;
    // size mask, always equal to size - 1
    unsigned long sizemask;

    // number of entries currently stored
    unsigned long used;

} dictht;

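table above is an array, and each of its elements is a pointer to a dictEntry. The size field records how large table is. Why is it needed? We often hear the term "hash bucket", and size plays exactly that role: it tells us how many buckets the hash table has. What about used? It is the number of entries currently stored in table. Note that it counts entries, not occupied indexes: every entry added increments it, including entries chained onto a bucket that is already in use. That leaves sizemask. It is closely tied to hashing and its value is always size-1; the hashing details come later, when they are needed.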

The next step is our final goal, the associative array (dictionary) itself. Redis defines it like this:

typedef struct dict {

    dictType *type;

    void *privdata;

    dictht ht[2];

    int rehashidx; /* rehashing not in progress if rehashidx == -1 */

    int iterators; /* number of iterators currently running */

} dict;

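A general-purpose dictionary cannot be defined in terms of concrete types, so it cannot hard-code type-specific operations either. For that reason a Redis dictionary lets you plug in your own operations for each kind of data, and the type field is what makes this possible. It is defined as follows: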

// binds a different set of operation functions to each kind of dictionary
typedef struct dictType {

    // computes the hash value of a key
    unsigned int (*hashFunction)(const void *key);

    // duplicates a key
    void *(*keyDup)(void *privdata, const void *key);

    // duplicates a value
    void *(*valDup)(void *privdata, const void *obj);

    // compares two keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);

    // destroys a key
    void (*keyDestructor)(void *privdata, void *key);
    
    // destroys a value
    void (*valDestructor)(void *privdata, void *obj);

} dictType;

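What is privdata for? Every type-specific function except hashFunction takes it as its first argument, so for now we can treat it as a slot for arbitrary caller data that is handed to those functions.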
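
As an illustration of how such a function table might be filled in for C-string keys, here is a sketch under stated assumptions: demoType is a hypothetical, trimmed-down stand-in for dictType with only two callbacks, and the djb2 hash is chosen for brevity, not because Redis uses it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down version of dictType with only two callbacks. */
typedef struct demoType {
    unsigned int (*hashFunction)(const void *key);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
} demoType;

/* djb2 string hash, used here only because it is short. */
static unsigned int demoStringHash(const void *key) {
    const unsigned char *s = key;
    unsigned int h = 5381;
    assert(key != NULL);
    while (*s) h = h * 33 + *s++;
    return h;
}

/* Returns non-zero when the keys are equal, matching how the
 * lookup code in this article treats the comparison result. */
static int demoStringCompare(void *privdata, const void *key1, const void *key2) {
    (void)privdata; /* unused in this sketch */
    return strcmp(key1, key2) == 0;
}

/* One shared function table for every dictionary keyed by C strings. */
static demoType demoStringType = { demoStringHash, demoStringCompare };
```

A dictionary that stores a pointer to such a table never needs to know what its keys are; it only calls through the function pointers.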

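The data itself is stored in the ht array, which has two elements of type dictht. Why two? One holds the actual key-value pairs, and the other is used during rehashing.

What is the integer rehashidx for? It tracks the progress of a rehash; when the dictionary is not rehashing, its value is -1.

The integer iterators records how many iterators are currently running on this dictionary; as the initialization code later in this article notes, it counts the safe ones.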

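From the key-value structure, to the array of key-value pairs (table), to the dictionary itself, the implementation path is now clear. Given the definitions above, three key concepts still need explaining: hashing, collisions, and rehashing.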

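What is hashing?

Take a simple example: we want to add a key-value pair k1-v1 to a dictionary dict. As we saw above, the data is actually stored in the dict's ht array, and each element of ht is a dictht, which in turn wraps an array (table). The most commonly used property of an array is its index, so to add the pair we must work out which index of that array it belongs at.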

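Following that description, adding a key-value pair to the dictionary goes through these steps:

1. Use hashFunction from the dict's type to compute the hash value of the key:

keyHashValue=dict->type->hashFunction(k1);

2. As mentioned earlier, the hash table has two important fields: size (the number of hash buckets) and sizemask (whose value is size-1). Combining sizemask with the hash value obtained above yields the array index. Because size is always a power of two, this bitwise AND is equivalent to keyHashValue % size:

index=keyHashValue&ht[0].sizemask;// we assume the data is stored in the first hash table of ht

As these two steps show, both performance and the distribution of the data depend mainly on the hash function you bind.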
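
The index computation in step 2 can be sketched in isolation. This is a minimal example, assuming a hypothetical helper named demoBucketIndex that derives the mask from the size instead of reading it from a dictht:

```c
#include <assert.h>

/* Map a hash value to a bucket index; size must be a power of two. */
static unsigned long demoBucketIndex(unsigned int hash, unsigned long size) {
    unsigned long sizemask;
    assert(size != 0 && (size & (size - 1)) == 0); /* power-of-two check */
    sizemask = size - 1;
    return hash & sizemask;
}
```

With 8 buckets the mask is 7 (binary 111), so a hash of 13 (binary 1101) lands in bucket 5, the same result as 13 % 8.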

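What is a hash collision?

Why do hash collisions occur? Looking at the steps for adding a new pair, it is quite possible for different keys to produce the same array index; when that happens we say there is a hash collision. How does Redis handle it? The answer is the next pointer defined in dictEntry. Through this pointer, different key-value pairs that map to the same index form a linked list. The list keeps no separate head or tail pointer, only the bucket slot that points at its first node, so for performance a new pair is inserted at the front of the list, which takes O(1) time instead of a walk to the end.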
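
Head insertion into a bucket's chain can be sketched as follows. demoNode and demoPushFront are hypothetical names; only the next link mirrors dictEntry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal chain node; only the link matters here. */
typedef struct demoNode {
    int key;
    struct demoNode *next;
} demoNode;

/* Push a node onto the front of a bucket's chain in O(1). */
static void demoPushFront(demoNode **bucket, demoNode *node) {
    assert(bucket != NULL && node != NULL);
    node->next = *bucket; /* the old first node becomes the second */
    *bucket = node;       /* the bucket now points at the new node */
}
```

The cost is the same two pointer writes no matter how long the chain already is, which is why no tail pointer is needed.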

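What is rehashing?

Before discussing rehashing we should understand the load factor. The load factor is the number of entries stored in the hash table (N) divided by the number of buckets it has (M), that is, N/M. This ratio tells you how full the table is. Because colliding entries are chained, N can exceed M, so the load factor can be greater than 1.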
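
As a small worked example of the ratio, here is a sketch with a hypothetical helper, demoLoadFactor, that takes the same two numbers a dictht keeps in used and size:

```c
#include <assert.h>

/* Load factor = stored entries / number of buckets. */
static double demoLoadFactor(unsigned long used, unsigned long size) {
    assert(size != 0); /* an unallocated table has no load factor */
    return (double)used / (double)size;
}
```

Four entries in eight buckets give 0.5, while ten entries chained into four buckets give 2.5.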

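Once the load factor is understood, it is easy to see why rehashing exists. Operations on a dictionary make the number of stored pairs grow or shrink, which makes the load factor swing widely. To keep the load factor within an acceptable range, the table must be rehashed. How?

When certain conditions are met (they are covered in a later chapter), the program triggers a rehash, which works as follows:

1. Allocate space for the dictionary's ht[1]. Its size is the first power of two greater than or equal to ht[0].used*2. (For example, if used=4 then 4*2=8, and 8 is exactly 2^3. If used=5 then 5*2=10, and the smallest power of two not less than 10 is 2^4, so ht[1] gets 16 buckets, and so on.)

2. Recompute the hash of every key-value pair in ht[0] and place the pair in ht[1].

3. Once all pairs in ht[0] have been moved to ht[1], free ht[0], make ht[1] the new ht[0], and reset ht[1] to an empty hash table ready for the next rehash.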
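
The size rule in step 1 can be sketched as a small loop. demoNextPower is a hypothetical helper, and the starting size of 4 is an assumption made for this sketch rather than something shown in this article.

```c
#include <assert.h>

/* Smallest power of two that is >= target (never below 4 in this sketch). */
static unsigned long demoNextPower(unsigned long target) {
    unsigned long n = 4; /* assumed minimum table size */
    assert(target <= (1UL << 30)); /* keep the doubling far from overflow */
    while (n < target) n *= 2;
    return n;
}
```

Calling it with used*2 reproduces the examples in step 1: demoNextPower(8) yields 8 and demoNextPower(10) yields 16.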

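This raises a problem, though: when ht[0] holds a huge number of pairs, does the server stop responding and do nothing but rehash? If it did, Redis would hardly be usable, so Redis rehashes incrementally instead. How? The key is the dict->rehashidx counter.

1. Allocate space for ht[1]; the dict now holds both ht[0] and ht[1].

2. While rehashing is in progress, rehashidx holds the index of the ht[0] bucket currently being rehashed (it starts at 0).

3. Move the pairs in ht[0] over to ht[1] bucket by bucket; when the rehash is complete, set rehashidx to -1.

Consequently, while a rehash is in progress, lookups and deletions must consult both hash tables, and new pairs are added only to ht[1].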
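
The incremental scheme can be tried out on a toy table. The sketch below is not the Redis code: every toy* name is hypothetical, keys are unsigned integers that serve as their own hash value, and one step migrates exactly one non-empty bucket.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toyNode { unsigned int key; struct toyNode *next; } toyNode;
typedef struct toyTable { toyNode **table; unsigned long size, used; } toyTable;
typedef struct toyDict { toyTable ht[2]; long rehashidx; } toyDict;

/* Allocate an empty table; size must be a power of two. */
static void toyTableInit(toyTable *t, unsigned long size) {
    t->table = calloc(size, sizeof(toyNode *));
    assert(t->table != NULL);
    t->size = size;
    t->used = 0;
}

/* Insert at the head of the chain; new keys go to ht[1] while rehashing. */
static void toyAdd(toyDict *d, unsigned int key) {
    toyTable *t = (d->rehashidx != -1) ? &d->ht[1] : &d->ht[0];
    toyNode *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = t->table[key & (t->size - 1)];
    t->table[key & (t->size - 1)] = n;
    t->used++;
}

/* Begin rehashing into a table twice the size of ht[0]. */
static void toyStartRehash(toyDict *d) {
    toyTableInit(&d->ht[1], d->ht[0].size * 2);
    d->rehashidx = 0;
}

/* Migrate one non-empty bucket. Returns 1 if more work remains, else 0. */
static int toyRehashStep(toyDict *d) {
    toyNode *n, *next;
    if (d->rehashidx == -1) return 0;
    if (d->ht[0].used == 0) {            /* finished: swap the tables */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1].table = NULL;
        d->ht[1].size = d->ht[1].used = 0;
        d->rehashidx = -1;
        return 0;
    }
    while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
    for (n = d->ht[0].table[d->rehashidx]; n != NULL; n = next) {
        unsigned long h = n->key & (d->ht[1].size - 1);
        next = n->next;                  /* save before relinking */
        n->next = d->ht[1].table[h];
        d->ht[1].table[h] = n;
        d->ht[0].used--;
        d->ht[1].used++;
    }
    d->ht[0].table[d->rehashidx++] = NULL;
    return 1;
}

/* Look in ht[0], and also in ht[1] while a rehash is in progress. */
static int toyContains(const toyDict *d, unsigned int key) {
    int i;
    for (i = 0; i <= 1; i++) {
        const toyNode *n;
        if (d->ht[i].size == 0) continue;
        for (n = d->ht[i].table[key & (d->ht[i].size - 1)]; n; n = n->next)
            if (n->key == key) return 1;
        if (d->rehashidx == -1) break;
    }
    return 0;
}
```

Between two calls to toyRehashStep the keys are split across both tables, which is why toyContains has to check both while rehashidx is not -1.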

That covers the main ideas. Below are the commonly used API functions.

// create a new dictionary
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);

    return d;
}

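The function above uses a private function, _dictInit, defined as follows:

// initialize the dictionary
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    // reset both hash tables; as the function below shows, no memory is allocated here
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);

    // set the type-specific functions
    d->type = type;

    // set the private data
    d->privdata = privDataPtr;

    // mark the dictionary as not rehashing
    d->rehashidx = -1;

    // set the number of safe iterators running on the dictionary
    d->iterators = 0;

    return DICT_OK;
}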

It in turn uses the private function _dictReset:

static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

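// add a new key-value pair
int dictAdd(dict *d, void *key, void *val)
{
    
    dictEntry *entry = dictAddRaw(d,key);

    // the key already exists
    if (!entry) return DICT_ERR;

    // the key did not exist, so set the value on the new entry
    dictSetVal(d, entry, val);

    // added successfully
    return DICT_OK;
}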

dictEntry *dictAddRaw(dict *d, void *key)
{
    int index;
    dictEntry *entry;
    dictht *ht;

    // if the dict is rehashing, perform a single rehash step
    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry */
    // if the dictionary is rehashing, add the new key to hash table 1
    // otherwise, add the new key to hash table 0
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    // allocate space for the new node
    entry = zmalloc(sizeof(*entry));
    // insert the new node at the head of the chain
    entry->next = ht->table[index];
    ht->table[index] = entry;
    // update the number of used nodes in the hash table
    ht->used++;

    /* Set the hash entry fields. */
    // set the key of the new node
    dictSetKey(d, entry, key);

    return entry;
}

static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d,1);
}

int dictRehash(dict *d, int n) {
    // note: this function is not thread-safe
    // return immediately if the dict is not rehashing
    if (!dictIsRehashing(d)) return 0;

    // migrate up to n buckets
    while(n--) {
        dictEntry *de, *nextde;

        /* Check if we already rehashed the whole table... */
        // if hash table 0 is empty, the rehash is complete
        if (d->ht[0].used == 0) {
            // free hash table 0
            zfree(d->ht[0].table);
            // make the old hash table 1 the new hash table 0
            d->ht[0] = d->ht[1];
            // reset the old hash table 1
            _dictReset(&d->ht[1]);
            // clear the rehash flag
            d->rehashidx = -1;
            // the rehash is finished
            return 0;
        }

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        // make sure rehashidx is within bounds
        assert(d->ht[0].size > (unsigned)d->rehashidx);

        // skip empty buckets until the next non-empty one is found
        while(d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;

        // point at the head node of the chain in that bucket
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            unsigned int h;

            // save the pointer to the next node
            nextde = de->next;

            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;

            // insert the node at the head of the chain in the new hash table
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;

            // update the counters
            d->ht[0].used--;
            d->ht[1].used++;

            // move on to the next node
            de = nextde;
        }
        // clear the bucket that was just migrated
        d->ht[0].table[d->rehashidx] = NULL;
        // advance the rehash index
        d->rehashidx++;
    }

    return 1;
}

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    unsigned int h, idx, table;

    // the dictionary is empty, return NULL right away
    if (d->ht[0].size == 0) return NULL; /* We don't have a table at all */

    // if the dict is rehashing, perform a single rehash step
    if (dictIsRehashing(d)) _dictRehashStep(d);

    // compute the hash value of the key
    h = dictHashKey(d, key);
    // look the key up in the dictionary's hash tables; there are two of them
    for (table = 0; table <= 1; table++) {

        // compute the index
        idx = h & d->ht[table].sizemask;

        // walk every node of the chain at that index, looking for key
        he = d->ht[table].table[idx];
        while(he) {
            // return as soon as the key is found
            if (dictCompareKeys(d, key, he->key))
                return he;

            he = he->next;
        }
        // not found in this table: if the dict is rehashing, try the other
        // hash table as well; otherwise return NULL
        if (!dictIsRehashing(d)) return NULL;
    }

    // reaching this point means the key is in neither hash table
    return NULL;
}

// get the value associated with the given key in the dict
void *dictFetchValue(dict *d, const void *key) {
    dictEntry *he;

    he = dictFind(d,key);

    return he ? dictGetVal(he) : NULL;
}

That covers adding and looking up. Updating and deleting remain:

static int dictGenericDelete(dict *d, const void *key, int nofree)
{
    unsigned int h, idx;
    dictEntry *he, *prevHe;
    int table;

    // if the dict is empty, report a deletion error
    if (d->ht[0].size == 0) return DICT_ERR; /* d->ht[0].table is NULL */

    // perform a single rehash step
    if (dictIsRehashing(d)) _dictRehashStep(d);

    // compute the hash value
    h = dictHashKey(d, key);

    // go through the hash tables
    for (table = 0; table <= 1; table++) {

        // compute the index
        idx = h & d->ht[table].sizemask;
        // point at the chain stored at that index (it may hold several nodes)
        he = d->ht[table].table[idx];
        prevHe = NULL;
        // walk every node of the chain
        while(he) {
        
            if (dictCompareKeys(d, key, he->key)) {
                // found the target node

                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;

                // call the key and value destructors unless nofree is set
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                }
                
                // free the node itself
                zfree(he);

                // update the number of used nodes; used counts entries rather
                // than buckets, so removing one entry from a chain decrements
                // it by exactly one
                d->ht[table].used--;

                // report that the key was found and removed
                return DICT_OK;
            }

            prevHe = he;
            he = he->next;
        }

        // reaching this point means the key was not found in hash table 0;
        // whether to search hash table 1 depends on whether a rehash is in progress
        if (!dictIsRehashing(d)) break;
    }

    // not found
    return DICT_ERR; /* not found */
}

int dictDelete(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,0);// call the key and value destructors
}

int dictDeleteNoFree(dict *ht, const void *key) {
    return dictGenericDelete(ht,key,1);// do not call the destructors
}
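
The unlink step in dictGenericDelete, which tracks the previous node so that the chain can be patched, can be isolated in a small sketch. demoLink and demoUnlink are hypothetical names; unlike the real function, this version hands the node back to the caller instead of freeing it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demoLink { int key; struct demoLink *next; } demoLink;

/* Unlink the first node with the given key; returns it, or NULL if absent. */
static demoLink *demoUnlink(demoLink **bucket, int key) {
    demoLink *cur, *prev = NULL;
    assert(bucket != NULL);
    for (cur = *bucket; cur != NULL; prev = cur, cur = cur->next) {
        if (cur->key != key) continue;
        if (prev) prev->next = cur->next; /* middle or tail of the chain */
        else *bucket = cur->next;         /* head of the chain */
        cur->next = NULL;
        return cur;
    }
    return NULL;
}
```

Only the matching node leaves the chain; the other entries in the same bucket stay linked, which is why one deletion changes the entry count by exactly one.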

int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, auxentry;

    /* Try to add the element. If the key
     * does not exist dictAdd will succeed. */
    if (dictAdd(d, key, val) == DICT_OK)
        return 1;

    /* It already exists, get the entry */
    entry = dictFind(d, key);
    /* Set the new value and free the old one. Note that it is important
     * to do that in this order, as the value may just be exactly the same
     * as the previous one. In this context, think to reference counting,
     * you want to increment (set), and then decrement (free), and not the
     * reverse. */
    // first save a copy of the entry, which keeps the old value pointer
    auxentry = *entry;
    // then set the new value
    dictSetVal(d, entry, val);
    // then free the old value
    dictFreeVal(d, &auxentry);

    return 0;
}
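
The ordering that the comment in dictReplace insists on, set first and free afterwards, can be demonstrated with a toy reference-counted value. This is a sketch under stated assumptions: demoObj, demoRetain, demoRelease and demoReplace are hypothetical, and the freed flag stands in for actually releasing memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted value, for illustration only. */
typedef struct demoObj { int refcount; int freed; } demoObj;

/* Take one more reference. */
static void demoRetain(demoObj *o) {
    assert(o != NULL);
    o->refcount++;
}

/* Drop one reference; mark the object freed when the count reaches zero. */
static void demoRelease(demoObj *o) {
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0) o->freed = 1;
}

/* Replace *slot with newval: take the new reference first, then drop the old. */
static void demoReplace(demoObj **slot, demoObj *newval) {
    demoObj *old = *slot; /* remember the old value */
    demoRetain(newval);   /* "set": increment first */
    *slot = newval;
    demoRelease(old);     /* "free": decrement afterwards */
}
```

If the new value is the same object as the old one, its count goes from 1 to 2 and back to 1. Releasing first would have dropped it to 0 and freed an object that is still in use.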

When learning Java's collection classes, one of the tools we use most is the iterator. Redis's dict implements iterators as well, in two flavors: safe and unsafe.

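typedef struct dictIterator {
        
    // the dictionary being iterated
    dict *d;

    // table: number of the hash table being iterated, either 0 or 1
    // index: index in the hash table that the iterator currently points at
    // safe: whether the iterator is safe; 1 means safe, anything else means unsafe
    int table, index, safe;

    // entry: pointer to the node currently being visited
    // nextEntry: the node after the current one; while a safe iterator is running,
    // the node that entry points to may be modified, so an extra pointer keeps the
    // position of the next node and prevents the iterator from losing its place
    dictEntry *entry, *nextEntry;

    long long fingerprint; /* unsafe iterator fingerprint for misuse detection */
} dictIterator;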

// create an unsafe iterator
dictIterator *dictGetIterator(dict *d)
{
    dictIterator *iter = zmalloc(sizeof(*iter));

    iter->d = d;
    iter->table = 0;
    iter->index = -1;
    iter->safe = 0;
    iter->entry = NULL;
    iter->nextEntry = NULL;

    return iter;
}

// create a safe iterator
dictIterator *dictGetSafeIterator(dict *d) {
    dictIterator *i = dictGetIterator(d);

    i->safe = 1;

    return i;
}
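
The reason the iterator keeps nextEntry alongside entry can be shown on a plain linked list. The sketch below is not the Redis iterator: demoItem, demoRemoveEven and demoPush are hypothetical, and freeing even keys stands in for any change the caller makes to the current node during a walk.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demoItem { int key; struct demoItem *next; } demoItem;

/* Free every node with an even key while walking the chain.
 * Returns the number of nodes removed. */
static int demoRemoveEven(demoItem **head) {
    demoItem **link = head; /* the link that currently leads to entry */
    demoItem *entry, *nextEntry;
    int removed = 0;
    assert(head != NULL);
    for (entry = *head; entry != NULL; entry = nextEntry) {
        nextEntry = entry->next; /* save before entry may be freed */
        if (entry->key % 2 == 0) {
            *link = nextEntry;   /* unlink, then free the current node */
            free(entry);
            removed++;
        } else {
            link = &entry->next;
        }
    }
    return removed;
}

/* Helper: allocate a node and push it on the front of the list. */
static void demoPush(demoItem **head, int key) {
    demoItem *n = malloc(sizeof(*n));
    assert(n != NULL);
    n->key = key;
    n->next = *head;
    *head = n;
}
```

Reading entry->next after free(entry) would be undefined behavior, so the walk can only continue because the next pointer was copied out beforehand.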

That's it. This section ran a bit long, so please bear with me. If you have questions, contact me on QQ: 359311095
