深刻學習Redis（三），基本類型【Hash】剖析

時間 2019-11-16

標籤深刻學習 redis 基本類型 Hash 剖析欄目 Redis 简体版

原文原文鏈接

更多精彩文章，關注公衆號【ToBeTopJavaer】，更有數萬元精品vip資源免費等你來拿！！！

接下來咱們要剖析的基本類型是Hash，相信你們對Hash都不會陌生吧，下面咱們將深刻源碼剖析Redis中Hash的實現。 redis

首先咱們看一張圖：數組

存儲類型數據結構

包含鍵值對的無序散列表。value 只能是字符串，不能嵌套其餘類型。架構

一樣是存儲字符串，Hash 與 String 的主要區別？ide

一、把全部相關的值彙集到一個 key 中，節省內存空間性能

二、只使用一個 key，減小 key 衝突ui

三、當須要批量獲取值的時候，只須要使用一個命令，減小內存/IO/CPU 的消耗this

Hash 不適合的場景：編碼

一、Field 不能單獨設置過時時間spa

二、沒有 bit 操做

三、須要考慮數據量分佈的問題（value 值很是大的時候，沒法分佈到多個節點）

操做命令

存儲（實現）原理

Redis 的 Hash 自己也是一個 KV 的結構，相似於 Java 中的 HashMap。

外層的哈希（Redis KV 的實現）只用到了 hashtable。當存儲 hash 數據類型時，

咱們把它叫作內層的哈希。內層的哈希底層可使用兩種數據結構實現：

ziplist：OBJ_ENCODING_ZIPLIST（壓縮列表）

hashtable：OBJ_ENCODING_HT（哈希表）

以下圖所示：

問題1、那麼在何時會用到ziplist，何時用到hashtable呢？

在redis.conf咱們能夠看到：

在源碼中：

/* 源碼位置： t_hash.c ， 當達字段個數超過閾值， 使用 HT 做爲編碼 */
if (hashTypeLength(o) > server.hash_max_ziplist_entries)
hashTypeConvert(o, OBJ_ENCODING_HT);
/*源碼位置： t_hash.c， 當字段值長度過大， 轉爲 HT */
for (i = start; i <= end; i++) {
if (sdsEncodedObject(argv[i]) &&
sdslen(argv[i]->ptr) > server.hash_max_ziplist_value)
{
hashTypeConvert(o, OBJ_ENCODING_HT);
break;
}
}複製代碼

從而咱們能夠得知，當 hash 對象同時知足如下兩個條件的時候，使用 ziplist 編碼：

1）全部的鍵值對的健和值的字符串長度都小於等於 64byte（一個英文字母

一個字節）；

2）哈希對象保存的鍵值對數量小於 512 個。

一個哈希對象超過配置的閾值（鍵和值的長度有>64byte，鍵值對個數>512 個）時，

會轉換成哈希表（hashtable）。

問題2、什麼是ziplist壓縮列表

ziplist 壓縮列表

ziplist 壓縮列表是什麼？

ziplist 是一個通過特殊編碼的雙向鏈表，它不存儲指向上一個鏈表節點和指向下一

個鏈表節點的指針，而是存儲上一個節點長度和當前節點長度，經過犧牲部分讀寫性能，

來換取高效的內存空間利用率，是一種時間換空間的思想。只用在字段個數少，字段值

小的場景裏面。

ziplist 的內部結構？

整體架構以下圖所示：

entry對象定義的源碼以下：

typedef struct zlentry { 
unsigned int prevrawlensize; /* 上一個鏈表節點佔用的長度 */ 
unsigned int prevrawlen; /* 存儲上一個鏈表節點的長度數值所須要的字節數 */ 
unsigned int lensize; /* 存儲當前鏈表節點長度數值所須要的字節數 */ 
unsigned int len; /* 當前鏈表節點佔用的長度 */ 
unsigned int headersize; /* 當前鏈表節點的頭部大小（prevrawlensize + lensize），即非數據域的大小 */ 
unsigned char encoding; /* 編碼方式 */ 
unsigned char *p; /* 壓縮鏈表以字符串的形式保存，該指針指向當前節點起始位置 */ 
} zlentry;複製代碼

問題3、什麼是hashtable（ dict）？

hashtable是什麼？

在 Redis 中，hashtable 被稱爲字典（dictionary），它是一個數組+鏈表的結構。

前面咱們知道了，Redis 的 KV 結構是經過一個 dictEntry 來實現的。

Redis 又對 dictEntry 進行了多層的封裝。

dictEntry 定義以下：

typedef struct dictEntry {
  void *key; /* key 關鍵字定義 */
  union {
    void *val; uint64_t u64; /* value 定義 */
    int64_t s64; double d;
  } v;
  struct dictEntry *next; /* 指向下一個鍵值對節點 */
} dictEntry複製代碼

dictEntry 放到了 dictht（hashtable 裏面）：

/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
  dictEntry **table; /* 哈希表數組 */
  unsigned long size; /* 哈希表大小 */
  unsigned long sizemask; /* 掩碼大小， 用於計算索引值。 老是等於 size-1 */
  unsigned long used; /* 已有節點數 */
} dictht;複製代碼

dictht 放到了 dict 裏面：

typedef struct dict {
  dictType *type; /* 字典類型 */
  void *privdata; /* 私有數據 */
  dictht ht[2]; /* 一個字典有兩個哈希表 */
  long rehashidx; /* rehash 索引 */
  unsigned long iterators; /* 當前正在使用的迭代器數量 */
} dict;複製代碼

從最底層到最高層 dictEntry——dictht——dict——OBJ_ENCODING_HT

哈希的整體存儲結構以下：

注意： dictht 後面是 NULL 說明第二個 ht 還沒用到。 dictEntry*後面是 NULL 說明沒有 hash 到這個地址。 dictEntry 後面是NULL 說明沒有發生哈希衝突。

問題三、爲何要定義兩個hash表呢？ht[2]?

redis 的 hash 默認使用的是 ht[0]，ht[1]不會初始化和分配空間。

哈希表 dictht 是用鏈地址法來解決碰撞問題的。在這種狀況下，哈希表的性能取決於它的大小（size 屬性）和它所保存的節點的數量（used 屬性）之間的比率：

1. 比率在 1:1 時（一個哈希表 ht 只存儲一個節點 entry），哈希表的性能最好；

2. 若是節點數量比哈希表的大小要大不少的話（這個比例用 ratio 表示，5 表示平均一個 ht 存儲 5 個 entry），那麼哈希表就會退化成多個鏈表，哈希表自己的性能優點就再也不存在。

在這種狀況下須要擴容。Redis 裏面的這種操做叫作 rehash。

一、爲字符 ht[1]哈希表分配空間，這個哈希表的空間大小取決於要執行的操做，以及 ht[0]當前包含的鍵值對的數量。

擴展：ht[1]的大小爲第一個大於等於 ht[0].used*2。

二、將全部的 ht[0]上的節點 rehash 到 ht[1]上，從新計算 hash 值和索引，而後放入指定的位置。

三、當 ht[0]所有遷移到了 ht[1]以後，釋放 ht[0]的空間，將 ht[1]設置爲 ht[0]表，並建立新的 ht[1]，爲下次 rehash 作準備。

問題四、何時觸發擴容？

關鍵因素：負載因子

定義源碼以下：

static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;複製代碼

ratio = used / size，已使用節點與字典大小的比例。

dict_can_resize 爲 1 而且 dict_force_resize_ratio 已使用節點數和字典大小之間的比率超過 1：5，觸發擴容。

擴容判斷 _dictExpandIfNeeded源碼以下：

if (d->ht[0].used >= d->ht[0].size &&(dict_can_resize || d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
{
    return dictExpand(d, d->ht[0].used*2);
}r
eturn DICT_OK;複製代碼

擴容方法 dictExpand源碼以下：

static int dictExpand(dict *ht, unsigned long size) {
dict n; /* the new hashtable */
unsigned long realsize = _dictNextPower(size), i;

/* the size is invalid if it is smaller than the number of
* elements already inside the hashtable */
if (ht->used > size)
return DICT_ERR;

_dictInit(&n, ht->type, ht->privdata);
n.size = realsize;
n.sizemask = realsize-1;
n.table = calloc(realsize,sizeof(dictEntry*));

/* Copy all the elements from the old to the new table:
* note that if the old hash table is empty ht->size is zero,
* so dictExpand just creates an hash table. */
n.used = ht->used;
for (i = 0; i < ht->size && ht->used > 0; i++) {
dictEntry *he, *nextHe;

if (ht->table[i] == NULL) continue;

/* For each hash entry on this slot... */
he = ht->table[i];
while(he) {
unsigned int h;

nextHe = he->next;
/* Get the new element index */
h = dictHashKey(ht, he->key) & n.sizemask;
he->next = n.table[h];
n.table[h] = he;
ht->used--;
/* Pass to the next element */
he = nextHe;
}
}a
ssert(ht->used == 0);
free(ht->table);

/* Remap the new hashtable in the old */
*ht = n;
return DICT_OK;
}複製代碼

縮容源碼以下：

int htNeedsResize(dict *dict) {
  long long size, used;

  size = dictSlots(dict);
  used = dictSize(dict);
  return (size > DICT_HT_INITIAL_SIZE &&(used*100/size < HASHTABLE_MIN_FILL));
}複製代碼

應用場景

String

String 能夠作的事情，Hash 均可以作。

存儲對象類型的數據

好比對象或者一張表的數據，比 String 節省了更多 key 的空間，也更加便於集中管理。

購物車