Redis: Should You Store Data in a Hash or a String?

Date: 2019-12-05



I came across a question on Stack Overflow, "Redis strings vs Redis hashes to represent JSON: efficiency?", which reads:

I want to store a JSON payload into redis. There's really 2 ways I can do this:

One using a simple string keys and values.

key:user, value:payload (the entire JSON blob which can be 100-200 KB)

SET user:1 payload

Using hashes

HSET user:1 username "someone"
HSET user:1 location "NY"
HSET user:1 bio "STRING WITH OVER 100 lines"

Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above. Which is more memory efficient? Using string keys and values, or using a hash?

A direct test of string vs. hash

First, let's run a test with some sample data. Each test object looks like this:

values = {
    "name": "gs",
    "age": 1
}

A for loop generates 100,000 keys, named according to this rule:

for i in range(100000):
    key = "object:%d" % i

The data is then stored in Redis separately, once as hashes and once as strings (with values JSON-encoded into a string).
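A minimal sketch of this loading step follows. It assumes the redis-py client (version 3.5 or later, where `hset` accepts a `mapping=` keyword) and a Redis server you can flush between runs. The function names `load_as_strings` and `load_as_hashes` are hypothetical, and any client object with `set` and `hset` methods will work.

```python
import json

# The first test object from above.
VALUES = {"name": "gs", "age": 1}


def load_as_strings(client, n=100_000):
    """Store each object under its own key as one JSON-encoded string."""
    payload = json.dumps(VALUES)  # encode once; every key gets the same string
    for i in range(n):
        client.set("object:%d" % i, payload)


def load_as_hashes(client, n=100_000):
    """Store each object under its own key as a hash, one field per dict entry."""
    for i in range(n):
        client.hset("object:%d" % i, mapping=VALUES)
```

To reproduce a measurement, call one loader on an empty database (for example `r = redis.Redis()` followed by `r.flushdb()`). Then read `used_memory_human` from `r.info("memory")`. Run the two variants separately, because they write the same key names.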

The results:

hash: 10.16M

string: 10.15M

This seems at odds with the common impression that hashes take up more space. Why?

The reason is that Redis hash objects have two encodings:

ziplist (zipmap before 2.6)
hashtable

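A hash object uses the ziplist encoding when it satisfies both of the following conditions:

the string length of every key and value it stores does not exceed 64 bytes;
the number of key-value pairs it stores does not exceed 512.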

A hash object that does not meet both conditions uses the hashtable encoding. The test data above meets both, so it is stored as a ziplist rather than a hashtable.

Note: the limits in these two conditions are configurable. See the hash-max-ziplist-value and hash-max-ziplist-entries options in the configuration file.

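hash-max-ziplist-entries (Redis >= 2.6)
hash-max-ziplist-value (Redis >= 2.6)

(Since Redis 7.0, ziplist has been replaced by listpack. These options are now named hash-max-listpack-entries and hash-max-listpack-value, and the old names are still accepted as aliases.)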
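The two conditions can be written as a small predicate. This is a sketch, not Redis source code. The function `expected_hash_encoding` is hypothetical, it assumes the default limits (512 entries, 64 bytes), and it assumes the hash is written in one go. It treats the limits the way redis.conf describes them: Redis switches encodings only when a count or length *exceeds* the limit, and it measures each field and value by its UTF-8 byte length. On a live server, `OBJECT ENCODING <key>` reports the encoding Redis actually chose.

```python
def expected_hash_encoding(mapping, max_entries=512, max_value=64):
    """Predict the encoding of a hash created from `mapping` in one write.

    max_entries mirrors hash-max-ziplist-entries and max_value mirrors
    hash-max-ziplist-value (defaults shown).
    """
    if len(mapping) > max_entries:
        return "hashtable"
    for field, value in mapping.items():
        # Both the field name and the value are checked against the byte limit.
        for item in (field, value):
            if len(str(item).encode("utf-8")) > max_value:
                return "hashtable"
    return "ziplist"
```

With the first test object, `{"name": "gs", "age": 1}`, this predicts ziplist. Adding an `intro` value longer than 64 bytes tips it over to hashtable, which is what the second test below relies on.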

ziplist

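A ziplist-encoded hash uses a compressed list (ziplist) as its underlying data structure, laid out as follows:

[Figure: ziplist structure]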

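When a hash object is stored as a ziplist, Redis first pushes the ziplist node holding the key onto the tail of the list. It then pushes the node holding the value onto the tail, so each key is immediately followed by its value.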
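The following toy model shows that push order. A Python list stands in for the real ziplist byte layout, which also stores per-entry length headers. The names `push_pair` and `lookup` are hypothetical. Fields and values sit in alternating slots, so a lookup is a linear scan over every other entry.

```python
def push_pair(entries, field, value):
    """Add a new field: push the field node, then its value node, to the tail."""
    entries.append(field)
    entries.append(value)


def lookup(entries, field):
    """Scan the field slots (even indexes); the value is in the next slot."""
    for i in range(0, len(entries), 2):
        if entries[i] == field:
            return entries[i + 1]
    return None


zl = []
push_pair(zl, "name", "gs")
push_pair(zl, "age", "1")
```

After these two calls, `zl` holds `["name", "gs", "age", "1"]`, and `lookup(zl, "age")` returns `"1"`.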

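Storing data this way requires no extra memory allocation. In addition, every top-level key carries some associated system information (such as expiration time and LRU data). When many values that would otherwise be separate String keys are grouped into one Hash, the number of keys drops sharply (most of them become hash fields), which further improves memory efficiency.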
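The key-count saving only appears when many small values share one hash. The official redis memory optimization article describes one way to arrange this: split a numeric key such as `object:1234` into the hash key `object:12` and the field `34`. Below is a minimal sketch of that mapping. The helper `bucket_for` is hypothetical, and the bucket size of 100 is an illustrative choice.

```python
def bucket_for(object_id, fields_per_hash=100, prefix="object"):
    """Map a numeric object id to (hash key, field) so that up to
    `fields_per_hash` objects share one top-level key."""
    key = "%s:%d" % (prefix, object_id // fields_per_hash)
    field = str(object_id % fields_per_hash)
    return key, field


# 100,000 objects collapse into 1,000 top-level keys.
keys = {bucket_for(i)[0] for i in range(100_000)}
```

With redis-py, a write would then be `key, field = bucket_for(1234)` followed by `r.hset(key, field, value)`. If `fields_per_hash` stays within hash-max-ziplist-entries, each bucket keeps the compact encoding.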

In the official redis memory optimization article, the author strongly recommends using hashes to store data:

Use hashes when possible

Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.

But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf).

This does not only work well from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than a hash table).
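The idea in this quote can be sketched as a toy class: keep a linear array while the hash is small, then switch to a real hash table once it grows. This is an illustration under assumptions, not Redis internals. The class name `SmallHash` is hypothetical, only the entry-count limit is modeled (the per-value byte limit is ignored), and Python objects stand in for the byte-level encoding.

```python
class SmallHash:
    """Toy hash: a flat [field, value, ...] list while small, converted to a
    dict once it holds more than `max_entries` fields. This toy never
    converts back."""

    def __init__(self, max_entries=512):
        self.max_entries = max_entries
        self.pairs = []    # compact form: O(N) scans, good cache locality
        self.table = None  # dict form, used after conversion

    @property
    def encoding(self):
        return "ziplist" if self.table is None else "hashtable"

    def hset(self, field, value):
        if self.table is not None:
            self.table[field] = value
            return
        for i in range(0, len(self.pairs), 2):
            if self.pairs[i] == field:
                self.pairs[i + 1] = value  # existing field: update in place
                return
        self.pairs += [field, value]
        if len(self.pairs) // 2 > self.max_entries:
            # Too many fields: convert to a real hash table.
            self.table = dict(zip(self.pairs[0::2], self.pairs[1::2]))
            self.pairs = []

    def hget(self, field):
        if self.table is not None:
            return self.table.get(field)
        for i in range(0, len(self.pairs), 2):
            if self.pairs[i] == field:
                return self.pairs[i + 1]
        return None
```

While N stays below the limit, the linear scan in `hget` is bounded by a small constant. That bound is why the quote can call HGET and HSET amortized O(1) even in the compact form.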

hashtable

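A hash object with the hashtable encoding uses a dictionary as its underlying implementation, and each key-value pair in the hash object is stored as a dictionary entry: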

each dictionary key is a string object holding the pair's key;
each dictionary value is a string object holding the pair's value.

A hashtable-encoded object looks like this:

[Figure: hashtable-encoded hash object]

Second test

values = {
    "name": "gs",
    "age": 1,
    "intro": "long..long..long..string"
}

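The second test is run the same way as the first, except that a long string is added to the test data so that the hash is forced to use the hashtable encoding.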

The results:

hashtable: 1.13G

string: 1.13G

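The two are essentially the same. Both variants still use 100,000 top-level keys here, so the key-count saving described above does not come into play. The total is most likely dominated by the long intro string, which takes about the same space in either layout.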

NOTE: read and write speeds were also about the same; the difference is small.

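Back to the original question: how do you choose between string and hash?

I agree with the following answer:

Which data structure to use really depends on the data you store and how you use it.

Hashes suit fairly structured data, such as a user data cache. They also suit cases where you often work with just one or a few parts of an object, especially when it has many fields but each access needs only one or a handful of them. HGET and HMGET let you fetch just those fields instead of reading everything and filtering in code.

Conversely, if the data varies a lot in shape and you usually need to read all of it before processing, a string is a good choice.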
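The trade-off comes down to how much data each read has to pull. Here is a minimal sketch assuming redis-py (`hget`, `hmget` and `get` are its real methods; the helper names are hypothetical), using the `user:1` example from the question. With the hash layout, the server returns only the requested fields. With the string layout, the whole JSON blob travels to the client and is decoded there.

```python
import json


def read_field_from_hash(client, key, field):
    """Hash layout: the server returns only the requested field (HGET)."""
    return client.hget(key, field)


def read_fields_from_hash(client, key, *fields):
    """Hash layout: several fields in one round trip (HMGET)."""
    return client.hmget(key, list(fields))


def read_field_from_string(client, key, field):
    """String layout: fetch and decode the whole JSON blob, then pick a field."""
    raw = client.get(key)
    return None if raw is None else json.loads(raw).get(field)
```

For a 100-200 KB payload like the one in the question, `read_field_from_string` transfers and parses the entire blob to get one field, while `read_field_from_hash` transfers just that field. If every access needs the whole object anyway, the string layout's single GET is the simpler fit.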

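Of course, you can also simply follow Redis's own advice and use hashes with confidence.

One more case: if a hash holds a very large number of fields (thousands or more), consider whether splitting them into separate string keys would be a better choice.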

References

[1] Redis strings vs Redis hashes to represent JSON: efficiency?: stackoverflow.com/questions/1…
[2] redis memory optimization: redis.io/topics/memo…
[3] Redis Design and Implementation (Redis 設計與實現): redisbook.com/preview/obj…

Finally, thanks to my girlfriend for her support and patience. ❤️
