Redis Source Code Walkthrough (2): The Object System

Date: 2019-12-05


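Redis implements several data structures and builds its object types on top of them. Every object type (except the newer stream) has more than one underlying implementation.

The redisObject struct reflects the memory layout of a Redis object: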

typedef struct redisObject {
    unsigned type:4;       // object type, 4 bits
    unsigned encoding:4;   // underlying data structure, 4 bits
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits access time). */ // 24 bits
    int refcount;          // 4 bytes
    void *ptr;             // pointer to the data structure, 8 bytes
} robj;

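As the listing shows, robj uses 4 bits for the object type and 4 bits for the underlying data structure (the encoding).

The fixed size of robj is therefore 16 bytes: 4 bytes of bit fields, 4 bytes of refcount, and an 8-byte pointer.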
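The 16-byte figure can be checked directly. The following is a minimal sketch rather than Redis code: it redeclares the struct locally, assumes LRU_BITS is 24 (matching the 24-bit note in the listing), and assumes a typical 64-bit platform with a 4-byte int and 8-byte pointers. The helper names robj_size and robj_ptr_offset are hypothetical.

```c
#include <stddef.h>

#define LRU_BITS 24 /* assumption: matches the 24-bit note in the listing */

/* Local copy of the struct from the article, used only for measurement. */
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} robj;

/* Size of the robj header in bytes on the current platform. */
size_t robj_size(void) {
    return sizeof(robj);
}

/* Byte offset of the pointer field. It is 8 when the three bit fields
 * share one 4-byte unit and refcount occupies the next 4 bytes. */
size_t robj_ptr_offset(void) {
    return offsetof(robj, ptr);
}
```

On such a platform robj_size() should return 16 and robj_ptr_offset() should return 8. A 32-bit build would typically give a smaller struct, since the pointer shrinks to 4 bytes.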

The object types are:

#define OBJ_STRING 0    /* String object. */
#define OBJ_LIST 1      /* List object. */
#define OBJ_SET 2       /* Set object. */
#define OBJ_ZSET 3      /* Sorted set object. */
#define OBJ_HASH 4      /* Hash object. */
#define OBJ_MODULE 5    /* Module object. */
#define OBJ_STREAM 6    /* Stream object. */ // added in Redis 5

The encodings (underlying data structures) are:

#define OBJ_ENCODING_RAW 0     /* Raw representation */ // plain sds
#define OBJ_ENCODING_INT 1     /* Encoded as integer */ // string holding an integer
#define OBJ_ENCODING_HT 2      /* Encoded as hash table */ // dict
#define OBJ_ENCODING_ZIPMAP 3  /* Encoded as zipmap */ // deprecated
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding. */ // deprecated
#define OBJ_ENCODING_ZIPLIST 5 /* Encoded as ziplist */
#define OBJ_ENCODING_INTSET 6  /* Encoded as intset */
#define OBJ_ENCODING_SKIPLIST 7  /* Encoded as skiplist */
#define OBJ_ENCODING_EMBSTR 8  /* Embedded sds string encoding */
#define OBJ_ENCODING_QUICKLIST 9 /* Encoded as linked list of ziplists */
#define OBJ_ENCODING_STREAM 10 /* Encoded as a radix tree of listpacks */

Reading the objectComputeSize function shows which encodings each object type can use:

OBJ_STRING = OBJ_ENCODING_RAW + OBJ_ENCODING_INT + OBJ_ENCODING_EMBSTR

OBJ_LIST = OBJ_ENCODING_QUICKLIST + OBJ_ENCODING_ZIPLIST

OBJ_SET = OBJ_ENCODING_INTSET + OBJ_ENCODING_HT

OBJ_ZSET = OBJ_ENCODING_SKIPLIST + OBJ_ENCODING_ZIPLIST

OBJ_HASH = OBJ_ENCODING_HT + OBJ_ENCODING_ZIPLIST

OBJ_STREAM = OBJ_ENCODING_STREAM
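The mapping above can be written down as a lookup table. This is only an illustrative sketch: the table valid_encodings and the function encoding_is_valid are hypothetical names that do not appear in Redis, and OBJ_MODULE is left empty because the list above gives no encoding for it.

```c
#include <stdbool.h>

/* Type and encoding constants copied from the listings above. */
#define OBJ_STRING 0
#define OBJ_LIST 1
#define OBJ_SET 2
#define OBJ_ZSET 3
#define OBJ_HASH 4
#define OBJ_MODULE 5
#define OBJ_STREAM 6

#define OBJ_ENCODING_RAW 0
#define OBJ_ENCODING_INT 1
#define OBJ_ENCODING_HT 2
#define OBJ_ENCODING_ZIPLIST 5
#define OBJ_ENCODING_INTSET 6
#define OBJ_ENCODING_SKIPLIST 7
#define OBJ_ENCODING_EMBSTR 8
#define OBJ_ENCODING_QUICKLIST 9
#define OBJ_ENCODING_STREAM 10

/* One bit per encoding: bit e is set when encoding e is listed for a type. */
#define ENC_BIT(e) (1u << (e))

/* Hypothetical table mirroring the type-to-encoding list in the text. */
static const unsigned valid_encodings[] = {
    [OBJ_STRING] = ENC_BIT(OBJ_ENCODING_RAW) | ENC_BIT(OBJ_ENCODING_INT) |
                   ENC_BIT(OBJ_ENCODING_EMBSTR),
    [OBJ_LIST]   = ENC_BIT(OBJ_ENCODING_QUICKLIST) | ENC_BIT(OBJ_ENCODING_ZIPLIST),
    [OBJ_SET]    = ENC_BIT(OBJ_ENCODING_INTSET) | ENC_BIT(OBJ_ENCODING_HT),
    [OBJ_ZSET]   = ENC_BIT(OBJ_ENCODING_SKIPLIST) | ENC_BIT(OBJ_ENCODING_ZIPLIST),
    [OBJ_HASH]   = ENC_BIT(OBJ_ENCODING_HT) | ENC_BIT(OBJ_ENCODING_ZIPLIST),
    [OBJ_MODULE] = 0, /* not covered by the list in the text */
    [OBJ_STREAM] = ENC_BIT(OBJ_ENCODING_STREAM),
};

/* Returns true when the (type, encoding) pair appears in the list above. */
bool encoding_is_valid(unsigned type, unsigned encoding) {
    if (type > OBJ_STREAM || encoding > OBJ_ENCODING_STREAM) return false;
    return (valid_encodings[type] & ENC_BIT(encoding)) != 0;
}
```

For example, encoding_is_valid(OBJ_ZSET, OBJ_ENCODING_ZIPLIST) is true, while encoding_is_valid(OBJ_STRING, OBJ_ENCODING_HT) is false.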

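Why build such an elaborate object system? Mainly to save memory.

Take the most common object type, the string. It has the most encodings, three in total, and the reason is easy to see in the function tryObjectEncoding: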

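// Try to compress a string:
// 1. Check whether it can be stored directly as INT, ideally via shared.integers
// 2. Check whether it can be stored as embstr
// 3. If more than 1/10 of the sds is free space, trim the free space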
/* Try to encode a string object in order to save space */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    ......

    /* Check if we can represent this string as a long integer.
     * Note that we are sure that a string larger than 20 chars is not
     * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) { // is it an integer of at most 20 characters?
        /* This object is encodable as a long. Try to use a shared object.
         * Note that we avoid using shared integers when maxmemory is used
         * because every object needs to have a private LRU field for the LRU
         * algorithm to work well. */
        // check whether value falls in the range [0, OBJ_SHARED_INTEGERS)
        if ((server.maxmemory == 0 ||
            !(server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) &&
            value >= 0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            if (o->encoding == OBJ_ENCODING_RAW) sdsfree(o->ptr);
            o->encoding = OBJ_ENCODING_INT;
            o->ptr = (void*) value;
            return o;
        }
    }

    /* If the string is small and is still RAW encoded,
     * try the EMBSTR encoding which is more efficient.
     * In this representation the object and the SDS string are allocated
     * in the same chunk of memory to save space and cache misses. */
    // can it be stored as embstr? check whether the string length is <= 44
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object...
     *
     * Do the last try, and at least optimize the SDS string inside
     * the string object to require little space, in case there
     * is more than 10% of free space at the end of the SDS string.
     *
     * We do that only for relatively large strings as this branch
     * is only entered if the length of the string is greater than
     * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. */
    // try to trim the free space of the sds
    if (o->encoding == OBJ_ENCODING_RAW &&
        sdsavail(s) > len/10)
    {
        o->ptr = sdsRemoveFreeSpace(o->ptr);
    }

    /* Return the original object. */
    return o;
}

Redis is clearly very frugal in its use of memory.
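The decision order in tryObjectEncoding can be condensed into a small standalone sketch. It is a simplification under stated assumptions, not the Redis implementation: parse_strict_long is a hypothetical stand-in for string2l, choose_string_encoding is a hypothetical name, and the shared-integer lookup and the free-space trimming step are left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJ_ENCODING_RAW 0
#define OBJ_ENCODING_INT 1
#define OBJ_ENCODING_EMBSTR 8
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44

/* Hypothetical stand-in for string2l: accepts only a canonical decimal
 * long (optional '-', no leading zeros, no spaces, no overflow). */
static bool parse_strict_long(const char *s, size_t len, long *out) {
    size_t i = 0;
    bool negative = false;
    unsigned long v = 0, limit;

    if (len == 0) return false;
    if (s[0] == '-') { negative = true; i = 1; }
    if (i == len) return false;          /* a lone "-" */
    if (s[i] == '0') {                   /* only a plain "0" is accepted */
        if (len != 1) return false;
        *out = 0;
        return true;
    }
    /* Largest magnitude that fits: LONG_MAX, or LONG_MAX + 1 if negative. */
    limit = negative ? (unsigned long)LONG_MAX + 1 : (unsigned long)LONG_MAX;
    for (; i < len; i++) {
        unsigned long digit;
        if (s[i] < '0' || s[i] > '9') return false;
        digit = (unsigned long)(s[i] - '0');
        if (v > (limit - digit) / 10) return false; /* would overflow */
        v = v * 10 + digit;
    }
    if (negative) *out = (v == limit) ? LONG_MIN : -(long)v;
    else *out = (long)v;
    return true;
}

/* Picks an encoding in the same order as the function above:
 * integer first, then embstr by length, otherwise raw. */
int choose_string_encoding(const char *s, size_t len) {
    long value;
    if (len <= 20 && parse_strict_long(s, len, &value))
        return OBJ_ENCODING_INT;
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return OBJ_ENCODING_EMBSTR;
    return OBJ_ENCODING_RAW;
}
```

With this sketch, "12345" maps to OBJ_ENCODING_INT, "hello" to OBJ_ENCODING_EMBSTR, and any non-numeric string longer than 44 bytes to OBJ_ENCODING_RAW.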

One interesting detail deserves a closer look: why does the boundary between embstr and raw sds sit at a length of 44?
Consider the sdshdr8 struct:

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */ // 1 byte
    uint8_t alloc; /* excluding the header and null terminator */ // 1 byte
    unsigned char flags; /* 3 lsb of type, 5 unused bits */ // 1 byte
    char buf[];
};

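So len + alloc + flags take 3 bytes.

Redis also appends a '\0' to every stored string by default, which takes 1 more byte.

In other words, an sdshdr8 costs at least 4 bytes on top of the content.

Add the 16-byte robj header and the fixed overhead is 20 bytes.

jemalloc hands out memory in fixed size classes such as 8/16/32/64 bytes, so the embstr/raw boundary of 44 is deliberate: 64 - 20 = 44, which means the largest embstr exactly fills a 64-byte chunk. (Could this be taken one step further, with 12 as the limit for an even smaller string type that fits in 32 bytes?)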
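The layout behind this arithmetic can be sketched as a single allocation holding the robj header, the sdshdr8 header, the string, and the trailing '\0'. This is a hedged illustration and not the Redis source: make_embedded_string and embstr_alloc_size are hypothetical names, plain malloc stands in for the Redis allocator, LRU_BITS is assumed to be 24, and a 64-bit platform is assumed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LRU_BITS 24            /* assumption: 24-bit lru field */
#define OBJ_STRING 0
#define OBJ_ENCODING_EMBSTR 8
#define SDS_TYPE_8 1           /* sds header type tag stored in flags */

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} robj;

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Bytes needed for robj + sdshdr8 + string + '\0' in one chunk. */
size_t embstr_alloc_size(size_t len) {
    return sizeof(robj) + sizeof(struct sdshdr8) + len + 1;
}

/* Hypothetical sketch of an embedded string object: object header and
 * string live in the same allocation. Returns NULL if the string is too
 * long for embstr or if allocation fails. Caller releases it with free(). */
robj *make_embedded_string(const char *s, size_t len) {
    robj *o;
    struct sdshdr8 *sh;

    if (len > 44) return NULL;
    o = malloc(embstr_alloc_size(len));
    if (o == NULL) return NULL;

    sh = (struct sdshdr8 *)(o + 1);  /* sds header sits right after robj */
    o->type = OBJ_STRING;
    o->encoding = OBJ_ENCODING_EMBSTR;
    o->lru = 0;
    o->refcount = 1;
    o->ptr = sh->buf;                /* ptr points into the same chunk */

    sh->len = (uint8_t)len;
    sh->alloc = (uint8_t)len;
    sh->flags = SDS_TYPE_8;
    memcpy(sh->buf, s, len);
    sh->buf[len] = '\0';
    return o;
}
```

Under these assumptions embstr_alloc_size(44) is exactly 64, so a 44-byte string is the longest one whose object still fits a 64-byte chunk; one more byte would need 65.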

More interesting still: go back a few versions and the boundary was 39 bytes. That is because older versions had only one kind of sds header:

struct sdshdr {
    unsigned int len;//4 byte
    unsigned int free;//4 byte
    char buf[];
};

Here the fixed overhead of sdshdr is 4 + 4 + 1 = 9 bytes. Adding the 16-byte robj gives 25 bytes, so the boundary could only be 64 - 25 = 39 bytes.

Compared with that, the new sdshdr8 squeezes out 5 extra bytes (the header shrinks from 8 bytes to 3), which is quite impressive.
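Both limits follow from the same subtraction, which a short sketch makes explicit. The function embstr_limit is a hypothetical helper, and the 64-byte chunk and 16-byte robj are the figures assumed in the text for a 64-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Old single sds header, as shown in the text. */
struct sdshdr {
    unsigned int len;
    unsigned int free;
    char buf[];
};

/* Newer compact header for short strings, as shown in the text. */
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

/* Longest string that fits when the object header, the sds header, the
 * string and its trailing '\0' must share one chunk of chunk_size bytes.
 * Returns 0 if the overhead alone does not fit. */
size_t embstr_limit(size_t chunk_size, size_t robj_size, size_t sds_header_size) {
    size_t overhead = robj_size + sds_header_size + 1; /* +1 for '\0' */
    if (chunk_size < overhead) return 0;
    return chunk_size - overhead;
}
```

Calling embstr_limit(64, 16, sizeof(struct sdshdr8)) gives 44, and embstr_limit(64, 16, sizeof(struct sdshdr)) gives 39, a difference of 5 bytes.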
