【Battle at Xi'erqi】| Redis Interview Hot Topics: Engineering and Architecture

Date: 2019-12-18


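Preface

The previous two articles gave a broad overview of the low-level implementation questions that come up in Redis interviews. If you are interested, you can go back to them:
【Battle at Xi'erqi】| Redis Interview Hot Topics: Low-Level Implementation
【Battle at Xi'erqi】| Redis Interview Hot Topics: Low-Level Implementation (Continued)

Next, let's look at questions about Redis engineering and architecture. These come up more often, because not everyone studies the source code, and an interview that asks only about source code is probably doomed to be an awkward conversation.

When an interview does not require the candidate to be a Redis expert, engineering questions are the natural choice. There are quite a few of them, so this account will cover them over several articles. Today is the first.

After reading this article you will know about:
1. Redis memory reclamation in detail
2. The Redis persistence mechanism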

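Q1: Do you understand Redis memory reclamation? Explain your understanding.

1.1 Why reclaim memory?

Redis is an in-memory database. If data only goes in and never comes out, sooner or later it will burst. In fact, many people who use Redis as their primary storage DB end up tasting this bitter fruit, unless the company truly has money to burn and several terabytes of memory are a trifle, or the data stops growing after it reaches a certain size.

For ordinary companies like ours, where saving cost is a KPI, using Redis as a cache fits the household budget better. Seen this way, the interviewer's question is fairly practical, and better than being asked to hand-write a red-black tree. If the question happens to fall within your range, give the interviewer a thumbs-up first!

To keep the Redis service running safely and stably, memory usage has to stay under a threshold. So we need to delete what should be deleted and clean up what should be cleaned up, leaving memory for the key-value pairs that need it. Think of a large river that needs several warning water levels so that it neither bursts its banks nor runs dry. Redis is the same, except that it only has to worry about bursting its banks. Here is a figure:

The figure assumes a machine with 128GB of memory. Using 64GB is a fairly safe level. If usage approaches 80%, roughly 100GB, Redis is already carrying a heavy load. The exact ratios can be set according to the company's and your own operational experience.

The point is simply that, for safety and stability, 128GB of memory does not mean 128GB of stored data. There is always a discount.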
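The water-level idea can be sketched as a small threshold check. This is only an illustration: the `classify_memory` function, the `mem_level` names, and the 50% and 80% cut-offs are assumptions taken from the example numbers above, not anything Redis itself provides.

```c
#include <assert.h>

/* Hypothetical warning levels, modelled on the 128GB example above. */
typedef enum { MEM_SAFE, MEM_ELEVATED, MEM_HEAVY } mem_level;

/* Classify memory usage by percentage of total memory.
 * Up to 50% is treated as safe, 80% or more as heavy load. */
mem_level classify_memory(unsigned long long used_gb,
                          unsigned long long total_gb) {
    assert(total_gb > 0);
    unsigned long long perc = used_gb * 100 / total_gb;
    if (perc >= 80) return MEM_HEAVY;
    if (perc > 50) return MEM_ELEVATED;
    return MEM_SAFE;
}
```

With these assumed cut-offs, 64GB of 128GB counts as safe and 110GB of 128GB counts as heavy load.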

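1.2 Where is memory reclaimed from?

Redis memory usage has two parts: memory consumed by stored key-value pairs and memory consumed by the server's own operation. Obviously we cannot reclaim the latter, so we can only start with the key-value pairs. They fall into several kinds: with an expiry, without an expiry, hot data, and cold data. Key-value pairs whose expiry has passed need to be deleted. What if memory is still insufficient after every expired key-value pair has been deleted? Then some of the data has to be evicted.

Life is full of trade-offs. This reminds the author of Titanic: the liner struck an iceberg, seawater poured in, and with too few lifeboats people made a choice. Women and children went first, and the gentlemen stayed behind. The escape scene is shown in the figure: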

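1.3 How are expired key-value pairs deleted?

To delete key-value pairs we need to understand the following:

Where are key-value pairs with an expiry stored?
How do we decide whether a key-value pair with an expiry can be deleted?
What deletion mechanisms exist, and how do we choose among them?

1.3.1 Storage of key-value pairs

As usual, let's look at the source on GitHub. The redisDb struct in src/server.h gives the answer: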

typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

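Redis is essentially one big key-value store. The key is a string and the value is one of several object types: string, list, sorted set, set, hash, and so on. These key-value pairs are all stored in the dict of redisDb. Here is an excellent figure drawn by Huang Jianhong:

At this point the deletion mechanism is one step clearer: we just delete the target key-value pair from the dict in redisDb. But it is not quite that simple. Redis certainly has its own rules for organizing key-value pairs that carry an expiry, so let's keep going!

The expires member of redisDb is also of type dict, just like the keyspace. In essence the keys in expires are a subset of the keys in dict: expires holds every key that has an expiry set. Let's call it the expires dictionary. It is the focus of our study.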

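For a key, we can set an absolute or a relative expiry and check the remaining time:

Use EXPIRE and PEXPIRE to set a key's time to live in seconds and in milliseconds. These set a relative duration.
Use EXPIREAT and PEXPIREAT to make a key expire at a given Unix timestamp in seconds or in milliseconds. These set an absolute expiry.
Use TTL and PTTL to check the remaining time to live of a key that has an expiry.

These three groups of commands are very useful when designing a cache, so they are worth remembering.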

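The contents of the expires dictionary and of the keyspace dict are not exactly the same. In the expires dictionary, each key is a pointer to the corresponding key object in the keyspace, and each value is a Unix timestamp of type long long (in milliseconds, as the mstime() comparison in the code further down shows). The relative durations given to EXPIRE and PEXPIRE are ultimately converted to timestamps as well. The author drew a figure of the structure of the expires dictionary.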
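The conversion from relative durations to absolute timestamps can be sketched in a few lines. This is a minimal sketch, assuming expiries are stored as absolute Unix timestamps in milliseconds; the helper names below are hypothetical and are not functions from the Redis source.

```c
#include <stdint.h>

/* Hypothetical helpers: every expiry ends up as an absolute
 * Unix timestamp in milliseconds. */
typedef int64_t mstime_t;

/* EXPIRE-style input: relative seconds -> absolute ms timestamp. */
mstime_t expire_at_from_seconds(mstime_t now_ms, int64_t seconds) {
    return now_ms + seconds * 1000;
}

/* PEXPIRE-style input: relative milliseconds -> absolute ms timestamp. */
mstime_t expire_at_from_millis(mstime_t now_ms, int64_t millis) {
    return now_ms + millis;
}

/* EXPIREAT-style input: absolute Unix seconds -> absolute ms timestamp. */
mstime_t expire_at_from_unix_seconds(int64_t unix_seconds) {
    return unix_seconds * 1000;
}

/* PTTL-style query: remaining milliseconds, clamped at zero.
 * (The real commands also have special replies for missing keys and
 * keys without an expiry, which this sketch leaves out.) */
mstime_t remaining_ms(mstime_t expire_at_ms, mstime_t now_ms) {
    mstime_t left = expire_at_ms - now_ms;
    return left > 0 ? left : 0;
}
```

Storing only absolute timestamps means the expiry check later on is a single comparison against the current time, whichever command set the expiry.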

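1.3.2 Deciding whether a key-value pair has expired

To decide whether a key has expired and can be deleted, first check whether the key exists in the expires dictionary. If it does, compare its expiry timestamp with the current timestamp and decide accordingly. The simple flow is shown in the figure.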
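The two-step check can be sketched as follows. This is a toy model, not the Redis implementation: the `toy_expire_entry` array stands in for the expires dictionary, and every name here is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for one entry of the expires dictionary. */
typedef struct {
    const char *key;
    int64_t expire_at_ms; /* absolute Unix timestamp in milliseconds */
} toy_expire_entry;

/* Step 1: look the key up. Returns -1 when the key has no expiry. */
int64_t toy_get_expire(const toy_expire_entry *expires, size_t n,
                       const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(expires[i].key, key) == 0) return expires[i].expire_at_ms;
    return -1;
}

/* Step 2: compare the stored timestamp with the current time.
 * Returns 1 if the key has expired, 0 otherwise. */
int toy_key_is_expired(const toy_expire_entry *expires, size_t n,
                       const char *key, int64_t now_ms) {
    int64_t when = toy_get_expire(expires, n, key);
    if (when < 0) return 0; /* no expiry set: never expires */
    return now_ms > when;
}
```

A key that is absent from the table is treated as having no expiry, which matches the flow described above: only keys found in the expires dictionary are compared against the clock.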

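1.3.3 Deletion strategies for key-value pairs

From the previous sections we know the two places Redis stores this data, the keyspace and the expires dictionary, as well as the structure of the expires dictionary and how to decide whether a key has expired. So how is deletion actually carried out?

Setting Redis aside for a moment, consider the possible deletion strategies:

Timed deletion: when an expiry is set on a key, create a timer that deletes the key-value pair the moment the expiry arrives.
Periodic deletion: scan the database at fixed intervals, detecting and deleting expired key-value pairs.
Lazy deletion: do not delete a key-value pair when it expires. Deletion depends on use: when a key is read, check whether it has expired, delete it if so, and keep it otherwise.

Of the three, timed deletion and periodic deletion are active deletion at different time granularities, while lazy deletion is passive.

Each has its pros and cons. Timed deletion uses memory efficiently but is unfriendly to the CPU. Lazy deletion is unfriendly to memory: key-value pairs that are never accessed again waste memory. Periodic deletion is a compromise between the two.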
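Lazy deletion is the easiest of the three to sketch. The following is a minimal toy store, assuming a fixed-size slot array; `toy_store`, `toy_set`, `toy_get`, and `toy_count` are hypothetical names for illustration and do not correspond to Redis functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAPACITY 8

typedef struct {
    const char *key;      /* NULL marks a free slot */
    const char *value;
    int64_t expire_at_ms; /* -1 means no expiry */
} toy_slot;

typedef struct { toy_slot slots[TOY_CAPACITY]; } toy_store;

void toy_init(toy_store *s) {
    for (int i = 0; i < TOY_CAPACITY; i++) s->slots[i].key = NULL;
}

/* Insert into the first free slot (overwrites are not handled here).
 * Returns 1 on success, 0 when the store is full. */
int toy_set(toy_store *s, const char *key, const char *value,
            int64_t expire_at_ms) {
    for (int i = 0; i < TOY_CAPACITY; i++) {
        if (s->slots[i].key == NULL) {
            s->slots[i] = (toy_slot){key, value, expire_at_ms};
            return 1;
        }
    }
    return 0;
}

/* Lazy deletion: expiry is checked only when the key is read.
 * An expired key is deleted on the spot and reported as missing. */
const char *toy_get(toy_store *s, const char *key, int64_t now_ms) {
    for (int i = 0; i < TOY_CAPACITY; i++) {
        toy_slot *slot = &s->slots[i];
        if (slot->key == NULL || strcmp(slot->key, key) != 0) continue;
        if (slot->expire_at_ms >= 0 && now_ms > slot->expire_at_ms) {
            slot->key = NULL;
            return NULL;
        }
        return slot->value;
    }
    return NULL;
}

/* Number of occupied slots, including expired keys not yet read. */
int toy_count(const toy_store *s) {
    int n = 0;
    for (int i = 0; i < TOY_CAPACITY; i++)
        if (s->slots[i].key != NULL) n++;
    return n;
}
```

The memory cost is visible in `toy_count`: an expired key keeps occupying its slot until something reads it, which is exactly the weakness that an active strategy has to cover.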

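Redis combines lazy deletion with periodic deletion. A timer is usually implemented with a min-heap, but Redis, given the limited kinds and number of its time events, stores them in an unsorted linked list. Implementing timed deletion on top of that would mean an O(N) traversal to find the next key due for deletion.

However, I think that if antirez had really wanted timed deletion, he would not have kept the unsorted linked list. So in my view the existing unsorted list is not the fundamental reason Redis avoids timed deletion. My tentative guess is that antirez simply considered timed deletion unnecessary.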
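The O(N) cost mentioned above is easy to see in a sketch. Assuming timers are kept in an unsorted singly linked list, finding the nearest one requires visiting every node; the `toy_timer` type and `toy_nearest_timer` function are hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timer node in an unsorted singly linked list. */
typedef struct toy_timer {
    int64_t when_ms;        /* absolute time at which the timer fires */
    struct toy_timer *next;
} toy_timer;

/* Find the timer that fires first. Because the list is unsorted,
 * every node must be visited: O(N) per lookup. */
const toy_timer *toy_nearest_timer(const toy_timer *head) {
    const toy_timer *nearest = NULL;
    for (const toy_timer *t = head; t != NULL; t = t->next)
        if (nearest == NULL || t->when_ms < nearest->when_ms) nearest = t;
    return nearest;
}
```

With only a handful of time events this scan is cheap, but with one timer per expiring key it would run over the whole set of keys with an expiry. A min-heap would bring the lookup down to O(1) at the cost of O(log N) inserts and removals.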

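1.3.4 Implementation details of periodic deletion

Periodic deletion sounds simple, but how are its frequency and duration controlled?

If it runs too rarely, it degenerates into lazy deletion. If it runs for too long, it starts to resemble timed deletion. It really is a hard problem! When to run it also needs thought. So let's see how Redis implements it. In src/expire.c the author found the activeExpireCycle function, which implements periodic deletion. antirez commented the code thoroughly, though in English. A first read gives only a rough idea, which shows that learning English and reading foreign-language material is an important way to learn.

Here is the code first. Including comments, the core part is about 210 lines: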

#define ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP 20 /* Keys for each DB loop. */
#define ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1000 /* Microseconds. */
#define ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC 25 /* Max % of CPU to use. */
#define ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE 10 /* % of stale keys after which we do extra efforts. */

void activeExpireCycle(int type) {
    /* Adjust the running parameters according to the configured expire
     * effort. The default effort is 1, and the maximum configurable effort
     * is 10. */
    unsigned long
    effort = server.active_expire_effort-1, /* Rescale from 0 to 9. */
    config_keys_per_loop = ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP +
                           ACTIVE_EXPIRE_CYCLE_KEYS_PER_LOOP/4*effort,
    config_cycle_fast_duration = ACTIVE_EXPIRE_CYCLE_FAST_DURATION +
                                 ACTIVE_EXPIRE_CYCLE_FAST_DURATION/4*effort,
    config_cycle_slow_time_perc = ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC +
                                  2*effort,
    config_cycle_acceptable_stale = ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE-
                                    effort;

    /* This function has some global state in order to continue the work
     * incrementally across calls. */
    static unsigned int current_db = 0; /* Last DB tested. */
    static int timelimit_exit = 0;      /* Time limit hit in previous call? */
    static long long last_fast_cycle = 0; /* When last fast cycle ran. */

    int j, iteration = 0;
    int dbs_per_call = CRON_DBS_PER_CALL;
    long long start = ustime(), timelimit, elapsed;

    /* When clients are paused the dataset should be static not just from the
     * POV of clients not being able to write, but also from the POV of
     * expires and evictions of keys not being performed. */
    if (clientsArePaused()) return;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST) {
        /* Don't start a fast cycle if the previous cycle did not exit
         * for time limit, unless the percentage of estimated stale keys is
         * too high. Also never repeat a fast cycle for the same period
         * as the fast cycle total duration itself. */
        if (!timelimit_exit &&
            server.stat_expired_stale_perc < config_cycle_acceptable_stale)
            return;

        if (start < last_fast_cycle + (long long)config_cycle_fast_duration*2)
            return;

        last_fast_cycle = start;
    }

    /* We usually should test CRON_DBS_PER_CALL per iteration, with
     * two exceptions:
     *
     * 1) Don't test more DBs than we have.
     * 2) If last time we hit the time limit, we want to scan all DBs
     * in this iteration, as there is work to do in some DB and we don't want
     * expired keys to use memory for too much time. */
    if (dbs_per_call > server.dbnum || timelimit_exit)
        dbs_per_call = server.dbnum;

    /* We can use at max 'config_cycle_slow_time_perc' percentage of CPU
     * time per iteration. Since this function gets called with a frequency of
     * server.hz times per second, the following is the max amount of
     * microseconds we can spend in this function. */
    timelimit = config_cycle_slow_time_perc*1000000/server.hz/100;
    timelimit_exit = 0;
    if (timelimit <= 0) timelimit = 1;

    if (type == ACTIVE_EXPIRE_CYCLE_FAST)
        timelimit = config_cycle_fast_duration; /* in microseconds. */

    /* Accumulate some global stats as we expire keys, to have some idea
     * about the number of keys that are already logically expired, but still
     * existing inside the database. */
    long total_sampled = 0;
    long total_expired = 0;

    for (j = 0; j < dbs_per_call && timelimit_exit == 0; j++) {
        /* Expired and checked in a single loop. */
        unsigned long expired, sampled;

        redisDb *db = server.db+(current_db % server.dbnum);

        /* Increment the DB now so we are sure if we run out of time
         * in the current DB we'll restart from the next. This allows to
         * distribute the time evenly across DBs. */
        current_db++;

        /* Continue to expire if at the end of the cycle more than 25%
         * of the keys were expired. */
        do {
            unsigned long num, slots;
            long long now, ttl_sum;
            int ttl_samples;
            iteration++;

            /* If there is nothing to expire try next DB ASAP. */
            if ((num = dictSize(db->expires)) == 0) {
                db->avg_ttl = 0;
                break;
            }
            slots = dictSlots(db->expires);
            now = mstime();

            /* When there are less than 1% filled slots, sampling the key
             * space is expensive, so stop here waiting for better times...
             * The dictionary will be resized asap. */
            if (num && slots > DICT_HT_INITIAL_SIZE &&
                (num*100/slots < 1)) break;

            /* The main collection cycle. Sample random keys among keys
             * with an expire set, checking for expired ones. */
            expired = 0;
            sampled = 0;
            ttl_sum = 0;
            ttl_samples = 0;

            if (num > config_keys_per_loop)
                num = config_keys_per_loop;

            /* Here we access the low level representation of the hash table
             * for speed concerns: this makes this code coupled with dict.c,
             * but it hardly changed in ten years.
             *
             * Note that certain places of the hash table may be empty,
             * so we want also a stop condition about the number of
             * buckets that we scanned. However scanning for free buckets
             * is very fast: we are in the cache line scanning a sequential
             * array of NULL pointers, so we can scan a lot more buckets
             * than keys in the same time. */
            long max_buckets = num*20;
            long checked_buckets = 0;

            while (sampled < num && checked_buckets < max_buckets) {
                for (int table = 0; table < 2; table++) {
                    if (table == 1 && !dictIsRehashing(db->expires)) break;

                    unsigned long idx = db->expires_cursor;
                    idx &= db->expires->ht[table].sizemask;
                    dictEntry *de = db->expires->ht[table].table[idx];
                    long long ttl;

                    /* Scan the current bucket of the current table. */
                    checked_buckets++;
                    while(de) {
                        /* Get the next entry now since this entry may get
                         * deleted. */
                        dictEntry *e = de;
                        de = de->next;

                        ttl = dictGetSignedIntegerVal(e)-now;
                        if (activeExpireCycleTryExpire(db,e,now)) expired++;
                        if (ttl > 0) {
                            /* We want the average TTL of keys yet
                             * not expired. */
                            ttl_sum += ttl;
                            ttl_samples++;
                        }
                        sampled++;
                    }
                }
                db->expires_cursor++;
            }
            total_expired += expired;
            total_sampled += sampled;

            /* Update the average TTL stats for this database. */
            if (ttl_samples) {
                long long avg_ttl = ttl_sum/ttl_samples;

                /* Do a simple running average with a few samples.
                 * We just use the current estimate with a weight of 2%
                 * and the previous estimate with a weight of 98%. */
                if (db->avg_ttl == 0) db->avg_ttl = avg_ttl;
                db->avg_ttl = (db->avg_ttl/50)*49 + (avg_ttl/50);
            }

            /* We can't block forever here even if there are many keys to
             * expire. So after a given amount of milliseconds return to the
             * caller waiting for the other active expire cycle. */
            if ((iteration & 0xf) == 0) { /* check once every 16 iterations. */
                elapsed = ustime()-start;
                if (elapsed > timelimit) {
                    timelimit_exit = 1;
                    server.stat_expired_time_cap_reached_count++;
                    break;
                }
            }
            /* We don't repeat the cycle for the current database if there are
             * an acceptable amount of stale keys (logically expired but yet
             * not reclaimed). */
        } while ((expired*100/sampled) > config_cycle_acceptable_stale);
    }

    elapsed = ustime()-start;
    server.stat_expire_cycle_time_used += elapsed;
    latencyAddSampleIfNeeded("expire-cycle",elapsed/1000);

    /* Update our estimate of keys existing but yet to be expired.
     * Running average with this sample accounting for 5%. */
    double current_perc;
    if (total_sampled) {
        current_perc = (double)total_expired/total_sampled;
    } else
        current_perc = 0;
    server.stat_expired_stale_perc = (current_perc*0.05)+
                                     (server.stat_expired_stale_perc*0.95);
}
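
The parameter scaling at the top of activeExpireCycle can be checked with a small standalone calculation. The arithmetic below is copied from the function above; the `expire_params` struct and the `compute_expire_params` wrapper are hypothetical names added only so the numbers can be computed outside Redis.

```c
#include <assert.h>

/* Hypothetical container for the values derived inside activeExpireCycle. */
typedef struct {
    unsigned long keys_per_loop;         /* keys sampled per DB loop */
    unsigned long fast_duration_us;      /* fast cycle budget, microseconds */
    unsigned long slow_time_perc;        /* max % of CPU for the slow cycle */
    unsigned long acceptable_stale_perc; /* stale % tolerated before repeating */
    long long slow_timelimit_us;         /* slow cycle budget, microseconds */
} expire_params;

/* Same arithmetic as the listing above, with the constants inlined:
 * 20 keys per loop, 1000 us fast duration, 25% CPU, 10% acceptable stale. */
expire_params compute_expire_params(int active_expire_effort, int hz) {
    assert(active_expire_effort >= 1 && active_expire_effort <= 10);
    assert(hz > 0);
    unsigned long effort = (unsigned long)active_expire_effort - 1; /* 0..9 */
    expire_params p;
    p.keys_per_loop = 20 + 20 / 4 * effort;
    p.fast_duration_us = 1000 + 1000 / 4 * effort;
    p.slow_time_perc = 25 + 2 * effort;
    p.acceptable_stale_perc = 10 - effort;
    p.slow_timelimit_us = (long long)p.slow_time_perc * 1000000 / hz / 100;
    if (p.slow_timelimit_us <= 0) p.slow_timelimit_us = 1;
    return p;
}
```

At the default effort of 1 and hz of 10 this gives 20 keys per loop, a 1000 microsecond fast cycle, and a 25000 microsecond (25 ms) budget for each slow cycle. At the maximum effort of 10 the values become 65 keys, 3250 microseconds, and 43000 microseconds, with the acceptable stale percentage dropping from 10 to 1.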

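To be honest, this code has many details. The author's knowledge of the Redis source is limited, so this is only a rough reading and may contain mistakes. Readers who can should read the source themselves. As a starting point, here is the author's rough version:

The algorithm is adaptive. When few keys have expired it spends little CPU time. When many have expired it works aggressively to avoid heavy memory consumption. In short, many expired keys means more passes and few means fewer.
Redis has multiple databases (db). The algorithm scans them one by one, and the next call continues from the db after the last one scanned, wrapping around in a closed loop.
Periodic deletion has a fast cycle and a slow cycle. The slow cycle is the main mode. Its frequency depends on server.hz, usually set to 10, so the slow cycle runs 10 times per second. Each run stops once it exceeds its time budget, which by default is 25% of CPU time, that is 25 ms per run at hz = 10.
The slow cycle runs relatively long and may hit its time limit. The fast cycle runs for at most 1 ms by default, so each run is shorter, but it runs more often. In either mode, a db is left once the share of expired keys in the sample falls to the acceptable threshold or below. In the code above that threshold is ACTIVE_EXPIRE_CYCLE_ACCEPTABLE_STALE, 10% at the default effort, although one code comment still says 25%.

The gist: periodic deletion is an adaptive, closed-loop, probabilistic sampling scan. It is bounded in both run time and CPU time and stops when a threshold is hit, so it proceeds quietly while trying not to affect responses to clients.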
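The adaptive "sample, delete, repeat while too many were stale" loop can be sketched without any Redis internals. This is a simplified model under stated assumptions: a flat array of expiry timestamps replaces the hash table, a `max_rounds` cap stands in for the time limit, and `toy_expire_cycle` and its constants are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_SIZE 20           /* slots visited per round (assumed) */
#define ACCEPTABLE_STALE_PERC 10 /* repeat while more than this % expired */

/* expire_at[i] == 0 marks an empty slot; any other value is an absolute
 * expiry timestamp in ms. *cursor persists across calls so that each call
 * resumes where the previous one stopped. Returns the number of deletions. */
size_t toy_expire_cycle(int64_t *expire_at, size_t n, int64_t now_ms,
                        size_t *cursor, int max_rounds) {
    size_t total_deleted = 0;
    if (n == 0) return 0;
    for (int round = 0; round < max_rounds; round++) {
        size_t sampled = 0, expired = 0;
        size_t visit = n < SAMPLE_SIZE ? n : SAMPLE_SIZE;
        for (size_t i = 0; i < visit; i++) {
            size_t idx = (*cursor)++ % n;
            if (expire_at[idx] == 0) continue; /* empty slot */
            sampled++;
            if (now_ms > expire_at[idx]) {
                expire_at[idx] = 0;            /* delete the expired key */
                expired++;
            }
        }
        total_deleted += expired;
        if (sampled == 0) break;               /* nothing left to look at */
        if (expired * 100 / sampled <= ACCEPTABLE_STALE_PERC) break;
    }
    return total_deleted;
}
```

When nothing in the sample has expired, the function returns after a single round. When everything has expired, it keeps going until the table is clean or the round cap is reached, which is the "more passes when there is more garbage" behaviour described above.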

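1.3.5 Deleting key-value pairs with DEL

Before Redis 4.0, a DEL on a very large key-value pair could block the server. Newer versions introduced BIO background threads and some new commands, UNLINK for example, that implement delayed, lazy deletion: the key is removed right away and a BIO thread reclaims the memory afterwards.

The author previously wrote an article about LazyFree in 4.0, A Brief Look at the Redis 4.0 New Feature LazyFree, which is worth a look.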
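The idea behind lazy freeing can be sketched as "unlink now, free later". The sketch below is single-threaded and only simulates the background step; in Redis the reclaiming is done by a separate thread. The queue, the threshold value, and all `toy_` names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_LAZYFREE_THRESHOLD 64 /* hypothetical "big object" cut-off */
#define TOY_QUEUE_CAP 16

/* Objects waiting to be freed by the (simulated) background worker. */
typedef struct {
    void *pending[TOY_QUEUE_CAP];
    size_t count;
} toy_free_queue;

/* Small objects are freed immediately. Large ones are only queued, so the
 * caller returns quickly. Returns 1 if the free was deferred, 0 otherwise. */
int toy_unlink(toy_free_queue *q, void *obj, size_t element_count) {
    if (element_count > TOY_LAZYFREE_THRESHOLD && q->count < TOY_QUEUE_CAP) {
        q->pending[q->count++] = obj;
        return 1;
    }
    free(obj);
    return 0;
}

/* Stand-in for the background thread: free everything that was queued.
 * Returns the number of objects released. */
size_t toy_background_drain(toy_free_queue *q) {
    size_t freed = q->count;
    while (q->count > 0) free(q->pending[--q->count]);
    return freed;
}
```

The design choice is that the expensive part of deleting a large value, releasing all of its elements, moves off the path that serves client commands.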

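1.4 Memory eviction

To keep Redis running safely and stably, a maxmemory threshold is configured. When memory usage reaches it, new key-value pairs cannot be written, and the eviction mechanism is needed. The Redis configuration offers several eviction policies:

noeviction: when memory cannot hold the new data, the write returns an error.
allkeys-lru: when memory cannot hold the new data, remove the least recently used key from the keyspace.
allkeys-random: when memory cannot hold the new data, remove a random key from the keyspace.
volatile-lru: when memory cannot hold the new data, remove the least recently used key among keys that have an expiry set.
volatile-random: when memory cannot hold the new data, remove a random key among keys that have an expiry set.
volatile-ttl: when memory cannot hold the new data, remove keys with the earliest expiry first among keys that have an expiry set.

The last three policies operate only on the expires dictionary, and when it is empty they behave like noeviction and fail the write. Deleting at random with no strategy is not a good idea either, so the second policy, allkeys-lru, is the usual choice.

In my view antirez has always thought like an engineer and is good at using probabilistic designs as approximations. LRU is no exception: Redis implements an approximate LRU algorithm, and after several versions of iteration its results are fairly close to those of theoretical LRU. It is a good topic in itself, but for reasons of length the details are left for a later article dedicated to LRU.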
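The sampling idea behind approximate LRU can be sketched briefly. This is a toy model, not the Redis implementation: it samples a few random entries and picks the least recently used among them as the victim. The `toy_entry` type, the `toy_pick_victim` function, and the logical access clock are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int in_use;           /* 0 marks an empty slot */
    uint32_t last_access; /* logical clock of the last access */
} toy_entry;

/* Sample `samples` random slots and return the index of the least
 * recently used entry among them, or -1 if no live entry was sampled. */
long toy_pick_victim(const toy_entry *entries, size_t n, int samples) {
    long victim = -1;
    if (n == 0) return -1;
    for (int i = 0; i < samples; i++) {
        size_t idx = (size_t)rand() % n;
        if (!entries[idx].in_use) continue;
        if (victim < 0 ||
            entries[idx].last_access < entries[victim].last_access)
            victim = (long)idx;
    }
    return victim;
}
```

Exact LRU needs a structure that orders every key by access time, while sampling needs only a timestamp per key. In Redis the sample size is controlled by the maxmemory-samples setting, and a larger sample gets closer to true LRU at the cost of more work per eviction.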

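1.5 The relationship between expired-key deletion and memory eviction

Expired-key deletion deals with keys that have expired. If keys have expired and memory is sufficient, Redis does not use eviction to free space. The expired-key deletion strategy removes them first.

Memory eviction deals with reclaiming data that is in memory. When memory is insufficient, keys that have not yet expired, or that have no expiry at all, are also deleted according to the configured policy, freeing space so that new data can be written.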

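Q2: Explain your understanding of the Redis persistence mechanism.

In my view Redis persistence is both a highlight of the database and a hot interview topic. The main areas examined are: how RDB works, how AOF works, the pros and cons of each, the engineering trade-off between RDB and AOF, and the hybrid persistence strategy in newer Redis versions. Grasp these points and the persistence question is handled.

The author previously wrote an article on persistence, Understanding Redis Persistence, which covers most of these points and is worth a look.

On the shoulders of giants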

www.hoohack.me/2019/06/24/…

redisbook.readthedocs.io/en/latest/i…
