memcached source code analysis: handling of expired and invalidated items

Date: 2019-11-08

Original article: http://www.faceye.net/search/142629.html

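Expiration handling:

An item expires in two situations: 1. the item's exptime timestamp has been reached; 2. the user issues the flush_all command, which marks all items as expired. Readers might point out that the touch command can also expire an item, but that is really just the first situation.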

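Timeout expiration:

For the first kind of expiration, memcached works lazily: it does not actively check whether an item has expired. Only when a worker thread accesses the item does it check whether the item's exptime timestamp has been reached. This is fairly simple, so the code is not shown here; it appears later.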

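The flush_all command:

The second kind of expiration is triggered by the user's flush_all command, which marks all items as expired. But which items count as "all"? Multiple clients keep inserting items into memcached, so we need to be clear about what "all items" means. Is the boundary the moment the worker thread receives the command, or the moment of deletion?

When a worker thread receives flush_all, it stores the time of receipt in the oldest_live member of the global settings variable (more precisely, the moment the worker thread parses the command and recognizes it as flush_all, minus one). The code is settings.oldest_live = current_time - 1;. It then calls item_flush_expired, which locks cache_lock and calls do_item_flush_expired to do the work.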

void do_item_flush_expired(void) {
    int i;
    item *iter, *next;
    if (settings.oldest_live == 0)
        return;
    for (i = 0; i < LARGEST_ID; i++) {
        for (iter = heads[i]; iter != NULL; iter = next) {
            if (iter->time != 0 && iter->time >= settings.oldest_live) {
                next = iter->next;
                if ((iter->it_flags & ITEM_SLABBED) == 0) {
                    do_item_unlink_nolock(iter, hash(ITEM_key(iter), iter->nkey));
                }
            } else {
                /* We've hit the first old item. Continue to the next queue. */
                break;
            }
        }
    }
}

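do_item_flush_expired iterates over all LRU queues and checks each item's time member. Checking time makes sense: if time is less than settings.oldest_live, the item already existed when the worker thread received flush_all (time records the item's last access time), so it should be deleted.

So it looks as if memcached uses the moment the worker thread received flush_all as the boundary. But wait, look more closely!! Inside do_item_flush_expired, an item is deleted not when its time is less than settings.oldest_live, but when it is greater than or equal to it. Given what the time member means, what would "greater" even represent? Can it be greater at all? Strange!

In fact memcached uses the moment of deletion as the boundary. Then why does settings.oldest_live store the timestamp at which the worker thread received flush_all? And why test whether iter->time is greater than or equal to settings.oldest_live?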

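The usual approach would be to delete every item in the hash table and the LRU queues directly inside do_item_flush_expired. That does achieve the goal, but while this worker thread is doing so, no other worker thread can make progress (the caller of do_item_flush_expired has already locked cache_lock). The LRU queues may hold a huge number of items, so the purge could take a long time, and stalling every other worker thread for that long is hard to accept.

The memcached authors were surely aware of this, which is why do_item_flush_expired looks odd: it is written to be fast. It deletes only a small number of special items; what makes them special is explained in the code comments below. The vast majority of items are handled lazily: only when a worker thread tries to access an item does it check whether the item has been flushed. In fact nothing needs to be set on the item at all to detect this; the settings.oldest_live variable is enough. This is the same laziness used for the first kind of expiration.

Now let's look at do_item_flush_expired again, and at those special items.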

void do_item_flush_expired(void) {
    int i;
    item *iter, *next;
    if (settings.oldest_live == 0)
        return;
    for (i = 0; i < LARGEST_ID; i++) {
        for (iter = heads[i]; iter != NULL; iter = next) {
            //an item with iter->time == 0 is an LRU crawler pseudo-item and is never unlinked here
            //Normally iter->time is less than settings.oldest_live, but
            //iter->time >= settings.oldest_live can happen in this case: worker1 receives
            //flush_all and sets settings.oldest_live to current_time-1.
            //Before worker1 gets to call item_flush_expired, worker2 preempts it
            //on the CPU and inserts an item into the LRU queue. That item's time
            //member then satisfies iter->time >= settings.oldest_live
            if (iter->time != 0 && iter->time >= settings.oldest_live) {
                next = iter->next;
                if ((iter->it_flags & ITEM_SLABBED) == 0) {
                    //although the _nolock variant is called, the caller item_flush_expired
                    //already holds cache_lock when it calls this function
                    do_item_unlink_nolock(iter, hash(ITEM_key(iter), iter->nkey));
                }
            } else {
                //items in an LRU queue are sorted by time in descending order, so once one item
                //has time less than settings.oldest_live, the rest need not be compared
                break;
            }
        }
    }
}
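To make the cost argument concrete, here is a minimal sketch of the same head scan over a plain array. It is not memcached code: count_eager_unlinks and the rel_time_t typedef are names assumed for illustration, and the array stands in for one LRU queue listed from head to tail (newest first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t rel_time_t;  /* assumed stand-in for memcached's relative time type */

/* Hypothetical helper: count how many entries the eager scan would unlink.
 * times[] lists last-access times from the LRU head to the tail, so the
 * values are in descending order. */
size_t count_eager_unlinks(const rel_time_t *times, size_t n,
                           rel_time_t oldest_live) {
    size_t unlinked = 0;
    assert(times != NULL || n == 0);
    if (oldest_live == 0)          /* flush_all was never issued */
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (times[i] != 0 && times[i] >= oldest_live)
            unlinked++;            /* touched at or after the flush boundary */
        else
            break;                 /* first old item: stop scanning this queue */
    }
    return unlinked;
}
```

With times {1000, 999, 998, 500, 10} and oldest_live 999, only the first two entries are counted; the scan stops at 998 and never looks at the older items, which are left to the lazy path.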

Lazy deletion:

Now let's look at the lazy deletion of items. Pay attention to the comments in the code.

item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
    item *it = assoc_find(key, nkey, hv);
    ...
    if (it != NULL) {
        //settings.oldest_live is initialized to 0
        //Check whether the user has issued flush_all to delete all items.
        //it->time <= settings.oldest_live means the item already existed when the
        //user issued flush_all, so the item must be deleted.
        //flush_all can take an argument that schedules the invalidation of all items
        //at some future time. settings.oldest_live is then later than the moment the
        //worker received flush_all, hence the settings.oldest_live <= current_time check
        if (settings.oldest_live != 0 && settings.oldest_live <= current_time &&
            it->time <= settings.oldest_live) {
            do_item_unlink(it, hv);
            do_item_remove(it);
            it = NULL;
        } else if (it->exptime != 0 && it->exptime <= current_time) {//the item has expired
            do_item_unlink(it, hv);//refcount is decremented by one
            do_item_remove(it);//refcount is decremented by one; the item is freed if it reaches 0
            it = NULL;
        } else {
            it->it_flags |= ITEM_FETCHED;
        }
    }
    return it;
}

As you can see, once an item has been found it must be checked for expiration, and deleted if it has expired.

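Besides do_item_get, do_item_alloc also deals with expired items. Rather than deleting an expired item, do_item_alloc takes it over for its own use: the function's job is to obtain memory for a new item, so if an existing item has expired it simply reuses that item's memory. Here is the code.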

item *do_item_alloc(char *key, const size_t nkey, const int flags,
                    const rel_time_t exptime, const int nbytes,
                    const uint32_t cur_hv) {
    uint8_t nsuffix;
    item *it = NULL;
    char suffix[40];
    //total space needed to store this item
    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }
    //pick the slab class that fits this size
    unsigned int id = slabs_clsid(ntotal);
    /* do a quick check if we have any expired items in the tail.. */
    int tries = 5;
    item *search;
    item *next_it;
    rel_time_t oldest_live = settings.oldest_live;
    search = tails[id];
    for (; tries > 0 && search != NULL; tries--, search=next_it) {
        next_it = search->prev;
        ...
        if (refcount_incr(&search->refcount) != 2) {//other threads still reference this item, so it cannot be taken over
            //refresh the item's access time and its position in the LRU queue
            do_item_update_nolock(search);
            tries++;
            refcount_decr(&search->refcount);
            //refcount is >= 2 at this point
            continue;
        }
        //search's refcount equals 2, which means no worker thread other than this
        //one references the item, so it is safe to release and reuse it.
        //The loop walks the LRU list from the tail, so search starts out at the
        //least recently used item. If even that item has not expired, the more recently
        //used items are left alone (even if they have expired) and memory is requested from the slabs
        if ((search->exptime != 0 && search->exptime < current_time)
            || (search->time <= oldest_live && oldest_live <= current_time)) {
            //search points to an expired item, which can be reused
            it = search;
            //recompute the amount of memory handed out by this slabclass_t;
            //this is required when an old item is taken over directly
            slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
            do_item_unlink_nolock(it, hv);//remove from the hash table and the LRU list
            /* Initialize the item block: */
            it->slabs_clsid = 0;
        }
        //drop one reference. No worker thread references the item any more, and
        //the hash table no longer references it either
        refcount_decr(&search->refcount);
        break;
    }
    ...
    return it;
}
//called when a new item directly takes over an old item
void slabs_adjust_mem_requested(unsigned int id, size_t old, size_t ntotal)
{
    pthread_mutex_lock(&slabs_lock);
    slabclass_t *p;
    if (id < POWER_SMALLEST || id > power_largest) {
        fprintf(stderr, "Internal error! Invalid slab class\n");
        abort();
    }
    p = &slabclass[id];
    //recompute the amount of memory handed out by this slabclass_t
    p->requested = p->requested - old + ntotal;
    pthread_mutex_unlock(&slabs_lock);
}

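flush_all can take a time argument. Like other time values, it ranges from 1 to REALTIME_MAXDELTA (30 days). With the command flush_all 100, all items expire 99 seconds later. settings.oldest_live is then current_time+100-1, and do_item_flush_expired has nothing useful to do (the thread is hardly going to be preempted for 99 seconds). This is also why do_item_get needs the settings.oldest_live <= current_time check: it prevents items from being deleted too early.

There is an obvious bug here. Suppose client A sends flush_all 10 to the server, and 5 seconds later client B sends flush_all 100. Client A's command is then overridden and has no effect at all.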
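The override can be reproduced with a single variable. The sketch below is a simplified model rather than memcached's command handler: oldest_live stands in for settings.oldest_live, and flush_all_at and is_flushed are hypothetical helpers that take the current time as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t rel_time_t;  /* assumed stand-in for memcached's relative time type */

rel_time_t oldest_live = 0;   /* models settings.oldest_live; there is only one slot */

/* Hypothetical model of "flush_all <delay>" arriving at time `now`. */
void flush_all_at(rel_time_t now, rel_time_t delay) {
    assert(now + delay > 0);          /* avoid unsigned wrap-around below */
    oldest_live = now + delay - 1;    /* each call overwrites the previous deadline */
}

/* The lazy flush check, with the current time passed in. */
bool is_flushed(rel_time_t item_time, rel_time_t now) {
    return oldest_live != 0 && oldest_live <= now && item_time <= oldest_live;
}
```

If client A calls flush_all_at(1000, 10) and client B calls flush_all_at(1005, 100), oldest_live ends up at 1104. An item last touched at time 900 is therefore still served at time 1010, even though A asked for everything to be gone by then.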

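LRU crawler:

As mentioned, memcached deletes expired items lazily. So even after a client uses flush_all to expire every item, those items still occupy the hash table and the LRU queues and are not returned to the slab allocator.

LRU crawler thread:

Is there a way to forcibly purge these expired items so that they stop occupying the hash table and the LRU queues and are returned to the slabs? Of course there is: memcached provides the LRU crawler for this.

To use the LRU crawler, the client must issue lru_crawler commands, and the memcached server acts according to the command's arguments.

memcached uses a dedicated thread to purge expired items; this article calls it the LRU crawler thread. By default memcached does not start this thread, but it can be started by passing -o lru_crawler when launching memcached, or through a client command. Even once started, the thread does no work until another command tells it which LRU queues to clean. Let's look at the arguments lru_crawler accepts.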

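LRU crawler commands:

lru_crawler <enable|disable> Starts or stops the LRU crawler thread. At most one LRU crawler thread exists at any time. This command sets settings.lru_crawler to true or false.
lru_crawler crawl <classid,classid,classid|all> A list such as 2,3,6 selects which LRU queues to clean; all selects every LRU queue.
lru_crawler sleep <microseconds> The crawler thread holds locks while purging items, which gets in the way of the worker threads, so it needs to sleep every now and then. The default sleep time is 100 microseconds. This command sets settings.lru_crawler_sleep.
lru_crawler tocrawl <32u> An LRU queue may hold many expired items, and checking and purging without end would get in the way of the worker threads. This argument caps how many items are checked in each LRU queue. The default is 0; note that the crawler loop in item_crawler_thread only counts down when this value is non-zero, so 0 does not stop a crawl early. This command sets settings.lru_crawler_tocrawl.

To have the LRU crawler actively delete expired items, do the following: first start the crawler thread with lru_crawler enable. Then use lru_crawler tocrawl num so that at most num-1 items are checked in each LRU queue. Finally use lru_crawler crawl <classid,classid,classid|all> to choose the LRU queues to process. Setting lru_crawler sleep is optional; if you want to set it, do so before the lru_crawler crawl command.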

Starting the LRU crawler thread:

Now let's see how the LRU crawler works. First, the global variables memcached defines for it.

static volatile int do_run_lru_crawler_thread = 0;
static int lru_crawler_initialized = 0;
static pthread_mutex_t lru_crawler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  lru_crawler_cond = PTHREAD_COND_INITIALIZER;
int init_lru_crawler(void) {//called from main
    if (lru_crawler_initialized == 0) {
        if (pthread_cond_init(&lru_crawler_cond, NULL) != 0) {
            fprintf(stderr, "Can't initialize lru crawler condition\n");
            return -1;
        }
        pthread_mutex_init(&lru_crawler_lock, NULL);
        lru_crawler_initialized = 1;
    }
    return 0;
}

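The code is simple, so no explanation is needed. Next come the lru_crawler enable and disable commands. enable starts an LRU crawler thread and disable stops it, though of course not by calling pthread_exit directly. In my view pthread_exit is a dangerous function that should not appear in code.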

static pthread_t item_crawler_tid;
//called by a worker thread after it receives the "lru_crawler enable" command
//also called at startup when memcached is launched with -o lru_crawler
int start_item_crawler_thread(void) {
    int ret;
    //stop_item_crawler_thread calls pthread_join and only sets
    //settings.lru_crawler to false after pthread_join returns,
    //so two crawler threads can never exist at the same time
    if (settings.lru_crawler)
        return -1;
    pthread_mutex_lock(&lru_crawler_lock);
    do_run_lru_crawler_thread = 1;
    settings.lru_crawler = true;
    //create the LRU crawler thread with item_crawler_thread as its entry point. Once
    //inside item_crawler_thread, the thread calls pthread_cond_wait and waits for a
    //worker thread to say which LRU queues to process
    if ((ret = pthread_create(&item_crawler_tid, NULL,
        item_crawler_thread, NULL)) != 0) {
        fprintf(stderr, "Can't create LRU crawler thread: %s\n",
            strerror(ret));
        pthread_mutex_unlock(&lru_crawler_lock);
        return -1;
    }
    pthread_mutex_unlock(&lru_crawler_lock);
    return 0;
}
//executed by a worker thread when it receives the "lru_crawler disable" command
int stop_item_crawler_thread(void) {
    int ret;
    pthread_mutex_lock(&lru_crawler_lock);
    do_run_lru_crawler_thread = 0;//stop the LRU crawler thread
    //the crawler thread may be sleeping on the condition variable and must be woken before it can stop
    pthread_cond_signal(&lru_crawler_cond);
    pthread_mutex_unlock(&lru_crawler_lock);
    if ((ret = pthread_join(item_crawler_tid, NULL)) != 0) {
        fprintf(stderr, "Failed to stop LRU crawler thread: %s\n", strerror(ret));
        return -1;
    }
    settings.lru_crawler = false;
    return 0;
}
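The start/stop handshake above (set a flag under the lock, signal the condition variable, then join) can be tried in isolation. What follows is a minimal sketch under stated assumptions, not memcached's implementation: every name in it is hypothetical, the "work" is just a counter, and unlike the original it rechecks a predicate around pthread_cond_wait so that queued jobs are drained before the thread exits.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

pthread_mutex_t bg_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  bg_cond = PTHREAD_COND_INITIALIZER;
pthread_t bg_tid;
bool keep_running = false;  /* plays the role of do_run_lru_crawler_thread */
int  pending_jobs = 0;      /* work handed over by other threads */
int  jobs_done = 0;

static void *background_thread(void *arg) {
    (void)arg;
    pthread_mutex_lock(&bg_lock);
    for (;;) {
        /* Sleep until there is work or a stop request. */
        while (keep_running && pending_jobs == 0)
            pthread_cond_wait(&bg_cond, &bg_lock);
        /* Drain whatever was queued before looking at the stop flag. */
        while (pending_jobs > 0) {
            pending_jobs--;
            jobs_done++;
        }
        assert(pending_jobs == 0);
        if (!keep_running)
            break;
    }
    pthread_mutex_unlock(&bg_lock);
    return NULL;
}

int start_background_thread(void) {
    pthread_mutex_lock(&bg_lock);
    keep_running = true;
    int ret = pthread_create(&bg_tid, NULL, background_thread, NULL);
    if (ret != 0)
        keep_running = false;
    pthread_mutex_unlock(&bg_lock);
    return ret == 0 ? 0 : -1;
}

void submit_job(void) {
    pthread_mutex_lock(&bg_lock);
    pending_jobs++;
    pthread_cond_signal(&bg_cond);   /* wake the thread if it is waiting */
    pthread_mutex_unlock(&bg_lock);
}

int stop_background_thread(void) {
    pthread_mutex_lock(&bg_lock);
    keep_running = false;            /* ask the thread to leave its loop */
    pthread_cond_signal(&bg_cond);   /* it may be asleep on the condition */
    pthread_mutex_unlock(&bg_lock);
    return pthread_join(bg_tid, NULL) == 0 ? 0 : -1;
}
```

Calling start_background_thread, then submit_job twice, then stop_background_thread leaves jobs_done at 2: the thread never gets killed from outside, it notices the flag itself and returns, and pthread_join waits for that to happen.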

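As shown, a worker thread starts an LRU crawler thread after receiving "lru_crawler enable". The crawler thread does not do any work yet, because no task has been assigned. The command "lru_crawler tocrawl num" does not start a task either; for that command the worker thread simply sets settings.lru_crawler_tocrawl to num.

Purging expired items:

The command "lru_crawler crawl <classid,classid,classid|all>" is what assigns a task. It names the LRU queues to clean; with all, every LRU queue is cleaned.

Before reading memcached's cleanup code, consider a question: how would you clean one LRU queue?

The most direct approach is to take the lock (cache_lock), walk the entire LRU queue, and check every item in it. This clearly has a problem: if memcached holds a large number of items, walking a whole LRU queue takes too long and gets in the way of the worker threads. We could divide and conquer, handling only a few items at a time over many rounds until the whole queue has been processed. But an LRU queue is a linked list and does not support random access: reaching an item in the middle means walking from the head or the tail, which is still O(n).

Pseudo-item:

To get random access, memcached uses a neat trick. It inserts a pseudo-item at the tail of the LRU queue and then drives it toward the head, one position at a time.

The pseudo-item is a global variable, so the crawler thread can reach it directly without walking from the head or tail of the queue. Through the pseudo-item's next and prev pointers it can reach the real items. The crawler thread can therefore access an item in the middle of the LRU queue without any traversal.

Now look at lru_crawler_crawl, the function in which memcached inserts the pseudo-item at the tail of the LRU queue. A worker thread calls it on receiving lru_crawler crawl <classid,classid,classid|all>. Since the user may ask the crawler to clean several LRU queues, an array of pseudo-items is needed. Its size equals the number of LRU queues, with one pseudo-item per queue.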

//this struct closely mirrors the item struct; it is the pseudo-item used by the LRU crawler
typedef struct {
    struct _stritem *next;
    struct _stritem *prev;
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access */
    rel_time_t      exptime;    /* expire time */
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    uint32_t        remaining;  /* Max keys to crawl per slab per invocation */
} crawler;
static crawler crawlers[LARGEST_ID];
static int crawler_count = 0;//number of LRU queues to process in the current task
//when a client issues lru_crawler crawl <classid,classid,classid|all>, a worker
//thread calls this function with the command's second argument as its parameter
enum crawler_result_type lru_crawler_crawl(char *slabs) {
    char *b = NULL;
    uint32_t sid = 0;
    uint8_t tocrawl[POWER_LARGEST];
    //the crawler thread holds lru_crawler_lock while cleaning and releases it only
    //when all cleanup tasks are done, so a client cannot submit another cleanup
    //task before the previous one has finished
    if (pthread_mutex_trylock(&lru_crawler_lock) != 0) {
        return CRAWLER_RUNNING;
    }
    pthread_mutex_lock(&cache_lock);
    //parse the command; for each LRU queue to be cleaned, set the matching
    //element of the tocrawl array to 1 as a flag
    if (strcmp(slabs, "all") == 0) {//process all LRU queues
        for (sid = 0; sid < LARGEST_ID; sid++) {
            tocrawl[sid] = 1;
        }
    } else {
        for (char *p = strtok_r(slabs, ",", &b);
             p != NULL;
             p = strtok_r(NULL, ",", &b)) {
            //parse the sids one by one
            if (!safe_strtoul(p, &sid) || sid < POWER_SMALLEST
                || sid > POWER_LARGEST) {//sid out of range
                pthread_mutex_unlock(&cache_lock);
                pthread_mutex_unlock(&lru_crawler_lock);
                return CRAWLER_BADCLASS;
            }
            tocrawl[sid] = 1;
        }
    }
    //crawlers is an array of pseudo-items. For each LRU queue the user wants
    //cleaned, a pseudo-item is inserted into that queue
    for (sid = 0; sid < LARGEST_ID; sid++) {
        if (tocrawl[sid] != 0 && tails[sid] != NULL) {
            //a pseudo-item can be told apart from a real item by its nkey and time members
            crawlers[sid].nbytes = 0;
            crawlers[sid].nkey = 0;
            crawlers[sid].it_flags = 1; /* For a crawler, this means enabled. */
            crawlers[sid].next = 0;
            crawlers[sid].prev = 0;
            crawlers[sid].time = 0;
            crawlers[sid].remaining = settings.lru_crawler_tocrawl;
            crawlers[sid].slabs_clsid = sid;
            //insert the pseudo-item at the tail of the corresponding LRU queue
            crawler_link_q((item *)&crawlers[sid]);
            crawler_count++;//one more LRU queue to process
        }
    }
    pthread_mutex_unlock(&cache_lock);
    //there is work to do; wake the crawler thread so that it runs the cleanup
    pthread_cond_signal(&lru_crawler_cond);
    STATS_LOCK();
    stats.lru_crawler_running = true;
    STATS_UNLOCK();
    pthread_mutex_unlock(&lru_crawler_lock);
    return CRAWLER_OK;
}

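Now let's see how the pseudo-item advances through the LRU queue. First, a diagram of the pseudo-item advancing.

As the diagram shows, the pseudo-item advances by swapping places with its predecessor. If the pseudo-item is the head of the LRU queue, it is removed from the queue. The function crawler_crawl_q performs the swap and returns the node that was the pseudo-item's predecessor before the swap (after the swap it is of course the pseudo-item's successor). If the pseudo-item is at the head of the LRU queue, it returns NULL (there is no predecessor any more). crawler_crawl_q is a tangle of pointer manipulation, so its code is not reproduced here.

In the diagram the pseudo-item walks the whole LRU queue without deleting any item. It is drawn that way partly to keep the picture tidy, and partly because walking an LRU queue does not necessarily delete anything (an item that has not expired is not deleted).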
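For readers who want to see the swap spelled out, here is a minimal sketch on a bare doubly linked list. It is not memcached's crawler_crawl_q: node, queue, push_tail and crawler_step are hypothetical names, and the nodes carry no item data. As in memcached's item struct, prev points toward the head and next toward the tail.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *prev;  /* toward the head */
    struct node *next;  /* toward the tail */
    int id;             /* 0 marks the crawler in this sketch */
} node;

typedef struct { node *head, *tail; } queue;

/* Append n at the tail of q. */
void push_tail(queue *q, node *n) {
    n->next = NULL;
    n->prev = q->tail;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
}

/* Move crawler c one step toward the head by swapping it with its
 * predecessor. Returns the node that was passed, or NULL when c was
 * already at the head, in which case c is unlinked from the queue. */
node *crawler_step(queue *q, node *c) {
    assert(q != NULL && c != NULL);
    node *p = c->prev;
    if (p == NULL) {                 /* c is the head: take it out */
        q->head = c->next;
        if (c->next) c->next->prev = NULL; else q->tail = NULL;
        c->next = c->prev = NULL;
        return NULL;
    }
    node *pp = p->prev;              /* node before the predecessor, may be NULL */
    node *n  = c->next;              /* node after the crawler, may be NULL */
    /* before: pp <-> p <-> c <-> n     after: pp <-> c <-> p <-> n */
    c->prev = pp;
    c->next = p;
    p->prev = c;
    p->next = n;
    if (pp) pp->next = c; else q->head = c;
    if (n)  n->prev = p;  else q->tail = p;
    return p;
}
```

Starting from a queue a, b, crawler (head to tail), the first call returns b and leaves a, crawler, b; the second returns a and leaves crawler, a, b; the third returns NULL and leaves a, b with the crawler unlinked. Each call touches a constant number of pointers, which is why the crawler can release the lock between steps.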

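Cleaning items:

As mentioned, lru_crawler tocrawl num makes each LRU queue have at most num-1 items checked. Note carefully: that is the number checked, not the number deleted, and it is num-1. For each item, item_crawler_evaluate is called to check whether it has expired, and if so the item is deleted. If num-1 items have been checked and the pseudo-item still has not reached the head of the LRU queue, the pseudo-item is simply removed from the queue. Now for the item_crawler_thread function.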

static void *item_crawler_thread(void *arg) {
    int i;
    pthread_mutex_lock(&lru_crawler_lock);
    while (do_run_lru_crawler_thread) {
        //lru_crawler_crawl and stop_item_crawler_thread signal this condition variable
        pthread_cond_wait(&lru_crawler_cond, &lru_crawler_lock);
        while (crawler_count) {//crawler_count is the number of LRU queues to process
            item *search = NULL;
            void *hold_lock = NULL;
            for (i = 0; i < LARGEST_ID; i++) {
                if (crawlers[i].it_flags != 1) {
                    continue;
                }
                pthread_mutex_lock(&cache_lock);
                //returns the predecessor of crawlers[i] and swaps crawlers[i] with it
                search = crawler_crawl_q((item *)&crawlers[i]);
                if (search == NULL || //crawlers[i] is the head node and has no predecessor
                    //remaining starts out as settings.lru_crawler_tocrawl: how many
                    //items of each LRU queue to check per crawl
                    (crawlers[i].remaining && --crawlers[i].remaining < 1)) {
                    //checked enough items; stop checking this LRU queue
                    crawlers[i].it_flags = 0;
                    crawler_count--;//one LRU queue finished, one fewer task
                    crawler_unlink_q((item *)&crawlers[i]);//remove the pseudo-item from the LRU queue
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                uint32_t hv = hash(ITEM_key(search), search->nkey);
                //try to take the hash-table segment lock that guards this item
                if ((hold_lock = item_trylock(hv)) == NULL) {
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                //another worker thread is referencing this item
                if (refcount_incr(&search->refcount) != 2) {
                    refcount_decr(&search->refcount);//the crawler thread gives up its reference to the item
                    if (hold_lock)
                        item_trylock_unlock(hold_lock);
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                //delete the item if it has expired
                item_crawler_evaluate(search, hv, i);
                if (hold_lock)
                    item_trylock_unlock(hold_lock);
                pthread_mutex_unlock(&cache_lock);
                //the crawler must not crawl the LRU queue nonstop, since that would get in the way of the
                //worker threads, so the crawler thread is suspended for a while. By default it sleeps 100 microseconds
                if (settings.lru_crawler_sleep)
                    usleep(settings.lru_crawler_sleep);//in microseconds
            }
        }
        STATS_LOCK();
        stats.lru_crawler_running = false;
        STATS_UNLOCK();
    }
    pthread_mutex_unlock(&lru_crawler_lock);
    return NULL;
}
//delete the item if it has expired
static void item_crawler_evaluate(item *search, uint32_t hv, int i) {
    rel_time_t oldest_live = settings.oldest_live;
    //the item's exptime has been reached, so it has expired
    if ((search->exptime != 0 && search->exptime < current_time)
        //the item was invalidated by a client's flush_all command
        || (search->time <= oldest_live && oldest_live <= current_time)) {
        itemstats[i].crawler_reclaimed++;
        if ((search->it_flags & ITEM_FETCHED) == 0) {
            itemstats[i].expired_unfetched++;
        }
        //remove the item from the LRU queue
        do_item_unlink_nolock(search, hv);
        do_item_remove(search);
        assert(search->slabs_clsid == 0);
    } else {
        refcount_decr(&search->refcount);
    }
}
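The num-1 figure follows from where the remaining counter is decremented and tested in the loop above. The sketch below replays only that counter logic on plain integers. items_evaluated is a hypothetical helper, and it assumes every item the pseudo-item passes is actually evaluated (no trylock failure and no item skipped because another thread holds a reference).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many items get evaluated in one queue of
 * queue_len items when settings.lru_crawler_tocrawl is `tocrawl`. */
uint32_t items_evaluated(uint32_t queue_len, uint32_t tocrawl) {
    uint32_t remaining = tocrawl;   /* copied into the pseudo-item at crawl start */
    uint32_t ahead = queue_len;     /* items between the pseudo-item and the head */
    uint32_t evaluated = 0;
    for (;;) {
        int reached_head = (ahead == 0);
        if (!reached_head)
            ahead--;                /* the pseudo-item steps past one item */
        /* Same test as in the crawler loop: stop at the head, or when a
         * non-zero counter has been counted down to zero. */
        if (reached_head || (remaining && --remaining < 1))
            break;
        evaluated++;
    }
    assert(evaluated <= queue_len);
    return evaluated;
}
```

Under these assumptions items_evaluated(10, 5) is 4 and items_evaluated(10, 1) is 0, which matches the num-1 rule. items_evaluated(10, 0) is 10, because a counter of 0 short-circuits the test and the crawl only ends at the head of the queue.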

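True LRU eviction:

Although this article has used the term LRU many times, and memcached's function names carry the lru prefix (the lru_crawler command in particular), none of this has anything to do with LRU eviction!!

Feel cheated? Go ahead and curse: ￥&@#￥&*@%##……%#%……#￥%￥@#%……

Think back to the LRU algorithm in operating systems. The items deleted in this article had all expired, so they deserved to go; an expired item still holding its slot is hogging a seat it has no use for. In an operating system, LRU kicks an entry out only because resources have run short, and the entry being kicked out has no say in the matter. Those are different things, which is why what this article described earlier is not LRU.

So where does memcached's LRU show up? In do_item_alloc!! Earlier posts kept mentioning this almighty function without ever giving the full version. The full version will not be given here either, because parts of it still cannot be explained to readers at this point. By now readers can probably appreciate what was written in 《》: memcached's modules are highly interdependent.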

item *do_item_alloc(char *key, const size_t nkey, const int flags,
                    const rel_time_t exptime, const int nbytes,
                    const uint32_t cur_hv) {
    uint8_t nsuffix;
    item *it = NULL;
    char suffix[40];
    //total space needed to store this item
    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }
    //pick the slab class that fits this size
    unsigned int id = slabs_clsid(ntotal);
    int tries = 5;
    item *search;
    item *next_it;
    rel_time_t oldest_live = settings.oldest_live;
    search = tails[id];
    for (; tries > 0 && search != NULL; tries--, search=next_it) {
        next_it = search->prev;
        uint32_t hv = hash(ITEM_key(search), search->nkey);
        /* Now see if the item is refcount locked */
        if (refcount_incr(&search->refcount) != 2) {//refcount >= 3
            refcount_decr(&search->refcount);
            continue;
        }
        //search's refcount equals 2, which means no worker thread other than this
        //one references the item, so it is safe to release and reuse it.
        //The loop walks the LRU list from the tail, so search starts out at the
        //least recently used item. If even that item has not expired, the more recently
        //used items are left alone (even if they have expired) and memory is requested from the slabs
        if ((search->exptime != 0 && search->exptime < current_time)
            || (search->time <= oldest_live && oldest_live <= current_time)) {
            ..
        } else if ((it = slabs_alloc(ntotal, id)) == NULL) {//memory allocation failed
            //no expired item was found and the allocation failed as well, so the only
            //option left is to evict an item by LRU (even though it has not expired)
            if (settings.evict_to_free == 0) {//LRU eviction has been disabled
                //all that is left is to reply to the client with an error
                itemstats[id].outofmemory++;
            } else {
                //an item is evicted even if its exptime is set to never expire (0)
                it = search;
                //recompute the amount of memory handed out by this slabclass_t;
                //this is required when an old item is taken over directly
                slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
                do_item_unlink_nolock(it, hv);//remove from the hash table and the LRU list
                /* Initialize the item block: */
                it->slabs_clsid = 0;
            }
        }
        //drop one reference. No worker thread references the item any more, and
        //the hash table no longer references it either
        refcount_decr(&search->refcount);
        break;
    }
    ...
    return it;
}
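Stripped of the bookkeeping, the branch structure in the loop above is a fixed order of preference. The sketch below captures only that order for a tail item that no other thread references. alloc_outcome, choose_allocation and outcome_name are hypothetical names, and the three booleans stand in for the expiry test, the result of slabs_alloc and settings.evict_to_free.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    REUSE_EXPIRED,   /* take over an expired tail item */
    FRESH_ALLOC,     /* the slab allocator had free memory */
    EVICT_LRU,       /* kick out a live tail item: true LRU eviction */
    OUT_OF_MEMORY    /* eviction disabled, report an error to the client */
} alloc_outcome;

/* Hypothetical decision function mirroring the if / else-if chain. */
alloc_outcome choose_allocation(bool tail_is_dead, bool slab_has_free_chunk,
                                bool evict_to_free) {
    if (tail_is_dead)
        return REUSE_EXPIRED;
    if (slab_has_free_chunk)
        return FRESH_ALLOC;
    if (evict_to_free)
        return EVICT_LRU;
    return OUT_OF_MEMORY;
}

/* Readable label for an outcome, handy when logging. */
const char *outcome_name(alloc_outcome o) {
    static const char *names[] = {
        "reuse expired", "fresh alloc", "evict lru", "out of memory"
    };
    assert(o >= REUSE_EXPIRED && o <= OUT_OF_MEMORY);
    return names[o];
}
```

Only the third outcome is LRU eviction in the operating-system sense: the tail item is still valid, memory has run out, and it is removed anyway. The first outcome is the expiry handling this article spent most of its time on.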
