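1. Lazy deletion. memcached normally does not proactively purge entries that have expired or been invalidated; it checks whether an item is still valid only when a get request asks for that item.
2. The flush command. The flush command marks all items as invalid.
3. Check on creation. When creating an ITEM, memcached starts at the tail of the LRU list and looks for an expired ITEM to reuse; if there is none, it allocates a new one.
4. LRU crawler. memcached disables the LRU crawler by default. The LRU crawler is a separate thread that cleans up expired ITEMs.
5. LRU eviction. When the cache has no memory left to allocate for a new element, memcached evicts an ITEM from the tail of the LRU list, whether or not that ITEM is still within its validity period. ITEMs are linked at the head of the LRU list in order, so evicting from the tail removes the oldest ITEM first.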
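Memcached's LRU list operations live mainly in the file items.c. The arrays heads and tails store the head and tail addresses of the doubly linked list of each LRU.
Each slab class has its own doubly linked list. The list is maintained through two pointers in the item struct, which record the addresses of the items on either side of it in the list.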

    // Layout of an item (excerpt)
    typedef struct _stritem {
        // address of the next item in the LRU doubly linked list
        struct _stritem *next;
        // address of the previous item in the LRU doubly linked list
        struct _stritem *prev;

        //....more code
    } item;

item_link_q adds an item to the LRU list:

    // Add an item to the LRU list.
    // The LRU list is a doubly linked list.
    static void item_link_q(item *it) { /* item is the new head */
        item **head, **tail;
        assert(it->slabs_clsid < LARGEST_ID);
        assert((it->it_flags & ITEM_SLABBED) == 0);

        head = &heads[it->slabs_clsid];
        tail = &tails[it->slabs_clsid];
        assert(it != *head);
        assert((*head && *tail) || (*head == 0 && *tail == 0));
        it->prev = 0;
        it->next = *head;
        if (it->next) it->next->prev = it;
        *head = it;
        if (*tail == 0) *tail = it;
        sizes[it->slabs_clsid]++;
        return;
    }

item_unlink_q removes an item from the LRU list:

    // Remove an item from the LRU list.
    static void item_unlink_q(item *it) {
        item **head, **tail;
        assert(it->slabs_clsid < LARGEST_ID);
        head = &heads[it->slabs_clsid];
        tail = &tails[it->slabs_clsid];

        if (*head == it) {
            assert(it->prev == 0);
            *head = it->next;
        }
        if (*tail == it) {
            assert(it->next == 0);
            *tail = it->prev;
        }
        assert(it->next != it);
        assert(it->prev != it);

        if (it->next) it->next->prev = it->prev;
        if (it->prev) it->prev->next = it->next;
        sizes[it->slabs_clsid]--;
        return;
    }

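To make the ordering concrete, here is a minimal standalone sketch of the same head-insert / tail-evict discipline. The names node, lru_list, lru_link, lru_unlink and lru_evict_tail are hypothetical and exist only for this illustration; the sketch keeps a single list rather than one per slab class and leaves out locking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified item: only the list pointers and an id. */
typedef struct node {
    struct node *next;   /* toward the tail */
    struct node *prev;   /* toward the head */
    int id;
} node;

typedef struct {
    node *head;          /* most recently linked */
    node *tail;          /* oldest, first candidate for eviction */
    size_t size;
} lru_list;

/* Link n at the head, mirroring what item_link_q does. */
static void lru_link(lru_list *l, node *n) {
    n->prev = NULL;
    n->next = l->head;
    if (n->next) n->next->prev = n;
    l->head = n;
    if (l->tail == NULL) l->tail = n;
    l->size++;
}

/* Unlink n from any position, mirroring what item_unlink_q does. */
static void lru_unlink(lru_list *l, node *n) {
    assert(l->size > 0);
    if (l->head == n) l->head = n->next;
    if (l->tail == n) l->tail = n->prev;
    if (n->next) n->next->prev = n->prev;
    if (n->prev) n->prev->next = n->next;
    n->next = n->prev = NULL;
    l->size--;
}

/* Remove and return the tail (oldest) node, or NULL if the list is empty. */
static node *lru_evict_tail(lru_list *l) {
    node *victim = l->tail;
    if (victim != NULL) lru_unlink(l, victim);
    return victim;
}
```

Linking three nodes a, b, c in that order leaves c at the head and a at the tail, so lru_evict_tail hands back a first: the node that has been in the list longest.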
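Memcached's strategy for clearing cached data is lazy. What does that mean? Suppose a user stores a value with a lifetime of 5 minutes. Once the 5 minutes have passed, the entry is expired, but Memcached does not go and check on its own whether the Item has expired. Only when a client requests the data again does it check whether the entry is still valid, and if not, it removes the data.
Here is the code in do_item_get that decides whether a cached entry is no longer valid: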

    /** wrapper around assoc_find which does the lazy expiration logic */
    item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
        //...code
        if (it != NULL) {
            // settings.oldest_live records the time at which the flush command took effect.
            // it->time records when the item was last linked (set/add/replace) or
            // last bumped in the LRU.
            // If it->time is no later than the flush time, the item was flushed
            // and is no longer valid.
            if (settings.oldest_live != 0 && settings.oldest_live <= current_time &&
                it->time <= settings.oldest_live) {
                // unlink from the LRU list and the hash table
                do_item_unlink(it, hv);
                // release the item
                do_item_remove(it);
                it = NULL; // return NULL
                if (was_found) {
                    fprintf(stderr, " -nuked by flush");
                }
            // Otherwise check the expiry time;
            // an item whose lifetime has run out must be removed.
            } else if (it->exptime != 0 && it->exptime <= current_time) {
                // unlink from the LRU list and the hash table
                do_item_unlink(it, hv);
                // release the item
                do_item_remove(it);
                it = NULL;
                if (was_found) {
                    fprintf(stderr, " -nuked by expire");
                }
            } else {
                it->it_flags |= ITEM_FETCHED;
                DEBUG_REFCNT(it, '+');
            }
        }
        //...code
    }

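The effect of lazy expiration can be seen in a much smaller model. The sketch below is not memcached code: the fixed-size table and the names entry, cache_set, cache_get and cache_count are assumptions made for illustration, and the caller passes the current time explicitly. The point it shows is that an expired entry keeps occupying its slot until somebody reads it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int rel_time_t;

/* Hypothetical cache entry; exptime 0 means "never expires". */
typedef struct {
    const char *key;
    const char *value;
    rel_time_t exptime;
    bool in_use;
} entry;

#define TABLE_SIZE 8
static entry table[TABLE_SIZE];

static entry *find(const char *key) {
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i].in_use && strcmp(table[i].key, key) == 0)
            return &table[i];
    return NULL;
}

/* Store or overwrite a key. Returns false when the table is full
 * (this sketch has no eviction). */
static bool cache_set(const char *key, const char *value, rel_time_t exptime) {
    entry *e = find(key);
    for (size_t i = 0; e == NULL && i < TABLE_SIZE; i++)
        if (!table[i].in_use) e = &table[i];
    if (e == NULL) return false;
    e->key = key;
    e->value = value;
    e->exptime = exptime;
    e->in_use = true;
    return true;
}

/* Lazy expiration: the expiry check runs only when the key is read. */
static const char *cache_get(const char *key, rel_time_t now) {
    entry *e = find(key);
    if (e == NULL) return NULL;
    if (e->exptime != 0 && e->exptime <= now) {
        e->in_use = false;   /* removed on access, not by a timer */
        return NULL;
    }
    return e->value;
}

/* Number of occupied slots, including expired entries nobody has read yet. */
static size_t cache_count(void) {
    size_t n = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i].in_use) n++;
    return n;
}
```

After cache_set("a", "1", 300), the slot stays occupied no matter how much time passes; only a cache_get("a", now) with now >= 300 frees it.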
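When a user sends a flush command, Memcached marks every cached entry stored before the command as invalid.
Memcached does not proactively remove all of these items. It relies on two mechanisms instead:
1. The do_item_flush_expired method.
On receiving a flush command, Memcached sets the global parameter settings.oldest_live = current_time - 1 and then calls item_flush_expired. There is a short gap between setting the global parameter and item_flush_expired acquiring the cache lock, and other items may be written during that window.
Memcached then calls do_item_flush_expired, which walks every LRU list. do_item_flush_expired does not delete every Item stored before the flush command, because that would be very slow; it deletes only the items that were touched between the moment the global variable was set and the moment the cache lock was taken. This keeps flush fast.
2. Lazy deletion.
During a get, Memcached checks whether it->time is no later than settings.oldest_live; if so, the item counts as expired. The bulk of the flushed items are removed lazily in this way.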

    /*
     * Flushes expired items after a flush_all call
     */
    void item_flush_expired() {
        mutex_lock(&cache_lock);
        do_item_flush_expired();
        mutex_unlock(&cache_lock);
    }

    /* expires items that are more recent than the oldest_live setting. */
    void do_item_flush_expired(void) {
        int i;
        item *iter, *next;
        if (settings.oldest_live == 0)
            return;
        for (i = 0; i < LARGEST_ID; i++) {
            /* The LRU is sorted in decreasing time order, and an item's timestamp
             * is never newer than its last access time, so we only need to walk
             * back until we hit an item older than the oldest_live time.
             * The oldest_live checking will auto-expire the remaining items.
             */
            for (iter = heads[i]; iter != NULL; iter = next) {
                /* iter->time of 0 are magic objects. */
                // iter->time is the time of the most recent access.
                // Why iter->time >= settings.oldest_live?
                // By the time do_item_flush_expired runs, the cache lock is held,
                // so no other worker can touch the items.
                // Deleting every item in this loop would make the walk very slow
                // and leave clients waiting.
                //
                // Memcached takes a cheaper route: between setting
                // settings.oldest_live and taking the lock, other clients may still
                // have written items, so only that (very small) set is cleaned up
                // here, which keeps flush fast.
                // The many remaining items with iter->time < settings.oldest_live
                // are handled by lazy deletion when a get request checks them.
                if (iter->time != 0 && iter->time >= settings.oldest_live) {
                    next = iter->next;
                    if ((iter->it_flags & ITEM_SLABBED) == 0) {
                        do_item_unlink_nolock(iter, hash(ITEM_key(iter), iter->nkey));
                    }
                } else {
                    /* We've hit the first old item. Continue to the next queue. */
                    break;
                }
            }
        }
    }

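The flush test itself is only a timestamp comparison, which a small sketch can isolate. The names flush_item, flush_all and is_flushed below are hypothetical; the condition in is_flushed is the same one do_item_get applies, and oldest_live stands in for settings.oldest_live.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int rel_time_t;

/* Hypothetical item carrying only the timestamp that matters here. */
typedef struct {
    rel_time_t time;   /* when the item was last linked or bumped */
} flush_item;

/* Stand-in for settings.oldest_live; 0 means no flush has been issued. */
static rel_time_t oldest_live = 0;

/* Record a flush. Assumes now > 1 so the stored value is a non-zero time. */
static void flush_all(rel_time_t now) {
    oldest_live = now - 1;
}

/* Same condition as the flush branch in do_item_get. */
static bool is_flushed(const flush_item *it, rel_time_t now) {
    return oldest_live != 0 && oldest_live <= now && it->time <= oldest_live;
}
```

With a flush at time 50, oldest_live becomes 49: an item stamped 49 or earlier is reported as flushed on its next read, while an item stamped 50 passes this check. Items stamped in that last second are the ones do_item_flush_expired unlinks eagerly.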
The check on creation and the LRU eviction both take place in do_item_alloc:

    // Create a new Item
    item *do_item_alloc(char *key, const size_t nkey, const int flags,
                        const rel_time_t exptime, const int nbytes,
                        const uint32_t cur_hv) {
        uint8_t nsuffix;
        item *it = NULL; // the item being allocated
        char suffix[40];
        // item_make_header computes the total length of the data to store
        size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
        if (settings.use_cas) {
            ntotal += sizeof(uint64_t);
        }

        // Use ntotal to find the slab class the item belongs to.
        // Memcached divides storage into many slab classes by data length.
        // When a user stores data, its length determines which slab class it goes into.
        // Each slab class consists of several slabs of 1 MB each, and item data
        // is allocated on those slabs.
        // Each slab is split into chunks sized for its slab class.
        //
        // Example:
        // Suppose the slab class with id=1 stores entries of at most 224 bytes.
        // If the user's entry totals 200 bytes, the item is stored in slab class id=1.
        // On first use, or when the class has run out of slabs, the class allocates
        // a 1 MB slab for item storage.
        // Because class id=1 stores entries of up to 224 bytes, the slab is split
        // into chunks of 224 bytes each, and the item is stored in one such chunk.
        unsigned int id = slabs_clsid(ntotal);
        if (id == 0)
            return 0;

        mutex_lock(&cache_lock);
        /* do a quick check if we have any expired items in the tail.. */
        int tries = 5;
        /* Avoid hangs if a slab has nothing but refcounted stuff in it. */
        int tries_lrutail_reflocked = 1000;
        int tried_alloc = 0;
        item *search;
        item *next_it;
        void *hold_lock = NULL;
        rel_time_t oldest_live = settings.oldest_live;

        // Start from the tail of this slab class's LRU list.
        // Items are chained together through item->next and item->prev.
        search = tails[id];

        /* We walk up *only* for locked items. Never searching for expired.
         * Waste of CPU for almost all deployments */
        // tries = 5: the loop makes at most 5 attempts
        // search = tails[id]: the search starts at the tail of the LRU list
        for (; tries > 0 && search != NULL; tries--, search=next_it) {
            /* we might relink search mid-loop, so search->prev isn't reliable */
            next_it = search->prev;
            if (search->nbytes == 0 && search->nkey == 0 && search->it_flags == 1) {
                /* We are a crawler, ignore it. */
                tries++;
                continue;
            }
            uint32_t hv = hash(ITEM_key(search), search->nkey);
            /* Attempt to hash item lock the "search" item. If locked, no
             * other callers can incr the refcount
             */
            /* Don't accidentally grab ourselves, or bail if we can't quicklock */
            if (hv == cur_hv || (hold_lock = item_trylock(hv)) == NULL)
                continue;
            /* Now see if the item is refcount locked */

            // Normally search->refcount is 1. If it is not 2 after the increment,
            // another worker thread holds a reference to the item.
            // Incrementing refcount pins the item; any result other than 2 means
            // we did not get exclusive use of it.
            if (refcount_incr(&search->refcount) != 2) {
                /* Avoid pathological case with ref'ed items in tail */
                do_item_update_nolock(search);
                tries_lrutail_reflocked--;
                tries++; // give back the attempt
                refcount_decr(&search->refcount); // undo the increment
                itemstats[id].lrutail_reflocked++;
                /* Old rare bug could cause a refcount leak. We haven't seen
                 * it in years, but we leave this code in to prevent failures
                 * just in case */
                if (settings.tail_repair_time &&
                        search->time + settings.tail_repair_time < current_time) {
                    itemstats[id].tailrepairs++;
                    search->refcount = 1;
                    do_item_unlink_nolock(search, hv);
                }
                if (hold_lock)
                    item_trylock_unlock(hold_lock);

                if (tries_lrutail_reflocked < 1)
                    break;

                continue;
            }

            /* Expired or flushed */
            // Check whether the tail item is no longer valid; if so, hand its
            // memory over to the new entry.
            if ((search->exptime != 0 && search->exptime < current_time)
                || (search->time <= oldest_live && oldest_live <= current_time)) {
                itemstats[id].reclaimed++;
                if ((search->it_flags & ITEM_FETCHED) == 0) {
                    itemstats[id].expired_unfetched++;
                }
                it = search;
                slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
                do_item_unlink_nolock(it, hv);
                /* Initialize the item block: */
                it->slabs_clsid = 0;

            // slabs_alloc tries to allocate a fresh chunk of memory
            } else if ((it = slabs_alloc(ntotal, id)) == NULL) {
                tried_alloc = 1;
                // If LRU eviction is disabled, the allocation fails with an error
                if (settings.evict_to_free == 0) {
                    itemstats[id].outofmemory++;
                } else {
                    // LRU eviction is enabled.
                    // The allocation failed, so evict an item from the LRU tail.
                    // Items with an expiry of 0 (never expire) are evicted as well.
                    itemstats[id].evicted++;
                    itemstats[id].evicted_time = current_time - search->time;
                    if (search->exptime != 0)
                        itemstats[id].evicted_nonzero++;
                    if ((search->it_flags & ITEM_FETCHED) == 0) {
                        itemstats[id].evicted_unfetched++;
                    }
                    // Evict the ITEM at the LRU tail and reuse it for the new ITEM
                    it = search;
                    // Recompute how much memory this slabclass_t has handed out;
                    // taking over the evicted item's chunk changes the total.
                    slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
                    // Remove it from the hash table and the LRU list.
                    // it->refcount is 2 here, so the item itself is not freed;
                    // only its hash table and LRU links are removed.
                    do_item_unlink_nolock(it, hv);
                    /* Initialize the item block: */
                    it->slabs_clsid = 0;

                    /* If we've just evicted an item, and the automover is set to
                     * angry bird mode, attempt to rip memory into this slab class.
                     * TODO: Move valid object detection into a function, and on a
                     * "successful" memory pull, look behind and see if the next alloc
                     * would be an eviction. Then kick off the slab mover before the
                     * eviction happens.
                     */
                    if (settings.slab_automove == 2)
                        slabs_reassign(-1, id);
                }
            }

            // release the reference taken above
            refcount_decr(&search->refcount);
            /* If hash values were equal, we don't grab a second lock */
            if (hold_lock)
                item_trylock_unlock(hold_lock);
            break;
        }

        /* If all 5 attempts found only locked items at the LRU tail,
         * fall back to a fresh allocation. */
        if (!tried_alloc && (tries == 0 || search == NULL))
            it = slabs_alloc(ntotal, id);

        if (it == NULL) {
            itemstats[id].outofmemory++;
            mutex_unlock(&cache_lock);
            return NULL;
        }

        assert(it->slabs_clsid == 0);
        assert(it != heads[id]);

        /* Item initialization can happen outside of the lock; the item's already
         * been removed from the slab LRU.
         */
        it->refcount = 1; // reference count reset to 1 /* the caller will have a reference */
        mutex_unlock(&cache_lock);
        it->next = it->prev = it->h_next = 0;
        it->slabs_clsid = id;

        DEBUG_REFCNT(it, '*');
        it->it_flags = settings.use_cas ? ITEM_CAS : 0;
        it->nkey = nkey;
        it->nbytes = nbytes;
        // copy the key into the item's memory block
        memcpy(ITEM_key(it), key, nkey);
        it->exptime = exptime;
        // copy the suffix as well
        memcpy(ITEM_suffix(it), suffix, (size_t)nsuffix);
        it->nsuffix = nsuffix;
        return it;
    }

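Stripped of locking, reference counts and statistics, the function above makes one decision per tail item, in a fixed order. The following sketch expresses that order as a pure function. Everything in it (alloc_action, tail_item, choose_alloc, the free_chunk flag standing in for a successful slabs_alloc) is a hypothetical simplification for illustration, not memcached's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int rel_time_t;

typedef enum {
    ALLOC_RECLAIM_TAIL,  /* reuse an expired or flushed tail item */
    ALLOC_FRESH,         /* take a free chunk from the slab class */
    ALLOC_EVICT_TAIL,    /* evict the still-valid tail item */
    ALLOC_FAIL           /* no memory and eviction is disabled */
} alloc_action;

/* Hypothetical view of the item at the LRU tail. */
typedef struct {
    rel_time_t exptime;  /* 0 means never expires */
    rel_time_t time;     /* last access time */
} tail_item;

/* Decide where the memory for a new item comes from.
 * tail may be NULL when the LRU list is empty.
 * free_chunk models whether a fresh slab allocation would succeed. */
static alloc_action choose_alloc(const tail_item *tail, rel_time_t now,
                                 rel_time_t oldest_live, bool free_chunk,
                                 bool evict_to_free) {
    if (tail != NULL) {
        bool expired = tail->exptime != 0 && tail->exptime < now;
        bool flushed = tail->time <= oldest_live && oldest_live <= now;
        if (expired || flushed)
            return ALLOC_RECLAIM_TAIL;
    }
    if (free_chunk)
        return ALLOC_FRESH;
    if (tail != NULL && evict_to_free)
        return ALLOC_EVICT_TAIL;
    return ALLOC_FAIL;
}
```

The order matters: an invalid tail item is reused even when free memory is available, and a valid tail item is evicted only after a fresh allocation has failed.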
Memcached starts a separate thread to process expired cache data.

    // LRU crawler
    static void *item_crawler_thread(void *arg) {
        int i;

        pthread_mutex_lock(&lru_crawler_lock);
        if (settings.verbose > 2)
            fprintf(stderr, "Starting LRU crawler background thread\n");
        while (do_run_lru_crawler_thread) {
            pthread_cond_wait(&lru_crawler_cond, &lru_crawler_lock);

            while (crawler_count) {
                item *search = NULL;
                void *hold_lock = NULL;

                for (i = 0; i < LARGEST_ID; i++) {
                    if (crawlers[i].it_flags != 1) {
                        continue;
                    }
                    pthread_mutex_lock(&cache_lock);
                    search = crawler_crawl_q((item *)&crawlers[i]);
                    if (search == NULL ||
                        (crawlers[i].remaining && --crawlers[i].remaining < 1)) {
                        if (settings.verbose > 2)
                            fprintf(stderr, "Nothing left to crawl for %d\n", i);
                        crawlers[i].it_flags = 0;
                        crawler_count--;
                        crawler_unlink_q((item *)&crawlers[i]);
                        pthread_mutex_unlock(&cache_lock);
                        continue;
                    }
                    uint32_t hv = hash(ITEM_key(search), search->nkey);
                    /* Attempt to hash item lock the "search" item. If locked, no
                     * other callers can incr the refcount
                     */
                    if ((hold_lock = item_trylock(hv)) == NULL) {
                        pthread_mutex_unlock(&cache_lock);
                        continue;
                    }
                    /* Now see if the item is refcount locked */
                    if (refcount_incr(&search->refcount) != 2) {
                        refcount_decr(&search->refcount);
                        if (hold_lock)
                            item_trylock_unlock(hold_lock);
                        pthread_mutex_unlock(&cache_lock);
                        continue;
                    }

                    /* Frees the item or decrements the refcount. */
                    /* Interface for this could improve: do the free/decr here
                     * instead? */
                    item_crawler_evaluate(search, hv, i);

                    if (hold_lock)
                        item_trylock_unlock(hold_lock);
                    pthread_mutex_unlock(&cache_lock);

                    if (settings.lru_crawler_sleep)
                        usleep(settings.lru_crawler_sleep);
                }
            }
            if (settings.verbose > 2)
                fprintf(stderr, "LRU crawler thread sleeping\n");
            STATS_LOCK();
            stats.lru_crawler_running = false;
            STATS_UNLOCK();
        }
        pthread_mutex_unlock(&lru_crawler_lock);
        if (settings.verbose > 2)
            fprintf(stderr, "LRU crawler thread stopping\n");

        return NULL;
    }


    int start_item_crawler_thread(void) {
        int ret;

        if (settings.lru_crawler)
            return -1;
        pthread_mutex_lock(&lru_crawler_lock);
        do_run_lru_crawler_thread = 1;
        settings.lru_crawler = true;
        if ((ret = pthread_create(&item_crawler_tid, NULL,
            item_crawler_thread, NULL)) != 0) {
            fprintf(stderr, "Can't create LRU crawler thread: %s\n",
                strerror(ret));
            pthread_mutex_unlock(&lru_crawler_lock);
            return -1;
        }
        pthread_mutex_unlock(&lru_crawler_lock);

        return 0;
    }

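The thread above is mostly locking and bookkeeping; the crawl itself is a walk from the tail toward the head that frees whatever has expired. Below is a single-threaded sketch of one such pass. It is an illustration under stated assumptions: citem, clist, clist_push_head, clist_unlink and crawl are hypothetical names, the budget parameter plays the role of crawlers[i].remaining, and the expiry test is borrowed from do_item_get because item_crawler_evaluate is not shown here. The real crawler also threads a marker item through the list so it can release the lock between steps, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int rel_time_t;

/* Hypothetical item with just the fields a crawl pass needs. */
typedef struct citem {
    struct citem *next;   /* toward the tail */
    struct citem *prev;   /* toward the head */
    rel_time_t exptime;   /* 0 means never expires */
} citem;

typedef struct {
    citem *head;
    citem *tail;
} clist;

static void clist_push_head(clist *l, citem *it) {
    it->prev = NULL;
    it->next = l->head;
    if (it->next) it->next->prev = it;
    l->head = it;
    if (l->tail == NULL) l->tail = it;
}

static void clist_unlink(clist *l, citem *it) {
    if (l->head == it) l->head = it->next;
    if (l->tail == it) l->tail = it->prev;
    if (it->next) it->next->prev = it->prev;
    if (it->prev) it->prev->next = it->next;
    it->next = it->prev = NULL;
}

/* One pass from tail to head. Visits at most `budget` items (0 = no limit)
 * and returns how many expired items were unlinked. */
static size_t crawl(clist *l, rel_time_t now, size_t budget) {
    size_t reclaimed = 0, visited = 0;
    citem *it = l->tail;
    while (it != NULL && (budget == 0 || visited < budget)) {
        citem *prev = it->prev;   /* saved before a possible unlink */
        if (it->exptime != 0 && it->exptime <= now) {
            clist_unlink(l, it);
            reclaimed++;
        }
        visited++;
        it = prev;
    }
    return reclaimed;
}
```

Unlike lazy deletion, a pass like this frees expired items that no client ever asks for again, at the cost of the background work.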