Memcached Study Notes (5): LRU Deletion Strategies

Date: 2019-12-09

Tags: memcached, study notes, LRU, deletion strategy. Category: Memcached


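Memcached expired-key deletion strategies

1. Lazy deletion. Memcached normally does not proactively purge expired or invalidated entries. It only checks whether an item is still valid when a get request asks for that item.

2. The flush command. The flush command (flush_all) marks every item stored before the command as invalid.

3. Check at creation time. When creating an item, Memcached first inspects the tail of the LRU list for an expired item. If it finds one, it reuses that item's memory; if not, it allocates a new chunk.

4. LRU crawler. In the version examined here, the LRU crawler is off by default. It is a separate thread that cleans up expired items.

5. LRU eviction. When no memory is left to allocate for a new item, Memcached evicts an item from the tail of the LRU list, whether or not that item is still within its lifetime. Items are inserted at the head of the list, so the tail holds the least recently used item, and that is the one evicted first.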

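LRU data structure and basic operations

Memcached's LRU list operations live mainly in items.c. The heads and tails arrays store the head and tail pointers of the doubly linked LRU list of each slab class.

Each slab class has its own doubly linked list. The list is threaded through the items themselves: two pointers in the item struct record the addresses of the neighboring items on either side.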

//Item structure (only the LRU-related fields are shown)
typedef struct _stritem {
    //address of the next item in the LRU doubly linked list
    struct _stritem *next;
    //address of the previous item in the LRU doubly linked list
    struct _stritem *prev;

    //....more code
} item;

item_link_q adds an item to the LRU list:

//Add an item to the LRU list
//The LRU list is a doubly linked list
static void item_link_q(item *it) { /* item is the new head */
    item **head, **tail;
    assert(it->slabs_clsid < LARGEST_ID);
    assert((it->it_flags & ITEM_SLABBED) == 0);

    head = &heads[it->slabs_clsid];
    tail = &tails[it->slabs_clsid];
    assert(it != *head);
    assert((*head && *tail) || (*head == 0 && *tail == 0));
    //the new item becomes the head; the old head becomes its successor
    it->prev = 0;
    it->next = *head;
    if (it->next) it->next->prev = it;
    *head = it;
    //an empty list also gets the new item as its tail
    if (*tail == 0) *tail = it;
    sizes[it->slabs_clsid]++;
    return;
}

item_unlink_q removes an item from the LRU list:

//Remove an item from the LRU list
static void item_unlink_q(item *it) {
    item **head, **tail;
    assert(it->slabs_clsid < LARGEST_ID);
    head = &heads[it->slabs_clsid];
    tail = &tails[it->slabs_clsid];

    //if the item is the head or the tail, move that end of the list first
    if (*head == it) {
        assert(it->prev == 0);
        *head = it->next;
    }
    if (*tail == it) {
        assert(it->next == 0);
        *tail = it->prev;
    }
    assert(it->next != it);
    assert(it->prev != it);

    //then splice the neighbors together
    if (it->next) it->next->prev = it->prev;
    if (it->prev) it->prev->next = it->next;
    sizes[it->slabs_clsid]--;
    return;
}
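
To make the head and tail behavior concrete, here is a minimal standalone sketch of the same two operations. It is not Memcached code: the names toy_item, toy_lru, toy_link, toy_unlink and toy_touch are hypothetical, and it uses a single list instead of the per-slab-class heads/tails arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stripped-down item: only the LRU links and an id. */
typedef struct toy_item {
    struct toy_item *next;
    struct toy_item *prev;
    int id;
} toy_item;

/* A single list; Memcached keeps one head/tail pair per slab class. */
typedef struct {
    toy_item *head;
    toy_item *tail;
    unsigned int size;
} toy_lru;

/* Insert at the head, following the same steps as item_link_q. */
static void toy_link(toy_lru *lru, toy_item *it) {
    it->prev = NULL;
    it->next = lru->head;
    if (it->next) it->next->prev = it;
    lru->head = it;
    if (lru->tail == NULL) lru->tail = it;
    lru->size++;
}

/* Remove from any position, following the same steps as item_unlink_q. */
static void toy_unlink(toy_lru *lru, toy_item *it) {
    if (lru->head == it) lru->head = it->next;
    if (lru->tail == it) lru->tail = it->prev;
    if (it->next) it->next->prev = it->prev;
    if (it->prev) it->prev->next = it->next;
    it->next = it->prev = NULL;
    lru->size--;
}

/* "Touch" an item: unlink it and relink it at the head, so the
 * least recently touched item drifts toward the tail. */
static void toy_touch(toy_lru *lru, toy_item *it) {
    toy_unlink(lru, it);
    toy_link(lru, it);
}
```

Linking a, b, c in that order leaves c at the head and a at the tail; touching a moves it to the head, and b becomes the next eviction candidate at the tail.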

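Strategy 1: Lazy deletion

Memcached's expiry handling is lazy. What does that mean? Suppose a user stores an entry with a lifetime of 5 minutes. Once the 5 minutes are up the entry is expired, but Memcached does not go and check that item on its own. Only when a client requests the data again does it check whether the entry has expired, and if it has, the entry is removed.

Here is the code in do_item_get that decides whether a cached item is still valid: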

/** wrapper around assoc_find which does the lazy expiration logic */
item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
//...code
    if (it != NULL) {
        //settings.oldest_live records the time at which the flush command takes effect
        //it->time is the item's last access time: set when the item is linked,
        //and refreshed on reads by do_item_update (at most once per ITEM_UPDATE_INTERVAL)
        //if it->time is at or before the flush time, the item has been invalidated
        if (settings.oldest_live != 0 && settings.oldest_live <= current_time &&
            it->time <= settings.oldest_live) {
            //unlink from the LRU list and the hash table
            do_item_unlink(it, hv);
            //release the item
            do_item_remove(it);
            it = NULL; //report a miss
            if (was_found) {
                fprintf(stderr, " -nuked by flush");
            }
        //check the expiry time
        //an expired item has to be removed
        } else if (it->exptime != 0 && it->exptime <= current_time) {
            //unlink from the LRU list and the hash table
            do_item_unlink(it, hv);
            //release the item
            do_item_remove(it);
            it = NULL;
            if (was_found) {
                fprintf(stderr, " -nuked by expire");
            }
        } else {
            it->it_flags |= ITEM_FETCHED;
            DEBUG_REFCNT(it, '+');
        }
    }
//...code
}
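
The two conditions in do_item_get can be pulled out into standalone predicates, which makes the edge cases easier to see. This is a sketch only: toy_time_t, toy_flushed, toy_expired and toy_item_is_dead are hypothetical names, and the timestamps are plain integers rather than Memcached's relative clock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for Memcached's relative timestamp type. */
typedef unsigned int toy_time_t;

/* True when a flush is in effect (oldest_live is set and not in the
 * future) and the item was last touched at or before the flush point. */
static bool toy_flushed(toy_time_t item_time, toy_time_t oldest_live,
                        toy_time_t now) {
    return oldest_live != 0 && oldest_live <= now &&
           item_time <= oldest_live;
}

/* True when the item has an expiry time and it has been reached.
 * An exptime of 0 means "never expires". */
static bool toy_expired(toy_time_t exptime, toy_time_t now) {
    return exptime != 0 && exptime <= now;
}

/* Combined check, in the same order do_item_get applies them. */
static bool toy_item_is_dead(toy_time_t item_time, toy_time_t exptime,
                             toy_time_t oldest_live, toy_time_t now) {
    return toy_flushed(item_time, oldest_live, now) ||
           toy_expired(exptime, now);
}
```

Note that oldest_live <= now matters: a flush scheduled for a future time leaves items alive until that time arrives.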

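Strategy 2: The flush command

When a user sends a flush command, Memcached marks all entries stored before the command as invalid.

Memcached does not remove all of these items on the spot. It relies on two mechanisms:

1. The do_item_flush_expired function.

  On receiving the flush command, Memcached sets the global settings.oldest_live = current_time - 1 and then calls item_flush_expired. There is a small time gap between setting that global and item_flush_expired acquiring the cache lock, and other threads may still operate on items during that gap.

  item_flush_expired then calls do_item_flush_expired, which walks every LRU list. It does not delete every item that predates the flush, because that would be very slow. It deletes only the items touched between setting the global and taking the cache lock (those with time >= settings.oldest_live), which keeps the flush fast.

2. Lazy deletion.

  On a get, Memcached checks whether it->time is less than or equal to settings.oldest_live. If it is, the item is treated as expired. The bulk of the flushed items are removed lazily in this way.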

/*
 * Flushes expired items after a flush_all call
 */
void item_flush_expired() {
    mutex_lock(&cache_lock);
    do_item_flush_expired();
    mutex_unlock(&cache_lock);
}
/* expires items that are more recent than the oldest_live setting. */
void do_item_flush_expired(void) {
    int i;
    item *iter, *next;
    if (settings.oldest_live == 0)
        return;
    for (i = 0; i < LARGEST_ID; i++) {
        /* The LRU is sorted in decreasing time order, and an item's timestamp
         * is never newer than its last access time, so we only need to walk
         * back until we hit an item older than the oldest_live time.
         * The oldest_live checking will auto-expire the remaining items.
         */
        for (iter = heads[i]; iter != NULL; iter = next) {
            /* iter->time of 0 are magic objects. */
            //iter->time is the last access time
            //Why test iter->time >= settings.oldest_live?
            //The cache lock is already held while do_item_flush_expired runs, so no other worker can touch items.
            //Deleting every item during this walk would make it very slow and keep clients waiting.
            //
            //Instead, only the items that other clients touched between setting settings.oldest_live
            //and taking the lock are cleaned up here (a very small set), which keeps the flush fast.
            //The many remaining items with iter->time < settings.oldest_live are handled lazily, when a get checks them.
            if (iter->time != 0 && iter->time >= settings.oldest_live) {
                next = iter->next;
                if ((iter->it_flags & ITEM_SLABBED) == 0) {
                    do_item_unlink_nolock(iter, hash(ITEM_key(iter), iter->nkey));
                }
            } else {
                /* We've hit the first old item. Continue to the next queue. */
                break;
            }
        }
    }
}
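
The early exit is what keeps this walk cheap. The sketch below models one LRU list as an array of timestamps ordered newest first (the order in which the list is walked from its head) and counts how many entries the flush walk would unlink before it stops. The function name toy_flush_scan and the array representation are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* times[] is ordered newest first, like walking an LRU list from its head.
 * Returns how many leading entries the flush walk would unlink. */
static size_t toy_flush_scan(const unsigned int *times, size_t n,
                             unsigned int oldest_live) {
    size_t purged = 0;
    if (oldest_live == 0)
        return 0;                 /* no flush has been requested */
    for (size_t i = 0; i < n; i++) {
        if (times[i] != 0 && times[i] >= oldest_live) {
            purged++;             /* touched at or after the flush point */
        } else {
            break;                /* first older item: stop walking this list */
        }
    }
    return purged;
}
```

With timestamps {105, 104, 103, 99, 98} and oldest_live = 103, only the first three entries are visited and unlinked. The older entries stay in place and are rejected later by the lazy check on get.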

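Strategy 3: Check when allocating an item

do_item_alloc inspects the tail of the LRU list before it allocates memory for a new item: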

//Create a new item
item *do_item_alloc(char *key, const size_t nkey, const int flags,
                    const rel_time_t exptime, const int nbytes,
                    const uint32_t cur_hv) {
    uint8_t nsuffix;
    item *it = NULL; //the item being allocated
    char suffix[40];
    //item_make_header computes the total length of the stored data
    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix);
    if (settings.use_cas) {
        ntotal += sizeof(uint64_t);
    }

    //Use ntotal to find the slab class the item belongs to.
    //Memcached divides storage into a number of slab classes by data length,
    //so the length of the data to store determines which slab class it goes into.
    //Each slab class consists of a number of slabs of 1M each, and items are allocated on those slabs.
    //Each slab is split into a number of chunks of the size its slab class stores.
    //
    //An example:
    //Suppose the slab class with id=1 stores entries of up to 224 bytes.
    //When a user stores an entry whose total length is 200 bytes, the item goes into the id=1 slab class.
    //On first use, or when the slab class runs out of slabs, the slab class allocates another 1M slab for items.
    //Because the id=1 slab class stores data of up to 224 bytes, the slab is split into chunks of 224 bytes each,
    //and the item data is stored in one of those chunks.
    unsigned int id = slabs_clsid(ntotal);
    if (id == 0)
        return 0;

    mutex_lock(&cache_lock);
    /* do a quick check if we have any expired items in the tail.. */
    int tries = 5;
    /* Avoid hangs if a slab has nothing but refcounted stuff in it. */
    int tries_lrutail_reflocked = 1000;
    int tried_alloc = 0;
    item *search;
    item *next_it;
    void *hold_lock = NULL;
    rel_time_t oldest_live = settings.oldest_live;

    //Items form a list through item->next and item->prev.
    //Start from the tail of this slab class's LRU list.
    search = tails[id];

    /* We walk up *only* for locked items. Never searching for expired.
     * Waste of CPU for almost all deployments */
    //tries = 5: the loop makes at most 5 attempts
    //search = tails[id]: the search starts at the tail of the LRU list
    for (; tries > 0 && search != NULL; tries--, search=next_it) {
        /* we might relink search mid-loop, so search->prev isn't reliable */
        next_it = search->prev;
        if (search->nbytes == 0 && search->nkey == 0 && search->it_flags == 1) {
            /* We are a crawler, ignore it. */
            tries++;
            continue;
        }
        uint32_t hv = hash(ITEM_key(search), search->nkey);
        /* Attempt to hash item lock the "search" item. If locked, no
         * other callers can incr the refcount
         */
        /* Don't accidentally grab ourselves, or bail if we can't quicklock */
        if (hv == cur_hv || (hold_lock = item_trylock(hv)) == NULL)
            continue;
        /* Now see if the item is refcount locked */

        //Normally search->refcount is 1. If it is not 2 after the increment,
        //another worker thread holds a reference to the item.
        //Incrementing refcount pins the item; any result other than 2 means pinning failed.
        if (refcount_incr(&search->refcount) != 2) {
            /* Avoid pathological case with ref'ed items in tail */
            do_item_update_nolock(search);
            tries_lrutail_reflocked--;
            tries++; //this attempt does not count against tries
            refcount_decr(&search->refcount); //undo the increment
            itemstats[id].lrutail_reflocked++;
            /* Old rare bug could cause a refcount leak. We haven't seen
             * it in years, but we leave this code in to prevent failures
             * just in case */
            if (settings.tail_repair_time &&
                    search->time + settings.tail_repair_time < current_time) {
                itemstats[id].tailrepairs++;
                search->refcount = 1;
                do_item_unlink_nolock(search, hv);
            }
            if (hold_lock)
                item_trylock_unlock(hold_lock);

            if (tries_lrutail_reflocked < 1)
                break;

            continue;
        }

        /* Expired or flushed */
        //Check whether the tail item is dead; if it is, hand its memory to the new entry
        if ((search->exptime != 0 && search->exptime < current_time)
            || (search->time <= oldest_live && oldest_live <= current_time)) {
            itemstats[id].reclaimed++;
            if ((search->it_flags & ITEM_FETCHED) == 0) {
                itemstats[id].expired_unfetched++;
            }
            it = search;
            slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
            do_item_unlink_nolock(it, hv);
            /* Initialize the item block: */
            it->slabs_clsid = 0;

        //slabs_alloc tries to allocate a new chunk
        } else if ((it = slabs_alloc(ntotal, id)) == NULL) {
            tried_alloc = 1;
            //if LRU eviction is disabled, this is an out-of-memory error
            if (settings.evict_to_free == 0) {
                itemstats[id].outofmemory++;
            } else {
                //LRU eviction is enabled:
                //the allocation failed, so evict one item from the LRU tail.
                //An item whose exptime is 0 (never expires) is evicted as well.
                itemstats[id].evicted++;
                itemstats[id].evicted_time = current_time - search->time;
                if (search->exptime != 0)
                    itemstats[id].evicted_nonzero++;
                if ((search->it_flags & ITEM_FETCHED) == 0) {
                    itemstats[id].evicted_unfetched++;
                }
                //evict the LRU tail item and give its memory to the new item
                it = search;
                //recompute how much memory this slabclass_t has handed out,
                //since the new item takes over the evicted item's chunk
                slabs_adjust_mem_requested(it->slabs_clsid, ITEM_ntotal(it), ntotal);
                //remove from the hash table and the LRU list
                //it->refcount is 2 here, so the memory is not freed; only the hash table and LRU links go away
                do_item_unlink_nolock(it, hv);
                /* Initialize the item block: */
                it->slabs_clsid = 0;

                /* If we've just evicted an item, and the automover is set to
                 * angry bird mode, attempt to rip memory into this slab class.
                 * TODO: Move valid object detection into a function, and on a
                 * "successful" memory pull, look behind and see if the next alloc
                 * would be an eviction. Then kick off the slab mover before the
                 * eviction happens.
                 */
                if (settings.slab_automove == 2)
                    slabs_reassign(-1, id);
            }
        }

        //drop the reference taken above
        refcount_decr(&search->refcount);
        /* If hash values were equal, we don't grab a second lock */
        if (hold_lock)
            item_trylock_unlock(hold_lock);
        break;
    }

    /* If all 5 attempts found only locked items at the LRU tail, allocate a fresh chunk instead */
    if (!tried_alloc && (tries == 0 || search == NULL))
        it = slabs_alloc(ntotal, id);

    if (it == NULL) {
        itemstats[id].outofmemory++;
        mutex_unlock(&cache_lock);
        return NULL;
    }

    assert(it->slabs_clsid == 0);
    assert(it != heads[id]);

    /* Item initialization can happen outside of the lock; the item's already
     * been removed from the slab LRU.
     */
    it->refcount = 1;     /* the caller will have a reference */
    mutex_unlock(&cache_lock);
    it->next = it->prev = it->h_next = 0;
    it->slabs_clsid = id;

    DEBUG_REFCNT(it, '*');
    it->it_flags = settings.use_cas ? ITEM_CAS : 0;
    it->nkey = nkey;
    it->nbytes = nbytes;
    //copy the key into the item's memory block
    memcpy(ITEM_key(it), key, nkey);
    it->exptime = exptime;
    //copy the suffix as well
    memcpy(ITEM_suffix(it), suffix, (size_t)nsuffix);
    it->nsuffix = nsuffix;
    return it;
}
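
Stripped of locking, retries and statistics, the order of decisions in do_item_alloc can be summarized as a small pure function. This is a simplified model, not the real control flow: the enum, the function name toy_alloc_decision and its boolean inputs are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for one allocation attempt. */
typedef enum {
    TOY_RECLAIM_EXPIRED,  /* reuse the dead item found at the LRU tail */
    TOY_FRESH_CHUNK,      /* the slab allocator had a free chunk */
    TOY_EVICT_TAIL,       /* evict the (still valid) tail item */
    TOY_OUT_OF_MEMORY     /* nothing to reuse and eviction not possible */
} toy_alloc_result;

/* Decision order modeled on do_item_alloc:
 * 1. a dead (expired or flushed) tail item is reclaimed first;
 * 2. otherwise a free chunk is taken if the slab class has one;
 * 3. otherwise the tail item is evicted, if eviction is enabled;
 * 4. otherwise the allocation fails. */
static toy_alloc_result toy_alloc_decision(bool have_tail, bool tail_is_dead,
                                           bool slab_has_free_chunk,
                                           bool evict_to_free) {
    if (have_tail && tail_is_dead)
        return TOY_RECLAIM_EXPIRED;
    if (slab_has_free_chunk)
        return TOY_FRESH_CHUNK;
    if (have_tail && evict_to_free)
        return TOY_EVICT_TAIL;
    return TOY_OUT_OF_MEMORY;
}
```

The ordering explains why a dead tail item is reused even when free chunks remain, and why a live item is evicted only after the slab allocator has failed.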

Strategy 4: LRU crawler

Memcached runs a separate thread that processes expired cache entries.

//LRU crawler thread
static void *item_crawler_thread(void *arg) {
    int i;

    pthread_mutex_lock(&lru_crawler_lock);
    if (settings.verbose > 2)
        fprintf(stderr, "Starting LRU crawler background thread\n");
    while (do_run_lru_crawler_thread) {
        //sleep until the condition variable is signaled
        pthread_cond_wait(&lru_crawler_cond, &lru_crawler_lock);

        while (crawler_count) {
            item *search = NULL;
            void *hold_lock = NULL;

            for (i = 0; i < LARGEST_ID; i++) {
                //skip slab classes whose crawler is not active
                if (crawlers[i].it_flags != 1) {
                    continue;
                }
                pthread_mutex_lock(&cache_lock);
                search = crawler_crawl_q((item *)&crawlers[i]);
                if (search == NULL ||
                    (crawlers[i].remaining && --crawlers[i].remaining < 1)) {
                    if (settings.verbose > 2)
                        fprintf(stderr, "Nothing left to crawl for %d\n", i);
                    crawlers[i].it_flags = 0;
                    crawler_count--;
                    crawler_unlink_q((item *)&crawlers[i]);
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                uint32_t hv = hash(ITEM_key(search), search->nkey);
                /* Attempt to hash item lock the "search" item. If locked, no
                 * other callers can incr the refcount
                 */
                if ((hold_lock = item_trylock(hv)) == NULL) {
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }
                /* Now see if the item is refcount locked */
                if (refcount_incr(&search->refcount) != 2) {
                    refcount_decr(&search->refcount);
                    if (hold_lock)
                        item_trylock_unlock(hold_lock);
                    pthread_mutex_unlock(&cache_lock);
                    continue;
                }

                /* Frees the item or decrements the refcount. */
                /* Interface for this could improve: do the free/decr here
                 * instead? */
                item_crawler_evaluate(search, hv, i);

                if (hold_lock)
                    item_trylock_unlock(hold_lock);
                pthread_mutex_unlock(&cache_lock);

                if (settings.lru_crawler_sleep)
                    usleep(settings.lru_crawler_sleep);
            }
        }
        if (settings.verbose > 2)
            fprintf(stderr, "LRU crawler thread sleeping\n");
        STATS_LOCK();
        stats.lru_crawler_running = false;
        STATS_UNLOCK();
    }
    pthread_mutex_unlock(&lru_crawler_lock);
    if (settings.verbose > 2)
        fprintf(stderr, "LRU crawler thread stopping\n");

    return NULL;
}


//Start the crawler thread; returns -1 if it is already enabled or the thread cannot be created
int start_item_crawler_thread(void) {
    int ret;

    if (settings.lru_crawler)
        return -1;
    pthread_mutex_lock(&lru_crawler_lock);
    do_run_lru_crawler_thread = 1;
    settings.lru_crawler = true;
    if ((ret = pthread_create(&item_crawler_tid, NULL,
        item_crawler_thread, NULL)) != 0) {
        fprintf(stderr, "Can't create LRU crawler thread: %s\n",
            strerror(ret));
        pthread_mutex_unlock(&lru_crawler_lock);
        return -1;
    }
    pthread_mutex_unlock(&lru_crawler_lock);

    return 0;
}
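
The core of one crawl pass is a tail-to-head walk that frees expired items. The single-threaded sketch below shows just that walk. It leaves out the crawler pseudo-item that the real thread moves through the list, the locking, and the sleep between steps. The names toy_node, toy_list, toy_push_head, toy_remove and toy_crawl are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node carrying only an expiry time (0 = never expires). */
typedef struct toy_node {
    struct toy_node *next;
    struct toy_node *prev;
    unsigned int exptime;
} toy_node;

typedef struct {
    toy_node *head;
    toy_node *tail;
} toy_list;

/* Insert a node at the head of the list. */
static void toy_push_head(toy_list *l, toy_node *n) {
    n->prev = NULL;
    n->next = l->head;
    if (n->next) n->next->prev = n;
    l->head = n;
    if (l->tail == NULL) l->tail = n;
}

/* Unlink a node from any position. */
static void toy_remove(toy_list *l, toy_node *n) {
    if (l->head == n) l->head = n->next;
    if (l->tail == n) l->tail = n->prev;
    if (n->next) n->next->prev = n->prev;
    if (n->prev) n->prev->next = n->next;
    n->next = n->prev = NULL;
}

/* Walk from tail to head and unlink every expired node.
 * Returns the number of nodes removed. */
static unsigned int toy_crawl(toy_list *l, unsigned int now) {
    unsigned int freed = 0;
    toy_node *it = l->tail;
    while (it != NULL) {
        toy_node *prev = it->prev;   /* save before a possible unlink */
        if (it->exptime != 0 && it->exptime <= now) {
            toy_remove(l, it);
            freed++;
        }
        it = prev;
    }
    return freed;
}
```

Unlike lazy deletion, this walk frees expired items that no client ever asks for again, which is the case the crawler exists to cover.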
