How the LRU Algorithm Works

Date: 2019-11-06

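LRU stands for Least Recently Used. It is most often used as a page replacement algorithm in virtual paged memory management.

Modern operating systems provide an abstraction over main memory called virtual memory, which lets main memory be managed more effectively. Virtual memory treats main memory as a cache for an address space stored on disk: only the active regions are kept in main memory, and data is moved back and forth between main memory and disk as needed. Conceptually, virtual memory is organized as an array of N contiguous bytes stored on disk, where each byte has a unique virtual address that serves as its index into the array. Virtual memory is partitioned into fixed-size blocks called virtual pages (Virtual Page, VP), which are the unit of transfer between main memory and disk. Similarly, physical memory is partitioned into physical pages (Physical Page, PP).

Virtual memory uses a page table to record, and to check, whether a virtual page is currently cached in physical memory:

As shown in the figure above, when the CPU accesses virtual page VP3 and finds that VP3 is not cached in physical memory, a page fault occurs. VP3 now has to be copied from disk into physical memory. Before that can happen, because physical memory has a fixed size, a victim page must be chosen in physical memory and copied out to disk. This is called swapping or paging; in the figure the victim page is VP4. Which page should be evicted so that as few swaps as possible are needed? Ideally, the page evicted each time is the one that will not be used again for the longest time, which postpones the next swap as long as possible. This is known as the optimal page replacement algorithm, but it cannot be implemented exactly in practice because it requires knowing future accesses in advance.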
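
The idea behind the optimal policy can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name optimal_page_faults, the reference string, and the frame count are made up for the example, and the whole reference string is passed in up front, which is exactly the future knowledge a real system does not have.

```python
def optimal_page_faults(references, num_frames):
    """Count page faults under the optimal (farthest-future-use) policy.

    Hypothetical helper for illustration: it needs the complete reference
    string in advance, which a real operating system does not have.
    """
    frames = []
    faults = 0
    for i, page in enumerate(references):
        if page in frames:
            continue  # hit: nothing to do
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)  # a free frame is still available
            continue

        def next_use(p):
            # Position of the next access to p, or infinity if never used again.
            try:
                return references.index(p, i + 1)
            except ValueError:
                return float("inf")

        # Evict the resident page whose next use lies farthest in the future.
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page
    return faults


print(optimal_page_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 7
```

Practical algorithms such as LRU approximate this behaviour by using past accesses as a stand-in for future ones.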

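To get as close as possible to the optimal algorithm, many clever algorithms have been devised. LRU is one of them.

The LRU principle

LRU is designed around the following principle: if a piece of data has not been accessed recently, it is also unlikely to be accessed in the near future. In other words, once the allotted space is full, the data that has gone unaccessed the longest should be evicted.

Following the example in the article "LRU原理和Redis實現" (LRU principle and its Redis implementation), suppose the system allocates 3 physical frames to a process and the process's page reference string is 7 0 1 2 0 3 0 4. All 3 frames start out empty, and LRU proceeds as follows: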
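
The walk-through for this reference string can be reproduced with a small simulation. This is a minimal sketch, not an operating-system implementation: the function name simulate_lru and the list-based bookkeeping are assumptions chosen for readability, and num_frames is assumed to be at least 1.

```python
def simulate_lru(references, num_frames):
    """Replay a page reference string under LRU.

    Returns (faults, evicted), where evicted lists the victim pages in order.
    """
    frames = []  # least recently used first, most recently used last
    faults = 0
    evicted = []
    for page in references:
        if page in frames:
            frames.remove(page)  # hit: will be re-appended as most recent
        else:
            faults += 1
            if len(frames) == num_frames:
                evicted.append(frames.pop(0))  # evict the least recently used
        frames.append(page)
        print(f"access {page}: frames (LRU -> MRU) = {frames}")
    return faults, evicted


faults, evicted = simulate_lru([7, 0, 1, 2, 0, 3, 0, 4], 3)
print(faults, evicted)  # 6 [7, 1, 2]
```

With 3 frames, the string causes 6 page faults: the first three fill the empty frames, and the later accesses to 2, 3 and 4 evict pages 7, 1 and 2 in turn, while both repeated accesses to page 0 are hits.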

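Implementing LRU with a hash table and a doubly linked list

If you want to implement LRU yourself, you can combine a hash table with a doubly linked list:

The design is as follows: the hash table maps each key to a node in the linked list, the node stores the value, and the doubly linked list records the order of the nodes, with the most recently accessed node at the head.

LRU has two basic operations:

get(key): look up the node for key. If key exists, move the node to the head of the list.
set(key, value): set the value of the node for key. If key does not exist, create a new node and place it at the head of the list; if the list then exceeds its capacity, remove the last node at the tail. If the node already exists, update its value and move it to the head of the list.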

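LRU cache

LeetCode has a problem about an LRU cache:

Design and implement an LRU (least recently used) cache using the data structures you know. It should support the following operations: get, to retrieve data, and put, to write data.

get(key) - If the key exists in the cache, return its value (always a positive number); otherwise return -1.
put(key, value) - If the key does not exist, write its value. When the cache reaches its capacity, it should remove the least recently used value before writing the new one, making room for the new value.

Follow-up:

Can you do both operations in O(1) time complexity?

Example: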
LRUCache cache = new LRUCache( 2 /* cache capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

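We can implement the doubly linked list ourselves or use an existing data structure. Python's OrderedDict is an ordered hash table that remembers the order in which keys were inserted, so it effectively combines a hash table with a doubly linked list. OrderedDict places the newest data at the end: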

In [35]: from collections import OrderedDict

In [36]: lru = OrderedDict()

In [37]: lru[1] = 1

In [38]: lru[2] = 2

In [39]: lru
Out[39]: OrderedDict([(1, 1), (2, 2)])

In [40]: lru.popitem()
Out[40]: (2, 2)

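OrderedDict has two important methods:

popitem(last=True): removes and returns a key-value pair. Pairs are returned in LIFO order when last=True, and in FIFO order otherwise.
move_to_end(key, last=True): moves an existing key to either end of the ordered dictionary. If last is True (the default), the item is moved to the end; if last is False, it is moved to the beginning.

To evict data, use popitem(last=False) to remove the key-value pair at the beginning, which is the least recently accessed one. When reading or writing data, use move_to_end(key, last=True) to move the key-value pair to the end.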
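
The two calls can be tried out on their own before putting them into a cache class. The keys and values below are arbitrary placeholders picked for the illustration.

```python
from collections import OrderedDict

lru = OrderedDict()
lru["a"] = 1
lru["b"] = 2
lru["c"] = 3

lru.move_to_end("a")            # "a" was just accessed: move it to the end
order_after_access = list(lru)  # ['b', 'c', 'a']

oldest = lru.popitem(last=False)  # removes and returns the first (oldest) pair
print(order_after_access)  # ['b', 'c', 'a']
print(oldest)              # ('b', 2)
print(list(lru))           # ['c', 'a']
```

After the access, "b" is the entry at the beginning, so it is the one popitem(last=False) evicts.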

Implementation:

from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity: int):
        # Keys run from least recently used (front) to most recently used (end).
        self.lru = OrderedDict()
        self.capacity = capacity

    def get(self, key: int) -> int:
        self._update(key)
        return self.lru.get(key, -1)

    def put(self, key: int, value: int) -> None:
        self._update(key)
        self.lru[key] = value
        if len(self.lru) > self.capacity:
            # Over capacity: drop the least recently used entry at the front.
            self.lru.popitem(last=False)

    def _update(self, key: int) -> None:
        # Mark an existing key as most recently used.
        if key in self.lru:
            self.lru.move_to_end(key)

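OrderedDict source code analysis

The pure-Python implementation of OrderedDict in the standard library is itself built from a hash table and a doubly linked list: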

class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    # The internal self.__map dict maps keys to links in a doubly linked list.
    # The circular doubly linked list starts and ends with a sentinel element.
    # The sentinel element never gets deleted (this simplifies the algorithm).
    # The sentinel is in self.__hardroot with a weakref proxy in self.__root.
    # The prev links are weakref proxies (to prevent circular references).
    # Individual links are kept alive by the hard reference in self.__map.
    # Those hard references disappear when a key is deleted from an OrderedDict.

    def __init__(*args, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries.  Keyword argument order is preserved.
        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'OrderedDict' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        try:
            self.__root
        except AttributeError:
            self.__hardroot = _Link()
            self.__root = root = _proxy(self.__hardroot)
            root.prev = root.next = root
            self.__map = {}
        self.__update(*args, **kwds)

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        if key not in self:
            self.__map[key] = link = Link()
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = proxy(link)
        dict_setitem(self, key, value)

As the source shows, OrderedDict uses self.__map = {} as its hash table. It stores pairs that map each key to its node Link() in the linked list, as in self.__map[key] = link = Link():

class _Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'

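A Link() node holds prev, a pointer to the previous node; next, a pointer to the next node; and the key.

Moreover, the list here is a circular doubly linked list. OrderedDict uses a sentinel element, root, as both the head and the tail of the list:

    self.__hardroot = _Link()
    self.__root = root = _proxy(self.__hardroot)
    root.prev = root.next = root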

From __setitem__ we can see that adding new items to an OrderedDict turns the list into the following circular structure:

         next             next             next
   root <----> new node1 <----> new node2 <----> root
         prev             prev             prev

root.next is the first node in the list, and root.prev is the last node.
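
The sentinel layout can be sketched in isolation. This is a simplified illustration, not the OrderedDict code: it uses plain references instead of weakref proxies, and the names Link, make_root, append and keys are assumptions introduced for the example.

```python
class Link:
    __slots__ = ("prev", "next", "key")


def make_root():
    """Create the sentinel; an empty list is a root that points at itself."""
    root = Link()
    root.key = None
    root.prev = root.next = root
    return root


def append(root, key):
    """Insert a new link just before the sentinel, i.e. at the end of the list."""
    link = Link()
    last = root.prev
    link.prev, link.next, link.key = last, root, key
    last.next = link
    root.prev = link
    return link


def keys(root):
    """Walk from root.next until we arrive back at the sentinel."""
    result = []
    curr = root.next
    while curr is not root:
        result.append(curr.key)
        curr = curr.next
    return result


root = make_root()
append(root, "node1")
append(root, "node2")
print(keys(root))                    # ['node1', 'node2']
print(root.next.key, root.prev.key)  # node1 node2
```

Because the sentinel is always present, insertion never has to special-case an empty list: the first append simply finds that root.prev is root itself.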

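Because OrderedDict inherits from dict, the key-value pairs are stored in the OrderedDict itself; the list nodes hold only the key, not the value.

If we implement this ourselves, it does not need to be this complicated. We can store the value in the node, and the list only has to support inserting at the head, moving a node to the head, and removing the node at the tail: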

from weakref import proxy as _proxy


class Node:
    __slots__ = ('prev', 'next', 'key', 'value', '__weakref__')


class LRUCache:

    def __init__(self, capacity: int):
        # Sentinel node: root.next is the head (most recently used),
        # root.prev is the tail (least recently used).
        self.__hardroot = Node()
        self.__root = root = _proxy(self.__hardroot)
        root.prev = root.next = root
        self.__map = {}  # key -> Node
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key in self.__map:
            self.move_to_head(key)
            return self.__map[key].value
        return -1

    def put(self, key: int, value: int) -> None:
        if key in self.__map:
            node = self.__map[key]
            node.value = value
            self.move_to_head(key)
        else:
            node = Node()
            node.key = key
            node.value = value
            self.__map[key] = node
            self.add_head(node)
            if len(self.__map) > self.capacity:
                self.rm_tail()

    def move_to_head(self, key: int) -> None:
        if key in self.__map:
            node = self.__map[key]
            # Unlink the node from its current position, then reinsert it at the head.
            node.prev.next = node.next
            node.next.prev = node.prev
            self.add_head(node)

    def add_head(self, node: Node) -> None:
        # Insert the node between the sentinel and the current head.
        head = self.__root.next
        self.__root.next = node
        node.prev = self.__root
        node.next = head
        head.prev = node

    def rm_tail(self) -> None:
        # Drop the least recently used node from both the map and the list.
        tail = self.__root.prev
        del self.__map[tail.key]
        tail.prev.next = self.__root
        self.__root.prev = tail.prev

node-lru-cache

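In real applications, an LRU cache needs a lot of extra functionality beyond the basic algorithm.

node-lru-cache is a well-made package that implements this in JavaScript: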

var LRU = require("lru-cache")
  , options = { max: 500
              , length: function (n, key) { return n * 2 + key.length }
              , dispose: function (key, n) { n.close() }
              , maxAge: 1000 * 60 * 60 }
  , cache = new LRU(options)
  , otherCache = new LRU(50) // sets just the max size

cache.set("key", "value")
cache.get("key") // "value"

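This package does not have to rely on the number of cached keys to decide when to start LRU eviction; it can use the actual size of the stored key-value pairs instead. The options object lets you set max, the upper bound on the space the cache may occupy; length, a function that computes the space a key-value pair occupies; maxAge, the expiry time of a key-value pair; and more. It is worth a look if you are interested.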
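
To make the size-based idea concrete, here is a rough Python sketch of a cache that evicts by total entry size and optionally expires old entries. It is not how node-lru-cache is implemented and does not mirror its API: the class name SizedLRU and the parameters max_size, length, max_age and clock are all hypothetical, and rejecting entries larger than max_size is an assumption made for this sketch.

```python
import time
from collections import OrderedDict


class SizedLRU:
    """Hypothetical size-bounded LRU cache with optional entry expiry."""

    def __init__(self, max_size, length=lambda value, key: 1,
                 max_age=None, clock=time.monotonic):
        self.max_size = max_size
        self.length = length      # computes the "size" of one entry
        self.max_age = max_age    # seconds, or None for no expiry
        self.clock = clock        # injectable so expiry can be tested
        self.total = 0            # sum of the sizes of all stored entries
        self.entries = OrderedDict()  # key -> (value, size, stored_at)

    def _remove(self, key):
        _, size, _ = self.entries.pop(key)
        self.total -= size

    def set(self, key, value):
        if key in self.entries:
            self._remove(key)
        size = self.length(value, key)
        if size > self.max_size:
            return False  # assumption: entries that can never fit are rejected
        self.entries[key] = (value, size, self.clock())
        self.total += size
        # Evict least recently used entries until the total size fits.
        while self.total > self.max_size:
            self._remove(next(iter(self.entries)))
        return True

    def get(self, key, default=None):
        if key not in self.entries:
            return default
        value, _, stored_at = self.entries[key]
        if self.max_age is not None and self.clock() - stored_at > self.max_age:
            self._remove(key)  # expired: drop it and report a miss
            return default
        self.entries.move_to_end(key)  # mark as most recently used
        return value


cache = SizedLRU(max_size=10, length=lambda value, key: len(value))
cache.set("a", "xxxx")      # total size 4
cache.set("b", "yyyy")      # total size 8
cache.get("a")              # "a" becomes most recently used
cache.set("c", "zzzz")      # total would be 12, so "b" is evicted
print(list(cache.entries))  # ['a', 'c']
```

With the default length function every entry counts as 1, so the same class degenerates into an ordinary count-based LRU cache.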