Python Caching

Date: 2019-11-07

Tags: python, caching | Category: Python


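The purpose of caching

Caching keeps a bounded amount of data on hand so that later requests for it can be answered directly. The goal is to speed up data retrieval.

A simple cache class of our own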

import datetime
import pprint
import random

class MyCache(object):
    def __init__(self):
        self.cache = {}
        self.max_cache_size = 10

    def __contains__(self, key):
        """
        Return True if the key is present in the cache.
        Implementing this magic method lets callers write `key in cache_instance`.
        :param key: key to look up
        :return: bool
        """
        return key in self.cache

    def update(self, key, value):
        """
        Store a key/value pair, evicting the oldest entry first if the cache is full.
        :param key: key to store
        :param value: value to store
        :return: None
        """
        if key not in self.cache and len(self.cache) >= self.max_cache_size:
            self.remove_oldest()
        self.cache[key] = {"date_accessed": datetime.datetime.now(), "value": value}

    def remove_oldest(self):
        """
        Remove the entry with the earliest access time.
        :return: None
        """
        oldest_entry = None
        for key in self.cache:
            # Compare against None explicitly so falsy keys such as 0 or "" work too.
            if oldest_entry is None:
                oldest_entry = key
            elif self.cache[key]["date_accessed"] < self.cache[oldest_entry]['date_accessed']:
                oldest_entry = key
        self.cache.pop(oldest_entry)

    @property
    def size(self):
        """
        Number of entries currently held in the cache.
        :return: int
        """
        return len(self.cache)

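__contains__ is not strictly required here, but the idea is that it lets us check an instance of the class to see whether it holds the key we are looking for.
The update method adds a new key/value pair to the cache dictionary. Once the cache has reached or exceeded its maximum size, it also removes the oldest entry.
The remove_oldest method does the actual work of removing that oldest entry from the dictionary.
Finally, the size property returns the number of entries currently in the cache.

After running this code you will notice that once the cache is full, the entry with the earliest timestamp is removed. The sample code does not, however, show how to refresh the access date, that is, how to set the timestamp to the current time whenever an entry is read.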

Test it:

if __name__ == "__main__":
    keys = ["test", "red", "fox", "fence", "junk",
            "other", "alpha", "bravo", "cal", "devo",
            "ele"]

    s = "abcdefghijklmnop"
    cache = MyCache()
    for i, key in enumerate(keys):
        if key in cache:
            continue
        else:
            value = "".join(random.choice(s) for j in range(20))
            cache.update(key, value)
        print(f"{i+1} iterations, {cache.size} cached entries")
        print()
    print(pprint.pformat(cache.cache))
    print("test" in cache)   # uses __contains__; "test" was the oldest entry and has been evicted
    print("cal" in cache)
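
The test never reads a value back, so the missing access-date refresh goes unnoticed. Below is a minimal sketch of one way to add it. The class name AccessTrackingCache and its get method are assumptions made for illustration; they are not part of the original code.

```python
import datetime


class AccessTrackingCache:
    """Hypothetical variant of MyCache whose reads refresh the access time."""

    def __init__(self, max_cache_size=10):
        self.cache = {}
        self.max_cache_size = max_cache_size

    def update(self, key, value):
        # Evict before inserting a new key into a full cache.
        if key not in self.cache and len(self.cache) >= self.max_cache_size:
            self.remove_oldest()
        self.cache[key] = {"date_accessed": datetime.datetime.now(), "value": value}

    def get(self, key, default=None):
        # A successful read counts as an access, so the timestamp is refreshed.
        entry = self.cache.get(key)
        if entry is None:
            return default
        entry["date_accessed"] = datetime.datetime.now()
        return entry["value"]

    def remove_oldest(self):
        # The key with the smallest timestamp is the least recently accessed one.
        oldest = min(self.cache, key=lambda k: self.cache[k]["date_accessed"])
        self.cache.pop(oldest)
```

With reads refreshing the timestamp, eviction follows a least-recently-used order instead of a purely insertion-based one. Timestamps taken in very quick succession can be equal, so the ordering is only as fine-grained as the clock.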

Using the lru_cache decorator

import time
import urllib.error
import urllib.request
from functools import lru_cache

@lru_cache(maxsize=24)
def get_webpage(module):
    """
    Fetch the documentation page for the given Python module.
    """
    webpage = "https://docs.python.org/3/library/{}.html".format(module)
    try:
        with urllib.request.urlopen(webpage) as request:
            return request.read()
    except urllib.error.HTTPError:
        return None


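if __name__ == '__main__':
    # perf_counter has a finer resolution than time.time, so the cached pass
    # is less likely to measure as exactly zero.
    t1 = time.perf_counter()
    modules = ['functools', 'collections', 'os', 'sys']
    for module in modules:
        page = get_webpage(module)
        if page:
            print("{} module page found".format(module))
    t2 = time.perf_counter()
    for m in modules:
        page = get_webpage(m)
        if page:
            print(f"{m} get again ...")
    t3 = time.perf_counter()

    print(t2 - t1)  # first pass: real network requests
    print(t3 - t2)  # second pass: served from the cache
    print((t2 - t1) / (t3 - t2))  # speed-up factor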

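We decorate get_webpage with lru_cache and set its maximum size to 24 calls. Inside the function we build the page URL as a string from the module name that is passed in. We can then call the function in a loop several times. The first pass produces its output relatively slowly, but when the same calls are repeated in the same session the output appears much faster, which shows that lru_cache has cached those calls.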
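
Timing is one way to see the cache at work; the cache_info() method that lru_cache attaches to the decorated function is another, and it needs no network access. A small sketch, where slow_square is a hypothetical stand-in for an expensive call such as get_webpage:

```python
from functools import lru_cache

call_count = 0  # counts how often the function body really runs


@lru_cache(maxsize=24)
def slow_square(n):
    """Hypothetical stand-in for an expensive call."""
    global call_count
    call_count += 1
    return n * n


first = [slow_square(n) for n in (1, 2, 3)]   # three misses, body runs three times
second = [slow_square(n) for n in (1, 2, 3)]  # three hits, body does not run

info = slow_square.cache_info()
print(info)
print("function body executed", call_count, "times")
```

Calling slow_square.cache_clear() would empty the cache and reset these counters.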

We can also pass a typed argument to the decorator. It is a Boolean: when typed is set to True, arguments of different types are cached separately.
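
A short sketch of the difference, using two hypothetical functions. A two-argument function is used on purpose, because the functools documentation notes that some types such as str and int may be cached separately even when typed is false.

```python
from functools import lru_cache


@lru_cache(maxsize=None, typed=True)
def add_typed(a, b):
    return a + b


@lru_cache(maxsize=None)  # typed defaults to False
def add_untyped(a, b):
    return a + b


add_typed(1, 2)
add_typed(1.0, 2.0)    # float arguments get their own cache entry

add_untyped(1, 2)
add_untyped(1.0, 2.0)  # (1.0, 2.0) compares equal to (1, 2), so this is a hit

typed_info = add_typed.cache_info()
untyped_info = add_untyped.cache_info()
print(typed_info)
print(untyped_info)
```

One consequence of the untyped variant is that the second call returns whatever the first call cached, here the int result computed for (1, 2).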

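Using the cachetools module

Code source: www.thepythoncorner.com/2018/04/how…

The original article explains how to use caching to speed up your Python programs and gives the following two examples. Without caching: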

import time
import datetime


def get_candy_price(candy_id):
    # let's use a sleep to simulate the time your function spends trying to connect to
    # the web service, 5 seconds will be enough.
    time.sleep(5)

    # let's pretend that the price returned by the web service is $1 for candies with an
    # odd candy_id and $1.5 for candies with an even candy_id

    price = 1.5 if candy_id % 2 == 0 else 1

    return (datetime.datetime.now().strftime("%c"), price)


# now, let's simulate 20 customers in your shop.
# They are asking for candy with id 2 and candy with id 3...
for i in range(0, 20):
    print(get_candy_price(2))
    print(get_candy_price(3))

With caching:

import time
import datetime

from cachetools import cached, TTLCache  # 1 - let's import the "cached" decorator and the "TTLCache" object from cachetools
cache = TTLCache(maxsize=100, ttl=300)  # 2 - let's create the cache object.


@cached(cache)  # 3 - it's time to decorate the method to use our cache system!
def get_candy_price(candy_id):
    # let's use a sleep to simulate the time your function spends trying to connect to
    # the web service, 5 seconds will be enough.
    time.sleep(5)

    # let's pretend that the price returned by the web service is $1 for candies with an
    # odd candy_id and $1.5 for candies with an even candy_id

    price = 1.5 if candy_id % 2 == 0 else 1

    return (datetime.datetime.now().strftime("%c"), price)


# now, let's simulate 20 customers in your shop.
# They are asking for candy with id 2 and candy with id 3...
for i in range(0, 20):
    print(get_candy_price(2))
    print(get_candy_price(3))

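The output is not shown here; copy the code and run it yourself. The uncached version sleeps for 5 seconds on every call, while the cached version only pays that cost the first time each candy_id is requested within the 300-second TTL.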
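
To make the TTL idea concrete without installing anything, here is a standard-library sketch of a time-limited memoizing decorator. It is not how cachetools is implemented: ttl_cached and its timer parameter are hypothetical names, and unlike TTLCache the sketch has no maxsize limit and handles positional, hashable arguments only.

```python
import functools
import time


def ttl_cached(ttl, timer=time.monotonic):
    """Hypothetical decorator: reuse a result for `ttl` seconds per argument tuple."""
    def decorator(func):
        store = {}  # maps args -> (expiry_time, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = timer()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]          # still fresh: skip the real call
            value = func(*args)        # missing or expired: recompute
            store[args] = (now + ttl, value)
            return value

        return wrapper
    return decorator


@ttl_cached(ttl=300)
def get_candy_price(candy_id):
    # Same pricing rule as the example above, without the simulated delay.
    return 1.5 if candy_id % 2 == 0 else 1


print(get_candy_price(2), get_candy_price(3))
```

Passing the timer in as a parameter is a design choice that makes expiry testable with a fake clock instead of real sleeps.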

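Multi-level caching

The approaches above are all broadly similar, but they do not solve my problem. I want to set and look up cached values by several conditions, treating the cache like a simple database to query rather than a plain key/value store. I found the cacheout module and tried to use it to build what I wanted.

Using cacheout

Links

github.com/dgilland/ca… cacheout.readthedocs.io/en/latest/m…

Overview

cacheout is a caching library for Python.

Features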

In-memory caching using dictionary backend
Cache manager for easily accessing multiple cache objects
Reconfigurable cache settings for runtime setup when using module-level cache objects
Maximum cache size enforcement
Default cache TTL (time-to-live) as well as custom TTLs per cache entry
Bulk set, get, and delete operations
Bulk get and delete operations filtered by string, regex, or function
Memoization decorators
Thread safe
Multiple cache implementations:
- FIFO (First In, First Out)
- LIFO (Last In, First Out)
- LRU (Least Recently Used)
- MRU (Most Recently Used)
- LFU (Least Frequently Used)
- RR (Random Replacement)

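
The eviction policies in the list differ only in which entry is dropped when the cache is full. The sketch below contrasts two of them, FIFO and LRU, using collections.OrderedDict. TinyCache and demo are hypothetical names for illustration and say nothing about how cacheout implements these policies internally.

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical bounded cache illustrating FIFO versus LRU eviction."""

    def __init__(self, maxsize, policy="lru"):
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        self.maxsize = maxsize
        self.policy = policy
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        if self.policy == "lru":
            self._data.move_to_end(key)  # a read makes the key most recent
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data[key] = value
            if self.policy == "lru":
                self._data.move_to_end(key)
            return
        if len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the entry at the front
        self._data[key] = value

    def keys(self):
        return list(self._data)


def demo(policy):
    cache = TinyCache(maxsize=2, policy=policy)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # touch "a" before the cache overflows
    cache.set("c", 3)  # forces exactly one eviction
    return cache.keys()


print("fifo keeps", demo("fifo"))
print("lru keeps", demo("lru"))
```

Under FIFO the read of "a" does not matter and "a" is evicted as the first key inserted; under LRU the read protects "a" and "b" is evicted instead.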

Roadmap

Layered caching (multi-level caching)
Cache event listener support (e.g. on-get, on-set, on-delete)
Cache statistics (e.g. cache hits/misses, cache frequency, etc)

Installation

pip install cacheout

Requirements

Python >= 3.4

Basic usage

Create a cache object:

# start with some basic caching by creating a cache object:
from cacheout import Cache
cache = Cache()

By default the cache holds at most 256 entries and entries never expire. cache = Cache() is equivalent to:

# By default the cache object will have a maximum size of 256 and default TTL expiration turned off. These values can be set with:
import time  # needed for the default timer below
cache = Cache(maxsize=256, ttl=0, timer=time.time, default=None)  # defaults

Set a value:

# Set a cache key using cache.set():
cache.set(1, 'foobar')

Get a value:

# Get the value of a cache key with cache.get():
assert cache.get(1) == 'foobar'

Return a default value when the key is not found:

# Get a default value when cache key isn't set:
assert cache.get(2) is None
assert cache.get(2, default=False) is False
assert 2 not in cache

Note that this default value is not stored in the cache.

Set a global default:

# Provide a global default:
cache2 = Cache(default=True)
assert cache2.get('missing') is True
assert 'missing' not in cache2

cache3 = Cache(default=lambda key: key)
assert cache3.get('missing') == 'missing'
# with a callable default, the key 'missing' is stored in the cache
assert 'missing' in cache3

Set an expiration time for a cache entry:

# Set the TTL (time-to-live) expiration per entry:
import time

cache.set(3, {'data': {}}, ttl=1)
assert cache.get(3) == {'data': {}}
time.sleep(1)
assert cache.get(3) is None

Cache a function's result:

# Memoize a function where cache keys are generated from the called function parameters:
@cache.memoize()
def func(a, b):
    return a + b 

# Provide a TTL for the memoized function and incorporate argument types into generated cache keys:
@cache.memoize(ttl=5, typed=True)
def func(a, b):
    print("--- into --- func ---")
    return a + b

# func(1, 2) has different cache key than func(1.0, 2.0), whereas,
# with "typed=False" (the default), they would have the same key

print(func(1, 2))
print(func(1, 2))
print(func.uncached(1, 2))  # call the original function, bypassing the cache
print(func(1, 2))

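Get a copy of the cache

# Get a copy of the entire cache with cache.copy():
# (reset the cache first so that only the two entries below are present)
cache.clear()
cache.set(1, 'foobar')
cache.set(2, ('foo', 'bar', 'baz'))
assert cache.copy() == {1: 'foobar', 2: ('foo', 'bar', 'baz')}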

Delete a value from the cache

# Delete a cache key with cache.delete():
cache.delete(1)
assert cache.get(1) is None

Clear the whole cache

# Clear the entire cache with cache.clear():
cache.clear()
assert len(cache) == 0

Bulk set, get, and delete

# Perform bulk operations with cache.set_many(), cache.get_many(), and cache.delete_many():
cache.set_many({'a': 1, 'b': 2, 'c': 3})
assert cache.get_many(['a', 'b', 'c']) == {'a': 1, 'b': 2, 'c': 3}
cache.delete_many(['a', 'b', 'c'])
assert cache.count() == 0

Filtering keys in bulk get and delete

# Use complex filtering in cache.get_many() and cache.delete_many():

import re
cache.set_many({'a_1': 1, 'a_2': 2, '123': 3, 'b': 4})

cache.get_many('a_*') == {'a_1': 1, 'a_2': 2}
cache.get_many(re.compile(r'\d')) == {'123': 3}
cache.get_many(lambda key: '2' in key) == {'a_2': 2, '123': 3}

cache.delete_many('a_*')
assert dict(cache.items()) == {'123': 3, 'b': 4}
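
The three kinds of filter (glob-style string, compiled regex, function) can be pictured with a small standard-library helper. This is a sketch under assumptions, not cacheout's code: filter_keys is a hypothetical name, and the regex is applied with match (anchored at the start of the key) because that is what the expected result in the snippet above suggests.

```python
import fnmatch
import re


def _make_predicate(matcher):
    """Turn a glob string, compiled regex, or callable into a key predicate."""
    if isinstance(matcher, str):
        return lambda key: fnmatch.fnmatchcase(key, matcher)
    if isinstance(matcher, re.Pattern):
        return lambda key: matcher.match(key) is not None
    if callable(matcher):
        return matcher
    raise TypeError("matcher must be a str, a compiled regex, or a callable")


def filter_keys(data, matcher):
    """Hypothetical helper: return the items of `data` whose key passes the matcher."""
    predicate = _make_predicate(matcher)
    return {key: value for key, value in data.items() if predicate(key)}


data = {'a_1': 1, 'a_2': 2, '123': 3, 'b': 4}
print(filter_keys(data, 'a_*'))
print(filter_keys(data, re.compile(r'\d')))
print(filter_keys(data, lambda key: '2' in key))
```

The glob and regex branches assume string keys; a cache with mixed key types would need to skip or convert non-string keys first.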

Reconfigure a cache object after creation

# Reconfigure the cache object after creation with cache.configure():
cache.configure(maxsize=1000, ttl=5 * 60)

Get keys, values, and key/value pairs as you would from a dict

# Get keys, values, and items from the cache with cache.keys() cache.values(), and cache.items():

cache.clear()  # start from an empty cache so the contents below are predictable
cache.set_many({'a': 1, 'b': 2, 'c': 3})
assert list(cache.keys()) == ['a', 'b', 'c']
assert list(cache.values()) == [1, 2, 3]
assert list(cache.items()) == [('a', 1), ('b', 2), ('c', 3)]

Iterate over the cache

# Iterate over cache keys:

for key in cache:
    print(key, cache.get(key))
    # a 1
    # b 2
    # c 3

Check whether a key is cached

# Check if key exists with cache.has() and key in cache:
assert cache.has('a')
assert 'a' in cache

Managing multi-level caches with CacheManager

from cacheout import CacheManager

cacheman = CacheManager({'a': {'maxsize': 100},
                         'b': {'maxsize': 200, 'ttl': 900},
                         'c': {}})

cacheman['a'].set('key1', 'value1')
value = cacheman['a'].get('key1')

cacheman['b'].set('key2', 'value2')
assert cacheman['b'].maxsize == 200
assert cacheman['b'].ttl == 900

cacheman['c'].set('key3', 'value3')

cacheman.clear_all()
for name, cache in cacheman:
    assert name in cacheman
    assert len(cache) == 0

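The multi-level caching described last should be able to solve my problem. As shown in the figure, if my API takes two independent variables, stock type and time, the stock type can be used as the first-level cache key and the time as the second-level key: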
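
A rough sketch of that layout, using plain nested dictionaries rather than cacheout so that it runs without extra packages. TwoLevelCache, its method names, and the sample stock types and dates are all hypothetical; with cacheout, each first-level entry could instead be a named cache held by a CacheManager.

```python
from collections import defaultdict


class TwoLevelCache:
    """Hypothetical two-level cache: level 1 is the stock type, level 2 the time."""

    def __init__(self):
        self._levels = defaultdict(dict)

    def set(self, stock_type, timestamp, value):
        self._levels[stock_type][timestamp] = value

    def get(self, stock_type, timestamp, default=None):
        # Use .get on the outer dict so a lookup never creates an empty level.
        return self._levels.get(stock_type, {}).get(timestamp, default)

    def by_stock_type(self, stock_type):
        # Return a copy of everything cached for one stock type.
        return dict(self._levels.get(stock_type, {}))

    def clear_stock_type(self, stock_type):
        # Drop one first-level entry together with all its second-level entries.
        self._levels.pop(stock_type, None)


cache = TwoLevelCache()
cache.set("type_a", "2019-11-06", 10.2)
cache.set("type_a", "2019-11-07", 10.5)
cache.set("type_b", "2019-11-07", 88.0)

print(cache.get("type_a", "2019-11-07"))
print(cache.by_stock_type("type_a"))
```

Keeping the stock type at the first level means that all cached times for one stock type can be listed or invalidated in a single step.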