Cache Penetration, Illustrated by a Spring Boot Blog with "100,000 Posts"

Date: 2019-11-22

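Preface

To speed up responses, the blog system puts a Redis cache in front of the database. The article's primary key ID is used as the cache key; if the cache holds no value for that key, the lookup falls through to the database. When a large number of concurrent requests miss the cache this way, the backend database comes under heavy load.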
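
The lookup flow described above can be sketched as follows. This is a minimal illustration rather than the blog's actual code: the class name CachePenetrationDemo and the two in-memory maps are hypothetical stand-ins for Redis and the blog table, chosen so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal cache-aside sketch. The two maps are hypothetical stand-ins
// for Redis and the blog table; they are not part of the original project.
class CachePenetrationDemo {
    private final Map<Long, String> cache = new HashMap<>();    // stand-in for Redis
    private final Map<Long, String> database = new HashMap<>(); // stand-in for the blog table
    private int databaseHits = 0;

    CachePenetrationDemo() {
        database.put(49L, "an existing post");
    }

    // Look in the cache first; on a miss, fall back to the database.
    String findPost(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseHits++;                      // every cache miss costs one database query
        String post = database.get(id);
        if (post != null) {
            cache.put(id, post);             // only rows that exist are ever cached
        }
        return post;
    }

    int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        CachePenetrationDemo demo = new CachePenetrationDemo();
        demo.findPost(49L);                  // cache miss, then cached
        demo.findPost(49L);                  // served from the cache
        for (long id = 1000; id < 1100; id++) {
            demo.findPost(id);               // nonexistent IDs are never cached
        }
        System.out.println("database hits: " + demo.databaseHits());
    }
}
```

In this sketch the existing post costs one database query in total, while each of the 100 requests for a nonexistent ID reaches the database, so the counter ends at 101. That is the penetration effect in miniature.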

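Causes

Bugs in the application's own code, or bad data
Malicious attacks and crawlers that generate large numbers of lookups for nonexistent keys, all of which fall through to the database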

Blog architecture

Case analysis

Article URLs look like this:

https://blog.52itstyle.top/49.html

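It is easy to guess that there might also be a 50, 51, 52, or even 100,000+. A well-behaved crawler would probably read the total page count first. But some less scrupulous crawlers, or people, really do assume you have 100,000+ posts, and write a script like this:

for num in range(1, 1000000):
    # crawl the site to death with 100 threads
    pass  # the request itself is not shown in the original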

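Solution

Set up a Bloom filter: hash the primary key IDs of all articles into a sufficiently large bit array in advance, and pass every request through it. If the filter reports that a key is absent, return an error page immediately. This spares both the Redis cache and the underlying database from queries for nonexistent articles.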
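
To make the idea of hashing IDs into a bit array concrete, here is a toy Bloom filter built on java.util.BitSet. It is only a sketch for illustration: the class name TinyBloomFilter, the bit count, and the hash mixing constants are assumptions of this example and are not how Guava implements its filter.

```java
import java.util.BitSet;

// Toy Bloom filter for integer IDs, for illustration only. The class name,
// bit count and hash mixing are assumptions of this sketch, not Guava's code.
class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(int key, int i) {
        int h1 = key * 0x9E3779B1;
        int h2 = (Integer.rotateLeft(key, 15) * 0x85EBCA6B) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(int key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(int key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter = new TinyBloomFilter(1 << 16, 5);
        for (int id = 1; id <= 1000; id++) {
            filter.put(id);
        }
        System.out.println(filter.mightContain(49));      // true: 49 was inserted
        System.out.println(filter.mightContain(999_999)); // usually false, but a false positive is possible
    }
}
```

An inserted key always sets all of its bits, so the filter never reports an existing ID as absent. The reverse does not hold: an ID that was never inserted can pass if other keys happen to have set the same bits.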

Here we use Guava, Google's open-source utility library:

<dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>25.1-jre</version>
</dependency>

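Write the Bloom filter:

import java.util.List;
import javax.annotation.PostConstruct;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

/**
 * Bloom filter in front of the cache.
 * DynamicQuery is not shown in this article, so its import is omitted.
 */
@Component
public class BloomCacheFilter {

    public static BloomFilter<Integer> bloomFilter = null;

    @Autowired
    private DynamicQuery dynamicQuery;

    /**
     * Load every existing article ID into the filter at startup.
     */
    @PostConstruct
    public void init() {
        String nativeSql = "SELECT id FROM blog";
        List<Object> list = dynamicQuery.query(nativeSql, new Object[]{});
        bloomFilter = BloomFilter.create(Funnels.integerFunnel(), list.size());
        list.forEach(blog -> bloomFilter.put(Integer.parseInt(blog.toString())));
    }

    /**
     * Check whether a key may exist.
     * @param key article ID
     * @return false if the ID is definitely absent, true if it may be present
     */
    public static boolean mightContain(long key) {
        return bloomFilter.mightContain((int) key);
    }
}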

Then validate the key before each query:

/**
 * Blog post
 */
@RequestMapping("{id}.shtml")
public String page(@PathVariable("id") Long id, ModelMap model) {
     if(BloomCacheFilter.mightContain(id)){
         Blog blog = blogService.getById(id);
         model.addAttribute("blog",blog);
         return  "article";
     }else{
         return  "error";
     }
}
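
The effect of placing the check in front of the lookup can be seen in a small standalone sketch. Everything here is hypothetical: GuardedLookupDemo, the in-memory map, and the LongPredicate guard merely stand in for the controller, the database, and BloomCacheFilter.mightContain so that the example runs without Spring or Guava.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;

// Sketch of the request flow with a membership guard in front of the lookup.
// The map and the predicate are hypothetical stand-ins for the database and
// for BloomCacheFilter.mightContain; they are not the article's classes.
class GuardedLookupDemo {
    private final Map<Long, String> database = new HashMap<>();
    private final LongPredicate guard;
    private int databaseHits = 0;

    GuardedLookupDemo(LongPredicate guard) {
        this.guard = guard;
        database.put(49L, "an existing post");
    }

    // Returns the view name, mirroring the controller's "article" / "error".
    String page(long id) {
        if (!guard.test(id)) {
            return "error";                  // rejected before any storage is touched
        }
        databaseHits++;
        return database.containsKey(id) ? "article" : "error";
    }

    int databaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        // An exact check stands in for the Bloom filter here, so there are no false positives.
        GuardedLookupDemo demo = new GuardedLookupDemo(id -> id == 49L);
        System.out.println(demo.page(49L));      // article
        for (long id = 1000; id < 1100; id++) {
            demo.page(id);                       // all rejected by the guard
        }
        System.out.println(demo.databaseHits()); // 1
    }
}
```

With a real Bloom filter as the guard, a small fraction of nonexistent IDs would still pass and reach the lookup, which is why the method also handles a missing row after the guard has said yes.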

Performance

So how does it perform with a large data set? Let's run an experiment with 1,000,000 entries as the baseline.

public static void main(String[] args) {
    int capacity = 1000000;
    int key = 6666;
    BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), capacity);
    for (int i = 0; i < capacity; i++) {
        bloomFilter.put(i);
    }
    // System.nanoTime() is the JVM's high-resolution timer, in nanoseconds
    long start = System.nanoTime();
    if (bloomFilter.mightContain(key)) {
        System.out.println("Filter matched " + key);
    }
    long end = System.nanoTime();
    System.out.println("Bloom filter elapsed time (ns): " + (end - start));
}

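The reported elapsed time was 281299 ns, roughly 0.28 ms. Pretty fast, isn't it? Note that this single measurement also times the println call inside the if block.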

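False positive rate

Everything is a trade-off: with efficiency this high, something else must be sacrificed. Testing shows that the filter has a false positive rate of about 3%. In other words, an article that does not exist may still pass the check, get looked up, and then produce an error.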

public static void main(String[] args) {
    int capacity = 1000000;
    BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), capacity);
    for (int i = 0; i < capacity; i++) {
        bloomFilter.put(i);
    }
    // Probe 10,000 keys that were never inserted and count how many pass
    int sum = 0;
    for (int i = capacity + 20000; i < capacity + 30000; i++) {
        if (bloomFilter.mightContain(i)) {
            sum++;
        }
    }
    // The author's run printed 0.03
    DecimalFormat df = new DecimalFormat("0.00"); // keep two decimal places
    System.out.println("False positive rate: " + df.format((float) sum / 10000));
}

Reading the source shows that 3% is the hard-coded default false positive probability:

public static <T> BloomFilter<T> create(Funnel<? super T> funnel, long expectedInsertions) {
        return create(funnel, expectedInsertions, 0.03D);
}

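Of course, we can lower the false positive rate by passing it as a parameter. In a quick test the lookup got slightly slower, but only on the order of a few tenths of a millisecond.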

BloomFilter<Integer> bloomFilter = BloomFilter.create(Funnels.integerFunnel(), capacity,0.01);
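
Why a lower false positive rate costs a little more can be estimated with the standard Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The sketch below simply evaluates them; the class and method names are hypothetical, and Guava's own sizing may round differently.

```java
// Standard Bloom filter sizing formulas; the class and method names are
// hypothetical and this is not Guava's code.
class BloomSizing {
    // Bits needed for n expected insertions at false positive probability p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Number of hash functions: k = (m / n) * ln 2, and at least 1
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        for (double p : new double[] {0.03, 0.01}) {
            long m = optimalBits(n, p);
            System.out.printf("p=%.2f bits=%d (~%d KiB) hashes=%d%n",
                    p, m, m / 8 / 1024, optimalHashes(n, m));
        }
    }
}
```

For one million entries this works out to roughly 7.3 million bits and 5 hash functions at 3%, against roughly 9.6 million bits and 7 hash functions at 1%. More hash probes per lookup is consistent with the slightly slower queries observed above.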

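So how do we get a zero false positive rate? We can't: a Bloom filter's false positive rate is always greater than zero. ~~To guarantee that articles are 100% reachable,~~ we can keep the Bloom check switched off under normal conditions and turn it on only during an incident, for example through Nacos, Alibaba's dynamic configuration service.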

@NacosValue(value = "${bloomCache:false}", autoRefreshed = true)
private boolean bloomCache;
// some code omitted
// bloomCache defaults to false, so the Bloom check is skipped unless it is switched on
if (!bloomCache || BloomCacheFilter.mightContain(id)) {
     Blog blog = blogService.getById(id);
     model.addAttribute("blog", blog);
     return "article";
} else {
     return "error";
}