Solr: Caching

Date: 2019-11-13


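Solr builds a number of caches on top of Lucene. The cache types it currently provides are:

(1) filterCache

(2) documentCache

(3) fieldValueCache

(4) queryResultCache

Each cache targets a specific kind of query request. This article describes how these caches are used in Solr from the following angles:

(1) Cache life cycle

(2) Cache usage scenarios

(3) Cache configuration

(4) Cache hit monitoring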

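1 Cache life cycle

The life cycle of every cache is managed by SolrIndexSearcher. Whenever the SolrIndexSearcher that owns a cache is rebuilt, the cache objects currently in use become invalid. Whether a SolrIndexSearcher is reopened depends mainly on the following.

(1) A commit after an incremental update, DirectUpdateHandler2.commit(CommitUpdateCommand cmd). The body of that method is shown below: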

if (cmd.optimize) {
      optimizeCommands.incrementAndGet();
    } else {
      commitCommands.incrementAndGet();
      if (cmd.expungeDeletes) expungeDeleteCommands.incrementAndGet();
    }

    Future[] waitSearcher = null;
    if (cmd.waitSearcher) { // whether to wait for the new SolrIndexSearcher; a new searcher usually does some preparation first, such as warming its caches
      waitSearcher = new Future[1];
    }

    boolean error=true;
    iwCommit.lock();
    try {
      log.info("start "+cmd);

      if (cmd.optimize) { // whether to optimize the index; incremental updates are usually not optimized
        openWriter();
        writer.optimize(cmd.maxOptimizeSegments);
      } else if (cmd.expungeDeletes) {
        openWriter();
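        writer.expungeDeletes(); // physically removes documents marked as deleted; optimize does this too,
                                 // but this method is much faster than a full optimize
      }

      closeWriter(); // close the writer that was opened for the incremental update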

      callPostCommitCallbacks();
      if (cmd.optimize) { // post-optimize listeners, if any are registered, are invoked here
        callPostOptimizeCallbacks();
      }
      // open a new searcher in the sync block to avoid opening it
      // after a deleteByQuery changed the index, or in between deletes
      // and adds of another commit being done.
      core.getSearcher(true,false,waitSearcher); // the key call that reopens the searcher;
                                                 // its arguments decide whether the IndexReader is newly opened or reopened

      // reset commit tracking
      tracker.didCommit(); // updates state exposed for MBean monitoring

      log.info("end_commit_flush");

      error=false;
    }
    finally { // reset some monitoring counters after the commit
      iwCommit.unlock();
      addCommands.set(0);
      deleteByIdCommands.set(0);
      deleteByQueryCommands.set(0);
      numErrors.set(error ? 1 : 0);
    }

    // if we are supposed to wait for the searcher to be registered, then we should do it
    // outside of the synchronized block so that other update operations can proceed.
    if (waitSearcher!=null && waitSearcher[0] != null) {
       try {
        waitSearcher[0].get(); // wait for the searcher to finish its preparation, e.g. cache warming
      } catch (InterruptedException e) {
        SolrException.log(log,e);
      } catch (ExecutionException e) {
        SolrException.log(log,e);
      }
    }
  }

The most important call in this method is

core.getSearcher(true,false,waitSearcher);

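Looking at its parameters in turn:

Parameter 1, boolean forceNew: whether to open a new searcher object.

Parameter 2, boolean returnSearcher: whether to return the newest searcher object.

Parameter 3, final Future[] waitSearcher: whether to wait for the searcher's preparation work. The thread that calls the method waits until the new searcher has finished preparing; if the searcher manages many caches with large autowarm counts, that thread may wait a long time before it returns.

A short note on warming, since the term may be unfamiliar. Suppose a cache already holds many values. When a new IndexSearcher is built, the new cache has to accumulate its data from scratch, so search performance drops sharply for a while and the hit ratio only recovers gradually as the cache refills. That is usually unacceptable. What we can do is take the keys held by the old cache and, before the rebuilt SolrIndexSearcher is returned, reload the values for those keys from disk into the new cache. The SolrIndexSearcher that is exposed then does not start with an empty cache but with one whose contents were prepared "behind the scenes".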
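The warming idea described above can be sketched in a few lines. This is a minimal illustration, not Solr's cache implementation: WarmableLruCache, its warm method, and the regenerate callback are hypothetical names, and the sketch assumes that the most recently used keys are the ones worth carrying over.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache used only to illustrate autowarming; not Solr's LRUCache.
class WarmableLruCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder=true keeps the most recently used entries at the end.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > WarmableLruCache.this.maxSize;
            }
        };
    }

    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }

    // Take up to autowarmCount of the most recently used keys from the old cache
    // and recompute each value against the new index instead of copying the stale one.
    void warm(WarmableLruCache<K, V> old, int autowarmCount, Function<K, V> regenerate) {
        List<K> keys = new ArrayList<>(old.map.keySet()); // least recently used first
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            put(key, regenerate.apply(key));
        }
    }
}
```

The new cache is filled before it is exposed, so the first user queries after a commit can already hit it. The cost is the time spent in regenerate, which grows with autowarmCount.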

Two code segments in getSearcher() matter most for caches. The first builds the new SolrIndexSearcher:

newestSearcher = getNewestSearcher(false);
      String newIndexDir = getNewIndexDir();
      File indexDirFile = new File(getIndexDir()).getCanonicalFile();
      File newIndexDirFile = new File(newIndexDir).getCanonicalFile();
      // reopenReaders is configured in solrconfig.xml; if it is false, a brand-new IndexReader is opened every time
      if (newestSearcher != null && solrConfig.reopenReaders
          && indexDirFile.equals(newIndexDirFile)) {
        IndexReader currentReader = newestSearcher.get().getReader();
        IndexReader newReader = currentReader.reopen(); // reopen the IndexReader if the index directory has not changed

        if (newReader == currentReader) {
          currentReader.incRef();
        }

        tmp = new SolrIndexSearcher(this, schema, "main", newReader, true, true); // build the new SolrIndexSearcher
      } else { // otherwise obtain the IndexReader from the configured IndexReaderFactory
        IndexReader reader = getIndexReaderFactory().newReader(getDirectoryFactory().open(newIndexDir), true);
        tmp = new SolrIndexSearcher(this, schema, "main", reader, true, true); // build the new SolrIndexSearcher
      }

Next, the cache-related code in the SolrIndexSearcher constructor:

if (cachingEnabled) { // the last constructor argument being true means caching is allowed
      ArrayList<SolrCache> clist = new ArrayList<SolrCache>();
      fieldValueCache = solrConfig.fieldValueCacheConfig==null ? null : solrConfig.fieldValueCacheConfig.newInstance();
      if (fieldValueCache!=null) clist.add(fieldValueCache); // built when solrconfig declares <fieldValueCache ...>
      filterCache= solrConfig.filterCacheConfig==null ? null : solrConfig.filterCacheConfig.newInstance();
      if (filterCache!=null) clist.add(filterCache); // built when solrconfig declares <filterCache ...>
      queryResultCache = solrConfig.queryResultCacheConfig==null ? null : solrConfig.queryResultCacheConfig.newInstance();
      if (queryResultCache!=null) clist.add(queryResultCache); // built when solrconfig declares <queryResultCache ...>
      documentCache = solrConfig.documentCacheConfig==null ? null : solrConfig.documentCacheConfig.newInstance();
      if (documentCache!=null) clist.add(documentCache); // built when solrconfig declares <documentCache ...>

      if (solrConfig.userCacheConfigs == null) {
        cacheMap = noGenericCaches;
      } else { // user-defined caches
        cacheMap = new HashMap<String,SolrCache>(solrConfig.userCacheConfigs.length);
        for (CacheConfig userCacheConfig : solrConfig.userCacheConfigs) {
          SolrCache cache = null;
          if (userCacheConfig != null) cache = userCacheConfig.newInstance();
          if (cache != null) {
            cacheMap.put(cache.name(), cache);
            clist.add(cache);
          }
        }
      }

      cacheList = clist.toArray(new SolrCache[clist.size()]);
    }

The second warms the new searcher's caches from the caches of the old searcher:

future = searcherExecutor.submit(
                new Callable() {
                  public Object call() throws Exception {
                    try {
                      newSearcher.warm(currSearcher);
                    } catch (Throwable e) {
                      SolrException.logOnce(log,null,e);
                    }
                    return null;
                  }
                }
        );
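How the submitted warming task relates to the waitSearcher array can be sketched as follows. This is a simplified stand-in under stated assumptions, not Solr code: WarmAndWait, openSearcher, and commit are hypothetical names, and the warming work is reduced to setting a flag.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the commit/getSearcher interaction; not Solr code.
class WarmAndWait {
    // Submits the warming task. If the caller passed a one-element array,
    // the Future is handed back through slot 0 so the caller may block on it.
    static void openSearcher(ExecutorService executor, Runnable warmTask, Future<?>[] waitSearcher) {
        Future<?> future = executor.submit(warmTask);
        if (waitSearcher != null) waitSearcher[0] = future;
    }

    // Returns whether warming had finished by the time the "commit" returned.
    static boolean commit(boolean waitForSearcher) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicBoolean warmed = new AtomicBoolean(false);
        try {
            Future<?>[] waitSearcher = waitForSearcher ? new Future<?>[1] : null;
            openSearcher(executor, () -> warmed.set(true), waitSearcher);
            if (waitSearcher != null && waitSearcher[0] != null) {
                waitSearcher[0].get(); // block until warming completes
            }
            return warmed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

With waitForSearcher set to true the caller always observes a warmed searcher. With false the call returns immediately and warming finishes in the background, so the returned flag may be either value.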

Expanding the warm(SolrIndexSearcher old) method (how each cache is actually warmed will be covered in a separate article):

public void warm(SolrIndexSearcher old) throws IOException {
    // Make sure this is first!  filters can help queryResults execute!
    boolean logme = log.isInfoEnabled();
    long warmingStartTime = System.currentTimeMillis();
    // warm the caches in order...
    for (int i=0; i<cacheList.length; i++) { // iterate over all configured caches and warm each one from old to new
      if (logme) log.info("autowarming " + this + " from " + old + "\n\t" + old.cacheList[i]);
      this.cacheList[i].warm(this, old.cacheList[i]);
      if (logme) log.info("autowarming result for " + this + "\n\t" + this.cacheList[i]);
    }
    warmupTime = System.currentTimeMillis() - warmingStartTime; // total time spent warming
  }

That completes the description of how SolrIndexSearcher creates its caches. Caches are destroyed together with the searcher when it is closed; see SolrIndexSearcher.close():

public void close() throws IOException {
    if (cachingEnabled) {
      StringBuilder sb = new StringBuilder();
      sb.append("Closing ").append(name);
      for (SolrCache cache : cacheList) {
        sb.append("\n\t");
        sb.append(cache);
      }
      log.info(sb.toString()); // logs cache state, e.g. current hit ratio, cumulative hit ratio, and size
    } else {
      log.debug("Closing " + name);
    }
    core.getInfoRegistry().remove(name);

    // super.close();
    // can't use super.close() since it just calls reader.close() and that may only be called once
    // per reader (even if incRef() was previously called).
    if (closeReader) reader.decRef(); // decrement the reader's reference count

    for (SolrCache cache : cacheList) {
      cache.close(); // close the cache
    }

    // do this at the end so it only gets done if there are no exceptions
    numCloses.incrementAndGet();
  }
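The incRef() call in getSearcher() and the decRef() call in close() follow a reference-counting pattern: a reader shared by two searchers is released only when the last holder lets go. A minimal sketch of that pattern, assuming a hypothetical RefCountedResource class rather than Lucene's IndexReader:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted resource; illustrates the pattern only.
class RefCountedResource {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Called when another owner starts sharing the resource.
    void incRef() {
        refCount.incrementAndGet();
    }

    // Called when an owner is done; the last release closes the resource.
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // a real implementation would release files here
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

If reopen() returns the same reader, the old and the new searcher both hold it. The extra incRef() lets the old searcher call decRef() on close without pulling the reader out from under the new one.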

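That completes the description of how SolrIndexSearcher manages caches.

2 Cache usage scenarios

(1) filterCache

This cache mainly serves queries that use fq: the result of each fq is stored in the cache. If the business has many fairly fixed query conditions, for example a fixed status value or a fixed range, those conditions can be expressed as fq so that their results are cached. A query may carry several fq parameters, each cached on its own, but note that the results of multiple fq clauses are combined as an intersection.

There is one more important special case. If useFilterForSortedQuery=true is set in Solr, filterCache is not null, and the query specifies a sort, the following code block is entered (the optimization is skipped when any sort field is the score):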

if ((flags & (GET_SCORES|NO_CHECK_FILTERCACHE))==0 && useFilterForSortedQuery && cmd.getSort() != null && filterCache != null) {
      useFilterCache=true;
      SortField[] sfields = cmd.getSort().getSort();
      for (SortField sf : sfields) {
        if (sf.getType() == SortField.SCORE) {
          useFilterCache=false;
          break;
        }
      }
    }

    // disable useFilterCache optimization temporarily
    if (useFilterCache) {
      // now actually use the filter cache.
      // for large filters that match few documents, this may be
      // slower than simply re-executing the query.
      if (out.docSet == null) { // getDocSet also caches the result of the main query in filterCache
        out.docSet = getDocSet(cmd.getQuery(),cmd.getFilter());
        DocSet bigFilt = getDocSet(cmd.getFilterList()); // if fq is present, its results are cached in filterCache
        if (bigFilt != null) out.docSet = out.docSet.intersection(bigFilt); // intersection of the two result sets
      }
      // todo: there could be a sortDocSet that could take a list of
      // the filters instead of anding them first...
      // perhaps there should be a multi-docset-iterator
      superset = sortDocSet(out.docSet,cmd.getSort(),supersetMaxDoc); // sort
      out.docList = superset.subset(cmd.getOffset(),cmd.getLen()); // return a result list of size len
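The per-fq caching and intersection described for filterCache can be sketched with bit sets. This is an illustration under stated assumptions, not Solr's DocSet code: FilterCacheSketch and its evaluate callback are hypothetical, and each fq string is used directly as the cache key.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical filter cache: one cached doc set per fq string.
class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> evaluate; // runs the fq against the index
    private int computations = 0;

    FilterCacheSketch(Function<String, BitSet> evaluate) {
        this.evaluate = evaluate;
    }

    // Returns the cached doc set for one fq, evaluating it only on a miss.
    BitSet docSet(String fq) {
        BitSet cached = cache.get(fq);
        if (cached == null) {
            computations++;
            cached = evaluate.apply(fq);
            cache.put(fq, cached);
        }
        return cached;
    }

    // Multiple fq clauses are combined as an intersection.
    BitSet intersect(List<String> fqs, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start from "all documents"
        for (String fq : fqs) {
            result.and(docSet(fq));
        }
        return result;
    }

    int computations() {
        return computations;
    }
}
```

Because each fq is cached separately, a query that reuses one of the fq clauses in a different combination still benefits from the cache.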

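(2) documentCache caches document results. Unless the queries are fairly fixed, its hit ratio will generally not be high.

(3) fieldValueCache caches the counting data for multiValued=true fields when the facet component is used. Faceting on multi-valued fields should generally have this cache enabled. It mainly caches the following (see the UnInvertedField implementation):

maxTermCounts: the maximum term count

numTermsInField: how many terms the field has

bigTerms: the terms whose docFreq exceeds a threshold

tnums: a two-dimensional array recording the term numbers

Each time FacetComponent runs its process method, the call chain is SimpleFacets.getFacetCounts() -> getFacetFieldCounts() -> getTermCounts(facetValue) -> UnInvertedField.getUnInvertedField(field, searcher). Expanding that last method: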

public static UnInvertedField getUnInvertedField(String field, SolrIndexSearcher searcher) throws IOException {
    SolrCache cache = searcher.getFieldValueCache();
    if (cache == null) {
      return new UnInvertedField(field, searcher); // no cache configured: build and return directly
    }

    UnInvertedField uif = (UnInvertedField)cache.get(field);
    if (uif == null) { // first request for this field: initialize its UnInvertedField
      synchronized (cache) {
        uif = (UnInvertedField)cache.get(field);
        if (uif == null) {
          uif = new UnInvertedField(field, searcher);
          cache.put(field, uif);
        }
      }
    }

    return uif;
  }
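The check, lock, re-check sequence above ensures that the expensive per-field structure is built only once even when several facet requests arrive together. A minimal sketch of the same pattern, assuming a hypothetical BuildOnceCache whose backing map is itself thread-safe:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical build-once cache; the builder stands in for "new UnInvertedField(...)".
class BuildOnceCache<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> builder;
    private final AtomicInteger builds = new AtomicInteger();

    BuildOnceCache(Function<String, V> builder) {
        this.builder = builder;
    }

    V get(String field) {
        V value = cache.get(field);       // fast path without the lock
        if (value == null) {
            synchronized (cache) {        // only one thread may build at a time
                value = cache.get(field); // re-check: another thread may have built it
                if (value == null) {
                    builds.incrementAndGet();
                    value = builder.apply(field);
                    cache.put(field, value);
                }
            }
        }
        return value;
    }

    int builds() {
        return builds.get();
    }
}
```

The lock is taken only on a miss, so steady-state lookups stay cheap. The trade-off in this sketch is that a build for one field blocks builds for other fields while it runs.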

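(4) queryResultCache caches query results. It is used mainly in the getDocListC() method of SolrIndexSearcher and caches result sets keyed by QueryResultKey. In other words, any query that produces an equal QueryResultKey can hit the cache, so it is worth looking at how QueryResultKey.equals decides that two keys are the same: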

public boolean equals(Object o) {
    if (o==this) return true;
    if (!(o instanceof QueryResultKey)) return false;
    QueryResultKey other = (QueryResultKey)o;

    // fast check of the whole hash code... most hash tables will only use
    // some of the bits, so if this is a hash collision, it's still likely
    // that the full cached hash code will be different.
    if (this.hc != other.hc) return false;

    // check for the thing most likely to be different (and the fastest things)
    // first.
    if (this.sfields.length != other.sfields.length) return false; // compare the number of sort fields
    if (!this.query.equals(other.query)) return false; // compare the query
    if (!isEqual(this.filters, other.filters)) return false; // compare the fq filters

    for (int i=0; i<sfields.length; i++) {
      SortField sf1 = this.sfields[i];
      SortField sf2 = other.sfields[i];
      if (!sf1.equals(sf2)) return false; // compare each sort field
    }

    return true;
  }

As the code above shows, hitting the queryResultCache requires the query, the filter queries, and the sort fields to all match.
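The effect of this equality rule on cache hits can be tried out with a simplified key. The ResultKey class below is a hypothetical stand-in that uses plain strings for the query, filters, and sort fields; it assumes filters are compared as an ordered list, which may differ from what a given Solr version does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified analogue of QueryResultKey.
final class ResultKey {
    private final String query;
    private final List<String> filters;
    private final String[] sortFields;
    private final int hc; // hash code computed once, as in the original

    ResultKey(String query, List<String> filters, String... sortFields) {
        this.query = query;
        this.filters = filters;
        this.sortFields = sortFields;
        this.hc = Objects.hash(query, filters, Arrays.hashCode(sortFields));
    }

    @Override
    public int hashCode() {
        return hc;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof ResultKey)) return false;
        ResultKey other = (ResultKey) o;
        if (hc != other.hc) return false; // cheap rejection first
        return query.equals(other.query)
            && Objects.equals(filters, other.filters)
            && Arrays.equals(sortFields, other.sortFields);
    }
}
```

Used as a HashMap key, two requests with the same query and fq but different sort orders occupy separate cache entries, which is why highly varied sort parameters lower the hit ratio.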

3 Cache configuration

To use Solr's four caches, configure the following in solrconfig.xml:

<query>
        <filterCache               size="300"      initialSize="10"      autowarmCount="300"/>
        <queryResultCache      size="300"      initialSize="10"      autowarmCount="300"/>
        <fieldValueCache       size="300"      initialSize="10"       autowarmCount="300" />
        <documentCache             size="5000"      initialSize="512"      autowarmCount="300"/>
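        <useFilterForSortedQuery>true</useFilterForSortedQuery><!-- key setting that decides whether sorted queries can use filterCache -->
        <queryResultWindowSize>50</queryResultWindowSize><!-- controls the size of the cached query result set -->
        <enableLazyFieldLoading>false</enableLazyFieldLoading><!-- whether to enable lazy field loading -->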
 </query>

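Here size is the maximum cache size and initialSize is the initial size. autowarmCount is the most important parameter: it is the number of entries that a background thread warms into the new cache each time a new SolrIndexSearcher is built.

Is a larger autowarm count always better? It depends on the situation. In a real-time application, many implementations obtain a reopened in-memory IndexReader at very short intervals, and Solr's default behavior is to open a new SolrIndexSearcher each time. The more entries the caches have to warm, the slower the new SolrIndexSearcher opens, which badly hurts real-time behavior.

If the count is set very low, however, every new SolrIndexSearcher starts with nearly empty caches, and fq and facet queries will hardly ever hit the cache. Real-time applications need to weigh this carefully.

4 Cache hit monitoring

Via the admin page:

Open http://localhost:8080/XXXX/XXXX/admin/stats.jsp to view the statistics.

There, lookups is the number of lookups against the current cache, hitratio is the current hit ratio, inserts is the number of insertions, evictions is the number of entries evicted from the cache, size is the number of entries currently cached, and warmupTime is the time spent warming the current cache. The values prefixed with cumulative are the running totals for that cache type: lookups, hits, hit ratio, inserts, and evictions.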
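How these counters relate to each other can be seen in a small instrumented cache. This is a sketch with hypothetical names (CountingLruCache and its accessors), not the code behind the stats page; it assumes hitratio is simply hits divided by lookups.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache that tracks the same kinds of counters as the stats page.
class CountingLruCache<K, V> {
    private final int maxSize;
    private long lookups, hits, inserts, evictions;
    private final LinkedHashMap<K, V> map;

    CountingLruCache(int maxSize) {
        this.maxSize = maxSize;
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                boolean evict = size() > CountingLruCache.this.maxSize;
                if (evict) evictions++; // count every entry pushed out by a newer one
                return evict;
            }
        };
    }

    V get(K key) {
        lookups++;
        V value = map.get(key);
        if (value != null) hits++;
        return value;
    }

    void put(K key, V value) {
        inserts++;
        map.put(key, value);
    }

    double hitRatio() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    long lookups() { return lookups; }
    long hits() { return hits; }
    long inserts() { return inserts; }
    long evictions() { return evictions; }
    int size() { return map.size(); }
}
```

A high evictions count together with a low hit ratio suggests that size is too small for the working set, while a low hit ratio with few evictions suggests that the queries are simply too varied to cache well.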
