SolrConfig in Detail

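The solrconfig.xml file contains many of the parameters that configure Solr itself. A sample solrconfig.xml can be found in the directory where Solr was unpacked. Open it in a text editor and you will see content like the following: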

<?xml version="1.0" encoding="UTF-8" ?>
 
<config>
    <luceneMatchVersion>4.6</luceneMatchVersion>
    <dataDir>${solr.core0.data.dir:}</dataDir>
    <directoryFactory name="DirectoryFactory"
        class="${solr.directoryFactory:solr.StandardDirectoryFactory}" />
    <codecFactory class="solr.SchemaCodecFactory" />
    <schemaFactory class="ClassicIndexSchemaFactory" />
 
    <indexConfig>
        <lockType>${solr.lock.type:native}</lockType>
        <infoStream>true</infoStream>
    </indexConfig>
 
    <jmx />
 
    <updateHandler class="solr.DirectUpdateHandler2">
        <updateLog>
            <str name="dir">${solr.core0.data.dir:}</str>
        </updateLog>
        <autoCommit> 
           <maxTime>${solr.autoCommit.maxTime:15000}</maxTime> 
           <openSearcher>false</openSearcher> 
         </autoCommit>
    </updateHandler>
 
    <!-- Cache -->
    <query>
        <maxBooleanClauses>1024</maxBooleanClauses>
        <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="256"/>
        <queryResultCache class="solr.LRUCache" size="5120" initialSize="5120" autowarmCount="2560"/>
        <documentCache class="solr.LRUCache" size="5120" initialSize="5120" autowarmCount="2560"/>
        <fieldValueCache class="solr.FastLRUCache" size="512" autowarmCount="128" showItems="64" />
        <cache name="perSegFilter" class="solr.search.LRUCache" size="10"
            initialSize="5" autowarmCount="10" regenerator="solr.NoOpRegenerator" />
        <enableLazyFieldLoading>true</enableLazyFieldLoading>
        <queryResultWindowSize>20</queryResultWindowSize>
        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
        <!--  CacheFirstQuery  -->
        <listener event="newSearcher" class="solr.QuerySenderListener">
            <arr name="_type">
                <lst><str name="fq">_type</str></lst>
            </arr>
            <arr name="zipCode">
                <lst><str name="fq">zipCode</str></lst>
            </arr>
            <arr name="categorys">
                <lst><str name="fq">categorys</str></lst>
            </arr>
            <arr name="overall">
                <lst><str name="fq">overall</str></lst>
            </arr>
            <arr name="languageSpoken">
                <lst><str name="fq">languageSpoken</str></lst>
            </arr>
            <arr name="status">
                <lst><str name="fq">status</str></lst>
            </arr>
            <arr name="createDate">
                <lst><str name="fq">createDate</str></lst>
            </arr>
            <arr name="budget">
                <lst>
                    <str name="facet.field">budget_min</str>
                    <str name="facet.field">budget_max</str>
                </lst>
            </arr>
            <arr name="budgets">
                <lst>
                    <str name="facet.field">budget_min:[0 TO 50] AND budget_max:{0 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 100] AND budget_max:{50 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 150] AND budget_max:{100 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 200] AND budget_max:{150 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 300] AND budget_max:{200 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 400] AND budget_max:{300 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 500] AND budget_max:{400 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 1000] AND budget_max:{500 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 2000] AND budget_max:{1500 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 3000] AND budget_max:{2000 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 4000] AND budget_max:{3000 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 5000] AND budget_max:{4000 TO *]</str>
                    <str name="facet.field">budget_max:[5000 TO *]</str>
                </lst>
            </arr>
        </listener>
        <listener event="firstSearcher" class="solr.QuerySenderListener">
            <arr name="_type">
                <lst><str name="fq">_type</str></lst>
            </arr>
            <arr name="zipCode">
                <lst><str name="fq">zipCode</str></lst>
            </arr>
            <arr name="categorys">
                <lst><str name="fq">categorys</str></lst>
            </arr>
            <arr name="overall">
                <lst><str name="fq">overall</str></lst>
            </arr>
            <arr name="languageSpoken">
                <lst><str name="fq">languageSpoken</str></lst>
            </arr>
            <arr name="status">
                <lst><str name="fq">status</str></lst>
            </arr>
            <arr name="createDate">
                <lst><str name="fq">createDate</str></lst>
            </arr>
            <arr name="budget">
                <lst>
                    <str name="facet.field">budget_min</str>
                    <str name="facet.field">budget_max</str>
                </lst>
            </arr>
            <arr name="budgets">
                <lst>
                    <str name="facet.field">budget_min:[0 TO 50] AND budget_max:{0 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 100] AND budget_max:{50 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 150] AND budget_max:{100 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 200] AND budget_max:{150 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 300] AND budget_max:{200 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 400] AND budget_max:{300 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 500] AND budget_max:{400 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 1000] AND budget_max:{500 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 2000] AND budget_max:{1500 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 3000] AND budget_max:{2000 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 4000] AND budget_max:{3000 TO *]</str>
                    <str name="facet.field">budget_min:[0 TO 5000] AND budget_max:{4000 TO *]</str>
                    <str name="facet.field">budget_max:[5000 TO *]</str>
                </lst>
            </arr>
        </listener>
        <useColdSearcher>false</useColdSearcher>
        <maxWarmingSearchers>2</maxWarmingSearchers>
    </query>
 
    <requestDispatcher handleSelect="false">
        <requestParsers enableRemoteStreaming="true"
            multipartUploadLimitInKB="2048000" formdataUploadLimitInKB="2048"
            addHttpRequestToContext="false" />
        <httpCaching never304="true" />
    </requestDispatcher>
 
    <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
            <int name="rows">10</int>
            <str name="df">id</str>
        </lst>
    </requestHandler>
 
    <requestHandler name="/query" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
            <str name="wt">json</str>
            <str name="indent">true</str>
            <str name="df">text</str>
        </lst>
    </requestHandler>
 
    <requestHandler name="/get" class="solr.RealTimeGetHandler">
        <lst name="defaults">
            <str name="omitHeader">true</str>
            <str name="wt">json</str>
            <str name="indent">true</str>
        </lst>
    </requestHandler>
    <requestHandler name="/update" class="solr.UpdateRequestHandler" />
    <requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />
    <requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" />
    <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
    <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
 
    <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
        <lst name="invariants">
            <str name="q">solrpingquery</str>
        </lst>
        <lst name="defaults">
            <str name="echoParams">all</str>
            <str name="df">id</str>
        </lst>
    </requestHandler>
    <requestHandler name="/replication" class="solr.ReplicationHandler"
        startup="lazy" />
 
    <queryResponseWriter name="json" class="solr.JSONResponseWriter">
        <str name="content-type">text/plain; charset=UTF-8</str>
    </queryResponseWriter>
    <queryResponseWriter name="velocity"
        class="solr.VelocityResponseWriter" startup="lazy" />
 
    <admin>
        <defaultQuery>*:*</defaultQuery>
    </admin>
 
</config>

 

 

Below I explain the key parts of this file:

 lib

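The lib directive tells Solr how to load the jar files that Solr plugins depend on. The comments in solrconfig.xml include configuration examples.

For example: <lib dir="./lib" regex="lucene-\w+\.jar"/>

Here dir is the path of a directory containing jar files, relative to the root directory of the current core. regex is a regular expression used to filter file names; only jar files whose names match it are loaded.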
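As a rough sketch of how several lib directives might sit together near the top of solrconfig.xml: the directory name ../shared-lib below is a hypothetical path chosen for illustration, not one taken from this article's setup.

```xml
<config>
  <!-- Load every jar found in ./lib, resolved relative to the core's root directory. -->
  <lib dir="./lib" />

  <!-- Hypothetical shared directory: load only the jars whose file names match the regex. -->
  <lib dir="../shared-lib" regex="lucene-\w+\.jar" />

  <!-- ... the rest of the configuration follows ... -->
</config>
```

The lib directives are processed in the order in which they appear in the file, so the order matters when one plugin depends on another.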

dataDir parameter

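For example: <dataDir>/var/data/solr</dataDir>

Specifies the directory that holds Solr's index data. The index that Solr builds is stored under data/index. By default, dataDir is relative to the current core directory (if a core exists under solr_home); if there is no core under solr_home, dataDir is relative to solr_home instead. In practice, dataDir is usually configured in core.properties.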

codecFactory

Sets the codec factory class used for the Lucene inverted index. The default implementation is the officially provided SchemaCodecFactory class.

indexConfig Section

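Inside the <indexConfig> tag of solrconfig.xml there are many comments that describe these options:

<!-- maxFieldLength was removed in 4.0. To get similar behavior, include a
     LimitTokenCountFilterFactory in your fieldType definition. E.g.
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
-->

This comment tells us that the maxFieldLength option was removed as of version 4.0, and that a filter can be configured to get a similar effect. maxTokenCount means that when a field is analyzed, at most the first 10000 tokens are kept and the rest of the field value is discarded. Likewise, with the old maxFieldLength set to 1000, only the first 1000 terms of a field value were indexed.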
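To illustrate where that filter goes, here is a minimal sketch of a field type in schema.xml (not solrconfig.xml). The field type name text_limited and the choice of tokenizer and lowercase filter are assumptions made for this example only.

```xml
<!-- schema.xml: hypothetical field type that keeps at most the first 10000 tokens -->
<fieldType name="text_limited" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Placed last so that the limit applies to the tokens that would actually be indexed -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
  </analyzer>
</fieldType>
```

Because the limit is set per field type, different fields can use different limits, which the old global maxFieldLength setting could not do.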

writeLockTimeout

<writeLockTimeout>1000</writeLockTimeout>

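writeLockTimeout is the maximum time, in milliseconds, that an IndexWriter instance waits to acquire the write lock. If the lock has still not been acquired when the timeout expires, the index write operation throws an exception.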

 

 

maxIndexingThreads

<maxIndexingThreads>8</maxIndexingThreads>

The maximum number of threads that may index documents at the same time. The default is 8.

useCompoundFile

<useCompoundFile>false</useCompoundFile>

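Whether to enable the compound file format. Enabling it reduces the number of index files that are created, and therefore the number of file descriptors in use, at some cost in performance. It is enabled by default in Lucene, but has been disabled by default in Solr since version 3.6.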

ramBufferSizeMB

 <ramBufferSizeMB>100</ramBufferSizeMB>

The size of the in-memory buffer used while indexing, in MB. The default is 100 MB.

maxBufferedDocs

<maxBufferedDocs>1000</maxBufferedDocs>

The maximum number of documents buffered in memory before they are written to disk. Exceeding this number triggers a flush of the index.

mergePolicy

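<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
</mergePolicy>

Configures the Lucene segment merge policy. It takes two parameters:

maxMergeAtOnce: the maximum number of segments merged at one time.

segmentsPerTier: the number of segments allowed per tier. Smaller values mean more merging but fewer segments.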

 

 mergeScheduler

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>

mergeScheduler configures the class that actually carries out segment merges. The default implementation is ConcurrentMergeScheduler, which ships with Lucene.
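Putting the index-related options above together, an indexConfig section might look roughly like this. It is a sketch that assumes Solr 4.x, as in the sample file, and simply reuses the values quoted in this article; it is not a tuning recommendation.

```xml
<indexConfig>
  <!-- Maximum wait for the IndexWriter write lock, in milliseconds -->
  <writeLockTimeout>1000</writeLockTimeout>
  <maxIndexingThreads>8</maxIndexingThreads>
  <useCompoundFile>false</useCompoundFile>

  <!-- If both limits are set, a flush happens when whichever limit is hit first -->
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <maxBufferedDocs>1000</maxBufferedDocs>

  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
</indexConfig>
```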

 

lockType

<lockType>${solr.lock.type:native}</lockType>

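This specifies which Lucene LockFactory implementation to use. The available options are:

         single = SingleInstanceLockFactory - suggested for a
              read-only index or when there is no possibility of
              another process trying to modify the index.
         native = NativeFSLockFactory - uses OS native file locking.
              Do not use when multiple solr webapps in the same
              JVM are attempting to share a single index.
         simple = SimpleFSLockFactory  - uses a plain file for locking
         Defaults: 'native' is default for Solr3.6 and later, otherwise 'simple' is the default

single: Lucene's SingleInstanceLockFactory, intended for a read-only index or for cases where no other process will ever try to modify the index.

native: Lucene's NativeFSLockFactory, which uses the operating system's native file locking. Do not use it when several Solr webapps in the same JVM share one index.

simple: Lucene's SimpleFSLockFactory, which works by creating a write.lock file on disk.

Defaults: this line is not a value you can configure. It describes what happens when lockType is not specified: from Solr 3.6 on the default is native, and in earlier versions it is simple. So the lock implementation you get without an explicit setting depends on your Solr version.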
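The lockType line in the sample file uses Solr's property substitution syntax, which is worth spelling out. The sketch below only annotates that line; the alternative value simple mentioned afterwards is just an example.

```xml
<indexConfig>
  <!-- ${name:default} is resolved when the configuration is loaded:
       the value of the solr.lock.type system property is used if it is set,
       otherwise the text after the colon ("native") is used. -->
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>
```

With this in place, starting the JVM with -Dsolr.lock.type=simple switches the lock implementation without editing solrconfig.xml.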

 

unlockOnStartup

<unlockOnStartup>false</unlockOnStartup>

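If this is set to true, any write or commit locks that are still held are released when Solr starts. This defeats Lucene's locking mechanism, so use it with care. If lockType is set to single, this setting has no effect either way.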

 

deletionPolicy

<deletionPolicy class="solr.SolrDeletionPolicy">

Configures the index deletion policy. The default is Solr's SolrDeletionPolicy implementation. If you need a custom deletion policy, your class must implement Lucene's org.apache.lucene.index.IndexDeletionPolicy.
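For reference, SolrDeletionPolicy accepts a few parameters that control how many commit points are kept. The following sketch uses the values I recall from the stock example configuration; treat them as illustrative rather than recommended.

```xml
<indexConfig>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <!-- Number of commit points to keep -->
    <str name="maxCommitsToKeep">1</str>
    <!-- Number of optimized commit points to keep -->
    <str name="maxOptimizedCommitsToKeep">0</str>
    <!-- Optionally delete commit points once they reach this age:
    <str name="maxCommitAge">30MINUTES</str>
    -->
  </deletionPolicy>
</indexConfig>
```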

 

jmx

<jmx />

This setting enables JMX in Solr. For details, see the official Solr wiki at the following address:

http://wiki.apache.org/solr/SolrJmx

 updateHandler

<updateHandler class="solr.DirectUpdateHandler2">

Specifies the class that handles index update operations. DirectUpdateHandler2 is a high-performance update handler that supports soft commits.

updateLog

<updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<updateLog> specifies where the updateHandler above stores its transaction log. The default is Solr's data directory, that is, the directory configured by dataDir.
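Since DirectUpdateHandler2 supports soft commits, a combined updateHandler section could be sketched as follows. The autoCommit values are the ones from the sample file; the autoSoftCommit interval of 1000 ms is an assumption chosen for illustration.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: flushes to disk every 15 s but does not open a new searcher -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes recent changes visible to searches (interval is illustrative) -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Separating the two lets durability (hard commits) and search visibility (soft commits) be tuned independently.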

filterCache

<filterCache class="solr.FastLRUCache" size="512"
                 initialSize="512" autowarmCount="0"/>

Configures the cache used for filters (for example, the document sets produced by fq parameters).

queryResultCache

<queryResultCache class="solr.LRUCache"
                      size="512"  initialSize="512" autowarmCount="0"/>

Configures the cache for the result sets returned by queries, that is, the TopDocs (ordered lists of document ids).

documentCache

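<documentCache class="solr.LRUCache"
                   size="512" initialSize="512"  autowarmCount="0"/>

Configures the cache for the stored fields of a Document, because loading stored field values from disk is expensive every time. Stored fields here means fields declared with Store.YES.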

fieldValueCache

<fieldValueCache class="solr.FastLRUCache"
                        size="512" autowarmCount="128" showItems="32" />

This cache holds field values so that they can be accessed quickly by document id. It is created by default, even when it is not configured explicitly.

 

cache

<cache name="myUserCache"
              class="solr.LRUCache"  size="4096" initialSize="1024" autowarmCount="1024"
              regenerator="com.mycompany.MyRegenerator" />

This configures a custom cache of your own. Your regenerator class must implement Solr's CacheRegenerator interface.

 useFilterForSortedQuery

<useFilterForSortedQuery>true</useFilterForSortedQuery>

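When a query is not sorted by score, this controls whether Solr may use a filter from the filterCache in place of the Query to satisfy the search.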

QuerySenderListener

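<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!--
       <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
       <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst>
      -->
  </arr>
</listener>

QuerySenderListener is used to warm a searcher. When the configured event fires (newSearcher or firstSearcher), it sends each query listed under queries to the new searcher, so that the caches are populated before the searcher serves real requests. In the example above, each warming query specifies a q keyword and a sort order.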
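As far as I know, QuerySenderListener reads its warming queries from a single array named queries, with one lst per query. A sketch using field names from the sample file at the top of this article (status, createDate, categorys, budget_max) might look like this; the value active and the sort order are hypothetical.

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Warm the filter cache and the sort on createDate -->
    <lst>
      <str name="q">*:*</str>
      <str name="fq">status:active</str>
      <str name="sort">createDate desc</str>
    </lst>
    <!-- Warm faceting; range expressions go in facet.query, plain field names in facet.field -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">categorys</str>
      <str name="facet.query">budget_max:[5000 TO *]</str>
    </lst>
  </arr>
</listener>
```

The same list can be repeated under a listener for the firstSearcher event, which fires when there is no previous searcher to autowarm from.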

  

searchComponent

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

Configures a search component, here the SpellCheckComponent for spell checking.
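A search component only takes effect once a request handler refers to it. The sketch below shows one possible wiring; the handler name /spell, the field type text_general, and the field text are assumptions for this example and must exist in your own schema.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Field type whose analyzer is applied to the query terms (assumed to exist) -->
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Field whose indexed terms act as the dictionary (assumed to exist) -->
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">text</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
  </lst>
  <!-- Run the spellcheck component after the standard components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```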

 

<searchComponent name="terms" class="solr.TermsComponent"/>

Returns the terms in the index together with the document frequency of each term, that is, the number of documents that contain it.

 

<searchComponent class="solr.HighlightComponent" name="highlight">

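Configures keyword highlighting. I will skip the details of Solr's highlighting configuration for now. This article only aims to give a rough idea of what each option means; how to use them is left for later, more in-depth posts.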

 


 

requestHandler

Master/slave replication configuration

Master node
<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
       <str name="replicateAfter">startup</str>
       <str name="replicateAfter">commit</str>
       <str name="replicateAfter">optimize</str>
       <str name="confFiles">schema.xml</str>
    </lst>
</requestHandler>
<updateHandler class="solr.DirectUpdateHandler2">
   <autoCommit>
      <maxDocs>1</maxDocs>
      <maxTime>1000</maxTime>
      <openSearcher>false</openSearcher>
   </autoCommit>
</updateHandler>
 
Slave node
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="slave">
        <str name="masterUrl">http://10.28.175.246:8080/solr/waiter</str>
        <str name="pollInterval">00:00:20</str>
   </lst>
</requestHandler>

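These are some settings that I copied from the wiki and used previously in the Adystem distributed Solr setup.

Notes:

  • masterUrl: the URL of the master server to replicate from
  • pollInterval: the polling interval of the slave in HH:mm:ss format, i.e. how often it synchronizes with the master
  • httpConnTimeout: the connection timeout (in milliseconds)
  • httpReadTimeout: the read timeout (in milliseconds); raise it if the index files being replicated are large
  • httpBasicAuthUser: the user name for authentication; it must match the one on the master
  • httpBasicAuthPassword: the password for authentication; it must match the one on the master
  • compression: external or internal; internal uses Solr's own compression, external relies on the application container's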
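A slave configuration that sets all of the options described above could be sketched as follows. The host name master.example.com, the credentials, and the timeout values are placeholders, not values from the original deployment.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Placeholder URL of the master core to pull from -->
    <str name="masterUrl">http://master.example.com:8080/solr/waiter</str>
    <!-- Poll the master every 20 seconds (HH:mm:ss) -->
    <str name="pollInterval">00:00:20</str>
    <!-- Use Solr's own compression for the transferred files -->
    <str name="compression">internal</str>
    <!-- Timeouts in milliseconds (illustrative values) -->
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">10000</str>
    <!-- Placeholder credentials; only needed when the master requires basic auth -->
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>
```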

 

 

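To sum up:

The settings in solrconfig.xml fall into the following main groups:

     1. The Lucene version that Solr depends on. This determines the structure of the Lucene index you create. Index formats are not fully compatible across Lucene versions, so this deserves your attention.

     2. Index creation settings, such as the index directory and the options of the IndexWriterConfig class (which determine indexing performance).

     3. The load paths of the external jar files that solrconfig.xml depends on.

     4. JMX settings.

     5. Cache settings, including the filter cache, the query result cache, the document cache, and custom caches.

     6. updateHandler settings, i.e. the settings for index update operations.

     7. RequestHandler settings, i.e. the classes that handle HTTP requests from clients.

     8. Search component settings, such as highlighting and SpellChecker.

     9. ResponseWriter settings, i.e. the response converters that determine the format in which response data is returned to the client.

     10. Custom ValueSourceParser settings, used to influence the weight, score, and sort order of Documents.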
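The last item is the only one not illustrated earlier. Registering a custom function parser takes a single element; the name myfunc and the class com.mycompany.MyValueSourceParser below are hypothetical placeholders for your own implementation.

```xml
<!-- Registers a hypothetical custom function under the name "myfunc" -->
<valueSourceParser name="myfunc" class="com.mycompany.MyValueSourceParser" />
```

Once registered, the function can be referenced by that name in function queries, for example in boost or sort expressions.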
