Solr


1、 Solr Introduction

 

1 Full-Text Search

 
What exactly is full-text search? To answer that, let's start with the data around us.
Broadly speaking, the data in our lives falls into two categories: structured data and unstructured data.
 
 
1) Structured data: data with a fixed format or a bounded length, such as database records and metadata.
2) Unstructured data: data with no fixed format or length, such as emails and Word documents.
 
Unstructured data is also known as full-text data.
 
Matching this classification, search also comes in two kinds:
1) Searching structured data: for example, querying a database with SQL statements.
2) Searching unstructured data: for example, Windows file-content search, the grep command on Linux, or searching massive amounts of content with Google and Baidu.
 

2 Lucene

 
Lucene is an efficient, Java-based full-text retrieval library.
 
Lucene is a subproject of the Apache Software Foundation's Jakarta project: an open-source full-text retrieval toolkit. It is not a complete full-text search engine but rather the architecture for one. Lucene's goal is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it. Maintained and distributed by the Apache Software Foundation, Lucene provides a simple yet powerful API for full-text indexing and search. It is a mature, free, open-source tool in the Java ecosystem, and in recent years it has been the most popular free Java information-retrieval library.
 

3 Solr Overview

 
Solr is a Lucene-based web application for enterprise search.
Solr is a standalone, enterprise-grade search server that exposes a web-service-like API. Clients submit XML documents in a prescribed format to the search server over HTTP to build the index, and issue search requests via HTTP GET to receive results in XML or JSON.
Solr is a high-performance full-text search server written in Java 5 and built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and ships with a complete administration UI, making it an excellent full-text search engine.
Documents are added to a search collection as XML over HTTP; the collection is likewise queried over HTTP, returning an XML/JSON response. Its main features include: efficient and flexible caching, vertical search, hit highlighting, index replication for higher availability, a powerful data schema for defining fields, types, and text analysis, and a web-based administration interface.
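
A minimal sketch of this HTTP interface (assuming a default Solr 4 install under Tomcat at localhost:8080 with the example core collection1; the host, port, and field names here are assumptions):

# index one document, committing immediately
curl "http://localhost:8080/solr/collection1/update?commit=true" \
     -H "Content-Type: text/xml" \
     --data-binary '<add><doc><field name="id">1</field><field name="name">test doc</field></doc></add>'

# query it back as JSON
curl "http://localhost:8080/solr/collection1/select?q=name:test&wt=json"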
 

2、 Standalone Solr Installation

 

1 Environment

 

1.1 Install the JDK

 

1.1.1 JDK version:

 
jdk-8u11-linux-x64.tar.gz
 

1.1.2 Configure environment variables

vim /etc/profile

export JAVA_HOME=/usr/local/jdk
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
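
To apply the profile and verify the installation (assuming the JDK was extracted to /usr/local/jdk, as JAVA_HOME above implies):

source /etc/profile
java -version    # should report 1.8.0_11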

1.2 Install Tomcat

 

1.2.1 Tomcat version

 
apache-tomcat-7.0.47.tar.gz
 

2 Install Solr

 
Solr version: solr-4.10.3.tgz
 

2.1 Solr consists of two parts:

 
1) The Solr web service
2) The Solr index library


2.2 Upload the Solr package
 

2.3 Extract the Solr package
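
For example (the package location and the /usr/local target directory are assumptions):

cd /usr/local
tar -zxf solr-4.10.3.tgz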

 

2.4 Solr directory layout

 
bin: scripts for starting Solr; they depend on the bundled Jetty container
contrib: Solr's support for third-party plugins
dist: artifacts produced by building Solr (.war and .jar files)
example: Solr examples. The following entries in this directory matter most to us:
1) webapps: holds a Solr war package. It is the same war as the one in the dist directory, just stored under a different path and name.
2) solr: a standard example Solr index library.
3) lib/ext: jar packages for logging. The Solr web service also depends on these logging jars, so when installing the Solr service we must copy the jars in this directory into the Solr webapp.
 

2.5 Install the Solr service

 
Installing the Solr service simply means copying the Solr war package into Tomcat's webapps directory.
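
For example (assuming Solr was extracted to /usr/local/solr-4.10.3 and Tomcat lives at /usr/local/tomcat; both paths are assumptions):

cp /usr/local/solr-4.10.3/dist/solr-4.10.3.war /usr/local/tomcat/webapps/solr.war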
 

2.6 Start Tomcat to unpack the war

 
Watch Tomcat's startup log to confirm the war is being unpacked:
tailf logs/catalina.out
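
Put together (the Tomcat path is an assumption; on newer distributions tailf is spelled tail -f):

/usr/local/tomcat/bin/startup.sh
tail -f /usr/local/tomcat/logs/catalina.out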
 

2.7 Add the jars the service depends on

 
The unpacked Solr webapp depends on some logging jars. Before adding these dependencies we must delete the original war; otherwise, the next time Tomcat starts it will overwrite the unpacked directory and the newly added jars will be lost. Note: delete the war only while Tomcat is stopped. If you delete it while Tomcat is running, Tomcat will delete the unpacked directory as well when it shuts down.
 
Copy the jars under lib/ext into the lib directory of the Solr webapp in Tomcat.
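
A sketch with the paths assumed above (stop Tomcat before deleting the war):

/usr/local/tomcat/bin/shutdown.sh
rm /usr/local/tomcat/webapps/solr.war
cp /usr/local/solr-4.10.3/example/lib/ext/*.jar /usr/local/tomcat/webapps/solr/WEB-INF/lib/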

2.8 Install the Solr index library

 
Under the example directory of the extracted Solr package there is a solr directory, which is a basic example index library.
 

2.9 Copy the index library

 
Copy this index library to a directory of your choice. Although any directory works, it should still be somewhere sensible; here we use /usr/local/solrhome. Create the solrhome directory first.
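
For example:

mkdir -p /usr/local/solrhome
cp -r /usr/local/solr-4.10.3/example/solr/* /usr/local/solrhome/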
 

2.10 Point the Solr service at the index library

 
Configure the index library's location in the Solr service. Note: the path to configure is the root of the index library; you can check the absolute path with the Linux pwd command. Add this path to the Solr webapp's web.xml: on startup, the Solr service reads the index library's absolute location from a node in its web.xml. Run vim web.xml and find the <env-entry> node. Note: this node is commented out by default, so remove the comment first, then put the path of the copied index library into its <env-entry-value> node.
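
After editing, the node should look like this (the value assumes the /usr/local/solrhome path from step 2.9):

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/usr/local/solrhome</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>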

2.11 Access the Solr service

 
Start Tomcat; you can now operate Solr through its management pages. Once Tomcat is up, open a browser and enter Solr's URL to reach the service.
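
For example (assuming Tomcat listens on port 8080 and the webapp is named solr):

http://localhost:8080/solr/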

3 The Solr Index Library

 
3.1 solr home directory layout

3.1.1 solr.xml: configures a Solr cluster

3.1.2 collection1: an index library (a Solr core)

3.1.3 core.properties: sets the index library's name

3.1.4 data: stores the index

3.1.5 conf: the index library's configuration directory

3.1.5.1 schema.xml: configures fields and field types
 

3.2 Index library configuration


schema.xml defines the fields of the indexed data, including each field's name and type, and whether the field is indexed, analyzed, stored, and so on.
 

3.2.1 Defining a Field in the index library

 
<field>: defines a field
<field name="_version_" type="long" indexed="true" stored="true"/>
name: the field's name; a mandatory attribute.
type: the name of the field's type, matching the name attribute of a fieldType element; a mandatory attribute.
indexed: whether the field participates in search; true means the field is indexed. Defaults to false.
stored: whether the field's content is stored in the document store; put simply, whether this field can be returned as part of a query result.
required: whether the field must be present in every document. Defaults to false; if set to true, indexing a document without this field throws an exception.
 

3.2.2 Defining a FieldType in the index library

 
<fieldType>: defines a field type
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
name: the field type's name. It acts as the type's identifier and is the value referenced by a field's type attribute.
class: the field type's implementation class, pointing at a type built into Solr or a user-defined one; the field type's data is instantiated as objects of this class.
sortMissingFirst/sortMissingLast: control where a document is placed in the sort order when it has no value for the sort field; true places it at the head/tail of the results, respectively.
 

3.2.3 Defining a CopyField in the index library

 
<copyField>: a copy field. By copying several source fields into one destination field, it separates the fields used for updates from the field used for queries.
<copyField source="item_title" dest="item_keywords"/>
source: the source field
dest: the destination field
 

3.2.4 Solr's indexing mechanism

 

3.2.4.1 Forward index

 
A forward index is keyed by document ID and records the position of every term inside each document. At query time, the term information of every document in the index is scanned until all documents containing the query terms are found.
Because every document must be scanned to guarantee nothing is missed, retrieval time grows sharply and efficiency is poor.
So although the forward index works in a very simple way, its low retrieval efficiency leaves it with little practical value outside special cases.
A forward index maps a document ID to its terms.

3.2.4.2 Inverted index

 
The data is analyzed to extract its terms; each term becomes a key, and the storage locations of the matching data become the values. An index stored this way is called an inverted index.
When Solr stores a document, it first tokenizes the document's data, then builds the index and the document store. Tokenization means splitting a span of text into words according to certain rules.

An inverted index maps a term to document IDs.
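
A minimal illustration with two invented documents:

Doc 1: "solr is a search server"
Doc 2: "lucene is a search library"

Forward index (document -> terms):
  1 -> solr, is, a, search, server
  2 -> lucene, is, a, search, library

Inverted index (term -> documents):
  search -> 1, 2
  solr   -> 1
  lucene -> 2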
 

3.2.5 Configure a Chinese analyzer (IK Analyzer)

 
3.2.5.1 Upload the Chinese analyzer jar and its configuration files

3.2.5.2 Copy the analyzer's configuration files and jar into the corresponding Solr directories

The configuration files must go into the classes directory.

Solr's WEB-INF has no classes directory by default, so we need to create one first.
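
A sketch, assuming the IK files were uploaded to /root/ik and Tomcat lives at /usr/local/tomcat (both paths and the exact jar name are assumptions):

mkdir -p /usr/local/tomcat/webapps/solr/WEB-INF/classes
cp /root/ik/IKAnalyzer2012FF_u1.jar /usr/local/tomcat/webapps/solr/WEB-INF/lib/
cp /root/ik/IKAnalyzer.cfg.xml /root/ik/stopword.dic /usr/local/tomcat/webapps/solr/WEB-INF/classes/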

3.2.5.3 Configure the Chinese analyzer in schema.xml
 

 

The Chinese analyzer configuration is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--  
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default) 
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml

 PERFORMANCE NOTE: this schema includes many optional features and should not
 be used for benchmarking.  To improve performance one could
  - set stored="false" for all fields possible (esp large fields) when you
    only need to search on the field but don't need to return the original
    value.
  - set indexed="false" if you don't need to search on the field, but only
    return the field as a result of searching on other indexed fields.
  - remove all unneeded copyField statements
  - for best index size and searching performance, set "index" to false
    for all general text fields, use copyField to copy them to the
    catchall "text" field, and use that for searching.
  - For maximum indexing performance, use the ConcurrentUpdateSolrServer
    java client.
  - Remember to run the JVM in server mode, and use a higher logging level
    that avoids logging every request
-->

<schema name="example" version="1.5">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
       version="x.y" is Solr's version number for the schema syntax and 
       semantics.  It should not normally be changed by applications.

       1.0: multiValued attribute did not exist, all fields are multiValued 
            by nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default 
            except for text fields.
       1.3: removed optional field compress feature
       1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
            behavior when a single string produces multiple tokens.  Defaults 
            to off for version >= 1.4
       1.5: omitNorms defaults to true for primitive field types 
            (int, float, boolean, string...)
     -->


   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a field type from the 
       <types> fieldType section
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     docValues: true if this field should have doc values. Doc values are
       useful for faceting, grouping, sorting and function queries. Although not
       required, doc values will make the index faster to load, more
       NRT-friendly and more memory-efficient. They however come with some
       limitations: they are currently only supported by StrField, UUIDField
       and all Trie*Fields, and depending on the field type, they might
       require the field to be single-valued, be required or have a default
       value (check the documentation of the field type you're interested in
       for more information)
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.
       Norms are omitted for primitive (non-analyzed) types by default.
     termVectors: [false] set to true to store the term vector for a
       given field.
       When using MoreLikeThis, fields used for similarity should be
       stored for best performance.
     termPositions: Store position information with the term vector.  
       This will increase storage costs.
     termOffsets: Store offset information with the term vector. This 
       will increase storage costs.
     required: The field is required.  It will throw an error if the
       value does not exist
     default: a value that should be used if no value is specified
       when adding a document.
   -->

   <!-- field names should consist of alphanumeric or underscore characters only and
      not start with a digit.  This is not currently strictly enforced,
      but other field names will not have first class support from all components
      and back compatibility is not guaranteed.  Names with both leading and
      trailing underscores (e.g. _version_) are reserved.
   -->

   <!-- If you remove this field, you must _also_ disable the update log in solrconfig.xml
      or Solr won't start. _version_ and update log are required for SolrCloud
   --> 
   <field name="_version_" type="long" indexed="true" stored="true"/>
   
   <!-- points to the root document of a block of nested documents. Required for nested
      document support, may be removed otherwise
   -->
   <field name="_root_" type="string" indexed="true" stored="false"/>

   <!-- Only remove the "id" field if you have a very good reason to. While not strictly
     required, it is highly recommended. A <uniqueKey> is present in almost all Solr 
     installations. See the <uniqueKey> declaration below where <uniqueKey> is set to "id".
   -->  
   <!--
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
        -->
   <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="text_ik" indexed="true" stored="true"/>
   <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
   <field name="price"  type="float" indexed="true" stored="true"/>
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />

   <field name="store" type="location" indexed="true" stored="true"/>

<field name="id" type="string" indexed="true" stored="true" 
required="true" multiValued="false" /> 
<field name="item_title" type="text_ik" indexed="true" stored="true"/>
<field name="item_sell_point" type="text_ik" indexed="true" stored="true"/>
<field name="item_price" type="long" indexed="false" stored="true"/>
<field name="item_image" type="string" indexed="false" stored="true" />


<field name="item_keywords" type="text_ik" indexed="true" stored="false" 
multiValued="true"/>
<copyField source="item_title" dest="item_keywords"/>
<copyField source="item_sell_point" dest="item_keywords"/>

   <!-- Common metadata fields, named specifically to match up with
     SolrCell metadata when parsing rich documents such as Word, PDF.
     Some fields are multiValued only because Tika currently may return
     multiple values for them. Some metadata is parsed from the documents,
     but there are some which come from the client context:
       "content_type": From the HTTP headers of incoming stream
       "resourcename": From SolrCell request param resource.name
   -->
   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="comments" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="resourcename" type="text_general" indexed="true" stored="true"/>
   <field name="url" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

   <!-- Main body of document extracted by SolrCell.
        NOTE: This field is not indexed by default, since it is also copied to "text"
        using copyField below. This is to save space. Use this field for returning and
        highlighting document content. Use the "text" field to search the content. -->
   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
   

   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema  -->
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

   <!-- catchall text field that indexes tokens both normally and in reverse for efficient
        leading wildcard queries. -->
   <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>

   <!-- non-tokenized version of manufacturer to make it easier to sort or group
        results by manufacturer.  copied from "manu" via copyField -->
   <field name="manu_exact" type="string" indexed="true" stored="false"/>

   <field name="payloads" type="payloads" indexed="true" stored="true"/>


   <!--
     Some fields such as popularity and manu_exact could be modified to
     leverage doc values:
     <field name="popularity" type="int" indexed="true" stored="true" docValues="true" />
     <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
     <field name="cat" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>


     Although it would make indexing slightly slower and the index bigger, it
     would also make the index faster to load, more memory-efficient and more
     NRT-friendly.
     -->

   <!-- Dynamic field definitions allow using convention over configuration
       for fields via the specification of patterns to match field names. 
       EXAMPLE:  name="*_i" will match any field ending in _i (like myid_i, z_i)
       RESTRICTION: the glob-like pattern in the name attribute must have
       a "*" only at the start or the end.  -->
   
   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
   <dynamicField name="*_is" type="int"    indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />
   <dynamicField name="*_ss" type="string"  indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
   <dynamicField name="*_ls" type="long"   indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
   <dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_en"  type="text_en"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_b"  type="boolean" indexed="true" stored="true"/>
   <dynamicField name="*_bs" type="boolean" indexed="true" stored="true"  multiValued="true"/>
   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
   <dynamicField name="*_fs" type="float"  indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
   <dynamicField name="*_ds" type="double" indexed="true"  stored="true"  multiValued="true"/>

   <!-- Type used to index the lat and lon components for the "location" FieldType -->
   <dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false" />

   <dynamicField name="*_dt"  type="date"    indexed="true"  stored="true"/>
   <dynamicField name="*_dts" type="date"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>

   <!-- some trie-coded dynamic fields for faster range queries -->
   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>

   <dynamicField name="*_c"   type="currency" indexed="true"  stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />

   <!-- uncomment the following to ignore any fields that don't already match an existing 
        field name or dynamic field, rather than reporting them as an error. 
        alternately, change the type="ignored" to some other type e.g. "text" if you want 
        unknown fields indexed and/or stored by default --> 
   <!--dynamicField name="*" type="ignored" multiValued="true" /-->
   



 <!-- Field to use to determine and enforce document uniqueness. 
      Unless this field is marked with required="false", it will be a required field
   -->
 <uniqueKey>id</uniqueKey>

 <!-- DEPRECATED: The defaultSearchField is consulted by various query parsers when
  parsing a query string that isn't explicit about the field.  Machine (non-user)
  generated queries are best made explicit, or they can use the "df" request parameter
  which takes precedence over this.
  Note: Un-commenting defaultSearchField will be insufficient if your request handler
  in solrconfig.xml defines "df", which takes precedence. That would need to be removed.
 <defaultSearchField>text</defaultSearchField> -->

 <!-- DEPRECATED: The defaultOperator (AND|OR) is consulted by various query parsers
  when parsing a query string to determine if a clause of the query should be marked as
  required or optional, assuming the clause isn't already marked by some operator.
  The default is OR, which is generally assumed so it is not a good idea to change it
  globally here.  The "q.op" request parameter takes precedence over this.
 <solrQueryParser defaultOperator="OR"/> -->

  <!-- copyField commands copy one field to another at the time a document
        is added to the index.  It's used either to index the same field differently,
        or to add multiple fields to the same field for easier/faster searching.  -->

   <copyField source="cat" dest="text"/>
   <copyField source="name" dest="text"/>
   <copyField source="manu" dest="text"/>
   <copyField source="features" dest="text"/>
   <copyField source="includes" dest="text"/>
   <copyField source="manu" dest="manu_exact"/>

   <!-- Copy the price into a currency enabled field (default USD) -->
   <copyField source="price" dest="price_c"/>

   <!-- Text fields from SolrCell to search by default in our catch-all field -->
   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="content_type" dest="text"/>
   <copyField source="resourcename" dest="text"/>
   <copyField source="url" dest="text"/>

   <!-- Create a string version of author for faceting -->
   <copyField source="author" dest="author_s"/>
    
   <!-- Above, multiple source fields are copied to the [text] field. 
      Another way to map multiple source fields to the same 
      destination field is to use the dynamic field syntax. 
      copyField also supports a maxChars to copy setting.  -->
       
   <!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->

   <!-- copy name to alphaNameSort, a field designed for sorting by name -->
   <!-- <copyField source="name" dest="alphaNameSort"/> -->
 
  
    <!-- field type definitions. The "name" attribute is
       just a label to be used by field definitions.  The "class"
       attribute and any other attributes determine the real
       behavior of the fieldType.
         Class names starting with "solr" refer to java classes in a
       standard package such as org.apache.solr.analysis
    -->

    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
       It supports doc values but in that case the field needs to be
       single-valued and either required or have a default value.
      -->
      <fieldType name="text_ik" class="solr.TextField">
        <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
        </fieldType>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <!-- sortMissingLast and sortMissingFirst attributes are optional attributes are
         currently supported on types that are sorted internally as strings
         and on numeric types.
         This includes "string","boolean", and, as of 3.5 (and 4.x),
         int, float, long, date, double, including the "Trie" variants.
       - If sortMissingLast="true", then a sort on this field will cause documents
         without the field to come after documents with the field,
         regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause documents
         without the field to come before documents with the field,
         regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
         then default lucene sorting will be used which places docs without the
         field first in an ascending sort and last in a descending sort.
    -->    

    <!--
      Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.

      These fields support doc values, but they require the field to be
      single-valued and either be required or have a default value.
    -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <!--
     Numeric field types that index each value at various levels of precision
     to accelerate range queries when the number of values between the range
     endpoints is large. See the javadoc for NumericRangeQuery for internal
     implementation details.

     Smaller precisionStep values (specified in bits) will lead to more tokens
     indexed per value, slightly larger index size, and faster range queries.
     A precisionStep of 0 disables indexing at different precision levels.
    -->
    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

    <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
         is a more restricted form of the canonical representation of dateTime
         http://www.w3.org/TR/xmlschema-2/#dateTime    
         The trailing "Z" designates UTC time and is mandatory.
         Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
         All other components are mandatory.

         Expressions can also be used to denote calculations that should be
         performed relative to "NOW" to determine the value, ie...

               NOW/HOUR
                  ... Round to the start of the current hour
               NOW-1DAY
                  ... Exactly 1 day prior to now
               NOW/DAY+6MONTHS+3DAYS
                  ... 6 months and 3 days in the future from the start of
                      the current day
                      
         Consult the DateField javadocs for more information.

         Note: For faster range queries, consider the tdate type
      -->
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>


    <!--Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
    <fieldtype name="binary" class="solr.BinaryField"/>

    <!--
      Note:
      These should only be used for compatibility with existing indexes (created with lucene or older Solr versions).
      Use Trie based fields instead. As of Solr 3.5 and 4.x, Trie based fields support sortMissingFirst/Last

      Plain numeric field types that store and index the text
      value verbatim (and hence don't correctly support range queries, since the
      lexicographic ordering isn't equal to the numeric ordering)

      NOTE: These field types are deprecated will be completely removed in Solr 5.0!
    -->
    <!--
    <fieldType name="pint" class="solr.IntField"/>
    <fieldType name="plong" class="solr.LongField"/>
    <fieldType name="pfloat" class="solr.FloatField"/>
    <fieldType name="pdouble" class="solr.DoubleField"/>
    <fieldType name="pdate" class="solr.DateField" sortMissingLast="true"/>
    -->

    <!-- The "RandomSortField" is not used to store or search any
         data.  You can declare fields of this type it in your schema
         to generate pseudo-random orderings of your docs for sorting 
         or function purposes.  The ordering is generated based on the field
         name and the version of the index. As long as the index version
         remains unchanged, and the same field name is reused,
         the ordering of the docs will be consistent.  
         If you want different psuedo-random orderings of documents,
         for the same version of the index, use a dynamicField and
         change the field name in the request.
     -->
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    <!-- solr.TextField allows the specification of custom text analyzers
         specified as a tokenizer and a list of token filters. Different
         analyzers may be specified for indexing and querying.

         The optional positionIncrementGap puts space between multiple fields of
         this type on the same document, with the purpose of preventing false phrase
         matching across fields.

         For more info on customizing your analyzer chain, please see
         http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
     -->

    <!-- One can also specify an existing Analyzer class that has a
         default constructor via the class attribute on the analyzer element.
         Example:
    <fieldType name="text_greek" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
    </fieldType>
    -->

    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <!-- A text type for English text where stopwords and synonyms are managed using the REST API -->
    <fieldType name="managed_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ManagedStopFilterFactory" managed="english" />
        <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
      </analyzer>
    </fieldType>

    <!-- A general text field that has reasonable, generic
         cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words from case-insensitive "stopwords.txt"
     (empty by default), and down cases.  At query time only, it
     also applies synonyms. -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- A text field with defaults appropriate for English: it
         tokenizes with StandardTokenizer, removes English stop words
         (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and
         finally applies Porter's stemming.  The query time analyzer
         also applies synonyms from synonyms.txt. -->
    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
    -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- A text field with defaults appropriate for English, plus
     aggressive word-splitting and autophrase features enabled.
     This field is just like text_en, except it adds
     WordDelimiterFilter to enable splitting and matching of
     words on case-change, alpha numeric boundaries, and
     non-alphanumeric chars.  This means certain compound word
     cases will work, for example query "wi fi" will match
     document "WiFi" or "wi-fi".
        -->
    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Less flexible matching, but less false matches.  Probably not ideal for product names,
         but may be good for SKUs.  Can insert dashes in the wrong place and still match. -->
    <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
             possible with WordDelimiterFilter in conjuncton with stemming. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Just like text_general except it reverses the characters of
     each token, to enable more efficient leading wildcard queries. -->
    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
           maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- charFilter + WhitespaceTokenizer  -->
    <!--
    <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    -->

    <!-- This is an example of using the KeywordTokenizer along
         With various TokenFilterFactories to produce a sortable field
         that does not include some properties of the source text
      -->
    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire
             input string is preserved as a single token
          -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter does what you expect, which can be
             when you want your sorting to be case insensitive
          -->
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />
        <!-- The PatternReplaceFilter gives you the flexibility to use
             Java Regular expression to replace any sequence of characters
             matching a pattern with an arbitrary replacement string, 
             which may include back references to portions of the original
             string matched by the pattern.
             
             See the Java Regular Expression documentation for more
             information on pattern and replacement string syntax.
             
             http://docs.oracle.com/javase/7/docs/api/java/util/regex/package-summary.html
          -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />
      </analyzer>
    </fieldType>
    
    <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldtype>

    <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--
        The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
        a token of "foo|1.4"  would be indexed as "foo" with a payload of 1.4f
        Attributes of the DelimitedPayloadTokenFilterFactory : 
         "delimiter" - a one character delimiter. Default is | (pipe)
     "encoder" - how to encode the following value into a playload
        float -> org.apache.lucene.analysis.payloads.FloatEncoder,
        integer -> o.a.l.a.p.IntegerEncoder
        identity -> o.a.l.a.p.IdentityEncoder
            Fully Qualified class name implementing PayloadEncoder, Encoder must have a no arg constructor.
         -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldtype>

    <!-- lowercases the entire field value, keeping it as a single token.  -->
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>

    <!-- 
      Example of using PathHierarchyTokenizerFactory at index time, so
      queries for paths match documents at that path, or in descendent paths
    -->
    <fieldType name="descendent_path" class="solr.TextField">
      <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
      </analyzer>
      <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
      </analyzer>
    </fieldType>
    <!-- 
      Example of using PathHierarchyTokenizerFactory at query time, so
      queries for paths match documents at that path, or in ancestor paths
    -->
    <fieldType name="ancestor_path" class="solr.TextField">
      <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
      </analyzer>
      <analyzer type="query">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
      </analyzer>
    </fieldType>

    <!-- since fields of this type are by default not stored or indexed,
         any data added to them will be ignored outright.  --> 
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

    <!-- This point type indexes the coordinates as separate fields (subFields)
      If subFieldType is defined, it references a type, and a dynamic field
      definition is created matching *___<typename>.  Alternately, if 
      subFieldSuffix is defined, that is used to create the subFields.
      Example: if subFieldType="double", then the coordinates would be
        indexed in fields myloc_0___double,myloc_1___double.
      Example: if subFieldSuffix="_d" then the coordinates would be indexed
        in fields myloc_0_d,myloc_1_d
      The subFields are an implementation detail of the fieldType, and end
      users normally should not need to know about them.
     -->
    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

    <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

    <!-- An alternative geospatial field type new to Solr 4.  It supports multiValued and polygon shapes.
      For more information about this and other Spatial fields new to Solr 4, see:
      http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
    -->
    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
        geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees" />

    <!-- Spatial rectangle (bounding box) field. It supports most spatial predicates, and has
     special relevancy modes: score=overlapRatio|area|area2D (local-param to the query).  DocValues is required for
     relevancy. -->
    <fieldType name="bbox" class="solr.BBoxField"
        geo="true" units="degrees" numberType="_bbox_coord" />
    <fieldType name="_bbox_coord" class="solr.TrieDoubleField" precisionStep="8" docValues="true" stored="false"/>

   <!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
        Parameters:
          defaultCurrency: Specifies the default currency if none specified. Defaults to "USD"
          precisionStep:   Specifies the precisionStep for the TrieLong field used for the amount
          providerClass:   Lets you plug in other exchange provider backend:
                           solr.FileExchangeRateProvider is the default and takes one parameter:
                             currencyConfig: name of an xml file holding exchange rates
                           solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
                             ratesFileLocation: URL or path to rates JSON file (default latest.json on the web)
                             refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
   -->
    <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />
             


   <!-- some examples for different languages (generally ordered by ISO code) -->

    <!-- Arabic -->
    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- for any non-arabic -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
        <!-- normalizes ﻯ to ﻱ, etc -->
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.ArabicStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Bulgarian -->
    <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/> 
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_bg.txt" /> 
        <filter class="solr.BulgarianStemFilterFactory"/>       
      </analyzer>
    </fieldType>
    
    <!-- Catalan -->
    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ca.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>       
      </analyzer>
    </fieldType>
    
    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine  -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Kurdish -->
    <fieldType name="text_ckb" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
        <!-- for any latin text -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ckb.txt"/>
        <filter class="solr.SoraniStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Czech -->
    <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt" />
        <filter class="solr.CzechStemFilterFactory"/>       
      </analyzer>
    </fieldType>
    
    <!-- Danish -->
    <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_da.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>       
      </analyzer>
    </fieldType>
    
    <!-- German -->
    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Greek -->
    <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- greek specific lowercase for sigma -->
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" />
        <filter class="solr.GreekStemFilterFactory"/>
      </analyzer>
    </fieldType>
    
    <!-- Spanish -->
    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
        <filter class="solr.SpanishLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Basque -->
    <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_eu.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>
      </analyzer>
    </fieldType>
    
    <!-- Persian -->
    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- for ZWNJ -->
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
      </analyzer>
    </fieldType>
    
    <!-- Finnish -->
    <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fi.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
        <!-- less aggressive: <filter class="solr.FinnishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>
    
    <!-- French -->
    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" />
        <filter class="solr.FrenchLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.FrenchMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Irish -->
    <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes d', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ga.txt"/>
        <!-- removes n-, etc. position increments is intentionally false! -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/hyphenations_ga.txt"/>
        <filter class="solr.IrishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ga.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
      </analyzer>
    </fieldType>
    
    <!-- Galician -->
    <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_gl.txt" />
        <filter class="solr.GalicianStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GalicianMinimalStemFilterFactory"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Hindi -->
    <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- normalizes unicode representation -->
        <filter class="solr.IndicNormalizationFilterFactory"/>
        <!-- normalizes variation in spelling -->
        <filter class="solr.HindiNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hi.txt" />
        <filter class="solr.HindiStemFilterFactory"/>
      </analyzer>
    </fieldType>
    
    <!-- Hungarian -->
    <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hu.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
        <!-- less aggressive: <filter class="solr.HungarianLightStemFilterFactory"/> -->   
      </analyzer>
    </fieldType>
    
    <!-- Armenian -->
    <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
      </analyzer>
    </fieldType>
    
    <!-- Indonesian -->
    <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt" />
        <!-- for a less aggressive approach (only inflectional suffixes), set stemDerivational to false -->
        <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>
      </analyzer>
    </fieldType>
    
    <!-- Italian -->
    <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" />
        <filter class="solr.ItalianLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Italian"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Japanese using morphological analysis (see text_cjk for a configuration using bigramming)

         NOTE: If you want to optimize search for precision, use default operator AND in your query
         parser config with <solrQueryParser defaultOperator="AND"/> further down in this file.  Use 
         OR if you would like to optimize for recall (default).
    -->
    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
      <!-- Kuromoji Japanese morphological analyzer/tokenizer (JapaneseTokenizer)

           Kuromoji has a search mode (default) that does segmentation useful for search.  A heuristic
           is used to segment compounds into its parts and the compound itself is kept as synonym.

           Valid values for attribute mode are:
              normal: regular segmentation
              search: segmentation useful for search with synonyms compounds (default)
            extended: same as search mode, but unigrams unknown words (experimental)

           For some applications it might be good to use search mode for indexing and normal mode for
           queries to reduce recall and prevent parts of compounds from being matched and highlighted.
           Use <analyzer type="index"> and <analyzer type="query"> for this and mode normal in query.

           Kuromoji also has a convenient user dictionary feature that allows overriding the statistical
           model with your own entries for segmentation, part-of-speech tags and readings without a need
           to specify weights.  Notice that user dictionaries have not been subject to extensive testing.

           User dictionary attributes are:
                     userDictionary: user dictionary filename
             userDictionaryEncoding: user dictionary encoding (default is UTF-8)

           See lang/userdict_ja.txt for a sample user dictionary file.

           Punctuation characters are discarded by default.  Use discardPunctuation="false" to keep them.

           See http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese language support.
        -->
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辭書形) -->
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <!-- Removes tokens with certain part-of-speech tags -->
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
        <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <!-- Lower-cases romaji characters -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    
    <!-- Latvian -->
    <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_lv.txt" />
        <filter class="solr.LatvianStemFilterFactory"/>
      </analyzer>
    </fieldType>
    
    <!-- Dutch -->
    <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" />
        <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
      </analyzer>
    </fieldType>
    
    <!-- Norwegian -->
    <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_no.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
        <!-- less aggressive: <filter class="solr.NorwegianLightStemFilterFactory" variant="nb"/> -->
        <!-- singular/plural: <filter class="solr.NorwegianMinimalStemFilterFactory" variant="nb"/> -->
        <!-- The "light" and "minimal" stemmers support variants: nb=Bokmål, nn=Nynorsk, no=Both -->
      </analyzer>
    </fieldType>
    
    <!-- Portuguese -->
    <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
        <filter class="solr.PortugueseLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->
        <!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Romanian -->
    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
      </analyzer>
    </fieldType>
    
    <!-- Russian -->
    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
        <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Swedish -->
    <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_sv.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
        <!-- less aggressive: <filter class="solr.SwedishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>
    
    <!-- Thai -->
    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.ThaiTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt" />
      </analyzer>
    </fieldType>
    
    <!-- Turkish -->
    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ApostropheFilterFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
      </analyzer>
    </fieldType>
  
  <!-- Similarity is the scoring routine for each document vs. a query.
       A custom Similarity or SimilarityFactory may be specified here, but 
       the default is fine for most applications.  
       For more info: http://wiki.apache.org/solr/SchemaXml#Similarity
    -->
  <!--
     <similarity class="com.example.solr.CustomSimilarityFactory">
       <str name="paramkey">param value</str>
     </similarity>
    -->

</schema>

 

3.2.5.4 Testing

3.3Solr 管理頁面操做

 

3.3.1 Dashboard

 
When you visit http://localhost:8080/solr you land on this main page, which shows Solr's uptime and version as well as system memory and JVM memory usage.

3.3.2 Logging

 
Shows exceptions and errors that occur while Solr is running.
3.3.3 Core Admin
 
The main operations are Add Core, Unload, Rename, Reload, and Optimize (optimize the index library).
Add Core adds a core; it essentially generates a core.properties file in the folder given by instanceDir (a sample file is sketched below).
name: the name given to the core;
instanceDir: must match the name of the core folder created under the solr_home set up when deploying Solr to Tomcat;
dataDir: when you confirm Add Core, a folder named data is generated under the new_core directory;
config: the config file under new_core/conf (solrconfig.xml)
schema: the schema file under new_core/conf (schema.xml)
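As an illustration only, a minimal core.properties for such a core might contain nothing more than the following (the core name and file names are assumptions; match them to your own core):

name=new_core
config=solrconfig.xml
schema=schema.xml
dataDir=data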
 

3.3.4 Java Properties

 
Shows information about various Java-related properties.

3.3.5 Thread Dump

 
Shows detailed information and status for each thread.
 

3.3.6 Core Selector

3.3.6.1 Overview

 
Contains basic statistics such as the current document count, and instance information such as the configuration directory of the current core.

3.3.6.2 Analysis

Lets you check how text is analyzed (tokenized).

 

 

3.3.6.3 Dataimport

3.3.6.4 Documents

 
Documents covers operations on index documents: add, update, delete, and so on.
On the page, select /update as the handler and json as the document format, then submit; the document is indexed. Updating works the same way as adding (both use /update); deleting uses /delete. A sample JSON add and delete are sketched below. Once it succeeds, you can go to Query and find the data you just added.
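As an illustration, assuming the business fields defined later in this document (the id and values are made up), an add submitted in JSON format might look like:

[ {"id": "1001", "item_title": "test item", "item_price": 1000} ]

and a delete-by-query in JSON might look like:

{"delete": {"query": "id:1001"}}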
 

3.3.6.5 Files

 
The files under solr_home/<core>/conf; click a file to view its contents.
3.3.6.6 Ping
 
Checks whether the current core is still alive and shows its response time; a sample ping URL is sketched below.
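A minimal sketch, assuming the standalone install from earlier (host, port, and core name are assumptions):

http://localhost:8080/solr/collection1/admin/ping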

3.3.6.7 Plugins /stats

 
Information and statistics about Solr's built-in plugins and any plugins we have installed.
3.3.6.8 Query
 
For a field to appear in the query results, its stored attribute must be set to true when the field is configured in schema.xml.

Request-Handler (qt): the request handler

q: the query string (required). *:* queries everything; keyword:尚學堂 queries by the keyword "尚學堂".
fq: filter query. Filter queries make full use of the filter query cache and improve search performance. Effect: results must match both q and fq (similar to an intersection). For example, q=mm&fq=date_time:[20081001 TO 20091031] finds the keyword mm where date_time falls between 20081001 and 20091031.
sort: sorting, in the form "field direction"; e.g. id desc sorts the results by the id field in descending order.
start, rows: the offset of the first result to return and how many results to return.
fl: field list. Which fields the query results should return, separated by spaces " " or commas ",". If unspecified, all fields are returned by default.
 
df: default field. The default field to query; usually specified.
Raw Query Parameters: raw query parameters.
wt: writer type. The output format of the query results; json and xml are the most commonly used. solrconfig.xml defines the available output formats: xml, json, python, ruby, php, csv.
indent: whether the response is indented; off by default, enabled with indent=true | on. Generally only needed when debugging json, php, phps, or ruby output.
debugQuery: whether to include debug information in the response.
dismax:
edismax:
hl: highlight. hl=true enables highlighting.
hl.fl: a space- or comma-separated list of fields to highlight. For a field to be highlightable, it must be stored in the schema.
hl.simple.pre: the opening HTML tag for highlighted text.
hl.simple.post: the closing HTML tag for highlighted text.
hl.requireFieldMatch: if set to true, a field is highlighted only if the query actually matched that field. The default is false, which means a match on one field may highlight a different field. If hl.fl uses wildcards, enable this parameter. Even so, if your query targets an "all" field (perhaps populated via copyField), keep it false so the search results show which field's query text was not found.
hl.usePhraseHighlighter: if the query contains a phrase (enclosed in quotes), only complete matches of the phrase are highlighted.
hl.highlightMultiTerm: with wildcard and fuzzy searches, ensures that terms matching the wildcard are highlighted. The default is false, and hl.usePhraseHighlighter must be true.
facet: faceting. While searching by keyword, group and count the results by the facet fields.
facet.query: facet queries use a syntax similar to filter queries and provide more flexible faceting; through facet.query you can facet on arbitrary fields.
facet.field: the field(s) to facet on; multiple fields may be given.
facet.prefix: a prefix for the facet field values. For example, with facet.field=cpu&facet.prefix=Intel, the facet query on the cpu field only returns cpus starting with Intel; models starting with AMD are not counted.
spatial:
spellcheck: spell checking.

A request combining several of these parameters is sketched below.
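A minimal sketch of a query URL, assuming the standalone install and the item_keywords/item_title/item_price fields defined later in this document (host, port, core, field names, and the keyword are assumptions):

http://localhost:8080/solr/collection1/select?q=item_keywords:phone&fq=item_price:[100 TO 5000]&sort=item_price desc&start=0&rows=10&fl=id,item_title,item_price&wt=json&indent=true&hl=true&hl.fl=item_title&hl.simple.pre=<em>&hl.simple.post=</em>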
 

3.3.6.9 Replication

 
Shows the replicas of the current Core and provides disable/enable controls.

3.3.6.10 Schema

 
展現該 Core 的 shema.xml 文件中的內容

3、 Defining Business Fields in the Index Library

 
1 Table structure
CREATE TABLE `tb_item` (
  `id` bigint(20) NOT NULL COMMENT 'item id, also used as the item number',
  `title` varchar(100) NOT NULL COMMENT 'item title',
  `sell_point` varchar(500) DEFAULT NULL COMMENT 'item selling point',
  `price` bigint(20) NOT NULL COMMENT 'item price, in cents (fen)',
  `num` int(10) NOT NULL COMMENT 'stock quantity',
  `barcode` varchar(30) DEFAULT NULL COMMENT 'item barcode',
  `image` varchar(500) DEFAULT NULL COMMENT 'item image',
  `cid` bigint(10) NOT NULL COMMENT 'category it belongs to, a leaf category',
  `status` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'item status: 1 - normal, 2 - off the shelf, 3 - deleted',
  `created` datetime NOT NULL COMMENT 'creation time',
  `updated` datetime NOT NULL COMMENT 'update time',
  PRIMARY KEY (`id`),
  KEY `cid` (`cid`),
  KEY `status` (`status`),
  KEY `updated` (`updated`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='item table';

 

 

2 Defining the fields

<field name="id" type="string" indexed="true" stored="true" 
required="true" multiValued="false" /> 
<field name="item_title" type="text_ik" indexed="true" stored="true"/>
<field name="item_sell_point" type="text_ik" indexed="true" stored="true"/>
<field name="item_price" type="long" indexed="false" stored="true"/>
<field name="item_image" type="string" indexed="false" stored="true" />

3 Defining the default search field

<field name="item_keywords" type="text_ik" indexed="true" stored="false" 
multiValued="true"/>
<copyField source="item_title" dest="item_keywords"/>
<copyField source="item_sell_point" dest="item_keywords"/>

4、 Using SolrJ

 

1 What is SolrJ

 
SolrJ is the Java client for accessing a Solr service. It provides request methods for indexing and searching. SolrJ is typically embedded in a business system, which operates the Solr service through the SolrJ API.
 

2 Testing SolrJ

 

2.1 Create the project

2.2 Add the SolrJ coordinates to the POM

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.bjsxt</groupId>
  <artifactId>solrJDemo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>4.10.3</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>
</project>

2.3 Adding documents to the index library

/**
 * Add a document to the index library.
 * @throws Exception
 */
public static void solrInsert() throws Exception {
    // Create a SolrJ connection object
    SolrServer server = new HttpSolrServer("http://192.168.70.147:8080/solr");
    // Create a Solr document object
    SolrInputDocument docu = new SolrInputDocument();
    // Add the content to be indexed to the document
    docu.addField("id", "OldLu");
    docu.addField("item_title", "老好了");
    docu.addField("item_price", 1000);
    // Insert the document into the Solr index library
    server.add(docu);
    // Commit the transaction
    server.commit();
}

 

2.4 Deleting documents from the index library

/**
 * Delete documents from the index library.
 * @throws Exception
 */
public static void solrDelete() throws Exception {
    // Create a SolrJ connection object
    SolrServer server = new HttpSolrServer("http://192.168.70.147:8080/solr");
    // Specify the delete criteria
    // 1. Delete by primary key
    //server.deleteById("test");
    // 2. Delete by query
    server.deleteByQuery("*:*");
    server.commit();
}

2.5 Querying documents in the index library

/**
 * Query the index library.
 * @throws Exception
 */
public static void solrSearch() throws Exception {
    // Create a SolrJ connection object
    SolrServer server = new HttpSolrServer("http://192.168.70.147:8080/solr");
    // Build the query
    SolrQuery query = new SolrQuery();
    query.setQuery("老好了");
    query.set("df", "item_keywords");
    // Pagination
    query.setStart(0);
    query.setRows(10);
    // Execute the query
    // QueryResponse wraps the result set
    QueryResponse res = server.query(query);
    SolrDocumentList docu = res.getResults();
    // getNumFound() is the total number of matching documents
    System.out.println("Total: " + docu.getNumFound());
    for (SolrDocument var : docu) {
        System.out.println(var.get("item_title"));
        System.out.println(var.get("item_price"));
    }
}

 

5、 Solr Cluster (SolrCloud)
1 What is SolrCloud
SolrCloud is Solr's distributed search solution. Use it when you need large-scale, fault-tolerant, distributed indexing and retrieval. A system with a small amount of index data does not need SolrCloud; it becomes necessary when the index is large and search concurrency is high.
SolrCloud is a distributed search solution based on Solr and Zookeeper. Its central idea is to use Zookeeper as the cluster's configuration hub.
Its distinguishing features are:
1) Centralized configuration
2) Automatic failover
3) Near-real-time search
4) Automatic load balancing of queries
 
 
2 Solr cluster structure diagram

3 Solr cluster design

 
This walkthrough installs a pseudo-cluster (all nodes on one machine); for a real production environment, build a true cluster.
The SolrCloud layout is as follows:

4 Installing the Solr cluster environment

 

4.1 Requirements:

 
1) Install a zookeeper cluster in the 192.168.70.147 environment (already installed)
2) Create 4 Tomcat instances and change their ports to 8080-8083
3) Use the already-installed standalone Solr as the cluster's nodes
 

4.2 Create the solrcloud directory

 
mkdir solrcloud
 

4.3 Install the Zookeeper cluster

Omitted .......
 

4.4 Install 4 Tomcat instances and copy Tomcat and the index library into the solrCloud directory

4.5 Change the Tomcat ports

 

 

 

 

 

 

 

4.6 Change the path in each Solr service that points to its index library

  /usr/local/solrCloud/tomcat1/webapps/solr/WEB-INF/web.xml (the env-entry to edit is sketched below)
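A minimal sketch of the env-entry to adjust in that web.xml; the solr/home value below is an assumed path and should point at the solrhome of each node:

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/usr/local/solrCloud/solrhome1</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>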

 

 

 

5 Creating the cluster

 

5.1 Upload the index library configuration files

 
Upload the configuration files in solrhome to the zookeeper cluster, using the zookeeper client; one possible invocation is sketched below.
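A minimal sketch using the zkcli.sh script shipped under example/scripts/cloud-scripts in the Solr 4.10.3 distribution (the confdir path and confname are assumptions; adjust them to your own layout):

./zkcli.sh -zkhost 192.168.70.147:2181,192.168.70.147:2182,192.168.70.147:2183 -cmd upconfig -confdir /usr/local/solrhome/collection1/conf -confname myconf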

5.3 Edit catalina.sh in each Solr Tomcat's bin directory and add -DzkHost pointing at the zookeeper servers
JAVA_OPTS="-DzkHost=192.168.70.147:2181,192.168.70.147:2182,192.168.70.147:2183"
(You can use vim's search to locate where JAVA_OPTS is defined, then append this.)
Note that the value must not contain spaces.
 

5.4 Start the Tomcat instances

5.5 Create a new logical index library and shard it
 
Create a new collection split into two shards, each with one leader and one replica.
Create it with the following command:
http://192.168.70.147:8080/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2
 
5.6 Delete the original logical index library
http://192.168.70.147:8080/solr/admin/collections?action=DELETE&name=collection1
5.7 Test operating the Solr cluster with SolrJ
 
5.7.1 Adding documents to the cluster
/**
 * Add a document to the cluster's index library.
 */
public static void solrCloudInsert() throws Exception {
    // Zookeeper addresses
    String zkHost = "192.168.70.147:2181,192.168.70.147:2182,192.168.70.147:2183";
    // Create the SolrCloud client
    CloudSolrServer cloud = new CloudSolrServer(zkHost);
    // Target collection
    cloud.setDefaultCollection("collection2");
    // Create the Solr document object
    SolrInputDocument docu = new SolrInputDocument();
    docu.addField("id", "OldLu");
    docu.addField("item_title", "老好了");
    docu.addField("item_price", 1000);
    cloud.add(docu);
    cloud.commit();
    cloud.shutdown();
}
5.7.2 Deleting documents from the cluster
/**
 * Delete documents from the cluster.
 */
public static void solrCloudDel() throws Exception {
    // Zookeeper addresses
    String zkHost = "192.168.70.147:2181,192.168.70.147:2182,192.168.70.147:2183";
    // Create the SolrCloud client
    CloudSolrServer cloud = new CloudSolrServer(zkHost);
    cloud.setDefaultCollection("collection2");
    cloud.deleteByQuery("*:*");
    cloud.commit();
    cloud.shutdown();
}

5.7.3 Querying documents in the cluster

/**
 * Query documents in the cluster.
 */
public static void solrCloudSearch() throws Exception {
    // Zookeeper addresses
    String zkHost = "192.168.70.147:2181,192.168.70.147:2182,192.168.70.147:2183";
    // Create the SolrCloud client
    CloudSolrServer cloud = new CloudSolrServer(zkHost);
    cloud.setDefaultCollection("collection2");
    // Create the query object
    SolrQuery query = new SolrQuery();
    query.setQuery("老好了");
    query.set("df", "item_keywords");
    query.setStart(0);
    query.setRows(10);
    // Execute the query
    QueryResponse res = cloud.query(query);
    SolrDocumentList list = res.getResults();
    System.out.println("Total: " + list.getNumFound());
    for (SolrDocument var : list) {
        System.out.println(var.get("item_title"));
        System.out.println(var.get("item_price"));
    }
}

6、 A Hands-On Solr Case

 
1 Case requirements:
1) Technologies: SpringMVC + Spring + Mybatis + SolrJ
2) Import part of the business data from the tb_item table in MySQL into the Solr index library
3) Provide a search page on which the data can be searched
2 Creating the project
2.1 Create the project

2.2 Add the dependencies to the POM

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.bjsxt</groupId>
    <artifactId>parent</artifactId>
    <version>0.0.1-SNAPSHOT</version>
  </parent>
  <groupId>com.bjsxt</groupId>
  <artifactId>solrDemo2</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>war</packaging>
  <dependencies>
    <dependency><groupId>org.apache.solr</groupId><artifactId>solr-solrj</artifactId></dependency>
    <!-- Unit testing -->
    <dependency><groupId>junit</groupId><artifactId>junit</artifactId></dependency>
    <!-- Logging -->
    <dependency><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></dependency>
    <!-- Mybatis -->
    <dependency><groupId>org.mybatis</groupId><artifactId>mybatis</artifactId></dependency>
    <dependency><groupId>org.mybatis</groupId><artifactId>mybatis-spring</artifactId></dependency>
    <!-- MySql -->
    <dependency><groupId>mysql</groupId><artifactId>mysql-connector-java</artifactId></dependency>
    <!-- Connection pool -->
    <dependency><groupId>com.alibaba</groupId><artifactId>druid</artifactId></dependency>
    <!-- Spring -->
    <dependency><groupId>org.springframework</groupId><artifactId>spring-context</artifactId></dependency>
    <dependency><groupId>org.springframework</groupId><artifactId>spring-beans</artifactId></dependency>
    <dependency><groupId>org.springframework</groupId><artifactId>spring-webmvc</artifactId></dependency>
    <dependency><groupId>org.springframework</groupId><artifactId>spring-jdbc</artifactId></dependency>
    <dependency><groupId>org.springframework</groupId><artifactId>spring-aspects</artifactId></dependency>
    <!-- JSP -->
    <dependency><groupId>jstl</groupId><artifactId>jstl</artifactId></dependency>
    <dependency><groupId>javax.servlet</groupId><artifactId>servlet-api</artifactId><scope>provided</scope></dependency>
    <dependency><groupId>javax.servlet</groupId><artifactId>jsp-api</artifactId><scope>provided</scope></dependency>
  </dependencies>
  <build>
    <resources>
      <resource>
        <directory>src/main/java</directory>
        <includes>
          <include>**/*.xml</include>
        </includes>
      </resource>
      <resource>
        <directory>src/main/resources</directory>
        <includes>
          <include>**/*.xml</include>
          <include>**/*.properties</include>
        </includes>
      </resource>
    </resources>
    <!-- The Tomcat plugin is only declared, not enabled, since not every child project is a web project -->
    <plugins>
      <!-- Configure the Tomcat plugin -->
      <plugin>
        <groupId>org.apache.tomcat.maven</groupId>
        <artifactId>tomcat7-maven-plugin</artifactId>
        <configuration>
          <path>/</path>
          <port>8080</port>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

2.3 Integrate the frameworks

2.4 Test the integration

3 Integrating SolrJ with Spring

 

3.1 Create application-solrj.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/mvc
                           http://www.springframework.org/schema/mvc/spring-mvc-4.0.xsd
                           http://www.springframework.org/schema/context
                           http://www.springframework.org/schema/context/spring-context.xsd">
  <!-- Standalone Solr integration -->
  <!--
  <bean id="httpSolrServer" class="org.apache.solr.client.solrj.impl.HttpSolrServer">
    <constructor-arg name="baseURL">
      <value>${SOLR_SERVICE_URL}</value>
    </constructor-arg>
  </bean>
  -->
  <!-- SolrCloud integration -->
  <bean class="org.apache.solr.client.solrj.impl.CloudSolrServer">
    <constructor-arg name="zkHost">
      <value>${SOLR_CLOUD_SERVICE_URL}</value>
    </constructor-arg>
    <property name="defaultCollection">
      <value>${DEFAULT_COLLECTION}</value>
    </property>
  </bean>
</beans>

3.2 Write the test code

/**
 * Test the Spring/SolrCloud integration.
 */
@Test
public void solrCloudTest() throws Exception {
    ApplicationContext ac = new ClassPathXmlApplicationContext("classpath:spring/application*");
    CloudSolrServer cloud = ac.getBean(CloudSolrServer.class);
    SolrInputDocument docu = new SolrInputDocument();
    // Add the content to be indexed to the document
    docu.addField("id", "OldLu");
    docu.addField("item_title", "老好了");
    docu.addField("item_price", 1000);
    // Insert the document into the Solr index library
    cloud.add(docu);
    cloud.commit();
    cloud.shutdown();
}
4 Importing the tb_item data into the Solr index library
4.1 Create the import Service
@Service
public class ImportItemServiceImpl implements ImportItemService {
    @Autowired
    private ItemMapper itemMapper;
    @Autowired
    private CloudSolrServer cloudServer;

    /**
     * Import the data.
     */
    @Override
    public void importItem() {
        try {
            // Query the data from the database
            List<TbItem> list = this.itemMapper.findAll();
            List<SolrInputDocument> result = new ArrayList<>();
            // Convert the model
            for (TbItem item : list) {
                SolrInputDocument docu = new SolrInputDocument();
                docu.setField("id", item.getId() + "");
                docu.setField("item_title", item.getTitle());
                docu.setField("item_sell_point", item.getSell_Point());
                docu.setField("item_price", item.getPrice());
                docu.setField("item_image", item.getImage());
                result.add(docu);
            }
            this.cloudServer.add(result);
            this.cloudServer.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
4.2 Create the import Controller
/**
 * Controller that handles the data import.
 * @author Administrator
 */
@Controller
@RequestMapping("/import")
public class ImportDataController {
    @Autowired
    private ImportItemService importItemService;

    @RequestMapping("/importData")
    public String importData() {
        this.importItemService.importItem();
        return "ok";
    }
}

4.3 Testing
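One way to smoke-test the import, assuming the application is deployed at the context root on port 8080 (host and port are assumptions), is to hit the import controller in a browser:

http://localhost:8080/import/importData

Based on the controller above, this runs the import and resolves the "ok" view.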

5 Implementing the search feature
 
5.1 Create the search page
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Insert title here</title>
</head>
<body>
  <form action="/search/searchItem" method="post">
    <input type="text" name="query"/>
    <input type="submit" value="Search"/>
  </form>
</body>
</html>
5.2 Create the SolrDao
/**
 * Queries Solr.
 * @author Administrator
 */
@Repository
public class SolrDaoImpl implements SolrDao {
    @Autowired
    private CloudSolrServer cloudSolrServer;

    @Override
    public PageResult searchItem(SolrQuery solrQuery) throws Exception {
        QueryResponse res = this.cloudSolrServer.query(solrQuery);
        SolrDocumentList list = res.getResults();
        // Process the result set
        PageResult page = new PageResult();
        // Total number of hits
        page.setTotalNum(list.getNumFound());
        List<TbItem> items = new ArrayList<>();
        // Fetch the highlighting information
        Map<String, Map<String, List<String>>> hl = res.getHighlighting();
        // Convert the model
        for (SolrDocument var : list) {
            TbItem item = new TbItem();
            item.setId(Long.parseLong((String) var.get("id")));
            item.setImage((String) var.get("item_image"));
            item.setSell_Point((String) var.get("item_sell_point"));
            // item_price is a long field in the schema, so it comes back as a Long
            item.setPrice((Long) var.get("item_price"));
            List<String> h = hl.get(var.get("id")).get("item_title");
            String title = "";
            if (h != null && h.size() > 0) {
                title = h.get(0);
            } else {
                title = (String) var.get("item_title");
            }
            item.setTitle(title);
            items.add(item);
        }
        page.setResult(items);
        return page;
    }
}
5.3 Create the search Service
/**
 * Implements the Solr search business logic.
 * @author Administrator
 */
@Service
public class SearchItemServiceImpl implements SearchItemService {
    @Autowired
    private SolrDao solrDao;

    @Override
    public PageResult searchItem(String query, Integer page, Integer rows) throws Exception {
        // Build the query
        SolrQuery solrQuery = new SolrQuery();
        // Query string
        solrQuery.setQuery(query);
        // Default field
        solrQuery.set("df", "item_keywords");
        // Pagination
        solrQuery.setStart((page - 1) * rows);
        solrQuery.setRows(rows);
        // Highlighting
        solrQuery.setHighlight(true);
        solrQuery.addHighlightField("item_title");
        // Highlight markup
        solrQuery.setHighlightSimplePre("<em style='color:red;'>");
        solrQuery.setHighlightSimplePost("</em>");
        // Delegate to the SolrDao
        PageResult result = this.solrDao.searchItem(solrQuery);
        // Fill in the remaining page data
        // Current page
        result.setPageIndex(page);
        // Total pages
        Long total = result.getTotalNum() / rows;
        if (result.getTotalNum() % rows > 0) {
            total++;
        }
        result.setTotalPage(total);
        return result;
    }
}
5.4 Create the search Controller
/**
 * Controller that handles item search.
 * @author Administrator
 */
@Controller
@RequestMapping("/search")
public class SearchController {
    @Autowired
    private SearchItemService searchItemService;

    /**
     * Item search.
     */
    @RequestMapping("/searchItem")
    public String searchItem(String query,
            @RequestParam(value = "page", defaultValue = "1") Integer page,
            @RequestParam(value = "rows", defaultValue = "20") Integer rows,
            Model model) {
        try {
            PageResult result = this.searchItemService.searchItem(query, page, rows);
            model.addAttribute("result", result);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "showItem";
    }
}
5.5建立展現搜索結果頁面
<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Insert title here</title>
</head>
<body>
  <span>Current page: ${result.pageIndex }</span>
  <span>Total pages: ${result.totalPage }</span>
  <span>Total hits: ${result.totalNum }</span>
  <table align="center" border="1">
    <c:forEach items="${result.result }" var="item">
      <tr>
        <td>${item.id }</td>
        <td>${item.title }</td>
        <td>${item.sell_Point }</td>
        <td>${item.price }</td>
        <td>${item.image }</td>
      </tr>
    </c:forEach>
  </table>
</body>
</html>
5.6 Testing
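Once everything is deployed, a quick way to exercise the search flow from a browser, assuming the application runs at the context root on port 8080 (host, port, and the sample keyword are assumptions; the mapping does not restrict the HTTP method, so GET works as well as the form's POST):

http://localhost:8080/search/searchItem?query=phone

This should render showItem with the matching items and their highlighted titles.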

Deleting nodes: https://blog.csdn.net/fellhair/article/details/82429100

Delete one replica of a shard in a Solr cluster:

http://10.100.10.130:6060/solr/admin/collections?action=DELETEREPLICA&collection=bjsxt&shard=shard1&replica=core_node6

Delete a Solr cluster collection:
http://10.100.10.130:6060/solr/admin/collections?action=DELETE&name=solrCore