(5) Using SolrJ with Solr 7.1.0

Date: 2019-12-11

Tags: solr7.1.0 solr solrj

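Below is the introduction from the official Solr 7 API documentation.

The machine translation of that page is not very accurate and only conveys the general idea, but the following information can be gathered from it:

1. Building and running a SolrJ application

For projects built with Maven, add the following to pom.xml: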

<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <version>7.1.0</version>
</dependency>

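If you are not using Maven, simply add solr-solrj-7.1.0.jar together with the dependency jars in the dist/solrj-lib directory to the project.

2. The Solr 7 API

Since Solr 5, the biggest difference from Solr 4 is that Solr is released as a standalone application and no longer needs a container such as Tomcat. It embeds a Jetty server and can be started directly with the scripts in the bin directory. Solr 5 has two run modes, standalone mode and cloud mode: standalone mode is organized around cores, cloud mode around collections.

SolrClient is an abstract class with several concrete subclasses. HttpSolrClient is the general-purpose client and communicates directly with a single Solr node.

Other implementations include LBHttpSolrClient, CloudSolrClient and ConcurrentUpdateSolrClient.

Creating a SolrClient requires specifying one or more Solr base URLs, which the client then uses to send HTTP requests to Solr. The path included in the base URL affects how the client behaves:

A URL whose path points to a specific core or collection (e.g., http://hostname:8983/solr/core1). When the base URL specifies a core or collection, later requests made with that client do not need to name the affected collection again. However, the client is limited to that core/collection and cannot send requests to any other.
A URL pointing to the Solr root path (e.g., http://hostname:8983/solr). When the base URL specifies no core or collection, requests can be made to any core/collection, but the affected core/collection must be specified on every request.

Generally speaking, if your SolrClient will only be used with a single core/collection, including that core/collection in the path is the most convenient. Where more flexibility is needed, leave the collection/core out of the URL.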
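To make the difference between the two kinds of base URL concrete, here is a small JDK-only sketch. `SolrBaseUrlDemo.requestUrl` is a hypothetical helper, not part of SolrJ; it assumes requests go to Solr's standard `/select` search handler and only shows which URL results when the core is part of the base URL versus supplied per request.

```java
import java.util.Objects;

/** Hypothetical helper (not part of SolrJ) illustrating core-in-URL vs. root-URL clients. */
public class SolrBaseUrlDemo {

    /** Joins a base URL, an optional core/collection (null if already in the URL) and a handler path. */
    static String requestUrl(String baseUrl, String coreOrNull, String handler) {
        Objects.requireNonNull(baseUrl, "baseUrl");
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        if (coreOrNull == null) {
            // Base URL already includes the core, e.g. http://hostname:8983/solr/core1
            return base + handler;
        }
        // Base URL is the Solr root, so the core has to be supplied with every request
        return base + "/" + coreOrNull + handler;
    }

    public static void main(String[] args) {
        // Client bound to core1: no core per request, but it can only talk to core1
        System.out.println(requestUrl("http://hostname:8983/solr/core1", null, "/select"));
        // Client bound to the root: any core can be used, but it must be named each time
        System.out.println(requestUrl("http://hostname:8983/solr", "core2", "/select"));
    }
}
```

In SolrJ itself, a client built on the root URL passes the core/collection name per call, for example through the `query(String collection, SolrParams params)` overload of `SolrClient`.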

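2.1 Creating a SolrJ client instance and setting the connection timeouts:

final String solrUrl = "http://127.0.0.1:8080/solr";
// Create the SolrClient and set the timeouts; if they are not set, the defaults are used
HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
        .withConnectionTimeout(10000)   // connection timeout in milliseconds
        .withSocketTimeout(60000)       // socket (read) timeout in milliseconds
        .build();

The way the SolrJ client is created differs between Solr versions:

// Solr 4:
// SolrServer solrServer = new HttpSolrServer("http://127.0.0.1:8080/solr");
// Solr 5: the core name (core1) is part of the URL
// HttpSolrClient solrServer = new HttpSolrClient("http://127.0.0.1:8080/solr/core1");
// Solr 7: the core name (core1) is part of the URL
HttpSolrClient solrServer = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();

Note: since Solr 5, a URL that targets a particular core ends with that core's name; if the core is named core1, the URL is http://127.0.0.1:8080/solr/core1

2.2 Querying with SolrJ

SolrClient has a number of query() methods for fetching results from Solr. Each of them takes a SolrParams argument, an object that can encapsulate arbitrary query parameters, and each returns a QueryResponse, a wrapper that gives access to the result documents and other related metadata.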

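// Imports needed by this test method (JUnit 4's @Test is assumed)
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.MapSolrParams;
import org.junit.Test;

/**
 * Query
 * @throws Exception
 */
@Test
public void querySolr() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Collect the query parameters
    Map<String, String> queryParamMap = new HashMap<String, String>();
    queryParamMap.put("q", "*:*");
    // [3] Wrap them in a SolrParams object
    MapSolrParams queryParams = new MapSolrParams(queryParamMap);
    // [4] Run the query, which returns a QueryResponse
    QueryResponse response = client.query(queryParams);
    // [5] Get the result documents
    SolrDocumentList documents = response.getResults();
    // [6] Iterate over the results
    for (SolrDocument doc : documents) {
        System.out.println("id:" + doc.get("id")
                + "\tproduct_name:" + doc.get("product_name")
                + "\tproduct_catalog_name:" + doc.get("product_catalog_name")
                + "\tproduct_number:" + doc.get("product_number")
                + "\tproduct_price:" + doc.get("product_price")
                + "\tproduct_picture:" + doc.get("product_picture"));
    }
    client.close();
}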
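`MapSolrParams` is essentially a map from request parameter names to values. The JDK-only sketch below (the class `QueryStringDemo` is hypothetical; SolrJ performs its own encoding internally) shows roughly how such a map turns into the URL-encoded parameters of an HTTP request, for example `q=*:*`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical illustration of turning a parameter map into a query string. */
public class QueryStringDemo {

    /** URL-encodes each name=value pair and joins the pairs with '&'. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return joiner.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JVM, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("rows", "20");
        System.out.println(toQueryString(params)); // q=*%3A*&rows=20
    }
}
```

`URLEncoder` leaves `*` unchanged and encodes `:` as `%3A`, which is why the match-all query appears as `*%3A*`.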

SolrParams has a subclass, SolrQuery, which offers convenience methods that greatly simplify querying. Here is an example using SolrQuery:

/**
 * 2. Use SolrQuery, the SolrParams subclass whose convenience methods greatly simplify querying.
 * @throws Exception
 */
@Test
public void querySolr2() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Build the query
    SolrQuery query = new SolrQuery("*:*");
    // [3] Choose the fields to return (fl); fields not added here print as null below
    query.addField("id");
    query.addField("product_name");
    query.setRows(20); // number of rows per page
    // [4] Run the query, which returns a QueryResponse
    QueryResponse response = client.query(query);
    // [5] Get the result documents
    SolrDocumentList documents = response.getResults();
    // [6] Iterate over the results
    for (SolrDocument doc : documents) {
        System.out.println("id:" + doc.get("id")
                + "\tproduct_name:" + doc.get("product_name")
                + "\tname:" + doc.get("name")
                + "\tproduct_catalog_name:" + doc.get("product_catalog_name")
                + "\tproduct_number:" + doc.get("product_number")
                + "\tproduct_price:" + doc.get("product_price")
                + "\tproduct_picture:" + doc.get("product_picture"));
    }
    client.close();
}
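`setRows(20)` only fixes the page size. To fetch a later page you would also set the zero-based offset with `SolrQuery.setStart(...)`, and `SolrDocumentList.getNumFound()` reports the total number of hits. The arithmetic is sketched below in plain Java; `PagingDemo` and its 1-based page numbering are assumptions for illustration.

```java
/** Hypothetical paging arithmetic for Solr's start/rows parameters. */
public class PagingDemo {

    /** Zero-based offset of the first row on a 1-based page: the value for Solr's "start" parameter. */
    static int startFor(int page, int rowsPerPage) {
        if (page < 1 || rowsPerPage < 1) {
            throw new IllegalArgumentException("page and rowsPerPage must be >= 1");
        }
        return (page - 1) * rowsPerPage;
    }

    /** Number of pages needed to show numFound hits, rounding up. */
    static long pageCount(long numFound, int rowsPerPage) {
        if (rowsPerPage < 1) {
            throw new IllegalArgumentException("rowsPerPage must be >= 1");
        }
        return (numFound + rowsPerPage - 1) / rowsPerPage;
    }

    public static void main(String[] args) {
        int rows = 20;
        // Page 3 starts at row 40; with SolrJ: query.setStart(40); query.setRows(20);
        System.out.println("start=" + startFor(3, rows));
        // 41 hits at 20 rows per page need 3 pages
        System.out.println("pages=" + pageCount(41, rows));
    }
}
```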

2.3 Indexing documents with SolrJ

Documents are added with SolrClient's add() method:

/**
 * Add a document
 * @throws SolrServerException
 * @throws IOException
 */
@Test
public void solrAdd() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Create the document
    SolrInputDocument doc = new SolrInputDocument();
    // [3] Add the field values
    String str = UUID.randomUUID().toString();
    System.out.println(str);
    doc.addField("id", str);
    doc.addField("name", "Amazon Kindle Paperwhite");
    // [4] Send the document to Solr
    UpdateResponse updateResponse = client.add(doc);
    System.out.println(updateResponse.getElapsedTime());
    // [5] Indexed documents must be committed to become visible to searches
    client.commit();
    // [6] Release resources
    client.close();
}

In general, documents should be indexed in larger batches rather than one at a time. It is also recommended to have documents committed by Solr's autoCommit settings (configured by the Solr administrator) rather than by explicit commit() calls.
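Batching can be as simple as splitting the documents into fixed-size chunks and sending each chunk with a single `SolrClient.add(Collection<SolrInputDocument>)` call. The chunking step is sketched below with plain JDK types so it runs without a Solr server; `BatchDemo.partition` and the batch size are illustrative assumptions, not tuned values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical batching helper; each batch would become one SolrJ add() call. */
public class BatchDemo {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the subList view so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3", "4", "5");
        for (List<String> batch : partition(ids, 2)) {
            // With SolrJ, a batch of SolrInputDocument objects would go into one client.add(batch) call
            System.out.println(batch);
        }
    }
}
```

With autoCommit configured, no explicit commit() is needed after the batches; otherwise a single commit at the end avoids committing after every document.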

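2.4 Deleting a single document by id with SolrJ

/**
 * 4. Delete a single document by id
 */
@Test
public void solrDelete() throws Exception {
    // [1] Get a connection (Constant.getSolrClient() is the author's own helper, not shown in this post)
    HttpSolrClient client = Constant.getSolrClient();
    // [2] Delete by id
    client.deleteById("30000");
    // [3] Commit
    client.commit();
    // [4] Release resources
    client.close();
}

2.5 Deleting documents by a list of ids with SolrJ

/**
 * 5. Delete documents by a list of ids
 */
@Test
public void solrDeleteList() throws Exception {
    // [1] Get a connection (Constant.getSolrClient() is the author's own helper, not shown in this post)
    HttpSolrClient client = Constant.getSolrClient();
    // [2] Delete by ids
    ArrayList<String> ids = new ArrayList<String>();
    ids.add("30000");
    ids.add("1");
    client.deleteById(ids);
    // [3] Commit
    client.commit();
    // [4] Release resources
    client.close();
}

2.6 Java object binding

UpdateResponse and QueryResponse are useful, but it is often more convenient to work with domain-specific objects that your application understands more easily. SolrJ supports this by implicitly converting documents to and from any class whose fields are marked with the @Field annotation: each instance variable of the Java object can be mapped to a corresponding Solr field.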
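The idea behind `@Field` binding is plain Java reflection: find the annotated fields and copy document values into them by name. The JDK-only sketch below mimics that with a hypothetical `@SolrField` annotation and a `Map` standing in for a Solr document; it is a simplified illustration, not SolrJ's actual `DocumentObjectBinder`. Like SolrJ's `@Field`, the annotation can optionally name the Solr field (SolrJ accepts e.g. `@Field("p_name")`).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified annotation-based binder (not SolrJ's implementation). */
public class BindingDemo {

    /** Stand-in for SolrJ's @Field; an empty value means "use the Java field name". */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SolrField {
        String value() default "";
    }

    /** Example bean: annotated fields (one with an explicit Solr name) and one unmapped field. */
    static class Item {
        @SolrField String id;
        @SolrField("p_name") String name;
        @SolrField Long p_number;
        String notMapped; // no annotation, so the binder ignores it
    }

    /** Creates a new instance of type and copies matching document values into its annotated fields. */
    static <T> T bind(Map<String, Object> doc, Class<T> type) {
        T bean;
        try {
            bean = type.getDeclaredConstructor().newInstance(); // needs a no-argument constructor
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type, e);
        }
        for (Field f : type.getDeclaredFields()) {
            SolrField ann = f.getAnnotation(SolrField.class);
            if (ann == null) {
                continue;
            }
            String solrName = ann.value().isEmpty() ? f.getName() : ann.value();
            if (doc.containsKey(solrName)) {
                f.setAccessible(true);
                try {
                    // Throws IllegalArgumentException if the value's type does not match the field's type
                    f.set(bean, doc.get(solrName));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return bean;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "30000");
        doc.put("p_name", "Test product");
        doc.put("p_number", 30000L);
        Item item = bind(doc, Item.class);
        System.out.println(item.id + " / " + item.name + " / " + item.p_number);
    }
}
```

Putting a `String` into the `Long p_number` field fails with the same kind of `IllegalArgumentException` ("Can not set ... field ... to java.lang.String") that a mismatched bean type produces with SolrJ.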

First, the configuration:

solrconfig.xml configuration:

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <!-- Path of the data-source configuration file -->
    <str name="config">./data-config.xml</str>
  </lst>
</requestHandler>

data-config.xml configuration:

<?xml version="1.0" encoding="UTF-8" ?>  
<dataConfig>   
     <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/solrdata" user="root" password="root"/>   
    <document>   
         <entity name="product" query="select pid,name,catalog,catalog_name,price,number,description,picture from products">
             <field column="pid" name="id"/>
             <field column="name" name="p_name"/>
             <field column="catalog_name" name="p_catalog_name"/>
             <field column="price" name="p_price"/>
             <field column="number" name="p_number"/>
             <field column="description" name="p_description"/>
             <field column="picture" name="p_picture"/>
         </entity>   
    </document>        
</dataConfig>

managed-schema configuration:

  <!-- Configure the IK analyzer -->
  <fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  </fieldType>
  <!-- A field that uses the IK analyzer -->
  <field name="name_ik" type="text_ik" indexed="true" stored="true"/>
  <!-- Fields used in this project -->
  <field name="p_name" type="text_ik" indexed="true" stored="true"/>
  <field name="p_catalog_name" type="string" indexed="true" stored="true"/>
  <field name="p_price" type="pfloat" indexed="true" stored="true"/>
  <field name="p_number" type="plong" indexed="true" stored="true"/>
  <field name="p_description" type="text_ik" indexed="true" stored="true"/>
  <field name="p_picture" type="string" indexed="false" stored="true"/>

  <!-- Keywords: copy both the product name and the product description into the single p_keywords field -->
  <field name="p_keywords" type="text_ik" indexed="true" stored="false" multiValued="true" />
  <copyField source="p_name" dest="p_keywords" />
  <copyField source="p_description" dest="p_keywords" />
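The effect of the two `copyField` rules can be sketched in plain Java: at index time the raw value of each source field is appended to the destination, which is why `p_keywords` receives more than one value and is declared `multiValued="true"`. `CopyFieldDemo` is a hypothetical, simplified model that ignores analysis and wildcard sources; in Solr the copying happens on the original input before the destination field is analyzed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of copyField rules. */
public class CopyFieldDemo {

    /** Each rule is {source, dest}; returns the values collected for each destination field. */
    static Map<String, List<Object>> copyFieldValues(Map<String, Object> doc, List<String[]> rules) {
        Map<String, List<Object>> dests = new LinkedHashMap<>();
        for (String[] rule : rules) {
            Object value = doc.get(rule[0]);
            if (value != null) {
                // A destination fed by several sources collects several values
                dests.computeIfAbsent(rule[1], k -> new ArrayList<>()).add(value);
            }
        }
        return dests;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("p_name", "Kindle Paperwhite");
        doc.put("p_description", "E-reader with a built-in light");
        List<String[]> rules = Arrays.asList(
                new String[] {"p_name", "p_keywords"},
                new String[] {"p_description", "p_keywords"});
        System.out.println(copyFieldValues(doc, rules));
    }
}
```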

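Here indexed="true" enables indexing (if a field never needs to be searched, it is better not to index it); stored="true" stores the original value (set it to true when a field is not searched on but its value must be returned for documents found through other fields); multiValued="true" means the field can hold several values, for example several content values. In that case the corresponding Java field must be a collection or array type, for example:

private String[] content; // multi-valued, corresponds to multiValued="true"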
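When results are read as `SolrDocument` objects instead of bound beans, a multi-valued field comes back as a collection (`SolrDocument.getFieldValues(String)` returns `Collection<Object>`). The JDK-only helper below, `MultiValuedDemo.toStringArray` (hypothetical), converts such a value into the `String[]` shape used for `content` above, and also tolerates a single value or a missing field.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical helper for reading multi-valued field values by hand. */
public class MultiValuedDemo {

    /** Converts a field value (a single object or a Collection) into a String array. */
    static String[] toStringArray(Object value) {
        if (value == null) {
            return new String[0]; // field absent from the document
        }
        if (value instanceof Collection) {
            Collection<?> values = (Collection<?>) value;
            String[] out = new String[values.size()];
            int i = 0;
            for (Object v : values) {
                out[i++] = String.valueOf(v);
            }
            return out;
        }
        return new String[] {String.valueOf(value)}; // single-valued field
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toStringArray(Arrays.asList("part 1", "part 2"))));
        System.out.println(Arrays.toString(toStringArray("only one")));
    }
}
```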

Note: how the basic data types used in a field's type attribute changed from Solr 4 to Solr 7

For details, see the managed-schema file in the solr-7.1.0\example\example-DIH\solr\db\conf\ directory.

Solr 4 type	Solr 7 type
string	string
boolean	boolean
int	pint
double	pdouble
long	plong
float	pfloat
date	pdate
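When porting an older schema, the table above amounts to a simple lookup. The sketch below (hypothetical `TypeMigrationDemo`) only renames type names and assumes that any type not in the table, such as a custom `text_ik`, keeps its name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup of Solr 7 type names for Solr 4 type names. */
public class TypeMigrationDemo {

    private static final Map<String, String> SOLR4_TO_SOLR7 = new HashMap<>();

    static {
        // Mapping taken from the table above
        SOLR4_TO_SOLR7.put("string", "string");
        SOLR4_TO_SOLR7.put("boolean", "boolean");
        SOLR4_TO_SOLR7.put("int", "pint");
        SOLR4_TO_SOLR7.put("double", "pdouble");
        SOLR4_TO_SOLR7.put("long", "plong");
        SOLR4_TO_SOLR7.put("float", "pfloat");
        SOLR4_TO_SOLR7.put("date", "pdate");
    }

    /** Returns the Solr 7 type name; names not in the table are returned unchanged. */
    static String migrateType(String solr4Type) {
        String mapped = SOLR4_TO_SOLR7.get(solr4Type);
        return mapped != null ? mapped : solr4Type;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"int", "date", "string", "text_ik"}) {
            System.out.println(t + " -> " + migrateType(t));
        }
    }
}
```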

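First, create the Product class. Its field names must match the field names in schema.xml or managed-schema, and each of those fields must exist in that configuration file, otherwise errors occur:

① Misspelled field name: queries do not fail, but they do not return the data you expect, and when adding documents the indexed field is not the one you intended.

② Mismatched field type: Caused by: java.lang.IllegalArgumentException: Can not set java.lang.Integer field junit.Product.id to java.lang.String

Solr field type	JavaBean property type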
string	String
boolean	Boolean
pint	Integer
pdouble	Double
plong	Long
pfloat	Float
pdate	Date
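The rules above (every bean field must exist in the schema, and its Java type must match the field type) can be checked before indexing. The reflection-based sketch below is hypothetical: `SchemaTypeCheckDemo`, its hard-coded schema map and the deliberately wrong `Integer p_number` are illustrations, not SolrJ features, and field types missing from the table (such as `text_ik`) are skipped.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-indexing check of bean field types against schema field types. */
public class SchemaTypeCheckDemo {

    /** Java type expected for each Solr field type, from the table above. */
    static final Map<String, Class<?>> JAVA_TYPE_FOR_SOLR_TYPE = new HashMap<>();

    static {
        JAVA_TYPE_FOR_SOLR_TYPE.put("string", String.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("boolean", Boolean.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("pint", Integer.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("pdouble", Double.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("plong", Long.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("pfloat", Float.class);
        JAVA_TYPE_FOR_SOLR_TYPE.put("pdate", Date.class);
    }

    /** Bean with a deliberate mistake: the schema declares p_number as plong. */
    static class ProductSketch {
        String id;
        String p_name;
        Float p_price;
        Integer p_number;
    }

    /** Lists bean fields that are missing from the schema or whose Java type does not match. */
    static List<String> mismatches(Class<?> beanType, Map<String, String> schemaTypes) {
        List<String> problems = new ArrayList<>();
        for (Field f : beanType.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            String solrType = schemaTypes.get(f.getName());
            if (solrType == null) {
                problems.add(f.getName() + ": not defined in the schema");
                continue;
            }
            Class<?> expected = JAVA_TYPE_FOR_SOLR_TYPE.get(solrType);
            if (expected != null && !expected.equals(f.getType())) {
                problems.add(f.getName() + ": " + solrType + " expects " + expected.getSimpleName()
                        + " but the bean declares " + f.getType().getSimpleName());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new HashMap<>();
        schema.put("id", "string");
        schema.put("p_name", "text_ik");
        schema.put("p_price", "pfloat");
        schema.put("p_number", "plong");
        mismatches(ProductSketch.class, schema).forEach(System.out::println);
    }
}
```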

The Product entity class:

package junit;

import org.apache.solr.client.solrj.beans.Field;

public class Product {
    /** Product id */
    @Field
    private String id;
    /** Product name */
    @Field
    private String p_name;
    /** Product category name */
    @Field
    private String p_catalog_name;
    /** Price */
    @Field
    private Float p_price;
    /** Quantity */
    @Field
    private Long p_number;
    /** Picture file name */
    @Field
    private String p_picture;
    /** Product description */
    @Field
    private String p_description;

    // No-argument constructor
    public Product() {}

    // Constructor that sets every field
    public Product(String id, String p_name, String p_catalog_name, Float p_price,
                   Long p_number, String p_picture, String p_description) {
        this.id = id;
        this.p_name = p_name;
        this.p_catalog_name = p_catalog_name;
        this.p_price = p_price;
        this.p_number = p_number;
        this.p_picture = p_picture;
        this.p_description = p_description;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getP_name() { return p_name; }
    public void setP_name(String p_name) { this.p_name = p_name; }
    public String getP_catalog_name() { return p_catalog_name; }
    public void setP_catalog_name(String p_catalog_name) { this.p_catalog_name = p_catalog_name; }
    public Float getP_price() { return p_price; }
    public void setP_price(Float p_price) { this.p_price = p_price; }
    public Long getP_number() { return p_number; }
    public void setP_number(Long p_number) { this.p_number = p_number; }
    public String getP_picture() { return p_picture; }
    public void setP_picture(String p_picture) { this.p_picture = p_picture; }
    public String getP_description() { return p_description; }
    public void setP_description(String p_description) { this.p_description = p_description; }
}

2.6.1 Using it in an application:

(1) Java object binding: creating index entries from an object

/**
 * 6. Java object binding: create index entries from an object
 */
@Test
public void addBean() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Create the object
    Product product = new Product();
    product.setId("30000");
    product.setP_name("Test product name");
    product.setP_catalog_name("Test product category name");
    product.setP_price(399F);
    product.setP_number(30000L);
    product.setP_description("Test product description");
    product.setP_picture("test-product-picture.jpg");
    // [3] Add the object
    UpdateResponse response = client.addBean(product);
    // [4] Commit
    client.commit();
    // [5] Release resources
    client.close();
}


(2) Java object binding: querying the index as objects

When searching, QueryResponse's getBeans() method converts the results directly into bean objects:

/**
 * 7. Java object binding: query the index and get objects back
 */
@Test
public void queryBean() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Create the SolrQuery object
    SolrQuery query = new SolrQuery("*:*");
    // Fields to return
    query.addField("id");
    query.addField("p_name");
    query.addField("p_price");
    query.addField("p_catalog_name");
    query.addField("p_number");
    query.addField("p_picture");
    query.setRows(200); // number of rows per page
    // [3] Run the query, which returns a QueryResponse
    QueryResponse response = client.query(query);
    // [4] Convert the result documents into Product beans
    List<Product> products = response.getBeans(Product.class);
    // [5] Iterate over the results
    for (Product product : products) {
        System.out.println("id:" + product.getId()
                + "\tp_name:" + product.getP_name()
                + "\tp_price:" + product.getP_price()
                + "\tp_catalog_name:" + product.getP_catalog_name()
                + "\tp_number:" + product.getP_number()
                + "\tp_picture:" + product.getP_picture());
    }
    // [6] Release resources
    client.close();
}


(3) Deleting documents with deleteByQuery in SolrJ

/**
 * 8. Delete documents with deleteByQuery
 */
@Test
public void deleteBean() throws Exception {
    // [1] Get a connection
    // HttpSolrClient client = new HttpSolrClient.Builder("http://127.0.0.1:8080/solr/core1").build();
    String solrUrl = "http://127.0.0.1:8080/solr/core1";
    // Create the SolrClient and set the timeouts; if they are not set, the defaults are used
    HttpSolrClient client = new HttpSolrClient.Builder(solrUrl)
            .withConnectionTimeout(10000)
            .withSocketTimeout(60000)
            .build();
    // [2] Delete every document matching the query
    client.deleteByQuery("id:100");
    // [3] Commit
    client.commit();
    // [4] Release resources
    client.close();
}
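Because deleteByQuery removes every document that matches (the query `*:*` would empty the whole core), such queries are worth building carefully. For plain ids, deleteById(List<String>) from the earlier example is simpler; if a query is really needed, one option is to quote each value, as in the hypothetical JDK-only helper below. It only escapes backslashes and double quotes inside quoted phrases; SolrJ also ships `ClientUtils.escapeQueryChars` for escaping unquoted terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical builder for a "match any of these values" query. */
public class DeleteQueryDemo {

    /** Builds field:("v1" OR "v2" ...) with each value quoted and \ and " escaped. */
    static String anyOf(String field, List<String> values) {
        if (values.isEmpty()) {
            // Refuse an empty list rather than risk producing an overly broad query
            throw new IllegalArgumentException("values must not be empty");
        }
        StringJoiner or = new StringJoiner(" OR ", field + ":(", ")");
        for (String v : values) {
            String escaped = v.replace("\\", "\\\\").replace("\"", "\\\"");
            or.add("\"" + escaped + "\"");
        }
        return or.toString();
    }

    public static void main(String[] args) {
        // With SolrJ this string could be passed to client.deleteByQuery(...), followed by client.commit()
        System.out.println(anyOf("id", Arrays.asList("100", "30000"))); // id:("100" OR "30000")
    }
}
```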

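Below is what the official documentation says about field type definitions.

Field type definitions and properties

Field type definitions in the managed-schema file

① The first line of a field type definition contains the field type name, e.g. name="text_general", and the name of the implementation class, e.g. class="solr.TextField".

② The rest of the definition is about field analysis, described in terms of analyzers, tokenizers and filters.

The implementation class is responsible for making sure the field is handled correctly. In class names in managed-schema, the string solr is shorthand for org.apache.solr.schema or org.apache.solr.analysis. For example, solr.TextField really means org.apache.solr.schema.TextField.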
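The shorthand can be pictured as trying a few package prefixes in turn. The sketch below is a simplified, hypothetical model that only uses the two packages named above; Solr's resource loader actually searches a longer list of packages, and third-party classes (such as the IK analyzer) are given by their fully qualified name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of expanding the "solr." class-name shorthand. */
public class ShortClassNameDemo {

    private static final String PREFIX = "solr.";

    /** Candidate fully qualified names for a class attribute such as "solr.TextField". */
    static List<String> candidateClassNames(String className) {
        if (!className.startsWith(PREFIX)) {
            return Collections.singletonList(className); // already fully qualified, used as-is
        }
        String simpleName = className.substring(PREFIX.length());
        return Arrays.asList(
                "org.apache.solr.schema." + simpleName,
                "org.apache.solr.analysis." + simpleName);
    }

    public static void main(String[] args) {
        System.out.println(candidateClassNames("solr.TextField"));
        System.out.println(candidateClassNames("org.wltea.analyzer.lucene.IKAnalyzer"));
    }
}
```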

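Field type properties

The field type's class determines most of the behavior of a field type, but optional properties can also be defined. For example, the following definition of a date field type defines two properties, sortMissingLast and omitNorms.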

<fieldType name="date" class="solr.DatePointField" sortMissingLast="true" omitNorms="true"/>

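The properties that can be specified for a given field type fall into three major categories:

Properties specific to the field type's class.
General properties Solr supports for any field type.
Field default properties that can be specified on the field type and that are inherited by fields using this type instead of the default behavior.

General properties

These are the general properties for fields: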

name
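The name of the fieldType. This value is used in the "type" attribute of field definitions. It is strongly recommended that names consist of alphanumeric or underscore characters only and do not start with a digit. This is not currently strictly enforced.

class

The class name that is used to store and index the data for this type. Note that you may prefix included class names with "solr." and Solr will automatically figure out which packages to search for the class, so solr.TextField will work.

If you are using a third-party class, you will probably need a fully qualified class name. The fully qualified equivalent of solr.TextField is org.apache.solr.schema.TextField.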

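positionIncrementGap: For multivalued fields, specifies a distance between the values, which prevents spurious phrase matches.
autoGeneratePhraseQueries: For text fields. If true, Solr automatically generates phrase queries for adjacent terms. If false, terms must be enclosed in double quotes to be treated as phrases.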
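A small worked example of positionIncrementGap, under a simplified model of Lucene's indexing: tokens within one value get consecutive positions, and the gap is added before the first token of each later value. For the values "john smith" and "jane doe", a gap of 100 puts "jane" at position 102, so the phrase "smith jane" (which needs adjacent positions) cannot match across the two values, whereas with a gap of 0 it would. `PositionGapDemo` is hypothetical and splits on whitespace instead of running a real analyzer.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of token positions in a multi-valued field. */
public class PositionGapDemo {

    /** Token -> position (assumes distinct tokens and whitespace tokenizing). */
    static Map<String, Integer> positions(List<String> values, int positionIncrementGap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1;
        boolean firstValue = true;
        for (String value : values) {
            if (!firstValue) {
                position += positionIncrementGap; // gap inserted between values
            }
            firstValue = false;
            for (String token : value.trim().split("\\s+")) {
                position += 1; // each token advances the position by one
                result.put(token, position);
            }
        }
        return result;
    }

    /** True if the phrase "first second" would match, i.e. the two positions are adjacent. */
    static boolean phraseMatches(Map<String, Integer> pos, String first, String second) {
        Integer a = pos.get(first);
        Integer b = pos.get(second);
        return a != null && b != null && b == a + 1;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("john smith", "jane doe");
        Map<String, Integer> withGap = positions(names, 100);
        // {john=0, smith=1, jane=102, doe=103} match=false
        System.out.println(withGap + " match=" + phraseMatches(withGap, "smith", "jane"));
        Map<String, Integer> noGap = positions(names, 0);
        // {john=0, smith=1, jane=2, doe=3} match=true
        System.out.println(noGap + " match=" + phraseMatches(noGap, "smith", "jane"));
    }
}
```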

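enableGraphQueries: For text fields, applicable when querying with sow=false (which is the default for the sow parameter). Use true, the default, for field types with query analyzers that include graph-aware filters, e.g., Synonym Graph Filter and Word Delimiter Graph Filter.

Use false for field types with query analyzers that include filters which can match documents when some tokens are missing, e.g., Shingle Filter.

docValuesFormat: Defines a custom DocValuesFormat to use for fields of this type. This requires that a schema-aware codec, such as SchemaCodecFactory, has been configured in solrconfig.xml.
postingsFormat: Defines a custom PostingsFormat to use for fields of this type. This requires that a schema-aware codec, such as SchemaCodecFactory, has been configured in solrconfig.xml.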

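Field default properties

These are properties that can be specified either on field types, or on individual fields to override the values provided by the field types.

The default values for each property depend on the underlying FieldType class, which in turn may depend on the version attribute of the <schema/>. The table below includes the default value for most FieldType implementations provided by Solr, assuming a schema.xml that declares version="1.6".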

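Property	Description	Values	Implicit default
indexed	If true, the value of the field can be used in queries to retrieve matching documents.	true or false	true
stored	If true, the actual value of the field can be retrieved by queries.	true or false	true
docValues	If true, the value of the field will be put in a column-oriented DocValues structure.	true or false	false
sortMissingFirst sortMissingLast	Control the placement of documents when a sort field is not present.	true or false	false
multiValued	If true, indicates that a single document might contain multiple values for this field type.	true or false	false
omitNorms	If true, omits the norms associated with this field (this disables length normalization for the field, and saves some memory). Defaults to true for all primitive (non-analyzed) field types, such as int, float, date, bool, and string. Only full-text fields or fields that need an index-time boost need norms.	true or false	*
omitTermFreqAndPositions	If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position that are issued on a field with this option will silently fail to find documents. This property defaults to true for all field types that are not text fields.	true or false	*
omitPositions	Similar to `omitTermFreqAndPositions` but preserves term frequency information.	true or false	*
termVectors termPositions termOffsets termPayloads	These options instruct Solr to maintain full term vectors for each document, optionally including position, offset and payload information for each term occurrence in those vectors. These can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr.	true or false	false
required	Instructs Solr to reject any attempt to add a document that does not have a value for this field. This property defaults to false.	true or false	false
useDocValuesAsStored	If the field has docValues enabled, setting this to true allows the field to be returned as if it were a stored field (even if it has `stored=false`) when matching "*" in an fl parameter.	true or false	true
large	Large fields are always lazy loaded and only take up space in the document cache if the actual value is < 512KB. This option requires stored="true" and multiValued="false". It is intended for fields that might have very large values, so that they don't get cached in memory.	true or false	false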
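How these layers combine can be sketched as a lookup chain: a value set on the field wins, otherwise the value set on the field type, otherwise the implicit default from the table. The hypothetical `PropertyResolutionDemo` below ignores the class- and version-dependent defaults marked * and uses the date field type shown earlier (omitNorms="true") together with an assumed field of that type that sets stored="false".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of field > field type > implicit default resolution. */
public class PropertyResolutionDemo {

    /** Field attribute first, then field type attribute, then the implicit default. */
    static boolean effective(String property, Map<String, Boolean> fieldAttrs,
                             Map<String, Boolean> typeAttrs, Map<String, Boolean> implicitDefaults) {
        if (fieldAttrs.containsKey(property)) {
            return fieldAttrs.get(property);
        }
        if (typeAttrs.containsKey(property)) {
            return typeAttrs.get(property);
        }
        Boolean fallback = implicitDefaults.get(property);
        if (fallback == null) {
            throw new IllegalArgumentException("No implicit default known for " + property);
        }
        return fallback;
    }

    public static void main(String[] args) {
        Map<String, Boolean> defaults = new HashMap<>(); // a few rows of the table above
        defaults.put("indexed", true);
        defaults.put("stored", true);
        defaults.put("multiValued", false);

        Map<String, Boolean> dateType = new HashMap<>(); // <fieldType name="date" ... omitNorms="true"/>
        dateType.put("omitNorms", true);

        Map<String, Boolean> field = new HashMap<>(); // hypothetical field of type "date" with stored="false"
        field.put("stored", false);

        for (String p : new String[] {"indexed", "stored", "multiValued", "omitNorms"}) {
            System.out.println(p + "=" + effective(p, field, dateType, defaults));
        }
    }
}
```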

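Field types included in Solr

The following table lists the field types that are available in Solr. The org.apache.solr.schema package includes all the classes listed in this table.

Due to work commitments, the descriptions below still need to be checked.

Class	Description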
BinaryField
BoolField
CollationField
CurrencyField
CurrencyFieldType
DateRangeField
DatePointField
DoublePointField
ExternalFileField
EnumField
EnumFieldType
FloatPointField
ICUCollationField
IntPointField
LatLonPointSpatialField
LatLonType
LongPointField
PointType
PreAnalyzedField
RandomSortField
SpatialRecursivePrefixTreeFieldType
StrField
TextField
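TrieDateField	Deprecated. Use DatePointField instead.
TrieDoubleField	Deprecated. Use DoublePointField instead.
TrieFloatField	Deprecated. Use FloatPointField instead.
TrieIntField	Deprecated. Use IntPointField instead.
TrieLongField	Deprecated. Use LongPointField instead.
TrieField	Deprecated. This field takes a `type` parameter to define the specific class of Trie* field to use; use an appropriate field type instead.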
UUIDField