[solr] - suggestion

Date: 2019-11-11

Tags: solr, suggestion

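The previous post (Solr SpellCheck) used SpellCheck to simulate auto-complete. That first approach built the suggestion content dynamically from code. The approach below builds the content by reading a file, and it sorts the suggestions by click rate.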

1. In the mycore/conf directory, create a file named dictionary.txt (UTF-8 encoded) with the following content:

# sample dict 
cpu intel I7    1.0
cpu AMD 5000+    2.0
中央處理器 英特爾    1.0
中央處理器 AMD    2.0
中央空調 海爾 1匹    1.0
中央空調 海爾 1.5匹    2.0
中央空調 海爾 2匹    3.0
中央空調 格力 1匹    4.0
中央空調 格力 1.5匹    5.0
中央空調 格力 2匹    6.0
中央空調 美的 1匹    7.0
中央空調 美的 1.5匹    8.0
中央空調 美的 2匹    9.0
中國中央政府    1.0
中國中央銀行    2.0
中國中央人民銀行    3.0
啓信有限公司    1.0
啓信科技有限公司    2.0

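Note the values "1.0, 2.0, 3.0" above: these are the click rates. Each one must be separated from the preceding text by a Tab character (\t); otherwise it is treated as ordinary text.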
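The line format described above can be sketched in plain Java. This is only an illustration of the Tab rule, not the parser Solr itself uses: the class `DictionaryLineParser`, its `Entry` type, and the default weight of 1.0 for lines without a Tab are all assumptions made for this example.

```java
/** Hypothetical helper (not part of Solr) illustrating the dictionary.txt line format. */
class DictionaryLineParser {

    /** One parsed entry: the suggestion text and its weight (click rate). */
    static final class Entry {
        final String term;
        final double weight;

        Entry(String term, double weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /** Assumed weight for lines that contain no Tab separator. */
    static final double DEFAULT_WEIGHT = 1.0;

    /** Returns null for blank lines and '#' comments; otherwise splits on the last Tab. */
    static Entry parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        int tab = trimmed.lastIndexOf('\t');
        if (tab < 0) {
            // No Tab: the whole line is ordinary text, including any trailing number.
            return new Entry(trimmed, DEFAULT_WEIGHT);
        }
        String term = trimmed.substring(0, tab).trim();
        double weight = Double.parseDouble(trimmed.substring(tab + 1).trim());
        return new Entry(term, weight);
    }

    public static void main(String[] args) {
        Entry entry = parseLine("中央空調 格力 2匹\t6.0");
        System.out.println(entry.term + " -> " + entry.weight);
    }
}
```

A line such as `cpu AMD 5000+ 2.0` written with spaces instead of a Tab comes back as a single term with the default weight, which is the "ordinary text" case mentioned above.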

2. Open solrconfig.xml and add the following nodes inside <config />:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">file</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>  
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <!-- The field named below is the basis for spell checking, i.e. the field against which user input is checked. -->
        <str name="field">content</str>
        <str name="combineWords">true</str>
        <str name="breakWords">true</str>
        <!-- File that holds the auto-complete suggestion content -->
        <str name="sourceLocation">dictionary.txt</str>
        <!-- Directory for the auto-complete index; if omitted, the in-memory RAMDirectory mode is used by default -->
        <str name="spellcheckIndexDir">./spellchecker</str>
        <!-- When to build the spelling index: buildOnCommit / buildOnOptimize -->
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>
    <requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
      <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">file</str>
        <!-- Maximum number of suggestions to return -->
        <str name="spellcheck.count">20</str>
        <!-- Sort by click rate -->
        <str name="spellcheck.onlyMorePopular">true</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

The key line in <searchComponent /> is this one:

<str name="sourceLocation">dictionary.txt</str>

3. Open a browser and enter the following in the address bar to build the dictionary:

http://localhost:8899/solr/mycore/spellcheck?spellcheck.build=true

The result is:

4. Test in the browser by entering this address:

http://localhost:8899/solr/mycore/spellcheck?q=中央&rows=0
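A browser normally percent-encodes the Chinese characters in `q` before sending the request. When the same URL is assembled by hand in code (outside SolrJ), the encoding has to be done explicitly. The sketch below assumes the host, port, core, and handler path used in this post; the class name `SuggestUrlBuilder` is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds the step 4 test URL with a UTF-8 encoded query. */
class SuggestUrlBuilder {

    /** Appends the /spellcheck handler path and the percent-encoded prefix to the core URL. */
    static String buildUrl(String coreUrl, String prefix) throws UnsupportedEncodingException {
        return coreUrl + "/spellcheck?q=" + URLEncoder.encode(prefix, "UTF-8") + "&rows=0";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Core URL as used in this post; adjust host, port, and core name to your setup.
        System.out.println(buildUrl("http://localhost:8899/solr/mycore", "中央"));
    }
}
```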

5. Test with code:

package com.my.solr;

import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Collation;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Correction;
import org.apache.solr.client.solrj.response.SpellCheckResponse.Suggestion;

public class TestSolr {

    public static void main(String[] args) throws IOException, SolrServerException {
        String url = "http://localhost:8899/solr/mycore";
        HttpSolrServer core = new HttpSolrServer(url);
        core.setMaxRetries(1);
        core.setConnectionTimeout(5000);
        core.setParser(new XMLResponseParser()); // binary parser is used by default
        core.setSoTimeout(1000); // socket read timeout
        core.setDefaultMaxConnectionsPerHost(100);
        core.setMaxTotalConnections(100);
        core.setFollowRedirects(false); // defaults to false
        core.setAllowCompression(true);

        // ------------------------------------------------------
        // search
        // ------------------------------------------------------
        SolrQuery query = new SolrQuery();
        String token = "中央";
        query.set("qt", "/spellcheck");
        query.set("q", token);
        query.set("spellcheck", "on");
        // Rebuilds the dictionary on every request; drop this once it has been built (step 3).
        query.set("spellcheck.build", "true");
        query.set("spellcheck.onlyMorePopular", "true");

        query.set("spellcheck.count", "100");
        query.set("spellcheck.alternativeTermCount", "4");

        query.set("spellcheck.extendedResults", "true");
        query.set("spellcheck.maxResultsForSuggest", "5");

        query.set("spellcheck.collate", "true");
        query.set("spellcheck.collateExtendedResults", "true");
        query.set("spellcheck.maxCollationTries", "5");
        query.set("spellcheck.maxCollations", "3");

        QueryResponse response = null;

        try {
            response = core.query(query);
            System.out.println("Query time (ms): " + response.getQTime());
        } catch (SolrServerException e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
        } finally {
            core.shutdown();
        }

        if (response == null) {
            return; // the query failed; the error was already printed above
        }

        SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
        if (spellCheckResponse != null) {
            List<Suggestion> suggestionList = spellCheckResponse.getSuggestions();
            for (Suggestion suggestion : suggestionList) {
                System.out.println("Suggestions NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("Suggested: ");
                List<String> suggestedWordList = suggestion.getAlternatives();
                for (String word : suggestedWordList) {
                    System.out.print(word + ", ");
                }
                System.out.println();
            }
            System.out.println();
            Map<String, Suggestion> suggestedMap = spellCheckResponse.getSuggestionMap();
            for (Map.Entry<String, Suggestion> entry : suggestedMap.entrySet()) {
                System.out.println("suggestionName: " + entry.getKey());
                Suggestion suggestion = entry.getValue();
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");

                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            Suggestion suggestion = spellCheckResponse.getSuggestion(token);
            if (suggestion != null) { // null when Solr returns no suggestion for the token
                System.out.println("NumFound: " + suggestion.getNumFound());
                System.out.println("Token: " + suggestion.getToken());
                System.out.print("suggested: ");
                List<String> suggestedList = suggestion.getAlternatives();
                for (String suggestedWord : suggestedList) {
                    System.out.print(suggestedWord + ", ");
                }
                System.out.println("\n\n");
            }

            System.out.println("The First suggested word for solr is : " + spellCheckResponse.getFirstSuggestion(token));
            System.out.println("\n\n");

            List<Collation> collatedList = spellCheckResponse.getCollatedResults();
            if (collatedList != null) {
                for (Collation collation : collatedList) {
                    System.out.println("collated query String: " + collation.getCollationQueryString());
                    System.out.println("collation Num: " + collation.getNumberOfHits());
                    List<Correction> correctionList = collation.getMisspellingsAndCorrections();
                    for (Correction correction : correctionList) {
                        System.out.println("original: " + correction.getOriginal());
                        System.out.println("correction: " + correction.getCorrection());
                    }
                    System.out.println();
                }
            }
            System.out.println();
            System.out.println("The Collated word: " + spellCheckResponse.getCollatedResult());
            System.out.println();
        }

        System.out.println("Query time (ms): " + response.getQTime());
    }
}

Output:

The suggestions are already sorted by click rate here.
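To make the ordering concrete, here is a small in-memory sketch of "prefix match, then sort by click rate, highest first". It is an illustration only and not how TSTLookup is implemented (which, as its name suggests, is built on a ternary search tree); the class `WeightedPrefixLookup` and its `suggest` method are hypothetical names, and the sample entries are taken from dictionary.txt above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of weight-ordered prefix suggestions. */
class WeightedPrefixLookup {

    /** Returns up to 'count' terms starting with 'prefix', highest weight first. */
    static List<String> suggest(Map<String, Double> dictionary, String prefix, int count) {
        List<Map.Entry<String, Double>> matches = new ArrayList<>();
        for (Map.Entry<String, Double> entry : dictionary.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                matches.add(entry);
            }
        }
        // Descending by weight; the sort is stable, so ties keep dictionary order.
        matches.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < count; i++) {
            result.add(matches.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // A few entries copied from dictionary.txt (term -> click rate).
        Map<String, Double> dictionary = new LinkedHashMap<>();
        dictionary.put("中央處理器 英特爾", 1.0);
        dictionary.put("中央處理器 AMD", 2.0);
        dictionary.put("中央空調 美的 1匹", 7.0);
        dictionary.put("中央空調 美的 2匹", 9.0);
        dictionary.put("中國中央銀行", 2.0);

        for (String term : suggest(dictionary, "中央", 3)) {
            System.out.println(term);
        }
    }
}
```

Note that "中國中央銀行" is not returned for the prefix "中央" in this sketch, because the match is on the start of the entry rather than anywhere inside it.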

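The dictionary.txt above contains "啓信", which the analyzer does not treat as a word, so a query for the single character "啓" returns no results.

To add user-defined words to the analyzer: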

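1. Open the Solr web application directory webapps\solr\WEB-INF\classes and create a text file named ext.dic (the name must match the ext_dict entry in IKAnalyzer.cfg.xml) with this content: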

啓信

Edit the IKAnalyzer.cfg.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">  
<properties>  
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionary here -->
    <entry key="ext_dict">ext.dic;</entry> 
    
    <!-- Users can configure their own extension stopword dictionary here -->
    <entry key="ext_stopwords">stopword.dic;</entry> 
    
</properties>

Save the file and restart Tomcat.

Enter this in the address bar:

http://localhost:8899/solr/mycore/spellcheck?q=啓&rows=0

Result:

The same applies when querying from code.
