The ik Chinese Word Segmentation Plugin for ES 1.4

Date: 2019-12-30

Tags: es1.4, Chinese word segmentation plugin

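The only Chinese word segmentation plugin that ES officially supports is smartcn, which indexes text one character at a time. When a user searches from the product front end, every document containing any of the query's characters is matched. For example, a search for the keyword "蘋果" matches every title that contains the character "蘋" or the character "果". To fix this, I went looking for a third-party Chinese word segmentation plugin. There are several from Chinese developers: ik, ansj and mmseg. In the end I chose ik.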
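To make the problem concrete, here is a minimal toy sketch of why character-level indexing over-matches while word-level indexing does not. It is not how ik or smartcn work internally: the greedy longest-match segmenter, the three titles and the tiny dictionary are all made up for illustration.

```python
# Toy comparison of character-level and word-level indexing.
# The titles and the dictionary are hypothetical sample data.

def char_tokens(text):
    """Split text into single characters, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def word_tokens(text, dictionary):
    """Greedy longest-match segmentation against a small dictionary.
    Characters not covered by a dictionary word become single tokens."""
    tokens, i = [], 0
    max_len = max(len(word) for word in dictionary)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

def search(query_tokens, docs_tokens):
    """Return the titles that share at least one token with the query."""
    wanted = set(query_tokens)
    return [title for title, toks in docs_tokens.items() if wanted & set(toks)]

titles = ["蘋果手機", "水果沙拉", "青蘋綠茶"]
dictionary = {"蘋果", "手機", "水果", "沙拉", "綠茶"}

by_char = {t: char_tokens(t) for t in titles}
by_word = {t: word_tokens(t, dictionary) for t in titles}

# Character-level: all three titles match, because each contains 蘋 or 果.
print(search(char_tokens("蘋果"), by_char))
# Word-level: only the title that contains the word 蘋果 matches.
print(search(word_tokens("蘋果", dictionary), by_word))
```

With word-level tokens, the query and the titles are compared as whole words, which is the behaviour a segmentation plugin is meant to provide.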

Installing ik on ES 1.4 involved a lot of pitfalls. I eventually got it working, so here are the installation steps.

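1. Download the ik plugin source code (elasticsearch-analysis-ik), compile it, and copy the resulting jar files to $ES_HOME/plugins/.

2. Download the ik configuration files and copy them to $ES_HOME/config/.

3. Edit the elasticsearch.yml configuration file.

4. Test ik.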

1. First, download the plugin source code and compile it

wget --no-check-certificate 

unzip master.zip
cd elasticsearch-analysis-ik-master
mvn clean install -Dmaven.test.skip=true   # The build has to download the jars it depends on, so grab a coffee and wait...

Unzip the elasticsearch-analysis-ik-1.2.9.zip produced by the build and copy its contents to the $ES_HOME/plugins directory.

2. Download the ik configuration files and copy them to $ES_HOME/config/

https://github.com/davidbj/elasticsearch-rtf/archive/master.zip
unzip master.zip
Copy the config/ik folder from the extracted directory to $ES_HOME/config.
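After steps 1 and 2 it can save a restart cycle to confirm that the files landed where they should. The following is a rough sketch of such a check, not part of the plugin. The function name, the "analysis-ik" file-name pattern, and the plugins/analysis-ik subfolder in the demo are assumptions, so adjust them to whatever your build actually produced.

```python
import tempfile
from pathlib import Path

def missing_ik_pieces(es_home):
    """Return a list of problems found; an empty list means both checks passed."""
    es_home = Path(es_home)
    problems = []
    plugins = es_home / "plugins"
    # Assumption: the plugin jar keeps "analysis-ik" in its file name.
    jars = list(plugins.rglob("*analysis-ik*.jar")) if plugins.is_dir() else []
    if not jars:
        problems.append("no analysis-ik jar under plugins/")
    # Step 2 copies the config/ik folder into $ES_HOME/config.
    if not (es_home / "config" / "ik").is_dir():
        problems.append("config/ik directory is missing")
    return problems

# Demo against a throwaway directory that stands in for $ES_HOME.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    print(missing_ik_pieces(home))  # nothing installed yet: two problems

    # Hypothetical layout after steps 1 and 2.
    plugin_dir = home / "plugins" / "analysis-ik"
    plugin_dir.mkdir(parents=True)
    (plugin_dir / "elasticsearch-analysis-ik-1.2.9.jar").touch()
    (home / "config" / "ik").mkdir(parents=True)
    print(missing_ik_pieces(home))  # both checks now pass: []
```

On a real node you would call missing_ik_pieces with the value of $ES_HOME, for example via os.environ["ES_HOME"].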

3. Edit the $ES_HOME/config/elasticsearch.yml configuration file

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
 
# or
index.analysis.analyzer.ik.type: "ik"
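The nested block and the one-line dotted key in step 3 are two notations for the same kind of setting, since Elasticsearch treats nested YAML keys and flat dotted keys as equivalent paths. As a small illustration of that mapping (my own helper, not an Elasticsearch API), the sketch below flattens a nested dictionary into dotted keys. Only the ik_max_word and ik_smart entries are included, and list values such as alias are not given any special handling.

```python
def flatten(settings, prefix=""):
    """Turn nested dicts into a flat {dotted.key: value} mapping."""
    flat = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# Same structure as part of the nested YAML form of the configuration.
nested = {"index": {"analysis": {"analyzer": {
    "ik_max_word": {"type": "ik", "use_smart": False},
    "ik_smart": {"type": "ik", "use_smart": True},
}}}}

for key, value in sorted(flatten(nested).items()):
    print(key, "=", value)
```

Each printed key, such as index.analysis.analyzer.ik_smart.type, has the same shape as the single-line form of the setting.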

Finally, restart the elasticsearch service.

4. Test:

curl -XPOST  "

The test result is as follows:
{
  "tokens": [
    {
      "token": "text",
      "start_offset": 2,
      "end_offset": 6,
      "type": "ENGLISH",
      "position": 1
    },
    {
      "token": "我",
      "start_offset": 9,
      "end_offset": 10,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "中國人",
      "start_offset": 11,
      "end_offset": 14,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "中國",
      "start_offset": 11,
      "end_offset": 13,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "國人",
      "start_offset": 12,
      "end_offset": 14,
      "type": "CN_WORD",
      "position": 5
    }
  ]
}
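One thing worth noticing in this result is that several tokens cover the same characters, which looks consistent with the fine-grained mode (use_smart: false), although the analyzer used in the test is not shown here. The sketch below, using the token text and offsets copied from the result above, lists which tokens overlap. The helper name is my own and the offsets are treated as half-open ranges.

```python
from itertools import combinations

# (token, start_offset, end_offset) copied from the test result.
tokens = [
    ("text", 2, 6),
    ("我", 9, 10),
    ("中國人", 11, 14),
    ("中國", 11, 13),
    ("國人", 12, 14),
]

def overlapping_pairs(tokens):
    """Return pairs of tokens whose [start, end) offset ranges intersect."""
    pairs = []
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(tokens, 2):
        if a_start < b_end and b_start < a_end:
            pairs.append((a, b))
    return pairs

for a, b in overlapping_pairs(tokens):
    print(a, "overlaps", b)
```

Overlapping tokens mean a search for either the longer word or one of the shorter words inside it can match the same stretch of text.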

With that, the ik plugin for ES is installed and configured.

For more details, see the official documentation: https://github.com/awnuxkjy/es-ik