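1. The default es analyzer handles Chinese poorly: it splits text into individual Chinese characters. The ik analyzer supports Chinese considerably better and offers two main modes: ik_smart and ik_max_word.
2. Environment
Operating system: centos
es version: 6.0.0
1. Plugin repository: https://github.com/medcl/elasticsearch-analysis-ik
2. Run the following command from the es root directory: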
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.0.0/elasticsearch-analysis-ik-6.0.0.zip
After the command finishes, a new analysis-ik directory appears under both the plugins folder and the config folder of the es root directory.
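The directory check described above can be scripted. The following is a minimal sketch, assuming the es root directory is passed as the first argument; the function name ik_dirs_present is hypothetical and not part of Elasticsearch or the plugin.

```shell
# Return success only when both analysis-ik directories exist under the
# given es root directory ($1). The function name is made up for this sketch.
ik_dirs_present() {
  [ -d "$1/plugins/analysis-ik" ] && [ -d "$1/config/analysis-ik" ]
}
```

For example, `ik_dirs_present /path/to/esroot && echo "ik installed"`. Running `./bin/elasticsearch-plugin list` should also print the installed plugin names.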
1. Find the es process
ps -ef | grep elastic
2. Kill the process
In the output of the previous command, the es process ID is 12776 in this example.
Run:
kill 12776
3. Start es in the background
./bin/elasticsearch -d
Note: restarting es causes shards to be reallocated, so take care in production.
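The three restart steps can be combined into one script. This is a sketch under stated assumptions: the function names are hypothetical, the es root directory is passed as an argument, and the process is recognized by the main class name org.elasticsearch.bootstrap.Elasticsearch, which is what the JVM command line normally contains.

```shell
# Read `ps -ef` output on stdin and print the PID (column 2) of the
# Elasticsearch JVM. The [o] bracket keeps awk from matching its own
# command line when it is fed by a live `ps -ef`.
es_pid_from_ps() {
  awk '/[o]rg\.elasticsearch\.bootstrap\.Elasticsearch/ { print $2 }'
}

# Stop the running node (if any), wait for it to exit, then start es
# again in the background. $1 is the es root directory (an assumption).
restart_es() {
  es_home="$1"
  pid="$(ps -ef | es_pid_from_ps | head -n 1)"
  if [ -n "$pid" ]; then
    kill "$pid"                                   # SIGTERM allows a clean shutdown
    while kill -0 "$pid" 2>/dev/null; do sleep 1; done
  fi
  "$es_home/bin/elasticsearch" -d                 # -d runs es as a daemon
}
```

Calling `restart_es /path/to/esroot` performs the same find, kill, and start sequence as the manual steps. Waiting for the old process to exit before starting the new one avoids two nodes competing for the same ports.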
1. Analyze with ik_max_word
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中華人民共和國國歌"
}
Result:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中華人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中華", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "華人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和國", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和國", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "國", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "國歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 }
  ]
}
2. Analyze with ik_smart
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中華人民共和國國歌"
}
Result:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "國歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
  ]
}
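The two requests above use console syntax. From a plain shell the same _analyze call can be sent with curl; the sketch below assumes es listens on http://localhost:9200 (override it with the ES_URL variable) and that the text contains no double quotes or backslashes. Both function names are hypothetical.

```shell
# Build the JSON body for the _analyze API.
# $1 = analyzer name, $2 = text (assumed free of quotes and backslashes).
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Send the body to _analyze. es 6.x rejects requests that carry a body
# without an explicit Content-Type header, so the header is set here.
analyze() {
  curl -s -H 'Content-Type: application/json' \
    -X POST "${ES_URL:-http://localhost:9200}/_analyze" \
    -d "$(analyze_body "$1" "$2")"
}
```

For example, `analyze ik_smart '中華人民共和國國歌'` should return the same kind of token list as the console request.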
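1. Call ik_max_word from Java
// Imports needed by the test class
import java.util.List;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.junit.Test;

@Test
public void analyzer_ik_max_word() throws Exception {
    String text = "提早祝你們春節快樂!";
    // EsClient is the project's own helper that returns a connected client
    TransportClient client = EsClient.get();
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_max_word").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    System.out.println(tokens.size()); // prints 6
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm() + " ");
    }
}
Result: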
6
提早
祝
你們
春節快樂
春節
快樂
2. Call ik_smart from Java
@Test
public void analyzer_ik_smart() throws Exception {
    String text = "提早祝你們春節快樂!";
    // EsClient is the project's own helper that returns a connected client
    TransportClient client = EsClient.get();
    AnalyzeRequest request = new AnalyzeRequest().analyzer("ik_smart").text(text);
    List<AnalyzeResponse.AnalyzeToken> tokens =
            client.admin().indices().analyze(request).actionGet().getTokens();
    System.out.println(tokens.size()); // prints 4
    for (AnalyzeResponse.AnalyzeToken token : tokens) {
        System.out.println(token.getTerm() + " ");
    }
}
Result:
4
提早
祝
你們
春節快樂
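To see which tokens ik_max_word emits in addition to ik_smart, the two result lists can be compared line by line. This is only an illustrative sketch: it assumes each list has been saved to a file with one token per line, and the function name tokens_only_in is hypothetical.

```shell
# Print the tokens that appear in the first file ($1) but not in the
# second ($2). -F treats tokens as fixed strings, -x requires a whole-line
# match, and -v keeps only the lines that match nothing in $2.
tokens_only_in() {
  grep -Fxv -f "$2" "$1"
}
```

With the two results above saved as max.txt and smart.txt, `tokens_only_in max.txt smart.txt` prints 春節 and 快樂, the finer-grained sub-words that only ik_max_word produces for this sentence.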