1. Testing Elasticsearch analyzers
Elasticsearch ships with several analyzers (see: https://www.jianshu.com/p/d57935ba514b).
Sample sentence: Set the shape to semi-transparent by calling set_trans(5)
(1) standard analyzer (the default):
set, the, shape, to, semi, transparent, by, calling, set_trans, 5
(2) simple analyzer:
set, the, shape, to, semi, transparent, by, calling, set, trans
(3) whitespace analyzer: splits on whitespace only; case, underscores, etc. are left unchanged:
Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
(4) language analyzer (language-specific, e.g. the English analyzer):
set, shape, semi, transpar, call, set_tran, 5
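The four analyzers above can be compared directly through the _analyze API. A minimal sketch, assuming an Elasticsearch 5.x node is running at localhost:9200:

```shell
# Run the same sample sentence through each analyzer;
# only the analyzer name changes between requests.
for analyzer in standard simple whitespace english; do
  echo "--- $analyzer ---"
  curl -s -H 'Content-Type: application/json' \
       "http://localhost:9200/_analyze?pretty=true" \
       -d "{\"analyzer\":\"$analyzer\",\"text\":\"Set the shape to semi-transparent by calling set_trans(5)\"}"
done
```

Each response lists the tokens that analyzer produced, matching the four result lines above.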
2. Setting the analyzer for an Elasticsearch index
The following sets the analyzer for all types in the index to simple:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "simple" }
      }
    }
  }
}
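The same request as a curl call, plus a check that the setting took effect; a sketch assuming a local node and that my_index does not exist yet:

```shell
# Create the index with `simple` as its default analyzer.
curl -s -X PUT "http://localhost:9200/my_index" \
     -H 'Content-Type: application/json' \
     -d '{"settings":{"analysis":{"analyzer":{"default":{"type":"simple"}}}}}'

# Verify: the default analyzer should appear under
# index.analysis.analyzer in the settings output.
curl -s "http://localhost:9200/my_index/_settings?pretty=true"
```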
Standard analyzer
http://localhost:9200/_analyze?analyzer=standard&pretty=true&text=test測試
Tokenization result:
{
  "tokens" : [
    { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "測", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "試", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 2 }
  ]
}
Simple analyzer
http://localhost:9200/_analyze?analyzer=simple&pretty=true&text=test_測試
Result:
{
  "tokens" : [
    { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "word", "position" : 0 },
    { "token" : "測試", "start_offset" : 5, "end_offset" : 7, "type" : "word", "position" : 1 }
  ]
}
IK analyzers: ik_max_word and ik_smart
First it needs to be installed:
https://github.com/medcl/elasticsearch-analysis-ik
Download the zip package, then install it with the plugin installer. The Elasticsearch version on my machine is 5.6.10, so I installed the matching 5.6.10 release:
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip
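Before restarting, you can check that the plugin was picked up; `elasticsearch-plugin list` ships with Elasticsearch 5.x (run it from the Elasticsearch home directory):

```shell
# List installed plugins; `analysis-ik` should appear in the output.
./bin/elasticsearch-plugin list
```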
Then restart Elasticsearch.
Test it:
http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=test_tes_te測試
Result:
{
  "tokens" : [
    { "token" : "test_tes_te", "start_offset" : 0, "end_offset" : 11, "type" : "LETTER", "position" : 0 },
    { "token" : "test", "start_offset" : 0, "end_offset" : 4, "type" : "ENGLISH", "position" : 1 },
    { "token" : "tes", "start_offset" : 5, "end_offset" : 8, "type" : "ENGLISH", "position" : 2 },
    { "token" : "te", "start_offset" : 9, "end_offset" : 11, "type" : "ENGLISH", "position" : 3 },
    { "token" : "測試", "start_offset" : 11, "end_offset" : 13, "type" : "CN_WORD", "position" : 4 }
  ]
}
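Once installed, the IK analyzers can be used like any built-in analyzer, for example as an index-wide default the same way simple was set in section 2. A sketch, where the index name ik_index is just an example:

```shell
# Create an index whose default analyzer is ik_max_word
# (the index name `ik_index` is hypothetical).
curl -s -X PUT "http://localhost:9200/ik_index" \
     -H 'Content-Type: application/json' \
     -d '{"settings":{"analysis":{"analyzer":{"default":{"type":"ik_max_word"}}}}}'
```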