Note: Elasticsearch version 6.2.2
Hostname  IP
master    192.168.0.120
slave1    192.168.0.121
slave2    192.168.0.122
Search GitHub for "ik", open the "medcl/elasticsearch-analysis-ik" repository, and go to its releases page at https://github.com/medcl/elasticsearch-analysis-ik/releases.
Choose the release that matches your Elasticsearch version (6.2.2 here) and download the zip package.
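① Unzip elasticsearch-analysis-ik-6.2.2.zip.
② Open Eclipse and import the unpacked directory as a Maven project.
Click Next to finish the import wizard.
③ After the import, use "Maven build..." to compile the jar.
A configuration dialog pops up.
Click "Run" to start the build.
When the build finishes, right-click the target folder and choose Refresh to see the output.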
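The Eclipse steps above are an ordinary Maven build, so the same artifact can probably be produced from a terminal. The following is a minimal sketch, not the author's procedure. It assumes Maven and a JDK are installed, that the source zip unpacks into a directory named elasticsearch-analysis-ik-6.2.2 (an assumption), and that the build writes the plugin zip under target/releases/ (check this on your own checkout). The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

SRC_DIR="elasticsearch-analysis-ik-6.2.2"   # assumed name of the unpacked source directory

run unzip elasticsearch-analysis-ik-6.2.2.zip
# -f points Maven at the project's pom.xml without changing directory.
run mvn -f "$SRC_DIR/pom.xml" clean package
# Assumed output location of the installable plugin zip.
run ls "$SRC_DIR/target/releases"
```

Run it once as is to review the commands, then again with DRY_RUN=0 to perform the build.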
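Upload the built package to the master server (the commands below expect elasticsearch-analysis-ik-6.2.2.zip in /opt).
Then run the following commands to install it: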
[spark@master ~]$ cd /opt/
[spark@master opt]$ unzip elasticsearch-analysis-ik-6.2.2.zip -d /opt/elasticsearch-6.2.2/plugins/
Archive:  elasticsearch-analysis-ik-6.2.2.zip
   creating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/plugin-descriptor.properties
   creating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/extra_main.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/extra_single_word.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/extra_single_word_full.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/extra_single_word_low_freq.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/extra_stopword.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/IKAnalyzer.cfg.xml
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/main.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/preposition.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/quantifier.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/stopword.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/suffix.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/config/surname.dic
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/elasticsearch-analysis-ik-6.2.2.jar
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/httpclient-4.5.2.jar
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/httpcore-4.4.4.jar
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/commons-logging-1.2.jar
  inflating: /opt/elasticsearch-6.2.2/plugins/elasticsearch/commons-codec-1.9.jar
[spark@master opt]$ cd /opt/elasticsearch-6.2.2/plugins/
[spark@master plugins]$ mv elasticsearch/ ik/
Install the plugin on slave1 and slave2 in the same way; the steps are omitted here.
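The omitted slave installation can be sketched as a loop over the two hosts. This assumes passwordless SSH for the spark user, the same /opt layout on every node, and that unzip is installed on the slaves; none of that is stated in the original. The script prints the commands by default and only executes them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ZIP=/opt/elasticsearch-analysis-ik-6.2.2.zip
PLUGINS=/opt/elasticsearch-6.2.2/plugins

# Copy the plugin zip to one host, unpack it and rename the directory to ik,
# mirroring the commands that were run on master.
install_on() {
  run scp "$ZIP" "spark@$1:/opt/"
  run ssh "spark@$1" "unzip $ZIP -d $PLUGINS/ && mv $PLUGINS/elasticsearch $PLUGINS/ik"
}

for host in slave1 slave2; do
  install_on "$host"
done
```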
Once the plugin is installed on all three servers (master, slave1, slave2), restart Elasticsearch to load the IK analyzer.
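After the restart it can help to confirm that the plugin was loaded before creating any index. The sketch below lists the installed plugins with the _cat/plugins API and sends a sample sentence through the _analyze API. ik_smart is the plugin's second, coarser analyzer and is included only for comparison. The ES_URL and RUN_ANALYZE variables and both function names are illustrative, and the body builder does no JSON escaping.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://192.168.0.120:9200}"

# Build the JSON body for the _analyze API.
# No escaping is done, so the text must not contain " or \ characters.
analyze_body() {
  printf '{ "analyzer": "%s", "text": "%s" }' "$1" "$2"
}

# Send a text through one analyzer and print the raw JSON response.
analyze() {
  curl -s -XPOST "$ES_URL/_analyze" -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}

# Only contact the cluster when explicitly requested.
if [ "${RUN_ANALYZE:-0}" = "1" ]; then
  curl -s "$ES_URL/_cat/plugins?v"
  analyze ik_max_word "中國駐洛杉磯領事館"
  analyze ik_smart "中國駐洛杉磯領事館"
fi
```

Run it with RUN_ANALYZE=1 against a live cluster. If the analyzer name is unknown to Elasticsearch, the plugin was most likely not loaded on the node that answered.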
1) Delete and recreate the index (HTTP method names are case-sensitive, so use upper case):
curl -XDELETE "http://192.168.0.120:9200/index"
curl -XPUT "http://192.168.0.120:9200/index"
2) Create the mapping on the index named "index", applying Chinese analysis to the field "content":
curl -XPOST "http://192.168.0.120:9200/index/fulltext/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}'
3) Add 4 documents first:
curl -XPOST "http://192.168.0.120:9200/index/fulltext/1" -H 'Content-Type: application/json' -d'
{"content":"美國留給伊拉克的是個爛攤子嗎"}'
curl -XPOST "http://192.168.0.120:9200/index/fulltext/2" -H 'Content-Type: application/json' -d'
{"content":"公安部:各地校車將享最高路權"}'
curl -XPOST "http://192.168.0.120:9200/index/fulltext/3" -H 'Content-Type: application/json' -d'
{"content":"中韓漁警衝突調查:韓警平均天天扣1艘中國漁船"}'
curl -XPOST "http://192.168.0.120:9200/index/fulltext/4" -H 'Content-Type: application/json' -d'
{"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}'
4) Run a search for "中國" with highlighting:
curl -XPOST "http://192.168.0.120:9200/index/fulltext/_search" -H 'Content-Type: application/json' -d'
{
  "query" : { "match" : { "content" : "中國" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "content" : {}
    }
  }
}'
Response:
{
  "took": 133,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6489038,
    "hits": [
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "4",
        "_score": 0.6489038,
        "_source": {
          "content": "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "<tag1>中國</tag1>駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "中韓漁警衝突調查:韓警平均天天扣1艘中國漁船"
        },
        "highlight": {
          "content": [
            "中韓漁警衝突調查:韓警平均天天扣1艘<tag1>中國</tag1>漁船"
          ]
        }
      }
    ]
  }
}
5) Add 3 more documents:
curl -XPOST "http://192.168.0.120:9200/index/fulltext/5" -H 'Content-Type: application/json' -d'
{"content":"俄偵委:俄一輛卡車渡河時翻車 致2名中國遊客遇難"}'
curl -XPOST "http://192.168.0.120:9200/index/fulltext/6" -H 'Content-Type: application/json' -d'
{"content":"韓國銀行面向中國留學生推出微信支付服務"}'
curl -XPOST "http://192.168.0.120:9200/index/fulltext/7" -H 'Content-Type: application/json' -d'
{"content":"印媒:中國東北「鏽帶」在困境中反擊"}'
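The documents above are indexed with one HTTP request each. For a larger sample set, the same thing can be done in one request through Elasticsearch's _bulk endpoint, which takes newline-delimited JSON. The following is a sketch rather than part of the original walkthrough. The to_bulk helper and the titles.txt file mentioned afterwards are hypothetical, and the helper escapes only backslashes and double quotes, not control characters.

```shell
#!/bin/sh
# Turn each line on stdin into a pair of _bulk lines (action + document),
# numbering the documents consecutively starting from $1.
to_bulk() {
  id="$1"
  while IFS= read -r line; do
    # Escape backslashes and double quotes so the line is a valid JSON string.
    esc=$(printf '%s' "$line" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"index":{"_id":"%s"}}\n{"content":"%s"}\n' "$id" "$esc"
    id=$((id + 1))
  done
}
```

A possible invocation, with one headline per line in titles.txt, is: to_bulk 5 < titles.txt | curl -s -XPOST "http://192.168.0.120:9200/index/fulltext/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @-. The --data-binary option matters because the bulk format depends on the newlines being preserved.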
6) Run the same search again:
curl -XPOST "http://192.168.0.120:9200/index/fulltext/_search" -H 'Content-Type: application/json' -d'
{
  "query" : { "match" : { "content" : "中國" }},
  "highlight" : {
    "pre_tags" : ["<tag1>", "<tag2>"],
    "post_tags" : ["</tag1>", "</tag2>"],
    "fields" : {
      "content" : {}
    }
  }
}'
Response:
{
  "took": 41,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.6785375,
    "hits": [
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "7",
        "_score": 0.6785375,
        "_source": {
          "content": "印媒:中國東北「鏽帶」在困境中反擊"
        },
        "highlight": {
          "content": [
            "印媒:<tag1>中國</tag1>東北「鏽帶」在困境中反擊"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "6",
        "_score": 0.47000363,
        "_source": {
          "content": "韓國銀行面向中國留學生推出微信支付服務"
        },
        "highlight": {
          "content": [
            "韓國銀行面向<tag1>中國</tag1>留學生推出微信支付服務"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "4",
        "_score": 0.44000342,
        "_source": {
          "content": "中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"
        },
        "highlight": {
          "content": [
            "<tag1>中國</tag1>駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "content": "俄偵委:俄一輛卡車渡河時翻車 致2名中國遊客遇難"
        },
        "highlight": {
          "content": [
            "俄偵委:俄一輛卡車渡河時翻車 致2名<tag1>中國</tag1>遊客遇難"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "中韓漁警衝突調查:韓警平均天天扣1艘中國漁船"
        },
        "highlight": {
          "content": [
            "中韓漁警衝突調查:韓警平均天天扣1艘<tag1>中國</tag1>漁船"
          ]
        }
      }
    ]
  }
}
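IK supports custom dictionaries. The configuration file is analysis-ik/IKAnalyzer.cfg.xml under the config folder (in the installation described here it is /opt/elasticsearch-6.2.2/plugins/ik/config/IKAnalyzer.cfg.xml), and the dictionary files sit in the same directory. Several options can be configured, including ext_dict for custom dictionaries and ext_stopwords for stop-word dictionaries.
Hot reloading is also supported: set remote_ext_dict to an HTTP URL that returns one word per line. The document must be encoded as UTF-8 without a BOM. When the dictionary changes, it is enough to change either of the response headers Last-Modified or ETag.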
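The two requirements for a remote dictionary (UTF-8 without a BOM, and a Last-Modified or ETag header that changes on update) can be checked from the shell before wiring the URL into the configuration. This is a minimal sketch: has_bom and show_dict_headers are illustrative helper names, and the URL in the usage note is hypothetical.

```shell
#!/bin/sh
# Return 0 if the file starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Print the Last-Modified and ETag headers returned for a dictionary URL.
show_dict_headers() {
  curl -sI "$1" | grep -i -E '^(last-modified|etag):'
}
```

For example, has_bom words.dic && echo "strip the BOM first" flags a badly encoded file, and show_dict_headers http://example.com/words.dic shows whether the server sends the headers at all.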
[spark@master config]$ pwd
/opt/elasticsearch-6.2.2/plugins/ik/config
[spark@master config]$ more IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict"></entry>
	<!-- Configure your own extension stop-word dictionaries here -->
	<entry key="ext_stopwords"></entry>
	<!-- Configure a remote extension dictionary here -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!-- Configure a remote extension stop-word dictionary here -->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
[spark@master config]$
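As a sketch of how a local custom dictionary might be wired in, the helper below creates a dictionary file under the IK config directory and points the ext_dict entry at it. It assumes the entry sits on a single line, as in the listing above, and that dictionary paths are given relative to the config directory. The function name set_ext_dict and the path custom/mydict.dic are illustrative, and the path must not contain | or & characters because it is substituted through sed.

```shell
#!/bin/sh
# $1 = IK config directory, $2 = dictionary path relative to that directory
set_ext_dict() {
  cfg="$1/IKAnalyzer.cfg.xml"
  # Create the dictionary file (and its parent directory) if missing.
  mkdir -p "$(dirname "$1/$2")"
  touch "$1/$2"
  # Keep a backup, then rewrite only the ext_dict entry.
  cp "$cfg" "$cfg.bak"
  sed "s|<entry key=\"ext_dict\">[^<]*</entry>|<entry key=\"ext_dict\">$2</entry>|" \
    "$cfg.bak" > "$cfg"
}
```

A possible invocation is set_ext_dict /opt/elasticsearch-6.2.2/plugins/ik/config custom/mydict.dic, after which words are added to custom/mydict.dic one per line in UTF-8. The change would presumably have to be repeated on every node, followed by a restart of Elasticsearch.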
References:
https://blog.csdn.net/moxiong3212/article/details/79338586
https://www.cnblogs.com/gaoxu387/p/7889626.html