一、安裝IK分詞器,下載對應版本的插件,elasticsearch-analysis-ik中文分詞器的開發者一直進行維護的,對應着elasticsearch的版本,因此選擇好本身的版本便可。IKAnalyzer中文分詞器原做者已經不進行維護了,可是Lucece在不斷更新,因此使用Lucece和IKAnalyzer中文分詞器集成,須要你進行修改IKAnalyzer中文分詞器。html
下載地址:https://github.com/medcl/elasticsearch-analysis-ik/releasesjava
將下載好的中文分詞器上傳到你的服務器,或者使用wget命令聯網下載,蘿蔔白菜各有所愛吧。個人IK中文分詞器版本對應了ElasticSearch的版本。下載好開始安裝,因爲我是僞分佈集羣,就一臺集羣,就安裝一次便可,可是若是是分佈式的話,每一臺機器都要安裝中文分詞器的。node
二、開始解壓縮操做,將elasticsearch-analysis-ik-5.4.3.zip拷貝到一個目錄裏面進行解壓縮操做,安裝IK中文分詞器。linux
1 [root@slaver4 package]# mkdir elasticsearch-analysis-ik 2 [root@slaver4 package]# cp elasticsearch-analysis-ik-5.4.3.zip elasticsearch-analysis-ik 3 [root@slaver4 elasticsearch-analysis-ik]# unzip elasticsearch-analysis-ik-5.4.3.zip 4 Archive: elasticsearch-analysis-ik-5.4.3.zip 5 inflating: elasticsearch-analysis-ik-5.4.3.jar 6 inflating: httpclient-4.5.2.jar 7 inflating: httpcore-4.4.4.jar 8 inflating: commons-logging-1.2.jar 9 inflating: commons-codec-1.9.jar 10 creating: config/ 11 creating: config/custom/ 12 inflating: config/surname.dic 13 inflating: config/preposition.dic 14 inflating: config/custom/mydict.dic 15 inflating: config/custom/single_word_full.dic 16 inflating: config/custom/sougou.dic 17 inflating: config/custom/ext_stopword.dic 18 inflating: config/custom/single_word.dic 19 inflating: config/custom/single_word_low_freq.dic 20 inflating: config/main.dic 21 inflating: config/IKAnalyzer.cfg.xml 22 inflating: config/quantifier.dic 23 inflating: config/stopword.dic 24 inflating: config/suffix.dic 25 inflating: plugin-descriptor.properties 26 [root@slaver4 elasticsearch-analysis-ik]# ls 27 commons-codec-1.9.jar commons-logging-1.2.jar config elasticsearch-analysis-ik-5.4.3.jar elasticsearch-analysis-ik-5.4.3.zip httpclient-4.5.2.jar httpcore-4.4.4.jar plugin-descriptor.properties 28 [root@slaver4 elasticsearch-analysis-ik]#
因爲unzip的默認解壓縮到當前目錄,這裏能夠將elasticsearch-analysis-ik-5.4.3.zip包刪除掉。git
1 [root@slaver4 elasticsearch-analysis-ik]# ls 2 commons-codec-1.9.jar commons-logging-1.2.jar config elasticsearch-analysis-ik-5.4.3.jar elasticsearch-analysis-ik-5.4.3.zip httpclient-4.5.2.jar httpcore-4.4.4.jar plugin-descriptor.properties 3 [root@slaver4 elasticsearch-analysis-ik]# rm -rf elasticsearch-analysis-ik-5.4.3.zip 4 [root@slaver4 elasticsearch-analysis-ik]# ls 5 commons-codec-1.9.jar commons-logging-1.2.jar config elasticsearch-analysis-ik-5.4.3.jar httpclient-4.5.2.jar httpcore-4.4.4.jar plugin-descriptor.properties 6 [root@slaver4 elasticsearch-analysis-ik]#
而後將解壓縮好的IK移動到ElasticSearch的plugins目錄下面。記得三個節點plugins目錄裏面都須要放的哦。以下所示:github
注意:放到plugins目錄裏面的IK中文分詞器插件,必須放到一個目錄裏面,再放到plugins目錄裏面。如個人elasticsearch-analysis-ik裏面存的就是IK中文分詞器解壓縮後的文件。web
1 [root@slaver4 package]# mv elasticsearch-analysis-ik/ /home/hadoop/soft/elasticsearch-5.4.3/plugins 2 [root@slaver4 package]# cd /home/hadoop/soft/elasticsearch-5.4.3/plugins 3 [root@slaver4 plugins]# ls 4 elasticsearch-analysis-ik 5 [root@slaver4 plugins]#
因爲Elasticsearch沒有提供關閉的命令,使用kill -9 進程號,在生產環境也是大忌的,生產環境若是使用kill命令的話,建議使用kill 進程號,讓系統把任務處理完,再關閉進程。若是你的es是啓動的,使用以下命令進行重啓,若是es沒有啓動,直接啓動便可。express
1 -- 若是是集羣式的,每一個節點在每一個機器上面,執行以下命令就能夠中止es 2 [root@slaver4 elasticsearch-analysis-ik]# kill `ps -ef | grep Elasticsearch | grep -v grep | awk '{print $2}'`
個人es僞集羣未啓動,這裏直接啓動便可,僞集羣,你啓動最好同時啓動,否則會出現一些報錯,由於他們要感應到集羣中的節點,可是沒有很大影響的。npm
在啓動head插件的時候,報這種錯誤,總感受內心不爽,Local Npm module "grunt-contrib-jasmine" not found. Is it installed?。json
1 [elsearch@slaver4 soft]$ cd elasticsearch-head-master/ 2 [elsearch@slaver4 elasticsearch-head-master]$ ls 3 crx Dockerfile-alpine Gruntfile.js index.html node_modules package-lock.json proxy _site test 4 Dockerfile elasticsearch-head.sublime-project grunt_fileSets.js LICENCE package.json plugin-descriptor.properties README.textile src 5 [elsearch@slaver4 elasticsearch-head-master]$ cd node_modules/grunt 6 [elsearch@slaver4 grunt]$ ls 7 bin CHANGELOG lib LICENSE node_modules package.json README.md 8 [elsearch@slaver4 grunt]$ cd bin/ 9 [elsearch@slaver4 bin]$ ls 10 grunt 11 [elsearch@slaver4 bin]$ ./grunt -server & 12 [1] 8602 13 [elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed? 14 Warning: Task "jasmine" not found. Use --force to continue. 15 16 Aborted due to warnings. 17 18 [1]+ Exit 3 ./grunt -server 19 [elsearch@slaver4 bin]$ jps 20 8388 Elasticsearch 21 8457 Elasticsearch 22 8527 Elasticsearch 23 8623 Jps 24 [elsearch@slaver4 bin]$ ls 25 grunt 26 [elsearch@slaver4 bin]$ ./grunt server & 27 [1] 8633 28 [elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed? 29 30 Running "connect:server" (connect) task 31 Waiting forever... 32 Started connect web server on http://192.168.110.133:9100
如何解決上面的錯誤呢,以下所示,安裝錯誤的缺失包便可。切記,進入到head的目錄,執行命令便可npm install grunt-contrib-jasmine。
若是缺乏下面的包,執行如是命令便可。
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine
或者,使用淘寶的源,下載比較快些。
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine --registry=https://registry.npm.taobao.org
1 [elsearch@slaver4 elasticsearch-head-master]$ ls 2 crx Dockerfile-alpine Gruntfile.js index.html node_modules package-lock.json proxy _site test 3 Dockerfile elasticsearch-head.sublime-project grunt_fileSets.js LICENCE package.json plugin-descriptor.properties README.textile src 4 [elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-jasmine 5 npm WARN elasticsearch-head@0.0.0 license should be a valid SPDX license expression 6 npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.9 (node_modules/fsevents): 7 npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@1.2.9: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"}) 8 9 + grunt-contrib-jasmine@1.0.3 10 added 9 packages from 13 contributors in 30.811s 11 [elsearch@slaver4 elasticsearch-head-master]$
搞了一下,把head插件搞的起不來了,也是鬱悶。啓動錯誤以下所示。
1 [elsearch@slaver4 bin]$ >> No "clean" targets found. 2 Warning: Task "clean" failed. Use --force to continue. 3 4 Aborted due to warnings. 5 6 [1]+ Exit 3 ./grunt -server
起不來了,估計是依賴包,出現了問題,我這裏直接就將全部的依賴包都更新到最新了。
注意:我使用命令./grunt -server &,這個命令是使用錯誤了,因此下面更新到最新的依賴包能夠不進行操做,浪費時間。
1 [elsearch@slaver4 bin]$ npm install grunt@latest 2 stall grunt-contrib-connect@latest 3 npm ERR! code ENOSELF 4 npm ERR! Refusing to install package with name "grunt" under a package 5 npm ERR! also called "grunt". Did you name your project the same 6 npm ERR! as the dependency you're installing? 7 npm ERR! 8 npm ERR! For more information, see: 9 npm ERR! <https://docs.npmjs.com/cli/install#limitations-of-npms-install-algorithm> 10 11 npm ERR! A complete log of this run can be found in: 12 npm ERR! /home/elsearch/.npm/_logs/2019-10-20T09_08_08_039Z-debug.log 13 [elsearch@slaver4 bin]$ npm install grunt-cli@latest 14 npm notice created a lockfile as package-lock.json. You should commit this file. 15 + grunt-cli@1.3.2 16 added 148 packages from 121 contributors, updated 2 packages and audited 984 packages in 180.566s 17 found 0 vulnerabilities 18 19 [elsearch@slaver4 bin]$ npm install grunt-contrib-copy@latest 20 + grunt-contrib-copy@1.0.0 21 added 9 packages from 3 contributors and audited 994 packages in 36.523s 22 found 0 vulnerabilities 23 24 [elsearch@slaver4 bin]$ npm install grunt-contrib-concat@latest 25 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 26 27 + grunt-contrib-concat@1.0.1 28 added 1 package from 1 contributor and audited 1004 packages in 5.282s 29 found 0 vulnerabilities 30 31 [elsearch@slaver4 bin]$ npm install grunt-contrib-uglify@latest 32 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 33 34 + grunt-contrib-uglify@4.0.1 35 added 18 packages from 44 contributors and audited 1032 packages in 84.271s 36 found 0 vulnerabilities 37 38 [elsearch@slaver4 bin]$ npm install grunt-contrib-clean@latest 39 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 40 npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself. 41 42 + grunt-contrib-clean@2.0.0 43 added 15 packages from 8 contributors and audited 1050 packages in 16.732s 44 found 0 vulnerabilities 45 46 [elsearch@slaver4 bin]$ npm install grunt-contrib-watch@latest 47 npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself. 48 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 49 50 + grunt-contrib-watch@1.1.0 51 added 24 packages from 28 contributors in 46.459s 52 [elsearch@slaver4 bin]$ npm install grunt-contrib-connect@latest 53 npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself. 54 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 55 npm WARN grunt-contrib-connect@2.1.0 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 56 57 + grunt-contrib-connect@2.1.0 58 added 69 packages from 69 contributors and audited 1226 packages in 77.331s 59 found 0 vulnerabilities 60 61 [elsearch@slaver4 bin]$ npm install grunt-contrib-jasmine@latest 62 63 > puppeteer@1.20.0 install /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer 64 > node install.js 65 66 Downloading Chromium r686378 - 114 Mb [====================] 100% 0.0s 67 Chromium downloaded to /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer/.local-chromium/linux-686378 68 npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself. 69 npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 70 npm WARN grunt-contrib-connect@2.1.0 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself. 71 npm WARN grunt-eslint@22.0.0 requires a peer of grunt@>=1 but none is installed. You must install peer dependencies yourself. 72 73 + grunt-contrib-jasmine@2.1.0 74 added 244 packages from 120 contributors and audited 2725 packages in 424.187s 75 found 3 high severity vulnerabilities 76 run `npm audit fix` to fix them, or `npm audit` for details 77 [elsearch@slaver4 bin]$ ls 78 grunt 79 [elsearch@slaver4 bin]$ ./grunt server & 80 [1] 8297 81 [elsearch@slaver4 bin]$ Running "connect:server" (connect) task 82 Waiting forever... 83 Started connect web server on http://192.168.110.133:9100 84 85 [elsearch@slaver4 bin]$
三、迴歸正題,因爲安裝好了IK中文分詞器,如今須要測試一下,是否安裝成功了。
首先建立索引名字叫news,建立完畢之後能夠在head插件看到你建立的index索引名稱news,以下所示:
1 [elsearch@slaver4 soft]$ ls 2 elasticsearch-5.4.3 elasticsearch-head-master el_slave node-v8.16.2-linux-x64 nohup.out 3 [elsearch@slaver4 soft]$ curl -XPUT http://192.168.110.133:9200/news 4 {"acknowledged":true,"shards_acknowledged":true}[elsearch@slaver4 soft]$ 5 [elsearch@slaver4 soft]$
建立好索引之後,建立mapping映射(至關於數據中的schema信息,表名和字段名以及字段的類型),指定那些字段使用什麼分詞器。切記,三個節點的plugins目錄都要放IK中文分詞器。
注意:text是分詞,存儲,建索引。analyzer指定建立索引的時候使用的分詞器是IK中文分詞器。search_analyzer搜索的時候使用IK中文分詞器。
1 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_mapping -d' 2 { 3 "properties": { 4 "content": { 5 "type": "text", 6 "analyzer": "ik_max_word", 7 "search_analyzer": "ik_max_word" 8 } 9 } 10 11 }' 12 {"acknowledged":true}[elsearch@slaver4 soft]$
而後開始測試,是否能將中文進行分詞。指定中文分詞器ik_max_word,-d後面是傳參的。
1 [elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_max_word' -d '我要成爲java高級工程師' 2 { 3 "tokens" : [ 4 { 5 "token" : "我", 6 "start_offset" : 0, 7 "end_offset" : 1, 8 "type" : "CN_CHAR", 9 "position" : 0 10 }, 11 { 12 "token" : "要", 13 "start_offset" : 1, 14 "end_offset" : 2, 15 "type" : "CN_CHAR", 16 "position" : 1 17 }, 18 { 19 "token" : "成爲", 20 "start_offset" : 2, 21 "end_offset" : 4, 22 "type" : "CN_WORD", 23 "position" : 2 24 }, 25 { 26 "token" : "java", 27 "start_offset" : 4, 28 "end_offset" : 8, 29 "type" : "ENGLISH", 30 "position" : 3 31 }, 32 { 33 "token" : "高級工程師", 34 "start_offset" : 8, 35 "end_offset" : 13, 36 "type" : "CN_WORD", 37 "position" : 4 38 }, 39 { 40 "token" : "高級工", 41 "start_offset" : 8, 42 "end_offset" : 11, 43 "type" : "CN_WORD", 44 "position" : 5 45 }, 46 { 47 "token" : "高級", 48 "start_offset" : 8, 49 "end_offset" : 10, 50 "type" : "CN_WORD", 51 "position" : 6 52 }, 53 { 54 "token" : "工程師", 55 "start_offset" : 10, 56 "end_offset" : 13, 57 "type" : "CN_WORD", 58 "position" : 7 59 }, 60 { 61 "token" : "工程", 62 "start_offset" : 10, 63 "end_offset" : 12, 64 "type" : "CN_WORD", 65 "position" : 8 66 }, 67 { 68 "token" : "師", 69 "start_offset" : 12, 70 "end_offset" : 13, 71 "type" : "CN_CHAR", 72 "position" : 9 73 } 74 ] 75 } 76 [elsearch@slaver4 soft]$
ik_smart是更加智能的分詞器,推薦使用的,效果以下所示:
1 [elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_smart' -d '我要成爲Java高級工程師' 2 { 3 "tokens" : [ 4 { 5 "token" : "我", 6 "start_offset" : 0, 7 "end_offset" : 1, 8 "type" : "CN_CHAR", 9 "position" : 0 10 }, 11 { 12 "token" : "要", 13 "start_offset" : 1, 14 "end_offset" : 2, 15 "type" : "CN_CHAR", 16 "position" : 1 17 }, 18 { 19 "token" : "成爲", 20 "start_offset" : 2, 21 "end_offset" : 4, 22 "type" : "CN_WORD", 23 "position" : 2 24 }, 25 { 26 "token" : "java", 27 "start_offset" : 4, 28 "end_offset" : 8, 29 "type" : "ENGLISH", 30 "position" : 3 31 }, 32 { 33 "token" : "高級工程師", 34 "start_offset" : 8, 35 "end_offset" : 13, 36 "type" : "CN_WORD", 37 "position" : 4 38 } 39 ] 40 } 41 [elsearch@slaver4 soft]$
至此,IK中文分詞器就建立完畢了。
如何向索引Index中類型Type中添加數據,數據插入成功能夠在head插件進行瀏覽,以下所示。
1 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/1 -d' 2 > {"content":"美國留給伊拉克的是個爛攤子嗎"}' 3 {"_index":"news","_type":"fulltext","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 4 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/2 -d' 5 > {"content":"公安部:各地校車將享最高路權"}' 6 {"_index":"news","_type":"fulltext","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 7 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/3 -d' 8 > {"content":"中韓漁警衝突調查:韓警平均天天扣1艘中國漁船"}' 9 {"_index":"news","_type":"fulltext","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ 10 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/4 -d' 11 > {"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"}' 12 {"_index":"news","_type":"fulltext","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$ x":"news","_type":"fulltext","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$
數據插入成功之後,能夠進行查詢高亮顯示,以下所示:
1 [elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_search -d' 2 > { 3 > "query" : { "match" : { "content" : "中國" }}, 4 > "highlight" : { 5 > "pre_tags" : ["<font color='red'>", "<tag2>"], 6 > "post_tags" : ["</font>", "</tag2>"], 7 > "fields" : { 8 > "content" : {} 9 > } 10 > } 11 > }' 12 {"took":194,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.5347766,"hits":[{"_index":"news","_type":"fulltext","_id":"4","_score":0.5347766,"_source": 13 {"content":"中國駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"},"highlight":{"content":["<font color=red>中國</font>駐洛杉磯領事館遭亞裔男子槍擊 嫌犯已自首"]}},{"_index":"news","_type":"fulltext","_id":"3","_score":0.27638745,"_source": 14 {"content":"中韓漁警衝突調查:韓警平均天天扣1艘中國漁船"},"highlight":{"content":["中韓漁警衝突調查:韓警平均天天扣1艘<font color=red>中國</font>漁船"]}}]}}[elsearch@slaver4 soft]$ 15 [elsearch@slaver4 soft]$
做者:別先生
博客園:https://www.cnblogs.com/biehongli/
若是您想及時獲得我的撰寫文章以及著做的消息推送,能夠掃描上方二維碼,關注我的公衆號哦。