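Elasticsearch is a powerful search engine and an important part of the ELK stack.
A good memory is no match for written notes, so here I record how I set up an Elasticsearch test environment for Chinese word-segmentation search on Windows. The steps are as follows.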
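1. Install JDK 1.8 and configure the environment variables.
2. Download Elasticsearch 7.1.1. Versions change quickly: when I last checked, the latest release was already 7.2.0, but this environment is based on 7.1.1. Download it from https://www.elastic.co/cn/downloads/elasticsearch to get a zip archive. After unzipping it, run the following command in cmd from the extracted directory to start ES: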
.\bin\elasticsearch.bat
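If it starts normally, the console prints a series of log entries.
Open http://localhost:9200/ in a browser to check that the service is reachable. Normally the page shows a short JSON summary of the node, which means ES has been set up successfully.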
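The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming ES listens on the default http://localhost:9200/ and that the root response contains the `name`, `cluster_name` and `version.number` fields; the helper names `fetch_root_info` and `describe_node` are made up for this illustration.

```python
import json
from urllib.request import urlopen


def fetch_root_info(base_url="http://localhost:9200/"):
    """GET the root endpoint of a running node and return the parsed JSON body."""
    with urlopen(base_url, timeout=5) as resp:
        return json.load(resp)


def describe_node(info):
    """Summarise the fields of the root response that matter for this setup."""
    version = info["version"]["number"]
    return f"{info['name']} ({info['cluster_name']}) is running Elasticsearch {version}"


# Usage, once ES is up (requires a running node, so it is left commented out):
# print(describe_node(fetch_root_info()))
```

If the call raises a connection error, ES is most likely still starting or did not start at all; the console log from the previous step usually says why.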
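3. Elasticsearch provides a powerful RESTful API, but without a UI it is not very intuitive to work with. elasticsearch-head solves this problem nicely: it is a Node-based tool that connects to the ES service and provides a visual interface. For details see https://github.com/mobz/elasticsearch-head. Installation is also simple: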
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start
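Once the service has started normally, open http://localhost:9100/ in a browser to see the UI.
If the UI cannot connect to ES, enable CORS in config/elasticsearch.yml (http.cors.enabled: true and http.cors.allow-origin: "*") and restart ES, as described in the elasticsearch-head README.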
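4. The Chinese word-segmentation plugin is described in detail at https://github.com/medcl/elasticsearch-analysis-ik. Be careful to choose the right version, otherwise the installation will fail; for ES 7.1.1 pick the matching 7.1.1 release. Install it from the ES directory as follows:
.\bin\elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.1/elasticsearch-analysis-ik-7.1.1.zip
Restart ES after the installation so that the plugin is loaded.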
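Since the plugin version has to match the ES version, it can help to derive the download URL from the version number rather than typing it by hand. This is only a sketch: the URL pattern is inferred from the single 7.1.1 link used in this post, and other releases may not follow it. `ik_release_url` and `install_command` are hypothetical helper names.

```python
def ik_release_url(es_version):
    """Build the IK release URL for an ES version.

    Assumption: the pattern is copied from the 7.1.1 link in this post
    and is not guaranteed to hold for every release.
    """
    return (
        "https://github.com/medcl/elasticsearch-analysis-ik/releases/download/"
        f"v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"
    )


def install_command(es_version):
    """Return the plugin install command line for the given ES version."""
    return f".\\bin\\elasticsearch-plugin install {ik_release_url(es_version)}"


print(install_command("7.1.1"))
```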
5. Test Chinese segmented search. First create the index by sending the following requests from Postman or elasticsearch-head:
# Create the index
curl -XPUT http://localhost:9200/news

# Add a document to the index
curl -XPOST http://localhost:9200/news/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美國留給伊拉克的是個爛攤子嗎"}
'
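[Screenshot: the added data]
Add the index mapping. Elasticsearch does not allow the analyzer of an existing field to be changed, so send this request before indexing any document that contains the content field; if the field was already created by dynamic mapping, delete and recreate the index first.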
curl -XPOST http://localhost:9200/news/_mapping -H 'Content-Type:application/json' -d'
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    }
  }
}'
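An alternative to adding the mapping in a separate request is to pass it when the index is created, and only then index documents. The sketch below builds the requests with the Python standard library and does not send anything by itself. It assumes the default address http://localhost:9200; the function names are invented for this example.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: default local node
JSON_HEADERS = {"Content-Type": "application/json"}


def create_index_request(index="news"):
    """PUT /<index> with the IK mapping supplied at creation time."""
    body = {
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "ik_max_word",
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }
    data = json.dumps(body).encode("utf-8")
    return Request(f"{BASE_URL}/{index}", data=data, headers=JSON_HEADERS, method="PUT")


def index_document_request(doc_id, content, index="news"):
    """POST /<index>/_create/<id> with a single content field."""
    data = json.dumps({"content": content}, ensure_ascii=False).encode("utf-8")
    url = f"{BASE_URL}/{index}/_create/{doc_id}"
    return Request(url, data=data, headers=JSON_HEADERS, method="POST")


def send(request):
    """Send a prepared request to a running node and return the parsed JSON reply."""
    with urlopen(request, timeout=5) as resp:
        return json.load(resp)


# Intended order against a fresh node (left commented out, needs ES running):
# send(create_index_request())
# send(index_document_request(1, "美國留給伊拉克的是個爛攤子嗎"))
```

Creating the index with its mapping first means every document is analyzed with ik_max_word from the start.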
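The difference between ik_max_word and ik_smart:
ik_max_word: splits the text at the finest granularity. For example, it splits 「中華人民共和國國歌」 into 「中華人民共和國,中華人民,中華,華人,人民共和國,人民,人,民,共和國,共和,和,國國,國歌」, exhausting all possible combinations. It is suitable for Term Query.
ik_smart: splits the text at the coarsest granularity. For example, it splits 「中華人民共和國國歌」 into 「中華人民共和國,國歌」. It is suitable for Phrase queries.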
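To get a feel for the two granularities, here is a toy illustration. It is not the IK algorithm and does not use the IK dictionary: it uses a tiny hand-written dictionary, lists every dictionary word found in the text for the fine-grained mode, and uses greedy forward maximum matching for the coarse-grained mode. All names and the dictionary are assumptions made for this sketch.

```python
# Toy dictionary for illustration only, NOT the dictionary shipped with IK.
TOY_DICTIONARY = {
    "中華人民共和國", "中華人民", "中華", "華人",
    "人民共和國", "人民", "共和國", "共和", "國歌",
}


def all_dictionary_words(text, dictionary):
    """Fine-grained: every dictionary word at every position, overlaps allowed."""
    found = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):  # longest candidate first
            if text[start:end] in dictionary:
                found.append(text[start:end])
    return found


def forward_max_match(text, dictionary):
    """Coarse-grained: greedy longest match from left to right, no overlaps."""
    tokens = []
    start = 0
    while start < len(text):
        for end in range(len(text), start, -1):
            # Fall back to a single character when no dictionary word matches.
            if text[start:end] in dictionary or end == start + 1:
                tokens.append(text[start:end])
                start = end
                break
    return tokens


text = "中華人民共和國國歌"
print(all_dictionary_words(text, TOY_DICTIONARY))
print(forward_max_match(text, TOY_DICTIONARY))
```

With this toy dictionary the coarse mode yields two tokens and the fine mode nine. The real analyzers may produce different tokens because their dictionary and disambiguation rules differ.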
Test example:
Send the request to http://localhost:9200/_analyze. Segmenting with ik_max_word gives the following result.
Input:
{"text":"中華人民共和國人民大會堂","analyzer":"ik_max_word" }
Output:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中華人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中華", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "華人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和國", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和國", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "國人", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 8 },
    { "token": "人民大會堂", "start_offset": 7, "end_offset": 12, "type": "CN_WORD", "position": 9 },
    { "token": "人民大會", "start_offset": 7, "end_offset": 11, "type": "CN_WORD", "position": 10 },
    { "token": "人民", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 11 },
    { "token": "大會堂", "start_offset": 9, "end_offset": 12, "type": "CN_WORD", "position": 12 },
    { "token": "大會", "start_offset": 9, "end_offset": 11, "type": "CN_WORD", "position": 13 },
    { "token": "會堂", "start_offset": 10, "end_offset": 12, "type": "CN_WORD", "position": 14 }
  ]
}
If the input is instead:
{"text":"中華人民共和國人民大會堂","analyzer":"ik_smart" }
Output:
{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "人民大會堂", "start_offset": 7, "end_offset": 12, "type": "CN_WORD", "position": 1 }
  ]
}
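The two outputs differ mainly in whether tokens overlap, and the offsets make that easy to verify. The sketch below builds an _analyze request and checks a parsed response for overlapping tokens. The helper names are invented here, and the sample data is a trimmed copy of the outputs above.

```python
import json
from urllib.request import Request


def analyze_request(text, analyzer, base_url="http://localhost:9200"):
    """Build (but do not send) a POST /_analyze request for the given analyzer."""
    body = json.dumps({"text": text, "analyzer": analyzer}, ensure_ascii=False)
    return Request(
        f"{base_url}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_spans(response):
    """Reduce an _analyze response to (token, start_offset, end_offset) tuples."""
    return [(t["token"], t["start_offset"], t["end_offset"]) for t in response["tokens"]]


def has_overlaps(spans):
    """Return True if any token starts before an earlier token has ended."""
    furthest_end = 0
    for _, start, end in sorted(spans, key=lambda s: (s[1], s[2])):
        if start < furthest_end:
            return True
        furthest_end = max(furthest_end, end)
    return False


# Trimmed copies of the responses shown above.
smart = {"tokens": [
    {"token": "中華人民共和國", "start_offset": 0, "end_offset": 7},
    {"token": "人民大會堂", "start_offset": 7, "end_offset": 12},
]}
max_word_head = {"tokens": [
    {"token": "中華人民共和國", "start_offset": 0, "end_offset": 7},
    {"token": "中華人民", "start_offset": 0, "end_offset": 4},
]}

print(has_overlaps(token_spans(smart)))          # False
print(has_overlaps(token_spans(max_word_head)))  # True
```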
To search using the segmented terms, send the query to http://localhost:9200/news/_search.
Input:
{
  "query": { "match": { "content": "中華人民共和國國歌" } },
  "highlight": {
    "pre_tags": ["<tag1>", "<tag2>"],
    "post_tags": ["</tag1>", "</tag2>"],
    "fields": { "content": {} }
  }
}
Output:
{
  "took": 11,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 1.6810182,
    "hits": [
      {
        "_index": "news", "_type": "_doc", "_id": "6", "_score": 1.6810182,
        "_source": { "content": "中華民族國歌" },
        "highlight": { "content": [ "<tag1>中華</tag1>民族<tag1>國歌</tag1>" ] }
      },
      {
        "_index": "news", "_type": "_doc", "_id": "5", "_score": 0.9426802,
        "_source": { "content": "人民公社" },
        "highlight": { "content": [ "<tag1>人民</tag1>公社" ] }
      }
    ]
  }
}
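When the search is called from code, usually only the id, score and highlighted fragments of each hit are needed. Here is a small sketch that pulls those out of a parsed _search response. `summarize_hits` is a made-up helper name, and the sample is a trimmed copy of the response above.

```python
def summarize_hits(response, field="content"):
    """Return (id, score, highlighted text) for every hit in a _search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        # A hit has no "highlight" key when nothing in the field matched.
        fragments = hit.get("highlight", {}).get(field, [])
        rows.append((hit["_id"], hit["_score"], " ... ".join(fragments)))
    return rows


# Trimmed copy of the response shown above.
sample = {"hits": {"hits": [
    {"_id": "6", "_score": 1.6810182,
     "highlight": {"content": ["<tag1>中華</tag1>民族<tag1>國歌</tag1>"]}},
    {"_id": "5", "_score": 0.9426802,
     "highlight": {"content": ["<tag1>人民</tag1>公社"]}},
]}}

for doc_id, score, text in summarize_hits(sample):
    print(doc_id, score, text)
```

Both documents match because the query string is itself segmented by the search analyzer, so a document only needs to share one term with it, such as 中華, 國歌 or 人民.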
[Screenshot: the search result]