First, we use Postman to send a GET request and see how the text is analyzed:
GET http://localhost:9200/_analyze { "text":"農業銀行" }
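The result is shown below. The default es analyzer does not recognize Chinese words such as 農業 (agriculture) and 銀行 (bank). It simply splits the text into one token per character, which clearly does not meet our needs.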
{ "tokens": [ { "token": "農", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 }, { "token": "業", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 }, { "token": "銀", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 }, { "token": "行", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 } ] }
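The same request can also be sent from a script instead of Postman. The following is a minimal Python sketch, assuming es listens on http://localhost:9200 as above. The helper names (build_analyze_request, extract_tokens, analyze) are made up for illustration, and POST is used here because the _analyze endpoint accepts a JSON body with either GET or POST.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer=None, base_url="http://localhost:9200"):
    """Build (but do not send) an _analyze request for the given text."""
    body = {"text": text}
    if analyzer is not None:
        # Only set an analyzer when one is requested; otherwise es uses its default.
        body["analyzer"] = analyzer
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response):
    """Return only the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


def analyze(text, analyzer=None, base_url="http://localhost:9200"):
    """Send the request to a running es node and return the token strings."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as reply:
        return extract_tokens(json.load(reply))
```

On a node without a Chinese analyzer installed, analyze("農業銀行") should return the four single-character tokens shown above.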
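Next, go to https://github.com/medcl/elasticsearch-analysis-ik/releases and download the release of the Chinese analyzer that matches your es version. Put the unzipped folder into the plugins directory under the es root directory, then restart es to start using it.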
This time we add a new parameter, "analyzer":"ik_max_word":
GET http://localhost:9200/_analyze { "analyzer":"ik_max_word", "text":"農業銀行" }
The result is as follows:
{ "tokens": [ { "token": "農業銀行", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 }, { "token": "農業", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 }, { "token": "銀行", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 } ] }
Baidu search picks up new words every day, and the vocabulary used by es can be extended in the same way.
First we analyze the proper noun 弗雷爾卓德:
GET http://localhost:9200/_analyze { "analyzer":"ik_max_word", "text":"弗雷爾卓德" }
We only get one token per character. What we need is for the analyzer to recognize 弗雷爾卓德 as a word in its own right.
{ "tokens": [ { "token": "弗", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 }, { "token": "雷", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 }, { "token": "爾", "start_offset": 2, "end_offset": 3, "type": "CN_CHAR", "position": 2 }, { "token": "卓", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 3 }, { "token": "德", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 4 } ] }
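Go to the ik folder inside the plugins folder under the es root directory and enter its config directory. Create a file named custom.dic (one word per line, saved as UTF-8) and write 弗雷爾卓德 into it. Then open IKAnalyzer.cfg.xml, register the new custom.dic under ext_dict, and restart es.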
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <entry key="ext_dict">custom.dic</entry>
    <!-- Configure your own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
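A mismatch between the file name in ext_dict and the file actually present in the config directory is easy to miss. The following is a rough Python sketch of the two manual steps plus a sanity check. It assumes the config file is named IKAnalyzer.cfg.xml, that ext_dict may hold several file names separated by semicolons, and that those names are relative to the config directory. write_dictionary and missing_ext_dicts are hypothetical helpers, not part of IK or es.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_dictionary(config_dir, filename, words):
    """Write one word per line, UTF-8 encoded, into the IK config directory."""
    path = Path(config_dir) / filename
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


def missing_ext_dicts(config_dir, cfg_name="IKAnalyzer.cfg.xml"):
    """Return the ext_dict file names that do not exist in config_dir."""
    config_dir = Path(config_dir)
    root = ET.parse(str(config_dir / cfg_name)).getroot()
    missing = []
    for entry in root.iter("entry"):
        if entry.get("key") != "ext_dict":
            continue
        # Assumption: several dictionaries are separated by semicolons.
        for name in (entry.text or "").split(";"):
            name = name.strip()
            if name and not (config_dir / name).is_file():
                missing.append(name)
    return missing
```

Running missing_ext_dicts on the ik config directory before restarting es should return an empty list. Any name it reports is a dictionary that the configuration mentions but that is not on disk.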
Running the query again shows that the es analyzer now recognizes 弗雷爾卓德 as a word:
{ "tokens": [ { "token": "弗雷爾卓德", "start_offset": 0, "end_offset": 5, "type": "CN_WORD", "position": 0 }, { "token": "弗雷爾", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 1 }, { "token": "卓", "start_offset": 3, "end_offset": 4, "type": "CN_CHAR", "position": 2 }, { "token": "德", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 3 } ] }
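To check such a result in a script rather than by eye, one option is to look for the word among the returned tokens. This is a small Python sketch, assuming the response has already been parsed into a dict. find_token is a hypothetical helper, and the two sample responses are trimmed to the token field only.

```python
def find_token(response, word):
    """Return the first token entry whose text equals word, or None."""
    for entry in response["tokens"]:
        if entry["token"] == word:
            return entry
    return None


# Trimmed copies of the responses before and after adding custom.dic.
before = {"tokens": [{"token": t} for t in ["弗", "雷", "爾", "卓", "德"]]}
after = {"tokens": [{"token": t} for t in ["弗雷爾卓德", "弗雷爾", "卓", "德"]]}

print(find_token(before, "弗雷爾卓德") is not None)  # False
print(find_token(after, "弗雷爾卓德") is not None)   # True
```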