Installing and using the pinyin and IK analyzers for Elasticsearch

Date: 2020-05-12

Part 1: ES plugin setup and download

1. Downloading and installing the IK analyzer

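I won't spend long introducing the IK analyzer. In short, IK is currently a very widely used Chinese analyzer with good segmentation quality. For anyone doing ES development, Chinese word segmentation almost always means IK.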

Download: https://github.com/medcl/elasticsearch-analysis-ik

2. Downloading and installing the pinyin analyzer

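On Taobao or JD you can type pinyin into the search box and still find the results you want. That is pinyin analysis: the pinyin analyzer converts Chinese text into its pinyin form, so documents can be found by searching on the pinyin tokens it produces.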

Download: https://github.com/medcl/elasticsearch-analysis-pinyin

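Note: the plugin version you download must match your ES version, and ES must be restarted after the plugins are installed before they take effect.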
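As a quick guard against a version mismatch, here is a minimal Python sketch that compares the version embedded in a plugin release file name with your ES version. It assumes the release files are named like elasticsearch-analysis-ik-<version>.zip; the helper names are made up for illustration, and the version 6.3.0 used below is only a placeholder, not the version used in this post.

```python
import re


def plugin_version(zip_name: str) -> str:
    """Extract the version from a release file name.

    Assumes names such as 'elasticsearch-analysis-ik-6.3.0.zip'
    (the naming pattern is an assumption of this sketch).
    """
    match = re.search(r"-(\d+(?:\.\d+)+)\.zip$", zip_name)
    if match is None:
        raise ValueError(f"no version found in {zip_name!r}")
    return match.group(1)


def versions_match(es_version: str, zip_name: str) -> bool:
    """Return True when the plugin archive targets exactly this ES version."""
    return plugin_version(zip_name) == es_version


if __name__ == "__main__":
    # Placeholder versions, for illustration only.
    print(versions_match("6.3.0", "elasticsearch-analysis-ik-6.3.0.zip"))
    print(versions_match("6.3.2", "elasticsearch-analysis-pinyin-6.3.0.zip"))
```

Running the check before unpacking an archive into the plugins directory avoids a failed restart caused by a mismatched plugin.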

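Plugin install location: each plugin goes in its own folder under the Elasticsearch plugins directory. (I have three plugins installed; the murmur3 plugin is not covered here and can be ignored for now.)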

Once the plugins are in place, restart ES.

Part 2: Using the pinyin analyzer and the IK analyzer

1. Using the IK Chinese analyzer

1.1 ik_smart: splits the text at the coarsest granularity

GET /_analyze
{
  "text": "中華人民共和國國徽",
  "analyzer": "ik_smart"
}

Result:

{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "國徽", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1 }
  ]
}

1.2 ik_max_word: splits the text at the finest granularity

GET /_analyze
{
  "text": "中華人民共和國國徽",
  "analyzer": "ik_max_word"
}

Result:

{
  "tokens": [
    { "token": "中華人民共和國", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中華人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中華", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "華人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和國", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和國", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "國", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "國徽", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 }
  ]
}

2. Using the pinyin analyzer

GET /_analyze
{
  "text": "劉德華",
  "analyzer": "pinyin"
}

Result:

{
  "tokens": [
    { "token": "liu", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "ldh", "start_offset": 0, "end_offset": 3, "type": "word", "position": 0 },
    { "token": "de", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "hua", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 }
  ]
}
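To make the shape of that output concrete, here is a toy Python sketch that rebuilds the same token list from syllables that are already romanized. It is not the plugin's implementation: the function name is made up, and the rule it encodes (each syllable at its own position, plus a first-letter abbreviation at position 0) is simply read off the result shown above.

```python
def pinyin_tokens(syllables):
    """Return (token, position) pairs shaped like the pinyin analyzer result.

    Toy model only: full pinyin per character, plus one abbreviation
    built from the first letter of every syllable, placed at position 0.
    """
    tokens = []
    for position, syllable in enumerate(syllables):
        tokens.append((syllable, position))
        if position == 0:
            abbreviation = "".join(s[0] for s in syllables)
            tokens.append((abbreviation, 0))
    return tokens


if __name__ == "__main__":
    for token, position in pinyin_tokens(["liu", "de", "hua"]):
        print(position, token)
```

The abbreviation token is what lets a query such as ldh match the same document as the full syllables.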

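Note: whether you use the pinyin analyzer or the IK analyzer, a document can only be found through tokens that the analyzer actually produced from it. A search term that is not among those tokens will not match.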

Part 3: Combining the IK and pinyin analyzers

When we create an index we can define custom analyzers, and then point fields at them in the mapping.

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
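The two analyzers differ only in their tokenizer, so the same settings body can also be generated in code. The following is a minimal Python sketch under that assumption; the helper names are mine, and all option values are copied from the request above.

```python
import json


def pinyin_filter():
    """Options of the my_pinyin token filter, copied from the request above."""
    return {
        "type": "pinyin",
        "keep_separate_first_letter": True,
        "keep_full_pinyin": True,
        "keep_original": True,
        "limit_first_letter_length": 16,
        "lowercase": True,
        "remove_duplicated_term": True,
    }


def custom_analyzer(tokenizer):
    """An IK tokenizer followed by the pinyin and word_delimiter filters."""
    return {
        "type": "custom",
        "tokenizer": tokenizer,
        "filter": ["my_pinyin", "word_delimiter"],
    }


def index_settings():
    """Build the body of the PUT /my_index request."""
    analyzers = {
        f"{tokenizer}_pinyin": custom_analyzer(tokenizer)
        for tokenizer in ("ik_smart", "ik_max_word")
    }
    return {
        "settings": {
            "analysis": {
                "analyzer": analyzers,
                "filter": {"my_pinyin": pinyin_filter()},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(index_settings(), indent=2))
```

Generating the body this way keeps the two analyzer definitions from drifting apart when a filter option changes.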

When we create the type, the analyzer property of the field must be set to the custom analyzer we defined.

PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_smart_pinyin"
      }
    }
  }
}

To test, let's first add a few documents.

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 } }
{ "name": "張三" }
{ "index": { "_id": 2 } }
{ "name": "張四" }
{ "index": { "_id": 3 } }
{ "name": "李四" }
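The bulk API expects newline-delimited JSON: one action line followed by one source line per document, each on its own line, with a final newline at the end. If you build the body from code rather than in the console, something like the following Python sketch can help; the function name is hypothetical, and it only builds the body without sending it.

```python
import json


def bulk_index_body(docs):
    """Build a newline-delimited bulk body from an {id: source} mapping.

    Each document becomes two lines: an index action, then its source.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # ensure_ascii=False keeps the Chinese names readable in the body.
        lines.append(json.dumps(source, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_index_body({1: {"name": "張三"}, 2: {"name": "張四"}, 3: {"name": "李四"}}), end="")
```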

Query using IK tokens:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "李"
    }
  }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.47160998,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.47160998,
        "_source": { "name": "李四" }
      }
    ]
  }
}

Query using pinyin tokens:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "name": "zhang"
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.3758317,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.3758317,
        "_source": { "name": "張四" }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.3758317,
        "_source": { "name": "張三" }
      }
    ]
  }
}

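Note: before searching, check which tokens the text is analyzed into. If your search input is not among the tokens the analyzer produced, the query will find nothing!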
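One way to run that check is the _analyze API with a field parameter, which analyzes text with the analyzer mapped to that field. Below is a minimal Python sketch using only the standard library. The base URL http://localhost:9200 and the helper names are assumptions for illustration; the index and field names are the ones used in this post.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, field, text):
    """Prepare an _analyze request that uses the analyzer mapped to a field."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyzed_tokens(request):
    """Send the request and return just the token strings from the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [entry["token"] for entry in payload["tokens"]]


if __name__ == "__main__":
    # Assumes a local node; adjust the URL for your own cluster.
    request = build_analyze_request("http://localhost:9200", "my_index", "name", "張三")
    print(analyzed_tokens(request))
```

If the term you plan to search for does not appear in the printed list, a match query on that field is not expected to find the document.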
