Synonym support is available in the standard (Basic) edition of Elasticsearch and above; in other words, in every edition except the OSS (open source) one.
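Lines beginning with `$` are shell commands; all other commands are meant for the Kibana Console.
Synonyms can either be specified inline with the `synonyms` parameter, or they must be kept in a synonym file that exists on every node of the cluster. The path of the synonym file is given by the `synonyms_path` parameter and must be absolute or relative to the Elasticsearch `config` directory.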
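As a rough illustration of the "absolute or relative to the `config` directory" rule, the sketch below resolves a `synonyms_path` value the way the text describes. The helper name `resolve_synonyms_path` and the `/etc/elasticsearch` directory are assumptions made up for this example; this is not Elasticsearch's own code.

```python
from pathlib import PurePosixPath


def resolve_synonyms_path(config_dir: str, synonyms_path: str) -> PurePosixPath:
    """Return the location a synonyms_path value refers to (illustrative helper)."""
    # Joining onto an absolute path discards the left-hand side, so absolute
    # values pass through unchanged and relative ones land under config_dir.
    return PurePosixPath(config_dir) / synonyms_path


if __name__ == "__main__":
    # "/etc/elasticsearch" is an assumed config directory, not taken from the article.
    print(resolve_synonyms_path("/etc/elasticsearch", "analysis/synonyms.txt"))
    print(resolve_synonyms_path("/etc/elasticsearch", "/data/synonyms.txt"))
```

Whichever form is used, the resolved file still has to be present at that location on every node.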
The two ways of configuring synonyms are described below, starting with the synonym file.
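# Run from the Elasticsearch directory to generate the file
# (create the analysis directory first if it does not exist yet)
$ mkdir -p config/analysis
$ echo '"iPhone,蘋果手機 => iPhone,蘋果手機",
"2233,22娘,33娘 => bilibili,B站"' > config/analysis/synonyms.txt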
PUT /goods2
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "updateable": true,
          "synonyms_path": "analysis/synonyms.txt"
        }
      },
      "analyzer": {
        "my_synonyms_analyzer": {
          "tokenizer": "ik_smart",
          "filter": [ "my_synonym_filter" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "my_synonyms_analyzer"
      }
    }
  }
}
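`my_synonym_filter` is a custom token filter and `my_synonyms_analyzer` is a custom analyzer; as the request shows, the latter contains and references the former. A token filter and analyzer defined in this index can only be used within this index.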
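`updateable` indicates whether the filter can be updated dynamically; it must be `true` for synonyms to be reloaded without recreating the index. A filter marked as updateable can only be used in a search-time analyzer.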
`synonyms_path` gives the location of the synonym file.
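`analysis.analyzer.tokenizer` specifies that this analyzer uses the `ik_smart` tokenizer (provided by the IK analysis plugin, which must be installed). The analysis chain in this index is raw text => tokenizer => token filter: the raw text is tokenized first, and the resulting tokens are then passed to the token filter, which in this index applies the synonyms.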
`mappings.properties.title.search_analyzer` specifies that the `title` field uses the `my_synonyms_analyzer` analyzer at query time; likewise, `mappings.properties.title.analyzer` specifies the analyzer the field uses at index time.
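The chain described above (raw text => tokenizer => token filter) can be pictured with a toy model. Everything in it is a stand-in: `toy_tokenize` simply lowercases and splits on whitespace instead of running `ik_smart`, and `SYNONYMS` is a hand-written table, so the output only illustrates the order of the steps, not real Elasticsearch tokens.

```python
from typing import Dict, List

# Hand-written lookup standing in for my_synonym_filter (assumption for illustration).
SYNONYMS: Dict[str, List[str]] = {
    "2233": ["bilibili", "b站"],
    "22娘": ["bilibili", "b站"],
    "33娘": ["bilibili", "b站"],
}


def toy_tokenize(text: str) -> List[str]:
    """Stand-in for the tokenizer step: lowercase, then split on whitespace."""
    return text.lower().split()


def toy_synonym_filter(tokens: List[str]) -> List[str]:
    """Stand-in for the token filter step: replace tokens that have a rule."""
    out: List[str] = []
    for token in tokens:
        out.extend(SYNONYMS.get(token, [token]))
    return out


def toy_analyze(text: str) -> List[str]:
    """Raw text => tokenizer => token filter, in that order."""
    return toy_synonym_filter(toy_tokenize(text))


if __name__ == "__main__":
    print(toy_analyze("2233 fans"))  # ['bilibili', 'b站', 'fans']
```

The point of the ordering is that the synonym filter only ever sees tokens, never the raw string, so a rule can only fire on whatever the tokenizer emits.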
# Letter case makes no difference
GET goods2/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "iphone"
}

GET goods2/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "蘋果手機"
}
Both requests above produce the same result:
{
  "tokens" : [
    { "token" : "iphone", "start_offset" : 0, "end_offset" : 6, "type" : "ENGLISH", "position" : 0 },
    { "token" : "蘋果", "start_offset" : 0, "end_offset" : 6, "type" : "SYNONYM", "position" : 0 },
    { "token" : "手機", "start_offset" : 0, "end_offset" : 6, "type" : "SYNONYM", "position" : 1 }
  ]
}
GET goods2/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "2233"
}

GET goods2/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "22娘"
}
Result:
{
  "tokens" : [
    { "token" : "bilibili", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 0 },
    { "token" : "b", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 0 },
    { "token" : "站", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 1 }
  ]
}
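# Run from the Elasticsearch directory to regenerate the file
# `iPhone,蘋果手機 => iPhone,蘋果手機` has the same effect as `iPhone,蘋果手機`
# The double quotes `"` and the trailing commas `,` in the content are not required
# (without them, each rule still needs its own line); they are only written this way to match the inline form
$ echo '"iPhone,蘋果手機",
"2233,22娘,33娘 => bilibili,B站,二次元"' > config/analysis/synonyms.txt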
# Make the new synonyms take effect
POST /goods2/_reload_search_analyzers
GET goods2/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "2233"
}
Result:
{
  "tokens" : [
    { "token" : "bilibili", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 0 },
    { "token" : "b", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 0 },
    { "token" : "二次元", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 0 },
    { "token" : "站", "start_offset" : 0, "end_offset" : 4, "type" : "SYNONYM", "position" : 1 }
  ]
}
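The shell comment earlier notes that `iPhone,蘋果手機 => iPhone,蘋果手機` behaves the same as the plain list `iPhone,蘋果手機`. Here is a minimal sketch of why, assuming Solr-style rule semantics with the filter's default `expand: true`. `parse_rule` is a hypothetical helper, not Elasticsearch's parser, and it ignores tokenization, escaping, and comment lines.

```python
from typing import Dict, List


def parse_rule(rule: str) -> Dict[str, List[str]]:
    """Map each input term of one synonym rule to its replacement terms."""
    if "=>" in rule:
        # Explicit mapping: every term on the left becomes all terms on the right.
        left, right = rule.split("=>", 1)
        sources = [t.strip() for t in left.split(",")]
        targets = [t.strip() for t in right.split(",")]
    else:
        # Equivalent list (expand=true assumed): every term maps to the whole list.
        sources = targets = [t.strip() for t in rule.split(",")]
    return {source: list(targets) for source in sources}


if __name__ == "__main__":
    explicit = parse_rule("iPhone,蘋果手機 => iPhone,蘋果手機")
    plain = parse_rule("iPhone,蘋果手機")
    print(explicit == plain)  # True
    print(parse_rule("2233,22娘,33娘 => bilibili,B站,二次元"))
```

The second rule is different in kind: its left and right sides do not match, so `2233` is replaced by the right-hand terms rather than expanded alongside them, which matches the `_analyze` output above.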
With the inline approach, the synonym rules go directly in the `synonyms` property:
PUT /goods3
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "iPhone,蘋果手機 => iPhone,蘋果手機",
            "2233,22娘,33娘 => bilibili,B站"
          ]
        }
      },
      "analyzer": {
        "my_synonyms_analyzer": {
          "tokenizer": "ik_smart",
          "filter": [ "my_synonym_filter" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "my_synonyms_analyzer"
      }
    }
  }
}
The requests below return the same results as with the synonym file approach:
GET goods3/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "iphone"
}

GET goods3/_analyze
{
  "analyzer": "my_synonyms_analyzer",
  "text": "2233"
}
# The index must be closed before its analysis settings can be changed
POST /goods3/_close

PUT /goods3/_settings
{
  "analysis": {
    "filter": {
      "my_synonym_filter": {
        "type": "synonym",
        "synonyms": [
          "iPhone,蘋果手機",
          "2233,22娘,33娘 => bilibili,B站,二次元"
        ]
      }
    }
  }
}

# Reopen the index
POST /goods3/_open
The search test below uses the `goods2` index as an example:
# Insert a document
POST /goods2/_doc/1
{
  "title": "bilibili是個好平臺"
}

# Search with the keyword `2233`
GET /goods2/_search
{
  "query": {
    "match": {
      "title": "2233"
    }
  }
}
Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "goods2",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "title" : "bilibili是個好平臺"
        }
      }
    ]
  }
}
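To sum up, synonyms can be configured in two ways: inline, or through a synonym file. The synonym file approach can update synonyms dynamically without closing the index.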
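To make the difference concrete, the sketch below lists the requests each approach needs for an update, using only the endpoints that appear earlier in this article. `update_steps` is a hypothetical helper that builds (method, path) pairs and sends nothing; for the file-based case it assumes the synonym file has already been rewritten on every node.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (HTTP method, request path)


def update_steps(index: str, file_based: bool) -> List[Step]:
    """Return the console requests needed to apply changed synonyms."""
    if file_based:
        # The file on disk was already edited; one reload call picks it up.
        return [("POST", f"/{index}/_reload_search_analyzers")]
    # Inline synonyms live in the index settings, so the index is closed first.
    return [
        ("POST", f"/{index}/_close"),
        ("PUT", f"/{index}/_settings"),
        ("POST", f"/{index}/_open"),
    ]


if __name__ == "__main__":
    for method, path in update_steps("goods2", file_based=True):
        print(method, path)
    for method, path in update_steps("goods3", file_based=False):
        print(method, path)
```

The inline route makes the index unavailable between `_close` and `_open`, which is the practical reason to prefer the file-based route when synonyms change often.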