This article is based on Elasticsearch 7.4.2.
Test data:
POST /match_phrase_test/_doc/1
{
  "my_text": "my favorite dialet is cold porridge"
}
POST /match_phrase_test/_doc/2
{
  "my_text": "when it's cold his favorite food is porridge"
}
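A match_phrase query first analyzes the query text into terms and then runs a phrase query over those terms: a document matches only if it contains the terms in the same order and at adjacent positions.
Example: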
POST /match_phrase_test/_search
{
  "query": {
    "match_phrase": {
      "my_text": {
        "query": "my favorite"
      }
    }
  }
}
Analysis:
The query text my favorite is analyzed into ["my", "favorite"].
In doc1, my is immediately followed by favorite, so doc1 matches. doc2 contains only favorite, so it does not satisfy the phrase requirement.
Query result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6520334,
    "hits" : [
      {
        "_index" : "match_phrase_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6520334,
        "_source" : {
          "my_text" : "my favorite dialet is cold porridge"
        }
      }
    ]
  }
}
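The matching rule can be modelled in a few lines of Python. This is a minimal sketch, not Elasticsearch code: the helper names analyze and phrase_match are hypothetical, and lowercasing plus whitespace splitting is only assumed to approximate the standard analyzer for this test data.

```python
def analyze(text):
    # Assumption: lowercase + whitespace split stands in for the standard analyzer.
    return text.lower().split()


def phrase_match(doc_text, query_text):
    """True if the query terms occur in the document in order and adjacent."""
    doc = analyze(doc_text)
    query = analyze(query_text)
    if not query:
        return False
    n = len(query)
    # Slide a window of len(query) terms over the document terms.
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


docs = {
    "1": "my favorite dialet is cold porridge",
    "2": "when it's cold his favorite food is porridge",
}

hits = [doc_id for doc_id, text in docs.items() if phrase_match(text, "my favorite")]
print(hits)  # ['1']
```

Only doc1 is returned, because doc2 has favorite but no my directly in front of it.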
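The slop parameter sets the maximum number of positions that query terms may be moved for the phrase to still match. It defaults to 0 and can be any non-negative integer; swapping two adjacent terms costs 2. Suppose the document contains favorite food and the query text is food favorite. Rearranging the query into the document's order favorite food takes two moves:
1. move food to the position of favorite;
2. move favorite to the position of food.
Summary:
Swapping one pair of adjacent terms therefore needs a slop of 2. As a rule of thumb, swapping two terms needs 4 and swapping n terms needs about 2*n, which can also be written as (number of out-of-order terms - 1) * 2. This is an estimate: the exact minimum depends on how far each term has to move.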
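Example:
Suppose the query text is my dialet favorite is and it should match doc1, my favorite dialet is cold porridge. Only dialet favorite is out of order, and a single swap fixes it, so the minimum slop is 1 * 2 = 2. Equivalently: (number of out-of-order terms - 1) * 2 = (2 - 1) * 2 = 2.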
POST /match_phrase_test/_search
{
  "query": {
    "match_phrase": {
      "my_text": {
        "query": "my dialet favorite is",
        "slop": 2
      }
    }
  }
}
Query result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.9197583,
    "hits" : [
      {
        "_index" : "match_phrase_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.9197583,
        "_source" : {
          "my_text" : "my favorite dialet is cold porridge"
        }
      }
    ]
  }
}
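The slop arithmetic can be checked with a small model. This is a simplified sketch under stated assumptions, not Lucene's actual sloppy phrase matcher: it assumes every query term is distinct and occurs exactly once in the document, and the helper names analyze and min_slop are hypothetical. For each term it computes where the phrase would have to start for that term to line up, and takes the spread of those start positions as the required slop.

```python
def analyze(text):
    # Assumption: lowercase + whitespace split stands in for the standard analyzer.
    return text.lower().split()


def min_slop(doc_text, query_text):
    """Smallest slop at which this simplified model reports a match, or None."""
    doc = analyze(doc_text)
    query = analyze(query_text)
    # Assumption: query terms are distinct and each occurs exactly once in the document.
    if not query or len(set(query)) != len(query):
        return None
    if any(doc.count(term) != 1 for term in query):
        return None
    # For each term: the phrase start position that would align it with the document.
    starts = [doc.index(term) - offset for offset, term in enumerate(query)]
    # If all terms agree on the start, the phrase matches exactly (slop 0).
    return max(starts) - min(starts)


doc1 = "my favorite dialet is cold porridge"

print(min_slop("favorite food", "food favorite"))  # 2
print(min_slop(doc1, "my dialet favorite is"))     # 2
print(min_slop(doc1, "my favorite"))               # 0
print(min_slop(doc1, "my dialet"))                 # 1
```

The last call shows that slop does not have to be even in this model: skipping one intervening word costs 1, while a swap of two adjacent terms costs 2.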
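The analyzer parameter specifies which analyzer is used to tokenize the query text. By default this is the search_analyzer explicitly set in the mapping of the queried field if there is one, otherwise the field's analyzer, and failing that the index's default analyzer.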
POST /match_phrase_test/_search
{
  "query": {
    "match_phrase": {
      "my_text": {
        "query": "favorite Dialet",
        "analyzer": "whitespace"
      }
    }
  }
}
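Because analyzer is set to whitespace, the query text is split on whitespace only, with case preserved, giving ["favorite", "Dialet"].
doc1's my_text was indexed with the standard analyzer (which splits on word boundaries and then lowercases the tokens), giving ["my", "favorite", "dialet", "is", "cold", "porridge"].
The capitalized Dialet does not exist in doc1's inverted index, so doc1 is not matched and the query result is empty.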
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
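The analyzer mismatch can be reproduced outside Elasticsearch with two toy tokenizers. This is a sketch only: standard_like and whitespace are hypothetical stand-ins, and lowercasing plus whitespace splitting is merely assumed to approximate the standard analyzer for this test sentence.

```python
def standard_like(text):
    # Assumption: lowercase + whitespace split approximates the standard analyzer here.
    return text.lower().split()


def whitespace(text):
    # Models the whitespace analyzer: split on whitespace, keep the original case.
    return text.split()


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens in order and adjacent."""
    n = len(query_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


# Terms as they were stored at index time.
indexed = standard_like("my favorite dialet is cold porridge")

print(whitespace("favorite Dialet"))                            # ['favorite', 'Dialet']
print(phrase_match(indexed, whitespace("favorite Dialet")))     # False
print(phrase_match(indexed, standard_like("favorite Dialet")))  # True
```

With the same analyzer on both sides the phrase matches; the mismatch comes only from Dialet keeping its capital letter at search time.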