Elasticsearch for Big Data: Full-Text Queries and the match_phrase_prefix Query

This is day 12 of my participation in the August Writing Challenge.
The Elasticsearch version used in this series of articles is 7.4.2.

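The match_phrase_prefix query builds on the match_phrase query and adds a prefix query on top of it. It returns documents that contain the words of the provided text, in the same order as provided. The last term of the provided text is treated as a prefix and matches any word that begins with that term.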

Suppose we have the following documents:

POST /match_phrase_prefix_test/_doc/1
{
  "message": "my name is ridingroad"
}


POST /match_phrase_prefix_test/_doc/2
{
  "message": "my last name is ridingroad"
}
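Because the query works on analyzed tokens and their positions, it can help to inspect how a document's text is tokenized. Here is a minimal sketch using the _analyze API. It assumes the message field was dynamically mapped as a text field when the documents above were indexed:

```console
# Assumption: "message" is a dynamically mapped text field in this index.
# Shows the tokens and positions produced for the text of doc 2.
GET /match_phrase_prefix_test/_analyze
{
  "field": "message",
  "text": "my last name is ridingroad"
}
```

Each token in the response carries a position. The phrase part of match_phrase_prefix is matched against these positions, and the prefix part is matched against the term that follows.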

The following query returns documents whose message field contains the phrase my name immediately followed by a word that starts with i:

POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": "my name i"
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5730107,
    "hits" : [
      {
        "_index" : "match_phrase_prefix_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5730107,
        "_source" : {
          "message" : "my name is ridingroad"
        }
      }
    ]
  }
}

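Likewise, the following query returns documents whose message field contains the phrase my last immediately followed by a word that starts with n: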

POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": "my last n"
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0117995,
    "hits" : [
      {
        "_index" : "match_phrase_prefix_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0117995,
        "_source" : {
          "message" : "my last name is ridingroad"
        }
      }
    ]
  }
}

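As with match_phrase, the slop parameter controls how many position moves are allowed between the query terms and their positions in the document (transposing two adjacent terms costs a slop of 2). See the match_phrase query for details.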

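With slop added, the following query for my name last matches doc 2. The final term, last, is treated as a prefix, and a slop of 2 allows name and last to appear in the opposite order, as they do in my last name: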

POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "my name last",
        "slop": 2
      }
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.47492626,
    "hits" : [
      {
        "_index" : "match_phrase_prefix_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.47492626,
        "_source" : {
          "message" : "my last name is ridingroad"
        }
      }
    ]
  }
}

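For comparison, here is a sketch of the same query without slop (slop defaults to 0). Against the two documents indexed above, it would be expected to return no hits, because neither document contains my name immediately followed by a word that starts with last:

```console
# Same query text as above, but with the default slop of 0.
# Expected (not verified here): no hits for the two sample documents.
POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "my name last"
      }
    }
  }
}
```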

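Likewise, you can specify an analyzer to tokenize the query text before the search runs. By default, the analyzer defined for the field in the mapping is used. If the mapping does not explicitly specify an analyzer, the index-level default analyzer is used.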
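Here is a minimal sketch of overriding the analyzer for the query text. The built-in standard analyzer is chosen only for illustration. On an index without custom analysis settings it is typically the analyzer already in effect for a dynamically mapped text field, so the results should not change:

```console
# Assumption: "standard" is used purely as an example analyzer name.
# The analyzer applies to the query text, not to the indexed documents.
POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "my name i",
        "analyzer": "standard"
      }
    }
  }
}
```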

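Note: suppose you search for my last n. The last term, n, is used to look up terms that start with n, but the word starting with n that you are after may not appear in the results. This is because Elasticsearch takes only the first 50 terms that start with n from the sorted term list and adds them to the query. If what you are looking for is my last nzz, nzz may well not be among those first 50 terms.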
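The limit of 50 described above corresponds to the default value of the query's max_expansions parameter. Here is a minimal sketch of raising it. The value 100 is an arbitrary choice for illustration, not a recommendation:

```console
# Assumption: 100 is an illustrative value; max_expansions defaults to 50.
# It caps how many terms the final prefix ("n") may expand to.
POST /match_phrase_prefix_test/_search
{
  "query": {
    "match_phrase_prefix": {
      "message": {
        "query": "my last n",
        "max_expansions": 100
      }
    }
  }
}
```

A larger value lets the prefix expand to more terms, which can make the query slower. Typing more characters of the final word is often the simpler way to narrow the expansion.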