Multi Match Query

Date: 2019-11-30

Tags: multi match query


The multi_match query builds on the match query to allow multi-field queries:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test",
      "fields": [ "subject", "message" ]
    }
  }
}

query: the query string.

fields: the fields to be queried.

`fields` and per-field boosting

Fields can be specified with wildcards, for example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "fields": [ "title", "*_name" ]
    }
  }
}

This queries the title, first_name and last_name fields (the *_name pattern matches both name fields).
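Elasticsearch resolves such patterns against the index mapping. As a rough sketch of that expansion, the hypothetical Python helper below matches patterns against an assumed list of field names using the standard library's fnmatch. Note that fnmatch also treats ? and [...] as special, so it only approximates the `*` wildcard shown above.

```python
from fnmatch import fnmatchcase


def expand_fields(patterns, mapped_fields):
    """Expand field patterns such as "*_name" against a list of field names.

    Hypothetical helper: mapped_fields stands in for the index mapping, and
    fnmatchcase only approximates Elasticsearch's wildcard handling.
    """
    resolved = []
    for pattern in patterns:
        for field in mapped_fields:
            # Keep the first occurrence only, so overlapping patterns do not duplicate fields.
            if fnmatchcase(field, pattern) and field not in resolved:
                resolved.append(field)
    return resolved


# Assumed mapping, for illustration only.
mapping = ["title", "first_name", "last_name", "body"]
print(expand_fields(["title", "*_name"], mapping))
```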

Individual fields can be boosted with the caret (^) notation:

GET /_search
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ]
    }
  }
}

The subject field is three times as important as the message field.
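Here is a minimal Python sketch of the caret notation. It splits a "field^boost" string and scales per-field scores by the boost. The helper names and raw scores are hypothetical, and the sketch models a boost simply as a multiplier on that field's score.

```python
def parse_field(spec):
    """Split "subject^3" into ("subject", 3.0); fields without ^ get boost 1.0."""
    name, sep, boost = spec.partition("^")
    return name, (float(boost) if sep else 1.0)


def boosted_scores(raw_scores, field_specs):
    """Multiply each field's raw score by the boost given in field_specs."""
    boosts = dict(parse_field(spec) for spec in field_specs)
    return {field: score * boosts[field] for field, score in raw_scores.items()}


# Hypothetical raw scores for one document.
print(boosted_scores({"subject": 0.5, "message": 1.2}, ["subject^3", "message"]))
```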

Types of `multi_match` query:

How the multi_match query is executed internally depends on the type parameter, which can be set to:

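best_fields      (default) Finds documents which match any field, but uses the _score from the best field. See best_fields.

most_fields      Finds documents which match any field and combines the _score from each field. See most_fields.

cross_fields     Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.

phrase           Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.

phrase_prefix    Runs a match_phrase_prefix query on each field and uses the _score from the best field. See phrase and phrase_prefix.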

`best_fields`

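The best_fields type is most useful when you are searching for multiple words that are best found in the same field. For instance, "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in another.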

The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "brown fox",
      "type":       "best_fields",
      "fields":     [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

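Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, it calculates the score as follows:

the score from the best matching field
plus tie_breaker * _score for all other matching fields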
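The scoring rule above can be written as a small Python function. This sketch covers the arithmetic only. The per-field scores are assumed inputs, not values computed the way Lucene computes them.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine the scores of the fields that matched, dis_max style.

    Returns the best score plus tie_breaker times the sum of every other
    matching field's score. An empty list means no field matched.
    """
    if not field_scores:
        return 0.0
    ordered = sorted(field_scores, reverse=True)
    return ordered[0] + tie_breaker * sum(ordered[1:])


# Hypothetical scores: subject scored 2.0, message scored 1.0.
print(best_fields_score([2.0, 1.0], tie_breaker=0.3))
```

With tie_breaker at 0.0 this reduces to taking the maximum. At 1.0 it adds all matching fields together.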

It also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query.

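Important: operator and minimum_should_match

The best_fields and most_fields types are field-centric: they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example: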

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "best_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and"
    }
  }
}

With "operator": "and", all terms must be present.

This query is executed as:

  (+first_name:will +first_name:smith)
| (+last_name:will  +last_name:smith)

In other words, all terms must be present in a single field for a document to match. See cross_fields for a better solution.
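The field-centric behavior can be sketched in Python. The sketch assumes each document is a dict mapping field names to sets of already-analyzed terms. The function name is hypothetical, and it models only which documents match, not how they score.

```python
def field_centric_and(terms, doc, fields):
    """Match rule of best_fields/most_fields with operator "and".

    A document matches only if at least one single field contains every term.
    """
    return any(all(term in doc.get(field, set()) for term in terms) for field in fields)


fields = ["first_name", "last_name"]
will_smith = {"first_name": {"will"}, "last_name": {"smith"}}
print(field_centric_and(["will", "smith"], will_smith, fields))
```

The Will Smith document is rejected because neither field holds both terms on its own.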

`most_fields`

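The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining the scores from all three fields, we can match as many documents as possible with the main field and use the second and third fields to push the most similar results to the top of the list.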

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "most_fields",
      "fields":     [ "title", "title.original", "title.shingles" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}

　　每個match子句的score將被加在一塊兒，而後經過match子句的數量來分割。

It also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query, but see operator and minimum_should_match.

`phrase` and `phrase_prefix`

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown f",
      "type":       "phrase_prefix",
      "fields":     [ "subject", "message" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}

It also accepts analyzer, boost, lenient, slop and zero_terms_query, as explained in match query. The phrase_prefix type additionally accepts max_expansions.

Important: phrase, phrase_prefix and fuzziness. The fuzziness parameter cannot be used with the phrase or phrase_prefix type.
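The Python sketch below gives a rough model of what each match_phrase_prefix clause checks. It treats a field as a list of analyzed tokens and requires the leading query terms to appear consecutively (slop 0). It then expands the last term against a sorted vocabulary, keeping at most max_expansions candidates. The helper names and the single sorted vocabulary are assumptions made for illustration; Lucene works on per-segment term dictionaries.

```python
def expand_prefix(prefix, vocabulary, max_expansions=50):
    """Return up to max_expansions vocabulary terms starting with prefix, in sorted order."""
    return [term for term in sorted(vocabulary) if term.startswith(prefix)][:max_expansions]


def matches_phrase_prefix(query_terms, field_terms, vocabulary, max_expansions=50):
    """True if field_terms holds query_terms[:-1] as a consecutive run (slop 0)
    immediately followed by one of the expansions of the last query term."""
    *head, last = query_terms
    expansions = set(expand_prefix(last, vocabulary, max_expansions))
    n = len(head)
    for i in range(len(field_terms) - n):
        if field_terms[i:i + n] == head and field_terms[i + n] in expansions:
            return True
    return False


# Hypothetical analyzed field and vocabulary.
field = ["the", "quick", "brown", "fox"]
vocab = {"the", "quick", "brown", "fox", "fast"}
print(matches_phrase_prefix(["quick", "brown", "f"], field, vocab))
print(matches_phrase_prefix(["quick", "brown", "f"], field, vocab, max_expansions=1))
```

With max_expansions=1, only "fast" survives the expansion of "f", so the field no longer matches. This shows how a small max_expansions can make phrase_prefix miss documents.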

`cross_fields`

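The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.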

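This sounds like a job for most_fields, but there are two problems with that approach. The first problem is that operator and minimum_should_match are applied per field instead of per term (see the explanation above).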

The second problem has to do with relevance: the different term frequencies in the first_name and last_name fields can produce unexpected results.

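For instance, imagine we have two people: "Will Smith" and "Smith Jones". "Smith" as a last name is very common (and so is of low importance), but "Smith" as a first name is very uncommon (and so is of great importance).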

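If we search for "Will Smith", the "Smith Jones" document will probably appear above the better matching "Will Smith" document, because the score of first_name:smith outweighs the combined scores of first_name:will plus last_name:smith.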
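A quick calculation shows how this happens. The sketch below uses the BM25 inverse document frequency, idf = ln(1 + (N - n + 0.5) / (n + 0.5)), as used by Lucene's BM25 similarity (the default from Elasticsearch 5.0). The document counts are invented for illustration, and term frequency and length normalization are ignored.

```python
import math


def bm25_idf(num_docs, doc_freq):
    """BM25 inverse document frequency: rarer terms get larger values."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for 1,000 documents.
N = 1000
idf_first_smith = bm25_idf(N, 1)    # "smith" is a rare first name
idf_first_will = bm25_idf(N, 50)    # "will" is a fairly common first name
idf_last_smith = bm25_idf(N, 100)   # "smith" is a very common last name

# "Smith Jones" matches only first_name:smith;
# "Will Smith" matches first_name:will and last_name:smith.
print(idf_first_smith > idf_first_will + idf_last_smith)
```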

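One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.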

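The cross_fields type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.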

A query like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and"
    }
  }
}

is executed as:

+(first_name:will  last_name:will)
+(first_name:smith last_name:smith)

In other words, every term must be present in at least one field for a document to match. (Compare this with the logic used for best_fields and most_fields.)
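The term-centric match rule can be sketched in Python, assuming each document is a dict mapping field names to sets of already-analyzed terms. Each term must be found in some field, but different terms may come from different fields. The function name is hypothetical, and only matching (not scoring) is modeled.

```python
def term_centric_and(terms, doc, fields):
    """Match rule of cross_fields with operator "and".

    Every term must appear in at least one of the fields; different terms
    may be found in different fields.
    """
    return all(any(term in doc.get(field, set()) for field in fields) for term in terms)


fields = ["first_name", "last_name"]
will_smith = {"first_name": {"will"}, "last_name": {"smith"}}
smith_jones = {"first_name": {"smith"}, "last_name": {"jones"}}
print(term_centric_and(["will", "smith"], will_smith, fields))
print(term_centric_and(["will", "smith"], smith_jones, fields))
```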

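That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies across all fields in order to even out the differences.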

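In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This gives matches on first_name and last_name comparable scores, with a tiny advantage for last_name because it is the field most likely to contain smith.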
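The blending described above can be sketched in Python. This simplification follows the sentence above literally. The field with the highest document frequency keeps it, and every other field is treated as having that frequency plus one. The real implementation in Elasticsearch has more detail than this, and the counts below are hypothetical.

```python
import math


def bm25_idf(num_docs, doc_freq):
    """BM25 inverse document frequency."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def blend_doc_freqs(doc_freqs):
    """Simplified blending of one term's document frequency across fields.

    The field with the highest document frequency keeps it; every other
    field is treated as having that frequency plus one.
    """
    top_field = max(doc_freqs, key=doc_freqs.get)
    top = doc_freqs[top_field]
    return {field: top if field == top_field else top + 1 for field in doc_freqs}


# Hypothetical document frequencies of "smith" in 1,000 documents.
blended = blend_doc_freqs({"first_name": 1, "last_name": 100})
print(blended)
print(bm25_idf(1000, blended["last_name"]) - bm25_idf(1000, blended["first_name"]))
```

After blending, first_name:smith no longer looks rare. Both fields get almost the same idf, with last_name slightly ahead.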

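Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that blending the term statistics is no longer meaningful.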

If you run the above query through the Validate API, it returns this explanation:

+blended("will",  fields: [first_name, last_name])
+blended("smith", fields: [first_name, last_name])

It also accepts analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and cutoff_frequency, as explained in match query.

`cross_fields` and analysis

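The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.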

For instance, suppose first and last use the same analyzer, and first.edge and last.edge both use an edge_ngram analyzer. Then this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "fields":     [
        "first", "first.edge",
        "last",  "last.edge"
      ]
    }
  }
}

would be executed as:

    blended("jon", fields: [first, last])
| (
    blended("j",   fields: [first.edge, last.edge])
    blended("jo",  fields: [first.edge, last.edge])
    blended("jon", fields: [first.edge, last.edge])
)

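In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as another single field.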
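The grouping step can be sketched in Python as a dictionary keyed by analyzer name. The field-to-analyzer mapping below mirrors the example, but the analyzer labels are hypothetical. Each resulting group would become one set of blended queries.

```python
def group_by_analyzer(field_analyzers):
    """Group fields that share an analyzer, preserving the order the fields were given in."""
    groups = {}
    for field, analyzer in field_analyzers.items():
        groups.setdefault(analyzer, []).append(field)
    return groups


# Hypothetical analyzer assignments matching the example above.
field_analyzers = {
    "first": "name_analyzer",
    "first.edge": "edge_ngram_analyzer",
    "last": "name_analyzer",
    "last.edge": "edge_ngram_analyzer",
}
print(group_by_analyzer(field_analyzers))
```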

Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.

You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "first", "last" ],
            "minimum_should_match": "50%"
          }
        },
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "*.edge" ]
          }
        }
      ]
    }
  }
}

With "minimum_should_match": "50%", either will or smith must be present in either the first or the last field.

You can force all fields into the same group by specifying the analyzer parameter in the query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "analyzer":   "standard",
      "fields":     [ "first", "last", "*.edge" ]
    }
  }
}

"analyzer": "standard" uses the standard analyzer for all of the fields.

which will be executed as:

blended("jon", fields: [first, first.edge, last.edge, last])

`tie_breaker`

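By default, each per-term blended query uses the best score returned by any field in a group, and these scores are then added together to give the final score. The tie_breaker parameter can change the default behavior of the per-term blended queries. It accepts: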

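0.0              Take the single best score out of (e.g.) first_name:will and last_name:will (default).

1.0              Add together the scores for (e.g.) first_name:will and last_name:will.

0.0 < n < 1.0    Take the single best score plus tie_breaker multiplied by each of the scores from the other matching fields.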
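The per-term combination and the final sum can be sketched in Python. The scores are hypothetical inputs, not real Lucene scores. The function only shows how tie_breaker mixes them before the per-term results are added up.

```python
def cross_fields_score(per_term_field_scores, tie_breaker=0.0):
    """Score one document for a cross_fields query.

    per_term_field_scores has one list per query term, holding that term's
    scores in the fields where it matched. Each term is combined as
    best score + tie_breaker * (sum of the other scores), and the per-term
    results are added together.
    """
    total = 0.0
    for scores in per_term_field_scores:
        if not scores:
            continue  # this term matched no field in the group
        ordered = sorted(scores, reverse=True)
        total += ordered[0] + tie_breaker * sum(ordered[1:])
    return total


# Hypothetical scores: "will" matched two fields, "smith" matched one.
scores = [[2.0, 1.0], [3.0]]
for tb in (0.0, 0.5, 1.0):
    print(tb, cross_fields_score(scores, tie_breaker=tb))
```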

Important: cross_fields and fuzziness

The fuzziness parameter cannot be used with the cross_fields type.

Original: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/query-dsl-multi-match-query.html
