[Repost] 23 Most Useful Elasticsearch Search Techniques

Preface

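This article presents 23 of the most useful Elasticsearch search techniques, with detailed example requests and matching Java API implementations, which makes it a valuable resource for learning and applying Elasticsearch.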

Data Preparation

To illustrate different types of ES searches, we will search a collection of documents with the following fields:

title               Title
authors             Authors
summary             Summary
publish_date        Publication date
num_reviews         Number of reviews
publisher           Publisher

First, we use the bulk API to create a new index and submit the data in bulk:

# Create the index with its settings
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}

# Submit the data in bulk
POST /bookdb_index/book/_bulk
{"index":{"_id":1}}
{"title":"Elasticsearch: The Definitive Guide","authors":["clinton gormley","zachary tong"],"summary":"A distibuted real-time search and analytics engine","publish_date":"2015-02-07","num_reviews":20,"publisher":"oreilly"}
{"index":{"_id":2}}
{"title":"Taming Text: How to Find, Organize, and Manipulate It","authors":["grant ingersoll","thomas morton","drew farris"],"summary":"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization","publish_date":"2013-01-24","num_reviews":12,"publisher":"manning"}
{"index":{"_id":3}}
{"title":"Elasticsearch in Action","authors":["radu gheorge","matthew lee hinman","roy russo"],"summary":"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms","publish_date":"2015-12-03","num_reviews":18,"publisher":"manning"}
{"index":{"_id":4}}
{"title":"Solr in Action","authors":["trey grainger","timothy potter"],"summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr","publish_date":"2014-04-05","num_reviews":23,"publisher":"manning"}
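The body of the _bulk request above is newline-delimited JSON: for each document, an action line (here {"index":{"_id":...}}) followed by a source line. If you generate this body in code instead of typing it into the console, a minimal Python sketch could look like the following. The helper name bulk_body is hypothetical, and only two of the four books (without their summaries) are included to keep it short.

```python
import json

# Two of the four sample books (summaries omitted). "_id" goes into the action
# line; every other key becomes the document source.
BOOKS = [
    {"_id": 1, "title": "Elasticsearch: The Definitive Guide",
     "authors": ["clinton gormley", "zachary tong"], "publish_date": "2015-02-07",
     "num_reviews": 20, "publisher": "oreilly"},
    {"_id": 4, "title": "Solr in Action",
     "authors": ["trey grainger", "timothy potter"], "publish_date": "2014-04-05",
     "num_reviews": 23, "publisher": "manning"},
]


def bulk_body(docs):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        source = {key: value for key, value in doc.items() if key != "_id"}
        lines.append(json.dumps({"index": {"_id": doc["_id"]}}))
        lines.append(json.dumps(source))
    # The bulk API expects the last line to be terminated by a newline as well.
    return "\n".join(lines) + "\n"


body = bulk_body(BOOKS)
print(body, end="")
```

The resulting string can be sent as the body of POST /bookdb_index/book/_bulk, typically with the header Content-Type: application/x-ndjson.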

Note: the experiments in this article use ES 6.3.0.

1. Basic Match Query

1.1 Full-text search

There are two ways to run a full-text search:

1) Using the search API with the parameters passed as part of the URL

Example: a full-text search for "guide":

GET bookdb_index/book/_search?q=guide

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.3278645,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.3278645,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.2871116,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      }
    ]
  }

2) Using the full ES DSL, with a JSON body as the request body. It returns the same results as method 1).

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields" : ["_all"]
    }
  }
}

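Explanation: the multi_match keyword is used instead of match as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case we want to query all fields in the document.

Note: ES 6.x does not enable the _all field by default; if fields is not specified, all fields are searched by default.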

1.2 Searching specific fields

Both APIs also let you specify which fields to search.
For example, to search the title field for books containing the words "in action":

1) URL search

GET bookdb_index/book/_search?q=title:in action

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.6323128,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.6323128,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [
            "radu gheorge",
            "matthew lee hinman",
            "roy russo"
          ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.6323128,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      }
    ]
  }
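The URL above contains a literal space. Outside a console that cleans this up for you, HTTP clients such as curl generally need the value percent-encoded. It also helps to know how q is parsed (query_string syntax): the title: prefix applies only to the word in, while action is searched in the default fields; grouping with parentheses, as in title:(in action), keeps both words on title. A small sketch using Python's standard urllib.parse (the helper search_url is hypothetical):

```python
from urllib.parse import quote, urlencode


def search_url(path, q):
    """Build a URI-search URL whose q parameter is percent-encoded (spaces become %20)."""
    return f"{path}/_search?" + urlencode({"q": q}, quote_via=quote)


# title: applies only to "in"; "action" is searched in the default fields
print(search_url("bookdb_index/book", "title:in action"))
# Parentheses group both words under the title field
print(search_url("bookdb_index/book", "title:(in action)"))
```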

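2) DSL search. The full-body DSL gives you more flexibility for building complex queries (as we will see later) and for specifying which results are returned. In the example below, we specify the number of results, the offset (useful for pagination), the document fields to return, and highlighting.

Number of results: size
Offset: from
Fields to return: _source
Highlighting: highlight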

GET bookdb_index/book/_search
{
  "query": {
    "match": {
      "title": "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.6323128,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.6323128,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "highlight": {
          "title": [
            "Elasticsearch <em>in</em> <em>Action</em>"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.6323128,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "highlight": {
          "title": [
            "Solr <em>in</em> <em>Action</em>"
          ]
        }
      }
    ]
  }

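Notes:

  1. For multi-word queries, the match query lets you use the and operator instead of the default or operator, via "operator" : "and".
  2. You can also set the minimum_should_match option to tune the relevance of the returned results; details are in the Elasticsearch guide.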
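To make the or/and distinction concrete, here is a simplified Python model of which documents a match query on title accepts. It decides matching only (no scoring), and analyze is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters). The query "elasticsearch action" is chosen for illustration because, unlike "in action", it gives different hits for the two operators.

```python
import re

# Sample titles keyed by _id.
TITLES = {
    1: "Elasticsearch: The Definitive Guide",
    2: "Taming Text: How to Find, Organize, and Manipulate It",
    3: "Elasticsearch in Action",
    4: "Solr in Action",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, operator="or", minimum_should_match=1):
    """_ids whose title satisfies a match query (matching only, scoring ignored)."""
    terms = set(analyze(query))
    required = len(terms) if operator == "and" else minimum_should_match
    return [doc_id for doc_id, title in TITLES.items()
            if len(terms & set(analyze(title))) >= required]


print(match("elasticsearch action"))                  # or: any term may match
print(match("elasticsearch action", operator="and"))  # and: every term must match
print(match("elasticsearch action", minimum_should_match=2))
```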

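2. Multi-field Search

As we have already seen, to query several document fields in one search (for example, to search both title and summary for the same query string), use the multi_match query: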

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 2.0281231,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0281231,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.3278645,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.0333893,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [
            "radu gheorge",
            "matthew lee hinman",
            "roy russo"
          ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      }
    ]
  }

Note: in the results above, document 4 (_id=4) matches because "guide" appears in its summary.

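3. Boosting

Since we are searching across multiple fields, we may want to boost the score of a particular field. In the example below, we boost the score of the summary field by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.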

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide", 
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 3.9835935,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 3.9835935,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 3.1001682,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0281231,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }

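Note: boosting is not guaranteed to be a plain multiplication of the computed score by the boost factor; depending on the version, the actual boost may go through normalization and some internal optimization (see the Elasticsearch guide for more). In the ES 6.3 results above, document 4, which matches only in summary, scores exactly three times its section 2 score (1.3278645 × 3 = 3.9835935).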

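4. Bool Query

The AND / OR / NOT operators can be used to fine-tune search queries so that they return more relevant or more specific results.

In the search API this is done with the bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT) and a should parameter (equivalent to OR).

For example, suppose I want to search for a book whose title contains "Elasticsearch" or "Solr", AND that was written by "clinton gormley", but NOT by "radu gheorge":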

GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {"match": {"title": "Elasticsearch"}},
              {"match": {"title": "Solr"}}
            ]
          }
        },
        {
          "match": {"authors": "clinton gormely"}
        }
      ],
      "must_not": [
        {
          "match": {"authors": "radu gheorge"}
        }
      ]
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 2.0749094,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0749094,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      }
    ]
  }

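Regarding should clauses in a bool query, there are two cases:

  • When a must clause exists at the same level, the should conditions are optional: a document may or may not satisfy them, and the more it satisfies, the higher its score.
  • When there is no must clause, at least one should condition must be satisfied by default.

Note: as you can see, a bool query can contain any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.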
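The matching rules above (scoring aside) can be modelled in a few lines of Python. This is a simplified sketch, not how Lucene evaluates queries; match, bool_query and the tokenizer are hypothetical illustrative helpers. It also shows why the example query still finds document 1 despite the misspelled "clinton gormely": a match query uses OR by default, so "clinton" alone is enough.

```python
import re

# Sample titles and authors, keyed by _id.
BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    3: {"title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action", "authors": ["trey grainger", "timothy potter"]},
}


def analyze(value):
    """Rough standard-analyzer stand-in; list fields are analyzed value by value."""
    values = value if isinstance(value, list) else [value]
    return {t for v in values for t in re.findall(r"[a-z0-9]+", v.lower())}


def match(field, text):
    """match clause with the default OR operator: any shared term is enough."""
    return lambda doc: bool(analyze(text) & analyze(doc[field]))


def bool_query(must=(), should=(), must_not=()):
    """Matching rules of a bool query (scoring ignored)."""
    def check(doc):
        if not all(q(doc) for q in must):
            return False
        if any(q(doc) for q in must_not):
            return False
        if should and not must:
            # Without must, at least one should clause has to match.
            return any(q(doc) for q in should)
        return True  # With must present, should clauses only affect the score.
    return check


query = bool_query(
    must=[
        bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
        match("authors", "clinton gormely"),  # misspelling kept from the example
    ],
    must_not=[match("authors", "radu gheorge")],
)
print([doc_id for doc_id, doc in BOOKS.items() if query(doc)])
```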

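5. Fuzzy Queries

Fuzzy matching can be enabled on match and multi_match queries to catch spelling mistakes. The degree of fuzziness is specified as the Levenshtein distance from the original word.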

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title","summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title","summary","publish_date"],
  "size": 2
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 2.4344182,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 2.4344182,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.2871116,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }
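
A fuzziness of "AUTO" allows no edits for terms of one or two characters, one edit for terms of 3 to 5 characters, and two edits for terms longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see Typos and Misspellings in the Elasticsearch guide.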

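A Python sketch of how AUTO decides whether an indexed term is close enough to a query term. Two simplifications are assumptions of the sketch: it uses plain Levenshtein distance, whereas Elasticsearch by default also counts a transposition of two adjacent characters as a single edit (fuzzy_transpositions), and it ignores prefix_length and max_expansions. The misspelled "comprihensiv" from the query above is two edits away from the indexed term "comprehensive", which AUTO allows for a 12-character term.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def auto_fuzziness(term, low=3, high=6):
    """Edits allowed by fuzziness "AUTO" (i.e. AUTO:3,6) for a query term of this length."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """True if the indexed term is within the AUTO edit budget of the query term."""
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(levenshtein("comprihensiv", "comprehensive"), auto_fuzziness("comprihensiv"))
print(fuzzy_match("comprihensiv", "comprehensive"))
```
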
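6. Wildcard Query

A wildcard query lets you specify a pattern to match instead of the whole term:

  • ? matches any single character
  • * matches zero or more characters

For example, to find all records with an author name containing a word that starts with the letter "t":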

GET bookdb_index/book/_search
{
  "query": {
    "wildcard": {
      "authors": {
        "value": "t*"
      }
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ]
        },
        "highlight": {
          "authors": [
            "zachary <em>tong</em>"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "authors": [
            "grant ingersoll",
            "thomas morton",
            "drew farris"
          ]
        },
        "highlight": {
          "authors": [
            "<em>thomas</em> morton"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ]
        },
        "highlight": {
          "authors": [
            "<em>trey</em> grainger",
            "<em>timothy</em> potter"
          ]
        }
      }
    ]
  }
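Why does "zachary tong" match t*? The wildcard pattern is not analyzed and is compared with each indexed term (authors was split into lowercase words at index time), not with the whole original string. A simplified Python model of that behaviour follows; the helper names are hypothetical, and the tokenizer just splits on spaces, which is enough for these author names.

```python
import re

AUTHORS = {
    1: ["clinton gormley", "zachary tong"],
    2: ["grant ingersoll", "thomas morton", "drew farris"],
    3: ["radu gheorge", "matthew lee hinman", "roy russo"],
    4: ["trey grainger", "timothy potter"],
}


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: ? = exactly one character, * = zero or more."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_query(pattern):
    """Return {_id: matching terms}; the pattern must match a whole indexed term."""
    rx = wildcard_to_regex(pattern)
    hits = {}
    for doc_id, names in AUTHORS.items():
        terms = [t for name in names for t in name.split()]  # lowercase words, as indexed
        matched = [t for t in terms if rx.fullmatch(t)]
        if matched:
            hits[doc_id] = matched
    return hits


print(wildcard_query("t*"))
```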

7. Regexp Query

Regexp queries let you specify more complex patterns than wildcard queries, for example:

POST bookdb_index/book/_search
{
  "query": {
    "regexp": {
      "authors": "t[a-z]*y"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ]
        },
        "highlight": {
          "authors": [
            "<em>trey</em> grainger",
            "<em>timothy</em> potter"
          ]
        }
      }
    ]
  }
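Lucene regular expressions are implicitly anchored: the pattern must match a whole term, which is why t[a-z]*y finds "trey" and "timothy" but nothing in "zachary tong". Python's re.fullmatch gives the same anchoring in the sketch below (regexp_query is a hypothetical helper). Lucene's regex syntax is a restricted dialect, so only simple patterns like this one can be assumed to behave identically in both.

```python
import re

AUTHORS = {
    1: ["clinton gormley", "zachary tong"],
    2: ["grant ingersoll", "thomas morton", "drew farris"],
    3: ["radu gheorge", "matthew lee hinman", "roy russo"],
    4: ["trey grainger", "timothy potter"],
}


def regexp_query(pattern):
    """Return {_id: matching terms}; like Lucene, the regex must match the entire term."""
    rx = re.compile(pattern)
    hits = {}
    for doc_id, names in AUTHORS.items():
        matched = [t for name in names for t in name.split() if rx.fullmatch(t)]
        if matched:
            hits[doc_id] = matched
    return hits


print(regexp_query("t[a-z]*y"))
```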

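8. Match Phrase Query

A match phrase query requires all terms in the query string to be present in the document, in the order given in the query string, and close to one another.

By default the terms must be exactly adjacent, but you can specify a slop value, which indicates how far apart the terms may be while the document is still considered a match.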

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "summary"],
      "type": "phrase",
      "slop": 3
    }
  },
  "_source": [ "title", "summary", "publish_date" ]
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 0.88067603,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.88067603,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.51429313,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }

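Note: in the example above, with a non-phrase query, document _id 1 would normally score higher and appear before document _id 4, because its field is shorter.

As a phrase query, however, the proximity of the terms is taken into account, so document _id 4 scores better.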
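The slop check can be illustrated with a simplified Python model. It handles only in-order matches and computes how many positions the terms would have to move to become adjacent; Lucene's sloppy phrase matching also accepts out-of-order matches at a higher cost, and the tokenizer is a rough stand-in for the standard analyzer. On the sample summaries it gives slop 0 for document 4 and slop 2 for document 1, so both pass with slop 3, while document 3 has no "engine" at all.

```python
import itertools
import re

SUMMARIES = {
    1: "A distibuted real-time search and analytics engine",
    3: "build scalable search applications using Elasticsearch without having to do "
       "complex low-level programming or understand advanced data science algorithms",
    4: "Comprehensive guide to implementing a scalable search engine using Apache Solr",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_slop(text, phrase):
    """Smallest slop at which the phrase terms occur in order, or None if they never do."""
    tokens = analyze(text)
    terms = analyze(phrase)
    positions = [[i for i, tok in enumerate(tokens) if tok == term] for term in terms]
    best = None
    for combo in itertools.product(*positions):
        if any(a >= b for a, b in zip(combo, combo[1:])):
            continue  # out-of-order combinations are not modelled here
        # Shift each position by the term's place in the phrase; the spread of the
        # shifted positions is how far the terms are from being adjacent.
        shifted = [pos - idx for idx, pos in enumerate(combo)]
        slop = max(shifted) - min(shifted)
        if best is None or slop < best:
            best = slop
    return best


for doc_id, summary in SUMMARIES.items():
    print(doc_id, required_slop(summary, "search engine"))
```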

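9. Match Phrase Prefix

The match phrase prefix query provides search-as-you-type, a "poor man's" version of autocomplete, at query time, without requiring any special preparation of the data.

Like the match_phrase query, it accepts a slop parameter that makes the order and relative positions of the words less "strict". It also accepts a max_expansions parameter that limits the number of terms the prefix is expanded to, reducing resource usage.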

GET bookdb_index/book/_search
{
  "query": {
    "match_phrase_prefix": {
      "summary": {
        "query": "search en",
        "slop": 3,
        "max_expansions": 10
      }
    }
  },
  "_source": ["title","summary","publish_date"]
}

Note: query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type; see the Completion Suggester API or Edge-Ngram filters for more.
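The max_expansions cap can be pictured with the Python sketch below: the final word of the query ("en") is expanded to matching terms from the term dictionary, and only the first max_expansions of them are kept. Taking the expansions in sorted order and building the dictionary from the four sample summaries are assumptions of this sketch, and expand_prefix is a hypothetical helper. A small cap can therefore miss relevant completions.

```python
import re

SUMMARIES = [
    "A distibuted real-time search and analytics engine",
    "organize text using approaches such as full-text search, proper name recognition, "
    "clustering, tagging, information extraction, and summarization",
    "build scalable search applications using Elasticsearch without having to do "
    "complex low-level programming or understand advanced data science algorithms",
    "Comprehensive guide to implementing a scalable search engine using Apache Solr",
]


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


# A sorted term dictionary for the summary field.
VOCABULARY = sorted({term for text in SUMMARIES for term in analyze(text)})


def expand_prefix(prefix, max_expansions=50):
    """Terms the last query word is expanded to, keeping at most max_expansions of them."""
    return [term for term in VOCABULARY if term.startswith(prefix)][:max_expansions]


print(expand_prefix("en", max_expansions=10))
print(expand_prefix("a", max_expansions=3))
```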

10. Query String

The query_string query provides a concise shorthand syntax for multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp and range queries.

In the example below, we run a fuzzy search for the terms "search algorithm" in books where one of the authors is "grant ingersoll" or "tom morton". We search the summary, title, authors and publisher fields, applying a boost of 2 to summary.

GET bookdb_index/book/_search
{
  "query": {
    "query_string": {
      "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
      "fields": ["summary^2","title","authors","publisher"]
    }
  },
  "_source": ["title","summary","authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 3.571021,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 3.571021,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "authors": [
            "grant ingersoll",
            "thomas morton",
            "drew farris"
          ]
        },
        "highlight": {
          "summary": [
            "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"
          ]
        }
      }
    ]
  }

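11. Simple Query String

The simple_query_string query is a version of query_string that is better suited to a single search box exposed to users, because it replaces AND / OR / NOT with + / | / -, and it discards invalid parts of the query instead of throwing an exception when the user makes a mistake.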

GET bookdb_index/book/_search
{
  "query": {
    "simple_query_string": {
      "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
      "fields": ["summary^2","title","authors","publisher"]
    }
  },
  "_source": ["title","summary","authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}

[Results]
# Same results as above

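12. Term/Terms Query (exact-value search on a field)

The examples in sections 1 to 11 above are full-text searches. Sometimes we are more interested in structured search, where we want to find exact matches and return those results.

In the example below, we search the index for all books published by Manning Publications (using term and terms queries).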

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 0.35667494,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      }
    ]
  }

A terms query lets you specify multiple values to search for:

GET bookdb_index/book/_search
{
  "query": {
    "terms": {
      "publisher": ["oreilly", "manning"]
    }
  }
}
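The key difference from the full-text queries is that term and terms are not analyzed: the value must equal an indexed term exactly. The Python sketch below models a text field as its set of analyzed (lowercased) tokens and its .keyword sub-field as the exact original string. It is a simplification for illustration; the helper names are hypothetical and the tokenizer is a rough stand-in for the standard analyzer.

```python
import re

TITLES = {1: "Elasticsearch: The Definitive Guide",
          2: "Taming Text: How to Find, Organize, and Manipulate It",
          3: "Elasticsearch in Action",
          4: "Solr in Action"}
PUBLISHERS = {1: "oreilly", 2: "manning", 3: "manning", 4: "manning"}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# A text field is indexed as analyzed tokens; its .keyword sub-field keeps the exact string.
title_text = {doc_id: analyze(t) for doc_id, t in TITLES.items()}
title_keyword = {doc_id: {t} for doc_id, t in TITLES.items()}
publisher_text = {doc_id: analyze(p) for doc_id, p in PUBLISHERS.items()}


def term(index, value):
    """term query: the value is not analyzed and must equal an indexed term exactly."""
    return [doc_id for doc_id, indexed in index.items() if value in indexed]


def terms(index, values):
    """terms query: matches when any of the values equals an indexed term."""
    return [doc_id for doc_id, indexed in index.items() if indexed & set(values)]


print(term(publisher_text, "manning"))
print(term(title_text, "Solr"), term(title_text, "solr"))
print(term(title_keyword, "Solr in Action"))
print(terms(publisher_text, ["oreilly", "manning"]))
```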

13. Term Query - Sorted

Term queries, like other queries, can easily be sorted. Multi-level sorting is also allowed.

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"],
  "sort": [{"publisher.keyword": { "order": "desc"}},
    {"title.keyword": {"order": "asc"}}]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "sort": [
          "manning",
          "Elasticsearch in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "sort": [
          "manning",
          "Solr in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        },
        "sort": [
          "manning",
          "Taming Text: How to Find, Organize, and Manipulate It"
        ]
      }
    ]
  }

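Note: in Elasticsearch 6.x, use text fields for full-text search and non-text fields for sorting. Here we sort on the publisher.keyword and title.keyword sub-fields, which dynamic mapping creates alongside each text field.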
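Multi-level sorting means that ties on the first key are broken by the next key. A small Python illustration of the same ordering (publisher descending, then title ascending) on the three Manning books, relying on the stability of Python's sort; multi_sort is a hypothetical helper. Comparing keyword values as plain strings, as done here, gives the same order as ES for these ASCII titles.

```python
BOOKS = [
    {"_id": "2", "publisher": "manning",
     "title": "Taming Text: How to Find, Organize, and Manipulate It"},
    {"_id": "3", "publisher": "manning", "title": "Elasticsearch in Action"},
    {"_id": "4", "publisher": "manning", "title": "Solr in Action"},
]


def multi_sort(docs, keys):
    """Sort by several (field, order) keys; the first key has the highest priority."""
    result = list(docs)
    # Python's sort is stable, so sorting by the keys in reverse priority order
    # leaves ties on an earlier key ordered by the later keys.
    for field, order in reversed(keys):
        result.sort(key=lambda d: d[field], reverse=(order == "desc"))
    return result


ordered = multi_sort(BOOKS, [("publisher", "desc"), ("title", "asc")])
print([b["_id"] for b in ordered])
```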

14. Range Query

Another example of structured search is the range query. In the example below, we search for books published in 2015.

GET bookdb_index/book/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "publisher": "oreilly",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }

Note: range queries work on date, numeric and string fields.
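The bounds work like comparison operators: gte and lte are inclusive, gt and lt are exclusive. A minimal Python illustration on the sample publication dates; range_query is a hypothetical helper, and it compares calendar dates, whereas Elasticsearch stores dates internally as timestamps.

```python
from datetime import date

PUBLISH_DATES = {
    1: date(2015, 2, 7),
    2: date(2013, 1, 24),
    3: date(2015, 12, 3),
    4: date(2014, 4, 5),
}


def range_query(values, gte=None, gt=None, lte=None, lt=None):
    """Return the _ids whose value satisfies every bound that is given."""
    def ok(v):
        return ((gte is None or v >= gte) and (gt is None or v > gt)
                and (lte is None or v <= lte) and (lt is None or v < lt))
    return [doc_id for doc_id, v in values.items() if ok(v)]


print(range_query(PUBLISH_DATES, gte=date(2015, 1, 1), lte=date(2015, 12, 31)))
```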

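15. Filtered Query

(Removed since version 5.0; you can skip this section.)

A filtered query lets you filter the results of a query. In the example below, we query for books with the term "Elasticsearch" in the title or summary, but we want to filter the results down to those with 20 or more reviews.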

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "range" : {
                    "num_reviews": {
                        "gte": 20
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide"
        }
      }
    ]

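Note: a filtered query does not require a query to filter on. If no query is specified, a match_all query is run, which basically returns all documents in the index, and these are then filtered. In practice, the filter is run first, reducing the surface area that needs to be queried. Also, the filter is cached after its first use, which makes it very efficient.

Update: the filtered query was removed in Elasticsearch 5.x in favor of the bool query. Here is the same example rewritten as a bool query. The returned results are exactly the same.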

GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch",
            "fields": ["title","summary"]
          }
        }
      ],
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source" : ["title","summary","publisher", "num_reviews"]
}
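In the bool version, the multi_match clause decides which documents match and how they score, while the filter clause only removes documents and does not contribute to the score. A simplified Python model of that split (no real scoring; the helper names are hypothetical and analyze is a rough stand-in for the standard analyzer):

```python
import re

BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "summary": "organize text using approaches such as full-text search, proper name "
                   "recognition, clustering, tagging, information extraction, and summarization",
        "num_reviews": 12},
    3: {"title": "Elasticsearch in Action",
        "summary": "build scalable search applications using Elasticsearch without having to do "
                   "complex low-level programming or understand advanced data science algorithms",
        "num_reviews": 18},
    4: {"title": "Solr in Action",
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "num_reviews": 23},
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def must_multi_match(doc, query, fields):
    """Query context: does any queried field share a term with the query?"""
    terms = analyze(query)
    return any(terms & analyze(doc[f]) for f in fields)


def filter_range_gte(doc, field, gte):
    """Filter context: a yes/no check that never changes the score."""
    return doc[field] >= gte


must_hits = [i for i, d in BOOKS.items()
             if must_multi_match(d, "elasticsearch", ["title", "summary"])]
final_hits = [i for i in must_hits if filter_range_gte(BOOKS[i], "num_reviews", 20)]
print(must_hits, final_hits)
```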

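16. Multiple Filters

(No longer supported since 5.x; you can skip this section.) Multiple filters can be combined using a bool filter.

In the next example, the filters specify that returned results must have at least 20 reviews, must not be published before 2015, and should be published by oreilly.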

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
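Following the same pattern as the bool rewrite in section 15, the multiple-filters example can be expressed for 5.x and later by moving the scored multi_match into must and the whole bool filter under the filter clause. The Python sketch below only builds that request body as a dict and prints it as JSON; it is an untested rewrite, not output captured from a cluster, and multiple_filters_query is a hypothetical helper. One caveat: the treatment of should clauses inside a filter context has changed between major versions, so check the bool query reference for the version you run, and if the oreilly condition must always hold, put that term clause under must instead.

```python
import json


def multiple_filters_query():
    """The multiple-filters example expressed with bool instead of filtered (sketch)."""
    return {
        "query": {
            "bool": {
                # Scored full-text part: replaces filtered.query
                "must": [
                    {"multi_match": {"query": "elasticsearch", "fields": ["title", "summary"]}}
                ],
                # Non-scoring part: the old bool filter moves under bool.filter unchanged
                "filter": {
                    "bool": {
                        "must": {"range": {"num_reviews": {"gte": 20}}},
                        "must_not": {"range": {"publish_date": {"lte": "2014-12-31"}}},
                        "should": {"term": {"publisher": "oreilly"}},
                    }
                },
            }
        },
        "_source": ["title", "summary", "publisher", "num_reviews", "publish_date"],
    }


body = json.dumps(multiple_filters_query(), indent=2)
print(body)
```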

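17. Function Score: Field Value Factor

There may be cases where you want to factor the value of a particular field in the document into the relevance score. A typical scenario is boosting a document's relevance based on its popularity.

In our example, we want to boost more popular books (as judged by the number of reviews). This can be done with the field_value_factor function score.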

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.5694137,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.4725765,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.14181662,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.13297246,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      }
    ]
  }

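Note 1: we could run a regular multi_match query and sort by the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: there are many additional parameters that control how strongly the original relevance score is boosted (such as modifier, factor, boost_mode, etc.).
See the Elasticsearch guide for details.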
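The field_value_factor computation can be written out directly. A minimal Python sketch, assuming the default boost_mode of multiply and leaving out the missing parameter: with factor 2 and modifier log1p, a book with 20 reviews gets a multiplier of log10(1 + 2 × 20) = log10(41) ≈ 1.61. The point of log1p is that the boost grows much more slowly than the raw review count. The helper names are hypothetical.

```python
import math

# Modifiers as listed in the function_score documentation. Note that ES's "log1p"
# is log10(1 + x), unlike Python's math.log1p, which is ln(1 + x).
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(1 + x),
    "log2p": lambda x: math.log10(2 + x),
    "ln": math.log,
    "ln1p": math.log1p,
    "ln2p": lambda x: math.log(2 + x),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(value, factor=1.0, modifier="none"):
    """modifier(factor * value): the multiplier applied to the query score."""
    return MODIFIERS[modifier](factor * value)


def final_score(query_score, value, factor=1.0, modifier="none"):
    """boost_mode "multiply" (the default): new _score = query _score * function value."""
    return query_score * field_value_factor(value, factor, modifier)


for reviews in (12, 18, 20, 23):
    print(reviews, round(field_value_factor(reviews, factor=2, modifier="log1p"), 4))
```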

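18. Function Score: Decay Functions

Suppose that, instead of boosting the score incrementally by the value of a field, we have an ideal target value and want the boost to decay the further a document's value lies from it. This is typical for ranges such as prices, other numeric fields, or dates. In our example, we are searching for books on "search engines" published around June 2014.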

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "scale": "30d",
              "offset": "7d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
  "hits": {
    "total": 4,
    "max_score": 0.22793062,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.22793062,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.0049215667,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.000009612435,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.0000049185574,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }
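The exp decay function computes exp(λ · max(0, |value - origin| - offset)) with λ = ln(decay) / scale, so a document exactly offset + scale away from the origin gets the decay value (0.5 by default). Because the query uses "boost_mode": "replace", each _score above is just this decay value. A Python sketch that measures distances in days (equivalent here because scale and offset are given in days; the helper names are hypothetical):

```python
import math
from datetime import date


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """exp decay: exp(lam * max(0, |value - origin| - offset)), lam = ln(decay) / scale."""
    distance = max(0.0, abs(value - origin) - offset)
    lam = math.log(decay) / scale
    return math.exp(lam * distance)


def date_decay(publish_date, origin=date(2014, 6, 15), scale_days=30, offset_days=7):
    """Apply exp_decay to a date, using the origin/scale/offset of the query above, in days."""
    distance_days = (publish_date - origin).days
    return exp_decay(distance_days, 0, scale_days, offset_days)


for d in (date(2014, 4, 5), date(2015, 2, 7), date(2013, 1, 24), date(2015, 12, 3)):
    print(d, date_decay(d))
```

Run on the four publication dates, it reproduces the _score values above; for example, Solr in Action, published 71 days before the origin, gets 0.5^(64/30) ≈ 0.2279.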

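19. Function Score: Script Scoring

When the built-in scoring functions do not meet your needs, you can specify a Groovy script to use for scoring.

In our example, we want a script that takes publish_date into account before deciding how much weight to give to the number of reviews. Newer books may not have as many reviews yet, so they should not be "penalized" for that.

The scoring script looks like this: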

publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value

if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score
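To check the logic outside Elasticsearch, here is the same scoring rule as a Python function: a hypothetical port for illustration, not something Elasticsearch runs. Like Java's Math.log, Python's math.log is the natural logarithm. In the query, the resulting value is then combined with the query score using the default boost_mode, multiply.

```python
import math
from datetime import date

THRESHOLD = date(2015, 7, 30)  # the "threshold" param passed to the script


def script_score(publish_date, num_reviews, threshold=THRESHOLD):
    """Books newer than the threshold get a 2.5 offset instead of 1 before taking the log."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


for title, published, reviews in [
    ("Elasticsearch: The Definitive Guide", date(2015, 2, 7), 20),
    ("Elasticsearch in Action", date(2015, 12, 3), 18),
]:
    print(title, round(script_score(published, reviews), 3))
```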

To use the scoring script dynamically, we use the script_score parameter:

GET /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "threshold": "2015-07-30"
              },  
              "lang": "groovy", 
              "source": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
            }
          }
        }
      ]
    }
  },
  "_source": ["title","summary","publish_date", "num_reviews"]
}

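Note 1: to use dynamic scripts, they must be enabled for the Elasticsearch instance in config/elasticsearch.yml. You can also use scripts that are already stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: a JSON string cannot contain literal line breaks, so semicolons are used to separate the statements.
Original author: Tim Ojo, Aug. 05, 16 · Big Data Zone
Original article: dzone.com/articles/23…

Note: on ES 6.3 I could not enable the Groovy script; the following configuration did not work:
script.allowed_types: inline
script.allowed_contexts: search, update
This is expected: Groovy scripting was deprecated in Elasticsearch 5.0 and removed in 6.0, so on 6.x the script has to be rewritten in Painless, the default scripting language.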

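Java API Implementation

The Java API implementations of the queries above are available at github.com/whirlys/ela…

References:
銘毅天下: [Translation] The 23 most useful Elasticsearch search techniques you must know
English original: 23 Useful Elasticsearch Example Queries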


For more content, visit my personal blog: laijianfeng.org
