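This post collects a set of commonly used CRUD-style query statements. The original article was published around 2017 and was translated from English. I have rerun the examples on 7.x and will point out deprecated features and parameters along the way, so let's learn together.
To demonstrate the different types of Elasticsearch queries, we will search a collection of book documents with the following fields: title, authors, summary, publish_date, num_reviews (number of reviews), and publisher.
Before that, we first need to create a new index and bulk-load some documents: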
Create the index:
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 } }
Bulk-upload the documents:
Note: 7.x has deprecated mapping types, so the request has to change accordingly. Replace POST /bookdb_index/book/_bulk with POST /bookdb_index/_bulk and then run it.
POST /bookdb_index/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
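The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a final trailing newline. If you generate the body from code, a minimal sketch could look like the following. The helper name build_bulk_body and the shortened sample documents are my own illustration, not part of the original article.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON bulk body from (doc_id, source) pairs.

    Hypothetical helper for illustration: each document becomes an
    "index" action line followed by its source on the next line.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Shortened sample documents (only two fields each, to keep the sketch small).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]

body = build_bulk_body(books)
print(body)
```

The resulting string is what would be sent as the request body of POST /bookdb_index/_bulk.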
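There are two ways to run a basic full-text (match) query: the URI search, which reads all of the query parameters from the URL, and the request-body search, which uses the full query DSL. Below is a basic match query that finds the records containing "guide" in any field: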
GET /bookdb_index/_search?q=guide
[Results]
"hits": [
  { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.28168046, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.24144039, "_source": { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } }
]
Here is the full request-body version of the same query, as entered in Kibana's Search Profiler. It produces the same results:
{ "query": { "multi_match" : { "query" : "guide", "fields" : [ "*" ] } } }
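multi_match is shorthand for running the same match query against multiple fields. The fields property specifies which fields to query, and * stands for all fields. You can also query specific fields only by listing them in the array, separated by commas.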
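In this example we want to match against all fields of the document. Both APIs also let you specify which fields to search. For example, to find books whose title contains "in Action":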
GET /bookdb_index/_search?q=title:in action [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.6259885, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.5975345, "_source": { "title": "Elasticsearch in Action", "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ], "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" } } ]
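However, the full DSL gives you the flexibility to build more complex queries and to control what is returned (both are covered one by one below). In the following example, size limits the number of results, from sets the starting offset, _source selects the fields to return, and highlight marks the matched terms: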
POST /bookdb_index/_search { "query": { "match" : { "title" : "in action" } }, "size": 2, "from": 0, "_source": [ "title", "summary", "publish_date" ], "highlight": { "fields" : { "title" : {} } } } [Results] "hits": { "total": 2, "max_score": 0.9105287, "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.9105287, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" }, "highlight": { "title": [ "Elasticsearch <em>in</em> <em>Action</em>" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.9105287, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" }, "highlight": { "title": [ "Solr <em>in</em> <em>Action</em>" ] } } ] }
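Note: for multi-word queries, match lets you use the and operator in place of the default or operator. You can also set the minimum_should_match option to tune the relevance of the returned results. See the examples further on for details.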
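To make the two options in the note above concrete, here is a small sketch that builds match request bodies with either operator or minimum_should_match set. The function name match_query is hypothetical; operator and minimum_should_match are the options named in the note.

```python
import json


def match_query(field, text, operator=None, minimum_should_match=None):
    """Build a match request body (hypothetical helper for illustration)."""
    options = {"query": text}
    if operator is not None:
        # "and" requires every term to match; the default is "or".
        options["operator"] = operator
    if minimum_should_match is not None:
        # Number (or percentage) of terms that must match.
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Every word of "in action" must appear in the title.
strict = match_query("title", "in action", operator="and")

# At least two of the three words must appear in the summary.
loose = match_query("summary", "scalable search engine", minimum_should_match=2)

print(json.dumps(strict, indent=2))
print(json.dumps(loose, indent=2))
```

Either body can be sent with POST /bookdb_index/_search. The stricter the setting, the fewer but more relevant the hits tend to be.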
As we have already seen, to search across more than one field (e.g. for the same query string in both title and summary), you can use the multi_match query:
POST /bookdb_index/_search
{ "query": { "multi_match" : { "query" : "elasticsearch guide", "fields": ["title", "summary"] } } }
[Results]
"hits": { "total": 3, "max_score": 0.9448582, "hits": [
  { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.9448582, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.17312013, "_source": { "title": "Elasticsearch in Action", "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ], "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } }
] }
Note: the third hit matches because guide is found in its summary field.
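Because we are querying several fields, we may want to boost the score of one of them. In the example below we boost the summary field by a factor of 3 to raise its importance, which in turn raises the relevance of document 4.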
POST /bookdb_index/_search { "query": { "multi_match" : { "query" : "elasticsearch guide", "fields": ["title", "summary^3"] } }, "_source": ["title", "summary", "publish_date"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.31495273, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.13094766, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
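Note: boosting does not simply multiply the computed score by the boost factor. The boost that is actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for details.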
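To get more relevant or more specific results, the AND/OR/NOT operators can be used to refine a query. This is implemented as a bool query, which accepts the following parameters:
must is equivalent to AND
must_not is equivalent to NOT
should is equivalent to OR
Each of these keywords can appear only once within a single bool clause. To combine several conditions under one keyword, pass them as an array.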
For example, suppose I want books whose title contains Elasticsearch OR Solr, AND whose author is Clinton Gormley but NOT Radu Gheorge:
POST /bookdb_index/_search
{
  "query": {
    "bool": {
      "must": { "match": { "authors": "clinton gormley" } },
      "must_not": { "match": { "authors": "radu gheorge" } },
      "should": [
        { "match": { "title": "Elasticsearch" } },
        { "match": { "title": "Solr" } }
      ],
      "minimum_should_match": 1
    }
  }
}
[Results]
"hits": [
  { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.3672021, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" } }
]
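Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries. Keep in mind that when a bool query also has a must or filter clause, its should clauses are optional by default and only influence the score. Set minimum_should_match to 1 if at least one of them has to match.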
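To see how must, must_not, and should interact, here is a rough in-memory sketch of the matching logic (no scoring) run over the four sample books. The functions analyze, match, and bool_query are my own simplifications: the tokenizer only lowercases and splits on non-alphanumeric characters, which merely approximates what the default analyzer does.

```python
import re

BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    3: {"title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]},
}


def analyze(value):
    """Lowercase and split into a set of tokens (simplified analyzer)."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match(doc, field, text):
    """True if any query token occurs in the field (match with OR)."""
    return bool(analyze(doc[field]) & analyze(text))


def bool_query(docs, must=(), must_not=(), should=(), minimum_should_match=0):
    """Return the ids of documents that satisfy the bool clauses."""
    hits = []
    for doc_id, doc in docs.items():
        if not all(match(doc, f, t) for f, t in must):
            continue  # every must clause is required (AND)
        if any(match(doc, f, t) for f, t in must_not):
            continue  # any must_not match excludes the document (NOT)
        matched_should = sum(match(doc, f, t) for f, t in should)
        if matched_should < minimum_should_match:
            continue  # not enough should clauses matched (OR)
        hits.append(doc_id)
    return hits


hits = bool_query(
    BOOKS,
    must=[("authors", "clinton gormley")],
    must_not=[("authors", "radu gheorge")],
    should=[("title", "Elasticsearch"), ("title", "Solr")],
    minimum_should_match=1,
)
print(hits)
```

Only the book with id 1 passes all three clause types, which agrees with the single hit in the results above.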
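When running match and multi_match queries, you can enable fuzzy matching to catch spelling mistakes. The degree of fuzziness is specified as an edit distance from the original word.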
POST /bookdb_index/_search { "query": { "multi_match" : { "query" : "comprihensiv guide", "fields": ["title", "summary"], "fuzziness": "AUTO" } }, "_source": ["title", "summary", "publish_date"], "size": 1 } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.5961596, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } } ]
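Note: when a term is longer than 5 characters, a fuzziness of AUTO is equivalent to specifying the value 2. However, 80% of misspellings have an edit distance of 1, so setting the fuzziness to 1 may improve your overall search performance. For more details, see "Typos and Misspellings" in the Elasticsearch guide.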
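Edit distance is easy to compute by hand for short words, but a few lines of code make the idea clearer. The sketch below uses plain Levenshtein distance (insertions, deletions, substitutions). This is a simplification on my part: Elasticsearch also counts a transposition of two adjacent characters as a single edit. The auto_fuzziness function mirrors the AUTO length thresholds as I understand them (0 to 2 characters: exact match, 3 to 5: one edit, longer: two edits).

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under AUTO (assumed thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


distance = edit_distance("comprihensiv", "comprehensive")
allowed = auto_fuzziness("comprihensiv")
print(distance, allowed, distance <= allowed)
```

The misspelled query term needs one substitution and one insertion to become "comprehensive", so it falls within the two edits that AUTO allows for a term of that length.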
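A wildcard query lets you specify a pattern to match instead of a whole term. ? matches any single character and * matches zero or more characters. For example, to find all records with an author whose name begins with the letter 't':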
POST /bookdb_index/_search { "query": { "wildcard" : { "authors" : "t*" } }, "_source": ["title", "authors"], "highlight": { "fields" : { "authors" : {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ] }, "highlight": { "authors": [ "zachary <em>tong</em>" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 1, "_source": { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight": { "authors": [ "<em>thomas</em> morton" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ] }, "highlight": { "authors": [ "<em>trey</em> grainger", "<em>timothy</em> potter" ] } } ]
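The pattern is applied to the individual terms of the field, not to the whole author string, which is why single names are highlighted above. As a rough illustration, Python's fnmatch module uses the same ? and * semantics. Note that fnmatch additionally understands [seq] character classes, which I am not assuming the wildcard query supports. The term list below is typed in by hand from the sample documents.

```python
from fnmatch import fnmatchcase

# Author terms of documents 1, 2, and 4, lowercased one word per term.
AUTHOR_TERMS = [
    "clinton", "gormley", "zachary", "tong",
    "grant", "ingersoll", "thomas", "morton", "drew", "farris",
    "trey", "grainger", "timothy", "potter",
]


def wildcard(pattern, terms):
    """Return the terms that match a ?/* pattern in full."""
    return [term for term in terms if fnmatchcase(term, pattern)]


starts_with_t = wildcard("t*", AUTHOR_TERMS)
four_letters = wildcard("t?ey", AUTHOR_TERMS)
print(starts_with_t)
print(four_letters)
```

The first pattern picks out tong, thomas, trey, and timothy, which are the same terms highlighted in the results above.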
A regexp query lets you use more complex patterns than a wildcard query:
POST /bookdb_index/_search { "query": { "regexp" : { "authors" : "t[a-z]*y" } }, "_source": ["title", "authors"], "highlight": { "fields" : { "authors" : {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ] }, "highlight": { "authors": [ "<em>trey</em> grainger", "<em>timothy</em> potter" ] } } ]
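Like the wildcard query, the regular expression has to match an entire term. A quick way to try a pattern is Python's re.fullmatch, sketched below. Python's regex dialect is richer than the one the regexp query accepts, so treat this only as a check for simple patterns such as the one above. The term list is again typed in by hand.

```python
import re

AUTHOR_TERMS = [
    "clinton", "gormley", "zachary", "tong",
    "thomas", "morton", "trey", "grainger", "timothy", "potter",
]


def regexp(pattern, terms):
    """Return the terms that the pattern matches from start to end."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


matches = regexp("t[a-z]*y", AUTHOR_TERMS)
print(matches)
```

Only trey and timothy both start with t and end with y, so only document 4 is returned by the query above.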
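A match_phrase query requires that all the terms in the query string be present in the document, in the same order as in the query string, and next to each other. By default the terms must be exactly adjacent, but you can set the slop value to specify how far apart the terms may be while the document is still considered a match.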
POST /bookdb_index/_search { "query": { "multi_match" : { "query": "search engine", "fields": ["title", "summary"], "type": "phrase", "slop": 3 } }, "_source": [ "title", "summary", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.22327082, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.16113183, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
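Note: in the example above, for a non-phrase query the document with _id 1 would normally score higher and rank ahead of the document with _id 4, because its field is shorter. For a phrase query, however, the proximity of the query terms is taken into account, so the document with _id 4 scores higher.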
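A match_phrase_prefix query provides search-as-you-type matching, a basic form of query-time autocomplete, without preparing your data in any way. Like the match_phrase query, it accepts a slop parameter (to relax the word order and relative positions) and a max_expansions parameter (to limit the number of terms the prefix expands to, which reduces resource usage).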
POST /bookdb_index/_search { "query": { "match_phrase_prefix" : { "summary": { "query": "search en", "slop": 3, "max_expansions": 10 } } }, "_source": [ "title", "summary", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.5161346, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.37248808, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
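Note: query-time search-as-you-type carries a significant performance cost. A better solution is index-time search-as-you-type. For more information, see the Completion Suggester API or the use of edge n-gram filters.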
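The idea behind the index-time approach is that every prefix of a word is stored as its own term, so a partially typed word becomes an ordinary exact lookup. The sketch below shows which terms an edge n-gram filter would produce for one token. The function edge_ngrams and its parameter names are my own, chosen to resemble the filter's min_gram and max_gram settings.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    longest = min(max_gram, len(token))
    return [token[:length] for length in range(min_gram, longest + 1)]


grams = edge_ngrams("engine", min_gram=2, max_gram=5)
print(grams)

# A partially typed word is then a plain membership test.
print("en" in grams)
```

The trade-off is a larger index in exchange for cheaper queries.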
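A query_string query provides a way to run multi_match, bool, boosting, fuzzy, wildcard, regexp, and range queries in a concise shorthand syntax. In the following example we run a fuzzy search for the terms "search algorithm" in books where one of the authors is "grant ingersoll" or "tom morton", searching all fields but boosting summary by a factor of 2.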
POST /bookdb_index/_search { "query": { "query_string": { "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)", "fields": [ "*", "summary^2" ] } }, "_source": [ "title", "summary", "authors" ], "highlight": { "fields": { "summary": {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.14558059, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight": { "summary": [ "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging, information extraction, and summarization" ] } } ]
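A simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users: it replaces AND/OR/NOT with +, |, and - respectively, and it silently discards the invalid parts of a query instead of throwing an exception when the user makes a mistake.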
POST /bookdb_index/_search { "query": { "simple_query_string" : { "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)", "fields": ["*", "summary^2"] } }, "_source": [ "title", "summary", "authors" ], "highlight": { "fields" : { "summary" : {} } } } [Results] "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 3.5710216, "hits" : [ { "_index" : "bookdb_index", "_type" : "book", "_id" : "2", "_score" : 3.5710216, "_source" : { "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "title" : "Taming Text: How to Find, Organize, and Manipulate It", "authors" : [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight" : { "summary" : [ "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging" ] } } ] }
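All of the examples above are full-text searches. Sometimes we are more interested in structured search, where we want exact matches to be returned. The term and terms queries help here. In the example below we find all books in the index published by Manning.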
POST /bookdb_index/_search { "query": { "term" : { "publisher": "manning" } }, "_source" : ["title","publish_date","publisher"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Solr in Action", "publish_date": "2014-04-05" } } ]
To specify several terms, use the terms keyword and pass the search terms in as an array.
{ "query": { "terms" : { "publisher": ["oreilly", "packt"] } } }
The results of a term query (like those of any other query) can easily be sorted, and multi-level sorting is allowed as well:
POST /bookdb_index/_search { "query": { "term": { "publisher": "manning" } }, "_source": [ "publish_date", "publisher" ], "sort": [ { "publish_date": { "order": "desc" } } ] } [Results] "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "bookdb_index", "_type" : "book", "_id" : "3", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2015-12-03" }, "sort" : [ 1449100800000 ] }, { "_index" : "bookdb_index", "_type" : "book", "_id" : "4", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2014-04-05" }, "sort" : [ 1396656000000 ] }, { "_index" : "bookdb_index", "_type" : "book", "_id" : "2", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2013-01-24" }, "sort" : [ 1358985600000 ] } ] }
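The request above sorts on a single level only. Multi-level sorting means that ties on the first sort key are broken by the second key, and so on; in a request this corresponds to listing more than one entry in the sort array. The in-memory sketch below shows the resulting order for a hypothetical sort by publisher ascending, then publish_date descending. The rows are copied from the sample data, and the function name multi_sort is my own.

```python
ROWS = [
    {"id": 1, "publisher": "oreilly", "publish_date": "2015-02-07"},
    {"id": 2, "publisher": "manning", "publish_date": "2013-01-24"},
    {"id": 3, "publisher": "manning", "publish_date": "2015-12-03"},
    {"id": 4, "publisher": "manning", "publish_date": "2014-04-05"},
]


def multi_sort(rows):
    """Sort by publisher ascending, then publish_date descending.

    Python's sort is stable, so we sort by the secondary key first
    and by the primary key second.
    """
    by_date_desc = sorted(rows, key=lambda row: row["publish_date"], reverse=True)
    return sorted(by_date_desc, key=lambda row: row["publisher"])


order = [row["id"] for row in multi_sort(ROWS)]
print(order)
```

ISO-formatted dates sort correctly as plain strings, which keeps the sketch short.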
Another example of a structured query is the range query. In this example we look for books published in 2015.
POST /bookdb_index/_search { "query": { "range": { "publish_date": { "gte": "2015-01-01", "lte": "2015-12-31" } } }, "_source": [ "title", "publish_date", "publisher" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1, "_source": { "publisher": "oreilly", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1, "_source": { "publisher": "manning", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
Note: range queries work on date, number, and string fields.
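If the year comes from user input, the two bounds can be derived rather than typed by hand. A minimal sketch, with a hypothetical helper name year_range_query, that produces the same request body as the example above:

```python
import json
from datetime import date


def year_range_query(field, year):
    """Build a range query body covering one calendar year (inclusive)."""
    return {
        "query": {
            "range": {
                field: {
                    "gte": date(year, 1, 1).isoformat(),    # first day of the year
                    "lte": date(year, 12, 31).isoformat(),  # last day of the year
                }
            }
        }
    }


body = year_range_query("publish_date", 2015)
print(json.dumps(body, indent=2))
```

gte and lte are inclusive bounds, so both the first and the last day of the year are part of the range.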
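A filtered query lets you filter the results of a query. In our example we search for books with Elasticsearch in the title or summary, but keep only those with 20 or more reviews.
Current versions no longer support the filtered query. The keyword has been removed, so the request below is kept only for reference.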
POST /bookdb_index/_search { "query": { "filtered": { "query" : { "multi_match": { "query": "elasticsearch", "fields": ["title","summary"] } }, "filter": { "range" : { "num_reviews": { "gte": 20 } } } } }, "_source" : ["title","summary","publisher", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761, "_source": { "summary": "A distibuted real-time search and analytics engine", "publisher": "oreilly", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide" } } ]
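Note: a filtered query does not require a query to filter on. If no query is specified, match_all is run, which basically returns every document in the index. In practice the filter runs first, to reduce the surface area that has to be queried, and filter results can be cached after use, which greatly improves performance.
Update: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. Use a bool query instead. Here is the example above rewritten with a bool query: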
POST /bookdb_index/_search { "query": { "bool": { "must" : { "multi_match": { "query": "elasticsearch", "fields": ["title","summary"] } }, "filter": { "range" : { "num_reviews": { "gte": 20 } } } } }, "_source" : ["title","summary","publisher", "num_reviews"] }
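The examples that follow use it with multiple filters.
Multiple filters can be combined through the bool query. In the next example, the filters restrict the hits to books published by O'Reilly with a publish date on or after 2014-12-31, that is, not before 2015.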
POST /bookdb_index/_search { "query": { "bool": { "must": [ { "match": { "title": "Elasticsearch" } } ], "filter": [ { "term": { "publisher": "oreilly" } }, { "range": { "publish_date": { "gte": "2014-12-31" } } } ] } }, "_source": [ "title", "publisher", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761, "_source": { "summary": "A distibuted real-time search and analytics engine", "publisher": "oreilly", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
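In some cases you may want a particular field of the document to factor into the relevance score. A typical scenario is boosting a document's relevance according to its popularity. In our example we want to boost the most popular books (judged by their number of reviews), which can be done with field_value_factor: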
POST /bookdb_index/_search { "query": { "function_score": { "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } }, "field_value_factor": { "field" : "num_reviews", "modifier": "log1p", "factor" : 2 } } }, "_source": ["title", "summary", "publish_date", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.44831306, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.3718407, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.046479136, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.041432835, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } } ]
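Note 1: we could simply have run a regular multi_match query and sorted on the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: a number of additional parameters, such as modifier, factor, and boost_mode, adjust how strongly the boost affects the original relevance score. The details can be explored in the Elasticsearch guide.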
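To get a feel for the numbers, the sketch below computes the value that field_value_factor contributes for each sample book. It assumes, as I understand the function, that factor is multiplied into the field value first and that the log1p modifier then takes the base-10 logarithm of that product plus 1. With the default boost_mode this value is multiplied with the query score. Only two modifiers are modelled here.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Compute the function value for one document (simplified model)."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        # Common (base-10) logarithm of 1 + the scaled value.
        return math.log10(scaled + 1)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")


reviews = {1: 20, 2: 12, 3: 18, 4: 23}
boosts = {
    doc_id: field_value_factor(count, factor=2, modifier="log1p")
    for doc_id, count in reviews.items()
}
for doc_id, boost in sorted(boosts.items()):
    print(doc_id, round(boost, 4))
```

The logarithm flattens the differences: 23 reviews versus 12 changes the multiplier only modestly, so text relevance still dominates the ranking.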
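Suppose that, instead of boosting incrementally by the value of a field, you have an ideal target value and want the boost to decay the further a document lies from it. This is typically useful for boosts based on numeric fields such as latitude/longitude, price, or date. In the following example we search for books on "search engines" that were ideally published around June 2014.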
POST /bookdb_index/_search { "query": { "function_score": { "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } }, "functions": [ { "exp": { "publish_date" : { "origin": "2014-06-15", "offset": "7d", "scale" : "30d" } } } ], "boost_mode" : "replace" } }, "_source": ["title", "summary", "publish_date", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.27420625, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.005920768, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.000011564, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.0000059171475, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
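The shape of the exponential decay is easier to see when computed directly. In the sketch below, the score stays at 1 within offset of the origin and then falls off so that it reaches decay at a distance of offset plus scale. I am assuming the default decay of 0.5, and the function name exp_decay is my own. The absolute scores in the results above come from an older version and are not reproduced here.

```python
import math
from datetime import date


def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    """Exponential decay of a date field around an origin date."""
    distance = abs((value - origin).days)
    # Documents within the offset are not penalized at all.
    effective = max(0, distance - offset_days)
    rate = math.log(decay) / scale_days
    return math.exp(rate * effective)


origin = date(2014, 6, 15)
publish_dates = {
    1: date(2015, 2, 7),
    2: date(2013, 1, 24),
    3: date(2015, 12, 3),
    4: date(2014, 4, 5),
}
scores = {
    doc_id: exp_decay(published, origin, offset_days=7, scale_days=30)
    for doc_id, published in publish_dates.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The ranking produced by the sketch puts the book published closest to the origin first, in the same order as the results above.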
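When the built-in scoring functions do not meet your needs, you can write a script instead. The original article used Groovy, which was removed in Elasticsearch 6.0; on 7.x scripts are written in Painless. In our example we want a script that takes publish_date into account before deciding how much weight to give num_reviews, because very new books may not have any reviews yet and their score should not be penalized for that.
The scoring script, as written in Groovy in the original article, looks like this: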
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score
The scoring script is invoked dynamically through the script_score parameter:
The request below is a Painless port of the scoring logic for 7.x. Its threshold_millis parameter is 2015-07-30T00:00:00Z expressed in epoch milliseconds. The scores in the results are the ones reported in the original article.
POST /bookdb_index/_search
{
  "query": {
    "function_score": {
      "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "params" : { "threshold_millis": 1438214400000 },
              "source": "long published = doc['publish_date'].value.toInstant().toEpochMilli(); long reviews = doc['num_reviews'].value; if (published > params.threshold_millis) { return Math.log(2.5 + reviews); } return Math.log(1 + reviews);"
            }
          }
        }
      ]
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": { "total": 4, "max_score": 0.8463001, "hits": [
  { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.8463001, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.7067348, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.08952084, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } },
  { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.07602123, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } }
] }
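Note 1: to use dynamic scripting with the Elasticsearch version of the original article, it had to be enabled in the config/elasticsearch.yml file. Scripts stored on the Elasticsearch server can be used as well. See the Elasticsearch guide for more information.
Note 2: because JSON cannot contain embedded newline characters, use semicolons to separate the statements.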
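To check the scoring rule itself without a cluster, the same branch logic can be mirrored in a few lines of Python. This is only a model of the script (the function name custom_score is my own), and it returns the function value, which by default is multiplied with the query score.

```python
import math
from datetime import date


def custom_score(publish_date, num_reviews, threshold):
    """Mirror of the scoring script: newer books get a head start."""
    if publish_date > threshold:
        # Books published after the threshold are treated as if they
        # already had 1.5 more reviews than an older book.
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


threshold = date(2015, 7, 30)
newer = custom_score(date(2015, 12, 3), 18, threshold)  # Elasticsearch in Action
older = custom_score(date(2015, 2, 7), 20, threshold)   # The Definitive Guide
no_reviews_yet = custom_score(date(2015, 12, 3), 0, threshold)
print(round(newer, 4), round(older, 4), round(no_reviews_yet, 4))
```

A brand-new book without reviews gets a positive value of log(2.5) instead of log(1), which is zero, so it is not pushed to the bottom just for being new.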