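This post summarizes some Elasticsearch (ES) pitfalls I ran into in earlier work. I am writing them down here for future reference.
To illustrate the different types of ES queries, we will search a collection of documents with the following fields:
1. title; 2. authors; 3. summary; 4. release date (publish_date); 5. number of reviews (num_reviews).
First, let's create a new index and load the documents in one batch with the bulk API.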
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}

POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
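The bulk body is newline-delimited JSON: one action line, then one document line, with a trailing newline at the end. If you generate the payload from code instead of typing it into a console, a minimal Python sketch could look like the following. The helper name build_bulk_body and the shortened sample documents are assumptions made for illustration; nothing is sent to a cluster here.

```python
import json


def build_bulk_body(docs):
    """Serialize (doc_id, source) pairs into the newline-delimited bulk format."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells ES to index the next line under this _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Document line: the source itself, kept on a single line.
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened sample documents (hypothetical subset of the fields used in this post).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]

body = build_bulk_body(books)
print(body)
```

The resulting string can be used as the request body of POST /bookdb_index/book/_bulk with whatever HTTP client you prefer.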
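There are two ways to run a full-text search:
1) Use the search API with the parameters passed as part of the URL.
Example: the following runs a full-text search for "guide".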
GET /bookdb_index/book/_search?q=guide
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 0.28168046,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 0.24144039,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  }
]
2) Use the full ES DSL, with a JSON document as the request body.
It returns the same results as approach 1).
POST /bookdb_index/book/_search
{ "query": { "multi_match" : { "query" : "guide", "fields" : ["_all"] } } }
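Explanation: the multi_match keyword is used in place of match as a convenient shorthand for running the same query against several fields. The fields property lists the fields to query; in this case we want to query all fields of the document.
Both APIs also let you specify which fields to search. For example, to search for books with "in action" in the title field:
1) URL search
As shown here: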
GET /bookdb_index/book/_search?q=title:in action
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 0.6259885,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "3",
    "_score": 0.5975345,
    "_source": {
      "title": "Elasticsearch in Action",
      "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "publish_date": "2015-12-03",
      "num_reviews": 18,
      "publisher": "manning"
    }
  }
]
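The q parameter above contains a space, which a console will usually handle for you but a plain HTTP client will not. The sketch below shows one way to percent-encode it in Python before sending the request. The function name uri_search_path is hypothetical, and keeping ':' unescaped is only a readability choice for this example.

```python
from urllib.parse import quote


def uri_search_path(index, doc_type, query):
    """Build the path and query string for a URI search."""
    # Percent-encode the query text; ':' stays readable because it
    # separates the field name from the search text.
    return f"/{index}/{doc_type}/_search?q={quote(query, safe=':')}"


path = uri_search_path("bookdb_index", "book", "title:in action")
print(path)
```

Append the returned path to your cluster's base URL to get the full request URL.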
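2) DSL search
The full-body DSL, however, gives you more flexibility to build complex queries (as we will see later) and to control how results are returned. In the example below we specify the number of results to return, the offset (useful for pagination), the document fields to return, and highlighting of the matched terms.
Number of results: size;
Offset: from;
Fields to return: _source;
Highlighting: highlight.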
POST /bookdb_index/book/_search
{
  "query": { "match" : { "title" : "in action" } },
  "size": 2,
  "from": 0,
  "_source": [ "title", "summary", "publish_date" ],
  "highlight": { "fields" : { "title" : {} } }
}
[Results]
"hits": {
  "total": 2,
  "max_score": 0.9105287,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 0.9105287,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "highlight": { "title": [ "Elasticsearch <em>in</em> <em>Action</em>" ] }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 0.9105287,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "highlight": { "title": [ "Solr <em>in</em> <em>Action</em>" ] }
    }
  ]
}
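Since from is an offset counted in documents, not in pages, it is easy to get pagination off by one. Here is a small Python sketch that derives from and size from a 1-based page number and builds the same request body as above. The function name paged_title_search and its parameters are made up for this example.

```python
def paged_title_search(text, page, page_size):
    """Build a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match": {"title": text}},
        "size": page_size,                 # number of hits per page
        "from": (page - 1) * page_size,    # documents to skip before this page
        "_source": ["title", "summary", "publish_date"],
        "highlight": {"fields": {"title": {}}},
    }


first_page = paged_title_search("in action", page=1, page_size=2)
third_page = paged_title_search("in action", page=3, page_size=2)
print(first_page["from"], third_page["from"])
```

Page 1 with a page size of 2 reproduces the from and size values used in the request above.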
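Note: for multi-word queries, the match query lets you specify the 'and' operator
instead of the default 'or' operator.
You can also set the minimum_should_match option to tune the relevance of the returned results.
Details can be found in the Elasticsearch guide.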
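To set these options, the match query takes its long form, where the field maps to an object with query, operator and minimum_should_match keys. The Python sketch below only builds such a request body as a dict. The helper name match_query is an assumption, and the "75%" value is an arbitrary illustration.

```python
def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body in its long form."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        # Only included when requested, e.g. 2 or "75%".
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Every term must match: documents need both "in" and "action" in the title.
strict = match_query("title", "in action", operator="and")

# Default 'or', but at least 75% of the terms must match.
relaxed = match_query("summary", "scalable search engine guide",
                      minimum_should_match="75%")
print(strict)
print(relaxed)
```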
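As we have already seen, to query more than one document field in a search (for example, the same query string in both the title and the summary), use the multi_match query.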
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match" : {
      "query" : "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}
[Results]
"hits": {
  "total": 3,
  "max_score": 0.9448582,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "1",
      "_score": 0.9448582,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07",
        "num_reviews": 20,
        "publisher": "oreilly"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 0.17312013,
      "_source": {
        "title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "publish_date": "2015-12-03",
        "num_reviews": 18,
        "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 0.14965448,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05",
        "num_reviews": 23,
        "publisher": "manning"
      }
    }
  ]
}
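Note: the third hit above (document 4) matches because the word "guide" appears in its summary.
Since we are searching across multiple fields, we may want to boost the score of one of them. In the example below we boost the summary field by a factor of 3 to increase its importance, which raises the relevance of document 4.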
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match" : {
      "query" : "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 0.31495273,
    "_source": {
      "summary": "A distibuted real-time search and analytics engine",
      "title": "Elasticsearch: The Definitive Guide",
      "publish_date": "2015-02-07"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 0.14965448,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "3",
    "_score": 0.13094766,
    "_source": {
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "title": "Elasticsearch in Action",
      "publish_date": "2015-12-03"
    }
  }
]
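Note: boosting does not simply multiply the computed score by the boost factor. The boost that is actually applied goes through normalization and some internal optimizations. See the Elasticsearch guide for more.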
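The field^boost notation is plain string syntax inside the fields list, so it is simple to generate when the boosts come from configuration. A small Python sketch follows; the helper names boosted_fields and multi_match_query are hypothetical, and the code only builds the request body.

```python
def boosted_fields(boosts):
    """Turn {"field": boost} into multi_match field strings such as "summary^3"."""
    fields = []
    for name, boost in boosts.items():
        # A boost of 1 is the default, so the suffix can be left out.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def multi_match_query(text, boosts):
    """Build a multi_match request body with per-field boosts."""
    return {"query": {"multi_match": {"query": text,
                                      "fields": boosted_fields(boosts)}}}


body = multi_match_query("elasticsearch guide", {"title": 1, "summary": 3})
print(body)
```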
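The AND / OR / NOT operators can be used to fine-tune our search queries and return more relevant or more specific results.
In the search API this is done with the bool query.
The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT) and a should parameter (equivalent to OR).
For example, if I want to search for a book with "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley", but NOT authored by "radu gheorge":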
POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool" : {
            "should": [
              { "match": { "title": "Elasticsearch" }},
              { "match": { "title": "Solr" }}
            ]
          }
        },
        { "match": { "authors": "clinton gormley" }}
      ],
      "must_not": { "match": {"authors": "radu gheorge" }}
    }
  }
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 0.3672021,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  }
]
Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
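When such a query is assembled in code, note that a JSON object can hold only one must key, so several AND conditions have to go into a list under that single key. The Python sketch below composes the nested query that way. The helpers bool_query and match are hypothetical names, and the code only builds the request body.

```python
def match(field, text):
    """Short form of a match query clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Build a bool clause; each argument is a sequence of query clauses."""
    clauses = {}
    if must:
        clauses["must"] = list(must)          # all of these have to match (AND)
    if should:
        clauses["should"] = list(should)      # alternatives (OR)
    if must_not:
        clauses["must_not"] = list(must_not)  # none of these may match (NOT)
    return {"bool": clauses}


# (title has "Elasticsearch" OR "Solr") AND author "clinton gormley"
# AND NOT author "radu gheorge"
title_either = bool_query(should=[match("title", "Elasticsearch"),
                                  match("title", "Solr")])
body = {"query": bool_query(must=[title_either,
                                  match("authors", "clinton gormley")],
                            must_not=[match("authors", "radu gheorge")])}
print(body)
```

The inner bool holds only should clauses, so at least one of them has to match for the inner clause to count as a match.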
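Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as a Levenshtein distance from the original word.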
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match" : {
      "query" : "comprihensiv guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary", "publish_date"],
  "size": 1
}
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 0.5961596,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  }
]
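A fuzziness value of "AUTO" is equivalent to specifying 2 when the term is longer than 5 characters. However, about 80% of human misspellings have an edit distance of 1, so setting the fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter of the Elasticsearch guide.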
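To make the edit-distance idea concrete, here is a Python sketch that computes the plain Levenshtein distance and mimics the default AUTO thresholds (0 edits for terms of up to 2 characters, 1 edit for 3 to 5 characters, 2 edits beyond that). It is an illustration, not how ES implements fuzzy matching internally; the function names are made up for this example.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions from a to b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under the default AUTO thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typo, indexed = "comprihensiv", "comprehensive"
distance = levenshtein(typo, indexed)
print(distance, auto_fuzziness(typo))
```

The misspelled term from the query above is two edits away from "comprehensive" (one substitution, one insertion), which is within what AUTO allows for a 12-character term. With a fixed fuzziness of 1 this particular typo would not match.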
Source: https://blog.csdn.net/laoyang360/article/details/76769208 (continue from section 6)