Commonly Used Basic Elasticsearch Queries

Date: 2019-11-11


Installation and startup are simple; follow the steps on the official site: https://www.elastic.co/downloads/elasticsearch

To introduce the different query types in Elasticsearch, we will search documents with the following fields: title, authors, summary, release date, and number of reviews. First, let's create a new index and index the documents with the bulk API:

To demonstrate how the different queries are used, we first create some employee documents in Elasticsearch. Each document has the following fields: first_name, last_name, age, about, interests:

curl -XPUT 'localhost:9200/megacorp/employee/3' -d '{ "first_name" : "Douglas", "last_name" : "Fir", "age" : 35, "about" : "I like to build cabinets", "interests": "forestry" }'
curl -XPUT 'localhost:9200/megacorp/employee/2' -d '{ "first_name" : "Jane", "last_name" : "Smith", "age" : 32, "about" : "I like to collect rock albums", "interests": "music" }'
curl -XPUT 'localhost:9200/megacorp/employee/1' -d '{ "first_name" : "John", "last_name" : "Smith", "age" : 25, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }'

1. Basic Match Query

There are two main forms of basic match query: (1) the Search Lite API, which passes all search parameters in the URL;

(2) the Elasticsearch Query DSL, which sends a JSON request body. The following searches all fields for "John":

curl -XGET 'localhost:9200/megacorp/employee/_search?q=John'

The same search written with the Query DSL looks like this:

curl -XGET 'localhost:9200/megacorp/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "John",
            "fields" : ["_all"]
        }
    }
}'

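The output is the same as for /_search?q=john above. The multi_match keyword is commonly used as shorthand for running a match query against several fields. The fields property lists the fields to search; to search all fields, use the _all field, as above. Both approaches let us choose which fields to search. For example, to find employees whose interests contain music: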

curl -XGET 'localhost:9200/megacorp/employee/_search?q=interests:music'

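The DSL, however, offers more flexibility for building complex queries (as we will see later) and even for shaping the results you get back. In the following example I specify the number of results to return, the starting offset (useful for pagination), which document fields to return, and which field to highlight: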

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": { "match": { "interests": "music" } },
    "size": 2,
    "from": 0,
    "_source": [ "first_name", "last_name", "interests" ],
    "highlight": {
        "fields": { "interests": {} }
    }
}'

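Note: when the query contains several terms, the match query lets us use the and operator instead of the default or. You can also set the minimum_should_match parameter to tweak the relevance of the returned results.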
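To make operator and minimum_should_match concrete, here is a minimal Python sketch that only decides whether a match query would select a document, ignoring scoring. The analyze helper is an assumed, rough approximation of the standard analyzer (split on non-alphanumerics, lowercase), and minimum_should_match is modelled as a plain integer only, although Elasticsearch also accepts percentages.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def match(doc_text, query, operator="or", minimum_should_match=None):
    """Return True if a match query would select the document (scoring ignored)."""
    indexed = set(analyze(doc_text))
    query_terms = analyze(query)
    hits = sum(1 for t in query_terms if t in indexed)
    if operator == "and":
        # Every query term must be present.
        return hits == len(query_terms)
    # Default "or": one matching term is enough unless minimum_should_match raises the bar.
    required = 1 if minimum_should_match is None else minimum_should_match
    return hits >= required


jane_about = "I like to collect rock albums"
print(match(jane_about, "rock music"))                          # True
print(match(jane_about, "rock music", operator="and"))          # False
print(match(jane_about, "rock music", minimum_should_match=2))  # False
```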

2. Multi-field Search

As we saw earlier, to run one search across several document fields (for example, searching about and interests with the same keyword), use the multi_match query:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests"]
        }
    }
}'

3. Boosting
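Above, one search request queried several fields. You may want to give one field more weight: in the example below we set the boost of interests to 3, so that matches in interests contribute more to the relevance score: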

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock",
            "fields": ["about", "interests^3"]
        }
    }
}'

Boosting does not simply mean that the calculated score is multiplied by the boost factor: the final boost value goes through normalization and some other internal optimizations.
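The effect of the field^boost syntax can be illustrated with a toy model of multi_match's default best_fields type, which keeps the highest per-field score and adds tie_breaker (default 0) times the others. This is a sketch, not Elasticsearch's scoring: toy_field_score is a hypothetical stand-in that just counts matching terms (real scoring uses BM25 since 5.0), and, as noted above, real boosts are also normalized.

```python
import re


def analyze(value):
    # Rough stand-in for the standard analyzer; lists are analyzed element by element.
    values = value if isinstance(value, list) else [value]
    return [t.lower() for v in values for t in re.findall(r"[A-Za-z0-9]+", v)]


def parse_field(spec):
    # "interests^3" -> ("interests", 3.0); "about" -> ("about", 1.0)
    name, _, boost = spec.partition("^")
    return name, (float(boost) if boost else 1.0)


def toy_field_score(value, query):
    # Hypothetical per-field score: the number of query terms found in the field.
    indexed = set(analyze(value))
    return sum(1 for t in analyze(query) if t in indexed)


def best_fields_score(doc, query, fields, tie_breaker=0.0):
    # Keep the best boosted field score; tie_breaker adds a share of the other fields.
    scores = [
        toy_field_score(doc[name], query) * boost
        for name, boost in map(parse_field, fields)
        if name in doc
    ]
    if not scores:
        return 0.0
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


jane = {"about": "I like to collect rock albums", "interests": "music"}
print(best_fields_score(jane, "rock music", ["about", "interests"]))    # 1.0
print(best_fields_score(jane, "rock music", ["about", "interests^3"]))  # 3.0
```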

4. Bool Query
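We can combine conditions with AND/OR/NOT operators; this is the bool query. It accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to find employees whose about field contains music or climb, whose first name is John, and whose last name is not Smith: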

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            { "match": { "about": "music" }},
                            { "match": { "about": "climb" }}
                        ]
                    }
                },
                { "match": { "first_name": "John" }}
            ],
            "must_not": {
                "match": { "last_name": "Smith" }
            }
        }
    }
}'
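A minimal Python sketch of the same bool logic, evaluated locally against the three sample employees. Clauses are plain functions, match is a simplified, hypothetical stand-in (any analyzed query term present in the field), scoring is ignored, and should clauses become mandatory only when there is no must clause, as in query context.

```python
import re

# The three sample employees indexed at the start of this article.
EMPLOYEES = {
    "1": {"first_name": "John", "last_name": "Smith",
          "about": "I love to go rock climbing", "interests": ["sports", "music"]},
    "2": {"first_name": "Jane", "last_name": "Smith",
          "about": "I like to collect rock albums", "interests": "music"},
    "3": {"first_name": "Douglas", "last_name": "Fir",
          "about": "I like to build cabinets", "interests": "forestry"},
}


def analyze(value):
    values = value if isinstance(value, list) else [value]
    return [t.lower() for v in values for t in re.findall(r"[A-Za-z0-9]+", v)]


def match(field, text):
    # Simplified match clause: true if any analyzed query term occurs in the field.
    def clause(doc):
        indexed = set(analyze(doc.get(field, "")))
        return any(t in indexed for t in analyze(text))
    return clause


def bool_query(must=(), should=(), must_not=()):
    def clause(doc):
        if not all(c(doc) for c in must):      # must ~ AND
            return False
        if any(c(doc) for c in must_not):      # must_not ~ NOT
            return False
        if should and not must:                # should ~ OR, mandatory only without must
            return any(c(doc) for c in should)
        return True
    return clause


def search(query):
    return [doc_id for doc_id, doc in EMPLOYEES.items() if query(doc)]


article_query = bool_query(
    must=[bool_query(should=[match("about", "music"), match("about", "climb")]),
          match("first_name", "John")],
    must_not=[match("last_name", "Smith")],
)
print(search(article_query))  # []

rock_query = bool_query(
    must=[bool_query(should=[match("about", "rock"), match("about", "climb")]),
          match("first_name", "John")],
)
print(search(rock_query))  # ['1']
```

Under this model the query above returns nothing on the sample data: the only John is a Smith, and climb is not the same term as the indexed climbing.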

5. Fuzzy Queries

Fuzzy matching can be used in match and multi_match queries to tolerate spelling mistakes. The fuzziness is based on the Levenshtein distance from the original term. For example:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match" : {
            "query" : "rock climb",
            "fields": ["about", "interests"],
            "fuzziness": "AUTO"
        }
    },
    "_source": ["about", "interests", "first_name"],
    "size": 1
}'

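Above we set fuzziness to AUTO, which allows 2 edits when the term is longer than 5 characters (1 edit for 3 to 5 characters, none for shorter terms). However, about 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve search performance.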
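The AUTO rule and the edit distance can be sketched in Python as follows. Elasticsearch counts an adjacent transposition as a single edit by default (fuzzy_transpositions), so the sketch approximates that with the optimal string alignment variant of Levenshtein distance; prefix_length and max_expansions are ignored.

```python
def auto_fuzziness(term, low=3, high=6):
    # AUTO (i.e. AUTO:3,6): 0 edits below 3 characters, 1 edit for 3-5, 2 edits from 6 up.
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


def osa_distance(a, b):
    """Levenshtein distance where swapping two adjacent characters also costs 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_match(query_term, indexed_term, fuzziness="AUTO"):
    max_edits = auto_fuzziness(query_term) if fuzziness == "AUTO" else int(fuzziness)
    return osa_distance(query_term, indexed_term) <= max_edits


print(osa_distance("saerch", "search"))  # 1
print(fuzzy_match("rokc", "rock"))       # True
print(fuzzy_match("climb", "climbing"))  # False: 3 edits, AUTO allows 1 for 5 characters
```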

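6. Wildcard Query

A wildcard query lets us match a pattern instead of a complete term: ? matches any single character, and * matches zero or more characters. For example, to find all records whose first name starts with j (the pattern is not analyzed, while the indexed terms were lowercased by the standard analyzer, so a lowercase j is used):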

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "wildcard" : {
            "first_name" : "j*"
        }
    },
    "_source": ["first_name", "last_name"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'
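A small Python sketch of how such a pattern behaves against the lowercased first_name terms of the sample data. It translates * and ? into a regular expression and requires the whole term to match; escaping with a backslash, which Elasticsearch also supports, is left out.

```python
import re

FIRST_NAMES = ["douglas", "jane", "john"]  # first_name terms after lowercasing


def wildcard_to_regex(pattern):
    # * -> any run of characters, ? -> exactly one character, everything else literal.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_search(pattern, terms=FIRST_NAMES):
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]


print(wildcard_search("j*"))    # ['jane', 'john']
print(wildcard_search("J*"))    # [] because the indexed terms are lowercase
print(wildcard_search("j?hn"))  # ['john']
```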

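7. Regexp Query

Elasticsearch also supports regular expression queries, which allow more complex patterns than wildcard queries. For example, to find records whose first name starts with j, followed by any number of characters between a and z, and ends with n: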

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "regexp" : {
            "first_name" : "j[a-z]*n"
        }
    },
    "_source": ["first_name", "age"],
    "highlight": {
        "fields" : {
            "first_name" : {}
        }
    }
}'
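Regexp queries are anchored: the pattern must match the whole indexed term, and, like wildcard patterns, it is not analyzed. A small Python sketch against the lowercased first_name terms shows both points; it assumes Python's re syntax coincides with Lucene's for these simple patterns, which is not true in general.

```python
import re

FIRST_NAMES = ["douglas", "jane", "john"]  # first_name terms after lowercasing


def regexp_search(pattern, terms=FIRST_NAMES):
    compiled = re.compile(pattern)
    # fullmatch: like Lucene regexps, the pattern must cover the entire term.
    return [t for t in terms if compiled.fullmatch(t)]


print(regexp_search("j[a-z]*n"))  # ['john']
print(regexp_search("J[a-z]*n"))  # [] because the indexed terms are lowercase
print(regexp_search("oh"))        # [] because partial matches do not count
```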

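8. Match Phrase Query

A match phrase query requires all terms of the query string to appear in the document, in the same order as in the input. By default the terms must appear next to each other, otherwise the document does not match. The slop parameter controls how many positions apart the terms may be and still match: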

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "multi_match": {
            "query": "climb rock",
            "fields": [
                "about",
                "interests"
            ],
            "type": "phrase",
            "slop": 3
        }
    },
    "_source": [
        "title",
        "about",
        "interests"
    ]
}'

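In a phrase query, documents in which climb and rock appear next to each other, in that order, match and score highest. Because slop is set to 3, documents in which the two terms are a few positions apart, or appear in swapped order, can still match, scoring lower the farther apart they are; if "slop": 3 is removed, only adjacent, in-order occurrences match. Note that the default standard analyzer does not stem words, so climb does not match the indexed term climbing in the sample data; to find John's "rock climbing", search for climbing instead.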
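Slop is an edit distance over term positions. For two query terms it reduces to |(j - i) - 1|, where i and j are the positions of the first and second query term in the document, so a swapped adjacent pair needs slop 2. The minimal Python sketch below checks this against John's about text; it assumes the rough analyzer shown and handles exactly two query terms.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer (lowercase, no stemming, no stop words).
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def required_slop(tokens, first, second):
    """Smallest slop at which `first second` matches, or None if a term is missing."""
    best = None
    for i, t1 in enumerate(tokens):
        if t1 != first:
            continue
        for j, t2 in enumerate(tokens):
            if t2 != second or j == i:
                continue
            needed = abs((j - i) - 1)  # 0 for adjacent in-order, 2 for adjacent swapped
            best = needed if best is None else min(best, needed)
    return best


def two_term_phrase_match(text, phrase, slop=0):
    query = analyze(phrase)
    if len(query) != 2:
        raise ValueError("this sketch only handles two-term phrases")
    needed = required_slop(analyze(text), query[0], query[1])
    return needed is not None and needed <= slop


john_about = "I love to go rock climbing"
print(two_term_phrase_match(john_about, "rock climbing"))          # True
print(two_term_phrase_match(john_about, "climbing rock"))          # False
print(two_term_phrase_match(john_about, "climbing rock", slop=2))  # True
print(two_term_phrase_match(john_about, "climb rock", slop=3))     # False (no stemming)
```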

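9. Match Phrase Prefix Query

A match phrase prefix query treats the last term of the query as a prefix, so only the beginning of that word needs to be typed. Like match_phrase, it accepts the slop parameter; it also supports max_expansions, which limits how many terms the prefix can expand to and so reduces resource usage: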

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "match_phrase_prefix": {
            "about": {
                "query": "rock cli",
                "slop": 3,
                "max_expansions": 10
            }
        }
    },
    "_source": [
        "about",
        "interests",
        "first_name"
    ]
}'
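Roughly, the last query term is expanded against the sorted term dictionary, and at most max_expansions of the matching terms are kept (the default is 50). A minimal Python sketch, assuming a single hypothetical sorted term list built from the sample about fields and slop 0:

```python
import re


def analyze(text):
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


ABOUT = [
    "I like to build cabinets",
    "I like to collect rock albums",
    "I love to go rock climbing",
]
# Hypothetical single sorted term dictionary for the about field.
TERM_DICTIONARY = sorted({t for text in ABOUT for t in analyze(text)})


def expand_prefix(prefix, term_dictionary, max_expansions=50):
    # Keep the first max_expansions terms (in sorted order) that start with the prefix.
    return [t for t in term_dictionary if t.startswith(prefix)][:max_expansions]


def phrase_prefix_match(text, query, term_dictionary, max_expansions=50):
    *head, last = analyze(query)
    candidates = set(expand_prefix(last, term_dictionary, max_expansions))
    tokens = analyze(text)
    n = len(head)
    # The leading terms must match exactly and in order, immediately followed
    # by one of the expansions of the last term.
    for i in range(len(tokens) - n):
        if tokens[i:i + n] == head and tokens[i + n] in candidates:
            return True
    return False


john = ABOUT[2]
print(expand_prefix("c", TERM_DICTIONARY, max_expansions=2))                   # ['cabinets', 'climbing']
print(phrase_prefix_match(john, "rock cli", TERM_DICTIONARY))                  # True
print(phrase_prefix_match(john, "rock c", TERM_DICTIONARY, max_expansions=1))  # False
print(phrase_prefix_match(john, "cli ro", TERM_DICTIONARY))                    # False
```

The last two calls show the trade-off: a short prefix with a low max_expansions may never reach the intended word, and only the final term is treated as a prefix.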

10. Query String
The query_string query offers a concise syntax for combining multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries. In the example below we run a fuzzy search for the terms search algorithm where the authors include grant ingersoll or tom morton. All fields are searched, with the summary field boosted by 2:

curl -XGET 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

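11. Simple Query String

simple_query_string is another version of query_string that is better suited to a user-facing search box: it uses +/|/- instead of AND/OR/NOT, and if the user enters an invalid query, it simply ignores the invalid parts instead of throwing an exception: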

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "simple_query_string" : {
        "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
        "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

12. Term/Terms Query
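The previous examples covered full-text search, but sometimes we are more interested in structured search, where only exact matches are returned. In that case we can use term and terms queries. In the example below we search for everyone whose interests include music: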

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["first_name","last_name","interests"]
}'

We can also use the terms keyword to specify several terms:

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "terms" : {
            "interests": ["music", "forestry"]
        }
    }
}'
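The key difference from match is that term and terms do not analyze the query value: it is compared verbatim with the indexed terms, which for a text field were lowercased at index time. A minimal Python sketch, assuming the same rough analyzer as the earlier sketches:

```python
import re


def analyze(value):
    values = value if isinstance(value, list) else [value]
    return [t.lower() for v in values for t in re.findall(r"[A-Za-z0-9]+", v)]


def term_query(field_value, term):
    # The query term is NOT analyzed; the field value was analyzed at index time.
    return term in set(analyze(field_value))


def terms_query(field_value, terms):
    # terms: the document matches if any of the exact terms is indexed.
    indexed = set(analyze(field_value))
    return any(t in indexed for t in terms)


print(term_query(["sports", "music"], "music"))        # True
print(term_query(["sports", "music"], "Music"))        # False: no lowercasing of the query
print(terms_query("forestry", ["music", "forestry"]))  # True
```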

13. Term Query - Sorted

Like the results of any other query, term query results can easily be sorted, and we can sort on several levels:

curl -XPOST 'localhost:9200/megacorp/employee/_search' -d '
{
    "query": {
        "term" : {
            "interests": "music"
        }
    },
    "_source" : ["interests","first_name","about"],
    "sort": [
        { "age": { "order": "desc" }},
        { "_score": { "order": "desc" }}
    ]
}'
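Multi-level sorting applies the keys in order, using later keys only to break ties on earlier ones. The same ordering can be reproduced locally with Python's stable sort by sorting on the keys from last to first. The hits below are made up for illustration.

```python
# Hypothetical hits; only the fields used for sorting are shown.
HITS = [
    {"_id": "a", "age": 32, "_score": 1.0},
    {"_id": "b", "age": 25, "_score": 2.0},
    {"_id": "c", "age": 32, "_score": 3.0},
]


def sort_hits(hits, sort_spec):
    """sort_spec is a list of (field, "asc" | "desc"), most significant first."""
    result = list(hits)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-level ordering.
    for field, order in reversed(sort_spec):
        result.sort(key=lambda hit: hit[field], reverse=(order == "desc"))
    return result


ordered = sort_hits(HITS, [("age", "desc"), ("_score", "desc")])
print([hit["_id"] for hit in ordered])  # ['c', 'a', 'b']
```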

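14. Range Query

Another kind of structured query is the range query. In the example below we search for all workers whose birthday falls between 2017-02-01 and 2017-05-01, inclusive: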

curl -XPOST 'localhost:9200/person/worker/_search?pretty' -d '
{
    "query": {
        "range" : {
            "birthday": {
                "gte": "2017-02-01",
                "lte": "2017-05-01"
            }
        }
    },
    "_source" : ["first_name","last_name","birthday"]
}'

Range queries can be applied to date, numeric, and string fields.
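gte and lte are inclusive bounds, while gt and lt are exclusive. A minimal Python sketch of that check, working on anything comparable (dates, numbers, strings); date-math expressions such as now-1d are not modelled.

```python
from datetime import date


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    # Each bound that is set must hold; unset bounds are ignored.
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


start, end = date(2017, 2, 1), date(2017, 5, 1)
print(in_range(date(2017, 3, 15), gte=start, lte=end))  # True
print(in_range(date(2017, 5, 1), gte=start, lte=end))   # True (inclusive upper bound)
print(in_range(date(2017, 5, 2), gte=start, lte=end))   # False
```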

15. Filtered Query

A filtered query lets us filter the query results. For example, to find employees whose about or interests fields contain music, keeping only those whose birthday is on or after 2017-02-01:

curl -XPOST 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "music",
                    "fields": ["about","interests"]
                }
            },
            "filter": {
                "range" : {
                    "birthday": {
                        "gte": "2017-02-01"
                    }
                }
            }
        }
    },
    "_source" : ["first_name","last_name","about", "interests"]
}'

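Note: a filtered query does not require a query inside it. If no query is given, a match_all query is run, which returns every document in the index, and those documents are then filtered. In practice, filters should run first to narrow down the set of documents to be queried; moreover, filter results can be cached after first use, which can noticeably improve performance. Filtered queries were removed in Elasticsearch 5.0; use a bool query instead. The following bool query produces the same results as the one above: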

curl -XPOST 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "bool": {
            "must" : {
                "multi_match": {
                    "query": "music",
                    "fields": ["about","interests"]
                }
            },
            "filter": {
                "range" : {
                    "birthday": {
                        "gte": "2017-02-01"
                    }
                }
            }
        }
    },
    "_source" : ["first_name","last_name","about", "interests"]
}'

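16. Multiple Filters

Multiple filters can be combined with a bool filter. In the example below, the returned results must have at least 20 reviews, must not have been published on or before 2014-12-31, and should be published by O'Reilly. First create the index iteblog_book_index and insert some data: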

curl -XPOST 'localhost:9200/iteblog_book_index/book/1' -d '{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distributed real-time search and analytics engine", "publish_date" : "2015-02-07","num_reviews": 20, "publisher": "oreilly" }'
curl -XPOST 'localhost:9200/iteblog_book_index/book/2' -d '{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }'
curl -XPOST 'localhost:9200/iteblog_book_index/book/3' -d '{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }'
curl -XPOST 'localhost:9200/iteblog_book_index/book/4' -d '{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }'

Then run the following query:

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search?pretty' -d '
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                "query": "elasticsearch",
                "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}'
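The combined conditions can be checked locally against the four books with a minimal Python sketch. Whether a should clause inside a filter is mandatory has differed between Elasticsearch versions, so the sketch exposes it as should_required, a hypothetical switch rather than an Elasticsearch option; for these four books both settings select the same document. Text matching uses the same rough, assumed analyzer as before.

```python
import re
from datetime import date

# The four sample books indexed above (authors omitted; the query does not use them).
BOOKS = {
    "1": {"title": "Elasticsearch: The Definitive Guide",
          "summary": "A distributed real-time search and analytics engine",
          "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly"},
    "2": {"title": "Taming Text: How to Find, Organize, and Manipulate It",
          "summary": "organize text using approaches such as full-text search, proper name "
                     "recognition, clustering, tagging, information extraction, and summarization",
          "publish_date": "2013-01-24", "num_reviews": 12, "publisher": "manning"},
    "3": {"title": "Elasticsearch in Action",
          "summary": "build scalable search applications using Elasticsearch without having to do "
                     "complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning"},
    "4": {"title": "Solr in Action",
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning"},
}


def analyze(text):
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def text_matches(book, query, fields):
    # multi_match with the default "or" operator: any query term in any listed field.
    indexed = {t for f in fields for t in analyze(book[f])}
    return any(t in indexed for t in analyze(query))


def passes_filter(book, should_required):
    must = book["num_reviews"] >= 20
    must_not = date.fromisoformat(book["publish_date"]) <= date(2014, 12, 31)
    should = book["publisher"] == "oreilly"
    return must and not must_not and (should or not should_required)


def search_books(should_required=True):
    return [book_id for book_id, book in BOOKS.items()
            if text_matches(book, "elasticsearch", ["title", "summary"])
            and passes_filter(book, should_required)]


print(search_books())  # ['1']
```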

17. Function Score: Field Value Factor
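In some scenarios you may want to use the value of a particular field as a factor when computing a document's relevance score. This is a typical way to boost relevance based on how important a document is. In the example below we want more popular books (measured by their number of reviews) to rank higher, which we can do with field_value_factor: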

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search?pretty' -d '
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "field_value_factor": {
                "field" : "num_reviews",
                "modifier": "log1p",
                "factor" : 2
            }
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}'
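With modifier log1p, field_value_factor computes log10(1 + factor * value), and with the default boost_mode of multiply that number multiplies the query score. A minimal Python sketch of this arithmetic for a subset of the modifiers (Elasticsearch also offers log, ln, log2p, ln2p and reciprocal); it assumes every document has the field, whereas the real query has a missing option for documents that lack it.

```python
import math

# Subset of field_value_factor modifiers, applied to factor * field_value.
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(1 + x),  # common logarithm of (1 + x)
    "ln1p": math.log1p,                    # natural logarithm of (1 + x)
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def field_value_factor(query_score, field_value, factor=1.0, modifier="none",
                       boost_mode="multiply"):
    if modifier not in MODIFIERS:
        raise ValueError(f"modifier not covered by this sketch: {modifier}")
    value = MODIFIERS[modifier](factor * field_value)
    if boost_mode == "multiply":
        return query_score * value
    if boost_mode == "sum":
        return query_score + value
    if boost_mode == "replace":
        return value
    raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")


# A book with 20 reviews vs one with 12, both with a query score of 1.0:
print(round(field_value_factor(1.0, 20, factor=2, modifier="log1p"), 4))  # 1.6128
print(round(field_value_factor(1.0, 12, factor=2, modifier="log1p"), 4))  # 1.3979
```

The logarithm keeps a heavily reviewed book from completely drowning out text relevance: doubling the review count adds only a small amount to the multiplier.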
