Elasticsearch Query and Aggregation Basics

Date: 2019-12-07


1. Overview

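Elasticsearch has two main query syntaxes: URI search and request-body search. URI search is lightweight and quick to type, while body search is a structured JSON query that can carry many more conditions. This article covers the query, filter, and aggregation parts of structured queries. The ES version used is 6.5.4 and the Chinese analyzer is ik. For installation and usage, see: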

Installing and using Elasticsearch

Using the ik analyzer in Elasticsearch
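The rest of the article only uses body search, so here is a minimal sketch of what the URI form looks like. It only builds the URL and sends nothing. The base address `http://localhost:9200`, the helper name `build_uri_search`, and the sample value `author:alice` are assumptions made for illustration; `q`, `size`, `from`, and `sort` are the standard URI search parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_uri_search(index, q, size=10, from_=0, sort=None,
                     base="http://localhost:9200"):
    """Build a URI search URL; `base` is an assumed local node address."""
    params = {"q": q, "size": size, "from": from_}
    if sort:
        params["sort"] = sort  # "field:direction", e.g. "publishTime:desc"
    return f"{base}/{index}/_search?{urlencode(params)}"


url = build_uri_search("news", "author:alice", size=2, sort="publishTime:desc")
print(url)

# Round-trip check: the encoded parameters decode back to what was passed in.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"])
```

The `q` parameter takes query-string syntax (`field:value`), which is why anything more complex quickly becomes easier to express as a JSON body.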

Create the following index in ES and import the data:

PUT /news
{
        "aliases": {
            "test.chixiao.news": {}
        },
        "mappings":{
            "news": {
                "dynamic": "false",
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "title": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "summary": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "author": {
                        "type": "keyword"
                    },
                    "publishTime": {
                        "type": "date"
                    },
                    "modifiedTime": {
                        "type": "date"
                    },
                    "createTime": {
                        "type": "date"
                    },
                    "docId": {
                        "type": "keyword"
                    },
                    "source": {
                        "type": "keyword"
                    },
                    "voteCount": {
                        "type": "integer"
                    },
                    "replyCount": {
                        "type": "integer"
                    }
                }
            }
        },
        "settings":{
            "index": {
                "refresh_interval": "1s",
                "number_of_shards": 3,
                "max_result_window": "10000000",
                "mapper": {
                    "dynamic": "false"
                },
                "number_of_replicas": 1
            },
            "analysis": {
                "normalizer": {
                    "lowercase": {
                        "type": "custom",
                        "char_filter": [],
                        "filter": [
                            "lowercase",
                            "asciifolding"
                        ]
                    }
                },
                "analyzer": {
                    "1gram": {
                        "type": "custom",
                        "tokenizer": "ngram_tokenizer"
                    }
                },
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "nGram",
                        "min_gram": "1",
                        "max_gram": "1",
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
    }
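The article does not show the import step itself. One common way to load documents is the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document. The sketch below only builds that body. The helper name `build_bulk_body` and the two sample documents are hypothetical, and the `_type` entry assumes ES 6.x as used in this article.

```python
import json


def build_bulk_body(docs, index="news", doc_type="news"):
    """Serialize documents into the NDJSON body expected by the _bulk API."""
    lines = []
    for doc in docs:
        # Action line: index the document under its own id.
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": str(doc["id"])}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # Source line: the document itself.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents (not the article's data set).
sample_docs = [
    {"id": 1, "title": "sample title A", "author": "alice", "replyCount": 3},
    {"id": 2, "title": "sample title B", "author": "bob", "replyCount": 0},
]
body = build_bulk_body(sample_docs)
print(body)
```

The resulting string would be sent as the body of `POST /_bulk` with the content type `application/x-ndjson`.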

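2. Queries

2.1 A query example

A simple query looks like the one below. Queries are mainly divided into query and filter clauses, and both kinds live inside the query element. Of the remaining elements, sort sets the ordering, size and from handle paging, and _source selects which fields of the matched documents are returned.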

GET /news/_search
{
  "query": {"match_all": {}}, 
  "sort": [
    {
      "publishTime": {
        "order": "desc"
      }
    }
  ],
  "size": 2,
  "from": 0,
  "_source": ["title", "id", "summary"]
}

Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "228",
        "_score" : null,
        "_source" : {
          "summary" : "據陝西高院消息，6月11日上午，西安市中級人民法院二審公開開庭宣判了陝西省首例「套路貸」涉黑案件——韓某某等人非法放貸一案，法院駁回上訴，維持原判。西安市中級人",
          "id" : 228,
          "title" : "陝西首例套路貸涉黑案宣判:團伙對借款人噴辣椒水"
        },
        "sort" : [
          1560245097000
        ]
      },
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "214",
        "_score" : null,
        "_source" : {
          "summary" : "網易娛樂6月11日報道6月11日，有八卦媒體曝光曹雲金與妻子唐菀現身天津民政局辦理了離婚手續。對此，網易娛樂向曹雲金經紀人求證，獲得了對方獨家迴應：「確實是離婚",
          "id" : 214,
          "title" : "曹雲金認可已離婚:和平離婚 有人惡意中傷心思歹毒"
        },
        "sort" : [
          1560244657000
        ]
      }
    ]
  }
}

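In the response, took is the elapsed time in milliseconds and _shards describes the shards: this index has 3 shards and all 3 responded successfully. The outer hits object holds the results: total is the number of matching documents, max_score is the highest score (null here because the results are sorted by publishTime rather than by score), and the inner hits array contains the matched documents themselves.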
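Since from and size are what drive paging, a small helper can translate a page number into those two values. This is a sketch only: the function names are made up for illustration and the body mirrors the request shown above.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def build_listing_query(page, page_size, fields):
    """Newest-first listing, same shape as the request shown above."""
    body = {
        "query": {"match_all": {}},
        "sort": [{"publishTime": {"order": "desc"}}],
        "_source": list(fields),
    }
    body.update(page_to_from_size(page, page_size))
    return body


first_page = build_listing_query(1, 2, ["title", "id", "summary"])
third_page = build_listing_query(3, 2, ["title", "id", "summary"])
print(first_page["from"], third_page["from"])
```

Note that from + size may not exceed the index's max_result_window setting, which the index definition above raises to 10000000.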

Queries come in two kinds: exact filtering (filter) and full-text search (query). Exact filters are easy to cache, so they execute very quickly.

2.2 Filter queries

term

A term query finds the records that match a value exactly. FIELD is the name of a field in the index and VALUE is the value to look for.

{"term": {
    "FIELD": {
      "value": "VALUE"
    }
  }
}

For example, to find the news whose source is 中新經緯:

GET /news/_search
{
  "query": {"term": {
    "source": {
      "value": "中新經緯"
    }
  }}
}

bool

When several conditions have to be combined logically, use bool. A bool query can contain:

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

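must: results must match, like AND in SQL
must_not: results must not match, like NOT in SQL
should: results should match at least one clause, like OR in SQL

To find the news whose source is 中新經緯 and whose id is 4 or 75, use the query below. minimum_should_match sets how many of the should clauses have to match. It defaults to 0 when the bool query also has must or filter clauses, in which case the should clauses only contribute to scoring and do not filter anything out. (In a bool query that has only should clauses, at least one of them must match.)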

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {
          "source": {
            "value": "中新經緯"
          }
        }}
      ],
      "should": [
        {"term": {
          "id": {
            "value": "4"
          }
        }},
        {"term": {
          "id": {
            "value": "75"
          }
        }}
      ],
      "minimum_should_match": 1
    }
  }
}

terms

For looking up several exact values, as in the case above, terms is simpler. For example, to find the articles whose id is 4 or 75:

GET /news/_search
{
  "query": {"terms": {
    "id": [
      "4",
      "75"
    ]
  }}
}

range

For range conditions use range, which can appear in the same positions as term. For example, to find the articles with id from 1 to 10, where:

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to

GET /news/_search
{
  "query": {"range": {
    "id": {
      "gte": 1,
      "lte": 10
    }
  }}
}

exists

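In ES, exists finds documents in which a given field has a value, for example the documents that have an author field. Combined with must_not or should inside a bool query, it can also express "the field is missing" or "the field may be present" conditions.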

GET /news/_search
{
  "query": {
    "exists": {"field": "author"}
  }
}
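To make the semantics of term, terms, range, exists, and bool concrete, the sketch below evaluates them against plain Python dictionaries. It is a toy model for reasoning about which documents match, not how Elasticsearch executes queries: there is no inverted index and no type coercion (so ids are written as integers here), and the three sample documents are invented.

```python
import operator

RANGE_OPS = {"gt": operator.gt, "gte": operator.ge,
             "lt": operator.lt, "lte": operator.le}


def _as_list(clauses):
    return clauses if isinstance(clauses, list) else [clauses]


def matches(query, doc):
    """Evaluate a tiny subset of the filter DSL against one dict document."""
    kind, spec = next(iter(query.items()))
    if kind == "bool":
        required = _as_list(spec.get("must", [])) + _as_list(spec.get("filter", []))
        should = _as_list(spec.get("should", []))
        excluded = _as_list(spec.get("must_not", []))
        # should is optional when must/filter clauses exist; otherwise
        # at least one should clause has to match.
        default_min = 0 if required or not should else 1
        minimum = spec.get("minimum_should_match", default_min)
        return (all(matches(q, doc) for q in required)
                and not any(matches(q, doc) for q in excluded)
                and sum(matches(q, doc) for q in should) >= minimum)
    if kind == "exists":
        return doc.get(spec["field"]) is not None
    field, cond = next(iter(spec.items()))
    value = doc.get(field)
    if kind == "term":
        return value == (cond["value"] if isinstance(cond, dict) else cond)
    if kind == "terms":
        return value in cond
    if kind == "range":
        return value is not None and all(
            RANGE_OPS[op](value, bound) for op, bound in cond.items())
    raise ValueError(f"unsupported query type: {kind}")


# Invented sample documents.
docs = [
    {"id": 4, "source": "中新經緯", "author": "alice"},
    {"id": 75, "source": "中新經緯"},
    {"id": 9, "source": "中國新聞網", "author": "bob"},
]


def search(query):
    return [d["id"] for d in docs if matches(query, d)]


combined = {"bool": {
    "must": [{"term": {"source": {"value": "中新經緯"}}}],
    "should": [{"term": {"id": {"value": 4}}}, {"term": {"id": {"value": 75}}}],
    "minimum_should_match": 1,
}}
print(search(combined))
print(search({"range": {"id": {"gte": 1, "lte": 10}}}))
print(search({"bool": {"must_not": [{"exists": {"field": "author"}}]}}))
```

The last call shows the "field is missing" pattern: exists wrapped in must_not.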

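2.3 Full-text queries (query)

Unlike the exact matching done by filters, query clauses run a full-text search on a field and score the results. In ES only fields of type text are analyzed (tokenized). A keyword field is also a string, but it is indexed as a single unanalyzed value, much like an enumeration. The analyzer of a text field can be specified when the index is created.

match

To search one field, use match. For example, to find the news in which 體育 (sports) appears: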

GET /news/_search
{
  "query": {
    "match": {
      "summary": "體育"
    }
  }
}

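A match query can also name the analyzer to apply to the query text. With ik_smart the input is split into coarse-grained tokens, so for 進口紅酒 (imported red wine) the hits are documents about imported red wine. With ik_max_word the tokens are finer-grained, so documents containing 口紅 (lipstick) or 紅酒 (red wine) are returned as well. The name field below is only illustrative and is not part of the news index.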

{
  "match": {
    "name": {
      "query": "進口紅酒",
      "analyzer": "ik_smart"
    }
  }
}

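The query text may be analyzed into several terms. With operator and, a document is returned only if all of the terms match. With operator or, minimum_should_match controls how many terms have to match, much like should in a bool query. For example, when searching for 體育新聞 (sports news), the query below returns every document whose summary contains either 體育 or 新聞: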

GET /news/_search
{
  "query": {
    "match": {
      "summary": {
        "query": "體育新聞",
        "operator": "or",
        "minimum_should_match": 1
      }
    }
  }
}
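The effect of operator and minimum_should_match can be sketched on already-tokenized text. This assumes the analyzer splits 體育新聞 into the two terms 體育 and 新聞; the token lists for the documents are invented, and only a positive integer minimum_should_match is modeled (the real parameter also accepts percentages and other forms).

```python
def match_terms(query_terms, doc_terms, operator="or", minimum_should_match=1):
    """Decide whether a document matches, given pre-analyzed term lists."""
    wanted = set(query_terms)
    found = len(wanted & set(doc_terms))
    if operator == "and":
        # Every query term has to be present.
        return found == len(wanted)
    # "or": enough of the query terms have to be present.
    return found >= minimum_should_match


query_terms = ["體育", "新聞"]   # assumed analyzer output for "體育新聞"
doc_a = ["體育", "比賽"]          # contains one of the two terms
doc_b = ["體育", "新聞", "報道"]  # contains both terms
doc_c = ["娛樂"]                  # contains neither

print(match_terms(query_terms, doc_a))                  # or, at least 1
print(match_terms(query_terms, doc_a, operator="and"))  # needs both terms
```

Raising minimum_should_match to 2 under operator or behaves like and for a two-term query.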

multi_match

To search several fields at once, use multi_match. For example, to find documents that contain the keyword 新聞 (news) in title or summary:

GET /news/_search
{
  "query": {
    "multi_match": {
      "query": "新聞",
      "fields": ["title", "summary"]
    }
  }
}

2.4 Combined queries

With these full-text and filter clauses available, bool can express complex combined queries:

GET /news/_search
{
  "query": {"bool": {
    "must": [
      {"match": {
        "summary": {
          "boost": 1,
          "query": "長安"
        }
      }
      },
      {
        "term": {
          "source": {
            "value": "中新經緯",
            "boost": 2
          }
        }
      }
    ],
    "filter": {"bool": {
      "must":[
        {"term":{
          "id":75
        }}
        ]
    }}
  }}
}

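In the request above, must, must_not, and should inside bool can hold term, range, and match clauses. Clauses under must and should contribute to the score, and boost controls their weight; must_not only excludes documents. If a condition should not affect the score, put it under filter inside bool: clauses in filter are not scored, and their results can be cached.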
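When such requests are assembled in application code, it helps to keep scored clauses and filter clauses apart. The helpers below are a sketch with made-up names (`match_clause`, `term_clause`, `bool_search`); they rebuild a request equivalent to the one above, with the filter written as a list of clauses rather than a nested bool.

```python
def match_clause(field, text, boost=1.0):
    """Full-text clause; contributes to the score, weighted by boost."""
    return {"match": {field: {"query": text, "boost": boost}}}


def term_clause(field, value, boost=None):
    """Exact-value clause; boost is only meaningful where scoring applies."""
    body = {"value": value}
    if boost is not None:
        body["boost"] = boost
    return {"term": {field: body}}


def bool_search(scored, filters=()):
    """Put scored clauses under must and unscored ones under filter."""
    bool_body = {"must": list(scored)}
    if filters:
        bool_body["filter"] = list(filters)
    return {"query": {"bool": bool_body}}


body = bool_search(
    scored=[match_clause("summary", "長安", boost=1),
            term_clause("source", "中新經緯", boost=2)],
    filters=[term_clause("id", 75)],
)
print(body)
```

Keeping the id condition under filter means it narrows the result set without changing the relative ranking produced by the two must clauses.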

3. Aggregations

The basic format of an aggregation is:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "AGG_TYPE": {}
    }
  }
}

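NAME is the name of this aggregation and can be any valid string. AGG_TYPE is the aggregation type; the common ones are divided into multi-value and single-value aggregations.

3.1 An aggregation example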

GET /news/_search
{
  "size": 0,
  "aggs": {
    "sum_all": {
      "sum": {
        "field": "replyCount"
      }
    }
  }
}

The example above computes the sum of replyCount over the whole index. Response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_all" : {
      "value" : 390011.0
    }
  }
}

By default the response also includes the matching documents, so size is set to 0 to leave them out. sum_all in the result is the name given in the request.
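On the client side the aggregation result is looked up under the name chosen in the request. A minimal sketch, using a trimmed copy of the response above (the helper name `metric_value` is made up):

```python
import json

# Trimmed copy of the response shown above.
response_text = """
{
  "hits": {"total": 204, "max_score": 0.0, "hits": []},
  "aggregations": {"sum_all": {"value": 390011.0}}
}
"""


def metric_value(response, agg_name):
    """Return the value of a single-value metric aggregation."""
    return response["aggregations"][agg_name]["value"]


response = json.loads(response_text)
total_replies = metric_value(response, "sum_all")
print(total_replies)
```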

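Aggregation types in Elasticsearch mainly fall into Metrics and Bucket.

3.2 Metrics

Metrics aggregations mostly return single computed values, such as avg, max, min, and sum; stats returns several of them at once.

For example, to compute the largest reply count in the index: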

GET /news/_search
{
  "size": 0,
  "aggs": {
    "max_replay": {
      "max": {
        "field": "replyCount"
      }
    }
  }
}

stats

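For common summary statistics use stats, which returns the count, minimum, maximum, average, and sum of a field. For example, to get an overview of the reply counts of the news documents: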

GET /news/_search
{
  "size": 0,
  "aggs": {
    "cate": {
      "stats": {
        "field": "replyCount"
      }
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "cate" : {
      "count" : 202,
      "min" : 0.0,
      "max" : 32534.0,
      "avg" : 1930.7475247524753,
      "sum" : 390011.0
    }
  }
}
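What stats computes can be sketched over plain Python dictionaries. This is a toy model, not Elasticsearch's implementation; the three sample documents are invented, and the return value for an empty input is simply this sketch's choice. Documents without the field are skipped, which would be consistent with the response above reporting count 202 while hits.total is 204.

```python
def stats(docs, field):
    """Compute count/min/max/avg/sum over documents that have `field`."""
    values = [d[field] for d in docs if d.get(field) is not None]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


# Invented sample documents; the last one has no replyCount.
sample = [{"replyCount": 0}, {"replyCount": 10}, {"title": "no replies field"}]
result = stats(sample, "replyCount")
print(result)

# Cross-check on the article's numbers: avg is sum divided by count.
article_avg = 390011.0 / 202
print(article_avg)
```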

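3.3 Bucket

Buckets are similar to GROUP BY in SQL: a bucket aggregation groups the documents into buckets.

terms

Bucketing with terms shows how the data is distributed, for example how many distinct source values the index contains and how many articles each one has. size sets the maximum number of buckets to return, starting with the largest.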

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      }
    }
  }
}
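The GROUP BY analogy can be made concrete with a counter. The sketch below is a toy model over invented documents: it counts exactly, whereas a real terms aggregation merges per-shard results and can be approximate. Buckets are ordered by document count descending, with ties broken by key.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Group documents by `field` and return the `size` largest buckets."""
    counts = Counter(d[field] for d in docs if d.get(field) is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Invented sample documents; the last one has no source and is not counted.
sample = [
    {"source": "site-a"}, {"source": "site-b"}, {"source": "site-a"},
    {"source": "site-c"}, {"source": "site-b"}, {"source": "site-a"},
    {"title": "no source"},
]
top_two = terms_buckets(sample, "source", size=2)
print(top_two)
```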

3.4 Combined aggregations

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      },
      "aggs": {
        "replay": {
          "terms": {
            "field": "replyCount",
            "size": 10
          }
        },
        "avg_price": {
          "avg": {
            "field": "voteCount"
          }
        }
      }
    }
  }
}

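The request above first buckets by source. Within each source it buckets again by replyCount, and it also computes the average voteCount for that source.

One item of the returned buckets looks like this: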

{
          "key" : "中國新聞網",
          "doc_count" : 16,
          "avg_price" : {
            "value" : 1195.0
          },
          "replay" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 4,
            "buckets" : [
              {
                "key" : 0,
                "doc_count" : 3
              },
              {
                "key" : 1,
                "doc_count" : 1
              },
              {
                "key" : 5,
                "doc_count" : 1
              },
              {
                "key" : 32,
                "doc_count" : 1
              },
              {
                "key" : 97,
                "doc_count" : 1
              },
              {
                "key" : 106,
                "doc_count" : 1
              },
              {
                "key" : 133,
                "doc_count" : 1
              },
              {
                "key" : 155,
                "doc_count" : 1
              },
              {
                "key" : 156,
                "doc_count" : 1
              },
              {
                "key" : 248,
                "doc_count" : 1
              }
            ]
          }
        }
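A nested bucket like this is usually flattened on the client before use. The sketch below copies the numbers from the item above and summarizes them; the helper name `summarize_source_bucket` is made up. It also shows how sum_other_doc_count relates to the listed sub-buckets: here the listed counts plus the unlisted remainder add up to the parent doc_count, which holds when every document has exactly one replyCount value.

```python
# Data copied from the response item shown above.
bucket = {
    "key": "中國新聞網",
    "doc_count": 16,
    "avg_price": {"value": 1195.0},
    "replay": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 4,
        "buckets": [
            {"key": 0, "doc_count": 3}, {"key": 1, "doc_count": 1},
            {"key": 5, "doc_count": 1}, {"key": 32, "doc_count": 1},
            {"key": 97, "doc_count": 1}, {"key": 106, "doc_count": 1},
            {"key": 133, "doc_count": 1}, {"key": 155, "doc_count": 1},
            {"key": 156, "doc_count": 1}, {"key": 248, "doc_count": 1},
        ],
    },
}


def summarize_source_bucket(item):
    """Flatten one source bucket with its two sub-aggregations."""
    sub = item["replay"]
    return {
        "source": item["key"],
        "articles": item["doc_count"],
        "avg_vote_count": item["avg_price"]["value"],
        # Documents covered by the returned replyCount buckets.
        "listed": sum(b["doc_count"] for b in sub["buckets"]),
        # Documents whose replyCount fell outside the returned buckets.
        "not_listed": sub["sum_other_doc_count"],
    }


summary = summarize_source_bucket(bucket)
print(summary)
```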

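4. Combining queries and aggregations

With both queries and aggregations available, we can aggregate over the results of a query. For example, to see which source sites the news whose summary contains 體育 (sports) comes from: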

GET /news/_search
{
  "size": 0,
  "query": {"bool": {"must": [
    {"match": {
      "summary": "體育"
    }}
  ]}},
  "aggs": {
    "cate": {
      "terms": {
        "field": "source"
      }
    }
  }
}