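Aggregation queries in Elasticsearch do the same job as aggregate queries in a relational database, so the two can be studied side by side: sum, average, maximum, minimum, and so on.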
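Bucket: a grouping of data. Documents are partitioned into buckets by some field, and documents with the same value for that field land in the same bucket. Think of it as a Map<String, List<Object>> in Java, or as the result of a GROUP BY in MySQL.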
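Metric: a statistic computed over one bucket, such as the maximum, minimum, or average.
It corresponds to the values of max(), min(), and avg() in MySQL, which are likewise applied after GROUP BY.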
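To make the bucket/metric split concrete, here is a minimal in-memory sketch in Python. The sample songs and the helper name `bucket_by` are invented for illustration; only the field names come from the music index.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents; field names follow the music index.
songs = [
    {"name": "song A", "language": "english", "length": 30},
    {"name": "song B", "language": "english", "length": 90},
    {"name": "song C", "language": "french", "length": 60},
]


def bucket_by(docs, field):
    """Bucket step: group documents that share the same field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


buckets = bucket_by(songs, "language")

# Metric step: compute one statistic per bucket.
avg_length = {
    key: mean(doc["length"] for doc in docs) for key, docs in buckets.items()
}
print(avg_length)
```

The dictionary of lists plays the role of the buckets, and the per-key average plays the role of the metric.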
We again use English children's songs as the running example. As a reminder, here is the index mapping:
PUT /music
{
  "mappings": {
    "children": {
      "properties": {
        "id": { "type": "keyword" },
        "author_first_name": { "type": "text", "analyzer": "english" },
        "author_last_name": { "type": "text", "analyzer": "english" },
        "author": {
          "type": "text",
          "analyzer": "english",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "content": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        },
        "language": { "type": "text", "analyzer": "english", "fielddata": true },
        "tags": { "type": "text", "analyzer": "english" },
        "length": { "type": "long" },
        "likes": { "type": "long" },
        "isRelease": { "type": "boolean" },
        "releaseDate": { "type": "date" }
      }
    }
  }
}
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "song_qty_by_language": {
      "terms": { "field": "language" }
    }
  }
}
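Syntax notes: size: 0 returns no hits, only the aggregation results; aggs is the fixed keyword that opens an aggregation block; song_qty_by_language is a name chosen by the caller; terms buckets documents by the value of the given field.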
The response looks like this:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "song_qty_by_language": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "english", "doc_count": 5 }
      ]
    }
  }
}
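Syntax notes: hits.hits is empty because size is 0; aggregations holds the results under the aggregation name; in each bucket, key is the field value and doc_count is the number of documents in that bucket.
Buckets are sorted by doc_count in descending order by default.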
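As a small illustration of reading this structure on the client side, the sketch below parses a trimmed copy of the response with Python's standard json module. It does not talk to a cluster; the variable names are arbitrary.

```python
import json

# Trimmed copy of the terms aggregation response for song_qty_by_language.
response_text = """
{
  "hits": {"total": 5, "max_score": 0, "hits": []},
  "aggregations": {
    "song_qty_by_language": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [{"key": "english", "doc_count": 5}]
    }
  }
}
"""

response = json.loads(response_text)

# Results live under aggregations.<aggregation name>.buckets.
buckets = response["aggregations"]["song_qty_by_language"]["buckets"]
counts = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
print(counts)
```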
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "lang": {
      "terms": { "field": "language" },
      "aggs": {
        "length_avg": {
          "avg": { "field": "length" }
        }
      }
    }
  }
}
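This is a two-level aggs query: it first buckets the songs by language, then computes the average length inside each bucket.
Deeper nesting uses the same syntax; just pay attention to where each aggs block is placed.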
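The placement rule can be shown by assembling the request body as a Python dictionary. The helper `with_sub_aggs` is hypothetical, written only for this sketch; it is not part of any Elasticsearch client.

```python
import json


def with_sub_aggs(bucket_agg, sub_aggs):
    """Return a copy of a bucket aggregation with nested aggs attached.

    The nested "aggs" key sits next to the bucket type ("terms" here),
    not inside it.
    """
    nested = dict(bucket_agg)
    nested["aggs"] = sub_aggs
    return nested


body = {
    "size": 0,
    "aggs": {
        "lang": with_sub_aggs(
            {"terms": {"field": "language"}},
            {"length_avg": {"avg": {"field": "length"}}},
        )
    },
}
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the two-level request: "terms" and the inner "aggs" are siblings under the name lang.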
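The most commonly used statistics are count, avg, max, min, and sum, with the same meaning as in MySQL. The count is already returned as doc_count on every bucket.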
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "color": {
      "terms": { "field": "language" },
      "aggs": {
        "length_avg": { "avg": { "field": "length" } },
        "length_max": { "max": { "field": "length" } },
        "length_min": { "min": { "field": "length" } },
        "length_sum": { "sum": { "field": "length" } }
      }
    }
  }
}
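Split song length into 30-second segments and look at the average within each segment.
histogram goes in the same position as terms. It buckets by numeric range and is used together with the interval parameter.
interval: 30 divides the values into [0,30), [30,60), [60,90), [90,120), and so on.
Each segment is closed on the left and open on the right. A segment that contains no data is still returned as an empty bucket.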
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "sales_price_range": {
      "histogram": { "field": "length", "interval": 30 },
      "aggs": {
        "length_avg": {
          "avg": { "field": "length" }
        }
      }
    }
  }
}
Results of this kind can be used to draw a bar chart or a line chart.
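The bucketing rule can be sketched in a few lines of Python. The lengths are made-up sample values; the key formula floor(value / interval) * interval is the rule for a histogram with no offset.

```python
import math
from statistics import mean

# Hypothetical song lengths in seconds.
lengths = [10, 30, 45, 59, 100]
INTERVAL = 30


def histogram_key(value, interval):
    """Lower bound of the bucket [key, key + interval) that holds value."""
    return math.floor(value / interval) * interval


grouped = {}
for value in lengths:
    grouped.setdefault(histogram_key(value, INTERVAL), []).append(value)

# Fill the gaps between the lowest and highest key with empty buckets.
keys = range(min(grouped), max(grouped) + INTERVAL, INTERVAL)
buckets = [
    {
        "key": key,
        "doc_count": len(grouped.get(key, [])),
        "length_avg": mean(grouped[key]) if key in grouped else None,
    }
    for key in keys
]
for bucket in buckets:
    print(bucket)
```

A length of exactly 30 falls into the bucket with key 30, not key 0, which is what left-closed, right-open means. The bucket with key 60 has no songs and still appears with a doc_count of 0.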
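Statistics by month
date_histogram uses syntax similar to histogram, with a date interval (month here) specifying the bucket width.
extended_bounds sets the overall time range to cover: together with min_doc_count: 0, buckets are returned from min to max even for periods that contain no documents.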
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "releaseDate",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-10-01",
          "max": "2019-12-31"
        }
      }
    }
  }
}
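The effect of monthly buckets with forced bounds can be mimicked in plain Python. The release dates are invented sample data, and the sketch assumes every date falls inside the bounds.

```python
from collections import Counter
from datetime import date

# Hypothetical release dates, all inside the bounds used below.
release_dates = [date(2019, 10, 5), date(2019, 10, 20), date(2019, 12, 1)]


def month_start(d):
    """Bucket key for a monthly interval: the first day of the month."""
    return d.replace(day=1)


def month_range(start, end):
    """Yield the first day of each month from start's month to end's month."""
    current = month_start(start)
    last = month_start(end)
    while current <= last:
        yield current
        # Move to the first day of the next month.
        if current.month == 12:
            current = date(current.year + 1, 1, 1)
        else:
            current = date(current.year, current.month + 1, 1)


counts = Counter(month_start(d) for d in release_dates)

# Emit every month in the bounds, including months with no documents.
buckets = [
    {"key_as_string": month.isoformat(), "doc_count": counts.get(month, 0)}
    for month in month_range(date(2019, 10, 1), date(2019, 12, 31))
]
for bucket in buckets:
    print(bucket)
```

November has no songs in the sample data, yet it appears with a doc_count of 0, which is the behaviour the bounds and min_doc_count: 0 ask for.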
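The interval can be day, week, month, quarter, year, and so on. Taking this one step further, we can total the likes of songs released in each quarter of this year, both per language and overall: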
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "sales": {
      "date_histogram": {
        "field": "releaseDate",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2019-01-01",
          "max": "2019-12-31"
        }
      },
      "aggs": {
        "lang_qty": {
          "terms": { "field": "language" },
          "aggs": {
            "like_sum": {
              "sum": { "field": "likes" }
            }
          }
        },
        "total": {
          "sum": { "field": "likes" }
        }
      }
    }
  }
}
Aggregations can be combined with a query, which is equivalent to using WHERE together with GROUP BY in MySQL.
GET /music/children/_search
{
  "size": 0,
  "query": {
    "match": { "language": "english" }
  },
  "aggs": {
    "sales": {
      "terms": { "field": "language" }
    }
  }
}
GET /music/children/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": { "language": "english" }
      }
    }
  },
  "aggs": {
    "sales": {
      "terms": { "field": "language" }
    }
  }
}
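global: the global bucket brings all documents into the aggregation scope, regardless of any preceding query or filter.
The global bucket is useful for comparing the documents that match a condition against the full data set in a single request. In our scenario, that means comparing the likes on one author's songs with the likes on all songs.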
GET /music/children/_search
{
  "size": 0,
  "query": {
    "match": { "author": "Jean Ritchie" }
  },
  "aggs": {
    "likes": {
      "sum": { "field": "likes" }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_likes": {
          "sum": { "field": "likes" }
        }
      }
    }
  }
}
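The difference in scope can be sketched in Python. The songs are invented sample data, and the exact string comparison is a simplification: a real match query analyzes the text before comparing.

```python
# Hypothetical sample documents.
songs = [
    {"author": "Jean Ritchie", "likes": 10},
    {"author": "Jean Ritchie", "likes": 5},
    {"author": "Someone Else", "likes": 20},
]

# Query scope: only the documents that match the query.
matched = [song for song in songs if song["author"] == "Jean Ritchie"]

result = {
    # Ordinary aggregation: sees only the matched documents.
    "likes": sum(song["likes"] for song in matched),
    # Global bucket: ignores the query and sees every document.
    "all": {
        "doc_count": len(songs),
        "all_likes": sum(song["likes"] for song in songs),
    },
}
print(result)
```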
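A filter placed inside aggs applies only to the documents that feed that aggregation.
bucket filter: each bucket can carry its own filter, so different aggs can work on differently filtered data in the same request.
It is loosely comparable to HAVING in MySQL, although the filter is applied to documents before the metric is computed rather than to the groups afterwards.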
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "recent_60d": {
      "filter": {
        "range": {
          "releaseDate": { "gte": "now-60d" }
        }
      },
      "aggs": {
        "recent_60d_likes_sum": {
          "sum": { "field": "likes" }
        }
      }
    },
    "recent_30d": {
      "filter": {
        "range": {
          "releaseDate": { "gte": "now-30d" }
        }
      },
      "aggs": {
        "recent_30d_likes_sum": {
          "sum": { "field": "likes" }
        }
      }
    }
  }
}
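A plain-Python sketch of two independent filter buckets follows. The songs, the fixed value of `now`, and the helper `filter_bucket` are assumptions made for reproducibility; Elasticsearch resolves now-60d and now-30d at query time.

```python
from datetime import date, timedelta

# Fixed "now" so the sketch always gives the same result.
now = date(2019, 12, 31)

# Hypothetical sample documents.
songs = [
    {"releaseDate": date(2019, 12, 15), "likes": 3},
    {"releaseDate": date(2019, 11, 10), "likes": 4},
    {"releaseDate": date(2019, 8, 1), "likes": 100},
]


def filter_bucket(docs, days):
    """One filter bucket: keep docs released in the last `days` days."""
    cutoff = now - timedelta(days=days)
    kept = [doc for doc in docs if doc["releaseDate"] >= cutoff]
    return {"doc_count": len(kept), "likes_sum": sum(doc["likes"] for doc in kept)}


# Each bucket filters the same input on its own; neither affects the other.
result = {
    "recent_60d": filter_bucket(songs, 60),
    "recent_30d": filter_bucket(songs, 30),
}
print(result)
```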
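Buckets are sorted by doc_count in descending order by default. The sort rule can be changed: order can reference the name of a sub-aggregation, such as length_avg, much like ordering by an aggregate alias in MySQL (for example ORDER BY cnt ASC).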
GET /music/children/_search
{
  "size": 0,
  "aggs": {
    "group_by_lang": {
      "terms": {
        "field": "language",
        "order": { "length_avg": "desc" }
      },
      "aggs": {
        "length_avg": {
          "avg": { "field": "length" }
        }
      }
    }
  }
}
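The two orderings can be compared with a short Python sketch. The lengths per language are invented sample data.

```python
from statistics import mean

# Hypothetical song lengths per language.
lengths_by_language = {
    "english": [30, 90],
    "french": [120],
    "spanish": [10, 20, 30],
}

buckets = [
    {"key": key, "doc_count": len(values), "length_avg": mean(values)}
    for key, values in lengths_by_language.items()
]

# Default ordering: doc_count descending.
by_doc_count = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)

# Ordering by the sub-aggregation value, like "order": {"length_avg": "desc"}.
by_length_avg = sorted(buckets, key=lambda b: b["length_avg"], reverse=True)

print([b["key"] for b in by_doc_count])
print([b["key"] for b in by_length_avg])
```

With this data the two orderings are exact opposites, which makes it easy to see which rule is in effect.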
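This article covered the most commonly used aggregation queries, mostly by example. Once the basic syntax is familiar it reads quickly; where something is hard to grasp, compare it with the SQL queries we already know. Thanks for reading.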