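Aggregation analysis is an important database feature: it computes aggregate values over the data set returned by a query, such as the maximum or minimum of a field (or of a computed expression), a sum, or an average. As both a search engine and a data store, ES (Elasticsearch) provides powerful aggregation capabilities as well.
Aggregations that compute metrics such as the max, min, sum, or average of a data set are called metric aggregations (metric) in ES.
Besides aggregate functions, relational databases can also group the queried rows with GROUP BY and then compute metrics per group. In ES the equivalent of GROUP BY is bucketing, provided by bucket aggregations (bucketing).
ES also provides matrix aggregations (matrix) and pipeline aggregations (pipeline), which were still being refined at the time of writing.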
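Aggregations are defined in the search request body under the aggregations node, using the following syntax:
"aggregations" : {
    "<aggregation_name>" : {                              <!-- name of the aggregation -->
        "<aggregation_type>" : {                          <!-- type of the aggregation -->
            <aggregation_body>                            <!-- body: which fields to aggregate on -->
        }
        [,"meta" : { [<meta_data_body>] } ]?              <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?    <!-- sub-aggregations nested inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                 <!-- name of another aggregation -->
}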
Notes:
aggregations can also be abbreviated as aggs.
The values being aggregated can come from a field or from the result of a script.
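The nesting in the syntax above (name, then type, then body, with optional meta and sub-aggregations) can be sketched as a small request builder. This is only an illustration: build_agg is a hypothetical helper, not part of any Elasticsearch client library, and it merely assembles the JSON body that would be sent to the _search endpoint.

```python
import json


def build_agg(agg_type, body, sub_aggs=None, meta=None):
    """Return one aggregation definition: {<type>: <body>} plus optional meta/aggs.

    Hypothetical helper for illustration; not an Elasticsearch client API.
    """
    definition = {agg_type: body}
    if meta is not None:
        definition["meta"] = meta
    if sub_aggs:
        # Sub-aggregations are nested under the abbreviated key "aggs".
        definition["aggs"] = sub_aggs
    return definition


# A max aggregation on the age field, named "maxage".
request_body = {
    "size": 0,  # return no hits, only aggregation results
    "aggs": {"maxage": build_agg("max", {"field": "age"})},
}

print(json.dumps(request_body, indent=2))
```

The aggregation name is the key chosen by the caller, and the same key labels the corresponding entry under "aggregations" in the response.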
Example 1: find the maximum age across all documents
POST /book1/_search?pretty { "size": 0, "aggs": { "maxage": { "max": { "field": "age" } } } }
Result 1:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "maxage": { "value": 54 } } }
Example 2: add a query condition and find the maximum age among the documents whose name contains 'test':
POST /book1/_search?pretty { "query":{ "term":{ "name":"test" } }, "size": 2, "sort": [ { "age": { "order": "desc" } } ], "aggs": { "maxage": { "max": { "field": "age" } } } }
Result 2:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 5, "max_score": null, "hits": [ { "_index": "book1", "_type": "english", "_id": "6IUkUmUBRzBxBrDgFok2", "_score": null, "_source": { "name": "test goog my money", "age": [ 14, 54, 45, 34 ], "class": "dsfdsf", "addr": "中國" }, "sort": [ 54 ] }, { "_index": "book1", "_type": "english", "_id": "54UiUmUBRzBxBrDgfIl9", "_score": null, "_source": { "name": "test goog my money", "age": [ 11, 13, 14 ], "class": "dsfdsf", "addr": "中國" }, "sort": [ 14 ] } ] }, "aggregations": { "maxage": { "value": 54 } } }
Example 3: take the values from a script. Compute the average age over all documents, and the average of age plus 10
POST /book1/_search?pretty { "size":0, "aggs": { "avg_age": { "avg": { "script": { "source": "doc.age.value" } } }, "avg_age10": { "avg": { "script": { "source": "doc.age.value + 10" } } } } }
Result 3:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "avg_age": { "value": 7.585365853658536 }, "avg_age10": { "value": 17.585365853658537 } } }
Example 4: specify a field and use _value in the script to refer to the field's value
POST /book1/_search?pretty { "size":0, "aggs": { "sun_age": { "sum": { "field":"age", "script": { "source": "_value * 2" } } } } }
Result 4:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "sun_age": { "value": 942 } } }
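Example 5: supply a value for documents that have no value in the field. If missing is not specified, documents without a value for the field are ignored: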
POST /book1/_search?pretty { "size":0, "aggs": { "sun_age": { "avg": { "field":"age", "missing":15 } } } }
Result 5:
{ "took": 12, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "sun_age": { "value": 12.847826086956522 } } }
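The value 12.8478 can be checked by hand from figures reported elsewhere in this article: the stats aggregation reports a sum of 471 over 38 age values, and the missing aggregation reports 8 documents without an age. The sketch below assumes each of those 8 documents contributes exactly one substituted value of 15.

```python
import math

# Figures taken from the results in this article.
age_sum = 471          # "sum" from the stats aggregation
age_value_count = 38   # "count" from the stats aggregation
docs_missing_age = 8   # "doc_count" from the missing aggregation
missing_value = 15     # the "missing" parameter in Example 5

# Assumption: each document without an age contributes one value of 15.
avg_with_missing = (age_sum + docs_missing_age * missing_value) / (
    age_value_count + docs_missing_age
)
print(avg_with_missing)
```

Under that assumption the average is 591 / 46, which agrees with the value returned in Result 5.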
Example 1 (_count API): count the documents in the book1 index whose age is 12
POST book1/english/_count { "query":{ "match":{ "age":12 } } }
Result 1:
{ "count": 16, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 } }
Example 1 (value_count): count the values present in the age field
POST /book1/_search?size=0 { "aggs":{ "age_count":{ "value_count":{ "field":"age" } } } }
Result 1:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_count": { "value": 38 } } }
Example 1 (cardinality): count the distinct values of age, alongside the total number of values
POST /book1/_search?size=0 { "aggs":{ "age_count":{ "value_count":{ "field":"age" } }, "name_count":{ "cardinality":{ "field":"age" } } } }
Result 1:
{ "took": 16, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "name_count": { "value": 11 }, "age_count": { "value": 38 } } }
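Note: the field holds 38 values; after removing duplicates, 11 distinct values remain. Despite its name, the name_count aggregation here is computed on the age field.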
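The difference between the two counts can be sketched on a handful of documents. The documents below are hypothetical (only the two multi-valued age arrays are borrowed from the sample hits earlier in this article), and the sketch uses an exact set, whereas the cardinality aggregation in Elasticsearch is an approximate count based on HyperLogLog++.

```python
# Hypothetical documents; age may hold several values, as in the sample data.
docs = [
    {"name": "a", "age": [14, 54, 45, 34]},
    {"name": "b", "age": [11, 13, 14]},
    {"name": "c", "age": 12},
    {"name": "d", "age": 12},
    {"name": "e"},  # no age field: contributes nothing to either count
]


def field_values(doc, field):
    """Return the field's values as a list (empty when the field is absent)."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


all_values = [v for doc in docs for v in field_values(doc, "age")]

value_count = len(all_values)       # every value counts, duplicates included
cardinality = len(set(all_values))  # distinct values only (exact, unlike ES)

print(value_count, cardinality)
```

Because a multi-valued field contributes every one of its values, value_count can exceed the number of documents that have the field.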
Example 1 (stats): return count, min, max, avg, and sum in a single aggregation
POST /book1/_search?size=0 { "aggs":{ "age_count":{ "stats":{ "field":"age" } } } }
Result 1:
{ "took": 12, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_count": { "count": 38, "min": 1, "max": 54, "avg": 12.394736842105264, "sum": 471 } } }
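Extended statistics (extended_stats) return four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.
Example 1: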
POST /book1/_search?size=0 { "aggs":{ "age_stats":{ "extended_stats":{ "field":"age" } } } }
Result 1:
{ "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_stats": { "count": 38, "min": 1, "max": 54, "avg": 12.394736842105264, "sum": 471, "sum_of_squares": 11049, "variance": 137.13365650969527, "std_deviation": 11.710408041981085, "std_deviation_bounds": { "upper": 35.81555292606743, "lower": -11.026079241856905 } } } }
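The extra fields can be rederived from count, sum, and sum_of_squares alone. The sketch below assumes the population variance (dividing by count, not count minus one), which is what reproduces the figures in the result above.

```python
import math

# Taken from the extended_stats result above.
count = 38
total = 471
sum_of_squares = 11049

avg = total / count
# Population variance: mean of squares minus square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)
# std_deviation_bounds: mean plus/minus two standard deviations.
bounds = {
    "upper": avg + 2 * std_deviation,
    "lower": avg - 2 * std_deviation,
}

print(avg, variance, std_deviation, bounds)
```

The lower bound is negative here even though no age is negative: the bounds are a plain arithmetic interval around the mean, not a range of observed values.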
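The percentiles aggregation orders the values of the specified field (or script) from low to high, accumulates the share each value accounts for (as a percentage of all matching documents), and returns the value found at each requested percentage. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry of the result below can be read as: 50% of the documents have an age value <= 12, or conversely, documents with age <= 12 make up 50% of all matching documents.
Example 1: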
POST /book1/_search?size=0 { "aggs":{ "age_percentiles":{ "percentiles":{ "field":"age" } } } }
Result 1:
{ "took": 16, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_percentiles": { "values": { "1.0": 1, "5.0": 1, "25.0": 1, "50.0": 12, "75.0": 13, "95.0": 40.600000000000016, "99.0": 54 } } } }
Example 2: specify the percentiles (which values sit at 50%, 96%, and 99%)
POST /book1/_search?size=0 { "aggs":{ "age_percentiles":{ "percentiles":{ "field":"age", "percents" : [50,96,99] } } } }
Result 2:
{ "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_percentiles": { "values": { "50.0": 12, "96.0": 44.779999999999966, "99.0": 54 } } } }
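Note: 50% of the values are <= 12, 96% of the values are <= 44.78, and 99% of the values are <= 54.
Example 1 (percentile_ranks): compute the percentage of values at or below 12 and at or below 35. This is the inverse of the percentiles aggregation above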
POST /book1/_search?size=0 { "aggs":{ "aggs_perc_rank":{ "percentile_ranks":{ "field":"age", "values" : [12,35] } } } }
Result 1:
{ "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "aggs_perc_rank": { "values": { "12.0": 71.05263157894737, "35.0": 92.76315789473685 } } } }
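Interpretation: about 71% of the values are at or below 12, and about 92.8% are at or below 35.
Geo bounds aggregation: see the official documentation: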
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html
Geo centroid aggregation: see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html
Example 1 (terms): group documents by the value of age
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age" } } } }
Note: this is equivalent to GROUP BY age
Result 1:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 1, "buckets": [ { "key": 12, "doc_count": 16 }, { "key": 1, "doc_count": 11 }, { "key": 13, "doc_count": 2 }, { "key": 14, "doc_count": 2 }, { "key": 11, "doc_count": 1 }, { "key": 16, "doc_count": 1 }, { "key": 21, "doc_count": 1 }, { "key": 33, "doc_count": 1 }, { "key": 34, "doc_count": 1 }, { "key": 45, "doc_count": 1 } ] } } }
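Notes on the result:
"doc_count_error_upper_bound": 0: the upper bound of the error on the document counts
"sum_other_doc_count": 1: the total document count of the buckets that were not returned
By default, the top 10 buckets are returned, ordered by document count from high to low.
Example 2: size specifies how many buckets to return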
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "size":5 } } } }
Result 2:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 6, "buckets": [ { "key": 12, "doc_count": 16 }, { "key": 1, "doc_count": 11 }, { "key": 13, "doc_count": 2 }, { "key": 14, "doc_count": 2 }, { "key": 11, "doc_count": 1 } ] } } }
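How size and sum_other_doc_count relate can be sketched with the per-key counts read off the terms results in this article. This is a single-process illustration only: Elasticsearch merges per-shard top lists, which is where doc_count_error_upper_bound comes from. The tie-break by ascending key is inferred from the bucket order shown in the results, not from a specification.

```python
# Per-key document counts read off the terms results in this article
# (key 54 is the one bucket left out of the default top 10).
doc_counts = {12: 16, 1: 11, 13: 2, 14: 2, 11: 1, 16: 1,
              21: 1, 33: 1, 34: 1, 45: 1, 54: 1}


def top_buckets(counts, size=10):
    """Return (buckets, sum_other_doc_count), ordered by count desc, then key asc."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ordered[:size]]
    # Everything beyond the first `size` buckets is only reported as a total.
    sum_other = sum(c for _, c in ordered[size:])
    return buckets, sum_other


buckets, sum_other = top_buckets(doc_counts, size=5)
print(buckets, sum_other)
```

With size=5 the sketch leaves a total of 6 in the unreturned buckets, and with the default of 10 it leaves 1, matching the sum_other_doc_count values in Results 2 and 1.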
Example 3: show the error bound on each bucket
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "size":5, "show_term_doc_count_error": true } } } }
Result 3:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 6, "buckets": [ { "key": 12, "doc_count": 16, "doc_count_error_upper_bound": 0 }, { "key": 1, "doc_count": 11, "doc_count_error_upper_bound": 0 }, { "key": 13, "doc_count": 2, "doc_count_error_upper_bound": 0 }, { "key": 14, "doc_count": 2, "doc_count_error_upper_bound": 0 }, { "key": 11, "doc_count": 1, "doc_count_error_upper_bound": 0 } ] } } }
Example 4: shard_size specifies how many buckets each shard returns
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "size":3, "shard_size": 20 } } } }
Result 4:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 9, "buckets": [ { "key": 12, "doc_count": 16 }, { "key": 1, "doc_count": 11 }, { "key": 13, "doc_count": 2 } ] } } }
order specifies how the buckets are sorted
Example 5: sort by the bucket key "_key"
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "size":3, "order":{"_key":"desc"} } } } }
Result 5:
{ "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 35, "buckets": [ { "key": 54, "doc_count": 1 }, { "key": 45, "doc_count": 1 }, { "key": 34, "doc_count": 1 } ] } } }
Example 6: sort by the document count "_count"
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "size":3, "order":{"_count":"desc"} } } } }
Result 6:
{ "took": 91, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 9, "buckets": [ { "key": 12, "doc_count": 16 }, { "key": 1, "doc_count": 11 }, { "key": 13, "doc_count": 2 } ] } } }
Example 7: sort by a metric computed in a sub-aggregation
POST /book1/_search?size=0 { "aggs":{ "age_terms":{ "terms":{ "field":"age", "order":{"max_age":"desc"} }, "aggs":{ "max_age":{ "max":{ "field":"age" } }, "min_age":{ "min":{ "field":"age" } } } } } }
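Note: documents are first grouped by age, then the max and min are computed for each group, and finally the groups are sorted by max in descending order
Example 8: filter buckets by matching their values against regular expressions. include and exclude take regular expressions, not wildcards, so a prefix match would be written 裏.* rather than 裏*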
POST book1/_search?size=0 { "aggs":{ "tags":{ "terms":{ "field":"name", "include":"裏*", "exclude":"test*" } } } }
Result 8:
{ "took": 22, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "tags": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "裏", "doc_count": 13 } ] } } }
Example 9: filter buckets with an explicit list of values
POST book1/_search?size=0 { "aggs":{ "Chinese":{ "terms":{ "field":"name", "include":["裏","國"] } }, "Test":{ "terms":{ "field":"name", "exclude":["test","the"] } } } }
Result 9:
{ "took": 23, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "Test": { "doc_count_error_upper_bound": 6, "sum_other_doc_count": 559, "buckets": [ { "key": "裏", "doc_count": 12 }, { "key": "否", "doc_count": 11 }, { "key": "a", "doc_count": 7 }, { "key": "default", "doc_count": 7 }, { "key": "document", "doc_count": 7 }, { "key": "for", "doc_count": 7 }, { "key": "absolute", "doc_count": 6 }, { "key": "account", "doc_count": 6 }, { "key": "accurate", "doc_count": 6 }, { "key": "documents", "doc_count": 6 } ] }, "Chinese": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "國", "doc_count": 4 } ] } } }
Example 10: group by a value computed by a script
POST book1/_search?size=0 { "aggs":{ "name":{ "terms":{ "script":{ "source":"doc['age'].value + doc.age.value", "lang": "painless" } } } } }
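Note: a script can read the field's value as doc['age'].value or doc.age.value
Result 10:
Example 1 (filter): from the documents matched by the query, select those that satisfy the filter and aggregate over them, that is, filter first and then aggregate. Contrast this with Example 9 above (bucket filtering), which aggregates first and then filters the buckets.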
POST book1/_search?size=0 { "aggs":{ "age_terms":{ "filter":{ "match":{"name":"test"} }, "aggs":{ "avg_age":{ "avg":{"field":"age" } } } } } }
Result 1:
{ "took": 152, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "doc_count": 5, "avg_age": { "value": 19.9 } } } }
Example 1 (filters): count the documents containing 'test' and the documents containing '裏', each in its own bucket
POST book1/_search?size=0 { "aggs":{ "age_terms":{ "filters":{ "filters":{ "test":{ "match":{"name":"test"} }, "china":{ "match":{"name":"裏"} } } } } } }
Result 1:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "buckets": { "china": { "doc_count": 13 }, "test": { "doc_count": 5 } } } } }
For instance, count the error and warning entries in a log index to drive log alerting:
GET logs/_search { "size": 0, "aggs": { "messages": { "filters": { "filters": { "errors": { "match": { "body": "error" } }, "warnings": { "match": { "body": "warning" } } } } } } }
Example 2: specify a key for the bucket that collects all other documents
POST book1/_search?size=0 { "aggs":{ "age_terms":{ "filters":{ "other_bucket_key": "other_messages", "filters":{ "test":{ "match":{"name":"test"} }, "china":{ "match":{"name":"裏"} } } } } } }
Result 2:
{ "took": 9, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_terms": { "buckets": { "china": { "doc_count": 13 }, "test": { "doc_count": 5 }, "other_messages": { "doc_count": 23 } } } } }
Example 1 (range): group documents into custom ranges of age
POST book1/_search?size=0 { "aggs":{ "age_range":{ "range":{ "field":"age", "keyed":true, "ranges":[ { "to":20, "key":"TW" }, { "from":25, "to":40, "key":"TH" }, { "from":60, "key":"SIX" } ] } } } }
Result 1:
{ "took": 3, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "age_range": { "buckets": { "TW": { "to": 20, "doc_count": 31 }, "TH": { "from": 25, "to": 40, "doc_count": 2 }, "SIX": { "from": 60, "doc_count": 0 } } } } }
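The bucketing rule behind this result can be sketched as follows: each range includes its from value and excludes its to value, and a missing bound means the range is open on that side. The documents below are hypothetical and are given only as lists of age values; a document is counted once in every range that any of its values falls into.

```python
# The ranges from the request above; a missing bound means "unbounded".
ranges = [
    {"key": "TW", "to": 20},
    {"key": "TH", "from": 25, "to": 40},
    {"key": "SIX", "from": 60},
]


def in_range(value, rng):
    """True when from <= value < to (from inclusive, to exclusive)."""
    lower_ok = "from" not in rng or value >= rng["from"]
    upper_ok = "to" not in rng or value < rng["to"]
    return lower_ok and upper_ok


def range_buckets(docs, ranges):
    """Count documents per range; a document counts once per range it falls in."""
    result = {}
    for rng in ranges:
        result[rng["key"]] = sum(
            1 for ages in docs if any(in_range(a, rng) for a in ages)
        )
    return result


# Hypothetical documents, each given as its list of age values.
docs = [[12], [1], [14, 54, 45, 34], [11, 13, 14], [20], [25], [40]]
print(range_buckets(docs, ranges))
```

Ranges need not cover every value: with the ranges above, ages from 20 up to 24 and from 40 up to 59 fall into no bucket at all.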
Example 1 (date_range): group documents into ranges of a date field
POST /bank/_search?size=0 { "aggs": { "range": { "date_range": { "field": "date", "format": "MM-yyy", "ranges": [ { "to": "now-10M/M" }, { "from": "now-10M/M" } ] } } } }
Result 1:
{ "took": 115, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1000, "max_score": 0, "hits": [] }, "aggregations": { "range": { "buckets": [ { "key": "*-2017-08-01T00:00:00.000Z", "to": 1501545600000, "to_as_string": "2017-08-01T00:00:00.000Z", "doc_count": 0 }, { "key": "2017-08-01T00:00:00.000Z-*", "from": 1501545600000, "from_as_string": "2017-08-01T00:00:00.000Z", "doc_count": 0 } ] } } }
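The date_histogram aggregation groups documents by day, month, year, and so on. Buckets can span a calendar unit, namely year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), or second (1s), or a custom time interval. In Elasticsearch 7.2 and later, the interval parameter used below is deprecated in favor of calendar_interval and fixed_interval.
Example 1: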
POST /bank/_search?size=0 { "aggs": { "sales_over_time": { "date_histogram": { "field": "date", "interval": "month" } } } }
Result 1:
{ "took": 9, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1000, "max_score": 0, "hits": [] }, "aggregations": { "sales_over_time": { "buckets": [] } } }
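Since the result above contains no buckets, the following sketch uses hypothetical dates to show what monthly bucketing amounts to. It keys each bucket by the first day of its month and fills months without documents with a count of 0, which mirrors the default gap-filling behaviour of date_histogram; the output format is simplified and is not the exact response shape.

```python
from collections import Counter
from datetime import date

# Hypothetical document dates; not taken from the bank index.
dates = [date(2017, 8, 3), date(2017, 8, 21), date(2017, 10, 1), date(2017, 10, 9)]


def month_histogram(dates):
    """Count dates per calendar month, filling empty months with 0."""
    counts = Counter((d.year, d.month) for d in dates)
    if not counts:
        return []
    (year, month), last = min(counts), max(counts)
    buckets = []
    while (year, month) <= last:
        buckets.append({
            "key_as_string": f"{year:04d}-{month:02d}-01",
            "doc_count": counts[(year, month)],  # Counter yields 0 for absent keys
        })
        # Advance to the next calendar month, rolling over the year in December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return buckets


print(month_histogram(dates))
```

An empty input yields an empty bucket list, as in the result above.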
Example (missing): count the documents that have no value for the field
POST /book1/_search?size=0 { "aggs" : { "account_without_age" : { "missing" : { "field" : "age" } } } }
Result 1:
{ "took": 10, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 41, "max_score": 0, "hits": [] }, "aggregations": { "account_without_age": { "doc_count": 8 } } }
Geo distance aggregation: see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html