Master the query syntax of aggregation analysis.
Master the use of metric aggregations and bucket aggregations.
What is ES aggregation analysis?
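Aggregation analysis is an important database feature: it computes aggregates over the data set returned by a query. Examples include the maximum or minimum of a field (or of an expression), or its sum and average. ES is both a search engine and a database, and it also provides powerful aggregation capabilities.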
In ES, aggregations that compute metrics such as the maximum, minimum, sum, or average of a data set are called metric aggregations.
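Besides aggregate functions, relational databases can group query results with GROUP BY and then compute metrics for each group. In ES, grouping is called bucketing, and the corresponding aggregations are called bucket aggregations.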
ES also provides matrix aggregations and pipeline aggregations, but at the time of writing these were still being refined.
How to write an ES aggregation query
Define aggregations in the request body under the aggregations node (which may be abbreviated to aggs), using the following syntax:
"aggregations" : {
  "<aggregation_name>" : {
    "<aggregation_type>" : {
      <aggregation_body>
    }
    [,"meta" : { [<meta_data_body>] } ]?
    [,"aggregations" : { [<sub_aggregation>]+ } ]?
  }
  [,"<aggregation_name_2>" : { ... } ]*
}
Value sources for aggregations
The aggregated values can come from a field or from the result of a script.
max, min, sum, avg
Maximum balance across all customers ("size": 0 returns no hits, only the aggregation result):
POST /bank/_search
{
  "size": 0,
  "aggs": {
    "max_balance": {
      "max": { "field": "balance" }
    }
  }
}
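Maximum balance among customers aged 24. The aggregation runs only over the documents matched by the query; the request also returns the top 2 hits, sorted by balance in descending order:
POST /bank/_search
{
  "size": 2,
  "query": { "match": { "age": 24 } },
  "sort": [ { "balance": { "order": "desc" } } ],
  "aggs": {
    "max_balance": { "max": { "field": "balance" } }
  }
}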
Values from a script: the average age of all customers, and the average of age + 10:
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "script": { "source": "doc.age.value" }
      }
    },
    "avg_age10": {
      "avg": {
        "script": { "source": "doc.age.value + 10" }
      }
    }
  }
}
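When field is specified, a script can read each value of that field through _value. Here every balance is multiplied by 1.03 before summing:
POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": {
      "sum": {
        "field": "balance",
        "script": { "source": "_value * 1.03" }
      }
    }
  }
}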
Use missing to supply a value for documents that lack the field. If missing is not set, documents without the field are ignored:
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 18
      }
    }
  }
}
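To make the effect of missing concrete, here is a minimal pure-Python sketch of what the avg aggregation computes over a few hypothetical in-memory documents. It only mirrors the arithmetic; it is not how ES evaluates aggregations.

```python
# Hypothetical sample documents; the third one has no "age" field.
docs = [{"age": 20}, {"age": 30}, {"balance": 100}]


def avg_agg(docs, field, missing=None):
    """Average of `field`, mimicking how the avg aggregation treats missing values."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            # With "missing", the document counts as if it had this value.
            values.append(missing)
        # Without "missing", the document is simply skipped.
    # No values at all: return None (ES reports a null value in that case).
    return sum(values) / len(values) if values else None


print(avg_agg(docs, "age"))              # 25.0: only two documents contribute
print(avg_agg(docs, "age", missing=18))  # (20 + 30 + 18) / 3
```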
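Document count: count
Number of customers aged 24, counted with the _count API rather than an aggregation:
POST /bank/_doc/_count
{
  "query": {
    "match": { "age": 24 }
  }
}
Typed endpoints such as /bank/_doc/_count are deprecated in 7.x and removed in 8.0; POST /bank/_count works without a type.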
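Cardinality: number of distinct values
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "cardinality": { "field": "age" }
    },
    "state_count": {
      "cardinality": { "field": "state.keyword" }
    }
  }
}
For the text field state, aggregate on its keyword sub-field state.keyword. The count returned by cardinality is approximate.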
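The exact count that cardinality approximates can be sketched in Python as below, using hypothetical documents. ES instead uses a HyperLogLog++-based estimate (tunable with precision_threshold), because an exact set of values needs memory proportional to the number of distinct values.

```python
# Hypothetical sample documents.
docs = [
    {"age": 31, "state": "TX"},
    {"age": 31, "state": "CA"},
    {"age": 25, "state": "TX"},
    {"state": "NY"},  # no age
]


def exact_cardinality(docs, field):
    """Exact number of distinct values of `field`; docs without the field are ignored."""
    return len({doc[field] for doc in docs if field in doc})


print(exact_cardinality(docs, "age"))    # 2
print(exact_cardinality(docs, "state"))  # 3
```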
Value count: number of values of a field (for a single-valued field, this equals the number of documents that have the field)
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": { "field": "age" }
    }
  }
}
Stats: count, min, max, avg and sum in a single request
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "stats": { "field": "age" }
    }
  }
}
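Extended stats
Extended statistics add four results to stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations (std_deviation_bounds).
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": { "field": "age" }
    }
  }
}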
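The sketch below shows what stats and extended_stats compute for a non-empty list of numbers. It assumes the population variance and a default sigma of 2 for the bounds (the plus/minus two standard deviations mentioned above); it illustrates the formulas, not the ES implementation.

```python
import math


def extended_stats(values, sigma=2.0):
    """stats + extended_stats style results for a non-empty list of numbers."""
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_of_squares / n - avg * avg
    std = math.sqrt(variance)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```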
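Percentiles: the value at each percentile
The values of the field (or script) are sorted in ascending order, and the percentage of matched documents covered is accumulated value by value. The aggregation returns the value reached at each requested percentage. By default, the values at the [1, 5, 25, 50, 75, 95, 99] percentiles are returned. In the result below, the middle entry "50.0": 31 means that 50% of the documents have age <= 31. Conversely, documents with age <= 31 make up 50% of all matched documents. The returned values are approximate.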
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": { "field": "age" }
    }
  }
}
Result:
"aggregations": {
  "age_percents": {
    "values": {
      "1.0": 20,
      "5.0": 21,
      "25.0": 25,
      "50.0": 31,
      "75.0": 35,
      "95.0": 39,
      "99.0": 40
    }
  }
}
Specify which percentiles to return:
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age",
        "percents": [95, 99, 99.9]
      }
    }
  }
}
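As a rough intuition for percentiles, the sketch below applies the simple nearest-rank method to a hypothetical list of ages. This is a swapped-in textbook method. ES uses the approximate TDigest algorithm with interpolation, so its results can differ from these.

```python
import math


def nearest_rank_percentile(values, p):
    """Smallest value v such that at least p% of the values are <= v (nearest-rank)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


ages = [20, 21, 25, 31, 31, 35, 39, 40]  # hypothetical sample
for p in [1, 25, 50, 99]:
    print(p, nearest_rank_percentile(ages, p))
```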
Percentile ranks: percentage of documents whose value is less than or equal to each given value
POST /bank/_search?size=0
{
  "aggs": {
    "age_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [25, 30]
      }
    }
  }
}
Result:
"aggregations": {
  "age_perc_rank": {
    "values": {
      "25.0": 26.1,
      "30.0": 49.3
    }
  }
}
Geo Bounds aggregation: the bounding box of the geo points in the document set
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html
Geo Centroid aggregation: the centroid of the geo points
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html
Terms Aggregation: group by field value
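POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": { "field": "age" }
    }
  }
}
Result:
"aggregations": {
  "age_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 463,
    "buckets": [
      { "key": 31, "doc_count": 61 },
      { "key": 39, "doc_count": 60 },
      { "key": 26, "doc_count": 59 },
      ...
    ]
  }
}
doc_count_error_upper_bound is the maximum possible error in the document counts. sum_other_doc_count is the number of documents whose terms were not returned. By default, buckets holds the 10 terms with the highest document counts, in descending order of count.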
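On a single shard, the terms aggregation behaves like the counting sketch below (hypothetical data, one value per document). Ties are broken by ascending key here only to make the output deterministic.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Top-`size` buckets by doc count, plus sum_other_doc_count for the rest."""
    counts = Counter(values)
    # Sort by count descending, then by key ascending for a stable result.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    returned = sum(count for _, count in top)
    return {
        "sum_other_doc_count": len(values) - returned,
        "buckets": [{"key": key, "doc_count": count} for key, count in top],
    }


ages = [31, 31, 31, 39, 39, 26]  # hypothetical: one age value per document
print(terms_agg(ages, size=2))
```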
size: number of buckets to return
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}
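shard_size: number of buckets each shard returns to the coordinating node
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size": 20
      }
    }
  }
}
The default shard_size is size when the index has a single shard, and size * 1.5 + 10 when it has multiple shards.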
show_term_doc_count_error: show the error bound of the document count on each bucket
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size": 20,
        "show_term_doc_count_error": true
      }
    }
  }
}
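Why shard_size matters: each shard reports only its local top terms, so a term that is frequent overall but never in any single shard's top list can be missed. The hypothetical two-shard example below simulates that merge step. It is a simplified model, not the actual ES reduce code.

```python
from collections import Counter


def top_terms(counts, n):
    """Top-n (term, count) pairs by count, ties broken by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def merged_terms(shards, size, shard_size):
    """Each shard reports its local top `shard_size`; the coordinator sums and cuts to `size`."""
    total = Counter()
    for shard_values in shards:
        for term, count in top_terms(Counter(shard_values), shard_size):
            total[term] += count
    return top_terms(total, size)


# Hypothetical data: "c" is the most frequent term overall (8 docs),
# but it is not the top term on either shard.
shards = [["a"] * 5 + ["c"] * 4, ["b"] * 5 + ["c"] * 4]
print(merged_terms(shards, size=1, shard_size=1))  # [('a', 5)]: wrong, "c" was never reported
print(merged_terms(shards, size=1, shard_size=2))  # [('c', 8)]: correct
```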
order: sort order of the buckets
Sort by document count:
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "_count": "asc" }
      }
    }
  }
}
Sort by bucket key:
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "_key": "asc" }
      }
    }
  }
}
Metric values per bucket (sub-aggregations)
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "max_balance": "asc" }
      },
      "aggs": {
        "max_balance": { "max": { "field": "balance" } },
        "min_balance": { "min": { "field": "balance" } }
      }
    }
  }
}
Sorting buckets by a metric value
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "max_balance": "asc" }
      },
      "aggs": {
        "max_balance": { "max": { "field": "balance" } }
      }
    }
  }
}
For a multi-value metric such as stats, refer to one of its values as <aggregation_name>.<metric>:
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "stats_balance.max": "asc" }
      },
      "aggs": {
        "stats_balance": { "stats": { "field": "balance" } }
      }
    }
  }
}
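Ordering by a sub-aggregation amounts to computing the metric for each bucket and sorting on it. A sketch over hypothetical documents:

```python
from collections import defaultdict


def terms_ordered_by_max(docs, group_field, metric_field, descending=False):
    """Group docs by `group_field` and order buckets by max(`metric_field`) per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "max": max(vals)}
        for key, vals in groups.items()
    ]
    return sorted(buckets, key=lambda b: b["max"], reverse=descending)


docs = [  # hypothetical sample
    {"age": 20, "balance": 500},
    {"age": 20, "balance": 900},
    {"age": 30, "balance": 700},
]
print([b["key"] for b in terms_ordered_by_max(docs, "age", "balance")])  # [30, 20]
```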
Filtering buckets
Filter by document count (only terms with at least 60 documents are returned):
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "min_doc_count": 60
      }
    }
  }
}
Keep only the listed values:
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "include": [20, 24]
      }
    }
  }
}
Match terms with regular expressions:
GET /_search
{
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags",
        "include": ".*sport.*",
        "exclude": "water_.*"
      }
    }
  }
}
Include or exclude exact lists of values:
GET /_search
{
  "aggs": {
    "JapaneseCars": {
      "terms": {
        "field": "make",
        "include": ["mazda", "honda"]
      }
    },
    "ActiveCarManufacturers": {
      "terms": {
        "field": "make",
        "exclude": ["rover", "jensen"]
      }
    }
  }
}
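The sketch below mimics how include and exclude patterns combine. A term must fully match include and is dropped if it fully matches exclude, so exclude takes precedence. Python's re stands in for Lucene regular expressions here. The two syntaxes are not identical, so treat this only as an illustration of the matching rules.

```python
import re


def filter_terms(terms, include=None, exclude=None):
    """Keep terms that fully match `include` and do not fully match `exclude`."""
    kept = []
    for term in terms:
        if include is not None and not re.fullmatch(include, term):
            continue
        # exclude is checked after include, so it wins when both match.
        if exclude is not None and re.fullmatch(exclude, term):
            continue
        kept.append(term)
    return kept


tags = ["sport", "water_sport", "sports_news", "news"]  # hypothetical tags
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))  # ['sport', 'sports_news']
```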
Grouping by a script value
GET /_search
{
  "aggs": {
    "genres": {
      "terms": {
        "script": {
          "source": "doc['genre'].value",
          "lang": "painless"
        }
      }
    }
  }
}
Handling missing values
Documents without a tags value are counted in a bucket with the key N/A:
GET /_search
{
  "aggs": {
    "tags": {
      "terms": {
        "field": "tags",
        "missing": "N/A"
      }
    }
  }
}
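Filter Aggregation: aggregate over the documents that match a filter
Among the documents matched by the query, select those that match the filter condition and aggregate over them (here, the average age of female customers):
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "filter": { "match": { "gender": "F" } },
      "aggs": {
        "avg_age": { "avg": { "field": "age" } }
      }
    }
  }
}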
Filters Aggregation: aggregate over several filters, one bucket per filter
Index a few log messages (the bulk body is newline-delimited JSON, one object per line), then count errors and warnings:
PUT /logs/_doc/_bulk?refresh
{ "index" : { "_id" : 1 } }
{ "body" : "warning: page could not be rendered" }
{ "index" : { "_id" : 2 } }
{ "body" : "authentication error" }
{ "index" : { "_id" : 3 } }
{ "body" : "warning: connection timed out" }

GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": { "match": { "body": "error" } },
          "warnings": { "match": { "body": "warning" } }
        }
      }
    }
  }
}
Use other_bucket_key to name the bucket that collects documents matching none of the filters:
GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "other_bucket_key": "other_messages",
        "filters": {
          "errors": { "match": { "body": "error" } },
          "warnings": { "match": { "body": "warning" } }
        }
      }
    }
  }
}
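The filters aggregation can be modeled as one counter per named predicate. In the sketch below, the tokens helper is a hypothetical and very rough stand-in for text analysis, and the fourth document is invented to populate the other bucket. A document that matches several filters is counted in each of them.

```python
import re


def tokens(text):
    """Hypothetical, very rough stand-in for text analysis: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def filters_agg(docs, filters, other_bucket_key=None):
    """Count docs per named predicate; a doc may land in several buckets."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets


logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
    {"body": "info: service started"},  # hypothetical extra doc for the other bucket
]
filters = {
    "errors": lambda d: "error" in tokens(d["body"]),
    "warnings": lambda d: "warning" in tokens(d["body"]),
}
print(filters_agg(logs, filters, other_bucket_key="other_messages"))
# {'errors': 1, 'warnings': 2, 'other_messages': 1}
```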
Range Aggregation: group by numeric ranges
In each range, from is inclusive and to is exclusive:
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "ranges": [
          { "to": 25 },
          { "from": 25, "to": 35 },
          { "from": 35 }
        ]
      },
      "aggs": {
        "bmax": { "max": { "field": "balance" } }
      }
    }
  }
}
Set keyed to true and give each range its own key:
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "keyed": true,
        "ranges": [
          { "to": 25, "key": "Ld" },
          { "from": 25, "to": 35, "key": "Md" },
          { "from": 35, "key": "Od" }
        ]
      }
    }
  }
}
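A sketch of the range bucketing rule on hypothetical ages. The default keys are modeled on the from-to format ES uses, with * marking an open end.

```python
def range_agg(values, ranges):
    """Count values per range; "from" is inclusive and "to" is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        # Default key such as "25.0-35.0", or "*" for an open end.
        default_key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": r.get("key", default_key), "doc_count": count})
    return buckets


ages = [20, 24, 25, 34, 35, 40]  # hypothetical sample
print(range_agg(ages, [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}]))
```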
Date Range Aggregation: group by date ranges
POST /sales/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "ranges": [
          { "to": "now-10M/M" },
          { "from": "now-10M/M" }
        ]
      }
    }
  }
}
Date Histogram Aggregation: date histogram
Aggregates by day, month, year, and so on. The supported calendar intervals are year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m) and second (1s). Custom fixed intervals are also supported.
Monthly buckets:
POST /sales/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}
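Buckets of 90 minutes:
POST /sales/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "90m"
      }
    }
  }
}
In ES 7.2 and later, interval is deprecated in favor of calendar_interval (for calendar units such as month) and fixed_interval (for fixed durations such as 90m).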
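A monthly date_histogram essentially groups by (year, month). The sketch below also emits empty months between the first and last non-empty bucket, which is what ES does with the default min_doc_count of 0. It ignores time zones (ES uses UTC unless time_zone is set) and assumes a non-empty list of hypothetical dates.

```python
from collections import Counter
from datetime import date


def monthly_histogram(dates):
    """Count dates per calendar month, filling empty months between first and last.

    Assumes `dates` is non-empty.
    """
    counts = Counter((d.year, d.month) for d in dates)
    year, month = min(counts)
    last = max(counts)
    buckets = []
    while (year, month) <= last:
        buckets.append((f"{year:04d}-{month:02d}", counts.get((year, month), 0)))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return buckets


sales_dates = [date(2015, 1, 2), date(2015, 1, 20), date(2015, 3, 5)]  # hypothetical
print(monthly_histogram(sales_dates))
# [('2015-01', 2), ('2015-02', 0), ('2015-03', 1)]
```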
Missing Aggregation: a bucket for documents with missing values
Puts all documents that lack the given field into a single bucket for aggregation:
POST /bank/_search?size=0
{
  "aggs": {
    "account_without_a_age": {
      "missing": { "field": "age" }
    }
  }
}
Geo Distance Aggregation: group documents into distance rings around an origin point
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html