Elasticsearch Series, Part 6: Aggregation Analysis (Introduction, Metric Aggregations, Bucket Aggregations)

1. Introduction to Aggregation Analysis

1. What is aggregation analysis in ES?

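Aggregation analysis is an important database feature: it performs aggregate computations over the data set returned by a query, such as finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. As both a search engine and a data store, ES likewise provides powerful aggregation capabilities.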

Aggregations that compute metrics such as the maximum, minimum, sum, or average over a data set are called metric aggregations (metric) in ES.

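Relational databases, besides offering aggregate functions, can also group the queried data with GROUP BY and then compute metrics on each group. In ES, the counterpart of GROUP BY is called a bucket aggregation (bucketing).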

ES also provides matrix aggregations (matrix) and pipeline aggregations (pipeline), though these are still being refined.

2. How to write an aggregation query in ES

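Aggregations are defined in the search request body under the aggregations node, using the following syntax:

"aggregations" : {
    "<aggregation_name>" : { <!-- name of the aggregation -->
        "<aggregation_type>" : { <!-- type of the aggregation -->
            <aggregation_body> <!-- aggregation body: which fields to aggregate on -->
        }
        [,"meta" : {  [<meta_data_body>] } ]? <!-- optional metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- sub-aggregations defined inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]* <!-- name of a further aggregation -->
}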

Note:

aggregations can also be abbreviated as aggs
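As a minimal sketch of this syntax, the Python snippet below assembles a request body as a plain dictionary and serializes it to JSON. The helper name `agg` is hypothetical and exists only for this illustration (it is not part of any ES client); the aggregation and field names are borrowed from examples used later in this article.

```python
import json

def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one aggregation entry (hypothetical helper, not an ES API)."""
    entry = {agg_type: body}          # "<aggregation_type>": { <aggregation_body> }
    if meta is not None:
        entry["meta"] = meta          # optional metadata
    if sub_aggs:
        entry["aggs"] = sub_aggs      # optional sub-aggregations
    return entry

request_body = {
    "size": 0,
    "aggs": {
        # "<aggregation_name>": { ... }
        "age_terms": agg(
            "terms",
            {"field": "age"},
            sub_aggs={"max_balance": agg("max", {"field": "balance"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request; the nesting mirrors the name / type / body / sub-aggregation layers of the syntax above.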

3. Value sources for aggregations

The values an aggregation computes over can come from a field, or from the result of a script.

2. Metric Aggregations

1. max min sum avg

Example 1: find the maximum balance across all customers

POST /bank/_search
{
  "size": 0, 
  "aggs": {
    "masssbalance": {
      "max": {
        "field": "balance"
      }
    }
  }
}

Result 1:

{
  "took": 2080,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "masssbalance": { "value": 49989 } }
}

Example 2: find the maximum balance among customers aged 24

POST /bank/_search
{
  "size": 2, 
  "query": { "match": { "age": 24 } },
  "sort": [
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {
      "max": {
        "field": "balance"
      }
    }
  }
}

Result 2:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 42,
    "max_score": null,
    "hits": [
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "697",
        "_score": null,
        "_source": {
          "account_number": 697,
          "balance": 48745,
          "firstname": "Mallory",
          "lastname": "Emerson",
          "age": 24,
          "gender": "F",
          "address": "318 Dunne Court",
          "employer": "Exoplode",
          "email": "malloryemerson@exoplode.com",
          "city": "Montura",
          "state": "LA"
        },
        "sort": [
          48745
        ]
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "917",
        "_score": null,
        "_source": {
          "account_number": 917,
          "balance": 47782,
          "firstname": "Parks",
          "lastname": "Hurst",
          "age": 24,
          "gender": "M",
          "address": "933 Cozine Avenue",
          "employer": "Pyramis",
          "email": "parkshurst@pyramis.com",
          "city": "Lindcove",
          "state": "GA"
        },
        "sort": [
          47782
        ]
      }
    ]
  },
  "aggregations": { "max_balance": { "value": 48745 } }
}

Example 3: values from a script. Compute the average age of all customers, and the same average with 10 added to each age

POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "script": {
          "source": "doc.age.value"
        }
      }
    },
    "avg_age10": {
      "avg": {
        "script": {
          "source": "doc.age.value + 10"
        }
      }
    }
  }
}

Result 3:

{
  "took": 86,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "avg_age": { "value": 30.171 }, "avg_age10": { "value": 40.171 } }
}

Example 4: specify a field and read its value in the script via _value

POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": {
      "sum": {
        "field": "balance",
        "script": {
            "source": "_value * 1.03"
        }
      }
    }
  }
}

Result 4:

{
  "took": 165,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "sum_balance": { "value": 26486282.11 } }
}

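Example 5: supply a value for documents in which the field has no value. If missing is not specified, documents lacking a value for the field are ignored.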

POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 18
      }
    }
  }
}
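To make the effect of missing concrete, here is a small sketch in plain Python. The list of ages is hypothetical (None stands for a document without the age field), and the function only imitates the behaviour described above; it does not call ES.

```python
def avg(values, missing=None):
    """Average of per-document values; None marks a document without the field."""
    if missing is None:
        # No default given: documents lacking the field are skipped.
        used = [v for v in values if v is not None]
    else:
        # Default given: it is substituted for every absent value.
        used = [missing if v is None else v for v in values]
    return sum(used) / len(used) if used else None

ages = [20, 30, None, 40]        # hypothetical documents; the third has no age

ignored = avg(ages)              # (20 + 30 + 40) / 3
filled = avg(ages, missing=18)   # (20 + 30 + 18 + 40) / 4

print(ignored, filled)
```

With the default of 18 the document without an age pulls the average down, whereas without missing it simply does not take part.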

2. Document count (count)

Example 1: count the documents in the bank index whose age is 24

POST /bank/_doc/_count
{
  "query": {
    "match": {
      "age" : 24
    }
  }
}

Result 1:

{
  "count": 42,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}

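3. Value count: counts the documents that have a value for a field (more precisely, it counts values, so a multi-valued field contributes once per value)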

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
  "took": 2022,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "age_count": { "value": 1000 } }
}

4. cardinality: count of distinct values

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    },
    "state_count": {
      "cardinality": {
        "field": "state.keyword"
      }
    }
  }
}

Note: for state, the keyword version of the field (state.keyword) is used.

Result 1:

{
  "took": 2074,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "state_count": { "value": 51 }, "age_count": { "value": 21 } }
}
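The difference between value_count (item 3) and cardinality (item 4) can be sketched in a few lines of Python. The sample list is hypothetical, with None standing for a document that has no state. This sketch counts exactly; ES computes cardinality approximately (using the HyperLogLog++ algorithm), so on large data sets its answer may deviate slightly.

```python
def value_count(values):
    """Number of values present (documents without the field are skipped)."""
    return sum(1 for v in values if v is not None)

def cardinality(values):
    """Number of distinct values present (exact, unlike ES's approximation)."""
    return len({v for v in values if v is not None})

states = ["LA", "GA", "LA", "PA", None]   # hypothetical documents

print(value_count(states))   # counts every present value, duplicates included
print(cardinality(states))   # counts each distinct value once
```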

5. stats: returns five values: count, max, min, avg, sum

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "stats": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "age_stats": { "count": 1000, "min": 20, "max": 40, "avg": 30.171, "sum": 30171 } }
}

 6. Extended stats

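Extended statistics: returns four more results than stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.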

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {
      "extended_stats": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "age_stats": { "count": 1000, "min": 20, "max": 40, "avg": 30.171, "sum": 30171, "sum_of_squares": 946393, "variance": 36.10375899999996, "std_deviation": 6.008640362012022, "std_deviation_bounds": { "upper": 42.18828072402404, "lower": 18.153719275975956 } } }
}
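The extra fields can be re-derived from count, sum and sum_of_squares alone. The sketch below plugs in the three numbers from the result above and assumes the population variance formula (mean of squares minus square of the mean), which reproduces the reported values up to floating-point rounding.

```python
import math

# Values copied from the extended_stats result above.
count = 1000
total = 30171              # "sum"
sum_of_squares = 946393

avg = total / count
variance = sum_of_squares / count - avg ** 2   # population variance
std_deviation = math.sqrt(variance)

# std_deviation_bounds: mean plus/minus two standard deviations
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(avg, variance, std_deviation, upper, lower)
```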

7. Percentiles: the values found at given percentiles

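The values of the specified field (or script) are ordered from smallest to largest, and the share of documents (as a percentage of all matching documents) is accumulated along that order; the aggregation returns the value at which each requested percentage is reached. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry of the result below can be read as: 50% of the documents have an age <= 31, or conversely, documents with age <= 31 make up 50% of all matching documents.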

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
 "aggregations": { "age_percents": { "values": { "1.0": 20, "5.0": 21, "25.0": 25, "50.0": 31, "75.0": 35.00000000000001, "95.0": 39, "99.0": 40 } } }
}

About the result:

50% of the documents have an age <= 31; conversely, documents with age <= 31 make up 50% of all matching documents.
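The accumulation described above can be sketched with an exact nearest-rank percentile. The per-age document counts are taken from the terms aggregation results shown later in this article; the function name is hypothetical. ES itself computes percentiles approximately (TDigest by default), so its output can contain interpolated values such as 35.00000000000001.

```python
import math

# Documents per age, as listed by the terms aggregation results in this article.
AGE_COUNTS = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}

def percentile(counts, percent):
    """Nearest-rank percentile: smallest value whose cumulative count reaches the rank."""
    total = sum(counts.values())
    rank = max(1, math.ceil(percent * total / 100))
    seen = 0
    for value in sorted(counts):       # walk the values from smallest to largest
        seen += counts[value]          # accumulate the documents seen so far
        if seen >= rank:
            return value

result = {p: percentile(AGE_COUNTS, p) for p in (1, 5, 25, 50, 75, 95, 99)}
print(result)
```

For the 50th percentile, the cumulative count reaches 498 at age 30 and 559 at age 31, so 31 is the first age covering half of the 1000 documents.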

Example 2: specify the percentiles

POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age",
        "percents" : [95, 99, 99.9] 
      }
    }
  }
}

Result 2:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "age_percents": { "values": { "95.0": 39, "99.0": 40, "99.9": 40 } } }
}

8. Percentile ranks: the share of documents whose value is less than or equal to a given value

Example 1: compute the share of documents whose age is at or below 25, and at or below 30; this is the inverse of item 7

POST /bank/_search?size=0
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          25,
          30
        ]
      }
    }
  }
}

Result 1:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "gge_perc_rank": { "values": { "25.0": 26.1, "30.0": 49.2 } } }
}

About the result: roughly 26.1% of the documents have an age at or below 25, and roughly 49.2% have an age at or below 30.
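For comparison, the exact shares can be computed from the per-age document counts that the terms aggregation results in this article list. The sketch below does that in plain Python (the function name is hypothetical). The exact figures differ slightly from the 26.1 and 49.2 that ES reports, which is consistent with percentile_ranks being computed approximately.

```python
# Documents per age, as listed by the terms aggregation results in this article.
AGE_COUNTS = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}

def percentile_rank(counts, value):
    """Exact percentage of documents whose value is <= the given value."""
    total = sum(counts.values())
    at_or_below = sum(n for v, n in counts.items() if v <= value)
    return 100 * at_or_below / total

rank_25 = percentile_rank(AGE_COUNTS, 25)
rank_30 = percentile_rank(AGE_COUNTS, 30)
print(rank_25, rank_30)
```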

9. Geo Bounds aggregation: computes the bounding range of the geo points in a document set

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

10. Geo Centroid aggregation: computes the coordinates of the centroid of the geo points

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

3. Bucket Aggregations

 

1. Terms Aggregation: groups documents by the terms (values) of a field

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age"
      }
    }
  }
}

Result 1:

{
  "took": 2000,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 463,
      "buckets": [
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },
        {
          "key": 26,
          "doc_count": 59
        },
        {
          "key": 32,
          "doc_count": 52
        },
        {
          "key": 35,
          "doc_count": 52
        },
        {
          "key": 36,
          "doc_count": 52
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 33,
          "doc_count": 50
        },
        {
          "key": 34,
          "doc_count": 49
        }
      ]
    }
  }
}

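About the result:

"doc_count_error_upper_bound": 0: the upper bound of the error in the document counts

"sum_other_doc_count": 463: the number of documents belonging to the terms that were not returned

By default, the top 10 groups are returned, ordered by document count from highest to lowest: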

 "buckets": [
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },
    .............
]

There are 61 documents with age 31 and 60 documents with age 39.
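How the top 10 buckets and sum_other_doc_count relate can be sketched in plain Python. The per-age counts are those listed by the terms results in this article; the function name is hypothetical. The sketch runs over one complete data set, so its counts are exact; in ES each shard returns its own top terms and the lists are merged, which is why a doc_count_error_upper_bound exists at all.

```python
# Documents per age, as listed by the terms aggregation results in this article.
AGE_COUNTS = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}

def terms_agg(counts, size=10):
    """Top `size` buckets by doc_count descending, ties broken by key ascending."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": n} for k, n in ordered[:size]]
    # Documents that fall into terms beyond the returned buckets.
    other = sum(n for _, n in ordered[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}

top10 = terms_agg(AGE_COUNTS)
print([b["key"] for b in top10["buckets"]], top10["sum_other_doc_count"])
```

The ten returned buckets hold 537 documents, leaving 463 of the 1000 in the terms that were not returned.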

size specifies how many groups to return:

Example 2: return 20 groups

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 20
      }
    }
  }
}

Result 2:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 35,
      "buckets": [
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },
        {
          "key": 26,
          "doc_count": 59
        },
        {
          "key": 32,
          "doc_count": 52
        },
        {
          "key": 35,
          "doc_count": 52
        },
        {
          "key": 36,
          "doc_count": 52
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 33,
          "doc_count": 50
        },
        {
          "key": 34,
          "doc_count": 49
        },
        {
          "key": 30,
          "doc_count": 47
        },
        {
          "key": 21,
          "doc_count": 46
        },
        {
          "key": 40,
          "doc_count": 45
        },
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 37,
          "doc_count": 42
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 38,
          "doc_count": 39
        }
      ]
    }
  }
}

Example 3: show the error value on each group

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size": 20,
        "show_term_doc_count_error": true
      }
    }
  }
}

Result 3:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 25,
      "sum_other_doc_count": 716,
      "buckets": [
        {
          "key": 31,
          "doc_count": 61,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 39,
          "doc_count": 60,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 26,
          "doc_count": 59,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 32,
          "doc_count": 52,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 36,
          "doc_count": 52,
          "doc_count_error_upper_bound": 0
        }
      ]
    }
  }
}

Example 4: shard_size specifies how many groups each shard returns

The default value of shard_size is:
index with a single shard: = size
multiple shards: = size * 1.5 + 10

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "size": 5,
        "shard_size": 20
      }
    }
  }
}

Result 4:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 25,
      "sum_other_doc_count": 716,
      "buckets": [
        {
          "key": 31,
          "doc_count": 61
        },
        {
          "key": 39,
          "doc_count": 60
        },
        {
          "key": 26,
          "doc_count": 59
        },
        {
          "key": 32,
          "doc_count": 52
        },
        {
          "key": 36,
          "doc_count": 52
        }
      ]
    }
  }
}
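The default for shard_size stated above can be written as a small function. This is only a sketch of the rule as given in this article: the function name is hypothetical, and truncating the result to an integer is an assumption rather than something the article states.

```python
def default_shard_size(size, number_of_shards):
    """Default shard_size per the rule above (integer truncation is an assumption)."""
    if number_of_shards == 1:
        return size                  # a single shard already sees every document
    return int(size * 1.5 + 10)      # over-fetch per shard to reduce merge errors

print(default_shard_size(10, 1))
print(default_shard_size(10, 5))
```

Asking each shard for more groups than the final size is what keeps the merged counts close to the true ones; an explicit shard_size, as in this example, overrides the default.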

order specifies how the groups are sorted

Example 5: sort by document count

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_count" : "asc" }
      }
    }
  }
}

Result 5:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 584,
      "buckets": [
        {
          "key": 29,
          "doc_count": 35
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 38,
          "doc_count": 39
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 37,
          "doc_count": 42
        },
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 40,
          "doc_count": 45
        },
        {
          "key": 21,
          "doc_count": 46
        }
      ]
    }
  }
}

Example 6: sort by group key

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order" : { "_key" : "asc" }
      }
    }
  }
}

Result 6:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 549,
      "buckets": [
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 21,
          "doc_count": 46
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 26,
          "doc_count": 59
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 29,
          "doc_count": 35
        }
      ]
    }
  }
}

Example 7: sort by a metric computed on each group

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {
        "field": "age",
        "order": { "max_balance": "asc" }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        },
        "min_balance": {
          "min": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Result 7:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 511,
      "buckets": [
        {
          "key": 27,
          "doc_count": 39,
          "min_balance": {
            "value": 1110
          },
          "max_balance": {
            "value": 46868
          }
        },
        {
          "key": 39,
          "doc_count": 60,
          "min_balance": {
            "value": 3589
          },
          "max_balance": {
            "value": 47257
          }
        },
        {
          "key": 37,
          "doc_count": 42,
          "min_balance": {
            "value": 1360
          },
          "max_balance": {
            "value": 47546
          }
        },
        {
          "key": 32,
          "doc_count": 52,
          "min_balance": {
            "value": 1031
          },
          "max_balance": {
            "value": 48294
          }
        },
        {
          "key": 26,
          "doc_count": 59,
          "min_balance": {
            "value": 1447
          },
          "max_balance": {
            "value": 48466
          }
        },
        {
          "key": 33,
          "doc_count": 50,
          "min_balance": {
            "value": 1314
          },
          "max_balance": {
            "value": 48734
          }
        },
        {
          "key": 24,
          "doc_count": 42,
          "min_balance": {
            "value": 1011
          },
          "max_balance": {
            "value": 48745
          }
        },
        {
          "key": 31,
          "doc_count": 61,
          "min_balance": {
            "value": 2384
          },
          "max_balance": {
            "value": 48758
          }
        },
        {
          "key": 34,
          "doc_count": 49,
          "min_balance": {
            "value": 3001
          },
          "max_balance": {
            "value": 48997
          }
        },
        {
          "key": 29,
          "doc_count": 35,
          "min_balance": {
            "value": 3596
          },
          "max_balance": {
            "value": 49119
          }
        }
      ]
    }
  }
}

Example 8: filter groups by matching their values against a regular expression

GET /_search
{
    "aggs" : {
        "tags" : {
            "terms" : {
                "field" : "tags",
                "include" : ".*sport.*", "exclude" : "water_.*"
            }
        }
    }
}

Example 9: filter groups with an explicit list of values

GET /_search
{
    "aggs" : {
        "JapaneseCars" : {
             "terms" : {
                 "field" : "make",
                 "include" : ["mazda", "honda"]
             }
         },
        "ActiveCarManufacturers" : {
             "terms" : {
                 "field" : "make",
                 "exclude" : ["rover", "jensen"]
             }
         }
    }
}

Example 10: group by a value computed by a script

GET /_search
{
    "aggs" : {
        "genres" : {
            "terms" : {
                "script" : { "source": "doc['genre'].value", "lang": "painless" }
            }
        }
    }
}

Example 11: handling missing values

GET /_search
{
    "aggs" : {
        "tags" : {
             "terms" : {
                 "field" : "tags",
                 "missing": "N/A" 
             }
         }
    }
}

Result 11:

{
  "took": 2059,
  "timed_out": false,
  "_shards": {
    "total": 58,
    "successful": 58,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1015,
    "max_score": 1,
    "hits": [
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "25",
        "_score": 1,
        "_source": {
          "account_number": 25,
          "balance": 40540,
          "firstname": "Virginia",
          "lastname": "Ayala",
          "age": 39,
          "gender": "F",
          "address": "171 Putnam Avenue",
          "employer": "Filodyne",
          "email": "virginiaayala@filodyne.com",
          "city": "Nicholson",
          "state": "PA"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "44",
        "_score": 1,
        "_source": {
          "account_number": 44,
          "balance": 34487,
          "firstname": "Aurelia",
          "lastname": "Harding",
          "age": 37,
          "gender": "M",
          "address": "502 Baycliff Terrace",
          "employer": "Orbalix",
          "email": "aureliaharding@orbalix.com",
          "city": "Yardville",
          "state": "DE"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "99",
        "_score": 1,
        "_source": {
          "account_number": 99,
          "balance": 47159,
          "firstname": "Ratliff",
          "lastname": "Heath",
          "age": 39,
          "gender": "F",
          "address": "806 Rockwell Place",
          "employer": "Zappix",
          "email": "ratliffheath@zappix.com",
          "city": "Shaft",
          "state": "ND"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "119",
        "_score": 1,
        "_source": {
          "account_number": 119,
          "balance": 49222,
          "firstname": "Laverne",
          "lastname": "Johnson",
          "age": 28,
          "gender": "F",
          "address": "302 Howard Place",
          "employer": "Senmei",
          "email": "lavernejohnson@senmei.com",
          "city": "Herlong",
          "state": "DC"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "126",
        "_score": 1,
        "_source": {
          "account_number": 126,
          "balance": 3607,
          "firstname": "Effie",
          "lastname": "Gates",
          "age": 39,
          "gender": "F",
          "address": "620 National Drive",
          "employer": "Digitalus",
          "email": "effiegates@digitalus.com",
          "city": "Blodgett",
          "state": "MD"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "145",
        "_score": 1,
        "_source": {
          "account_number": 145,
          "balance": 47406,
          "firstname": "Rowena",
          "lastname": "Wilkinson",
          "age": 32,
          "gender": "M",
          "address": "891 Elton Street",
          "employer": "Asimiline",
          "email": "rowenawilkinson@asimiline.com",
          "city": "Ripley",
          "state": "NH"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "183",
        "_score": 1,
        "_source": {
          "account_number": 183,
          "balance": 14223,
          "firstname": "Hudson",
          "lastname": "English",
          "age": 26,
          "gender": "F",
          "address": "823 Herkimer Place",
          "employer": "Xinware",
          "email": "hudsonenglish@xinware.com",
          "city": "Robbins",
          "state": "ND"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "190",
        "_score": 1,
        "_source": {
          "account_number": 190,
          "balance": 3150,
          "firstname": "Blake",
          "lastname": "Davidson",
          "age": 30,
          "gender": "F",
          "address": "636 Diamond Street",
          "employer": "Quantasis",
          "email": "blakedavidson@quantasis.com",
          "city": "Crumpler",
          "state": "KY"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "208",
        "_score": 1,
        "_source": {
          "account_number": 208,
          "balance": 40760,
          "firstname": "Garcia",
          "lastname": "Hess",
          "age": 26,
          "gender": "F",
          "address": "810 Nostrand Avenue",
          "employer": "Quiltigen",
          "email": "garciahess@quiltigen.com",
          "city": "Brooktrails",
          "state": "GA"
        }
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "222",
        "_score": 1,
        "_source": {
          "account_number": 222,
          "balance": 14764,
          "firstname": "Rachelle",
          "lastname": "Rice",
          "age": 36,
          "gender": "M",
          "address": "333 Narrows Avenue",
          "employer": "Enaut",
          "email": "rachellerice@enaut.com",
          "city": "Wright",
          "state": "AZ"
        }
      }
    ]
  },
  "aggregations": {
    "tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "N/A",
          "doc_count": 1014
        },
        {
          "key": "red",
          "doc_count": 1
        }
      ]
    }
  }
}

2. Filter Aggregation: aggregates over the documents that satisfy a filter query

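From the documents matched by the query, those that satisfy the filter condition are selected and aggregated: filter first, then aggregate.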

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "filter": {"match":{"gender":"F"}},
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Result 1:

{
  "took": 163,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count": 493,
      "avg_age": {
        "value": 30.3184584178499
      }
    }
  }
}

3. Filters Aggregation: aggregates over several filter groups

Example 1:

Prepare the data:

PUT /logs/_doc/_bulk?refresh
{"index":{"_id":1}}
{"body":"warning: page could not be rendered"}
{"index":{"_id":2}}
{"body":"authentication error"}
{"index":{"_id":3}}
{"body":"warning: connection timed out"}

Get the aggregation result for the combined filters:

GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": { "filters": { "errors": { "match": { "body": "error" } }, "warnings": { "match": { "body": "warning" } } }
      }
    }
  }
}

Result of the request above:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": { "messages": { "buckets": { "errors": { "doc_count": 1 }, "warnings": { "doc_count": 2 } } } }
}

Example 2: specify a key for the group that collects all other documents

GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "other_bucket_key": "other_messages",
        "filters": {
          "errors": {
            "match": {
              "body": "error"
            }
          },
          "warnings": {
            "match": {
              "body": "warning"
            }
          }
        }
      }
    }
  }
}

Result 2:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "buckets": {
        "errors": {
          "doc_count": 1
        },
        "warnings": {
          "doc_count": 2
        },
        "other_messages": { "doc_count": 0 }
      }
    }
  }
}
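The bucketing in both filters examples can be imitated in plain Python over the three log lines indexed above. The `matches` function is a rough, hypothetical stand-in for a match query (it lowercases the text, splits it into word tokens and looks for the term); real analysis in ES depends on the field's analyzer.

```python
import re

docs = [
    "warning: page could not be rendered",
    "authentication error",
    "warning: connection timed out",
]

def matches(body, term):
    """Rough stand-in for a match query: is the term among the word tokens?"""
    return term in re.findall(r"\w+", body.lower())

def filters_agg(bodies, filters, other_bucket_key=None):
    """Count documents per named filter; unmatched ones go to the other bucket."""
    buckets = {name: 0 for name in filters}
    if other_bucket_key is not None:
        buckets[other_bucket_key] = 0
    for body in bodies:
        hit = False
        for name, term in filters.items():
            if matches(body, term):     # a document may land in several buckets
                buckets[name] += 1
                hit = True
        if not hit and other_bucket_key is not None:
            buckets[other_bucket_key] += 1
    return buckets

result = filters_agg(docs, {"errors": "error", "warnings": "warning"},
                     other_bucket_key="other_messages")
print(result)
```

Every log line here matches one of the two filters, which is why the other_messages bucket stays at 0.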

4. Range Aggregation: groups by value ranges

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": { "field": "age", "ranges": [ { "to": 25 }, { "from": 25, "to": 35 }, { "from": 35 } ] },
      "aggs": {
        "bmax": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Result 1:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_range": {
      "buckets": [
        {
          "key": "*-25.0",
          "to": 25,
          "doc_count": 225,
          "bmax": {
            "value": 49587
          }
        },
        {
          "key": "25.0-35.0",
          "from": 25,
          "to": 35,
          "doc_count": 485,
          "bmax": {
            "value": 49795
          }
        },
        {
          "key": "35.0-*",
          "from": 35,
          "doc_count": 290,
          "bmax": {
            "value": 49989
          }
        }
      ]
    }
  }
}
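Each range includes its from value and excludes its to value, which the document counts above confirm. The sketch below reproduces the three buckets in plain Python from the per-age counts listed by the terms results in this article; the function name is hypothetical and the bmax sub-aggregation is left out.

```python
# Documents per age, as listed by the terms aggregation results in this article.
AGE_COUNTS = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}

def range_agg(counts, ranges):
    """Bucket by [from, to): from is inclusive, to is exclusive, either may be open."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        n = sum(c for v, c in counts.items()
                if (lo is None or v >= lo) and (hi is None or v < hi))
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": n})
    return buckets

buckets = range_agg(AGE_COUNTS, [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}])
print(buckets)
```

Because the boundaries are half-open, a customer aged exactly 25 lands in the middle bucket only, and the three counts add up to all 1000 documents.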

Example 2: specify a key for each group

POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",
        "keyed": true,
        "ranges": [
          {
            "to": 25,
            "key": "Ld"
          },
          {
            "from": 25,
            "to": 35,
            "key": "Md"
          },
          {
            "from": 35,
            "key": "Od"
          }
        ]
      }
    }
  }
}

Result 2:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_range": {
      "buckets": {
        "Ld": {
          "to": 25,
          "doc_count": 225
        },
        "Md": {
          "from": 25,
          "to": 35,
          "doc_count": 485
        },
        "Od": {
          "from": 35,
          "doc_count": 290
        }
      }
    }
  }
}

5. Date Range Aggregation: groups by date ranges

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": { "field": "date", "format": "MM-yyy", "ranges": [ { "to": "now-10M/M" }, { "from": "now-10M/M" } ] }
    }
  }
}

Result 1:

{
  "took": 115,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "range": {
      "buckets": [
        {
          "key": "*-2017-08-01T00:00:00.000Z",
          "to": 1501545600000,
          "to_as_string": "2017-08-01T00:00:00.000Z",
          "doc_count": 0
        },
        {
          "key": "2017-08-01T00:00:00.000Z-*",
          "from": 1501545600000,
          "from_as_string": "2017-08-01T00:00:00.000Z",
          "doc_count": 0
        }
      ]
    }
  }
}

6. Date Histogram Aggregation: date histogram (bar chart) aggregation

This aggregates statistics by day, month, year, and so on. Buckets can be built on the intervals year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or on a specified time interval.

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "date_histogram": {
        "field": "date",
        "interval": "month"
      }
    }
  }
}

Result 1:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales_over_time": {
      "buckets": []
    }
  }
}
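The request above returned no buckets. To show what monthly grouping does when documents carry dates, here is a sketch in plain Python on hypothetical dates (they are not taken from the bank index, and the function name is made up for this illustration). It lists only months that contain documents, whereas ES can also return empty buckets for the gaps in between.

```python
from collections import Counter
from datetime import date

def month_histogram(dates):
    """Count documents per calendar month, keyed by the first day of the month."""
    counts = Counter(d.replace(day=1) for d in dates)
    return [{"key": month.isoformat(), "doc_count": counts[month]}
            for month in sorted(counts)]

# Hypothetical document dates, not taken from the bank index.
sample = [date(2017, 8, 3), date(2017, 8, 20), date(2017, 10, 1)]

buckets = month_histogram(sample)
print(buckets)
```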

7. Missing Aggregation: a bucket of the documents that have no value for a field

POST /bank/_search?size=0
{
    "aggs" : {
        "account_without_a_age" : {
            "missing" : { "field" : "age" }
        }
    }
}

8. Geo Distance Aggregation: groups by ranges of geographic distance

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html
