[Elasticsearch Learning] Metrics Aggregation

Date: 2020-06-10

Tags: elasticsearch, learning, metrics aggregation. Category: log analysis


Elasticsearch provides several kinds of aggregations: Bucketing Aggregation, Metrics Aggregation, Matrix Aggregation, and Pipeline Aggregation.

1. Metrics Aggregations

Metrics aggregations compute metrics over a set of documents.

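1) Avg Aggregation: a single-value metrics aggregation that computes the average of the values extracted from the documents. The values can come from a numeric field or be generated by a script.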

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "avg_price" : { "avg" : { "field" : "taxful_total_price" } }
    }
}

Using a script (here a value script that multiplies each field value by a correction factor):

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "avg_corrected_grade" : {
            "avg" : {
                "field" : "taxful_total_price",
                "script" : {
                    "lang": "painless",
                    "source": "_value * params.correction",
                    "params" : {
                        "correction" : 1.2
                    }
                }
            }
        }
    }
}

If a document has no value for the field, the missing parameter sets the value to use instead. Without it, such documents are ignored.
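To make the semantics concrete, here is a minimal Python sketch of what the two requests above compute. It assumes an in-memory list of documents. The documents and the helper `avg_metric` are hypothetical, and per-shard execution is not modeled. Each value is optionally transformed by a value script, mirroring `_value * params.correction`. Documents without the field either receive the `missing` default or are skipped.

```python
from typing import Callable, List, Optional


def avg_metric(docs: List[dict], field: str,
               missing: Optional[float] = None,
               value_script: Optional[Callable[[float], float]] = None) -> Optional[float]:
    """Average of `field` over docs, mimicking the avg aggregation's options (hypothetical helper)."""
    total, count = 0.0, 0
    for doc in docs:
        value = doc.get(field, missing)  # `missing` fills in an absent value
        if value is None:
            continue  # without `missing`, documents lacking the field are skipped
        if value_script is not None:
            value = value_script(value)  # plays the role of "_value * params.correction"
        total += value
        count += 1
    return total / count if count else None  # no values at all: no average


docs = [{"taxful_total_price": 10.0}, {"taxful_total_price": 30.0}, {}]
print(avg_metric(docs, "taxful_total_price"))               # 20.0
print(avg_metric(docs, "taxful_total_price", missing=0.0))  # about 13.33 (40 / 3)
print(avg_metric(docs, "taxful_total_price", value_script=lambda v: v * 1.2))
```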

2) Weighted Avg Aggregation: computes the weighted average of the values extracted from the documents.

Formula: ∑(value * weight) / ∑(weight)

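weighted_avg parameters:

value: the field or script that supplies the values. Sub-parameters: field, the field to extract values from; missing, the value to use when a document lacks the field.

weight: the field or script that supplies the weights. Sub-parameters: field, the field to extract weights from; missing, the value to use when a document lacks the field.

format: the format of the returned numeric value.

value_type: a hint about the type of the values when using pure scripts or unmapped fields.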

POST /kibana_sample_data_ecommerce/_search
{
    "size": 0,
    "aggs" : {
        "weighted_grade": {
            "weighted_avg": {
                "value": {
                    "field": "taxful_total_price"
                },
                "weight": {
                    "field": "total_quantity"
                }
            }
        }
    }
}
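The formula above can be sketched in a few lines of Python. The helper `weighted_avg` and the sample orders are hypothetical.

The sketch follows our reading of the Elasticsearch documentation on missing values:
- A document without a value is skipped.
- A document without a weight is treated as having weight 1.

Treat that default as an assumption.

```python
from typing import List, Optional


def weighted_avg(docs: List[dict], value_field: str, weight_field: str,
                 value_missing: Optional[float] = None,
                 weight_missing: float = 1.0) -> Optional[float]:
    """sum(value * weight) / sum(weight), as in the weighted_avg aggregation (sketch)."""
    weighted_sum, weight_sum = 0.0, 0.0
    for doc in docs:
        value = doc.get(value_field, value_missing)
        if value is None:
            continue  # no value and no `missing` default: skip the document
        weight = doc.get(weight_field, weight_missing)  # assumed default weight of 1
        weighted_sum += value * weight
        weight_sum += weight
    return weighted_sum / weight_sum if weight_sum else None


orders = [
    {"taxful_total_price": 50.0, "total_quantity": 1},
    {"taxful_total_price": 20.0, "total_quantity": 4},
]
print(weighted_avg(orders, "taxful_total_price", "total_quantity"))  # (50*1 + 20*4) / 5 = 26.0
```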

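3) Cardinality Aggregation: computes an approximate count of the distinct values of a field. The count is approximate because it is based on the HyperLogLog++ algorithm.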

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "type_count" : {
            "cardinality" : {
                "field" : "day_of_week" // count how many distinct day_of_week values exist
            }
        }
    }
}

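Because the count is approximate, it has limited accuracy. Cardinality provides the precision_threshold parameter to control this by trading memory for accuracy: higher precision uses more memory. precision_threshold defines a count below which the result is expected to be close to exact. Above that count, the result may become less accurate. The maximum supported value is 40000 and the default is 3000.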
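To show where the memory-versus-accuracy trade-off comes from, here is a sketch of plain HyperLogLog in Python. It is not the HyperLogLog++ variant Elasticsearch uses, and it does not model precision_threshold.

Assumptions in the sketch:
- The hash is the first 8 bytes of SHA-1, which is an arbitrary choice.
- The register count `2**p` stands in as the memory knob. Its standard error is roughly 1.04 / sqrt(2**p).

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog (illustrative; not Elasticsearch's HLL++ implementation)."""

    def __init__(self, p: int = 10):
        self.p = p                 # number of index bits
        self.m = 1 << p            # number of registers: more registers, more memory, less error
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an assumption for this sketch)
        x = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = x >> (64 - self.p)                    # top p bits choose a register
        w = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return estimate


hll = HyperLogLog(p=10)
for i in range(10_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # close to 10000, typically within a few percent
```

Adding the same value again never changes a register. Duplicates therefore do not inflate the estimate, which is the property a distinct count needs.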

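4) Max and Min Aggregations: single-value metrics that return the maximum and minimum of a numeric field across the aggregated documents. They operate on the double representation of the values. As a result, they may be approximate for long values whose absolute value exceeds 2^53.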

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "max_price" : { "max" : { "field" : "taxful_total_price" } }
    }
}

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "min_price" : { "min" : { "field" : "taxful_total_price" } }
    }
}
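The 2^53 caveat follows from IEEE-754 doubles having a 53-bit significand. The hypothetical helper `max_as_double` below converts values to doubles before comparing them, so it loses the true maximum just past 2^53.

```python
from typing import Iterable


def max_as_double(values: Iterable[int]) -> float:
    """Maximum computed on the double representation of the values (illustrative helper)."""
    return max(float(v) for v in values)


big = 2 ** 53
print(float(big + 1) == float(big))              # True: 2^53 + 1 has no exact double
print(max_as_double([big, big + 1]) == big + 1)  # False: the exact maximum is lost
```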

5) Sum Aggregation: a single-value metric that sums the numeric values extracted from the aggregated documents.

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "sum_prices" : { "sum" : { "field" : "taxful_total_price" } }
    }
}

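6) Value Count Aggregation: a single-value metric that counts the number of values extracted from the aggregated documents. For a single-valued field such as day_of_week, this equals the number of documents that have the field. For a multi-valued field, it can be larger.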

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "types_count" : { "value_count" : { "field" : "day_of_week" } }
    }
}

"aggregations" : {
    "types_count" : {
      "value" : 4675
    }
  }
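A small Python sketch of the "values, not documents" distinction. The helper `value_count` and the `tags` documents are hypothetical, and the sketch assumes no duplicate values within a single document.

```python
from typing import List


def value_count(docs: List[dict], field: str) -> int:
    """Count the values of `field` (not the documents), like value_count (sketch)."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        total += len(value) if isinstance(value, list) else 1  # every array element counts
    return total


docs = [{"tags": ["a", "b"]}, {"tags": "c"}, {}]
print(value_count(docs, "tags"))  # 3 values, although only 2 documents have the field
```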

7) Stats Aggregation: a multi-value metric that returns several statistics at once: min, max, sum, count, and avg.

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "price_stats" : { "stats" : { "field" : "taxful_total_price" } }
    }
}

Response:

"aggregations" : {
    "price_stats" : {
      "count" : 4675,
      "min" : 6.98828125,
      "max" : 2250.0,
      "avg" : 75.05542864304813,
      "sum" : 350884.12890625
    }
  }
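For reference, a minimal Python sketch of the five numbers stats returns. It works on a plain list, assumes at least one value, and uses a hypothetical helper name.

```python
from typing import Dict, List


def stats(values: List[float]) -> Dict[str, float]:
    """count, min, max, avg and sum, as returned by the stats aggregation (sketch)."""
    count = len(values)
    total = sum(values)
    return {"count": count, "min": min(values), "max": max(values),
            "avg": total / count, "sum": total}


print(stats([6.0, 10.0, 20.0]))
# {'count': 3, 'min': 6.0, 'max': 20.0, 'avg': 12.0, 'sum': 36.0}
```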

8) Extended Stats Aggregation: a multi-value metric that extends stats with additional statistics such as sum_of_squares, variance, std_deviation, and std_deviation_bounds.

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "price_stats" : { "extended_stats" : { "field" : "taxful_total_price" } }
    }
}

Response:

"aggregations" : {
    "price_stats" : {
      "count" : 4675,
      "min" : 6.98828125,
      "max" : 2250.0,
      "avg" : 75.05542864304813,
      "sum" : 350884.12890625,
      "sum_of_squares" : 3.9367749294174194E7, // sum of squares
      "variance" : 2787.59157113862,           // variance
      "std_deviation" : 52.79764740155209,     // standard deviation
      "std_deviation_bounds" : {
        "upper" : 180.6507234461523,   // avg + 2 * std_deviation
        "lower" : -30.53986616005605   // avg - 2 * std_deviation
      }
    }
  }

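By default, extended_stats returns std_deviation_bounds at avg ± 2 standard deviations. To use a different interval, set the sigma parameter (for example "sigma": 3).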
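The response above is consistent with two relationships:
- Variance is the population variance, sum_of_squares / count - avg².
- The bounds are avg ± sigma × std_deviation with sigma = 2. For example, 75.0554 + 2 × 52.7976 ≈ 180.65.

A minimal Python sketch of those formulas, with a hypothetical helper name and sample data; it assumes a non-empty list.

```python
import math
from typing import Dict, List


def extended_stats(values: List[float], sigma: float = 2.0) -> Dict[str, object]:
    """Population variance/std deviation plus avg +/- sigma * std bounds (sketch)."""
    count = len(values)
    total = sum(values)
    avg = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = sum_of_squares / count - avg * avg  # population variance
    std = math.sqrt(variance)
    return {
        "count": count, "min": min(values), "max": max(values), "avg": avg, "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


result = extended_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(result["std_deviation"])         # 2.0
print(result["std_deviation_bounds"])  # {'upper': 9.0, 'lower': 1.0}
```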

9) String Stats Aggregation: a multi-value metric that computes statistics over string values.

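string_stats returns the following metrics:

count: the number of non-empty values

min_length: the length of the shortest term

max_length: the length of the longest term

avg_length: the average length of all terms

entropy: the Shannon entropy computed over all terms. Entropy quantifies the amount of information contained in the field. It helps characterize properties of a data set such as diversity, similarity, and randomness.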

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs" : {
        "message_stats" : { "string_stats" : { "field" : "customer_last_name.keyword" } }
    }
}

"aggregations" : {
    "message_stats" : {
      "count" : 4675,
      "min_length" : 3,
      "max_length" : 10,
      "avg_length" : 6.134545454545455,
      "entropy" : 4.688464053505158
    }
  }

show_distribution: set "show_distribution": true to also return the probability distribution of all characters.
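A Python sketch of these metrics. It assumes entropy is Shannon entropy in bits (base-2 logarithm) over the character distribution of all terms. We believe this matches Elasticsearch, but treat it as an assumption. The `distribution` entry only approximates what show_distribution returns, and the helper name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def string_stats(terms: List[str]) -> Dict[str, object]:
    """Length statistics plus Shannon entropy of the character distribution (sketch)."""
    lengths = [len(t) for t in terms]
    chars = Counter(c for t in terms for c in t)  # character frequencies across all terms
    total_chars = sum(chars.values())
    distribution = {c: n / total_chars for c, n in chars.items()}
    entropy = -sum(p * math.log2(p) for p in distribution.values())
    return {
        "count": len(terms),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": sum(lengths) / len(terms),
        "entropy": entropy,
        "distribution": distribution,  # roughly what show_distribution exposes
    }


print(string_stats(["ab", "cd"])["entropy"])  # 2.0: four equally likely characters
```

A field whose terms all repeat one character has entropy 0, and more varied characters push the value up. That is why entropy works as a rough randomness measure.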

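10) Top Hits Aggregation: keeps track of the most relevant documents being aggregated. It is meant to be used as a sub-aggregator. One or more bucket aggregators, such as terms, decide which bucket each document falls into. top_hits then returns the top matching documents within each bucket.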

Parameters:

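from: the offset from the first result to fetch.

size: the maximum number of top matching hits to return per bucket. Defaults to 3.

sort: how the top matching hits are sorted. By default, they are sorted by the score of the main query.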

POST /kibana_sample_data_ecommerce/_search?size=0
{
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "day_of_week", // group by day_of_week
                "size": 2               // return the top 2 buckets
            },
            "aggs": {
                "top_price_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "taxful_total_price": { // sort by taxful_total_price, descending
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [ "day_of_week", "customer_full_name" ,"taxful_total_price"] // fields to return
                        },
                        "size" : 1 // number of documents to return per bucket
                    }
                }
            }
        }
    }
}

"aggregations" : {
    "top_tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3130,
      "buckets" : [
        {
          "key" : "Thursday", // bucket key
          "doc_count" : 775, // number of documents in the bucket
          "top_price_hits" : {
            "hits" : {
              "total" : {
                "value" : 775,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "kibana_sample_data_ecommerce",
                  "_type" : "_doc",
                  "_id" : "CH-j7XEB-r_IFm6PJzGx",
                  "_score" : null,
                  "_source" : {
                    "customer_full_name" : "Eddie Lambert", 
                    "day_of_week" : "Thursday",
                    "taxful_total_price" : 369.96
                  },
                  "sort" : [
                    370.0
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Friday",
          "doc_count" : 770,
          "top_price_hits" : {
            "hits" : {
              "total" : {
                "value" : 770,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "kibana_sample_data_ecommerce",
                  "_type" : "_doc",
                  "_id" : "I3-j7XEB-r_IFm6PJjB6",
                  "_score" : null,
                  "_source" : {
                    "customer_full_name" : "Sultan Al Bryan",
                    "day_of_week" : "Friday",
                    "taxful_total_price" : 392.96
                  },
                  "sort" : [
                    393.0
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
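The request above groups with terms and then keeps the single most expensive order per bucket. A Python sketch of that two-level logic follows. The helper `terms_with_top_hits` and the sample orders are hypothetical.

The sketch differs from Elasticsearch in two ways:
- It breaks ties between equal-sized buckets by key.
- It ignores sharding. Elasticsearch computes terms per shard and merges the results, which is why the response reports doc_count_error_upper_bound.

```python
from collections import defaultdict
from typing import Dict, List


def terms_with_top_hits(docs: List[dict], group_field: str, sort_field: str,
                        buckets: int = 2, hits_per_bucket: int = 1) -> Dict[str, List[dict]]:
    """terms buckets by doc count, each with its top documents sorted by sort_field desc."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for doc in docs:
        grouped[doc[group_field]].append(doc)
    # Largest buckets first; ties broken by key so the output is deterministic.
    top_keys = sorted(grouped, key=lambda k: (-len(grouped[k]), k))[:buckets]
    return {
        key: sorted(grouped[key], key=lambda d: d[sort_field], reverse=True)[:hits_per_bucket]
        for key in top_keys
    }


orders = [
    {"day_of_week": "Thursday", "taxful_total_price": 369.96},
    {"day_of_week": "Thursday", "taxful_total_price": 20.0},
    {"day_of_week": "Friday", "taxful_total_price": 392.96},
    {"day_of_week": "Friday", "taxful_total_price": 45.0},
    {"day_of_week": "Monday", "taxful_total_price": 50.0},
]
print(terms_with_top_hits(orders, "day_of_week", "taxful_total_price"))
```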

11) Geo Bounds Aggregation: computes the bounding box that contains all geo_point values of a field.

PUT /museums
{
    "mappings": {
        "properties": {
            "location": {
                "type": "geo_point"
            }
        }
    }
}


POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "name": "Musée d'Orsay"}


POST /museums/_search?size=0
{
    "query" : {
        "match" : { "name" : "musée" }
    },
    "aggs" : {
        "viewport" : {
            "geo_bounds" : {
                "field" : "location",     // the geo_point field to compute the bounds from
                "wrap_longitude" : true   // whether the bounding box may overlap the international date line
            }
        }
    }
}

"aggregations" : {
    "viewport" : {
      "bounds" : {
        "top_left" : {
          "lat" : 48.86111099738628, // latitude
          "lon" : 2.3269999679178    // longitude
        },
        "bottom_right" : {
          "lat" : 48.85999997612089,
          "lon" : 2.3363889567553997
        }
      }
    }
  }
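The match query keeps only the Louvre and the Musée d'Orsay, so the bounds above come from those two points. The Python sketch below computes the box as the max/min latitude and min/max longitude. It deliberately ignores date-line wrapping, and the helper name is hypothetical.

The last digits of the Elasticsearch response differ from the indexed coordinates. We believe this is because geo_point values are stored in a compact, slightly lossy encoding.

```python
from typing import Dict, List, Tuple


def geo_bounds(points: List[Tuple[float, float]]) -> Dict[str, Dict[str, float]]:
    """Bounding box of (lat, lon) points; no handling of the international date line."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},      # north-west corner
        "bottom_right": {"lat": min(lats), "lon": max(lons)},  # south-east corner
    }


louvre_and_orsay = [(48.861111, 2.336389), (48.860000, 2.327000)]
print(geo_bounds(louvre_and_orsay))
# {'top_left': {'lat': 48.861111, 'lon': 2.327}, 'bottom_right': {'lat': 48.86, 'lon': 2.336389}}
```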

12) Geo Centroid Aggregation: computes the centroid of all geo_point values of a field across the aggregated documents.

POST /museums/_search?size=0
{
    "aggs" : {
        "centroid" : {
            "geo_centroid" : {
                "field" : "location" 
            }
        }
    }
}

Returned centroid:

"aggregations" : {
    "centroid" : {
      "location" : {
        "lat" : 51.00982965203002,
        "lon" : 3.9662131341174245
      },
      "count" : 6
    }
  }
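Checking by hand, the centroid above equals, within rounding, the plain arithmetic mean of the six museum coordinates. A minimal Python sketch of that computation (hypothetical helper, no date-line handling):

```python
from typing import List, Tuple


def geo_centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Arithmetic mean of latitudes and longitudes (sketch)."""
    n = len(points)
    return (sum(lat for lat, _ in points) / n, sum(lon for _, lon in points) / n)


museums = [
    (52.374081, 4.912350), (52.369219, 4.901618), (52.371667, 4.914722),
    (51.222900, 4.405200), (48.861111, 2.336389), (48.860000, 2.327000),
]
print(geo_centroid(museums))  # approximately (51.00983, 3.96621)
```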

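13) Percentiles Aggregation: a multi-value metric that computes one or more percentiles over the numeric values extracted from the aggregated documents. Percentiles are often used to find outliers. The results are approximate, because Elasticsearch uses the TDigest algorithm by default.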

GET kibana_sample_data_ecommerce/_search
{
    "size": 0,
    "aggs" : {
        "quantity_time_outlier" : {
            "percentiles" : {
                "field" : "total_quantity" 
            }
        }
    }
}

"aggregations" : {
    "quantity_time_outlier" : {
      "values" : {
        "1.0" : 1.0, // the 1st percentile is 1
        "5.0" : 2.0, // the 5th percentile is 2
        "25.0" : 2.0,
        "50.0" : 2.0,
        "75.0" : 2.0,
        "95.0" : 4.0,
        "99.0" : 4.0
      }
    }
  }

Use the percents parameter to specify which percentiles to compute:

GET kibana_sample_data_ecommerce/_search
{
    "size": 0,
    "aggs" : {
        "quantity_time_outlier" : {
            "percentiles" : {
                "field" : "total_quantity",
                "percents" : [5, 80, 95] 
            }
        }
    }
}
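To show what a percentile means without TDigest's approximation, here is an exact Python sketch using linear interpolation between closest ranks. This is the same convention as NumPy's default. It is a swapped-in exact method, not Elasticsearch's algorithm, so results on real data can differ slightly. The default percents match the keys in the response above, and the quantity list is hypothetical.

```python
import math
from typing import Dict, Sequence


def percentiles(values: Sequence[float],
                percents: Sequence[float] = (1, 5, 25, 50, 75, 95, 99)) -> Dict[str, float]:
    """Exact percentiles with linear interpolation (not TDigest)."""
    data = sorted(values)
    result = {}
    for p in percents:
        rank = p / 100 * (len(data) - 1)  # fractional index into the sorted data
        lo, hi = math.floor(rank), math.ceil(rank)
        result[str(float(p))] = data[lo] + (data[hi] - data[lo]) * (rank - lo)
    return result


quantities = [1, 2, 2, 2, 2, 2, 2, 3, 4, 4]
print(percentiles(quantities, [5, 80, 95]))  # roughly 1.45, 3.2 and 4.0
```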

14) Percentile Ranks Aggregation: a multi-value metric that computes, for each given value, the percentage of observed values that fall below it.

GET kibana_sample_data_ecommerce/_search
{
    "size": 0,
    "aggs" : {
        "quantity_time_ranks" : {
            "percentile_ranks" : {
                "field" : "total_quantity", 
                "values" : [3, 4]
            }
        }
    }
}

"aggregations" : {
    "quantity_time_ranks" : {
      "values" : {
        "3.0" : 91.18716577540107,  // about 91% of values are below 3
        "4.0" : 99.78609625668449   // about 99.8% of values are below 4
      }
    }
  }
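Percentile ranks invert percentiles. The exact Python sketch below counts values at or below each threshold, which is one reasonable reading of "below". Because Elasticsearch interpolates with TDigest, its numbers need not match either an inclusive or an exclusive count exactly. The helper name and data are hypothetical.

```python
from typing import Dict, Sequence


def percentile_ranks(values: Sequence[float], thresholds: Sequence[float]) -> Dict[str, float]:
    """Percentage of values at or below each threshold (exact, no TDigest)."""
    n = len(values)
    return {str(float(t)): 100.0 * sum(1 for v in values if v <= t) / n for t in thresholds}


quantities = [1, 2, 2, 2, 2, 2, 2, 3, 4, 4]
print(percentile_ranks(quantities, [3, 4]))  # {'3.0': 80.0, '4.0': 100.0}
```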

15) Scripted Metric Aggregation: a metric aggregation whose output is computed by scripts.

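A scripted metric aggregation runs in four stages:

1. init_script: runs before any documents are collected. It lets the aggregation set up its initial state.

2. map_script: runs once for each collected document.

3. combine_script: runs once on each shard after document collection is complete. It consolidates the state collected on that shard into the value the shard returns.

4. reduce_script: runs once on the coordinating node after all shards have returned their results. It has access to the states variable, an array holding the combine_script result of each shard.

Other parameters: params, an optional object passed as variables to init_script, map_script, and combine_script.

Objects returned by the scripts or stored in state may only be of the following types: primitive types, String, Map (containing only keys and values of these types), and Array (containing only elements of these types).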

Example:

PUT /transactions/_bulk?refresh
{"index":{"_id":1}}
{"type": "sale","amount": 80}
{"index":{"_id":2}}
{"type": "cost","amount": 10}
{"index":{"_id":3}}
{"type": "cost","amount": 30}
{"index":{"_id":4}}
{"type": "sale","amount": 130}

Before init_script runs, state is an empty object {}.

init_script runs once on each shard before any documents are collected, so each shard has its own copy of state:

Shard A:

"state" : { "transactions" : [] }

Shard B:

"state" : { "transactions" : [] }
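Each shard collects its documents and runs map_script on each one. map_script appends the amount for a sale and the negated amount for a cost. Assuming documents 1 and 3 are on shard A and documents 2 and 4 are on shard B, the state on shard A becomes:

"state" : { "transactions" : [ 80, -30 ] }

and the state on shard B becomes:

"state" : { "transactions" : [ -10, 130 ] }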

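combine_script runs on each shard once document collection is finished. It reduces all the transactions in the shard's state to a single total, which is sent back to the coordinating node:

Shard A: 50, Shard B: 120

reduce_script receives an array of the combine_script results, "states" : [50, 120], and adds them up to produce the final aggregation value, 170.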
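The walkthrough above can be simulated in plain Python to show the data flow between the four stages. In Elasticsearch these stages would be Painless scripts. Here they are ordinary Python functions named after the parameters, and the shard split is the assumed one from the walkthrough.

```python
from typing import Dict, List


def init_script(state: Dict) -> None:
    state["transactions"] = []  # runs once per shard


def map_script(state: Dict, doc: Dict) -> None:
    # sales add to profit, costs subtract from it
    amount = doc["amount"]
    state["transactions"].append(amount if doc["type"] == "sale" else -amount)


def combine_script(state: Dict) -> float:
    return sum(state["transactions"])  # one value per shard


def reduce_script(states: List[float]) -> float:
    return sum(states)  # runs once on the coordinating node


def scripted_metric(shards: List[List[Dict]]) -> float:
    states = []
    for shard_docs in shards:  # every shard starts from its own empty state
        state: Dict = {}
        init_script(state)
        for doc in shard_docs:
            map_script(state, doc)
        states.append(combine_script(state))
    return reduce_script(states)


shard_a = [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 30}]
shard_b = [{"type": "cost", "amount": 10}, {"type": "sale", "amount": 130}]
print(scripted_metric([shard_a, shard_b]))  # 170 (= 50 + 120)
```

Because the final step is a sum, the result does not depend on how documents are spread across shards. An aggregation whose reduce step is not associative would need more care.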
