Elastic Stack Notes (7): Elasticsearch 5.6 Aggregation Analysis

Date: 2019-11-06

Tags: elastic stack, notes, elasticsearch5.6, elasticsearch, aggregation analysis. Category: log analysis


Blog: http://www.moonxy.com

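1. Preface

Elasticsearch is a distributed full-text search engine whose core functions are indexing and searching. Its aggregations feature is also very powerful and allows complex statistical analysis over the data. Elasticsearch provides four families of aggregations: metrics, bucket, pipeline, and matrix aggregations. The first two, metrics and bucket aggregations, are the ones that matter most in practice.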

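Official documentation for aggregations: Aggregations

2. Aggregation Analysis

2.1 Metrics Aggregations

Official documentation for metrics aggregations: Metric

Metrics aggregations mainly include min, max, sum, avg, stats, extended_stats, and value_count, which are comparable to the aggregate functions in SQL.

The official documentation defines metrics aggregations as follows:

Aggregations that keep track and compute metrics over a set of documents.

In other words, each of them computes a metric over a set of documents. The sections below walk through several of them, starting with the max aggregation: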

Max Aggregation

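Official documentation for the max aggregation: Max Aggregation

The max aggregation returns the maximum value of a field, much like the SQL aggregate function max(). In the request below, "max_price" is a user-defined name for the aggregation, and "size": 0 suppresses the search hits so that only the aggregation result is returned.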

##Max Aggregation
GET books/_search
{
  "size": 0, 
  "aggs": {
    "max_price": {
      "max":  {
        "field": "price"
      }
    }
  }
}

The response is as follows:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "max_price": {
      "value": 81.4
    }
  }
}

Cardinality Aggregation

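Official documentation for the cardinality aggregation: Cardinality Aggregation

The cardinality aggregation counts distinct values. It is comparable to a SQL distinct followed by a count: duplicates are removed from the set and the size of the remaining set is returned. The result is approximate (Elasticsearch uses the HyperLogLog++ algorithm), although for a small set such as this one it is expected to be accurate.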

##Cardinality Aggregation
GET books/_search
{
  "size": 0, 
  "aggs": {
    "all_language": {
      "cardinality":  {
        "field": "language"
      }
    }
  }
}

The response is as follows:

{
  "took": 41,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "all_language": {
      "value": 3
    }
  }
}
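To make the "distinct, then count" idea concrete, here is a minimal Python sketch that runs locally rather than against Elasticsearch. The `languages` list is an assumption: sample values chosen to be consistent with the responses shown in this article, since the actual documents of the books index are not listed. Unlike the real aggregation, this version is exact rather than approximate.

```python
# Hypothetical "language" values for the five books (assumed sample data,
# chosen to match the counts reported in this article).
languages = ["java", "java", "python", "python", "javascript"]


def cardinality(values):
    """Exact distinct count: remove duplicates, then count what is left."""
    return len(set(values))


print(cardinality(languages))
```

With the assumed data, the three distinct values are java, python, and javascript, which agrees with the "value": 3 in the response above.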

Stats Aggregation

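Official documentation for the stats aggregation: Stats Aggregation

The stats aggregation computes basic statistics and returns five metrics in a single response: count, max, min, avg, and sum. For example: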

##Stats Aggregation
GET books/_search
{
  "size": 0, 
  "aggs": {
    "stats_pirce": {
      "stats":  {
        "field": "price"
      }
    }
  }
}

The response is as follows:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "stats_pirce": {
      "count": 5,
      "min": 46.5,
      "max": 81.4,
      "avg": 63.8,
      "sum": 319
    }
  }
}
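The five metrics can be reproduced locally with a short Python sketch. The `prices` list is an assumption: the article does not list the documents, so these are sample values chosen to be consistent with the count, min, max, sum, and sum of squares reported in the responses.

```python
# Hypothetical prices of the five books (assumed sample data that is
# consistent with the statistics reported in this article).
prices = [46.5, 70.2, 81.4, 54.5, 66.4]


def stats(values):
    """Return the same five metrics as the stats aggregation."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


result = stats(prices)
print(result)
```

Because the values are floating-point numbers, the printed sum and average may differ from 319 and 63.8 in the last decimal places.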

Extended Stats Aggregation

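Official documentation for the extended stats aggregation: Extended Stats Aggregation

The extended stats aggregation works like the stats aggregation but returns four additional results: the sum of squares, the variance, the standard deviation, and the interval of the mean plus or minus two standard deviations.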

##Extended Stats Aggregation
GET books/_search
{
  "size": 0, 
  "aggs": {
    "extend_stats_pirce": {
      "extended_stats":  {
        "field": "price"
      }
    }
  }
}

The response is as follows:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "extend_stats_pirce": {
      "count": 5,
      "min": 46.5,
      "max": 81.4,
      "avg": 63.8,
      "sum": 319,
      "sum_of_squares": 21095.46,
      "variance": 148.65199999999967,
      "std_deviation": 12.19229264740638,
      "std_deviation_bounds": {
        "upper": 88.18458529481276,
        "lower": 39.41541470518724
      }
    }
  }
}
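The extra fields can be derived from count, sum, and sum_of_squares alone. The following Python sketch recomputes them from the numbers in the response above; it treats the variance as a population variance (dividing by count), which is what the reported values correspond to. The variable names are illustrative only.

```python
import math

# Values copied from the extended_stats response above.
count = 5
total = 319.0
sum_of_squares = 21095.46

avg = total / count
# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)

# The bounds are avg +/- sigma * std_deviation; the aggregation's
# "sigma" option defaults to 2.
sigma = 2
bounds = {
    "upper": avg + sigma * std_deviation,
    "lower": avg - sigma * std_deviation,
}

print(variance, std_deviation, bounds)
```

The results agree with the variance, std_deviation, and std_deviation_bounds in the response, up to floating-point rounding.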

Value Count Aggregation

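Official documentation for the value count aggregation: Value Count Aggregation

The value count aggregation counts the values of a given field across the matching documents. When every document holds exactly one value for the field, the result equals the number of documents, as in the response below (5 values for 5 hits).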

##Value Count Aggregation
GET books/_search
{
  "size": 0, 
  "aggs": {
    "doc_count": {
      "value_count":  {
        "field": "author"
      }
    }
  }
}

The response is as follows:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "doc_count": {
      "value": 5
    }
  }
}
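The difference between counting values and counting documents only shows up with multi-valued or missing fields. The Python sketch below illustrates this locally; the documents and author names are hypothetical placeholders, not data from the books index.

```python
# Hypothetical documents: one single-valued, two multi-valued, and one
# without the field at all. Author names are placeholders.
docs = [
    {"author": "author_a"},
    {"author": ["author_b", "author_c"]},
    {"author": ["author_d", "author_e"]},
    {},
]


def value_count(documents, field):
    """Count the values of `field`, not the documents that contain it."""
    count = 0
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        count += len(value) if isinstance(value, list) else 1
    return count


print(len(docs), value_count(docs, "author"))
```

Here four documents yield five values, because each entry of a multi-valued field is counted separately.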

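Note:

By default, text fields cannot be used for sorting or aggregations, because fielddata is disabled on them. For example, the following request aggregates on the title field, which is mapped as text: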

GET books/_search
{
  "size": 0, 
  "aggs": {
    "doc_count": {
      "value_count":  {
        "field": "title"
      }
    }
  }
}

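The request fails with the following error, which suggests either using a keyword field instead or setting fielddata=true on the field at the cost of significant memory: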

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "books",
        "node": "6n3douACShiPmlA9j2soBw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ]
  },
  "status": 400
}

2.2 桶聚合

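Official documentation for bucket aggregations: Bucket Aggregations

A bucket can be thought of as a container: the aggregation walks through the documents and places every document that meets a given criterion into the corresponding bucket. Bucketing is comparable to group by in SQL.

The bucket aggregations covered here are the following.

Terms Aggregation

The terms aggregation groups documents by the values of a field. The following request counts the books for each programming language: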

GET books/_search
{
  "size": 0, 
  "aggs": {
    "terms_count": {
      "terms":  {
        "field": "language"
      }
    }
  }
}

The response is as follows:

{
  "took": 31,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "terms_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "java",
          "doc_count": 2
        },
        {
          "key": "python",
          "doc_count": 2
        },
        {
          "key": "javascript",
          "doc_count": 1
        }
      ]
    }
  }
}
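The grouping performed by the terms aggregation resembles the following local Python sketch. The `languages` list is assumed sample data consistent with the response above. Buckets are sorted by descending document count, and breaking ties by key is an assumption of this sketch; the `size` parameter mirrors the aggregation's default of 10 buckets.

```python
from collections import Counter

# Hypothetical "language" values for the five books (assumed sample data).
languages = ["java", "java", "python", "python", "javascript"]


def terms(values, size=10):
    """Group equal values into buckets and count the members of each."""
    counts = Counter(values)
    # Highest doc_count first; ties are broken by key (sketch assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


for bucket in terms(languages):
    print(bucket)
```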

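On top of the terms buckets, a metrics aggregation can be applied to each bucket. For example, to compute the average price of the books in each language, first run a terms aggregation on the language field and then nest an avg aggregation inside it. The query is as follows: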

GET books/_search
{
  "size": 0, 
  "aggs": {
    "terms_count": {
      "terms":  {
        "field": "language"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

The response is as follows:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "terms_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "java",
          "doc_count": 2,
          "avg_price": {
            "value": 58.35
          }
        },
        {
          "key": "python",
          "doc_count": 2,
          "avg_price": {
            "value": 67.95
          }
        },
        {
          "key": "javascript",
          "doc_count": 1,
          "avg_price": {
            "value": 66.4
          }
        }
      ]
    }
  }
}
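Nesting a metric inside a bucket aggregation amounts to "group first, then compute the metric per group". The Python sketch below shows this locally. The `books` list is an assumption: sample documents chosen so that the per-language averages match the response above, since the article does not list the indexed documents.

```python
from collections import defaultdict

# Hypothetical documents (assumed sample data consistent with the
# per-language averages reported above).
books = [
    {"language": "java", "price": 46.5},
    {"language": "java", "price": 70.2},
    {"language": "python", "price": 81.4},
    {"language": "python", "price": 54.5},
    {"language": "javascript", "price": 66.4},
]


def terms_with_avg(documents, group_field, value_field):
    """Bucket documents by group_field, then average value_field per bucket."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


per_language = terms_with_avg(books, "language", "price")
print(per_language)
```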

Range Aggregation

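The range aggregation groups documents into ranges and shows how the data is distributed. For example, the following request buckets the books in the books index by price into three ranges: below 50, from 50 up to 80, and 80 and above. Each range includes its from value and excludes its to value.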

GET books/_search
{
  "size": 0, 
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [
          {"to": 50},
          {"from": 50, "to": 80},
          {"from": 80}
        ]
      }
    }
  }
}

The response is as follows:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "price_range": {
      "buckets": [
        {
          "key": "*-50.0",
          "to": 50,
          "doc_count": 1
        },
        {
          "key": "50.0-80.0",
          "from": 50,
          "to": 80,
          "doc_count": 3
        },
        {
          "key": "80.0-*",
          "from": 80,
          "doc_count": 1
        }
      ]
    }
  }
}
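The boundary handling is easiest to see in code. This local Python sketch applies the same three ranges, treating from as inclusive and to as exclusive, so a value of exactly 50 or 80 lands in the higher bucket. The `prices` list is assumed sample data consistent with the responses in this article.

```python
# Hypothetical prices of the five books (assumed sample data).
prices = [46.5, 70.2, 81.4, 54.5, 66.4]

# Same ranges as in the request above.
ranges = [{"to": 50}, {"from": 50, "to": 80}, {"from": 80}]


def range_buckets(values, range_specs):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for spec in range_specs:
        low = spec.get("from", float("-inf"))   # open-ended lower bound
        high = spec.get("to", float("inf"))     # open-ended upper bound
        doc_count = sum(1 for value in values if low <= value < high)
        buckets.append({**spec, "doc_count": doc_count})
    return buckets


buckets = range_buckets(prices, ranges)
for bucket in buckets:
    print(bucket)
```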

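The range aggregation works on numeric fields, and ranges can also be applied to date fields. The date range aggregation is dedicated to date fields; it differs from the range aggregation in that the from and to values can be written as date math expressions.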

2.3 管道聚合

Official documentation for pipeline aggregations: Pipeline Aggregations

Pipeline aggregations operate on the output of other aggregations rather than on documents.
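As an illustration of "aggregating the output of another aggregation", the Python sketch below mimics what an avg_bucket pipeline aggregation would compute over the per-language average prices from section 2.2. It runs locally on numbers copied from that response and is not an Elasticsearch request; the function name simply mirrors the aggregation it imitates.

```python
# Per-bucket averages copied from the terms + avg response in section 2.2.
bucket_avgs = {"java": 58.35, "python": 67.95, "javascript": 66.4}


def avg_bucket(bucket_values):
    """Average the metric values of sibling buckets (not the documents)."""
    values = list(bucket_values)
    return sum(values) / len(values)


result = avg_bucket(bucket_avgs.values())
print(result)
```

The result is the mean of the three bucket averages (about 64.23), which differs from the overall average price of 63.8 because every bucket carries the same weight regardless of how many documents it contains.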

2.4 矩陣聚合

Official documentation for matrix aggregations: Matrix Aggregations

Matrix Stats

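The matrix stats aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

Count: the number of samples of each field included in the calculation;

Mean: the average value of each field;

Variance: how far the samples of each field spread from the mean;

Skewness: a measure of the asymmetry of each field's distribution around the mean;

Kurtosis: a measure of the shape of each field's distribution;

Covariance: a matrix that describes how changes in one field are associated with changes in another;

Correlation: the covariance matrix scaled to the range [-1, 1], describing the relationship between the distributions of two fields.

It is mainly used to examine the relationship between two numeric fields, for example between the log record size and the HTTP status code: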

GET /_search
{
    "aggs": {
        "statistics": {
            "matrix_stats": {
                "fields": ["log_size", "status_code"]
            }
        }
    }
}
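To show how covariance and correlation relate, here is a small local Python sketch for two numeric fields. The sample values are entirely hypothetical, and dividing by n - 1 (sample covariance) is a choice made for this sketch, not a statement about how Elasticsearch computes it; the correlation itself does not depend on that choice.

```python
import math

# Hypothetical sample values for the two fields in the request above.
log_size = [512.0, 1024.0, 2048.0, 4096.0]
status_code = [200.0, 200.0, 404.0, 500.0]


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    mean_x, mean_y = mean(xs), mean(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations, giving a value in [-1, 1]."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


r = correlation(log_size, status_code)
print(covariance(log_size, status_code), r)
```

A correlation close to 1 or -1 indicates a strong linear relationship between the two fields, while a value near 0 indicates little or none.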
