How do you tell Histogram and Summary metrics apart in Prometheus?

Date: 2019-11-17

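The key to understanding the difference between them is still how they are used in real applications.

But how do you tell them apart while you are still learning?

There are a few dimensions to look at:

A histogram has buckets; a summary has quantiles.

Summary quantiles are computed on the client and then reported; histogram quantiles are computed on the server.

For details, see the following two links: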

https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/promql/prometheus-metrics-types

https://songjiayang.gitbooks.io/prometheus/content/concepts/metric-types.html

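Histogram
A Histogram consists of <basename>_bucket{le="<upper inclusive bound>"}, <basename>_bucket{le="+Inf"}, <basename>_sum and <basename>_count. It is mainly used to sample data over a period of time (usually request durations or response sizes) and to count both the samples that fall into each configured range and the total. The collected data is usually displayed as a histogram.
For example, prometheus_local_storage_series_chunks_persisted in Prometheus server represents the number of chunks that need to be stored for each time series; we can use it to compute quantiles of the data waiting to be persisted.
Summary
A Summary is similar to a Histogram. It consists of <basename>{quantile="<φ>"}, <basename>_sum and <basename>_count, and is mainly used to represent the result of sampling data over a period of time (usually request durations or response sizes). It stores the quantile values directly rather than computing them from bucket ranges.
For example, prometheus_target_interval_length_seconds in Prometheus server.
Histogram vs Summary
Both include <basename>_sum and <basename>_count.
A Histogram needs <basename>_bucket to compute quantiles, whereas a Summary stores the quantile values directly.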
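
To make the contrast concrete, here is a minimal pure-Python sketch of what each type keeps on the client side. The class names SketchHistogram and SketchSummary, the bucket bounds and the observations are all made up for illustration; this is not the code of any Prometheus client library. Real summaries estimate quantiles with a streaming algorithm over a sliding time window, while this sketch simply sorts every stored value.

```python
import bisect
import math


class SketchHistogram:
    """Keeps cumulative bucket counters, like <basename>_bucket{le="..."}."""

    def __init__(self, upper_bounds):
        # The +Inf bucket always exists and catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: bump every bucket whose bound is >= value.
        first = bisect.bisect_left(self.upper_bounds, value)
        for i in range(first, len(self.bucket_counts)):
            self.bucket_counts[i] += 1


class SketchSummary:
    """Computes quantiles on the client, like <basename>{quantile="..."}."""

    def __init__(self, quantiles):
        self.quantiles = list(quantiles)
        self.values = []  # simplification: keep every observation
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.values.append(value)
        self.sum += value
        self.count += 1

    def exposed_quantiles(self):
        ordered = sorted(self.values)
        if not ordered:
            return {}
        result = {}
        for q in self.quantiles:
            # Nearest-rank quantile over the stored observations.
            rank = max(1, math.ceil(q * len(ordered)))
            result[q] = ordered[rank - 1]
        return result


observations = [0.05, 0.1, 0.2, 0.3, 2.0]  # hypothetical durations in seconds
hist = SketchHistogram([0.1, 0.5, 1.0])
summ = SketchSummary([0.5, 0.9])
for v in observations:
    hist.observe(v)
    summ.observe(v)

print(list(zip(hist.upper_bounds, hist.bucket_counts)))
print(summ.exposed_quantiles())
```

The histogram only ever increments counters, so the server has to derive quantiles later; the summary already holds the finished quantile values when it is scraped.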
====================================

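Using Histogram and Summary to analyze data distribution
Besides the Counter and Gauge metric types, Prometheus also defines the Histogram and Summary metric types. Histogram and Summary are mainly used to collect statistics on, and analyze the distribution of, samples.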

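In most cases people tend to rely on the average of some quantitative metric, such as average CPU utilization or average page response time. The problem with this approach is obvious. Take the average response time of a system's API calls: if most API requests stay at a response time of around 100ms while a few requests take 5s, then some web pages respond far more slowly than the median, and a single average hides that. This phenomenon is known as the long-tail problem.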

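To tell whether everything is slow on average or only the long tail is slow, the simplest approach is to group requests by latency range. For example, count how many requests took between 0 and 10ms and how many took between 10 and 20ms. This makes it quick to analyze why the system is slow. Histogram and Summary both exist to solve this kind of problem: with Histogram and Summary metrics we can quickly understand how the monitored samples are distributed.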
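
The long-tail effect is easy to reproduce with the numbers from the example: 99 requests at 100ms and one at 5s. The following is only an illustrative sketch; the helper bucket_label and the 100ms bucket width are assumptions chosen for this example, not anything Prometheus prescribes.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical workload: 99 fast requests and a single slow one.
latencies_ms = [100] * 99 + [5000]


def bucket_label(latency_ms, width_ms=100):
    """Return the label of the fixed-width latency range a request falls into."""
    start = (latency_ms // width_ms) * width_ms
    return f"{start}-{start + width_ms}ms"


# Group requests by latency range instead of collapsing them into one number.
distribution = Counter(bucket_label(v) for v in latencies_ms)

print("mean:", mean(latencies_ms))
print("median:", median(latencies_ms))
for label, requests in sorted(distribution.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print(label, requests)
```

The mean lands between the two groups and describes neither of them, while the grouped counts show at once that almost all requests are fast and one is an outlier.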

For example, the metric prometheus_tsdb_wal_fsync_duration_seconds is of type Summary. It records how long wal_fsync operations take in Prometheus Server. Accessing the /metrics endpoint of Prometheus Server returns the following samples:

# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.012352463
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.014458005
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.017316173
prometheus_tsdb_wal_fsync_duration_seconds_sum 2.888716127000002
prometheus_tsdb_wal_fsync_duration_seconds_count 216
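From the samples above we can see that Prometheus Server has performed the wal_fsync operation 216 times so far, taking 2.888716127000002s in total. The median (quantile=0.5) duration is 0.012352463s and the 90th percentile (quantile=0.9) duration is 0.014458005s.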
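
The _sum and _count samples also give the mean duration, which can be compared with the reported quantiles. The sketch below parses the sample lines with a deliberately simplified, hypothetical helper (parse_samples); it assumes one sample per line, no timestamps and no spaces inside label values, so it is not a general parser for the exposition format.

```python
exposition = """\
# HELP prometheus_tsdb_wal_fsync_duration_seconds Duration of WAL fsync.
# TYPE prometheus_tsdb_wal_fsync_duration_seconds summary
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.5"} 0.012352463
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.9"} 0.014458005
prometheus_tsdb_wal_fsync_duration_seconds{quantile="0.99"} 0.017316173
prometheus_tsdb_wal_fsync_duration_seconds_sum 2.888716127000002
prometheus_tsdb_wal_fsync_duration_seconds_count 216
"""


def parse_samples(text):
    """Map each series (name plus labels) to its float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_samples(exposition)
base = "prometheus_tsdb_wal_fsync_duration_seconds"

# Mean duration since the process started: total time divided by total calls.
average_seconds = samples[base + "_sum"] / samples[base + "_count"]
median_seconds = samples[base + '{quantile="0.5"}']

print(f"mean   = {average_seconds:.6f}s")
print(f"median = {median_seconds:.6f}s")
```

Here the mean (about 13.4ms) sits close to the median (about 12.4ms), which suggests this particular metric has no pronounced long tail.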

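Among the samples that Prometheus Server returns about itself we can also find a metric of type Histogram, prometheus_tsdb_compaction_chunk_range, whose buckets are exposed as prometheus_tsdb_compaction_chunk_range_bucket: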

# HELP prometheus_tsdb_compaction_chunk_range Final time range of chunks on their first compaction
# TYPE prometheus_tsdb_compaction_chunk_range histogram
prometheus_tsdb_compaction_chunk_range_bucket{le="100"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="6400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="25600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="102400"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="409600"} 0
prometheus_tsdb_compaction_chunk_range_bucket{le="1.6384e+06"} 260
prometheus_tsdb_compaction_chunk_range_bucket{le="6.5536e+06"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="2.62144e+07"} 780
prometheus_tsdb_compaction_chunk_range_bucket{le="+Inf"} 780
prometheus_tsdb_compaction_chunk_range_sum 1.1540798e+09
prometheus_tsdb_compaction_chunk_range_count 780
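Like a Summary, a Histogram also reports the total number of recorded observations (suffix _count) and the sum of their values (suffix _sum). The difference is that a Histogram directly exposes how many samples fall into each range, and the ranges are defined through the label le. Each bucket is cumulative: it counts every sample less than or equal to its le bound, which is why the le="+Inf" bucket equals _count (780 above).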
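
Because the bucket counters are cumulative, the number of samples inside one specific range is the difference between two neighbouring buckets. A small sketch using the prometheus_tsdb_compaction_chunk_range samples shown above; the helper name per_interval_counts is hypothetical.

```python
import math

# (le upper bound, cumulative count) copied from the sample output.
cumulative_buckets = [
    (100.0, 0),
    (400.0, 0),
    (1600.0, 0),
    (6400.0, 0),
    (25600.0, 0),
    (102400.0, 0),
    (409600.0, 0),
    (1.6384e06, 260),
    (6.5536e06, 780),
    (2.62144e07, 780),
    (math.inf, 780),
]


def per_interval_counts(buckets):
    """Turn cumulative le-buckets into (lower, upper, count) for each range."""
    result = []
    lower, previous = -math.inf, 0
    for upper, count in buckets:
        # Samples in (lower, upper] = this bucket minus the one below it.
        result.append((lower, upper, count - previous))
        lower, previous = upper, count
    return result


for lower, upper, count in per_interval_counts(cumulative_buckets):
    if count:
        print(f"({lower:g}, {upper:g}]: {count}")
```

For this sample only two ranges are non-empty: 260 observations fall in (409600, 1.6384e+06] and 520 fall in (1.6384e+06, 6.5536e+06].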

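For Histogram metrics we can also compute quantiles with the histogram_quantile() function. The difference is that histogram_quantile computes the quantile on the server side, whereas the quantiles of a Summary are computed directly on the client. For quantile calculation, therefore, Summary performs better when queried through PromQL, while Histogram consumes more server resources. Conversely, Histogram consumes fewer resources on the client. Users should choose between the two according to their actual scenario.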
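
To show roughly what the server has to do for a histogram, here is a simplified Python sketch of quantile estimation from cumulative buckets by linear interpolation inside the bucket that contains the target rank. The function name sketch_histogram_quantile is hypothetical and this is not Prometheus source code; it assumes sorted buckets ending with +Inf and works on raw cumulative counts, whereas a PromQL query would normally apply histogram_quantile() to a rate() of the bucket series.

```python
import math


def sketch_histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Assumes buckets are sorted by upper bound, at least two are given,
    and the last one is the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # No finite upper bound to interpolate to: fall back to the
        # highest finite bound.
        return buckets[-2][0]
    if index == 0:
        lower, lower_count = 0.0, 0  # lowest bucket is assumed to start at 0
    else:
        lower, lower_count = buckets[index - 1]
    in_bucket = count - lower_count
    if in_bucket == 0:
        return upper  # simplification for the q == 0 edge case
    # Assume samples are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - lower_count) / in_bucket


chunk_range_buckets = [
    (100.0, 0), (400.0, 0), (1600.0, 0), (6400.0, 0), (25600.0, 0),
    (102400.0, 0), (409600.0, 0), (1.6384e06, 260), (6.5536e06, 780),
    (2.62144e07, 780), (math.inf, 780),
]

print(sketch_histogram_quantile(0.5, chunk_range_buckets))
```

With the prometheus_tsdb_compaction_chunk_range samples, the median rank is 390 of 780, which lies a quarter of the way into the (1.6384e+06, 6.5536e+06] bucket, so the estimate is 2867200. The result is only as precise as the bucket layout allows, which is the price paid for keeping the client cheap.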
