An Introduction to Prometheus Data Types in Java

Date: 2019-11-08


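1. Introduction

Prometheus stores all collected sample data as time series in an in-memory database and periodically persists it to disk. Each sample in a time series consists of the following three parts.

Metric: a metric name plus a label set describing the characteristics of the sample, written as <metric name>{<label name>=<label value>, ...}. The suggested naming pattern for the metric name is: application-name prefix_monitored object_value type_unit.
Timestamp: a timestamp with millisecond precision.
Value: a float64 value representing the current sample.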
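
The three parts of a sample can be pictured as a small value class. The sketch below is only an illustration using the JDK; the class name SampleSketch and its fields are hypothetical and are not part of any Prometheus client library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative model of one time-series sample; not a Prometheus client class. */
final class SampleSketch {
    final String metricName;
    final Map<String, String> labels; // label name -> label value, in insertion order
    final long timestampMillis;       // millisecond-precision timestamp
    final double value;               // float64 sample value

    SampleSketch(String metricName, Map<String, String> labels, long timestampMillis, double value) {
        this.metricName = metricName;
        this.labels = new LinkedHashMap<>(labels);
        this.timestampMillis = timestampMillis;
        this.value = value;
    }

    /** Renders the series identifier as <metric name>{<label name>="<label value>",...}. */
    String seriesId() {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : labels.entrySet()) {
            joiner.add(e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return metricName + joiner;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("path", "/");
        labels.put("method", "GET");
        // The timestamp and value below are made-up illustration data.
        SampleSketch s = new SampleSketch("io_namespace_http_requests_total", labels, 1573171200000L, 42.0);
        System.out.println(s.seriesId() + " " + s.value + " @" + s.timestampMillis);
    }
}
```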

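2. The Four Prometheus Data Types

2.1 Counter
A Counter metric works like a counter: it only increases and never decreases (unless the system is reset). Counters are typically used for cumulative values such as the number of requests, completed tasks, or errors. A counter has two main methods: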

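// Increment the counter by 1.
Inc()
// Add the given value to the counter; panics if the value is < 0.
Add(float64)
// (These are the Go client signatures. In the Java simpleclient the counterparts
// are inc() and inc(double), and a negative amount throws IllegalArgumentException.)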
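
The "only increases" rule can be sketched with the JDK alone. The class below is a hypothetical stand-in, not the Prometheus Counter: it accepts non-negative increments and rejects negative ones.

```java
import java.util.concurrent.atomic.DoubleAdder;

/** Hypothetical minimal counter illustrating monotonic semantics; not the Prometheus client class. */
final class MonotonicCounterSketch {
    private final DoubleAdder total = new DoubleAdder();

    /** Adds 1 to the counter. */
    void inc() {
        inc(1.0);
    }

    /** Adds a non-negative amount; a negative amount would make the counter decrease, so it is rejected. */
    void inc(double amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("Counter increments must be non-negative");
        }
        total.add(amount);
    }

    /** Returns the current cumulative value. */
    double get() {
        return total.sum();
    }

    public static void main(String[] args) {
        MonotonicCounterSketch requests = new MonotonicCounterSketch();
        requests.inc();
        requests.inc(2.5);
        System.out.println(requests.get());
    }
}
```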

In custom Prometheus metrics monitoring, a Counter can be used as follows:

import io.prometheus.client.Counter;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Counter requestCounter = Counter.build()
            .name("io_namespace_http_requests_total").labelNames("path", "method", "code") // counter metric names should end in _total
            .help("Total requests.").register();

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        String requestURI = request.getRequestURI();
        String method = request.getMethod();
        int status = response.getStatus();

        requestCounter.labels(requestURI, method, String.valueOf(status)).inc(); // inc() adds 1 for every completed request
        super.afterCompletion(request, response, handler, ex);
    }
}

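Counter data makes it easy to see how the rate of events changes over time, and PromQL's built-in functions provide the corresponding analysis. Taking the request volume of an HTTP application as an example:

# Use rate() to get the per-second growth rate of HTTP requests over the last 5 minutes
rate(http_requests_total[5m])
# Query the 10 HTTP series with the highest request counts
topk(10, http_requests_total)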
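
To make the idea behind rate() concrete, here is a simplified sketch of a per-second rate computed from raw counter samples. It treats any decrease as a counter reset, which is why a restart does not produce a negative rate. The class and method names are hypothetical, and Prometheus's actual rate() additionally extrapolates to the edges of the time window, which this sketch leaves out.

```java
/** Simplified illustration of a per-second counter rate; not Prometheus's exact algorithm. */
final class RateSketch {
    /**
     * @param timestampsMillis sample timestamps in ascending order
     * @param values           counter values observed at those timestamps
     * @return total increase divided by the elapsed seconds between first and last sample
     */
    static double simpleRate(long[] timestampsMillis, double[] values) {
        if (values.length < 2 || values.length != timestampsMillis.length) {
            throw new IllegalArgumentException("need at least two samples with matching timestamps");
        }
        double increase = 0.0;
        for (int i = 1; i < values.length; i++) {
            double diff = values[i] - values[i - 1];
            // A drop means the counter was reset and restarted from zero,
            // so the whole new value counts as increase.
            increase += diff >= 0 ? diff : values[i];
        }
        double seconds = (timestampsMillis[values.length - 1] - timestampsMillis[0]) / 1000.0;
        if (seconds <= 0) {
            throw new IllegalArgumentException("timestamps must be increasing");
        }
        return increase / seconds;
    }

    public static void main(String[] args) {
        // Made-up samples one minute apart, with a reset between the third and fourth sample.
        long[] ts = {0L, 60_000L, 120_000L, 180_000L, 240_000L, 300_000L};
        double[] vs = {100, 160, 220, 10, 70, 130};
        System.out.println(simpleRate(ts, vs));
    }
}
```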

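2.2 Gauge
A Gauge is a metric that can go up as well as down, and it reflects the current state of an application. When monitoring a host, for example, it can report the current free memory (node_memory_MemFree) or available memory (node_memory_MemAvailable), or the current CPU and memory usage of a container.
A Gauge object mainly provides the two methods inc() and dec(), which increase and decrease the value.
In custom Prometheus metrics monitoring, a Gauge can be used as follows: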

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    ... code omitted
    // No "code" label here: the final response status is not known in preHandle, and
    // inc() and dec() must address the same label values, otherwise the gauge drifts.
    static final Gauge inprogressRequests = Gauge.build()
            .name("io_namespace_http_inprogress_requests").labelNames("path", "method")
            .help("Inprogress requests.").register();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        ... code omitted
        inprogressRequests.labels(requestURI, method).inc(); // one more request in progress
        return super.preHandle(request, response, handler);
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        ... code omitted
        inprogressRequests.labels(requestURI, method).dec(); // the request has finished
        super.afterCompletion(request, response, handler, ex);
    }
}

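For Gauge metrics, the built-in PromQL function delta() returns how much the samples changed over a period of time, for example:

delta(cpu_temp_celsius{host="zeus"}[2h]) # difference in CPU temperature over the last two hours
predict_linear(node_filesystem_free{job="node"}[1h], 4*3600) # predicted free disk space 4 hours from now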
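
The two queries above can be approximated in plain Java to show what they compute. This is a sketch under simplifying assumptions: delta is taken as last value minus first value, and the prediction is an ordinary least-squares line extended past the last sample. Prometheus itself extrapolates delta() to the window edges and evaluates predict_linear() relative to the query time, so real results can differ slightly. The class and method names are hypothetical.

```java
/** Simplified illustrations of delta() and predict_linear() over gauge samples. */
final class GaugeTrendSketch {
    /** Difference between the last and the first sample in the window. */
    static double delta(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        return values[values.length - 1] - values[0];
    }

    /** Fits a least-squares line through (t, v) and evaluates it horizonSeconds after the last sample. */
    static double predictLinear(double[] tSeconds, double[] values, double horizonSeconds) {
        int n = values.length;
        if (n < 2 || n != tSeconds.length) {
            throw new IllegalArgumentException("need at least two samples with matching timestamps");
        }
        double meanT = 0.0;
        double meanV = 0.0;
        for (int i = 0; i < n; i++) {
            meanT += tSeconds[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;
        double cov = 0.0;
        double varT = 0.0;
        for (int i = 0; i < n; i++) {
            double dt = tSeconds[i] - meanT;
            cov += dt * (values[i] - meanV);
            varT += dt * dt;
        }
        if (varT == 0.0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        double slope = cov / varT;
        double intercept = meanV - slope * meanT;
        return intercept + slope * (tSeconds[n - 1] + horizonSeconds);
    }

    public static void main(String[] args) {
        // Made-up free-space samples taken every 10 minutes over one hour.
        double[] t = {0, 600, 1200, 1800, 2400, 3000, 3600};
        double[] free = {5000, 4940, 4880, 4820, 4760, 4700, 4640};
        System.out.println(delta(free));
        System.out.println(predictLinear(t, free, 4 * 3600));
    }
}
```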

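2.3 Histogram
A Histogram consists of <basename>_bucket{le="<upper inclusive bound>"}, <basename>_bucket{le="+Inf"}, <basename>_sum, and <basename>_count. It samples observations over a period of time (usually request durations or response sizes), counts them in configurable buckets, and tracks their total; the collected data is usually displayed as a histogram.
In custom Prometheus metrics monitoring, a Histogram can be used as follows.
Take the request latency metric requests_latency_seconds as an example, where we need to record how many HTTP requests had a response time falling into each range of the distribution {0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10}: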

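public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Histogram requestLatencyHistogram = Histogram.build().labelNames("path", "method", "code")
            .name("io_namespace_http_requests_latency_seconds_histogram").help("Request latency in seconds.")
            .register();

    // Request attribute holding the per-request timer. An interceptor instance is normally
    // shared by concurrent requests, so the timer should not live in an instance field.
    private static final String TIMER_ATTRIBUTE = "histogramRequestTimer";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        ... code omitted
        request.setAttribute(TIMER_ATTRIBUTE,
                requestLatencyHistogram.labels(requestURI, method, String.valueOf(status)).startTimer());
        ... code omitted
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        ... code omitted
        Histogram.Timer histogramRequestTimer = (Histogram.Timer) request.getAttribute(TIMER_ATTRIBUTE);
        if (histogramRequestTimer != null) {
            histogramRequestTimer.observeDuration(); // record the elapsed time in seconds
        }
        ... code omitted
    }
}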

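When a Histogram metric is created with the Histogram builder, the default buckets are {0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10}. To change the defaults, override them with .buckets(double... buckets).
A Histogram automatically creates 3 metrics:

The total number of events, basename_count.

# Meaning: 2 HTTP requests have occurred so far
io_namespace_http_requests_latency_seconds_histogram_count{path="/",method="GET",code="200",} 2.0

The sum of the values of all events, basename_sum.

# Meaning: the total response time of those 2 HTTP requests is 13.107670803000001 seconds
io_namespace_http_requests_latency_seconds_histogram_sum{path="/",method="GET",code="200",} 13.107670803000001

The number of events whose value falls into each bucket, basename_bucket{le="<upper inclusive bound>"}. Bucket counts are cumulative: each bucket counts every observation less than or equal to its le bound.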
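
How the three series relate to each other can be sketched without the client library. The class below is a hypothetical illustration: it keeps one count per bucket plus a running sum and count, and reports bucket counts cumulatively, the way the le label is interpreted. The bucket bounds in main are made up for brevity and are not the defaults.

```java
import java.util.Arrays;

/** Hypothetical minimal histogram illustrating _bucket, _sum and _count; not the Prometheus client class. */
final class HistogramSketch {
    private final double[] upperBounds; // ascending upper inclusive bounds, last one is +Inf
    private final long[] bucketCounts;  // observations per individual bucket (not cumulative)
    private double sum;
    private long count;

    HistogramSketch(double... bounds) {
        double[] sorted = bounds.clone();
        Arrays.sort(sorted);
        upperBounds = Arrays.copyOf(sorted, sorted.length + 1);
        upperBounds[sorted.length] = Double.POSITIVE_INFINITY; // implicit le="+Inf" bucket
        bucketCounts = new long[upperBounds.length];
    }

    /** Records one observation in the first bucket whose upper bound is >= the value. */
    synchronized void observe(double value) {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("NaN cannot be assigned to a bucket");
        }
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                bucketCounts[i]++;
                break;
            }
        }
        sum += value;
        count++;
    }

    /** Cumulative count for the bucket at the given index, i.e. observations <= its upper bound. */
    synchronized long cumulativeCount(int index) {
        long cumulative = 0;
        for (int i = 0; i <= index; i++) {
            cumulative += bucketCounts[i];
        }
        return cumulative;
    }

    synchronized double sum() {
        return sum;
    }

    synchronized long count() {
        return count;
    }

    public static void main(String[] args) {
        HistogramSketch latency = new HistogramSketch(0.5, 1.0, 5.0);
        for (double seconds : new double[] {0.3, 0.5, 0.7, 4.0, 12.0}) {
            latency.observe(seconds);
        }
        for (int i = 0; i < 4; i++) {
            System.out.println("bucket " + i + " cumulative count: " + latency.cumulativeCount(i));
        }
        System.out.println("sum=" + latency.sum() + " count=" + latency.count());
    }
}
```

The cumulative count of the +Inf bucket always equals the value of basename_count, which is a quick consistency check when reading exported data.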

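2.4 Summary
A Summary is similar to a Histogram. It consists of <basename>{quantile="<φ>"}, <basename>_sum, and <basename>_count, and it represents the result of sampling observations over a period of time (usually request durations or response sizes). It stores quantile values directly rather than computing them from bucket counts. Summary and Histogram compare as follows:

Both include <basename>_sum and <basename>_count.
A Histogram needs <basename>_bucket to compute quantiles, while a Summary stores the quantile values directly.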
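
To show what "computing a quantile from buckets" involves, the sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank. It follows the general approach of PromQL's histogram_quantile() but is a simplified, hypothetical helper, not the server's implementation; it assumes the lowest bucket starts at 0 and that the last bound is +Inf.

```java
/** Simplified bucket-based quantile estimate; an illustration, not Prometheus's implementation. */
final class BucketQuantileSketch {
    /**
     * @param q                target quantile, 0 < q <= 1
     * @param upperBounds      ascending bucket upper bounds, the last being +Inf
     * @param cumulativeCounts cumulative observation count for each bound
     */
    static double estimate(double q, double[] upperBounds, long[] cumulativeCounts) {
        if (q <= 0 || q > 1) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        if (upperBounds.length < 2 || upperBounds.length != cumulativeCounts.length) {
            throw new IllegalArgumentException("need at least one finite bucket plus +Inf");
        }
        int last = upperBounds.length - 1;
        long total = cumulativeCounts[last];
        if (total == 0) {
            return Double.NaN; // no observations, nothing to estimate
        }
        double rank = q * total;
        int i = 0;
        while (i < last && cumulativeCounts[i] < rank) {
            i++;
        }
        if (i == last) {
            // The rank falls into the +Inf bucket: report the highest finite bound.
            return upperBounds[last - 1];
        }
        double lower = i == 0 ? 0.0 : upperBounds[i - 1];
        long below = i == 0 ? 0 : cumulativeCounts[i - 1];
        long inBucket = cumulativeCounts[i] - below;
        // Assume observations are spread evenly inside the bucket.
        return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {0.5, 1.0, 5.0, Double.POSITIVE_INFINITY};
        long[] counts = {2, 3, 4, 5}; // made-up cumulative counts
        System.out.println(estimate(0.5, bounds, counts));
    }
}
```

Because the estimate depends on where the bucket boundaries lie, its accuracy is limited by the bucket layout, which is the trade-off against a Summary's directly stored quantiles.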
In custom Prometheus metrics monitoring, a Summary can be used as follows:

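public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Summary requestLatency = Summary.build()
            .name("io_namespace_http_requests_latency_seconds_summary")
            .quantile(0.5, 0.05) // median, with a tolerated error of 0.05
            .quantile(0.9, 0.01) // 0.9 quantile, with a tolerated error of 0.01
            .labelNames("path", "method", "code")
            .help("Request latency in seconds.").register();

    // Request attribute holding the per-request timer. An interceptor instance is normally
    // shared by concurrent requests, so the timer should not live in an instance field.
    private static final String TIMER_ATTRIBUTE = "summaryRequestTimer";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        ... code omitted
        request.setAttribute(TIMER_ATTRIBUTE,
                requestLatency.labels(requestURI, method, String.valueOf(status)).startTimer());
        ... code omitted
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        ... code omitted
        Summary.Timer requestTimer = (Summary.Timer) request.getAttribute(TIMER_ATTRIBUTE);
        if (requestTimer != null) {
            requestTimer.observeDuration(); // record the elapsed time in seconds
        }
        ... code omitted
    }
}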

A Summary metric contains the following data:

The total number of events

# Meaning: 12 HTTP requests have occurred so far
io_namespace_http_requests_latency_seconds_summary_count{path="/",method="GET",code="200",} 12.0

The sum of the values of all events

# Meaning: the total response time of those 12 HTTP requests is 51.029495508s
io_namespace_http_requests_latency_seconds_summary_sum{path="/",method="GET",code="200",} 51.029495508

The distribution of the event values

# Meaning: the median response time of those 12 HTTP requests is 3.052404983s
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.5",} 3.052404983
# Meaning: the 90th percentile (0.9 quantile) of the response time of those 12 HTTP requests is 8.003261666s
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.9",} 8.003261666
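
What a stored quantile means can be illustrated with a naive sketch that keeps every observation and reads the φ-quantile from a sorted copy using the nearest-rank rule. This is an assumption-laden simplification: the class is hypothetical, and a real Summary is understood to use a streaming estimator with a tolerated error (the second argument of .quantile(...)) rather than storing and sorting all values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Naive summary that stores all observations; an illustration only, not the Prometheus client class. */
final class SummarySketch {
    private final List<Double> observations = new ArrayList<>();
    private double sum;

    synchronized void observe(double value) {
        observations.add(value);
        sum += value;
    }

    synchronized long count() {
        return observations.size();
    }

    synchronized double sum() {
        return sum;
    }

    /** Nearest-rank quantile: the smallest observation with at least phi * N observations at or below it. */
    synchronized double quantile(double phi) {
        if (phi < 0 || phi > 1) {
            throw new IllegalArgumentException("phi must be in [0, 1]");
        }
        if (observations.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(observations);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(phi * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        SummarySketch latency = new SummarySketch();
        for (int i = 1; i <= 10; i++) {
            latency.observe(i); // made-up latencies of 1..10 seconds
        }
        System.out.println("count=" + latency.count() + " sum=" + latency.sum());
        System.out.println("median=" + latency.quantile(0.5));
    }
}
```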