Monitoring Spring Boot Applications with Prometheus

Date: 2019-11-06

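This article walks through adding Prometheus monitoring support to a Spring Boot/Spring Cloud application, so that both application performance metrics and business-related metrics can be collected. It also introduces the different metric types in Prometheus and when to use each.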

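Adding the Prometheus Java Client dependency

Version 0.0.24 is used here. In earlier versions, the monitoring endpoint exposed through Spring Boot could not correctly handle requests from the Prometheus Server. For details, see https://github.com/prometheus/client_java/issues/265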

// build.gradle
...
dependencies {
    ...
    compile 'io.prometheus:simpleclient:0.0.24'
    compile "io.prometheus:simpleclient_spring_boot:0.0.24"
    compile "io.prometheus:simpleclient_hotspot:0.0.24"
}
...

Enabling the Prometheus metrics endpoint

Add the @EnablePrometheusEndpoint annotation to enable the Prometheus endpoint. The example also uses DefaultExports from simpleclient_hotspot, which adds information about the current application's JVM to the metrics endpoint.

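import io.prometheus.client.hotspot.DefaultExports;
import io.prometheus.client.spring.boot.EnablePrometheusEndpoint;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnablePrometheusEndpoint
public class GatewayApplication implements CommandLineRunner {

    public static void main(String[] args) {
        // SpringApplication here is org.springframework.boot.SpringApplication
        SpringApplication.run(GatewayApplication.class, args);
    }

    @Override
    public void run(String... strings) throws Exception {
        // Register the JVM collectors provided by simpleclient_hotspot
        DefaultExports.initialize();
    }
}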

By default, the metrics endpoint is exposed at /prometheus. This can be changed through the endpoint configuration:

endpoints:
  prometheus:
    id: metrics
  metrics:
    id: springmetrics
    sensitive: false
    enabled: true

Start the application and open http://localhost:8080/metrics. The output looks like this:

# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="PS Scavenge",} 11.0
jvm_gc_collection_seconds_sum{gc="PS Scavenge",} 0.18
jvm_gc_collection_seconds_count{gc="PS MarkSweep",} 2.0
jvm_gc_collection_seconds_sum{gc="PS MarkSweep",} 0.121
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 8376.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
...

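Adding an interceptor in preparation for instrumentation

Besides the JVM state of the application, we may also need custom metrics that capture system performance and business state, to provide supporting data for later optimization. As a first step, we use an interceptor to handle every request to the application.

Extend the WebMvcConfigurerAdapter class and override the addInterceptors method to add an interceptor for all requests (/**):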

@SpringBootApplication
@EnablePrometheusEndpoint
public class GatewayApplication extends WebMvcConfigurerAdapter implements CommandLineRunner {
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(new PrometheusMetricsInterceptor()).addPathPatterns("/**");
    }
}

PrometheusMetricsInterceptor extends HandlerInterceptorAdapter. By overriding the parent's methods, it can act both before a request is handled and after it has completed.

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        return super.preHandle(request, response, handler);
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        super.afterCompletion(request, response, handler, ex);
    }
}

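Custom metrics

Prometheus provides four different metric types: Counter, Gauge, Histogram and Summary.

Counter: a counter that only increases

A counter is used for metrics that only ever increase, such as the total number of application requests (http_requests_total) or the CPU time used (process_cpu_seconds_total).

A Counter exposes the inc() method, which increments the counter by 1.

By convention, the names of Counter metrics end with _total.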

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Counter requestCounter = Counter.build()
            .name("io_namespace_http_requests_total").labelNames("path", "method", "code")
            .help("Total requests.").register();

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        String requestURI = request.getRequestURI();
        String method = request.getMethod();
        int status = response.getStatus();

        requestCounter.labels(requestURI, method, String.valueOf(status)).inc();
        super.afterCompletion(request, response, handler, ex);
    }
}

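Counter.build() creates a Counter metric. The name() method sets the name of the metric, and labelNames() declares the label dimensions the metric has. In the afterCompletion method we obtain the request path, the method and the status code of the current request, and call inc() so that the count increases by 1 for every request.

Counter.build()...register() registers the metric with the default collector registry, and the current state of the metric is returned whenever the /metrics address is accessed.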
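To illustrate what the label dimensions mean, the following sketch models a labeled counter in plain Java: every distinct combination of label values gets its own independently increasing value, which is why each combination shows up as a separate line on the metrics endpoint. This is a simplified model for illustration only, not the client library's implementation. The class name LabeledCounterSketch and the /orders path are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Simplified model of a labeled counter (hypothetical class, not the client library).
final class LabeledCounterSketch {
    // One monotonically increasing value per distinct tuple of label values.
    private final Map<List<String>, DoubleAdder> children = new ConcurrentHashMap<>();

    // Increments the series identified by the given label values by 1.
    void inc(String... labelValues) {
        children.computeIfAbsent(Arrays.asList(labelValues.clone()), k -> new DoubleAdder())
                .add(1.0);
    }

    // Returns the value of one series, or 0 if it has never been incremented.
    double get(String... labelValues) {
        DoubleAdder child = children.get(Arrays.asList(labelValues));
        return child == null ? 0.0 : child.sum();
    }

    // Number of distinct series created so far.
    int seriesCount() {
        return children.size();
    }

    public static void main(String[] args) {
        LabeledCounterSketch counter = new LabeledCounterSketch();
        counter.inc("/", "GET", "200");
        counter.inc("/", "GET", "200");
        counter.inc("/orders", "POST", "500"); // hypothetical path
        System.out.println(counter.get("/", "GET", "200"));
        System.out.println(counter.seriesCount());
    }
}
```

Because every new label combination creates another series, label values are best kept to a bounded set.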

With the io_namespace_http_requests_total metric we can:

Query the total number of requests to the application

# PromQL
sum(io_namespace_http_requests_total)

Query the number of HTTP requests per second

# PromQL
sum(rate(io_namespace_http_requests_total[5m]))
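To make the per-second figure concrete, the sketch below shows what a rate over a window reduces to for two samples of a counter: the increase divided by the elapsed seconds. It is a simplified illustration in plain Java, not Prometheus's implementation, since the real rate() also compensates for counter resets and extrapolates to the window boundaries. The class and method names, and the sample values, are made up for this example.

```java
// Simplified model of a per-second rate between two counter samples
// (hypothetical class; ignores counter resets and extrapolation).
final class RateSketch {

    // Increase of the counter divided by the length of the window in seconds.
    static double perSecondRate(double firstValue, double lastValue, double windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        return (lastValue - firstValue) / windowSeconds;
    }

    public static void main(String[] args) {
        // 300 additional requests over a 5 minute (300 second) window
        System.out.println(perSecondRate(1200.0, 1500.0, 300.0));
    }
}
```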

Query the top N URIs by request count

# PromQL
topk(10, sum(io_namespace_http_requests_total) by (path))

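Gauge: a value that can go up and down

Metrics that can both increase and decrease are used to reflect the current state of an application. When monitoring a host, examples are the amount of free memory (node_memory_MemFree) and the amount of available memory (node_memory_MemAvailable), or the current CPU and memory usage of a container.

A Gauge object has two main methods, inc() and dec(), used to increase or decrease the value. Here we use a Gauge to record the number of HTTP requests currently being processed.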

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    // ... code omitted
    // No "code" label here: the status code is not known until the request has
    // completed, and inc() and dec() must address the same label set.
    static final Gauge inprogressRequests = Gauge.build()
            .name("io_namespace_http_inprogress_requests").labelNames("path", "method")
            .help("Inprogress requests.").register();

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted
        // Increment the gauge when a request starts
        inprogressRequests.labels(request.getRequestURI(), request.getMethod()).inc();
        return super.preHandle(request, response, handler);
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted
        // Decrement the gauge when the request has completed
        inprogressRequests.labels(request.getRequestURI(), request.getMethod()).dec();

        super.afterCompletion(request, response, handler, ex);
    }
}

With the io_namespace_http_inprogress_requests metric we can directly query the number of HTTP requests the application is currently processing:

# PromQL
io_namespace_http_inprogress_requests{}

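Histogram: for observing how values are distributed

A Histogram is mainly used to record sizes (such as HTTP request bytes) or the number of events that fall within configured ranges (buckets).

Take the request latency requests_latency_seconds as an example, and suppose we want to count how many HTTP requests complete within each of the ranges {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10}: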

public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Histogram requestLatencyHistogram = Histogram.build().labelNames("path", "method", "code")
            .name("io_namespace_http_requests_latency_seconds_histogram").help("Request latency in seconds.")
            .register();

    // Request attribute holding the start time. The interceptor instance is shared
    // by all request threads, so per-request state must not live in an instance field.
    private static final String START_NANOS = "prometheus.histogram.startNanos";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted
        request.setAttribute(START_NANOS, System.nanoTime());
        // ... code omitted
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted
        // The status code is only known once the request has completed
        long startNanos = (Long) request.getAttribute(START_NANOS);
        requestLatencyHistogram.labels(requestURI, method, String.valueOf(status))
                .observe((System.nanoTime() - startNanos) / 1e9);
        // ... code omitted
    }
}

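A Histogram metric is created with the Histogram builder. The default buckets are {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10}. To override the default buckets, use .buckets(double... buckets).

A Histogram automatically creates three kinds of time series:

Total number of events: basename_count

# Meaning: 2 HTTP requests have occurred so far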
io_namespace_http_requests_latency_seconds_histogram_count{path="/",method="GET",code="200",} 2.0

Sum of the values of all events: basename_sum

# Meaning: the total response time of the 2 HTTP requests is 13.107670803000001 seconds
io_namespace_http_requests_latency_seconds_histogram_sum{path="/",method="GET",code="200",} 13.107670803000001

Number of events whose value falls into each bucket: basename_bucket{le="<upper bound, inclusive>"}

# Of the 2 requests in total, 0 had a response time <= 0.005 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.005",} 0.0
# Of the 2 requests in total, 0 had a response time <= 0.01 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.01",} 0.0
# Of the 2 requests in total, 0 had a response time <= 0.025 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.025",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.05",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.075",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.1",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.25",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.5",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="0.75",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="1.0",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="2.5",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="5.0",} 0.0
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="7.5",} 2.0
# Of the 2 requests in total, 2 had a response time <= 10 seconds
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="10.0",} 2.0
# The +Inf bucket includes every request, so it always equals the total count (2)
io_namespace_http_requests_latency_seconds_histogram_bucket{path="/",method="GET",code="200",le="+Inf",} 2.0
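The bucket values above are cumulative: each observation is counted in every bucket whose upper bound is greater than or equal to the observed value, and always in +Inf. The sketch below reproduces that counting rule in plain Java. It is a simplified model rather than the client library's code, and the class name and the two observed latencies (6.0 and 7.1 seconds) are assumptions chosen only so that the result resembles the sample output.

```java
// Simplified model of cumulative histogram buckets (hypothetical class).
final class CumulativeBucketsSketch {

    // Returns one cumulative count per upper bound, plus a final entry for le="+Inf".
    static long[] cumulativeCounts(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            for (int i = 0; i < upperBounds.length; i++) {
                // "le" means less than or equal to the upper bound
                if (value <= upperBounds[i]) {
                    counts[i]++;
                }
            }
            counts[upperBounds.length]++; // every observation lands in +Inf
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] bounds = {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10};
        long[] counts = cumulativeCounts(bounds, new double[] {6.0, 7.1}); // assumed latencies
        for (int i = 0; i < bounds.length; i++) {
            System.out.println("le=" + bounds[i] + " " + counts[i]);
        }
        System.out.println("le=+Inf " + counts[bounds.length]);
    }
}
```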

Summary

Summary and Histogram are very similar: both can record the number or size of events as well as their distribution.

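public class PrometheusMetricsInterceptor extends HandlerInterceptorAdapter {

    static final Summary requestLatency = Summary.build()
            .name("io_namespace_http_requests_latency_seconds_summary")
            .quantile(0.5, 0.05)   // median, with 5% tolerated error
            .quantile(0.9, 0.01)   // 0.9 quantile, with 1% tolerated error
            .labelNames("path", "method", "code")
            .help("Request latency in seconds.").register();

    // Request attribute holding the start time. The interceptor instance is shared
    // by all request threads, so per-request state must not live in an instance field.
    private static final String START_NANOS = "prometheus.summary.startNanos";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // ... code omitted
        request.setAttribute(START_NANOS, System.nanoTime());
        // ... code omitted
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception {
        // ... code omitted
        // The status code is only known once the request has completed
        long startNanos = (Long) request.getAttribute(START_NANOS);
        requestLatency.labels(requestURI, method, String.valueOf(status))
                .observe((System.nanoTime() - startNanos) / 1e9);
        // ... code omitted
    }
}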

A Summary metric automatically creates several time series:

Total number of events

# Meaning: 12 HTTP requests have occurred so far
io_namespace_http_requests_latency_seconds_summary_count{path="/",method="GET",code="200",} 12.0

Sum of the values of all events

# Meaning: the total response time of these 12 HTTP requests is 51.029495508s
io_namespace_http_requests_latency_seconds_summary_sum{path="/",method="GET",code="200",} 51.029495508

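Distribution of the event values

# Meaning: the median response time of these 12 HTTP requests is 3.052404983s
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.5",} 3.052404983
# Meaning: the 0.9 quantile (90th percentile) of the response time of these 12 HTTP requests is 8.003261666s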
io_namespace_http_requests_latency_seconds_summary{path="/",method="GET",code="200",quantile="0.9",} 8.003261666
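As a reminder of what a quantile value expresses, the sketch below computes an exact nearest-rank quantile over a small set of samples in plain Java. It only illustrates the meaning of quantile="0.5" and quantile="0.9". A Summary in the client library estimates quantiles within the configured error tolerance instead of sorting all samples, so this is not how the library computes them. The class name and the sample latencies are made up for this example.

```java
import java.util.Arrays;

// Exact nearest-rank quantile over a sample set (hypothetical class, illustration only).
final class QuantileSketch {

    // Returns the smallest sample such that at least q * n samples are <= it.
    static double quantile(double q, double[] samples) {
        if (samples.length == 0 || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("need at least one sample and 0 <= q <= 1");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] latencies = {5.0, 1.0, 4.0, 2.0, 3.0}; // assumed latencies in seconds
        System.out.println(quantile(0.5, latencies));
        System.out.println(quantile(0.9, latencies));
    }
}
```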

Summary VS Histogram

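Both Summary and Histogram provide an event count (_count) and a sum of the observed values (_sum). The _count and _sum series can therefore be used to compute the same things with either type, for example the average HTTP response time over the last 5 minutes: rate(basename_sum[5m]) / rate(basename_count[5m]).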
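The ratio of the two rates is simply the increase of _sum divided by the increase of _count over the same window, because the window length cancels out. A minimal sketch of that arithmetic in plain Java follows. The class and method names are made up for this example, and counter resets are ignored.

```java
// Average seconds per request between two scrapes of the _sum and _count series
// (hypothetical class; ignores counter resets).
final class AverageLatencySketch {

    static double averageSeconds(double sumStart, double sumEnd,
                                 double countStart, double countEnd) {
        double requests = countEnd - countStart;
        if (requests <= 0) {
            return Double.NaN; // no requests in the window, the average is undefined
        }
        return (sumEnd - sumStart) / requests;
    }

    public static void main(String[] args) {
        // Histogram sample from this article: 2 requests, 13.107670803000001 seconds in total
        System.out.println(averageSeconds(0.0, 13.107670803000001, 0.0, 2.0));
    }
}
```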

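Both Summary and Histogram can also be used to compute the distribution of samples, such as the median or the 0.9 quantile, where 0.0 <= quantile <= 1.0.

The difference is that a Histogram's quantiles are computed on the server side with the histogram_quantile function, whereas a Summary's quantiles are defined and computed directly on the client. As a result, a Summary performs better when quantiles are queried through PromQL, while a Histogram costs more resources at query time. Conversely, a Histogram consumes fewer resources on the client.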
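To show why the server can derive a quantile from bucket counts alone, the sketch below estimates a quantile by linear interpolation inside the bucket that contains the target rank, which is the general approach histogram_quantile takes for cumulative buckets. It is a simplified plain-Java model under stated assumptions (finite bounds in ascending order, the first bucket starting at 0, q restricted to (0, 1]), not Prometheus's actual code. The class name is made up for this example.

```java
// Simplified quantile estimate from cumulative bucket counts (hypothetical class).
final class HistogramQuantileSketch {

    // upperBounds: finite bucket bounds in ascending order.
    // cumulativeCounts: one count per bound plus a last entry for le="+Inf".
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        if (q <= 0.0 || q > 1.0 || upperBounds.length == 0
                || cumulativeCounts.length != upperBounds.length + 1) {
            throw new IllegalArgumentException("invalid quantile or bucket layout");
        }
        int inf = cumulativeCounts.length - 1;
        double total = cumulativeCounts[inf];
        if (total == 0) {
            return Double.NaN; // no observations
        }
        double rank = q * total;
        int b = 0;
        while (b < inf && cumulativeCounts[b] < rank) {
            b++; // find the first bucket whose cumulative count reaches the rank
        }
        if (b == inf) {
            // The rank falls into +Inf: the highest finite bound is the best estimate
            return upperBounds[upperBounds.length - 1];
        }
        double bucketStart = b == 0 ? 0.0 : upperBounds[b - 1];
        double countBelow = b == 0 ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - countBelow;
        // Assume observations are spread evenly inside the bucket
        return bucketStart + (upperBounds[b] - bucketStart) * ((rank - countBelow) / inBucket);
    }

    public static void main(String[] args) {
        double[] bounds = {.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10};
        // Cumulative counts from the histogram sample output in this article
        double[] counts = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2};
        System.out.println(quantile(0.5, bounds, counts));
    }
}
```

The estimate can only be as precise as the bucket layout: with both requests in the (5, 7.5] bucket, every quantile lands somewhere in that range regardless of the real latencies.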

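Exposing business metrics with a Collector

Besides building metrics in the interceptor with the Counter, Summary, Gauge and other types provided by Prometheus, we can also expose business metrics through a custom Collector.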

@SpringBootApplication
@EnablePrometheusEndpoint
public class GatewayApplication extends WebMvcConfigurerAdapter implements CommandLineRunner {

    @Autowired
    private CustomExporter customExporter;

    // ... code omitted

    @Override
    public void run(String... args) throws Exception {
        // ... code omitted
        customExporter.register();
    }
}

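CustomExporter extends io.prometheus.client.Collector. After the Collector's register() method has been called, each request to /metrics automatically obtains the current metrics from the Collector's collect() method.

Because CustomExporter lives in the Spring IoC container, it can access business code directly and return whatever business metrics are needed.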

import io.prometheus.client.Collector;
import io.prometheus.client.GaugeMetricFamily;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Component
public class CustomExporter extends Collector {
    @Override
    public List<MetricFamilySamples> collect() {
        List<MetricFamilySamples> mfs = new ArrayList<>();

        // Create the metric
        GaugeMetricFamily labeledGauge =
                new GaugeMetricFamily("io_namespace_custom_metrics", "custom metrics", Collections.singletonList("labelname"));

        // Set the label value and the sample value of the metric
        labeledGauge.addMetric(Collections.singletonList("labelvalue"), 1);

        mfs.add(labeledGauge);
        return mfs;
    }
}

Of course, CounterMetricFamily and SummaryMetricFamily can also be used here to declare other metric types.