Getting Microservice Monitoring Done with Prometheus

Date: 2021-08-15



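I have recently been adding monitoring to our services. Prometheus is currently the most popular database for monitoring, and it is also the one go-zero integrates with by default. This post looks at how go-zero hooks into Prometheus and how developers can define their own metrics.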

Enabling monitoring

The go-zero framework has Prometheus-based service metrics built in. They are not turned on by default, though; developers need to enable them in config.yaml:

Prometheus:
  Host: 127.0.0.1
  Port: 9091
  Path: /metrics

If you run Prometheus locally, add a scrape entry for the service you want to monitor to the Prometheus configuration file prometheus.yaml:

- job_name: 'file_ds'
  static_configs:
    - targets: ['your-local-ip:9091']
      labels:
        job: activeuser
        app: activeuser-api
        env: dev
        instance: your-local-ip:service-port

My local Prometheus runs in docker, so prometheus.yaml goes in the docker-prometheus directory:

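# Run from the directory that contains dockeryml/; bind mounts need an absolute host path.
# The image looks for /etc/prometheus/prometheus.yml by default, so point it at the .yaml file.
docker run \
    -p 9090:9090 \
    -v "$(pwd)/dockeryml/docker-prometheus:/etc/prometheus" \
    prom/prometheus \
    --config.file=/etc/prometheus/prometheus.yaml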

Open localhost:9090 to see the Prometheus web UI.

Click http://service-ip:9091/metrics to see the monitoring data for that service.

In that output we can see two kinds of bucket series, plus the count/sum metrics.

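So how does go-zero integrate these metrics? What exactly is being monitored? And how do we define our own metrics? The rest of this post answers these questions.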

> For the basic setup above, see our other article: https://zeromicro.github.io/go-zero/service-monitor.html

How it is integrated

The example above uses HTTP requests, which means metrics are collected continuously as requests hit the server. That naturally suggests middleware. The code: https://github.com/tal-tech/go-zero/blob/master/rest/handler/prometheushandler.go

var (
	metricServerReqDur = metric.NewHistogramVec(&metric.HistogramVecOpts{
		...
		// label names for this metric
		Labels: []string{"path"},
		// bucket upper bounds for the histogram (request duration in milliseconds)
		Buckets: []float64{5, 10, 25, 50, 100, 250, 500, 1000},
	})

	metricServerReqCodeTotal = metric.NewCounterVec(&metric.CounterVecOpts{
		...
		// label names for this metric: recording a sample is just a call to Inc()
		Labels: []string{"path", "code"},
	})
)

func PromethousHandler(path string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// time the request came in
			startTime := timex.Now()
			cw := &security.WithCodeResponseWriter{Writer: w}
			defer func() {
				// the request is returning: report its duration and status code
				metricServerReqDur.Observe(int64(timex.Since(startTime)/time.Millisecond), path)
				metricServerReqCodeTotal.Inc(path, strconv.Itoa(cw.Code))
			}()
			// Let the request through to the remaining middleware and the business logic.
			// Control comes back here afterwards, so the whole request is reported once
			// (the "onion model" of middleware).
			next.ServeHTTP(cw, r)
		})
	}
}
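
To make the onion model concrete, here is a minimal standard-library sketch of the same pattern. It is not go-zero's code: `statusRecorder`, `sample`, `metricsMiddleware`, and `recordOne` are hypothetical names invented for this illustration, and a plain callback stands in for the HistogramVec and CounterVec.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder wraps http.ResponseWriter and remembers the status code.
// It plays the role that security.WithCodeResponseWriter plays in the excerpt above.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// sample is one reported observation (hypothetical type for this sketch).
type sample struct {
	path     string
	code     int
	duration time.Duration
}

// metricsMiddleware calls report exactly once per request, after the inner
// handler has finished.
func metricsMiddleware(path string, report func(sample)) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			start := time.Now()
			// Default to 200 because handlers may never call WriteHeader explicitly.
			rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
			// The deferred func runs on the way back out of the onion.
			defer func() {
				report(sample{path: path, code: rec.code, duration: time.Since(start)})
			}()
			next.ServeHTTP(rec, r)
		})
	}
}

// recordOne sends one request through the middleware and returns the sample
// that was reported for it.
func recordOne(status int) sample {
	var got sample
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	})
	h := metricsMiddleware("/ping", func(s sample) { got = s })(inner)
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/ping", nil))
	return got
}

func main() {
	s := recordOne(http.StatusNotFound)
	fmt.Println(s.path, s.code, s.duration)
}
```

Because the report happens in a `defer`, it still fires when the inner handler returns early, which is why the excerpt above structures its reporting the same way.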

The whole thing is actually quite simple:

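HistogramVec collects request latency:
- The buckets hold the latency bounds given in the options. Each request is counted in the bucket that matches how long it took.
- The result is the latency distribution of each route, which shows developers at a glance where optimization is needed.
CounterVec collects counts for the specified labels:
- Labels: []string{"path", "code"}
- The labels behave like a tuple. go-zero treats (path, code) as a unit and records how many times each route returned each status code. If there are too many 4xx or 5xx responses, isn't it time to check the health of your service?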
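
The two ideas above can be sketched with the standard library alone. This is an illustration rather than go-zero's or Prometheus's implementation: `bucketIndex`, `labelKey`, and `counterVec` are hypothetical names, the counter is deliberately not safe for concurrent use, and the per-bucket counts are kept non-cumulative for readability (the `_bucket` series that Prometheus exposes are cumulative).

```go
package main

import (
	"fmt"
	"sort"
)

// Same upper bounds (in milliseconds) as the Buckets option in the excerpt above.
var bounds = []float64{5, 10, 25, 50, 100, 250, 500, 1000}

// bucketIndex returns the index of the first bucket whose upper bound is >= ms.
// A result equal to len(bounds) means the value exceeds every finite bound.
func bucketIndex(ms float64) int {
	return sort.SearchFloat64s(bounds, ms)
}

// labelKey is the (path, code) tuple used as the counter key.
type labelKey struct {
	path string
	code string
}

// counterVec is a toy stand-in for a labeled counter: one count per label tuple.
type counterVec map[labelKey]int

func (c counterVec) Inc(path, code string) { c[labelKey{path, code}]++ }

func main() {
	// Distribute a few request latencies over the buckets.
	perBucket := make([]int, len(bounds)+1) // last slot: above every finite bound
	for _, ms := range []float64{3, 7, 7, 120, 2000} {
		perBucket[bucketIndex(ms)]++
	}
	fmt.Println("per-bucket counts:", perBucket)

	// Count responses per (path, code).
	codes := counterVec{}
	codes.Inc("/users", "200")
	codes.Inc("/users", "200")
	codes.Inc("/users", "500")
	fmt.Println("200s:", codes[labelKey{"/users", "200"}], "500s:", codes[labelKey{"/users", "500"}])
}
```

Keying the map on a struct makes each distinct (path, code) pair its own series, which is the effect the label tuple has on a CounterVec.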

How to define custom metrics

go-zero also provides a basic wrapper around Prometheus metrics, so developers can build their own Prometheus middleware.

> Code: https://github.com/tal-tech/go-zero/tree/master/core/metric

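Name	Purpose	Collection function
CounterVec	A plain counter. Used for: QPS statistics	`CounterVec.Inc()`: metric +1
GaugeVec	Records a plain value. Suitable for disk capacity and CPU/memory usage (can go up or down)	`GaugeVec.Inc()/GaugeVec.Add()`: metric +1 / metric +N, where N may be negative
HistogramVec	Reflects the distribution of values. Suitable for: request latency, response size	`HistogramVec.Observe(val, labels)`: records the current value, finds the bucket it falls into, and adds 1 to it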

> A basic analysis of HistogramVec.Observe():
>
> In the figure above, every HistogramVec metric shows up as 3 series:
>
> - _count: number of observations
> - _sum: sum of all observed values
> - _bucket{le=a1}: number of observations in (-inf, a1]
>
> So we can guess that three kinds of data are tracked while collecting:
>
> ```go
> // Prometheus does most of its counting with atomic operations (including CAS),
> // which perform better than a Mutex
> func (h *histogram) observe(v float64, bucket int) {
> 	n := atomic.AddUint64(&h.countAndHotIdx, 1)
> 	hotCounts := h.counts[n>>63]
>
> 	if bucket < len(h.upperBounds) {
> 		// the bucket that val falls into: +1
> 		atomic.AddUint64(&hotCounts.buckets[bucket], 1)
> 	}
> 	for {
> 		oldBits := atomic.LoadUint64(&hotCounts.sumBits)
> 		newBits := math.Float64bits(math.Float64frombits(oldBits) + v)
> 		// sum: +v (it is the running total, after all)
> 		if atomic.CompareAndSwapUint64(&hotCounts.sumBits, oldBits, newBits) {
> 			break
> 		}
> 	}
> 	// count: +1
> 	atomic.AddUint64(&hotCounts.count, 1)
> }
> ```
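
The three series can be reproduced in a small, runnable form. The sketch below is a simplified stand-in, not the Prometheus client: `miniHistogram` and its methods are hypothetical, it omits the hot/cold counter switching that `countAndHotIdx` drives in the real code, and it only aims to show the lock-free float sum and the cumulative `le` semantics.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
	"sync/atomic"
)

// miniHistogram is a simplified histogram for illustration only.
type miniHistogram struct {
	sumBits     uint64    // float64 sum stored as raw bits so it can be CAS-updated
	count       uint64    // total number of observations
	upperBounds []float64 // sorted finite upper bounds
	buckets     []uint64  // per-bucket (non-cumulative) counts
}

func newMiniHistogram(bounds []float64) *miniHistogram {
	return &miniHistogram{upperBounds: bounds, buckets: make([]uint64, len(bounds))}
}

// Observe records v without taking a mutex.
func (h *miniHistogram) Observe(v float64) {
	// First bucket whose upper bound is >= v.
	i := sort.SearchFloat64s(h.upperBounds, v)
	if i < len(h.upperBounds) {
		atomic.AddUint64(&h.buckets[i], 1)
	}
	// There is no atomic add for float64, so retry a compare-and-swap on the bits.
	for {
		old := atomic.LoadUint64(&h.sumBits)
		next := math.Float64bits(math.Float64frombits(old) + v)
		if atomic.CompareAndSwapUint64(&h.sumBits, old, next) {
			break
		}
	}
	atomic.AddUint64(&h.count, 1)
}

// Cumulative returns, for each upper bound, how many observations were <= it.
// These are the values a _bucket{le="..."} series would carry.
func (h *miniHistogram) Cumulative() []uint64 {
	out := make([]uint64, len(h.buckets))
	var running uint64
	for i := range h.buckets {
		running += atomic.LoadUint64(&h.buckets[i])
		out[i] = running
	}
	return out
}

func (h *miniHistogram) Sum() float64  { return math.Float64frombits(atomic.LoadUint64(&h.sumBits)) }
func (h *miniHistogram) Count() uint64 { return atomic.LoadUint64(&h.count) }

func main() {
	h := newMiniHistogram([]float64{5, 10, 25})
	var wg sync.WaitGroup
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 250; i++ {
				h.Observe(8) // every observation lands in the le=10 bucket
			}
		}()
	}
	wg.Wait()
	fmt.Println("cumulative buckets:", h.Cumulative())
	fmt.Println("sum:", h.Sum(), "count:", h.Count())
}
```

An observation above every finite bound still adds to the sum and count; in Prometheus it would show up only in the `le="+Inf"` bucket, which always equals `_count`.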

因此開發者想定義本身的監控指標：