Tuning Prometheus Remote Write: A Look at the Source Code

Date: 2020-06-22



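This week, while configuring remote storage for Prometheus, I noticed that after it had been running for a while the log contained a warning: "Skipping resharding, last successful send was beyond threshold". After some investigation, it turned out that the Prometheus best practices for remote write already include tuning recommendations for this.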

Log message

In this test, InfluxDB serves as the remote storage for Prometheus, with no configuration tuning applied. Let's first look at the full log message:

ts=2020-05-14T03:07:15.114Z caller=dedupe.go:112 component=remote level=warn remote_name=11a319 url="http://192.168.1.1:8086/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1589425620 minSendTimestamp=1589425625

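The message roughly means "the last successful send was beyond the threshold". Frankly, the wording is rather cryptic and invites the question: beyond what threshold? Taking the key phrase from the log and searching for it in the Prometheus source on GitHub, let's walk through the implementation step by step: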

Defining the queue manager

The code first defines a struct named "QueueManager", which we will call the queue manager. Here is the code:

type QueueManager struct {
    // https://golang.org/pkg/sync/atomic/#pkg-note-BUG
    lastSendTimestamp int64

    logger         log.Logger
    flushDeadline  time.Duration
    cfg            config.QueueConfig
    externalLabels labels.Labels
    relabelConfigs []*relabel.Config
    watcher        *wal.Watcher

    clientMtx   sync.RWMutex
    storeClient StorageClient

    seriesMtx            sync.Mutex
    seriesLabels         map[uint64]labels.Labels
    seriesSegmentIndexes map[uint64]int
    droppedSeries        map[uint64]struct{}

    shards      *shards
    numShards   int
    reshardChan chan int
    quit        chan struct{}
    wg          sync.WaitGroup

    samplesIn, samplesDropped, samplesOut, samplesOutDuration *ewmaRate

    metrics *queueManagerMetrics
}

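From the definition of the QueueManager struct, the following fields are worth noting:

numShards: the number of shards, of type int;
reshardChan: the reshard channel, of type chan int;
cfg: the Prometheus configuration parameters for the queue, of type config.QueueConfig;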

Initializing the queue manager

The NewQueueManager function is the constructor for the queue manager. Here is the code:

// NewQueueManager builds a new QueueManager.
func NewQueueManager(
    metrics *queueManagerMetrics,
    watcherMetrics *wal.WatcherMetrics,
    readerMetrics *wal.LiveReaderMetrics,
    logger log.Logger,
    walDir string,
    samplesIn *ewmaRate,
    cfg config.QueueConfig,
    externalLabels labels.Labels,
    relabelConfigs []*relabel.Config,
    client StorageClient,
    flushDeadline time.Duration,
) *QueueManager {
    if logger == nil {
        logger = log.NewNopLogger()
    }

    logger = log.With(logger, remoteName, client.Name(), endpoint, client.Endpoint())
    t := &QueueManager{
        logger:         logger,
        flushDeadline:  flushDeadline,
        cfg:            cfg,
        externalLabels: externalLabels,
        relabelConfigs: relabelConfigs,
        storeClient:    client,
        seriesLabels:         make(map[uint64]labels.Labels),
        seriesSegmentIndexes: make(map[uint64]int),
        droppedSeries:        make(map[uint64]struct{}),
        numShards:   cfg.MinShards,
        reshardChan: make(chan int),
        quit:        make(chan struct{}),
        samplesIn:          samplesIn,
        samplesDropped:     newEWMARate(ewmaWeight, shardUpdateDuration),
        samplesOut:         newEWMARate(ewmaWeight, shardUpdateDuration),
        samplesOutDuration: newEWMARate(ewmaWeight, shardUpdateDuration),

        metrics: metrics,
    }

    t.watcher = wal.NewWatcher(watcherMetrics, readerMetrics, logger, client.Name(), t, walDir)
    t.shards = t.newShards()
    return t
}


From the constructor, we learn the following:

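numShards: the number of shards, set to cfg.MinShards, i.e. the value of the min_shards parameter in the Prometheus remote_write configuration. In other words, remote write starts with min_shards shards as its initial shard count;
reshardChan: an unbuffered channel of int is created here, so a send on it blocks until a receiver is ready. The queues and shards discussed in this article are built on Go channels. A channel is fundamentally just a data structure that can be written to and read from; sending data to a channel or receiving from it is simply an operation on that data structure, nothing more;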
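
To make the unbuffered-channel behaviour concrete, here is a minimal sketch, not the Prometheus implementation. The names reshardLoop and trySend are hypothetical helpers invented for this example; only the idea of an unbuffered chan int plus a quit channel comes from the struct above. It shows that a send only succeeds once a receiver is waiting.

```go
package main

import (
	"fmt"
	"sync"
)

// reshardLoop is a hypothetical stand-in for a goroutine that applies
// shard-count changes. It receives desired shard counts from an unbuffered
// channel until quit is closed, and returns the last count it applied.
func reshardLoop(reshardChan <-chan int, quit <-chan struct{}, numShards int) int {
	for {
		select {
		case n := <-reshardChan:
			numShards = n
		case <-quit:
			return numShards
		}
	}
}

// trySend attempts a non-blocking send and reports false when no receiver
// is ready to take the value.
func trySend(ch chan<- int, n int) bool {
	select {
	case ch <- n:
		return true
	default:
		return false
	}
}

func main() {
	reshardChan := make(chan int) // unbuffered, as in NewQueueManager
	quit := make(chan struct{})

	// No receiver is running yet, so a non-blocking send cannot succeed.
	fmt.Println("send without receiver:", trySend(reshardChan, 4))

	var wg sync.WaitGroup
	final := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		final = reshardLoop(reshardChan, quit, 1)
	}()

	reshardChan <- 4 // blocks until reshardLoop receives the value
	close(quit)
	wg.Wait()
	fmt.Println("numShards after reshard:", final)
}
```

Running this prints "send without receiver: false" followed by "numShards after reshard: 4". With an unbuffered channel the sender and the receiver have to meet, which is why a busy receiver can cause a non-blocking sender to skip.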

Conditions that trigger a reshard

The log message at the start of this article says "Skipping resharding", meaning the reshard action was skipped. This raises three questions: what is a reshard, why is resharding needed, and how is a reshard triggered?

The code below determines when resharding should happen; a return value of true means a reshard should take place:

// shouldReshard returns if resharding should occur
func (t *QueueManager) shouldReshard(desiredShards int) bool {
    if desiredShards == t.numShards {
        return false
    }
    // We shouldn't reshard if Prometheus hasn't been able to send to the
    // remote endpoint successfully within some period of time.
    minSendTimestamp := time.Now().Add(-2 * time.Duration(t.cfg.BatchSendDeadline)).Unix()
    lsts := atomic.LoadInt64(&t.lastSendTimestamp)
    if lsts < minSendTimestamp {
        level.Warn(t.logger).Log("msg", "Skipping resharding, last successful send was beyond threshold", "lastSendTimestamp", lsts, "minSendTimestamp", minSendTimestamp)
        return false
    }
    return true
}

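1. When the desired number of shards equals numShards, no reshard is triggered;
2. The minimum send timestamp = current timestamp - 2 * BatchSendDeadline;
3. lsts is the timestamp of the most recent successful send;
4. When lsts is less than the minimum send timestamp, a warning is logged and no reshard is triggered;
5. When neither condition 1 nor condition 4 applies, a reshard is triggered.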

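Here we finally find the origin of the log message from the start of the article: the timestamp of the most recent successful send was earlier than the minimum send timestamp, which in turn depends on the BatchSendDeadline setting.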

Prometheus remote write

Before we can understand reshard, we first need to understand the concept of a shard, which brings us to Prometheus remote write.

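Each remote write destination starts a queue that reads from the write-ahead log (WAL) and writes samples into in-memory queues owned by shards; each shard then sends requests to the configured endpoint. The data flow is as follows: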

      |-->  queue (shard_1)   --> remote endpoint
WAL --|-->  queue (shard_...) --> remote endpoint
      |-->  queue (shard_n)   --> remote endpoint

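When one shard backs up and fills its queue, Prometheus blocks reading from the WAL into any shard. Failed sends are retried without losing data, unless the remote endpoint stays down for more than 2 hours. After 2 hours the WAL is compacted and any data that has not been sent is lost. While remote write is running, Prometheus continuously calculates the optimal number of shards (the numShards mentioned above) based on the incoming sample rate, the number of samples not yet sent, and the time taken to send each sample.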
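
The back-pressure described above can be illustrated with a simplified sketch. It assumes each shard owns a bounded Go channel and that a series is routed to a shard by its reference modulo the shard count. The sample and shards types, the tryEnqueue helper, and the sizes used are all hypothetical and far simpler than the real code.

```go
package main

import "fmt"

// sample is a hypothetical, simplified stand-in for a sample read from the WAL.
type sample struct {
	seriesRef uint64
	value     float64
}

// shards holds one bounded in-memory queue per shard.
type shards struct {
	queues []chan sample
}

// newShards creates n shard queues, each holding up to capacity samples.
func newShards(n, capacity int) *shards {
	s := &shards{queues: make([]chan sample, n)}
	for i := range s.queues {
		s.queues[i] = make(chan sample, capacity)
	}
	return s
}

// shardFor routes a series to a fixed shard so its samples stay in order.
func (s *shards) shardFor(ref uint64) int {
	return int(ref % uint64(len(s.queues)))
}

// tryEnqueue reports false when the target shard's queue is full. A real
// reader would block or retry at this point instead of dropping the sample.
func (s *shards) tryEnqueue(sm sample) bool {
	select {
	case s.queues[s.shardFor(sm.seriesRef)] <- sm:
		return true
	default:
		return false
	}
}

func main() {
	s := newShards(3, 2) // 3 shards with room for 2 samples each (illustrative)

	// Series 4 always maps to shard 1; the third sample finds the queue full.
	for i := 0; i < 3; i++ {
		ok := s.tryEnqueue(sample{seriesRef: 4, value: float64(i)})
		fmt.Printf("series 4 -> shard %d, accepted=%v\n", s.shardFor(4), ok)
	}

	// Shard 2 still has room, but a single WAL reader that is stuck waiting
	// on shard 1 would never get as far as offering this sample.
	fmt.Println("series 5 accepted:", s.tryEnqueue(sample{seriesRef: 5}))
}
```

Nothing drains the queues in this sketch, so the third sample for series 4 is rejected while series 5 is still accepted. Because one reader feeds every shard in order, a single full queue is enough to stall delivery to all of them, which is the situation a larger shard count or queue capacity is meant to relieve.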