Kubernetes and Monitoring: Prometheus Remote Storage

Prometheus remote storage

Preface

Prometheus's strength in the container world is beyond doubt: more and more cloud-native components expose a Prometheus metrics endpoint directly, with no extra exporter needed, so Prometheus is a sound choice for cluster-wide monitoring. For metrics storage, however, Prometheus ships with local storage only: its embedded TSDB (time-series database). Local storage is easy to operate; starting Prometheus takes a single command, and the two flags below set the data path and the retention period.

  • storage.tsdb.path: TSDB data directory, default data/
  • storage.tsdb.retention: data retention period, default 15 days
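For example, a start-up command that overrides both defaults (the data path and retention value here are illustrative):

```shell
./prometheus \
  --storage.tsdb.path=/data/prometheus \
  --storage.tsdb.retention=30d
```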

The drawback is that local storage cannot persist metrics at scale, although data compression improved considerably with Prometheus 2.0.
To get past the single-node storage limit, Prometheus does not implement clustered storage itself. Instead it exposes remote read and write interfaces, letting users pick a suitable time-series database and scale Prometheus out that way.
Prometheus integrates with remote storage systems in two ways:

  • Prometheus writes metrics to the remote storage in a standard format
  • Prometheus reads metrics back from a remote URL in a standard format
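Both directions are wired up in the Prometheus configuration file. A minimal sketch pointing both at a single adapter (the adapter hostname and port here are placeholders, not a real endpoint):

```yaml
remote_write:
  - url: "http://adapter.example:9201/write"

remote_read:
  - url: "http://adapter.example:9201/read"
```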

Below I dig into the remote storage options in detail.

Remote storage options

Configuration

Remote write

# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we start dropping them.
  [ capacity: <int> | default = 100000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Maximum number of times to retry a batch on recoverable errors.
  [ max_retries: <int> | default = 10 ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]
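The three retry parameters interact as follows: the delay before each retry starts at min_backoff, doubles on every attempt, and is capped at max_backoff, for at most max_retries attempts. A minimal sketch of that schedule (an illustration of the documented behaviour, not Prometheus's actual implementation):

```python
def backoff_delays(min_backoff_ms=30, max_backoff_ms=100, max_retries=10):
    """Yield the delay (in ms) waited before each retry attempt.

    Defaults mirror the queue_config defaults shown above.
    """
    delay = min_backoff_ms
    for _ in range(max_retries):
        yield delay
        # Double the delay for the next attempt, capped at max_backoff.
        delay = min(delay * 2, max_backoff_ms)

delays = list(backoff_delays())
print(delays)  # → [30, 60, 100, 100, 100, 100, 100, 100, 100, 100]
```

With the tight default cap, the schedule plateaus at 100ms after two doublings, so a struggling remote endpoint is retried quickly rather than backed off aggressively.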

Remote read

# The URL of the endpoint to query from.
url: <string>

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

Notes

  • The write_relabel_configs option in the remote write configuration takes full advantage of Prometheus's powerful relabelling, letting you filter which metrics are written to the remote store.

For example, keep only the selected metrics:

remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
      - action: keep
        source_labels: [__name__]
        regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
  • Consider setting external_labels in the global configuration when using federation or remote read/write, so that the individual clusters can be told apart:

global:
  scrape_interval: 20s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    cid: '9'
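The keep action above can be emulated in a few lines to make its semantics concrete: Prometheus anchors the relabel regex at both ends and drops every series whose __name__ does not fully match. A small sketch (the sample dicts are illustrative, not Prometheus's internal representation):

```python
import re

def keep_by_name(samples, regex):
    """Emulate a `keep` relabel action on __name__: retain only samples
    whose metric name fully matches the regex (Prometheus anchors
    relabel regexes at both ends)."""
    pattern = re.compile(regex)
    return [s for s in samples if pattern.fullmatch(s["__name__"])]

samples = [
    {"__name__": "container_network_receive_bytes_total", "pod": "a"},
    {"__name__": "container_cpu_usage_seconds_total", "pod": "a"},
]
kept = keep_by_name(
    samples,
    "container_network_receive_bytes_total"
    "|container_network_receive_packets_dropped_total",
)
print([s["__name__"] for s in kept])  # → ['container_network_receive_bytes_total']
```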

Existing remote storage integrations

The community has already built the following remote storage integrations:

  • AppOptics: write
  • Chronix: write
  • Cortex: read and write
  • CrateDB: read and write
  • Elasticsearch: write
  • Gnocchi: write
  • Graphite: write
  • InfluxDB: read and write
  • OpenTSDB: write
  • PostgreSQL/TimescaleDB: read and write
  • SignalFx: write

Some of the stores above support writes only. Reading the source, whether a store can support remote read largely comes down to whether it supports regular-expression matching in queries. In the next article I will walk through prometheus-postgresql-adapter in detail and show how to implement an adapter of your own.
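At its core, the write side of an adapter turns each incoming sample (metric name, label set, timestamp, value) into a row for the backing store. A minimal sketch, assuming a hypothetical flat table metrics(name, labels, time_ms, value); the real prometheus-postgresql-adapter uses its own schema:

```python
import json

def sample_to_insert(name, labels, timestamp_ms, value):
    """Render one Prometheus sample as a parameterised INSERT for a
    hypothetical metrics(name, labels, time_ms, value) table."""
    # Store labels as canonical (key-sorted) JSON so equality matchers
    # can be translated into JSON queries on the read path.
    labels_json = json.dumps(dict(sorted(labels.items())))
    sql = ("INSERT INTO metrics (name, labels, time_ms, value) "
           "VALUES (%s, %s, %s, %s)")
    params = (name, labels_json, timestamp_ms, value)
    return sql, params

sql, params = sample_to_insert(
    "container_network_receive_bytes_total",
    {"pod": "nginx-0", "namespace": "default"},
    1545730073000,
    12345.0,
)
```

The read side then does the reverse: translate the label matchers in a remote read request into a WHERE clause, which is exactly where regex support in the store's query language becomes the deciding factor.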
Comparing the stores that support both remote read and write:

  • Cortex comes from Weaveworks. It wraps a whole architectural layer around Prometheus and involves quite a few components, so it is somewhat complex.
  • InfluxDB's open-source edition does not support clustering, which makes write pressure a problem at high metric volumes, and the influxdb-relay approach is not truly highly available. Eleme has open-sourced influxdb-proxy, which may be worth trying.
  • CrateDB is built on Elasticsearch; I have not looked at it closely.
  • TimescaleDB is my personal favourite. Traditional ops teams know PostgreSQL well, so operations are dependable, and it currently supports high availability via streaming replication.

Afterword

If the collected metrics are also meant for data analysis, the ClickHouse database is worth considering for its clustering story, write performance, and remote read/write support. I am still researching it and will write a dedicated article once there are solid results. For now, our persistence plan is TimescaleDB.
