Prometheus's strength in the container-cloud space is beyond doubt: more and more cloud-native components expose Prometheus metrics endpoints directly, with no extra exporter required. So adopting Prometheus as the monitoring solution for the whole cluster is a reasonable choice. For metrics storage, however, Prometheus ships with local storage, its embedded TSDB time-series database. The advantage of local storage is simple operations: starting Prometheus takes a single command, and the two startup flags below specify the data path and the retention period.
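As a rough sketch (the path and retention values here are placeholders, and in newer releases --storage.tsdb.retention has been superseded by --storage.tsdb.retention.time), a startup command with those two flags looks like this:

./prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --storage.tsdb.retention=15d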
The downside is that it cannot persist metrics at large scale. That said, data compression has improved a great deal since Prometheus 2.0.
To get around the limits of single-node storage, Prometheus does not implement clustered storage itself; instead it provides remote read and write interfaces, letting users pick a suitable time-series database to give Prometheus the scalability they need.
Prometheus integrates with external remote storage systems in the following two ways: remote write and remote read.
Below I will take a closer look at the remote storage options.
Remote write
# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we start dropping them.
  [ capacity: <int> | default = 100000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100 ]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Maximum number of times to retry a batch on recoverable errors.
  [ max_retries: <int> | default = 10 ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]
Remote read
# The URL of the endpoint to query from.
url: <string>

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]
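To make the read side concrete, here is a minimal remote_read block for prometheus.yml. The adapter URL is only an assumption that mirrors the write example further below; read_recent: true asks Prometheus to also query the remote store for recent time ranges, which is convenient when verifying the round trip.

remote_read:
  # Hypothetical adapter endpoint; replace with your storage adapter's read URL.
  - url: "http://prometheus-remote-storage-adapter-svc:9201/read"
    # Also send queries for recent time ranges to the remote store (useful for testing).
    read_recent: true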
PS: remote write supports write_relabel_configs, so series can be relabeled or filtered before they are sent to the remote endpoint. For example, to keep only the specified metrics:
remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
      - action: keep
        source_labels: [__name__]
        regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
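The same relabeling mechanism can be turned around. As an additional sketch (the metric name in the regex is only a placeholder), an action: drop rule discards matching series and forwards everything else to the remote endpoint:

remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
      # Drop series whose names match the regex; all other series are still written remotely.
      - action: drop
        source_labels: [__name__]
        regex: go_gc_duration_seconds.*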
When several Prometheus instances write to the same remote storage, external_labels can be used to tell their data apart, because these labels are attached to every time series sent to external systems. For example:

global:
  scrape_interval: 20s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    cid: '9'
The community has already implemented quite a few remote storage integrations; at the time of writing the official integrations list includes, among others, AppOptics, Elasticsearch, Graphite, InfluxDB, Kafka, OpenTSDB, CrateDB, Cortex, and PostgreSQL/TimescaleDB.
Some of the storages above only support remote write. Digging into the source code, whether a backend can also support remote read largely depends on whether it supports regular-expression matchers in queries. The next part will walk through prometheus-postgresql-adapter in detail and show how to implement an adapter of your own.
Backends that support both remote read and write include, per the Prometheus integrations documentation at the time of writing, Cortex, CrateDB, InfluxDB, IRONdb, M3DB, PostgreSQL/TimescaleDB, and TiKV.
In fact, if the collected metrics are meant for data analysis, ClickHouse is worth considering: it has a clustering story, strong write performance, and can support remote read and write. I am still researching this and will write a dedicated article once there are concrete results. For now, our persistence plan is to use TimescaleDB.