Prometheus Learning Series (Part 2): Prometheus FIRST STEPS

Date: 2019-11-26

Preface

Description

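Prometheus is a monitoring platform that collects metrics from monitored targets by scraping HTTP endpoints on those targets. This guide shows you how to install and configure Prometheus and use it to monitor our first resource. You will download, install, and run Prometheus. You will also download and install an exporter, a tool that exposes time series data on hosts and services. Our first exporter will be Prometheus itself, which provides a wide variety of host-level metrics about memory usage, garbage collection, and more.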

Download Prometheus

Download the latest release of Prometheus for your platform, then extract it:

tar xvfz prometheus-*.tar.gz
cd prometheus-*
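If you prefer to script the extract step, the tar command above can be approximated with Python's standard tarfile module. This is a minimal sketch, assuming the archive has already been downloaded into the current directory, matches the prometheus-*.tar.gz pattern used above, and contains a single top-level directory; extract_prometheus is a hypothetical helper name.

```python
import glob
import os
import tarfile


def extract_prometheus(archive_pattern="prometheus-*.tar.gz", dest="."):
    """Extract the first archive matching archive_pattern into dest.

    Roughly what `tar xvfz prometheus-*.tar.gz` does. Returns the path of the
    archive's top-level directory (the one you would `cd` into next).
    Assumes the archive has exactly one top-level directory.
    """
    matches = sorted(glob.glob(archive_pattern))
    if not matches:
        raise FileNotFoundError(f"no archive matches {archive_pattern!r}")
    with tarfile.open(matches[0], "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            print(member.name)  # like tar's -v flag: list each entry
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, links escaping dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    top_level = members[0].name.split("/")[0]
    return os.path.join(dest, top_level)
```

Calling os.chdir(extract_prometheus()) then plays the role of the cd command.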

The Prometheus server is a single binary called prometheus (or prometheus.exe on Microsoft Windows). We can run the binary and see help on its options by passing the --help flag:

/usr/local/bin# ./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
# Help
  -h, --help                     Show context-sensitive help (also try --help-long and --help-man).
# Version
      --version                  Show application version.
# Configuration file
      --config.file="prometheus.yml"
                                 Prometheus configuration file path.
# Listen address
      --web.listen-address="0.0.0.0:9090"
                                 Address to listen on for UI, API, and telemetry.
# Read timeout; idle connections are closed after it
      --web.read-timeout=5m      Maximum duration before timing out read of the request, and closing idle connections.
# Maximum number of simultaneous connections
      --web.max-connections=512  Maximum number of simultaneous connections.
# URL under which Prometheus is reachable externally (for example, behind a reverse proxy). Used to generate relative and absolute links back to Prometheus itself. If the URL has a path portion, it is used as a prefix for all HTTP endpoints Prometheus serves. If omitted, the relevant URL components are derived automatically.
      --web.external-url=<URL>   The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse proxy). Used for generating relative and absolute links back to
                                 Prometheus itself. If the URL has a path portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted, relevant URL components will be derived
                                 automatically.
# Prefix for internal routes. Defaults to the path of --web.external-url.
      --web.route-prefix=<path>  Prefix for the internal routes of web endpoints. Defaults to path of --web.external-url.
# Path to the static asset directory, served at /user
      --web.user-assets=<path>   Path to static asset directory, available at /user.
# Enable shutdown and reload via HTTP requests
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
# Enable API endpoints for admin control actions
      --web.enable-admin-api     Enable API endpoints for admin control actions.
# Path to the console template directory, served at /consoles
      --web.console.templates="consoles"
                                 Path to the console template directory, available at /consoles.
# Path to the console library directory
      --web.console.libraries="console_libraries"
                                 Path to the console library directory.
# Document title of the Prometheus instance's pages
      --web.page-title="Prometheus Time Series Collection and Processing Server"
                                 Document title of Prometheus instance.
# Regular expression for allowed CORS origins
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored. Example: 'https?://(domain1|domain2)\.com'
# Base path for metrics (data) storage
      --storage.tsdb.path="data/"
                                 Base path for metrics storage.
# How long to retain data. Deprecated; use "storage.tsdb.retention.time" instead.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION
                                 [DEPRECATED] How long to retain samples in storage. This flag has been deprecated, use "storage.tsdb.retention.time" instead.
# How long to retain data. Defaults to 15 days.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
                                 How long to retain samples in storage. When this flag is set it overrides "storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor
                                 "storage.tsdb.retention.size" is set, the retention time defaults to 15d.
# Maximum number of bytes that can be stored for blocks. Supported units: KB, MB, GB, TB, PB.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
                                 [EXPERIMENTAL] Maximum number of bytes that can be stored for blocks. Units supported: KB, MB, GB, TB, PB. This flag is experimental and can be changed in future releases.
# Do not create a lock file in the data directory
      --storage.tsdb.no-lockfile
                                 Do not create lockfile in data directory.
# Allow overlapping blocks, which enables vertical compaction and vertical query merge
      --storage.tsdb.allow-overlapping-blocks
                                 [EXPERIMENTAL] Allow overlapping blocks, which in turn enables vertical compaction and vertical query merge.
# Compress the TSDB write-ahead log (WAL)
      --storage.tsdb.wal-compression
                                 Compress the tsdb WAL.
# How long to wait while flushing samples on shutdown or config reload
      --storage.remote.flush-deadline=<duration>
                                 How long to wait flushing sample on shutdown or config reload.
# Maximum total number of samples returned via the remote read interface in a single query. 0 means no limit. Ignored for streamed response types.
      --storage.remote.read-sample-limit=5e7
                                 Maximum overall number of samples to return via the remote read interface, in a single query. 0 means no limit. This limit is ignored for streamed response types.
# Maximum number of concurrent remote read calls. 0 means no limit.
      --storage.remote.read-concurrent-limit=10
                                 Maximum number of concurrent remote read calls. 0 means no limit.
# Maximum bytes in a single frame when streaming remote read responses. The client may also limit frame size. The 1MB default is the size protobuf recommends.
      --storage.remote.read-max-bytes-in-frame=1048576
                                 Maximum number of bytes in a single frame for streaming remote read response types before marshalling. Note that client might have limit on frame size as well. 1MB as
                                 recommended by protobuf by default.
# Maximum time to tolerate a Prometheus outage when restoring the "for" state of alerts
      --rules.alert.for-outage-tolerance=1h
                                 Max time to tolerate prometheus outage for restoring "for" state of alert.
# Minimum duration between an alert and its restored "for" state. Only maintained for alerts whose configured "for" time exceeds the grace period.
      --rules.alert.for-grace-period=10m
                                 Minimum duration between alert and restored "for" state. This is maintained only for alerts with configured "for" time greater than grace period.
# Minimum time to wait before resending an alert to Alertmanager
      --rules.alert.resend-delay=1m
                                 Minimum amount of time to wait before resending an alert to Alertmanager.
# Capacity of the queue for pending Alertmanager notifications
      --alertmanager.notification-queue-capacity=10000
                                 The capacity of the queue for pending Alertmanager notifications.
# Timeout for sending alerts to Alertmanager
      --alertmanager.timeout=10s
                                 Timeout for sending alerts to Alertmanager.
# Maximum lookback duration for retrieving metrics during expression evaluation
      --query.lookback-delta=5m  The maximum lookback duration for retrieving metrics during expression evaluations.
# Maximum time a query may run before it is aborted
      --query.timeout=2m         Maximum time a query may take before being aborted.
# Maximum number of concurrently executing queries
      --query.max-concurrency=20
                                 Maximum number of queries executed concurrently.
# Maximum number of samples a single query can load into memory. Queries that try to load more fail, so this also caps how many samples a query can return.
      --query.max-samples=50000000
                                 Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the
                                 number of samples a query can return.
# Log level
      --log.level=info           Only log messages with the given severity or above. One of: [debug, info, warn, error]
# Log format
      --log.format=logfmt        Output format of log messages. One of: [logfmt, json]
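Several of the flags above take durations such as 5m, 1h, or 15d. The sketch below converts that notation into seconds, assuming the unit set Prometheus uses for durations (ms, s, m, h, d, w, y, where y means 365 days), written largest unit first; combinations such as 1h30m are accepted by newer releases. It is an illustration, not Prometheus's own parser, and parse_duration is a hypothetical name.

```python
import re

# Seconds per unit, listed in the order units must appear in a duration.
_UNITS = ("y", "w", "d", "h", "m", "s", "ms")
_UNIT_SECONDS = {
    "y": 365 * 24 * 3600,
    "w": 7 * 24 * 3600,
    "d": 24 * 3600,
    "h": 3600,
    "m": 60,
    "s": 1,
    "ms": 0.001,
}
_DURATION_RE = re.compile(
    r"^(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$"
)


def parse_duration(text: str) -> float:
    """Convert a duration such as '15d', '5m' or '1h30m' into seconds."""
    match = _DURATION_RE.match(text)
    if not text or match is None:  # the regex alone would accept ''
        raise ValueError(f"not a valid duration: {text!r}")
    total = 0.0
    for unit, amount in zip(_UNITS, match.groups()):
        if amount is not None:
            total += int(amount) * _UNIT_SECONDS[unit]
    return total


print(parse_duration("15d"), parse_duration("5m"))
```

Fractions such as 1.5h are rejected, which is why this notation is written with integer amounts only.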

Before starting Prometheus, let's configure it.

Configuring Prometheus

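Prometheus configuration is YAML. The Prometheus download comes with a sample configuration in a file called prometheus.yml, which is a good place to get started. We have stripped out most of the comments in the example file to make it more succinct (comments are the lines prefixed with a #).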

global:
  scrape_interval:     15s  # how often to scrape targets
  evaluation_interval: 15s  # how often to evaluate rules

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

There are three blocks of configuration in the example configuration file: global, rule_files, and scrape_configs.

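global: controls the Prometheus server's global configuration. The first option, scrape_interval, controls how often Prometheus scrapes targets; you can override it for individual targets. In this example the global setting is to scrape every 15 seconds. The evaluation_interval option controls how often Prometheus evaluates rules. Prometheus uses rules to create new time series and to generate alerts.
rule_files: specifies the location of any rules we want the Prometheus server to load. For now we have no rules.
scrape_configs: controls which resources Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint, it can scrape and monitor its own health. The default configuration has a single job, called prometheus, which scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured target: localhost on port 9090. Prometheus expects metrics to be available on targets at the path /metrics, so this default job scrapes the URL http://localhost:9090/metrics. The time series data returned describes the state and performance of the Prometheus server.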
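To make the job-to-URL mapping concrete, here is a small sketch that derives scrape URLs from a scrape_configs structure. It assumes the YAML has already been loaded into plain Python dicts (for example with a YAML library) and applies the documented per-job defaults metrics_path: /metrics and scheme: http; scrape_urls is a hypothetical helper, not part of Prometheus.

```python
def scrape_urls(config: dict) -> dict:
    """Map each job_name to the URLs Prometheus would scrape for it."""
    result = {}
    for job in config.get("scrape_configs", []):
        scheme = job.get("scheme", "http")          # default scheme
        path = job.get("metrics_path", "/metrics")  # default metrics path
        urls = []
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                urls.append(f"{scheme}://{target}{path}")
        result[job["job_name"]] = urls
    return result


# The example prometheus.yml above, written as Python data.
example_config = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "scrape_configs": [
        {"job_name": "prometheus",
         "static_configs": [{"targets": ["localhost:9090"]}]},
    ],
}

print(scrape_urls(example_config))
```

For the example configuration this yields one job, prometheus, with the single URL http://localhost:9090/metrics described above.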

For a complete specification of configuration options, see the configuration documentation.

Starting Prometheus

To start Prometheus with our newly created configuration file, change to the directory containing the Prometheus binary and run:

./prometheus --config.file=prometheus.yml
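If you launch Prometheus from a script rather than an interactive shell, the same rule applies: each flag is joined to its value with = and no surrounding spaces. A minimal sketch using Python's subprocess module, assuming the binary is in the current directory; build_command and start_prometheus are hypothetical helpers, and any extra flag you pass should be one listed by --help.

```python
import subprocess


def build_command(binary="./prometheus", config_file="prometheus.yml", extra_flags=None):
    """Return the argv list used to start Prometheus.

    Every flag is rendered as --name=value, with no spaces around '='.
    """
    argv = [binary, f"--config.file={config_file}"]
    for name, value in (extra_flags or {}).items():
        argv.append(f"--{name}={value}")
    return argv


def start_prometheus(argv):
    """Launch Prometheus in the background; call .terminate() on the result to stop it."""
    return subprocess.Popen(argv)
```

For example, start_prometheus(build_command(extra_flags={"storage.tsdb.retention.time": "30d"})) starts the server with a 30-day retention instead of the 15-day default.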

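Prometheus should now start up. You should be able to browse to its status page at http://localhost:9090. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint. You can also verify that Prometheus is serving metrics about itself by navigating to its metrics endpoint: http://localhost:9090/metrics.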
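The browser check above can also be automated. This sketch uses only the standard library and assumes the default address http://localhost:9090/metrics; it treats the endpoint as healthy when it answers HTTP 200 and the body contains promhttp_metric_handler_requests_total, the self-metric used later in this guide. metrics_endpoint_up is a hypothetical name.

```python
import urllib.request


def metrics_endpoint_up(url="http://localhost:9090/metrics", timeout=5.0) -> bool:
    """Return True if url answers HTTP 200 with Prometheus's own metrics."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            # Prometheus's /metrics output includes this counter about itself.
            return resp.status == 200 and "promhttp_metric_handler_requests_total" in body
    except OSError:  # connection refused, timeout, HTTP error status, ...
        return False
```

Polling this function every few seconds after startup is a simple way to wait until the server is ready before running queries.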

Using the expression browser

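Let us look at some of the data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Console" view within the "Graph" tab.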

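One metric that Prometheus exports about itself is called promhttp_metric_handler_requests_total (the total number of /metrics requests the Prometheus server has served). Go ahead and enter it into the expression console: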

promhttp_metric_handler_requests_total

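This should return a number of different time series (along with the latest value recorded for each), all with the metric name promhttp_metric_handler_requests_total but with different labels. These labels designate different request statuses.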
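Each of those series is identified by the metric name plus its set of labels. The sketch below parses lines of the Prometheus text exposition format (what /metrics serves) into (name, labels, value) tuples. It is deliberately simplified: it assumes label values contain no escaped quotes or commas, skips comment lines (# HELP, # TYPE), and ignores optional timestamps. parse_exposition is a hypothetical name and the sample values are made up.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse simplified text exposition format into (name, labels, value) tuples."""
    series = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and # HELP / # TYPE
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, label_text, value = match.groups()
        labels = dict(_LABEL_RE.findall(label_text or ""))
        series.append((name, labels, float(value)))
    return series


# Illustrative /metrics excerpt; the values are made up.
SAMPLE = """\
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 12
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
"""

for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```

All three series share one metric name; only the code label tells them apart, which is exactly what the console shows.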

If we are only interested in requests that resulted in HTTP code 200, we can use this query to retrieve that information:

promhttp_metric_handler_requests_total{code="200"}
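A selector such as {code="200"} simply filters series by label. The sketch below mimics only PromQL's equality matcher (label="value"), assuming series are already held as (name, labels, value) tuples; as in PromQL, a label missing from a series is treated as the empty string. select is a hypothetical helper and the values are made up.

```python
def select(series, metric_name, **matchers):
    """Keep series named metric_name whose labels equal every matcher."""
    return [
        (name, labels, value)
        for name, labels, value in series
        if name == metric_name
        and all(labels.get(key, "") == want for key, want in matchers.items())
    ]


# Illustrative series as (name, labels, value); the values are made up.
series = [
    ("promhttp_metric_handler_requests_total", {"code": "200"}, 12.0),
    ("promhttp_metric_handler_requests_total", {"code": "500"}, 0.0),
    ("promhttp_metric_handler_requests_total", {"code": "503"}, 0.0),
]

# Roughly: promhttp_metric_handler_requests_total{code="200"}
print(select(series, "promhttp_metric_handler_requests_total", code="200"))
```

PromQL also offers !=, =~ and !~ matchers, which this sketch does not cover.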

To count the number of returned time series, you could write:

count(promhttp_metric_handler_requests_total)
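Note that count() counts series, not requests: it ignores the sample values. The sketch below contrasts it with sum(), which adds the values up. It assumes series held as (name, labels, value) tuples with made-up values; promql_count and promql_sum are hypothetical names.

```python
def promql_count(vector):
    """count(v): how many series v contains; sample values are ignored."""
    return len(vector)


def promql_sum(vector):
    """sum(v): total of the sample values, shown for contrast with count()."""
    return sum(value for _, _, value in vector)


# Illustrative series (made-up values) as (name, labels, value) tuples.
vector = [
    ("promhttp_metric_handler_requests_total", {"code": "200"}, 12.0),
    ("promhttp_metric_handler_requests_total", {"code": "500"}, 0.0),
    ("promhttp_metric_handler_requests_total", {"code": "503"}, 0.0),
]

print(promql_count(vector), promql_sum(vector))
```

With one series per status code, count() reports how many distinct codes have been seen, whatever their totals are.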

For more about the expression language, see the expression language documentation.
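The same expressions can also be evaluated outside the browser through Prometheus's HTTP API. A minimal sketch, assuming the instant-query endpoint /api/v1/query and its JSON response shape (a status field, plus data.result entries with metric labels and a [timestamp, "value"] pair); query_url and parse_query_response are hypothetical helpers, and the sample response is made up.

```python
import json
from urllib.parse import urlencode


def query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_query_response(raw):
    """Return [(labels, value)] from an instant-vector query response."""
    doc = json.loads(raw)
    if doc.get("status") != "success":
        raise RuntimeError(f"query failed: {doc.get('error')}")
    return [
        (item["metric"], float(item["value"][1]))  # value is [timestamp, "string"]
        for item in doc["data"]["result"]
    ]


url = query_url("http://localhost:9090", "count(promhttp_metric_handler_requests_total)")
print(url)

# Made-up response body in the documented shape.
raw = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1574700000.123,"3"]}]}}'
print(parse_query_response(raw))
```

Fetching url with urllib.request.urlopen against a running server and passing the body to parse_query_response gives the same number the console shows.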

Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.

For example, enter the following expression to graph the per-second rate of HTTP requests returning status code 200 in the self-scraped Prometheus:

rate(promhttp_metric_handler_requests_total{code="200"}[1m])
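rate() turns an ever-increasing counter into a per-second rate over the range window ([1m] above). The sketch below shows the core idea on raw (timestamp, value) samples: add up the increases between consecutive samples, treating any decrease as a counter reset, and divide by the time covered. Prometheus's real rate() additionally extrapolates toward the window edges, so its results can differ slightly; simple_rate is a hypothetical name and the samples are made up.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter from (unix_seconds, value) samples."""
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. a process restart).
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# One minute of made-up samples scraped every 15s.
window = [(0, 10), (15, 13), (30, 16), (45, 19), (60, 22)]
print(simple_rate(window))
```

Here the counter grows by 12 over 60 seconds, giving 0.2 requests per second.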

You can experiment with the graph range parameters and other settings.

Monitoring other targets

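Collecting metrics only from Prometheus itself does not show off much of what Prometheus can do. To get a better sense of its capabilities, we recommend exploring the documentation on other exporters. The guide to monitoring Linux or macOS host metrics using the node exporter is a good place to start.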

Summary

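In this guide, you installed Prometheus, configured a Prometheus instance to monitor resources, and learned some basics of working with time series data in Prometheus's expression browser. To continue learning about Prometheus, check out the Overview for ideas about what to explore next.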
