Reposted from: https://songjiayang.gitbooks.io/prometheus/content/pushgateway/why.html
v0.1.0
Over the past year or so we have used Prometheus to build infrastructure and business monitoring for several data centers, which greatly improved our service quality and on-call experience. Special thanks to the Prometheus project for such an excellent piece of open-source software.

Choosing Prometheus was no accident, because:

- Prometheus is built around the ideas of Google's SRE book, which makes it both practical and forward-looking.
- The community is very active, releasing roughly one version per month. Since we started with v1.0 in 2016, through v1.8.2 and the latest v2.1, Prometheus has kept improving and optimizing.
- It is written in Go, performs well, is easy to install and deploy, and runs on many platforms.
- There is a rich set of data-collection clients, and the project officially provides exporters for most common systems.
- Its query capabilities are rich and powerful.

Although Prometheus is a relative newcomer to monitoring and still has rough edges, that has not stopped us from using and liking it. In our experience it covers the vast majority of scenarios; as with any new tool, you simply have to invest a bit more effort to get the most out of it.
This book is a summary of my own experience from more than a year of use. It covers Prometheus basics, advanced topics, hands-on practice, and a list of common problems, and I hope it will be useful to you.

This open-source book is aimed both at operations beginners with basic Linux knowledge and at advanced users who want to understand Prometheus's principles and implementation details; I also hope the practical examples help when you deploy monitoring yourself.

Are you ready? Then let's start this journey together!
Prometheus is an open-source monitoring and alerting solution that originated at SoundCloud. Development started in 2012, and since it was open-sourced on GitHub in 2015 it has attracted more than 9k stars and adoption by many large companies. In 2016 Prometheus became the second member project of the CNCF (Cloud Native Computing Foundation), after Kubernetes.

As a new-generation open-source solution, many of its ideas coincide with Google's SRE philosophy.

A picture is worth a thousand words, so let's start with the official architecture diagram.

As the diagram shows, the main Prometheus components are the server, exporters, the Pushgateway, PromQL, Alertmanager, the web UI, and so on.

Roughly, it works like this:

- The Prometheus server periodically scrapes metrics from statically configured targets or targets found via service discovery, and stores the samples locally.
- Recording and alerting rules are evaluated periodically; when an alert condition fires, the alert is pushed to the configured Alertmanager.
- Alertmanager aggregates, deduplicates, and silences alerts according to its configuration and then sends notifications.
- The data can be queried and visualized through the API, the built-in web console, or Grafana.
The preface briefly explained why we chose Prometheus and what we gained from using it.
This section compares Prometheus with other monitoring solutions so you can get a better sense of where it fits.
This chapter covers two ways to install Prometheus: from the traditional binary package and with Docker.

Go to the Prometheus binary download page and pick the package for your operating system. The examples below use Ubuntu Server.

Create a download directory so it is easy to clean up afterwards:
mkdir ~/Download
cd ~/Download
Download the Prometheus package with wget:
wget https://github.com/prometheus/prometheus/releases/download/v1.6.2/prometheus-1.6.2.linux-amd64.tar.gz
Create a Prometheus directory to hold all Prometheus-related services:
mkdir ~/Prometheus
cd ~/Prometheus
Extract prometheus-1.6.2.linux-amd64.tar.gz with tar:
tar -xvzf ~/Download/prometheus-1.6.2.linux-amd64.tar.gz
cd prometheus-1.6.2.linux-amd64
Once extraction succeeds, you can print the version to check that the binary runs:
./prometheus version
If you see output similar to the following, the installation was successful:
prometheus, version 1.6.2 (branch: master, revision: xxxx)
  build user:       xxxx
  build date:       xxxx
  go version:       go1.8.1
./prometheus
If Prometheus starts correctly, you will see output like this:
INFO[0000] Starting prometheus (version=1.6.2, branch=master, revision=b38e977fd8cc2a0d13f47e7f0e17b82d1a908a9a)  source=main.go:88
INFO[0000] Build context (go=go1.8.1, user=root@c99d9d650cf4, date=20170511-13:03:00)  source=main.go:89
INFO[0000] Loading configuration file prometheus.yml  source=main.go:251
INFO[0000] Loading series map and head chunks...  source=storage.go:421
INFO[0000] 0 series loaded.  source=storage.go:432
INFO[0000] Starting target manager...  source=targetmanager.go:61
INFO[0000] Listening on :9090  source=web.go:259
The startup log shows that the Prometheus server listens on port 9090 by default.
Once Prometheus is running, open http://IP:9090 in your browser and you will see its web page.

The default configuration already scrapes the Prometheus server itself, so you can immediately use PromQL (Prometheus Query Language) to query its metrics.
First make sure you have a recent version of Docker installed; if not, see the official installation instructions.

The demo below uses Docker for Mac.

The image is published on Quay.io.

Run the following command to install and start it:
$ docker run --name prometheus -d -p 127.0.0.1:9090:9090 quay.io/prometheus/prometheus
If it starts successfully, you can open 127.0.0.1:9090 and see the same page as before.
Run docker ps to list the running containers:
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS                      NAMES
e9ebc2435387   quay.io/prometheus/prometheus   "/bin/prometheus -..."   26 minutes ago   Up 26 minutes   127.0.0.1:9090->9090/tcp   prometheus
Run docker start prometheus to start the service.

Run docker stats prometheus to check its resource usage.

Run docker stop prometheus to stop the service.
This chapter introduces some basic Prometheus concepts, including the data model, the four metric types, and instances and jobs.
Prometheus stores time-series data: continuous collections of samples along the time dimension, grouped by series (the same metric name and the same set of labels).

A time series is defined by a metric name plus a set of key/value labels; samples with the same name and labels belong to the same series.

The metric name consists of ASCII letters, digits, underscores, and colons and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. The name should be meaningful and usually describes something measurable, for example http_requests_total, the total number of HTTP requests.
Labels make the data richer and let you distinguish individual instances; for example http_requests_total{method="POST"} selects only the POST requests among all HTTP requests.

Label names consist of ASCII letters, digits, and underscores; names starting with __ are reserved by Prometheus. Label values may contain any Unicode characters, including Chinese.
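To make this concrete, here is a minimal sketch of how labelled series are usually produced from application code, assuming the official Go client library github.com/prometheus/client_golang; the metric and label names are only illustrative:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// One metric name, many series: each distinct label value ("GET", "POST", ...)
// creates its own time series http_requests_total{method="..."}.
var httpRequestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "http_requests_total",
        Help: "Total number of HTTP requests.",
    },
    []string{"method"},
)

func main() {
    prometheus.MustRegister(httpRequestsTotal)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Increment the series whose method label matches this request.
        httpRequestsTotal.WithLabelValues(r.Method).Inc()
        w.Write([]byte("ok"))
    })

    // Expose all registered metrics in the Prometheus text format.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}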
The data points collected for a series along the time dimension are called samples; each sample consists of a float64 value and a millisecond-precision timestamp.
The Prometheus series notation is similar to OpenTSDB's:

<metric name>{<label name>=<label value>, ...}

It consists of the series name followed by its labels.
Prometheus has four metric types: Counter, Gauge, Histogram, and Summary.
A Counter tracks a value that accumulates in one direction; it is typically used for totals such as the number of requests served or the number of errors.

For example, the Prometheus server's own http_requests_total counts the HTTP requests it has handled; with functions such as delta() you can easily derive the increase over any time range, which is covered in the PromQL chapter.
A Gauge is an instantaneous value with no relationship to previous samples; it can go up and down arbitrarily and is typically used for things like memory usage or disk usage.

For example, the Prometheus server's go_goroutines reports its current number of goroutines.
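Below is a minimal sketch of how a Gauge is typically set from Go code, again assuming the client_golang library; the queue and its metric name are made up for illustration:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// A Gauge reports the current value of something; it can go up and down.
var queueLength = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "sample_queue_length",
    Help: "Current number of items waiting in the queue.",
})

func main() {
    prometheus.MustRegister(queueLength)

    queueLength.Set(42) // set an absolute value
    queueLength.Inc()   // or adjust it relative to the current value
    queueLength.Sub(5)

    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}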
A Histogram consists of the series <basename>_bucket{le="<upper inclusive bound>"}, <basename>_bucket{le="+Inf"}, <basename>_sum, and <basename>_count. It samples observations over time (typically request durations or response sizes), counts them in configurable buckets, and tracks the overall count and sum, so the data is usually displayed as a histogram.
For example, the Prometheus server's prometheus_local_storage_series_chunks_persisted records how many chunks each series needs to persist, and we can use it to compute quantiles of the data waiting to be persisted.
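As an illustration, here is a sketch of defining a Histogram with the client_golang library; the bucket boundaries and metric name are example values, not recommendations:

package main

import (
    "math/rand"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Observations are counted into the configured buckets; the client exposes
// sample_request_duration_seconds_bucket/_sum/_count automatically.
var requestDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "sample_request_duration_seconds",
    Help:    "Request duration in seconds.",
    Buckets: []float64{0.05, 0.1, 0.2, 0.5, 1},
})

func main() {
    prometheus.MustRegister(requestDuration)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) // simulated work
        requestDuration.Observe(time.Since(start).Seconds())
        w.Write([]byte("ok"))
    })

    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}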
A Summary is similar to a Histogram and consists of <basename>{quantile="<φ>"}, <basename>_sum, and <basename>_count. It also samples observations over time (typically request durations or response sizes), but it stores the quantile values directly rather than computing them from bucketed counts.
For example, the Prometheus server's prometheus_target_interval_length_seconds is a Summary.
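A sketch of a Summary with client_golang follows; the quantile objectives shown are just an example configuration:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Objectives maps each target quantile to its allowed error; the client
// computes and exposes sample_rpc_duration_seconds{quantile="..."} directly.
var rpcDuration = prometheus.NewSummary(prometheus.SummaryOpts{
    Name:       "sample_rpc_duration_seconds",
    Help:       "RPC duration in seconds.",
    Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
})

func main() {
    prometheus.MustRegister(rpcDuration)
    rpcDuration.Observe(0.37) // record one observation

    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}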
Compared with a Histogram, a Summary:

- also provides <basename>_sum and <basename>_count;
- stores the quantile values directly, whereas a Histogram derives quantiles at query time from its <basename>_bucket series.

In Prometheus, any independent data source (target) is called an instance, and a collection of instances of the same type is called a job. The following is a job with four identical instances:
- job: api-server
  - instance 1: 1.2.3.4:5670
  - instance 2: 1.2.3.4:5671
  - instance 3: 5.6.7.8:5670
  - instance 4: 5.6.7.8:5671
When scraping, Prometheus automatically attaches labels to every series to identify the target the data came from:

job: the configured job name the target belongs to.
instance: the <host>:<port> part of the scraped target's URL.
If either of these labels is already present in the scraped data, the honor_labels option decides which value wins. See the official scrape configuration documentation for details.
For every instance, Prometheus also stores the following series about the scrape itself:

up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy, 0 if the scrape failed
scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: how long the scrape took
scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling
scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed
The up series in particular is an effective way to monitor whether an instance is alive.
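These series can be inspected in the web UI, but they are also available programmatically. The sketch below calls the server's /api/v1/query HTTP endpoint from Go using only the standard library; the localhost address assumes a default local installation:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

// queryResponse models just the fields of the /api/v1/query JSON reply we need.
type queryResponse struct {
    Status string `json:"status"`
    Data   struct {
        ResultType string `json:"resultType"`
        Result     []struct {
            Metric map[string]string `json:"metric"`
            Value  []interface{}     `json:"value"` // [ <unix time>, "<value>" ]
        } `json:"result"`
    } `json:"data"`
}

func main() {
    resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + url.QueryEscape("up"))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var qr queryResponse
    if err := json.NewDecoder(resp.Body).Decode(&qr); err != nil {
        panic(err)
    }
    for _, r := range qr.Data.Result {
        // Print each instance together with its up value (1 = healthy, 0 = down).
        fmt.Printf("%s %s = %s\n", r.Metric["job"], r.Metric["instance"], r.Value[1])
    }
}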
This chapter introduces basic PromQL usage with examples and compares it with SQL.

PromQL (Prometheus Query Language) is the query DSL developed for Prometheus. It is very expressive and ships with many built-in functions, and you will use it constantly for day-to-day visualization and alerting rules.
On the page http://localhost:9090/graph, type a query and inspect the result, for example:
http_requests_total{code="200"}
Strings: in a query, strings usually appear as label values in the selectors. The syntax follows Go string literals, so you can use double quotes, single quotes, or backticks, for example:
"this is a string" 'these are unescaped: \n \\ \t' `these are not unescaped: \n ' " \t`
Numbers: expressions may also contain integer or floating-point literals, for example:
3
-2.4
PromQL query results come in three main types:

- Instant vector: a set of series, each with a single sample at the same point in time, e.g. http_requests_total
- Range vector: a set of series, each containing samples over a time range, e.g. http_requests_total[5m]
- Scalar: a single numeric value, e.g. count(http_requests_total) (strictly a one-element instant vector; wrap it in scalar() to get a true scalar)
Prometheus stores time-series data, and each series is defined by a name plus a set of labels. The name itself can also be written as a label: http_requests_total is equivalent to {__name__="http_requests_total"}.
A simple query is essentially a filter over those labels, for example:

http_requests_total{code="200"}    // series named http_requests_total whose code label is "200"
Selectors also support negative matches and regular expressions, for example:

http_requests_total{code!="200"}   // code is not "200"
http_requests_total{code=~"2.."}   // code matches "2xx"
http_requests_total{code!~"2.."}   // code does not match "2xx"
PromQL supports the usual expression operators, for example:

Arithmetic operators: +, -, *, /, %, ^. For example, http_requests_total * 2 doubles every value of http_requests_total.

Comparison operators: ==, !=, >, <, >=, <=. For example, http_requests_total > 100 keeps only the results of http_requests_total greater than 100.

Logical operators: and, or, unless. For example, http_requests_total == 5 or http_requests_total == 2 keeps the results equal to 5 or 2.

Aggregation operators: sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile. For example, max(http_requests_total) returns the largest value in the results.

Note that, as in ordinary arithmetic, PromQL operators have precedence: ^ binds tightest, then (*, /, %), then (+, -), then (==, !=, <=, <, >=, >), then (and, unless), and finally or.
Prometheus also ships with many built-in functions for querying and formatting data, for example floor and ceil to round results to integers:

floor(avg(http_requests_total{code="200"}))
ceil(avg(http_requests_total{code="200"}))

To get the per-second average rate of http_requests_total over the last 5 minutes:
rate(http_requests_total[5m])
For more functions, see the official documentation.
The comparison below uses the http_requests_total series collected from the Prometheus server itself as the example.
mysql>
# Create the database
create database prometheus_practice;
use prometheus_practice;

# Create the http_requests_total table
CREATE TABLE http_requests_total (
    code VARCHAR(256),
    handler VARCHAR(256),
    instance VARCHAR(256),
    job VARCHAR(256),
    method VARCHAR(256),
    created_at DOUBLE NOT NULL,
    value DOUBLE NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE http_requests_total ADD INDEX created_at_index (created_at);

# Seed the data
# time at 2017/5/22 14:45:27
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query_range", "localhost:9090", "prometheus", "get", 1495435527, 3);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query_range", "localhost:9090", "prometheus", "get", 1495435527, 5);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "prometheus", "localhost:9090", "prometheus", "get", 1495435527, 6418);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "static", "localhost:9090", "prometheus", "get", 1495435527, 9);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("304", "static", "localhost:9090", "prometheus", "get", 1495435527, 19);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query", "localhost:9090", "prometheus", "get", 1495435527, 87);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query", "localhost:9090", "prometheus", "get", 1495435527, 26);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "graph", "localhost:9090", "prometheus", "get", 1495435527, 7);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "label_values", "localhost:9090", "prometheus", "get", 1495435527, 7);

# time at 2017/5/22 14:48:27
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query_range", "localhost:9090", "prometheus", "get", 1495435707, 3);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query_range", "localhost:9090", "prometheus", "get", 1495435707, 5);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "prometheus", "localhost:9090", "prometheus", "get", 1495435707, 6418);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "static", "localhost:9090", "prometheus", "get", 1495435707, 9);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("304", "static", "localhost:9090", "prometheus", "get", 1495435707, 19);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "query", "localhost:9090", "prometheus", "get", 1495435707, 87);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("400", "query", "localhost:9090", "prometheus", "get", 1495435707, 26);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "graph", "localhost:9090", "prometheus", "get", 1495435707, 7);
INSERT INTO http_requests_total (code, handler, instance, job, method, created_at, value) values ("200", "label_values", "localhost:9090", "prometheus", "get", 1495435707, 7);
Once the data is loaded, a query shows the following:
mysql> select * from http_requests_total;
+------+--------------+----------------+------------+--------+------------+-------+
| code | handler      | instance       | job        | method | created_at | value |
+------+--------------+----------------+------------+--------+------------+-------+
| 200  | query_range  | localhost:9090 | prometheus | get    | 1495435527 |     3 |
| 400  | query_range  | localhost:9090 | prometheus | get    | 1495435527 |     5 |
| 200  | prometheus   | localhost:9090 | prometheus | get    | 1495435527 |  6418 |
| 200  | static       | localhost:9090 | prometheus | get    | 1495435527 |     9 |
| 304  | static       | localhost:9090 | prometheus | get    | 1495435527 |    19 |
| 200  | query        | localhost:9090 | prometheus | get    | 1495435527 |    87 |
| 400  | query        | localhost:9090 | prometheus | get    | 1495435527 |    26 |
| 200  | graph        | localhost:9090 | prometheus | get    | 1495435527 |     7 |
| 200  | label_values | localhost:9090 | prometheus | get    | 1495435527 |     7 |
| 200  | query_range  | localhost:9090 | prometheus | get    | 1495435707 |     3 |
| 400  | query_range  | localhost:9090 | prometheus | get    | 1495435707 |     5 |
| 200  | prometheus   | localhost:9090 | prometheus | get    | 1495435707 |  6418 |
| 200  | static       | localhost:9090 | prometheus | get    | 1495435707 |     9 |
| 304  | static       | localhost:9090 | prometheus | get    | 1495435707 |    19 |
| 200  | query        | localhost:9090 | prometheus | get    | 1495435707 |    87 |
| 400  | query        | localhost:9090 | prometheus | get    | 1495435707 |    26 |
| 200  | graph        | localhost:9090 | prometheus | get    | 1495435707 |     7 |
| 200  | label_values | localhost:9090 | prometheus | get    | 1495435707 |     7 |
+------+--------------+----------------+------------+--------+------------+-------+
18 rows in set (0.00 sec)
Assume the current time is 2017/5/22 14:48:30.
// PromQL
http_requests_total

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;
When querying MySQL we have to go back a small window from the current time, here 10 seconds (the Prometheus scrape interval), to be sure of catching the samples; PromQL handles that logic for us automatically.
// PromQL http_requests_total{code="200", handler="query"} // MySQL SELECT * from http_requests_total WHERE code="200" AND handler="query" AND created_at BETWEEN 1495435700 AND 1495435710;
// PromQL
http_requests_total{code=~"2.."}

// MySQL
SELECT * from http_requests_total WHERE code LIKE "%2%" AND created_at BETWEEN 1495435700 AND 1495435710;
// PromQL
http_requests_total > 100

// MySQL
SELECT * from http_requests_total WHERE value > 100 AND created_at BETWEEN 1495435700 AND 1495435710;

// PromQL
http_requests_total[5m]

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435410 AND 1495435710;

// PromQL
count(http_requests_total)

// MySQL
SELECT COUNT(*) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

// PromQL
sum(http_requests_total)

// MySQL
SELECT SUM(value) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

// PromQL
avg(http_requests_total)

// MySQL
SELECT AVG(value) from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710;

// PromQL
topk(3, http_requests_total)

// MySQL
SELECT * from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710 ORDER BY value DESC LIMIT 3;

// PromQL
irate(http_requests_total[5m])

// MySQL
SELECT code, handler, instance, job, method, SUM(value)/300 AS value from http_requests_total WHERE created_at BETWEEN 1495435700 AND 1495435710 GROUP BY code, handler, instance, job, method;
As these examples show, for everyday queries and statistics PromQL is both simpler and more capable than SQL, and its query performance is considerably better.
Collecting the data is only the first step; without good visualization, problems can be hard to spot.
This chapter shows how to query and display data with Prometheus's built-in web console and with Grafana.
Prometheus ships with a web console. After installation it is available at http://localhost:9090/graph, and it is very convenient for running and debugging any PromQL query, for example:

As you can see, the built-in web UI is fairly bare-bones: its purpose is quick, ad-hoc queries and PromQL debugging. It is not an admin-style dashboard that packs as much data as possible onto one page; if that is what you need, try Grafana.
Grafana is an open-source analytics and monitoring platform that supports Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, and other data sources, with a very polished and highly customizable UI.

That is exactly what the Prometheus web console lacks, as explained in the previous section.
Here I install it with Homebrew:

brew update
brew install grafana
Once installed, you can start it with the default configuration:
grafana-server -homepath /usr/local/Cellar/grafana/4.3.2/share/grafana/
If everything goes well, you will see logs like this:
INFO[06-11|15:20:14] Starting Grafana logger=main version=4.3.2 commit=unknown-dev compiled=2017-06-01T05:47:48+0800
INFO[06-11|15:20:14] Config loaded from logger=settings file=/usr/local/Cellar/grafana/4.3.2/share/grafana/conf/defaults.ini
INFO[06-11|15:20:14] Path Home logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/
INFO[06-11|15:20:14] Path Data logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data
INFO[06-11|15:20:14] Path Logs logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data/log
INFO[06-11|15:20:14] Path Plugins logger=settings path=/usr/local/Cellar/grafana/4.3.2/share/grafana/data/plugins
INFO[06-11|15:20:14] Initializing DB logger=sqlstore dbtype=sqlite3
INFO[06-11|15:20:14] Starting DB migration logger=migrator
INFO[06-11|15:20:14] Executing migration logger=migrator id="copy data account to org"
INFO[06-11|15:20:14] Skipping migration condition not fulfilled logger=migrator id="copy data account to org"
INFO[06-11|15:20:14] Executing migration logger=migrator id="copy data account_user to org_user"
INFO[06-11|15:20:14] Skipping migration condition not fulfilled logger=migrator id="copy data account_user to org_user"
INFO[06-11|15:20:14] Starting plugin search logger=plugins
INFO[06-11|15:20:14] Initializing Alerting logger=alerting.engine
INFO[06-11|15:20:14] Initializing CleanUpService logger=cleanup
INFO[06-11|15:20:14] Initializing Stream Manager
INFO[06-11|15:20:14] Initializing HTTP Server logger=http.server address=0.0.0.0:3000 protocol=http subUrl= socket=
You can now open http://localhost:3000 to reach the Grafana web UI.

For installation on other platforms, see the Grafana installation documentation.
Grafana supports Prometheus as a data source out of the box, so no extra plugin is needed.

Log in to Grafana with the default account admin/admin.

On the dashboard home page, click Add data source and configure the Prometheus data source.

At this point Grafana is connected to Prometheus and you will see the default dashboard.
Open the management page via Manage dashboard -> Settings at the top.

In the management page, uncheck Hide Controls.

Click the + ADD ROW button at the bottom of the page and choose the Graph panel type.
Click Panel Title -> Edit to open the panel editor, and under Metrics set Metric lookup to go_goroutines.
You can also type a Prometheus query directly in the editor and adjust the query's step value.
Whenever you change a dashboard, remember to click the Save dashboard button at the top, or press CTRL+S.
Our custom panel is now in place.
You can drag and resize panels to adjust their position and size; the goal is to show as much information as possible on a single screen.
Grafana is a beautiful, powerful monitoring and analytics platform with built-in Prometheus support, so Grafana + Prometheus is an excellent combination for visualizing metrics.
When Prometheus starts, the -config.file flag specifies which configuration file to load; it defaults to prometheus.yml.
The configuration file can define the global, alerting, rule_files, scrape_configs, remote_write, and remote_read sections.
The corresponding Go struct is defined as:
// Config is the top-level configuration for Prometheus's config files.
type Config struct {
    GlobalConfig       GlobalConfig         `yaml:"global"`
    AlertingConfig     AlertingConfig       `yaml:"alerting,omitempty"`
    RuleFiles          []string             `yaml:"rule_files,omitempty"`
    ScrapeConfigs      []*ScrapeConfig      `yaml:"scrape_configs,omitempty"`
    RemoteWriteConfigs []*RemoteWriteConfig `yaml:"remote_write,omitempty"`
    RemoteReadConfigs  []*RemoteReadConfig  `yaml:"remote_read,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`

    // original is the input from which the config was parsed.
    original string
}
The configuration file itself is structured roughly like this:
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the experimental remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the experimental remote read feature.
remote_read:
  [ - <remote_read> ... ]
The global section holds defaults that apply everywhere. It has four main settings: scrape_interval, scrape_timeout, evaluation_interval, and external_labels.

Its Go struct is defined as:
// GlobalConfig configures values that are used across other configuration
// objects.
type GlobalConfig struct {
    // How frequently to scrape targets by default.
    ScrapeInterval model.Duration `yaml:"scrape_interval,omitempty"`
    // The default timeout when scraping targets.
    ScrapeTimeout model.Duration `yaml:"scrape_timeout,omitempty"`
    // How frequently to evaluate rules by default.
    EvaluationInterval model.Duration `yaml:"evaluation_interval,omitempty"`
    // The labels to add to any timeseries that this Prometheus instance scrapes.
    ExternalLabels model.LabelSet `yaml:"external_labels,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}
A corresponding configuration looks roughly like this:
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  scrape_timeout:      10s # is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
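To see how such YAML maps onto the Go structs shown above, here is a small, self-contained sketch that parses a trimmed-down global section with gopkg.in/yaml.v2; the struct is simplified for illustration and is not the real Prometheus config type:

package main

import (
    "fmt"

    yaml "gopkg.in/yaml.v2"
)

// simplifiedGlobal mirrors only a few fields of Prometheus's GlobalConfig;
// durations are kept as plain strings here to avoid pulling in model.Duration.
type simplifiedGlobal struct {
    ScrapeInterval     string            `yaml:"scrape_interval"`
    EvaluationInterval string            `yaml:"evaluation_interval"`
    ExternalLabels     map[string]string `yaml:"external_labels"`
}

type simplifiedConfig struct {
    Global simplifiedGlobal `yaml:"global"`
}

func main() {
    raw := `
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: codelab-monitor
`
    var cfg simplifiedConfig
    if err := yaml.Unmarshal([]byte(raw), &cfg); err != nil {
        panic(err)
    }
    fmt.Printf("scrape every %s, labels: %v\n", cfg.Global.ScrapeInterval, cfg.Global.ExternalLabels)
}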
Alertmanagers can be configured with -alertmanager.* command-line flags, but that approach is inflexible: it cannot be reloaded dynamically or attach alert-time properties. The alerting section solves this and manages Alertmanagers more cleanly; it has two parts, alert_relabel_configs and alertmanagers.

Its Go struct is defined as:
// AlertingConfig configures alerting and alertmanager related configs.
type AlertingConfig struct {
    AlertRelabelConfigs []*RelabelConfig      `yaml:"alert_relabel_configs,omitempty"`
    AlertmanagerConfigs []*AlertmanagerConfig `yaml:"alertmanagers,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}
A corresponding configuration looks roughly like this:
# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]
Here alertmanagers is a list of alertmanager_config entries, whose Go struct is:
// AlertmanagerConfig configures how Alertmanagers can be discovered and communicated with.
type AlertmanagerConfig struct {
    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.
    ServiceDiscoveryConfig ServiceDiscoveryConfig `yaml:",inline"`
    HTTPClientConfig       HTTPClientConfig       `yaml:",inline"`

    // The URL scheme to use when talking to Alertmanagers.
    Scheme string `yaml:"scheme,omitempty"`
    // Path prefix to add in front of the push endpoint path.
    PathPrefix string `yaml:"path_prefix,omitempty"`
    // The timeout used when sending alerts.
    Timeout time.Duration `yaml:"timeout,omitempty"`

    // List of Alertmanager relabel configurations.
    RelabelConfigs []*RelabelConfig `yaml:"relabel_configs,omitempty"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}
A corresponding configuration looks roughly like this:
# Per-target Alertmanager timeout when pushing alerts. [ timeout: <duration> | default = 10s ] # Prefix for the HTTP path alerts are pushed to. [ path_prefix: <path> | default = / ] # Configures the protocol scheme used for requests. [ scheme: <scheme> | default = http ] # Sets the `Authorization` header on every request with the # configured username and password. basic_auth: [ username: <string> ] [ password: <string> ] # Sets the `Authorization` header on every request with # the configured bearer token. It is mutually exclusive with `bearer_token_file`. [ bearer_token: <string> ] # Sets the `Authorization` header on every request with the bearer token # read from the configured file. It is mutually exclusive with `bearer_token`. [ bearer_token_file: /path/to/bearer/token/file ] # Configures the scrape request's TLS settings. tls_config: [ <tls_config> ] # Optional proxy URL. [ proxy_url: <string> ] # List of Azure service discovery configurations. azure_sd_configs: [ - <azure_sd_config> ... ] # List of Consul service discovery configurations. consul_sd_configs: [ - <consul_sd_config> ... ] # List of DNS service discovery configurations. dns_sd_configs: [ - <dns_sd_config> ... ] # List of EC2 service discovery configurations. ec2_sd_configs: [ - <ec2_sd_config> ... ] # List of file service discovery configurations. file_sd_configs: [ - <file_sd_config> ... ] # List of GCE service discovery configurations. gce_sd_configs: [ - <gce_sd_config> ... ] # List of Kubernetes service discovery configurations. kubernetes_sd_configs: [ - <kubernetes_sd_config> ... ] # List of Marathon service discovery configurations. marathon_sd_configs: [ - <marathon_sd_config> ... ] # List of AirBnB's Nerve service discovery configurations. nerve_sd_configs: [ - <nerve_sd_config> ... ] # List of Zookeeper Serverset service discovery configurations. serverset_sd_configs: [ - <serverset_sd_config> ... ] # List of Triton service discovery configurations. triton_sd_configs: [ - <triton_sd_config> ... ] # List of labeled statically configured Alertmanagers. static_configs: [ - <static_config> ... ] # List of Alertmanager relabel configurations. relabel_configs: [ - <relabel_config> ... ]
rule_files lists the rule files to load; it accepts multiple files as well as glob patterns on directories.

Its Go definition is:
RuleFiles []string `yaml:"rule_files,omitempty"`
A configuration looks roughly like this:
rule_files: - "rules/node.rules" - "rules2/*.rules"
scrape_configs defines the targets to scrape; each scrape configuration mainly consists of the parameters shown in the struct below.
Its Go struct is defined as:
// ScrapeConfig configures a scraping unit for Prometheus. type ScrapeConfig struct { // The job name to which the job label is set by default. JobName string `yaml:"job_name"` // Indicator whether the scraped metrics should remain unmodified. HonorLabels bool `yaml:"honor_labels,omitempty"` // A set of query parameters with which the target is scraped. Params url.Values `yaml:"params,omitempty"` // How frequently to scrape the targets of this scrape config. ScrapeInterval model.Duration `yaml:"scrape_interval,omitempty"` // The timeout for scraping targets of this config. ScrapeTimeout model.Duration `yaml:"scrape_timeout,omitempty"` // The HTTP resource path on which to fetch metrics from targets. MetricsPath string `yaml:"metrics_path,omitempty"` // The URL scheme with which to fetch metrics from targets. Scheme string `yaml:"scheme,omitempty"` // More than this many samples post metric-relabelling will cause the scrape to fail. SampleLimit uint `yaml:"sample_limit,omitempty"` // We cannot do proper Go type embedding below as the parser will then parse // values arbitrarily into the overflow maps of further-down types. ServiceDiscoveryConfig ServiceDiscoveryConfig `yaml:",inline"` HTTPClientConfig HTTPClientConfig `yaml:",inline"` // List of target relabel configurations. RelabelConfigs []*RelabelConfig `yaml:"relabel_configs,omitempty"` // List of metric relabel configurations. MetricRelabelConfigs []*RelabelConfig `yaml:"metric_relabel_configs,omitempty"` // Catches all undefined fields and must be empty after parsing. XXX map[string]interface{} `yaml:",inline"` }
The definition above includes ServiceDiscoveryConfig, whose Go definition is:
// ServiceDiscoveryConfig configures lists of different service discovery mechanisms. type ServiceDiscoveryConfig struct { // List of labeled target groups for this job. StaticConfigs []*TargetGroup `yaml:"static_configs,omitempty"` // List of DNS service discovery configurations. DNSSDConfigs []*DNSSDConfig `yaml:"dns_sd_configs,omitempty"` // List of file service discovery configurations. FileSDConfigs []*FileSDConfig `yaml:"file_sd_configs,omitempty"` // List of Consul service discovery configurations. ConsulSDConfigs []*ConsulSDConfig `yaml:"consul_sd_configs,omitempty"` // List of Serverset service discovery configurations. ServersetSDConfigs []*ServersetSDConfig `yaml:"serverset_sd_configs,omitempty"` // NerveSDConfigs is a list of Nerve service discovery configurations. NerveSDConfigs []*NerveSDConfig `yaml:"nerve_sd_configs,omitempty"` // MarathonSDConfigs is a list of Marathon service discovery configurations. MarathonSDConfigs []*MarathonSDConfig `yaml:"marathon_sd_configs,omitempty"` // List of Kubernetes service discovery configurations. KubernetesSDConfigs []*KubernetesSDConfig `yaml:"kubernetes_sd_configs,omitempty"` // List of GCE service discovery configurations. GCESDConfigs []*GCESDConfig `yaml:"gce_sd_configs,omitempty"` // List of EC2 service discovery configurations. EC2SDConfigs []*EC2SDConfig `yaml:"ec2_sd_configs,omitempty"` // List of OpenStack service discovery configurations. OpenstackSDConfigs []*OpenstackSDConfig `yaml:"openstack_sd_configs,omitempty"` // List of Azure service discovery configurations. AzureSDConfigs []*AzureSDConfig `yaml:"azure_sd_configs,omitempty"` // List of Triton service discovery configurations. TritonSDConfigs []*TritonSDConfig `yaml:"triton_sd_configs,omitempty"` // Catches all undefined fields and must be empty after parsing. XXX map[string]interface{} `yaml:",inline"` }
ServiceDiscoveryConfig is responsible for target discovery and falls broadly into two categories: static configuration and dynamic discovery.
So a complete scrape_configs entry looks roughly like this:
# The job name assigned to scraped metrics by default. job_name: <job_name> # How frequently to scrape targets from this job. [ scrape_interval: <duration> | default = <global_config.scrape_interval> ] # Per-scrape timeout when scraping this job. [ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ] # The HTTP resource path on which to fetch metrics from targets. [ metrics_path: <path> | default = /metrics ] # honor_labels controls how Prometheus handles conflicts between labels that are # already present in scraped data and labels that Prometheus would attach # server-side ("job" and "instance" labels, manually configured target # labels, and labels generated by service discovery implementations). # # If honor_labels is set to "true", label conflicts are resolved by keeping label # values from the scraped data and ignoring the conflicting server-side labels. # # If honor_labels is set to "false", label conflicts are resolved by renaming # conflicting labels in the scraped data to "exported_<original-label>" (for # example "exported_instance", "exported_job") and then attaching server-side # labels. This is useful for use cases such as federation, where all labels # specified in the target should be preserved. # # Note that any globally configured "external_labels" are unaffected by this # setting. In communication with external systems, they are always applied only # when a time series does not have a given label yet and are ignored otherwise. [ honor_labels: <boolean> | default = false ] # Configures the protocol scheme used for requests. [ scheme: <scheme> | default = http ] # Optional HTTP URL parameters. params: [ <string>: [<string>, ...] ] # Sets the `Authorization` header on every scrape request with the # configured username and password. basic_auth: [ username: <string> ] [ password: <string> ] # Sets the `Authorization` header on every scrape request with # the configured bearer token. It is mutually exclusive with `bearer_token_file`. [ bearer_token: <string> ] # Sets the `Authorization` header on every scrape request with the bearer token # read from the configured file. It is mutually exclusive with `bearer_token`. [ bearer_token_file: /path/to/bearer/token/file ] # Configures the scrape request's TLS settings. tls_config: [ <tls_config> ] # Optional proxy URL. [ proxy_url: <string> ] # List of Azure service discovery configurations. azure_sd_configs: [ - <azure_sd_config> ... ] # List of Consul service discovery configurations. consul_sd_configs: [ - <consul_sd_config> ... ] # List of DNS service discovery configurations. dns_sd_configs: [ - <dns_sd_config> ... ] # List of EC2 service discovery configurations. ec2_sd_configs: [ - <ec2_sd_config> ... ] # List of OpenStack service discovery configurations. openstack_sd_configs: [ - <openstack_sd_config> ... ] # List of file service discovery configurations. file_sd_configs: [ - <file_sd_config> ... ] # List of GCE service discovery configurations. gce_sd_configs: [ - <gce_sd_config> ... ] # List of Kubernetes service discovery configurations. kubernetes_sd_configs: [ - <kubernetes_sd_config> ... ] # List of Marathon service discovery configurations. marathon_sd_configs: [ - <marathon_sd_config> ... ] # List of AirBnB's Nerve service discovery configurations. nerve_sd_configs: [ - <nerve_sd_config> ... ] # List of Zookeeper Serverset service discovery configurations. serverset_sd_configs: [ - <serverset_sd_config> ... ] # List of Triton service discovery configurations. 
triton_sd_configs: [ - <triton_sd_config> ... ] # List of labeled statically configured targets for this job. static_configs: [ - <static_config> ... ] # List of target relabel configurations. relabel_configs: [ - <relabel_config> ... ] # List of metric relabel configurations. metric_relabel_configs: [ - <relabel_config> ... ] # Per-scrape limit on number of scraped samples that will be accepted. # If more than this number of samples are present after metric relabelling # the entire scrape will be treated as failed. 0 means no limit. [ sample_limit: <int> | default = 0 ]
remote_write configures writing to remote storage and mainly contains the parameters shown below.

Its Go struct is:
// RemoteWriteConfig is the configuration for writing to remote storage.
type RemoteWriteConfig struct {
    URL                 *URL             `yaml:"url,omitempty"`
    RemoteTimeout       model.Duration   `yaml:"remote_timeout,omitempty"`
    WriteRelabelConfigs []*RelabelConfig `yaml:"write_relabel_configs,omitempty"`

    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.
    HTTPClientConfig HTTPClientConfig `yaml:",inline"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}
A complete configuration looks roughly like this:
# The URL of the endpoint to send samples to. url: <string> # Timeout for requests to the remote write endpoint. [ remote_timeout: <duration> | default = 30s ] # List of remote write relabel configurations. write_relabel_configs: [ - <relabel_config> ... ] # Sets the `Authorization` header on every remote write request with the # configured username and password. basic_auth: [ username: <string> ] [ password: <string> ] # Sets the `Authorization` header on every remote write request with # the configured bearer token. It is mutually exclusive with `bearer_token_file`. [ bearer_token: <string> ] # Sets the `Authorization` header on every remote write request with the bearer token # read from the configured file. It is mutually exclusive with `bearer_token`. [ bearer_token_file: /path/to/bearer/token/file ] # Configures the remote write request's TLS settings. tls_config: [ <tls_config> ] # Optional proxy URL. [ proxy_url: <string> ]
Note: remote_write is experimental; use it with care, as it may change in future versions.
remote_read configures reading from remote storage and mainly contains the parameters shown below.

Its Go struct is:
// RemoteReadConfig is the configuration for reading from remote storage.
type RemoteReadConfig struct {
    URL           *URL           `yaml:"url,omitempty"`
    RemoteTimeout model.Duration `yaml:"remote_timeout,omitempty"`

    // We cannot do proper Go type embedding below as the parser will then parse
    // values arbitrarily into the overflow maps of further-down types.
    HTTPClientConfig HTTPClientConfig `yaml:",inline"`

    // Catches all undefined fields and must be empty after parsing.
    XXX map[string]interface{} `yaml:",inline"`
}
A complete configuration looks roughly like this:
# The URL of the endpoint to query from. url: <string> # Timeout for requests to the remote read endpoint. [ remote_timeout: <duration> | default = 30s ] # Sets the `Authorization` header on every remote read request with the # configured username and password. basic_auth: [ username: <string> ] [ password: <string> ] # Sets the `Authorization` header on every remote read request with # the configured bearer token. It is mutually exclusive with `bearer_token_file`. [ bearer_token: <string> ] # Sets the `Authorization` header on every remote read request with the bearer token # read from the configured file. It is mutually exclusive with `bearer_token`. [ bearer_token_file: /path/to/bearer/token/file ] # Configures the remote read request's TLS settings. tls_config: [ <tls_config> ] # Optional proxy URL. [ proxy_url: <string> ]
Note: remote_read is experimental; use it with care, as it may change in future versions.
One of the most important concepts in the Prometheus configuration is the data source, the target. Target configuration is split into static configuration and dynamic discovery: static_configs plus the file, DNS, Consul, Kubernetes, EC2, Azure, and other service-discovery mechanisms listed in the templates above.

For how to use and configure each of them, see the service discovery configuration templates.

The most important, and most widely used, is static_configs; the dynamic mechanisms can be thought of as platform-specific wrappers that ultimately produce the same kind of static target lists.
Prometheus has many configuration options, but the ones I personally use most are global, rule_files, scrape_configs, static_configs, and relabel_config.

The configuration file I typically run with looks roughly like this:
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.

rule_files:
  - "rules/node.rules"

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['127.0.0.1:9100', '127.0.0.12:9100']

  - job_name: 'mysqld'
    static_configs:
      - targets: ['127.0.0.1:9104']

  - job_name: 'memcached'
    static_configs:
      - targets: ['127.0.0.1:9150']
In Prometheus, any program that reports data is called an exporter, and different exporters cover different systems. They follow a common naming convention, xx_exporter; for example, node_exporter collects host metrics.

The Prometheus community already provides many exporters; see the official list for details.

Before discussing exporters further, it is worth introducing the Prometheus text exposition format, because an exporter is essentially something that converts the data it collects into this text format and serves it over HTTP.
The text output is line-oriented (separated by \n); empty lines are ignored, and the output must end with a newline.

Lines starting with # are comments:

- A line starting with # HELP gives a human-readable description of a metric.
- A line starting with # TYPE declares the metric's type: counter, gauge, histogram, summary, or untyped.

Any line not starting with # is a sample. It usually follows immediately after the type declaration and has the format:
metric_name [ "{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}" ] value [ timestamp ]
Here is a complete example:
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"} 3 1395066363000

# Escaping in label values:
msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

# Minimalistic line:
metric_without_timestamp_and_labels 12.47

# A weird metric from before the epoch:
something_weird{problem="division by zero"} +Inf -3982045

# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.05"} 24054
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.2"} 100392
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320

# Finally a summary, which has a complex representation, too:
# HELP rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE rpc_duration_seconds summary
rpc_duration_seconds{quantile="0.01"} 3102
rpc_duration_seconds{quantile="0.05"} 3272
rpc_duration_seconds{quantile="0.5"} 4773
rpc_duration_seconds{quantile="0.9"} 9001
rpc_duration_seconds{quantile="0.99"} 76656
rpc_duration_seconds_sum 1.7560473e+07
rpc_duration_seconds_count 2693
Note in particular that if a metric called x is of type histogram or summary, its output must satisfy the following:

- The total of all its observed values is reported as x_sum.
- The number of observations is reported as x_count.
- For a summary, each quantile sample is written as x{quantile="y"}.
- For a histogram, each bucket count is written as x_bucket{le="y"}.
- A histogram must include x_bucket{le="+Inf"}, whose value equals that of x_count.
- In both summary and histogram output, the quantile and le values must appear in increasing order.

Since an exporter only has to turn the data it collects into this text format and serve it over HTTP, it is easy to write one yourself.
Below is a simple sample_exporter written in Go; the code looks roughly like this:
package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, exportData)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}

var exportData string = `# HELP sample_http_requests_total The total number of HTTP requests.
# TYPE sample_http_requests_total counter
sample_http_requests_total{method="post",code="200"} 1027 1395066363000
sample_http_requests_total{method="post",code="400"} 3 1395066363000

# Escaping in label values:
sample_msdos_file_access_time_seconds{path="C:\\DIR\\FILE.TXT",error="Cannot find file:\n\"FILE.TXT\""} 1.458255915e9

# Minimalistic line:
sample_metric_without_timestamp_and_labels 12.47

# A histogram, which has a pretty complex representation in the text format:
# HELP sample_http_request_duration_seconds A histogram of the request duration.
# TYPE sample_http_request_duration_seconds histogram
sample_http_request_duration_seconds_bucket{le="0.05"} 24054
sample_http_request_duration_seconds_bucket{le="0.1"} 33444
sample_http_request_duration_seconds_bucket{le="0.2"} 100392
sample_http_request_duration_seconds_bucket{le="0.5"} 129389
sample_http_request_duration_seconds_bucket{le="1"} 133988
sample_http_request_duration_seconds_bucket{le="+Inf"} 144320
sample_http_request_duration_seconds_sum 53423
sample_http_request_duration_seconds_count 144320

# Finally a summary, which has a complex representation, too:
# HELP sample_rpc_duration_seconds A summary of the RPC duration in seconds.
# TYPE sample_rpc_duration_seconds summary
sample_rpc_duration_seconds{quantile="0.01"} 3102
sample_rpc_duration_seconds{quantile="0.05"} 3272
sample_rpc_duration_seconds{quantile="0.5"} 4773
sample_rpc_duration_seconds{quantile="0.9"} 9001
sample_rpc_duration_seconds{quantile="0.99"} 76656
sample_rpc_duration_seconds_sum 1.7560473e+07
sample_rpc_duration_seconds_count 2693
`
If you run this program and visit http://localhost:8080/metrics, you will see a page with exactly this output.
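Hand-writing the text format like this is fine for understanding it, but in practice you would normally let the official client library generate it for you. Below is a minimal sketch that exposes one counter with github.com/prometheus/client_golang; the metric name is just an example:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var sampleOpsTotal = prometheus.NewCounter(prometheus.CounterOpts{
    Name: "sample_ops_total",
    Help: "Total number of operations handled by this exporter.",
})

func main() {
    prometheus.MustRegister(sampleOpsTotal)
    sampleOpsTotal.Add(3) // record some work

    // promhttp.Handler() renders all registered metrics in the exposition format,
    // including the HELP and TYPE lines, so we do not format them by hand.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}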
We can use a static_configs entry to have Prometheus scrape sample_exporter.

Open prometheus.yml and add the following job under scrape_configs:
- job_name: "sample" static_configs: - targets: ["127.0.0.1:8080"]
Restart Prometheus (or reload its configuration) and then query in the Prometheus console; you will see the sample_exporter data.