Pushgateway in Practice with Prometheus

Date: 2019-11-26


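1. Introduction to Pushgateway

Pushgateway is an important tool in the Prometheus ecosystem. The main reasons for using it are:

Prometheus uses a pull model, and Prometheus may be unable to scrape some targets directly because they sit in a different subnet or behind a firewall.
When monitoring business data, different data needs to be aggregated in one place and then collected by Prometheus.

For these reasons Pushgateway is sometimes unavoidable, but before using it you should understand its drawbacks:

Data from many nodes is aggregated in Pushgateway, so if Pushgateway goes down, the impact is larger than losing several individual targets.
The up status scraped by Prometheus only describes Pushgateway itself; it says nothing about each individual node.
Pushgateway can persist all the monitoring data pushed to it.

As a result, even after a monitored node has gone offline, Prometheus keeps scraping its old data, and the data that is no longer wanted has to be cleaned out of Pushgateway manually.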
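To make the stale-data drawback concrete, here is a minimal sketch that finds groups whose last push is old. It relies on the push_time_seconds metric that Pushgateway exposes per group; the function name stale_groups, the sample text, and the thresholds are illustrative assumptions, and the simple regular expression assumes label values that contain no "}" character.

```python
import re

# Matches lines such as: push_time_seconds{instance="a",job="b"} 1.5e+09
_PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<ts>\S+)$')


def stale_groups(metrics_text, now, max_age_seconds):
    """Return the label strings of groups whose last push is older than max_age_seconds."""
    stale = []
    for line in metrics_text.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if match is None:
            continue  # comment lines and other metrics are ignored
        last_push = float(match.group("ts"))
        if now - last_push > max_age_seconds:
            stale.append(match.group("labels"))
    return stale


# Hand-written sample in the text exposition format (illustrative values only)
sample = """\
# TYPE push_time_seconds gauge
push_time_seconds{instance="web-1",job="network_traffic"} 1000
push_time_seconds{instance="web-2",job="network_traffic"} 1950
"""

print(stale_groups(sample, now=2000, max_age_seconds=300))
```

In a real setup the text would come from the Pushgateway /metrics page and now from time.time(); the groups reported here are the candidates for manual deletion.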

The topology is as follows: [figure: topology diagram]

2. Installing with Docker

Use the prom/pushgateway Docker image:

docker pull prom/pushgateway

Next, start Pushgateway:

docker run -d \
  --name=pg \
  -p 9091:9091 \
  prom/pushgateway

Open the URL:

http://192.168.91.132:9091/

The result looks like this: [figure: screenshot]

Prometheus was already set up in the previous article, http://www.javashuo.com/article/p-bydptqfy-hb.html

For Pushgateway to work, a corresponding job must be configured in Prometheus.

Edit the configuration file:

vim /opt/prometheus/prometheus.yml

Add Pushgateway. The complete content is:

global:
  scrape_interval:     60s
  evaluation_interval: 60s
 
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
 
  - job_name: linux
    static_configs:
      - targets: ['192.168.91.132:9100']
        labels:
          instance: localhost

  - job_name: pushgateway
    static_configs:
      - targets: ['192.168.91.132:9091']
        labels:
          instance: pushgateway

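Because prometheus.yml is mounted from outside the container and the container is already running in the background, the change does not take effect immediately.

Use the docker ps command to list the running containers: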

CONTAINER ID        IMAGE                COMMAND                  CREATED             STATUS              PORTS                    NAMES
59ae7d9c8c3a        prom/prometheus      "/bin/prometheus -..."   16 minutes ago      Up 16 minutes       0.0.0.0:9090->9090/tcp   awesome_mcnulty
d907d0240018        prom/pushgateway     "/bin/pushgateway"       36 minutes ago      Up 36 minutes       0.0.0.0:9091->9091/tcp   pg
6b06f3b354cb        grafana/grafana      "/run.sh"                About an hour ago   Up About an hour    0.0.0.0:3000->3000/tcp   grafana3
62a0f435ea08        prom/node-exporter   "/bin/node_exporter"     2 hours ago         Up 2 hours                                   happy_galileo

Restart the Prometheus container:

docker restart 59ae7d9c8c3a

Open the targets page and wait about one minute (the scrape interval) until the pushgateway target shows the state UP.
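The same check can be scripted. Prometheus serves target information as JSON at /api/v1/targets; the sketch below only parses such a response. The sample payload is hand-written and trimmed to the fields used here, and the function name job_health is an assumption for illustration.

```python
import json


def job_health(targets_json, job):
    """Map instance -> health for every active target of the given job."""
    payload = json.loads(targets_json)
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        if labels.get("job") == job:
            result[labels.get("instance")] = target["health"]
    return result


# Hand-written sample shaped like a /api/v1/targets response (trimmed)
sample_response = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "pushgateway", "instance": "pushgateway"}, "health": "up"},
        {"labels": {"job": "linux", "instance": "localhost"}, "health": "down"},
    ]},
})

print(job_health(sample_response, "pushgateway"))
```

Against a live server, the JSON text would be fetched from http://<Prometheus address>:9090/api/v1/targets before being passed to the function.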

3. Data management

Normally we push data to Pushgateway with a client SDK, but the data can also be managed through the HTTP API, for example:

Shell scripts

Add a single sample to {job="some_job"}:

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job

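--data-binary sends the data exactly as given, with no extra processing, so newlines are preserved. Note: curl sends it with the POST method!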
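For readers who prefer Python over curl, the following sketch builds the equivalent POST request with the standard library. It only constructs the request and does not send it; the helper name build_push_request is an assumption, and the host is the placeholder used in the curl example.

```python
import urllib.request


def build_push_request(base_url, job, body):
    """Build (but do not send) a POST request that pushes `body` to a job group."""
    url = "{}/metrics/job/{}".format(base_url.rstrip("/"), job)
    # The text format requires every line, including the last, to end with "\n"
    if not body.endswith("\n"):
        body += "\n"
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


request = build_push_request("http://pushgateway.example.org:9091",
                             "some_job", "some_metric 3.14")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Appending the trailing newline mirrors what echo does in the shell version.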

Add more, and more complex, data. The data usually carries an instance label that indicates where it came from:

cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF

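Note: the body must be in the Prometheus text exposition format, and every line must end with a newline, otherwise the push is rejected!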
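As a rough illustration of that format, the sketch below renders metric families as text. The helper render_metric is hypothetical and deliberately simple: it does not escape special characters in label values or help text.

```python
def render_metric(name, metric_type, samples, help_text=None):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = []
    if help_text is not None:
        lines.append("# HELP {} {}".format(name, help_text))
    lines.append("# TYPE {} {}".format(name, metric_type))
    for labels, value in samples:
        if labels:
            pairs = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
            lines.append("{}{{{}}} {}".format(name, pairs, value))
        else:
            lines.append("{} {}".format(name, value))
    # Every line, including the last one, ends with a newline
    return "\n".join(lines) + "\n"


text = render_metric("some_metric", "counter", [({"label": "val1"}, 42)])
text += render_metric("another_metric", "gauge", [({}, 2398.283)],
                      help_text="Just an example.")
print(text, end="")
```

The resulting string can be used directly as the request body of a push.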

Delete all metrics of one instance within a job:

curl -X DELETE http://pushgateway.example.org:9091/metrics/job/some_job/instance/some_instance

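Delete all metrics in the group identified by {job="some_job"} (groups that carry additional labels, such as the instance group above, are separate groups and are not touched):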

curl -X DELETE http://pushgateway.example.org:9091/metrics/job/some_job

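As shown above, data in Pushgateway is usually grouped by job and instance. The job segment is required in the URL; instance is optional, but it should be included so that pushes from different hosts do not overwrite each other.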
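The URL layout for a group can be sketched as a small helper. The function names group_url and build_delete_request are assumptions for illustration, and the sketch assumes label values that contain no "/" character (Pushgateway has a separate encoding for those, which is not covered here).

```python
import urllib.parse
import urllib.request


def group_url(base_url, job, grouping=None):
    """Build <base>/metrics/job/<job>/<label>/<value>... for one group."""
    parts = ["metrics", "job", urllib.parse.quote(job, safe="")]
    for label, value in (grouping or {}).items():
        parts.append(urllib.parse.quote(label, safe=""))
        parts.append(urllib.parse.quote(value, safe=""))
    return base_url.rstrip("/") + "/" + "/".join(parts)


def build_delete_request(base_url, job, grouping=None):
    """Build (but do not send) a DELETE request for exactly one group."""
    return urllib.request.Request(group_url(base_url, job, grouping), method="DELETE")


req = build_delete_request("http://pushgateway.example.org:9091",
                           "some_job", {"instance": "some_instance"})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The same group_url value serves for pushing and for deleting, which keeps the two operations pointed at the same group.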

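When Pushgateway is configured as a target in Prometheus, that target also gets job and instance labels, but they only identify the Pushgateway instance and do not describe where the pushed data came from. Therefore, when configuring the pushgateway job in Prometheus, add the honor_labels: true parameter so that the job and instance labels carried by the pushed data are not overwritten.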
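The effect of honor_labels can be illustrated with a small simulation. This is a simplified model of the documented behaviour, not Prometheus code: without honor_labels, a conflicting pushed label is renamed to exported_<name> and the target's label wins; with it, the pushed label is kept. The function name merge_labels and the label values are assumptions.

```python
def merge_labels(scraped, target, honor_labels):
    """Simulate how scraped (pushed) labels are combined with target labels."""
    merged = dict(scraped)
    for name, value in target.items():
        if name not in scraped:
            merged[name] = value  # no conflict: the target label is attached
        elif not honor_labels:
            merged["exported_" + name] = scraped[name]  # pushed value is renamed
            merged[name] = value                         # target value wins
        # honor_labels=True: the pushed value is kept unchanged
    return merged


pushed = {"job": "network_traffic", "instance": "web-1"}
target = {"job": "pushgateway", "instance": "pushgateway"}

print(merge_labels(pushed, target, honor_labels=False))
print(merge_labels(pushed, target, honor_labels=True))
```

With honor_labels=False the series ends up labelled job="pushgateway", which is why queries by the original job name would find nothing.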

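Note: to avoid losing data when Pushgateway restarts or crashes, the data can be persisted with the -persistence.file and -persistence.interval flags (written with two dashes, --persistence.file and --persistence.interval, in recent releases).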

Reference for this section:

https://songjiayang.gitbooks.io/prometheus/content/pushgateway/how.html

Python scripts

Install the modules:

pip3 install flask
pip3 install prometheus_client

Metrics

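Prometheus provides four metric types: Counter, Gauge, Summary and Histogram.

Counter

A Counter can only increase, and it is reset to 0 when the program restarts. It is typically used for metrics that never decrease, such as the number of tasks, total processing time, or number of errors.

Example code: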

import prometheus_client
from prometheus_client import Counter
from flask import Response, Flask

app = Flask(__name__)

requests_total = Counter("request_count", "Total request count of the host")

@app.route("/metrics")
def requests_count():
    # Note: every scrape of /metrics is itself counted as a request
    requests_total.inc()
    # requests_total.inc(2)
    return Response(prometheus_client.generate_latest(requests_total),
                    mimetype="text/plain")

@app.route('/')
def index():
    requests_total.inc()
    return "Hello World"

if __name__ == "__main__":
    app.run(host="0.0.0.0")

Run the script and open yourhost:5000/metrics

# HELP request_count Total request count of the host
# TYPE request_count counter
request_count 3.0
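Because a Counter restarts from 0 when the program restarts, anything that consumes its samples has to treat a drop in value as a reset. The following is a simplified sketch of that idea, not Prometheus's actual implementation; the function name total_increase is an assumption.

```python
def total_increase(samples):
    """Sum the growth of a counter over successive samples,
    treating any drop in value as a restart from 0."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # the counter was reset and restarted from 0
    return increase


# The counter reached 3, the program restarted, then it counted up to 4
print(total_increase([1, 2, 3, 1, 4]))
```

This is also why a Counter should never be decremented by hand: a decrease would be mistaken for a restart.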

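Gauge

A Gauge is similar to a Counter; the only difference is that its value can also decrease. It is typically used for metrics such as temperature or utilization.

Example code: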

import random
import prometheus_client
from prometheus_client import Gauge
from flask import Response, Flask

app = Flask(__name__)

random_value = Gauge("random_value", "Random value of the request")

@app.route("/metrics")
def r_value():
    random_value.set(random.randint(0, 10))
    return Response(prometheus_client.generate_latest(random_value),
                    mimetype="text/plain")


if __name__ == "__main__":
    app.run(host="0.0.0.0")


Run the script and open yourhost:5000/metrics

# HELP random_value Random value of the request
# TYPE random_value gauge
random_value 3.0

Summary/Histogram

The concepts behind Summary and Histogram are more complicated, and a typical exporter rarely needs them, so they are not covered here.
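For orientation only, here is a minimal sketch of what a Histogram records: cumulative bucket counts per upper bound (the le label), plus the sum and count of all observations. It is plain Python rather than the prometheus_client implementation, and the metric name request_seconds and the bucket bounds are made up for illustration. A Summary differs mainly in that it reports quantiles instead of buckets.

```python
def observe_all(values, bounds):
    """Return cumulative bucket counts, the sum, and the count of observations."""
    upper_bounds = sorted(bounds) + [float("inf")]
    buckets = {bound: 0 for bound in upper_bounds}
    for value in values:
        for bound in upper_bounds:
            if value <= bound:
                buckets[bound] += 1  # cumulative: every bucket with le >= value counts it
    return buckets, sum(values), len(values)


buckets, total, count = observe_all([0.05, 0.3, 0.7, 2.0], bounds=[0.1, 0.5, 1.0])
for bound, n in buckets.items():
    le = "+Inf" if bound == float("inf") else bound
    print('request_seconds_bucket{le="%s"} %d' % (le, n))
print("request_seconds_count %d" % count)
```

The +Inf bucket always equals the total count, because every observation is less than or equal to infinity.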

PLUS

LABELS

Labels are used to distinguish the characteristics of a metric.

Example code:

from prometheus_client import Counter

c = Counter('requests_total', 'HTTP requests total', ['method', 'clientip'])

c.labels('get', '127.0.0.1').inc()
c.labels('post', '192.168.0.1').inc(3)
c.labels(method="get", clientip="192.168.0.1").inc()


REGISTRY

Example code:

from prometheus_client import Counter, Gauge
from prometheus_client.core import CollectorRegistry

REGISTRY = CollectorRegistry(auto_describe=False)

requests_total = Counter("request_count", "Total request count of the host", registry=REGISTRY)
random_value = Gauge("random_value", "Random value of the request", registry=REGISTRY)


Reference for this section:

https://blog.csdn.net/huochen1994/article/details/76263078

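Example: network interface traffic

First read the article "python 獲取網卡實時流量" (getting real-time NIC traffic with Python):

http://www.py3study.com/Article/details/id/347.html

The Python script below is a modified version of the code in that article.

Sending the local NIC traffic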

import prometheus_client
from prometheus_client import Counter
from prometheus_client import Gauge
from prometheus_client.core import CollectorRegistry
import psutil
import time
import requests
import socket

def get_key():
    # Take a single snapshot of the per-NIC I/O counters
    counters = psutil.net_io_counters(pernic=True)
    key_info = list(counters.keys())

    # Bytes received / sent so far, keyed by adapter name
    recv = {key: counters[key].bytes_recv for key in key_info}
    sent = {key: counters[key].bytes_sent for key in key_info}

    return key_info, recv, sent


def get_rate(func):
    # Take two snapshots one second apart and return the difference
    key_info, old_recv, old_sent = func()

    time.sleep(1)

    key_info, now_recv, now_sent = func()

    net_in = {}
    net_out = {}

    for key in key_info:
        # Alternative that reports KB with two decimals instead of bytes:
        # net_in.setdefault(key, float('%.2f' %((now_recv.get(key) - old_recv.get(key)) / 1024)))
        # net_out.setdefault(key, float('%.2f' %((now_sent.get(key) - old_sent.get(key)) / 1024)))

        # Traffic during the one-second window, in bytes
        net_in.setdefault(key, now_recv.get(key) - old_recv.get(key))
        net_out.setdefault(key, now_sent.get(key) - old_sent.get(key))

    return key_info, net_in, net_out

# def get_host_ip():
#     """
#     Look up the local IP address (for hosts with a single NIC)
#     :return: ip
#     """
#     try:
#         s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#         s.connect(('8.8.8.8', 80))
#         ip = s.getsockname()[0]
#     finally:
#         s.close()
#         return ip

# Collect the MAC and IP information of every network adapter
def PrintNetIfAddr():
    dic = psutil.net_if_addrs()
    net_dic = {}
    net_dic['no_ip'] = []  # adapters without an IPv4 address
    for adapter in dic:
        snicList = dic[adapter]
        mac = None
        ipv4 = None
        ipv6 = None
        for snic in snicList:
            if snic.family.name in {'AF_LINK', 'AF_PACKET'}:
                mac = snic.address
            elif snic.family.name == 'AF_INET':
                ipv4 = snic.address
            elif snic.family.name == 'AF_INET6':
                ipv6 = snic.address
        # print('%s, %s, %s, %s' % (adapter, mac, ipv4, ipv6))

        # Only record adapters not seen yet, and skip lo
        if adapter not in net_dic and adapter != 'lo':
            if ipv4 is not None:
                net_dic[adapter] = ipv4  # adapter name -> IPv4 address
            else:
                net_dic['no_ip'].append(adapter)  # adapter without an IPv4 address

    # print(net_dic)
    return net_dic

key_info, net_in, net_out = get_rate(get_key)

# ip=get_host_ip()  # local IP address
hostname = socket.gethostname() # host name

REGISTRY = CollectorRegistry(auto_describe=False)
input = Gauge("network_traffic_input", hostname,['adapter_name','unit','ip','instance'],registry=REGISTRY)  # inbound
output = Gauge("network_traffic_output", hostname,['adapter_name','unit','ip','instance'],registry=REGISTRY)  # outbound


net_addr = PrintNetIfAddr()  # look up the adapter addresses once
for key in key_info:
    # Skip lo (the loopback adapter) and adapters without a known IPv4 address
    if key != 'lo' and key in net_addr and key not in net_addr['no_ip']:
        # inbound and outbound traffic
        input.labels(ip=net_addr[key],adapter_name=key, unit="Byte",instance=hostname).inc(net_in.get(key))
        output.labels(ip=net_addr[key],adapter_name=key, unit="Byte",instance=hostname).inc(net_out.get(key))

requests.post("http://192.168.91.132:9091/metrics/job/network_traffic",data=prometheus_client.generate_latest(REGISTRY))
print("Sent one batch of NIC traffic data")


Run the script. Each run sends the data to Pushgateway once.

The measured traffic is not divided by 1024, so the unit is bytes.
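The delta calculation and the unit choice can be sketched separately from psutil. In this illustration the function name traffic_delta, the divisor parameter, and the sample counters are assumptions; interfaces that are missing from the earlier snapshot are skipped rather than raising an error.

```python
def traffic_delta(old, new, divisor=1):
    """Per-interface difference between two byte-counter snapshots.

    Interfaces missing from the earlier snapshot are skipped.
    divisor=1 keeps bytes; divisor=1024 yields KiB.
    """
    delta = {}
    for nic, now_bytes in new.items():
        if nic not in old:
            continue
        delta[nic] = (now_bytes - old[nic]) / divisor
    return delta


old = {"eth0": 10240, "lo": 500}
new = {"eth0": 14336, "lo": 500, "docker0": 42}

print(traffic_delta(old, new))                # bytes
print(traffic_delta(old, new, divisor=1024))  # KiB
```

If the unit is changed this way, the unit label pushed with the metric should be changed to match.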

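Note: by convention, the push URL has the following format:

http://<Pushgateway address>:9091/metrics/job/<monitored item>

For example, when monitoring etcd, the URL looks like this:

http://<Pushgateway address>:9091/metrics/job/etcd

The data must be pushed with POST (PUT is also accepted and replaces all metrics in the group).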

Code explanation

The key code is these few lines:

REGISTRY = CollectorRegistry(auto_describe=False)
input = Gauge("network_traffic_input", hostname,['adapter_name','unit','ip','instance'],registry=REGISTRY)  # inbound
output = Gauge("network_traffic_output", hostname,['adapter_name','unit','ip','instance'],registry=REGISTRY)  # outbound

input.labels(ip=net_addr[key],adapter_name=key, unit="Byte",instance=hostname).inc(net_in.get(key))
output.labels(ip=net_addr[key],adapter_name=key, unit="Byte",instance=hostname).inc(net_out.get(key))

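1. Every custom metric collector must be registered with a CollectorRegistry; the metric data is returned to Prometheus through the methods of the CollectorRegistry class.
2. CollectorRegistry must provide register() and unregister() functions, and one collector can be registered with more than one CollectorRegistry.
3. The client library must be thread-safe.

The first line of the code declares the CollectorRegistry.

input and output are the inbound and outbound traffic. The metric type used is Gauge.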

input = Gauge("network_traffic_input", hostname,['adapter_name','unit','ip','instance'],registry=REGISTRY)  # inbound

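network_traffic_input is the metric name. It must be unique, because Grafana uses this name to draw the chart.

The second argument is the description (the HELP text). This script passes hostname there; to keep the output short, it is often left as an empty string "".

['adapter_name','unit','ip','instance'] is a list in which every element is a label name. Labels distinguish the characteristics of a metric.

registry=REGISTRY registers the metric in REGISTRY.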

input.labels(ip=net_addr[key],adapter_name=key, unit="Byte",instance=hostname).inc(net_in.get(key))

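This sets the labels of input. There are four key-value pairs in the parentheses. Note: all four keys must appear in the list ['adapter_name','unit','ip','instance'].

If a key-value pair is added to labels, the corresponding element must also be added to the list above, otherwise an error is raised!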
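The rule can be pictured with a small stand-alone check. This is a simplified illustration of the constraint, not the validation code inside prometheus_client; the function name check_labels and the sample values are assumptions.

```python
def check_labels(declared, given):
    """Raise ValueError unless `given` supplies exactly the declared label names."""
    missing = set(declared) - set(given)
    unknown = set(given) - set(declared)
    if missing or unknown:
        raise ValueError("missing={} unknown={}".format(sorted(missing), sorted(unknown)))
    # Values in declaration order identify one time series of the metric
    return tuple(str(given[name]) for name in declared)


declared = ['adapter_name', 'unit', 'ip', 'instance']

print(check_labels(declared, {"ip": "192.168.91.132", "adapter_name": "eth0",
                              "unit": "Byte", "instance": "host1"}))

try:
    check_labels(declared, {"ip": "192.168.91.132", "speed": "1G"})
except ValueError as error:
    print("rejected:", error)
```

Each distinct combination of label values becomes its own time series, so labels with many possible values should be added sparingly.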

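inc() adds the given amount to the gauge. Because the registry is created fresh on every run and the gauge starts at 0, the result equals the measured value for input; set() would assign it directly.

Refresh the Pushgateway page.

Expand the data; the inbound and outbound traffic values are shown here.

Open Grafana and create a new chart.

Add the network inbound and outbound metrics.

Change the title.

Set up a Linux cron job that runs the script once per minute: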

* * * * * python3 /opt/test.py

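The result looks like this: [figure: screenshot]

If the server has no traffic, some can be generated.

Write a script that keeps requesting an image: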

import requests
while True:
    requests.get("http://192.168.91.128/Netraffic/dt.jpg")
    print('Fetching the image')

To monitor MySQL, see this article:

https://www.jianshu.com/p/27b979554ef8

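Note: that article uses Flask to expose a metrics endpoint that provides data to Prometheus.

In that case a corresponding job must be added to the Prometheus configuration file before the data can be collected.

Prometheus then periodically requests the exposed HTTP link to fetch the data.

Summary:

There are two ways to monitor with Prometheus:

1. Expose the metrics over HTTP. Note: a job must be added to the Prometheus configuration file.

2. Actively push the data to Pushgateway. Note: only one Pushgateway job has to be added. It acts like an API: no matter how many servers there are, they all push to the same address.

In production, Pushgateway is generally used: it is simple and requires no further changes to the Prometheus configuration file!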
