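Monitoring Consul with Prometheus in Production

Consul Monitoring

Consul can be monitored with many different tools. Here we use Prometheus.

Prerequisites

  • A Consul server cluster and its agents. For cluster setup and configuration, see the article "Consul Installation, Backup, and Upgrade".

  • The telemetry option must be set in the configuration file, as shown below: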

    ~]# cat /usr/local/consul/consul.d/consul.json 
    {
      "datacenter": "dc1",
      "client_addr": "0.0.0.0",
      "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
      "data_dir": "/usr/local/consul/data",
      "retry_interval": "20s",
      "retry_join": ["10.111.67.1","10.111.67.2","10.111.67.3","10.111.67.4","10.111.67.5"],
      "enable_local_script_checks": true,
      "log_file": "/usr/local/consul/logs/",
      "log_level": "debug",
      "enable_debug": true,
      "pid_file": "/var/run/consul.pid",
      "performance": {
          "raft_multiplier": 1
      },
      "telemetry": {
          "prometheus_retention_time": "120s",
          "disable_hostname": true
      }
    }
  • After Consul starts successfully, test it with the following command:

    ~]# curl '127.0.0.1:8500/v1/agent/metrics?format=prometheus'
    # HELP consul_fsm_register consul_fsm_register
    # TYPE consul_fsm_register summary
    consul_fsm_register{quantile="0.5"} NaN
    consul_fsm_register{quantile="0.9"} NaN
    consul_fsm_register{quantile="0.99"} NaN
    consul_fsm_register_sum 3.396029010415077
    consul_fsm_register_count 8
    # HELP consul_http_GET_v1_agent_metrics consul_http_GET_v1_agent_metrics
    # TYPE consul_http_GET_v1_agent_metrics summary
    consul_http_GET_v1_agent_metrics{quantile="0.5"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics{quantile="0.9"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics{quantile="0.99"} 0.5403839945793152
    consul_http_GET_v1_agent_metrics_sum 366820.44427236915
    consul_http_GET_v1_agent_metrics_count 349523
    # HELP consul_http_GET_v1_catalog_service__ consul_http_GET_v1_catalog_service__
    # TYPE consul_http_GET_v1_catalog_service__ summary
    consul_http_GET_v1_catalog_service__{quantile="0.5"} 31258.423828125
    consul_http_GET_v1_catalog_service__{quantile="0.9"} 306137.71875
    consul_http_GET_v1_catalog_service__{quantile="0.99"} 306137.71875
    consul_http_GET_v1_catalog_service___sum 4.0220439955034314e+11
    consul_http_GET_v1_catalog_service___count 2.388023e+06
    …………………………
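The output above is in the Prometheus text exposition format. As a rough illustration of how to read it, the following sketch parses a few of those lines and derives an average from the `_sum` and `_count` series of a summary. The helper names `parse_metrics` and `mean_value` are made up for this example, and the parser deliberately ignores parts of the format (timestamps, escaped label values) that do not appear in the sample.

```python
import re

# name{label="value",...} value   or   name value
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples (hypothetical helper)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() accepts "NaN"
    return samples


def mean_value(samples, metric):
    """Average of a summary, computed as <metric>_sum / <metric>_count."""
    plain = {name: value for name, labels, value in samples if not labels}
    total = plain.get(metric + "_sum")
    count = plain.get(metric + "_count")
    if total is None or not count:
        return None
    return total / count


# A few lines copied from the curl output shown earlier
SAMPLE = """\
# HELP consul_fsm_register consul_fsm_register
# TYPE consul_fsm_register summary
consul_fsm_register{quantile="0.5"} NaN
consul_fsm_register_sum 3.396029010415077
consul_fsm_register_count 8
"""

samples = parse_metrics(SAMPLE)
print(len(samples), "samples parsed")
print("mean consul_fsm_register:", mean_value(samples, "consul_fsm_register"))
```

The quantile series report NaN when no observations fall inside the retention window, which is why the average is taken from the cumulative `_sum` and `_count` series instead.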

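Server Monitoring

For server monitoring we use Prometheus file-based service discovery (file_sd_configs); static configuration (static_configs) would also work.

Because we want to alert on Consul, and alerts need to carry the host name, we use file-based service discovery (file_sd_configs) and attach a consul_node_name label to each host. With static_configs, labels are set per targets list rather than per host, so per-host labels would require a separate target group for every host inside prometheus.yml.

The configuration is shown below. It is a Kubernetes ConfigMap.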

~]# cat prometheus-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-consul
  namespace: prometheus
  labels:
    app: prometheus-consul
    environment: prod
    release: release
data:
  prometheus.yml: |
    global:
      external_labels:
        region: cn-hangzhou
        monitor: consul
        replica: A
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: consul-server
      # scrape interval
      scrape_interval: 60s
      # scrape timeout
      scrape_timeout: 10s
      # metrics path on the target
      metrics_path: "/v1/agent/metrics"
      scheme: http
      params:
        format: ['prometheus']
      file_sd_configs:
      - files:
        - /etc/config/consul-server.json
        refresh_interval: 1m

  consul-server.json: |
    [
        {
            "targets": [
                "10.111.67.1:8500"
            ],
            "labels": {
                "consul_node_name": "Consul-Server-1"
            }
        },
        {
            "targets": [
                "10.111.67.2:8500"
            ],
            "labels": {
                "consul_node_name": "Consul-Server-2"
            }
        },
        {
            "targets": [
                "10.111.67.3:8500"
            ],
            "labels": {
                "consul_node_name": "Consul-Server-3"
            }
        },
        {
            "targets": [
                "10.111.67.4:8500"
            ],
            "labels": {
                "consul_node_name": "Consul-Server-4"
            }
        },
        {
            "targets": [
                "10.111.67.5:8500"
            ],
            "labels": {
                "consul_node_name": "Consul-Server-5"
            }
        }
    ]

At this point Prometheus can scrape metrics from the Consul servers, and they can be queried in the UI that ships with Prometheus.
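Since consul-server.json repeats the same structure for every server, it can also be generated rather than written by hand. The sketch below is one possible way to do that. The function name `build_file_sd` is hypothetical, and the addresses and node names are the ones used in the example above.

```python
import json


def build_file_sd(servers, port=8500):
    """Build file_sd target groups, one per host, each with its own label.

    servers: iterable of (ip_address, consul_node_name) pairs.
    """
    return [
        {
            "targets": [f"{address}:{port}"],
            "labels": {"consul_node_name": node_name},
        }
        for address, node_name in servers
    ]


# The five servers from the example configuration
servers = [(f"10.111.67.{i}", f"Consul-Server-{i}") for i in range(1, 6)]

print(json.dumps(build_file_sd(servers), indent=4))
```

Keeping one target group per host is what makes the per-host consul_node_name label possible.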

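Client Monitoring

There are far too many Consul clients, hundreds or even thousands of hosts. Labeling every host through file-based discovery (file_sd_configs) would make that file too costly to maintain as hosts are added and removed. We therefore use Consul-based service discovery (consul_sd_configs) to monitor the clients.

Consul client self-registration

For Prometheus, or any other consumer, to discover a service, the service must first be registered in Consul. We use a script to generate a simple service registration: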

~]# cat create-consul-registration.sh 
#!/bin/bash

# First IPv4 address found on an eth* or em* interface
ADDR=$(ip addr show | awk -F '[ /]+' '/eth[0-9]|em[0-9]/ && /inet/ {print $3; exit}')
CONSUL_CONF_DIR='/usr/local/consul/consul.d'
CONSUL_REGISTER_FILE="$CONSUL_CONF_DIR/consul-members-registration.json"

if [[ -n "$ADDR" && -d "$CONSUL_CONF_DIR" ]]; then
    # Unquoted delimiter so ${ADDR} is expanded; the closing EOF must start at column 0
    cat > "${CONSUL_REGISTER_FILE}" <<EOF
{
    "service": {
        "id": "consul-${ADDR}",
        "name": "consul-members",
        "tags": [
            "prometheus",
            "client",
            "consul-client"
        ],
        "address": "${ADDR}",
        "port": 8500,
        "check": {
            "http": "http://127.0.0.1:8500",
            "interval": "60s"
        }
    }
}
EOF
else
    echo "ip address is empty or $CONSUL_CONF_DIR does not exist" >&2
    exit 1
fi

Running this script creates the service registration file consul-members-registration.json under /usr/local/consul/consul.d/:

~]# cat /usr/local/consul/consul.d/consul-members-registration.json 
{
    "service": {
        "id": "consul-10.111.74.8",
        "name": "consul-members",
        "tags": [
            "prometheus",
            "client",
            "consul-client"
        ],
        "address": "10.111.74.8",
        "port": 8500,
        "check": {
            "http": "http://127.0.0.1:8500",
            "interval": "60s"
        }
    }
}
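Where a heredoc is awkward to maintain, the same registration document can be produced with a JSON serializer, which avoids quoting mistakes. This is only a sketch. `build_registration` is a hypothetical helper, and the field values mirror the file shown above.

```python
import json


def build_registration(addr, port=8500):
    """Return a Consul service definition for the local agent (hypothetical helper)."""
    return {
        "service": {
            "id": f"consul-{addr}",
            "name": "consul-members",
            "tags": ["prometheus", "client", "consul-client"],
            "address": addr,
            "port": port,
            "check": {
                # Health check against the local agent's HTTP port
                "http": f"http://127.0.0.1:{port}",
                "interval": "60s",
            },
        }
    }


print(json.dumps(build_registration("10.111.74.8"), indent=4))
```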

Then run consul reload to load the configuration:

~]# consul reload

The service is now registered in Consul, with the service name consul-members and the service ID consul-10.111.74.8. We can verify this with curl or in a browser.

~]# curl -s 127.0.0.1:8500/v1/agent/services|python -m json.tool
{
    "consul-10.111.74.8": {
        "Address": "10.111.74.8",
        "EnableTagOverride": false,
        "ID": "consul-10.111.74.8",
        "Meta": {},
        "Port": 8500,
        "Service": "consul-members",
        "Tags": [
            "prometheus",
            "client",
            "consul-client"
        ],
        "Weights": {
            "Passing": 1,
            "Warning": 1
        }
    }
}
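The same check can be scripted. The sketch below works on an already fetched /v1/agent/services response, so it makes no network call. `find_service_ids` is a hypothetical helper, and the embedded JSON is a shortened copy of the output above.

```python
import json


def find_service_ids(agent_services, service_name):
    """Return the IDs of all registered services with the given name."""
    return sorted(
        service_id
        for service_id, service in agent_services.items()
        if service.get("Service") == service_name
    )


# Shortened copy of the /v1/agent/services response shown above
RESPONSE = """
{
    "consul-10.111.74.8": {
        "Address": "10.111.74.8",
        "ID": "consul-10.111.74.8",
        "Port": 8500,
        "Service": "consul-members",
        "Tags": ["prometheus", "client", "consul-client"]
    }
}
"""

services = json.loads(RESPONSE)
print(find_service_ids(services, "consul-members"))
```

An empty list would indicate that the registration file was not picked up by consul reload.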

Prometheus configuration

The configuration is as follows:

~]# cat prometheus-configmap.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config-consul
  namespace: prometheus
  labels:
    app: prometheus-consul
    environment: prod
    release: release
data:
  prometheus.yml: |
    global:
      external_labels:
        region: cn-hangzhou
        monitor: consul
        replica: A
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: consul-client
      # scrape interval
      scrape_interval: 60s
      # scrape timeout
      scrape_timeout: 10s
      # metrics path on the target
      metrics_path: "/v1/agent/metrics"
      scheme: http
      params:
        format: ['prometheus']
      consul_sd_configs:
      - server: "10.111.67.1:8500"
        services:
        - consul-members
      relabel_configs:
      - action: replace
        source_labels:
        - __meta_consul_dc
        target_label: consul_dc
      - action: replace
        source_labels:
        - __meta_consul_node
        target_label: consul_node_name
      - action: replace
        source_labels:
        - __meta_consul_service
        target_label: consul_service
      - action: replace
        source_labels:
        - __meta_consul_service_id
        target_label: consul_service_id

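Because we want to alert on Consul, and alerts need the host name, service name, service ID, datacenter, and similar information, we have to relabel. The meta labels available for relabeling are: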

  • __meta_consul_address: the address of the target
  • __meta_consul_dc: the datacenter name for the target
  • __meta_consul_tagged_address_<key>: each node tagged address key value of the target
  • __meta_consul_metadata_<key>: each node metadata key value of the target
  • __meta_consul_node: the node name defined for the target
  • __meta_consul_service_address: the service address of the target
  • __meta_consul_service_id: the service ID of the target
  • __meta_consul_service_metadata_<key>: each service metadata key value of the target
  • __meta_consul_service_port: the service port of the target
  • __meta_consul_service: the name of the service the target belongs to
  • __meta_consul_tags: the list of tags of the target joined by the tag separator
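To make the effect of the relabel_configs above more concrete, here is a simplified simulation of the replace action with its default regex and replacement, which amounts to copying the source label's value into target_label. It is a sketch of the behaviour, not Prometheus's implementation. The function names and the node name node-a are invented for this example.

```python
# (source label, target label) pairs taken from the relabel_configs above
RULES = [
    ("__meta_consul_dc", "consul_dc"),
    ("__meta_consul_node", "consul_node_name"),
    ("__meta_consul_service", "consul_service"),
    ("__meta_consul_service_id", "consul_service_id"),
]


def apply_replace_rules(labels, rules):
    """Simplified 'replace': copy each source label's value to its target label."""
    result = dict(labels)
    for source_label, target_label in rules:
        value = result.get(source_label, "")
        if value:
            result[target_label] = value
        else:
            result.pop(target_label, None)  # an empty result removes the target label
    return result


def drop_internal_labels(labels):
    """Labels starting with '__' are discarded once target relabeling is done."""
    return {name: value for name, value in labels.items() if not name.startswith("__")}


# Labels as consul_sd_configs might present them; "node-a" is a made-up node name
discovered = {
    "__address__": "10.111.74.8:8500",
    "__meta_consul_dc": "dc1",
    "__meta_consul_node": "node-a",
    "__meta_consul_service": "consul-members",
    "__meta_consul_service_id": "consul-10.111.74.8",
}

final = drop_internal_labels(apply_replace_rules(discovered, RULES))
for name in sorted(final):
    print(f"{name}={final[name]}")
```

Because the `__meta_consul_*` labels disappear after relabeling, any value needed in an alert has to be copied to a regular label, which is what the four rules do.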