Automatically Monitoring OpenStack VMs by Auto-Registering node-exporter with Consul + Prometheus

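Important: this article only outlines the approach. I wrote the deployment steps from memory, so they are not guaranteed to be correct, and you will have to troubleshoot any problems that come up during deployment yourself. If you are unsure how to deploy the registration program, leave a comment. The registration program only works with Consul's ACL authentication enabled!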

1. The problem

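At work, the VMs in our OpenStack cluster need monitoring of their basic performance metrics. Installing node_exporter by hand on every new VM and then editing prometheus.yml would be a nightmare for lazy programmers like us, so I set out to design a monitoring scheme based on Prometheus + Consul.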

2. The solution

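1. Bake node_exporter into the image so that it is always deployed
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter
3. Configure Prometheus to discover the node_exporter nodes through Consul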

3. Deploying the Consul cluster

3.1 Cluster plan

OS          Hostname      IP
CentOS 7.7  compute-7-1   172.16.100.71
CentOS 7.7  compute-7-2   172.16.100.72
CentOS 7.7  compute-7-3   172.16.100.73

3.1 Download and install Consul

Consul v1.7.2
Install Consul on every node:

$ wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
$ unzip consul_1.7.2_linux_amd64.zip
$ mv consul /usr/bin/
$ mkdir -p /data/consul
$ mkdir -p /etc/consul.d
$ useradd consul
$ chown -R consul:consul /data/consul

Edit the configuration file on every node:

$ vim /etc/consul.d/consul_config.json
{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "node_name": "compute-7-1",
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "bind_addr": "172.16.100.71"
}
$ vim /etc/consul.d/consul_config.json
{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "node_name": "compute-7-2",
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "bind_addr": "172.16.100.72"
}
$ vim /etc/consul.d/consul_config.json
{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "data_dir": "/data/consul",
    "node_name": "compute-7-3",
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "bind_addr": "172.16.100.73"
}
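
The three files differ only in node_name and bind_addr, so they can be generated from the cluster plan instead of being edited by hand. A minimal Python sketch, assuming only the values shown above; the function names are made up for illustration:

```python
import json

# Cluster plan from section 3.1: hostname -> bind address.
NODES = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}


def initial_config(node_name: str, bind_addr: str) -> dict:
    """Build the initial (pre-ACL) server config for one node."""
    return {
        "bootstrap_expect": 1,
        "datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "node_name": node_name,
        "server": True,
        "client_addr": "0.0.0.0",
        "ui": True,
        "bind_addr": bind_addr,
    }


def render_all() -> dict:
    """Return {hostname: JSON text} for every node in the plan."""
    return {
        name: json.dumps(initial_config(name, ip), indent=4)
        for name, ip in NODES.items()
    }


if __name__ == "__main__":
    for name, text in render_all().items():
        print(f"# {name}: /etc/consul.d/consul_config.json")
        print(text)
```

Each rendered text would then be written to /etc/consul.d/consul_config.json on the matching host.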

Configure systemd on every node, start Consul, and enable it at boot:

$ vim /usr/lib/systemd/system/consul.service 
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/docs/

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul

3.2 Configure the master token

Bootstrap the master token:

$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap
{"ID":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}
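
The same bootstrap call can be scripted, which is handy when the token has to be captured and written into the config files. This is only a sketch: the helper names are hypothetical, and the fallback from "SecretID" to "ID" is my assumption about how different Consul versions shape the response (the output above shows only "ID"). Bootstrapping succeeds only once per cluster, so the returned token must be saved.

```python
import json
import urllib.request


def bootstrap_request(server: str) -> urllib.request.Request:
    """Build the PUT request for the ACL bootstrap endpoint."""
    return urllib.request.Request(
        f"http://{server}/v1/acl/bootstrap", method="PUT"
    )


def extract_token(body: str) -> str:
    """Pull the token out of the bootstrap response body."""
    data = json.loads(body)
    # Assumption: prefer "SecretID" if present, else the "ID" field shown above.
    return data.get("SecretID") or data["ID"]


def bootstrap(server: str) -> str:
    """Send the request and return the master token (needs a live cluster)."""
    with urllib.request.urlopen(bootstrap_request(server), timeout=5) as resp:
        return extract_token(resp.read().decode())
```

Calling bootstrap("172.16.100.71:8500") would correspond to the curl command above.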

Generate the encrypt key (used for gossip encryption):

$ consul keygen
gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=
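
The key printed above is 44 base64 characters, which decodes to 32 bytes. If the consul binary is not at hand where the configs are generated, an equivalent key can presumably be produced like this; the sketch assumes that any 32 random bytes, base64-encoded, are acceptable as an "encrypt" value, which matches the shape of the keygen output but is not something the article states:

```python
import base64
import secrets


def generate_gossip_key(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness as base64 text."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_gossip_key())
```

All servers must use the same key, so generate it once and copy it into every config.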

3.3 Configure the master token you obtained

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-2:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-3:

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

Restart Consul on all three nodes.
First on the slave nodes (compute-7-2 and compute-7-3):

$ systemctl restart consul
$ systemctl restart consul

Then on the master (compute-7-1):

$ systemctl restart consul

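After the restart, the server logs show permission-related errors. According to the official documentation, this happens because no agent token has been configured, so an agent token has to be created as well: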

$ curl \
    --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data '{
  "Name": "Agent Token",
  "Type": "client",
  "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }"
}' http://172.16.100.71:8500/v1/acl/create
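
The backslash-escaped quotes in "Rules" are just JSON escaping around a plain rules string: write access to every node (the empty prefix matches all names) and read access to every service. Building the body programmatically avoids escaping it by hand. A sketch with hypothetical helper names; note that /v1/acl/create is the legacy ACL endpoint used by this article's Consul 1.7.2, and as far as I know newer Consul releases no longer offer it:

```python
import json

# The rules exactly as they appear in the curl command, without JSON escaping.
AGENT_RULES = 'node "" { policy = "write" } service "" { policy = "read" }'


def agent_token_payload(name: str = "Agent Token") -> str:
    """Return the JSON body for the legacy /v1/acl/create call."""
    body = {"Name": name, "Type": "client", "Rules": AGENT_RULES}
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    print(agent_token_payload())
```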

3.4 Configure the agent token you obtained

compute-7-1:

{
    "bootstrap_expect": 1,
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-1",
    "bind_addr": "172.16.100.71",
    "advertise_addr": "172.16.100.71",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-2

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-2",
    "bind_addr": "172.16.100.72",
    "advertise_addr": "172.16.100.72",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-3

{
    "datacenter": "sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir": "/data/consul",
    "connect":{
        "enabled": true
    },
    "server": true,
    "client_addr": "0.0.0.0",
    "ui": true,
    "node_name": "compute-7-3",
    "bind_addr": "172.16.100.73",
    "advertise_addr": "172.16.100.73",
    "enable_script_checks": false,
    "enable_local_script_checks": true,
    "log_file": "/var/log",
    "log_rotate_bytes": 300000000,
    "log_rotate_duration": "360h",
    "log_level": "info",
    "acl_datacenter": "sibat_consul",
    "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
            "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

Restart Consul on all three nodes.
First on the slave nodes (compute-7-2 and compute-7-3):

$ systemctl restart consul
$ systemctl restart consul

Then on the master (compute-7-1):

$ systemctl restart consul

Once the cluster has stabilized, the UI is available at http://172.16.100.71:8500

4. Integrating Prometheus

$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        replacement: OpenStack-vms
        action: keep
        target_label: env
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...
$ sudo systemctl restart prometheus
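
To see which targets survive the relabel_configs above, the two rules can be simulated offline. The sketch below is not Prometheus code, only an approximation under two assumptions I believe hold: Prometheus anchors the regex to the whole label value, and __meta_consul_tags is the tag list joined by "," with a separator at both ends. With action keep, only source_labels and regex matter, so the replacement and target_label fields in the first rule should have no effect.

```python
import re


def consul_tags_label(tags, sep=","):
    """Approximate the value of __meta_consul_tags for a list of tags."""
    return sep + sep.join(tags) + sep


def keep(label_value: str, regex: str) -> bool:
    """'keep' action: the target survives only if the whole value matches."""
    return re.fullmatch(regex, label_value) is not None


def labelmap(labels: dict, regex: str) -> dict:
    """'labelmap' action: copy matching labels to the name in capture group 1."""
    out = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            out[match.group(1)] = value
    return out


if __name__ == "__main__":
    rule = ".*OpenStack-vms.*"
    print(keep(consul_tags_label(["OpenStack-vms", "node-exporter"]), rule))
    print(keep(consul_tags_label(["node-exporter"]), rule))
```

A service that carries only the node-exporter tag would be dropped by this rule, so the registered services need an OpenStack-vms tag, or the regex has to match whatever tag they actually carry. I have not checked which tags consulR sets beyond the Tag value in its config file.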

After the restart, the job_name configured above shows up in the Prometheus UI.
[Screenshot: the OpenStack-vms job in the Prometheus UI]

5. Automatic VM registration

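The problem: none of the stock components offers a satisfactory way to register automatically. At first I registered with curl from a shell script hooked into rc.local, but registration sometimes silently failed to happen, and CentOS 7's parallel service startup made the process even less reliable. I also found that Consul does not behave like a strongly consistent registry here: the same service ID sometimes ended up registered on different nodes at the same time.
[Screenshot: the same service ID registered on several Consul nodes]
So I wrote a small program in Go that registers node_exporter automatically. systemd starts it at boot, and it uses an algorithm to avoid duplicate registrations and to spread registrations evenly across the Consul nodes.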

5.1 Node_Exporter

$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter
$ vim /usr/lib/systemd/system/node_exporter.service 
[Unit]
Description=node_exporter: the monitoring system
Documentation=http://prometheus.io/docs/

[Service]
User=nobody
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter

5.2 The consulR registration program

Install consulR:

$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ cd consulR
$ chmod +x consulR
$ mv consulR /usr/local/
$ mkdir /data/consul/logs -p

Configuration file:

$ vim /etc/consul/consulR.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # The outbound NIC's IP address is discovered by dialing this IP and port
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  #Consul Master Token
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
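  # Controls what happens when a VM is deleted or goes down: Consul either keeps the service in its current state or cleans it up automatically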
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound NIC's IP address is discovered via FindAddress
  Address:
  Port: 9100
$ chown -R nobody.nobody /etc/consul/consulR.yaml
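
The FindAddress setting suggests the usual trick for finding the outbound IP: "connect" a UDP socket to a routable address and read back the local address the kernel picked. No packet is sent, because a UDP connect only selects a route. Whether consulR does exactly this is my assumption; the Python sketch below only illustrates the technique, and outbound_ip is a made-up name:

```python
import socket


def outbound_ip(find_address: str = "8.8.8.8:80") -> str:
    """Return the local IP the kernel would use to reach find_address."""
    host, port = find_address.rsplit(":", 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # For UDP this sends nothing; it only binds the socket to a route.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
```

On a VM with several NICs this returns the address on the interface that carries the default route, which is normally the one Prometheus can reach. If it is not, set Service.Address explicitly.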

Manage it with systemd:

$ vim /usr/lib/systemd/system/consulR.service 
[Unit]
Description=consulR: registers node_exporter with Consul
After=network-online.target

[Service]
User=nobody
ExecStart=/usr/local/consulR --confpath=/etc/consul/consulR.yaml
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target

Start it and enable it at boot:

$ systemctl daemon-reload && systemctl start consulR && systemctl enable consulR
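
The article says consulR uses an algorithm to avoid duplicate registrations and to balance them across the Consul nodes, but does not describe it. One plausible approach, and explicitly not consulR's actual code, is to derive the target agent deterministically from the service ID, so that every boot of the same VM picks the same Consul node, and to deregister the ID anywhere else it is found. All names below are hypothetical:

```python
import hashlib


def service_id(tag: str, ip: str, port: int) -> str:
    """Stable ID for one exporter; assumed format, not taken from consulR."""
    return f"{tag}-{ip}-{port}"


def pick_agent(sid: str, agents: list) -> str:
    """Map a service ID to one agent, independent of the list's order."""
    ordered = sorted(agents)
    digest = hashlib.sha256(sid.encode("utf-8")).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def plan(sid: str, agents: list, registered_on: set) -> tuple:
    """Return (agent to register on, agents holding stale copies)."""
    target = pick_agent(sid, agents)
    stale = sorted(a for a in registered_on if a != target)
    return target, stale


if __name__ == "__main__":
    servers = ["172.16.100.71:8500", "172.16.100.72:8500", "172.16.100.73:8500"]
    sid = service_id("node-exporter", "10.0.0.5", 9100)
    print(plan(sid, servers, {servers[0], servers[1]}))
```

Because the hash spreads IDs roughly uniformly, the registrations end up about evenly divided among the three servers without any coordination between VMs.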

Shut down the VM:

$ poweroff

Build the image:

$ qemu-img convert -c -O qcow2 disk centos-fantasy.qcow2
$ openstack image create "CentOS7-Fantasy" --file centos-fantasy.qcow2   --disk-format qcow2  --container-format bare --public

Once the image exists, every VM created from it automatically registers port 9100 with the Consul cluster and is then discovered by Prometheus.
[Screenshot: auto-registered VMs]
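
For reference, this is roughly what a registration against Consul's /v1/agent/service/register endpoint looks like. It is a sketch of the request a program like consulR would have to send, not its real payload: the ID format, the service name, and the choice of /metrics as the health-check URL are my assumptions. As I understand the Consul documentation, DeregisterCriticalServiceAfter has a minimum of one minute, so the sketch defaults to "1m" rather than the 5s in the config above.

```python
import json
import urllib.request


def registration_payload(ip: str, port: int = 9100, tag: str = "node-exporter",
                         interval: str = "5s", timeout: str = "5s",
                         deregister_after: str = "1m") -> dict:
    """Build the service definition, including an HTTP health check."""
    return {
        "ID": f"{tag}-{ip}-{port}",  # assumed ID scheme
        "Name": tag,                 # assumed service name
        "Tags": [tag],
        "Address": ip,
        "Port": port,
        "Check": {
            "HTTP": f"http://{ip}:{port}/metrics",
            "Interval": interval,
            "Timeout": timeout,
            "DeregisterCriticalServiceAfter": deregister_after,
        },
    }


def register_request(agent: str, token: str, payload: dict) -> urllib.request.Request:
    """Build the authenticated PUT request; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"http://{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"X-Consul-Token": token, "Content-Type": "application/json"},
    )
```

With the check in place, a deleted or powered-off VM turns critical and is eventually removed from the catalog, which in turn removes it from Prometheus's target list.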

6. Visualizing the metrics

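Import dashboard template 8919 into Grafana.
[Screenshots: the imported Grafana dashboard]
The instance selector now lists the auto-discovered hosts along with their monitoring details. Simple, right?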
