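At work, the VMs in our OpenStack cluster need monitoring for basic performance metrics. Manually adding node_exporter to every VM as it boots and then editing prometheus.yml each time would be a nightmare for lazy programmers like us, so I designed a monitoring setup based on Prometheus + Consul.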
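1. Bake node_exporter into the image so that it is always deployed automatically.
2. Write a small program that registers node_exporter with Consul automatically, and bake it into the image alongside node_exporter.
3. Configure Prometheus to discover the node_exporter instances through Consul.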
OS | Hostname | IP |
---|---|---|
Centos-7.7 | compute-7-1 | 172.16.100.71 |
Centos-7.7 | compute-7-2 | 172.16.100.72 |
Centos-7.7 | compute-7-3 | 172.16.100.73 |
Consul v1.7.2
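Install Consul on every node:

    $ wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
    $ unzip consul_1.7.2_linux_amd64.zip
    # the archive unpacks a single `consul` binary into the current directory
    $ mv consul /usr/bin/
    $ mkdir -p /data/consul /etc/consul.d
    $ useradd consul
    # the service runs as the consul user, so it must own the data directory
    $ chown -R consul:consul /data/consul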
Edit the configuration file on every node. The files differ only in `node_name` and `bind_addr`.

compute-7-1:

    $ vim /etc/consul.d/consul_config.json
    {
      "bootstrap_expect": 1,
      "datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "node_name": "compute-7-1",
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "bind_addr": "172.16.100.71"
    }

compute-7-2:

    $ vim /etc/consul.d/consul_config.json
    {
      "bootstrap_expect": 1,
      "datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "node_name": "compute-7-2",
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "bind_addr": "172.16.100.72"
    }

compute-7-3:

    $ vim /etc/consul.d/consul_config.json
    {
      "bootstrap_expect": 1,
      "datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "node_name": "compute-7-3",
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "bind_addr": "172.16.100.73"
    }
On every node, create a systemd unit, start Consul, and enable it at boot:

    $ vim /usr/lib/systemd/system/consul.service
    [Unit]
    Description=Consul agent
    Documentation=https://www.consul.io/docs/

    [Service]
    User=consul
    Group=consul
    ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
    KillMode=process
    Restart=on-failure
    LimitNOFILE=65536

    [Install]
    WantedBy=multi-user.target

    # run on each of the three nodes
    $ systemctl daemon-reload && systemctl start consul && systemctl enable consul
Bootstrap the master token:

    $ curl \
        --request PUT \
        http://172.16.100.71:8500/v1/acl/bootstrap
    {"ID":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}
Generate the gossip encryption key:

    $ consul keygen
    gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=
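The key printed above is plain base64 and decodes to 32 bytes. As a rough illustration of what such a key looks like and how to sanity-check one before pasting it into every node's config, here is a minimal Python sketch. The function names `generate_gossip_key` and `key_length` are hypothetical helpers, not part of Consul; `consul keygen` remains the supported way to create the key.

```python
import base64
import os


def generate_gossip_key(num_bytes: int = 32) -> str:
    """Return random bytes encoded as standard base64, the same shape as the key above."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


def key_length(key: str) -> int:
    """Decode a base64 key strictly and return its length in bytes."""
    return len(base64.b64decode(key, validate=True))


if __name__ == "__main__":
    # The key from this article, followed by a freshly generated one.
    print(key_length("gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8="))
    print(key_length(generate_gossip_key()))
```

A key that fails strict decoding, or decodes to an unexpected length, is usually a copy-paste error and is worth catching before the agents are restarted.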
Next, update the configuration on each node to enable gossip encryption and ACLs, using the key and master token obtained above.

compute-7-1:

    {
      "bootstrap_expect": 1,
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "start_join": ["172.16.100.72", "172.16.100.73"],
      "retry_join": ["172.16.100.72", "172.16.100.73"],
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-1",
      "bind_addr": "172.16.100.71",
      "advertise_addr": "172.16.100.71",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": { "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" }
      }
    }

compute-7-2:

    {
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-2",
      "bind_addr": "172.16.100.72",
      "advertise_addr": "172.16.100.72",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "acl_datacenter": "sibat_consul",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": { "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" }
      }
    }

compute-7-3:

    {
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-3",
      "bind_addr": "172.16.100.73",
      "advertise_addr": "172.16.100.73",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "acl_datacenter": "sibat_consul",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": { "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" }
      }
    }
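Start Consul on the three nodes. First restart it on the slave nodes:

    # on compute-7-2
    $ systemctl restart consul
    # on compute-7-3
    $ systemctl restart consul

Then restart it on the master:

    # on compute-7-1
    $ systemctl restart consul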
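After startup, the server logs show permission-related errors. According to the official documentation, this happens because no agent token has been configured, so an agent token still has to be created:

    # X-Consul-Token must carry a management token, here the master token bootstrapped earlier
    $ curl --request PUT \
        --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
        --data '{
          "Name": "Agent Token",
          "Type": "client",
          "Rules": "node \"\" { policy = \"write\" } service \"\" { policy = \"read\" }"
        }' \
        http://172.16.100.71:8500/v1/acl/create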
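The nested quoting in the `Rules` string is easy to get wrong by hand. As a sketch of the same request built programmatically, the Python below lets `json.dumps` do the escaping. `build_acl_request` and `create_agent_token` are hypothetical helper names; the endpoint, header, and body fields are the ones used in the curl call, and the `ID` field in the response is the one the legacy ACL API returns.

```python
import json
import urllib.request

# The same policy as in the curl call, written without shell or JSON escaping.
AGENT_RULES = 'node "" { policy = "write" } service "" { policy = "read" }'


def build_acl_request(consul_addr: str, management_token: str,
                      name: str = "Agent Token") -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a client token."""
    body = json.dumps({"Name": name, "Type": "client", "Rules": AGENT_RULES})
    return urllib.request.Request(
        url=f"http://{consul_addr}/v1/acl/create",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"X-Consul-Token": management_token},
    )


def create_agent_token(consul_addr: str, management_token: str) -> str:
    """Send the request and return the ID of the newly created token."""
    request = build_acl_request(consul_addr, management_token)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["ID"]
```

Calling `create_agent_token("172.16.100.71:8500", "<master token>")` against a running cluster would return the token to place under `acl.tokens.agent`.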
Add the agent token to the `acl.tokens` section of each node's configuration.

compute-7-1:

    {
      "bootstrap_expect": 1,
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "start_join": ["172.16.100.72", "172.16.100.73"],
      "retry_join": ["172.16.100.72", "172.16.100.73"],
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-1",
      "bind_addr": "172.16.100.71",
      "advertise_addr": "172.16.100.71",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
          "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
          "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
      }
    }

compute-7-2:

    {
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-2",
      "bind_addr": "172.16.100.72",
      "advertise_addr": "172.16.100.72",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "acl_datacenter": "sibat_consul",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
          "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
          "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
      }
    }

compute-7-3:

    {
      "datacenter": "sibat_consul",
      "primary_datacenter": "sibat_consul",
      "data_dir": "/data/consul",
      "connect": { "enabled": true },
      "server": true,
      "client_addr": "0.0.0.0",
      "ui": true,
      "node_name": "compute-7-3",
      "bind_addr": "172.16.100.73",
      "advertise_addr": "172.16.100.73",
      "enable_script_checks": false,
      "enable_local_script_checks": true,
      "log_file": "/var/log",
      "log_rotate_bytes": 300000000,
      "log_rotate_duration": "360h",
      "log_level": "info",
      "acl_datacenter": "sibat_consul",
      "encrypt": "gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "enable_token_persistence": true,
        "tokens": {
          "master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
          "agent": "883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
      }
    }
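The three files above are nearly identical: only the node name and addresses change, and compute-7-1 additionally carries the bootstrap and join settings. One way to avoid copy-paste drift is to render them from a single template. The Python sketch below is an illustration under that assumption; `NODES` and `render_config` are made-up names, and the key and tokens are passed in rather than hard-coded.

```python
import json

# Hostname to IP mapping taken from the table at the top of this article.
NODES = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}


def render_config(node_name, encrypt_key, master_token, agent_token,
                  bootstrap_node="compute-7-1"):
    """Return the Consul JSON config for one node as a formatted string."""
    ip = NODES[node_name]
    config = {
        "datacenter": "sibat_consul",
        "primary_datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "connect": {"enabled": True},
        "server": True,
        "client_addr": "0.0.0.0",
        "ui": True,
        "node_name": node_name,
        "bind_addr": ip,
        "advertise_addr": ip,
        "enable_script_checks": False,
        "enable_local_script_checks": True,
        "log_file": "/var/log",
        "log_rotate_bytes": 300000000,
        "log_rotate_duration": "360h",
        "log_level": "info",
        "encrypt": encrypt_key,
        "acl": {
            "enabled": True,
            "default_policy": "deny",
            "enable_token_persistence": True,
            "tokens": {"master": master_token, "agent": agent_token},
        },
    }
    if node_name == bootstrap_node:
        # Only the bootstrap node carries bootstrap_expect and the join lists.
        peers = [addr for name, addr in NODES.items() if name != node_name]
        config.update({"bootstrap_expect": 1, "start_join": peers, "retry_join": peers})
    else:
        config["acl_datacenter"] = "sibat_consul"
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(render_config("compute-7-2", "<gossip key>", "<master token>", "<agent token>"))
```

Writing each rendered string to `/etc/consul.d/consul_config.json` on the matching host reproduces the files shown above, apart from key order.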
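Restart Consul on the three nodes again. First on the slave nodes:

    # on compute-7-2
    $ systemctl restart consul
    # on compute-7-3
    $ systemctl restart consul

Then on the master:

    # on compute-7-1
    $ systemctl restart consul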
Once the cluster has stabilized, the UI is available at http://172.16.100.71:8500.
Configure Prometheus to discover targets through Consul:

    $ sudo vim /etc/prometheus/prometheus.yml
    ...
      - job_name: 'OpenStack-vms'
        consul_sd_configs:
          - server: "172.16.100.71:8500"
            token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
            services: []
          - server: "172.16.100.72:8500"
            token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
            services: []
          - server: "172.16.100.73:8500"
            token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
            services: []
        relabel_configs:
          # keep only targets that carry a matching Consul tag;
          # the keep action uses source_labels and regex and ignores replacement and target_label
          - source_labels: [__meta_consul_tags]
            regex: ".*OpenStack-vms.*"
            replacement: OpenStack-vms
            action: keep
            target_label: env
          # copy every Consul service metadata key onto the target as a label
          - regex: __meta_consul_service_metadata_(.+)
            action: labelmap
    ...

    $ sudo systemctl restart prometheus

After the restart, the job_name configured above shows up in the Prometheus UI.
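To make the effect of the two relabel rules concrete, the sketch below emulates them in Python. It relies on Prometheus's documented behaviour: Consul tags are joined with the tag separator (`,` by default) and the result is wrapped in that separator, and relabel regexes are anchored at both ends. The function names and the `owner` metadata key are made up for illustration, and Python's `re` stands in for RE2, which behaves the same for these simple patterns.

```python
import re


def consul_tags_label(tags, separator=","):
    """Build the value of __meta_consul_tags: tags joined and wrapped in the separator."""
    return separator + separator.join(tags) + separator


def keep(labels, source_label, regex):
    """Emulate `action: keep`: the target survives only if the whole value matches."""
    return re.fullmatch(regex, labels.get(source_label, "")) is not None


def labelmap(labels, regex):
    """Emulate `action: labelmap`: copy matching labels to the name captured by group 1."""
    mapped = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            mapped[match.group(1)] = value
    return mapped


if __name__ == "__main__":
    target = {
        "__meta_consul_tags": consul_tags_label(["OpenStack-vms", "node-exporter"]),
        "__meta_consul_service_metadata_owner": "ops",  # hypothetical metadata key
    }
    print(keep(target, "__meta_consul_tags", ".*OpenStack-vms.*"))
    print(labelmap(target, "__meta_consul_service_metadata_(.+)"))
```

A service registered with only the tag `node-exporter` would be dropped by this `keep` rule, so the regex has to match a tag that the registered services actually carry.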
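Problem: for automatic registration, none of the stock components offers a particularly good solution. At first I registered the service with curl from a shell script called in rc.local, but registration sometimes did not happen, and CentOS 7's parallel startup of services made the process even less reliable. I also found that Consul does not behave as a strongly consistent registry here: a service is registered against whichever agent receives the request, so the same service ID sometimes ended up registered on different nodes at the same time.
So I wrote a small program in Go that registers node_exporter automatically. It is started at boot by systemd, and it uses an algorithm to avoid duplicate registrations and to balance registrations across the Consul nodes.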
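The article does not spell out how the program avoids duplicates and balances registrations. One way to get both properties, sketched below in Python purely as an assumption and not as consulR's actual algorithm, is to derive a stable service ID from the VM's address and hash that ID to choose which Consul agent receives the registration: the same VM always lands on the same agent, and different VMs spread across the agents. `pick_agent`, `registration_payload`, and the VM address `10.0.0.15` are hypothetical; the payload fields follow Consul's `/v1/agent/service/register` API.

```python
import hashlib
import json

CONSUL_AGENTS = ["172.16.100.71:8500", "172.16.100.72:8500", "172.16.100.73:8500"]


def pick_agent(service_id, agents):
    """Map a service ID to one agent deterministically, independent of list order."""
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    ordered = sorted(agents)
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


def registration_payload(vm_ip, port=9100, tag="node-exporter"):
    """Build the JSON body for PUT /v1/agent/service/register."""
    return {
        "ID": f"node-exporter-{vm_ip}-{port}",  # stable ID: re-registering overwrites, never duplicates
        "Name": "node-exporter",
        "Tags": [tag],
        "Address": vm_ip,
        "Port": port,
        "Check": {
            "HTTP": f"http://{vm_ip}:{port}/metrics",
            "Interval": "5s",
            "Timeout": "5s",
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


if __name__ == "__main__":
    payload = registration_payload("10.0.0.15")  # hypothetical VM address
    print(pick_agent(payload["ID"], CONSUL_AGENTS))
    print(json.dumps(payload, indent=2))
```

Consul documents a one-minute minimum for `DeregisterCriticalServiceAfter`, which is why the sketch uses `1m` rather than a few seconds.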
Install node_exporter in the VM that will be turned into the image:

    $ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
    $ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
    # rename the extracted directory (not the tarball)
    $ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter

    $ vim /usr/lib/systemd/system/node_exporter.service
    [Unit]
    Description=node_exporter: the monitoring system
    Documentation=http://prometheus.io/docs/

    [Service]
    User=nobody
    ExecStart=/usr/local/node_exporter/node_exporter
    Restart=always
    StartLimitInterval=0
    RestartSec=10

    [Install]
    WantedBy=multi-user.target

    $ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter
Install the consulR program:

    $ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
    $ unzip consulR.zip
    $ cd consulR
    $ chmod +x consulR
    $ mv consulR /usr/local/
    $ mkdir -p /data/consul/logs
Configuration file:

    # create the directory that holds the configuration file
    $ mkdir -p /etc/consul
    $ vim /etc/consul/consulR.yaml
    System:
      ServiceName: consul-registy-service
      ListenAddress: 0.0.0.0
      Port: 9984
      # This IP and port are used to find the IP address of the outbound network interface
      FindAddress: 8.8.8.8:80
    Logs:
      LogFilePath: /data/consul/consul.log
      LogLevel: info
    Consul:
      Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
      # Consul master token
      Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
      CheckTimeout: 5s
      CheckInterval: 5s
      # Controls what happens when a VM is deleted or goes down:
      # Consul either keeps the last known state or cleans the service up automatically
      CheckDeregisterCriticalServiceAfter: true
      CheckDeregisterCriticalServiceAfterTime: 5s
    Service:
      Tag: node-exporter
      # If Address is empty, the outbound interface's IP is looked up via FindAddress
      Address:
      Port: 9100

    $ chown -R nobody:nobody /etc/consul/consulR.yaml
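The `FindAddress` setting in the configuration above is described as a way to find the outbound interface's IP. A common technique for this, and only an assumption about what consulR does internally, is to "connect" a UDP socket to the given address and read back the local address the kernel selected; no packet is sent. A minimal Python sketch, with `outbound_ip` as a hypothetical helper name:

```python
import socket


def outbound_ip(find_address="8.8.8.8:80"):
    """Return the local IPv4 address the kernel would use to reach find_address."""
    host, port = find_address.rsplit(":", 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket only selects a route and source address;
        # nothing is transmitted on the wire.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
```

On a VM with a default route, `outbound_ip()` returns the address of the interface used for outbound traffic, which is the address Prometheus should scrape. If the host has no route to the target, `connect` raises `OSError`.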
Manage it with systemd:

    $ vim /usr/lib/systemd/system/consulR.service
    [Unit]
    Description=Consul
    After=network-online.target

    [Service]
    User=nobody
    ExecStart=/usr/local/consulR --confpath=/etc/consul/consulR.yaml
    Restart=on-failure
    RestartSec=1

    [Install]
    WantedBy=multi-user.target
Start it and enable it at boot:

    $ systemctl daemon-reload && systemctl start consulR && systemctl enable consulR
Shut down the VM:

    $ poweroff
Build the image:

    # `disk` is the disk file of the powered-off VM; -c compresses the output
    $ qemu-img convert -c -O qcow2 disk centos-fantasy.qcow2
    $ openstack image create "CentOS7-Fantasy" --file centos-fantasy.qcow2 --disk-format qcow2 --container-format bare --public
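Once the image exists, any VM created from it automatically registers port 9100 with the Consul cluster and is then discovered by Prometheus.
Import dashboard 8919 into Grafana, and the auto-discovered hosts show up under instance with their full monitoring details. Simple, right?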