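Note: if you do not use the supervisor process manager, refer only to the configuration and ignore the supervisor-related commands. The Alertmanager version must be at least 0.15.2; lower versions do not support this cluster configuration.
cd /data/yy-monitor-server/etc
vim alertmanager.yml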
# The root route on which each incoming alert enters.
route:
  routes:
  group_wait: 15s
  group_interval: 15s
|
From the root directory, run vim /etc/supervisord.d/yy-monitor-server.ini
[program:alertmanager]
priority = 3
user = yy
; --cluster.listen-address: the current node's IP and a custom port
command = /usr/bin/alertmanager --cluster.listen-address="10.22.0.1002:12001" --log.level=debug
|
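Configuration for the other nodes:
[program:alertmanager]
priority = 3
user = yy
; --cluster.listen-address: the current node's IP and a custom port
; --cluster.peer: an existing node through which this node joins the cluster
command = /usr/bin/alertmanager --cluster.listen-address="10.22.0.1001:12002" --cluster.peer=10.22.0.1002:12001 --log.level=debug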
|
Restart the services, otherwise the configuration does not take effect:
systemctl restart supervisord
supervisorctl restart alertmanager
cd /data/yy-monitor-server/log
tail -f alertmanager.log
level=debug ts=2018-08-28T08:58:44.75092899Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:58:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1001:12002\n"
level=debug ts=2018-08-28T08:59:21.675338872Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:59:21 [DEBUG] memberlist: Stream connection from=10.22.0.1001:42736\n"
level=debug ts=2018-08-28T08:59:44.754235616Z caller=cluster.go:287 component=cluster memberlist="2018/08/28 16:59:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1000:12003\n"
|
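To confirm from the log that a node is really talking to its peers, the memberlist debug messages can be scanned for peer addresses. The following is a minimal sketch that assumes the two message shapes shown above ("Initiating push/pull sync with:" and "Stream connection from="); the function name peers_seen is made up for this example, and other memberlist messages are simply ignored.

```python
import re

# Matches the host and port in the two memberlist debug messages shown in the log above.
PEER_RE = re.compile(r"(?:sync with: |connection from=)(?P<host>[\d.]+):(?P<port>\d+)")


def peers_seen(log_lines):
    """Return the peer hosts mentioned in memberlist log lines, in first-seen order.

    Only the host is kept: for incoming stream connections the port is the
    peer's ephemeral source port, not its cluster listen port.
    """
    hosts = []
    for line in log_lines:
        match = PEER_RE.search(line)
        if match and match.group("host") not in hosts:
            hosts.append(match.group("host"))
    return hosts


sample = [
    'level=debug ts=2018-08-28T08:58:44.75092899Z caller=cluster.go:287 component=cluster '
    'memberlist="2018/08/28 16:58:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1001:12002\\n"',
    'level=debug ts=2018-08-28T08:59:21.675338872Z caller=cluster.go:287 component=cluster '
    'memberlist="2018/08/28 16:59:21 [DEBUG] memberlist: Stream connection from=10.22.0.1001:42736\\n"',
    'level=debug ts=2018-08-28T08:59:44.754235616Z caller=cluster.go:287 component=cluster '
    'memberlist="2018/08/28 16:59:44 [DEBUG] memberlist: Initiating push/pull sync with: 10.22.0.1000:12003\\n"',
]

print(peers_seen(sample))
```

On a three-node cluster, each node's log should eventually mention both of the other nodes.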
Once startup completes, open http://localhost:9093/#/status on any Alertmanager node to view the current state of the Alertmanager cluster.
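The same information can be read without a browser. The sketch below assumes a newer Alertmanager that serves the v2 HTTP API at /api/v2/status (this API is not in 0.15.x), and it assumes the response carries a "cluster" object with a "status" string and a "peers" list, as in the v2 API specification. The function names are made up for this example; check the field names against your own version.

```python
import json
from urllib.request import urlopen


def fetch_status(base_url="http://localhost:9093"):
    """Fetch the status document from one Alertmanager (assumes the v2 API is available)."""
    with urlopen(f"{base_url}/api/v2/status", timeout=5) as resp:
        return json.load(resp)


def summarize_cluster(status):
    """Return (cluster status, sorted peer addresses) from a parsed status document."""
    cluster = status.get("cluster", {})
    peers = sorted(peer["address"] for peer in cluster.get("peers", []))
    return cluster.get("status", "unknown"), peers


# Hand-written example payload in the assumed shape; not captured from a real server.
example = {
    "cluster": {
        "status": "ready",
        "peers": [
            {"name": "node-a", "address": "10.22.0.1002:12001"},
            {"name": "node-b", "address": "10.22.0.1001:12002"},
            {"name": "node-c", "address": "10.22.0.1000:12003"},
        ],
    }
}

print(summarize_cluster(example))
```

Against a live node, summarize_cluster(fetch_status()) would be the call to make; a healthy three-node cluster should list three peers.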
cd /data/yy-monitor-server/etc
vi prometheus.yml
global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 5s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    dc: europe1
# Alertmanager configuration
alerting:
  alert_relabel_configs:
    - source_labels: [dc]
      regex: (.+)\d+
      target_label: dc
  alertmanagers:
    - static_configs:
        - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']
|
global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 5s
  # Note that this is different only by the trailing number.
  external_labels:
    dc: europe2
# Alertmanager configuration
alerting:
  alert_relabel_configs:
    - source_labels: [dc]
      regex: (.+)\d+
      target_label: dc
  alertmanagers:
    - static_configs:
        - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']
|
global:
  scrape_interval: 5s
  scrape_timeout: 5s
  evaluation_interval: 5s
  external_labels:
    dc: europe3
# Alertmanager configuration
alerting:
  alert_relabel_configs:
    - source_labels: [dc]
      regex: (.+)\d+
      target_label: dc
  alertmanagers:
    - static_configs:
        - targets: ['10.22.0.1000:9093', '10.22.0.1001:9093', '10.22.0.1002:9093']
|
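The three Prometheus configurations differ only in the trailing digit of the dc external label, and alert_relabel_configs strips that digit before alerts are sent, so every replica emits the same label set and Alertmanager can deduplicate them. The sketch below mimics that one rule in Python as an illustration; it assumes Prometheus' documented behaviour that relabel regexes are fully anchored and that the default replacement is the first capture group. The function name relabel_dc is made up for this example.

```python
import re

# Same pattern as in alert_relabel_configs; Prometheus anchors it at both ends.
RELABEL_REGEX = re.compile(r"(.+)\d+")


def relabel_dc(value):
    """Return the dc label after the rule: drop one trailing digit, or leave unmatched values alone."""
    match = RELABEL_REGEX.fullmatch(value)
    return match.group(1) if match else value


for dc in ("europe1", "europe2", "europe3"):
    print(dc, "->", relabel_dc(dc))
```

All three values collapse to "europe". A value without a trailing digit does not match the pattern and passes through unchanged.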
# supervisorctl restart prometheus
prometheus: stopped
prometheus: started
|
Pick one host to configure (for example, 10.22.0.1002):
cd /data/yy-monitor-server/etc
vi nginx.conf
# Alertmanager
upstream alert {
    server 10.22.0.1002:9093;
    server 10.22.0.1001:9093;
    server 10.22.0.1000:9093;
}
server {
    # alertmanager
    location /alertmanager/ {
        proxy_pass http://alert/;
    }
}
|
Restart nginx:
# supervisorctl restart nginx
nginx: stopped
nginx: started
|
Stop Alertmanager on two of the nodes:
On 10.22.0.1002:
# supervisorctl stop alertmanager
alertmanager: stopped
On 10.22.0.1001:
# supervisorctl stop alertmanager
alertmanager: stopped
|
The UI is still reachable, so the proxy configuration works.
To create a highly available cluster of the Alertmanager the instances need to be configured to communicate with each other. This is configured using the --cluster.* flags.
--cluster.listen-address string: cluster listen address (default "0.0.0.0:9094")
--cluster.advertise-address string: cluster advertise address
--cluster.peer value: initial peers (repeat flag for each additional peer)
--cluster.peer-timeout value: peer timeout period (default "15s")
--cluster.gossip-interval value: cluster message propagation speed (default "200ms")
--cluster.pushpull-interval value: lower values will increase convergence speeds at expense of bandwidth (default "1m0s")
--cluster.settle-timeout value: maximum time to wait for cluster connections to settle before evaluating notifications.
--cluster.tcp-timeout value: timeout value for tcp connections, reads and writes (default "10s")
--cluster.probe-timeout value: time to wait for ack before marking node unhealthy (default "500ms")
--cluster.probe-interval value: interval between random node probes (default "1s")
The chosen port in the cluster.listen-address flag is the port that needs to be specified in the cluster.peer flag of the other peers.
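The relationship between cluster.listen-address and cluster.peer can be made concrete with a small sketch that derives the flags for every node from one list of listen addresses. This is only an illustration: the function name cluster_args is made up, the addresses are the ones used earlier in this post, and it lists every other node as a peer, although a single reachable peer is enough to join.

```python
def cluster_args(listen_addresses):
    """Map each node's listen address to the --cluster.* flags it could be started with."""
    commands = {}
    for own in listen_addresses:
        args = [f"--cluster.listen-address={own}"]
        # Each peer flag repeats the listen address (IP and port) of another node.
        args += [f"--cluster.peer={other}" for other in listen_addresses if other != own]
        commands[own] = args
    return commands


nodes = ["10.22.0.1002:12001", "10.22.0.1001:12002", "10.22.0.1000:12003"]
for node, args in cluster_args(nodes).items():
    print("/usr/bin/alertmanager", " ".join(args))
```

Listing more than one peer means a node can still join after a restart even if one of the other nodes happens to be down.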
To start a cluster of three peers on your local machine use goreman and the Procfile within this repository.
goreman start
To point your Prometheus 1.4, or later, instance to multiple Alertmanagers, configure them in your prometheus.yml configuration file, for example:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager1:9093
            - alertmanager2:9093
            - alertmanager3:9093
Important: Do not load balance traffic between Prometheus and its Alertmanagers, but instead point Prometheus to a list of all Alertmanagers. The Alertmanager implementation expects all alerts to be sent to all Alertmanagers to ensure high availability.
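The difference between fanning out and load balancing can be sketched in a few lines. This is a toy model, not Prometheus' notifier: fan_out, fake_send and the DiskFull alert are hypothetical names, and the transport is injected so the example runs without a network.

```python
def fan_out(alert, alertmanagers, send):
    """Deliver one alert to every Alertmanager and return the addresses that accepted it."""
    delivered = []
    for address in alertmanagers:
        try:
            send(address, alert)
        except OSError:
            # An unreachable instance is skipped; the remaining ones still get the alert.
            continue
        delivered.append(address)
    return delivered


received = {}


def fake_send(address, alert):
    """Stand-in transport: records deliveries and simulates one instance being down."""
    if address == "alertmanager2:9093":
        raise ConnectionError("instance is down")
    received.setdefault(address, []).append(alert)


targets = ["alertmanager1:9093", "alertmanager2:9093", "alertmanager3:9093"]
alert = {"labels": {"alertname": "DiskFull", "dc": "europe"}}  # hypothetical alert
print(fan_out(alert, targets, fake_send))
```

Because every reachable instance receives the alert, losing one Alertmanager does not lose it. A load balancer would hand each alert to a single instance only, which is the setup the note above warns against.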