1. Prometheus Overview
Prometheus is an open-source systems monitoring and alerting toolkit.
Key features:
A multi-dimensional data model (time series identified by metric name and key/value pairs)
A flexible query language, PromQL (see the query sketch after this list)
No dependence on distributed storage
Data collected over HTTP using a pull model
Time series can also be pushed via a push gateway
Targets discovered via service discovery or static configuration
Multiple modes of graphing and dashboard support
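To make the data model and query language concrete, here is a hedged sketch of a query against the Prometheus HTTP API; the metric and label names (http_requests_total, job, status) are illustrative only, and the address assumes a server reachable on localhost:9090:

# curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=sum(rate(http_requests_total{job="api-server",status=~"5.."}[5m])) by (instance)'

The query selects a metric by name, filters it by label values, computes a per-second rate over five-minute windows, and aggregates across the instance dimension.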
The Prometheus architecture is as follows:
The Prometheus components include the Prometheus server, push gateway, Alertmanager, and web UI.
The Prometheus server periodically pulls data from its targets and persists it to disk. Prometheus can be configured with rules that it evaluates on a schedule: recording rules aggregate existing data and record the result as new time series, while alerting rules push alerts to the configured Alertmanager when their conditions trigger. The Alertmanager then deduplicates, groups, and routes the alerts to notification receivers according to its configuration. The collected data can also be visualized through the HTTP API or with tools such as Grafana.
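A minimal sketch of the two rule types, in Prometheus 2.x rule-file syntax (in this kube-prometheus deployment the rules live in the prometheus-k8s-rules ConfigMap created below); the metric names and the alert are illustrative, not part of the shipped configuration:

groups:
- name: example
  rules:
  # Recording rule: precompute an aggregation and store it as a new time series.
  - record: job:http_requests:rate5m
    expr: sum(rate(http_requests_total[5m])) by (job)
  # Alerting rule: push an alert to Alertmanager once the condition has held for 5m.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} is down"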
2. Installing Prometheus Operator
1. The Prometheus Operator simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.
# wget https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.18.0 -O prometheus-operator-0.18.0.tar.gz
# tar -zxvf prometheus-operator-0.18.0.tar.gz
# cd prometheus-operator-0.18.0
# kubectl apply -f bundle.yaml
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
deployment "prometheus-operator" created
# cd contrib/kube-prometheus
# hack/cluster-monitoring/deploy
namespace "monitoring" created
clusterrolebinding "prometheus-operator" created
clusterrole "prometheus-operator" created
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding "node-exporter" created
clusterrole "node-exporter" created
daemonset "node-exporter" created
serviceaccount "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" created
configmap "grafana-dashboard-definitions-0" created
configmap "grafana-dashboards" created
configmap "grafana-datasources" created
deployment "grafana" created
service "grafana" created
configmap "prometheus-k8s-rules" created
serviceaccount "prometheus-k8s" created
servicemonitor "alertmanager" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-scheduler" created
servicemonitor "kube-state-metrics" created
servicemonitor "kubelet" created
servicemonitor "node-exporter" created
servicemonitor "prometheus-operator" created
servicemonitor "prometheus" created
service "prometheus-k8s" created
prometheus "k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
clusterrole "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created
# kubectl get pod -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   0          15h
alertmanager-main-1                    2/2       Running   0          15h
alertmanager-main-2                    2/2       Running   0          15h
grafana-567fcdf7b7-44ldd               1/1       Running   0          15h
kube-state-metrics-76b4dc5ffb-2vbh9    4/4       Running   0          15h
node-exporter-9wm8c                    2/2       Running   0          15h
node-exporter-kf6mq                    2/2       Running   0          15h
node-exporter-xtm4r                    2/2       Running   0          15h
prometheus-k8s-0                       2/2       Running   0          15h
prometheus-k8s-1                       2/2       Running   0          15h
prometheus-operator-7466f6887f-9nsk8   1/1       Running   0          15h
# kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.244.69.39     <none>        9093:30903/TCP      15h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   15h
grafana                 NodePort    10.244.86.54     <none>        3000:30902/TCP      15h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   15h
node-exporter           ClusterIP   None             <none>        9100/TCP            15h
prometheus-k8s          NodePort    10.244.226.104   <none>        9090:30900/TCP      15h
prometheus-operated     ClusterIP   None             <none>        9090/TCP            15h
prometheus-operator     ClusterIP   10.244.9.203     <none>        8080/TCP            15h
# kubectl -n monitoring get endpoints
NAME                    ENDPOINTS                                                         AGE
alertmanager-main       10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093                15h
alertmanager-operated   10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093 + 3 more...    15h
grafana                 10.244.2.8:3000                                                   15h
kube-state-metrics      10.244.2.9:9443,10.244.2.9:8443                                   15h
node-exporter           192.168.100.102:9100,192.168.100.103:9100,192.168.100.105:9100    15h
prometheus-k8s          10.244.2.11:9090,10.244.35.5:9090                                 15h
prometheus-operated     10.244.2.11:9090,10.244.35.5:9090                                 15h
prometheus-operator     10.244.35.3:8080                                                  15h
# kubectl -n monitoring get servicemonitors
NAME                      AGE
alertmanager              15h
kube-apiserver            15h
kube-controller-manager   15h
kube-scheduler            15h
kube-state-metrics        15h
kubelet                   15h
node-exporter             15h
prometheus                15h
prometheus-operator       15h
# kubectl get customresourcedefinitions
NAME                                    AGE
alertmanagers.monitoring.coreos.com     11d
prometheuses.monitoring.coreos.com      11d
servicemonitors.monitoring.coreos.com   11d
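If the NodePort services are not reachable from your workstation, a quick hedged sanity check is to port-forward into one of the Prometheus pods and hit its health endpoint (run the curl from a second terminal while the port-forward is running):

# kubectl -n monitoring port-forward prometheus-k8s-0 9090
# curl -s http://localhost:9090/-/healthy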
Note:
During deployment I changed every image address to pull from a local registry, yet some pods still pulled images from the remote registry, as shown below:
In my case the alertmanager image could not be pulled. The fix is to pull the image somewhere it is reachable, then save it and distribute it to each node:
# docker save 23744b2d645c -o alertmanager-v0.14.0.tar.gz
# ansible node -m copy -a 'src=alertmanager-v0.14.0.tar.gz dest=/root'
# ansible node -a 'docker load -i /root/alertmanager-v0.14.0.tar.gz'
192.168.100.104 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f

192.168.100.103 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f

# ansible node -a 'docker images quay.io/prometheus/alertmanager'
192.168.100.103 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB

192.168.100.104 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB
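One caveat, offered as an aside: docker save by image ID, as above, produces a tarball without repository or tag metadata, which is why docker load reports "Loaded image ID" rather than "Loaded image"; the image may then need re-tagging on each node. Saving by name keeps the tag:

# docker save quay.io/prometheus/alertmanager:v0.14.0 -o alertmanager-v0.14.0.tar.gz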
2. Adding etcd monitoring
The Prometheus Operator ships with an etcd dashboard, but additional configuration is needed before etcd is fully monitored and displayed. Official documentation: Monitoring external etcd.
a. Create a secret in the monitoring namespace
# kubectl -n monitoring create secret generic etcd-certs \
    --from-file=/etc/kubernetes/ssl/ca.pem \
    --from-file=/etc/kubernetes/ssl/etcd.pem \
    --from-file=/etc/kubernetes/ssl/etcd-key.pem
secret "etcd-certs" created
# kubectl -n monitoring get secrets etcd-certs
NAME         TYPE      DATA      AGE
etcd-certs   Opaque    3         16h
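To confirm that all three certificate files made it into the secret, a quick check (kubectl describe lists each data key with its size in bytes):

# kubectl -n monitoring describe secret etcd-certs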
Note: these certificates were created when the etcd cluster was deployed; change the paths to wherever your own certificates are stored.
b. Attach the secret to Prometheus via the Operator
# vim manifests/prometheus/prometheus-k8s.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  secrets:
  - etcd-certs
  version: v2.2.1
# kubectl -n monitoring replace -f manifests/prometheus/prometheus-k8s.yaml
prometheus "k8s" replaced
Note: only the following entry needs to be added (the Operator mounts each listed secret into the Prometheus pods under /etc/prometheus/secrets/<name>/, which is where the tlsConfig paths below point):
secrets:
- etcd-certs
c. Create the Service, Endpoints, and ServiceMonitor resources
# vim manifests/prometheus/prometheus-etcd.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.100.102
    nodeName: etcd1
  - ip: 192.168.100.103
    nodeName: etcd2
  - ip: 192.168.100.104
    nodeName: etcd3
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      # use insecureSkipVerify only if you cannot use a Subject Alternative Name
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - monitoring
# kubectl create -f manifests/prometheus/prometheus-etcd.yaml
Note 1: change the etcd IP addresses and node names to match your own configuration.
Note 2: in the three entries under tlsConfig, only the trailing file names (ca.pem, etcd.pem, etcd-key.pem) need to be changed to your own certificate names; the /etc/prometheus/secrets/etcd-certs/ prefix stays. If in doubt, exec into a prometheus-k8s pod and look:
# kubectl exec -ti -n monitoring prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem        etcd-key.pem  etcd.pem
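Before waiting on the Prometheus UI, it can also help to confirm that etcd is actually serving metrics over TLS. A hedged sketch, assuming the certificate paths from step a and one of the etcd addresses above:

# curl --cacert /etc/kubernetes/ssl/ca.pem \
       --cert /etc/kubernetes/ssl/etcd.pem \
       --key /etc/kubernetes/ssl/etcd-key.pem \
       https://192.168.100.102:2379/metrics | head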
3. Once the Prometheus Operator deployment completes, three NodePorts are exposed: 30900 for Prometheus, 30902 for Grafana, and 30903 for Alertmanager.
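A hedged sketch of checking the exposed services from outside the cluster, assuming 192.168.100.102 is one of the node IPs listed earlier:

# curl -s http://192.168.100.102:30900/-/healthy    # Prometheus health endpoint
# curl -s http://192.168.100.102:30902/api/health   # Grafana health endpoint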
Prometheus is shown below; if everything is working, all targets should be UP.
Alertmanager is shown below.
Grafana's monitoring dashboards are shown below.
The etcd dashboard is shown below.
The Kubernetes cluster dashboard is shown below.
Node monitoring is shown below.