Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus Operator is a Prometheus-based Kubernetes monitoring solution developed by CoreOS, and is arguably the most feature-complete open-source option available today.
Key features:
1) A multi-dimensional data model (time series identified by metric name and key/value pairs)
2) A flexible query language (PromQL)
3) No reliance on distributed storage
4) Data collection over HTTP using a pull model
5) Pushing time series is supported via a push gateway (see the sketch after this list)
6) Targets are discovered via service discovery or static configuration
7) Multiple modes of graphing and dashboard support
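As a rough sketch of feature 5, a batch job can push its metrics to a Pushgateway, which Prometheus then scrapes like any other target. The Pushgateway address, metric name, and job name below are assumptions for illustration only:

# Push a single metric to an assumed Pushgateway at pushgateway.example.com:9091;
# Prometheus later pulls it from the Pushgateway's /metrics endpoint as usual.
echo "demo_job_last_success_timestamp $(date +%s)" \
  | curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/demo_job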
The Prometheus architecture is shown below:
The Prometheus components include Prometheus server, push gateway, Alertmanager, the Web UI, and others.
Prometheus server periodically pulls data from its data sources and persists it to disk. Prometheus can be configured with rules that are evaluated on a schedule; when a rule's condition fires, an alert is pushed to the configured Alertmanager. When Alertmanager receives alerts, it aggregates and groups them and, according to its configuration, sends out notifications. The collected data can also be visualized through other APIs or through Grafana.
Prometheus Server
Prometheus Server is responsible for pulling monitoring data from Exporters and storing it, and it provides a flexible query language (PromQL) for users.
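As a minimal illustration, PromQL expressions can also be evaluated through the server's HTTP API; the address below assumes the NodePort 30900 that is exposed later in this deployment and one of the node IPs used in this article:

# Evaluate the PromQL expression "up" via the HTTP API;
# 192.168.210.161:30900 is an assumed node IP and NodePort, adjust to your environment.
curl -s 'http://192.168.210.161:30900/api/v1/query?query=up'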
Exporter
An Exporter collects performance data from a target object (host, container, etc.) and exposes it over an HTTP endpoint for Prometheus Server to scrape.
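For example, node-exporter exposes host metrics in the Prometheus text format, so a scrape can be reproduced by hand with curl (assuming node-exporter is listening on its default port 9100 on the local host):

# Manually "pull" metrics the same way Prometheus Server does;
# port 9100 is node-exporter's default and is an assumption here.
curl -s http://localhost:9100/metrics | head -n 20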
Visualization Components
Visualizing the monitoring data is crucial for a monitoring solution. Prometheus originally developed its own visualization tool, but it was later abandoned because a superior open-source product, Grafana, emerged. Grafana integrates seamlessly with Prometheus and provides excellent data presentation capabilities.
Alertmanager
Users can define alerting rules over the monitoring data, and these rules trigger alerts. Once Alertmanager receives an alert, it sends notifications through pre-defined channels, including Email, PagerDuty, and Webhook.
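As a rough sketch of such a rule (not part of the kube-prometheus manifests used below), a Prometheus 2.x alerting rule that fires when a target stays down might look like this; the rule name, duration, and labels are illustrative only:

# A hypothetical alerting rule file; once loaded by Prometheus, firing alerts are
# forwarded to Alertmanager, which routes the notification to a receiver.
cat <<'EOF' > example-alert-rules.yaml
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} has been down for 5 minutes"
EOF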
The goal of Prometheus Operator is to make deploying and maintaining Prometheus on Kubernetes as simple as possible. Its architecture is shown in the figure below:
Every object in the figure is a resource running in Kubernetes.
Operator
Operator here refers to Prometheus Operator, which runs as a Deployment in Kubernetes. It is responsible for deploying and managing Prometheus Server and for dynamically updating Prometheus Server's monitoring targets based on ServiceMonitor objects.
Prometheus Server
Prometheus Server is deployed into the cluster as a Kubernetes application. To manage Prometheus more naturally in Kubernetes, the CoreOS developers defined a Kubernetes custom resource named Prometheus. You can think of a Prometheus resource as a special kind of Deployment whose sole purpose is to deploy Prometheus Server.
Service
Service here is the ordinary Service resource in the cluster, and it is also what Prometheus monitors; in Prometheus terminology it is called a Target. Each monitored object has a corresponding Service. For example, to monitor the Kubernetes Scheduler there must be a Service that points at the Scheduler. A Kubernetes cluster does not ship with this Service by default; Prometheus Operator takes care of creating it.
ServiceMonitor
The Operator can dynamically update Prometheus's target list, and ServiceMonitor is the abstraction of a Target. For example, to monitor the Kubernetes Scheduler, a user creates a ServiceMonitor object that maps to the Scheduler Service. The Operator discovers the new ServiceMonitor and adds the Scheduler as a Target to Prometheus's monitoring list.
ServiceMonitor is also a Kubernetes custom resource type defined by Prometheus Operator.
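As a rough sketch, a minimal ServiceMonitor for a hypothetical application whose Service carries the label app: my-app might look like the following (the name, namespace, and port are assumptions, not taken from kube-prometheus):

# Hypothetical ServiceMonitor; the Operator adds the matching Service's
# endpoints to Prometheus's scrape configuration.
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    k8s-app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: web
    interval: 30s
EOF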
Alertmanager
Besides Prometheus and ServiceMonitor, Alertmanager is the third Kubernetes custom resource defined by the Operator. You can think of an Alertmanager resource as a special kind of Deployment whose sole purpose is to deploy the Alertmanager component.
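A minimal Alertmanager resource might look like the sketch below; the name and replica count are illustrative only (the deploy script used later creates one named main for you):

# Hypothetical Alertmanager custom resource; the Operator turns it into a
# managed set of pods running the Alertmanager component.
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 3
EOF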
Prometheus Operator greatly simplifies deploying, managing, and running Prometheus and Alertmanager clusters on Kubernetes.
1. Load the images on all Kubernetes cluster nodes:
2. On the deployment node, run the following:
(1) Prepare the prometheus-operator package and start the service:
wget https://codeload.github.com/coreos/prometheus-operator/tar.gz/v0.18.0 -O prometheus-operator-0.18.0.tar.gz
tar -zxvf prometheus-operator-0.18.0.tar.gz
cd prometheus-operator-0.18.0
kubectl apply -f bundle.yaml
clusterrolebinding "prometheus-operator" configured
clusterrole "prometheus-operator" configured
serviceaccount "prometheus-operator" created
deployment "prometheus-operator" created
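After bundle.yaml is applied, you can verify, for example, that the Operator Deployment is running and that its custom resource definitions have been registered:

# Both commands should list the prometheus-operator resources created above.
kubectl get deployment prometheus-operator
kubectl get customresourcedefinitions | grep monitoring.coreos.com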
(2) On the master node, verify the health of the etcd endpoints:
export NODE_IPS="192.168.210.161 192.168.210.162 192.168.210.163"
for ip in ${NODE_IPS}; do
  ETCDCTL_API=3 etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem endpoint health
done
(3) Back on the deployment node, deploy the monitoring stack:
cd contrib/kube-prometheus
hack/cluster-monitoring/deploy
# To remove the stack:
hack/cluster-monitoring/teardown
namespace "monitoring" created
clusterrolebinding "prometheus-operator" created
clusterrole "prometheus-operator" created
serviceaccount "prometheus-operator" created
service "prometheus-operator" created
deployment "prometheus-operator" created
Waiting for Operator to register custom resource definitions...done!
clusterrolebinding "node-exporter" created
clusterrole "node-exporter" created
daemonset "node-exporter" created
serviceaccount "node-exporter" created
service "node-exporter" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
deployment "kube-state-metrics" created
rolebinding "kube-state-metrics" created
role "kube-state-metrics-resizer" created
serviceaccount "kube-state-metrics" created
service "kube-state-metrics" created
secret "grafana-credentials" created
secret "grafana-credentials" created
configmap "grafana-dashboard-definitions-0" created
configmap "grafana-dashboards" created
configmap "grafana-datasources" created
deployment "grafana" created
service "grafana" created
configmap "prometheus-k8s-rules" created
serviceaccount "prometheus-k8s" created
servicemonitor "alertmanager" created
servicemonitor "kube-apiserver" created
servicemonitor "kube-controller-manager" created
servicemonitor "kube-scheduler" created
servicemonitor "kube-state-metrics" created
servicemonitor "kubelet" created
servicemonitor "node-exporter" created
servicemonitor "prometheus-operator" created
servicemonitor "prometheus" created
service "prometheus-k8s" created
prometheus "k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
role "prometheus-k8s" created
clusterrole "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
rolebinding "prometheus-k8s" created
clusterrolebinding "prometheus-k8s" created
secret "alertmanager-main" created
service "alertmanager-main" created
alertmanager "main" created
kubectl get pod -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   0          15h
alertmanager-main-1                    2/2       Running   0          15h
alertmanager-main-2                    2/2       Running   0          15h
grafana-567fcdf7b7-44ldd               1/1       Running   0          15h
kube-state-metrics-76b4dc5ffb-2vbh9    4/4       Running   0          15h
node-exporter-9wm8c                    2/2       Running   0          15h
node-exporter-kf6mq                    2/2       Running   0          15h
node-exporter-xtm4r                    2/2       Running   0          15h
prometheus-k8s-0                       2/2       Running   0          15h
prometheus-k8s-1                       2/2       Running   0          15h
prometheus-operator-7466f6887f-9nsk8   1/1       Running   0          15h
kubectl -n monitoring get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.244.69.39     <none>        9093:30903/TCP      15h
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   15h
grafana                 NodePort    10.244.86.54     <none>        3000:30902/TCP      15h
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   15h
node-exporter           ClusterIP   None             <none>        9100/TCP            15h
prometheus-k8s          NodePort    10.244.226.104   <none>        9090:30900/TCP      15h
prometheus-operated     ClusterIP   None             <none>        9090/TCP            15h
prometheus-operator     ClusterIP   10.244.9.203     <none>        8080/TCP            15h
kubectl -n monitoring get endpoints
NAME                    ENDPOINTS                                                         AGE
alertmanager-main       10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093               15h
alertmanager-operated   10.244.2.10:9093,10.244.35.4:9093,10.244.91.5:9093 + 3 more...   15h
grafana                 10.244.2.8:3000                                                   15h
kube-state-metrics      10.244.2.9:9443,10.244.2.9:8443                                   15h
node-exporter           192.168.100.102:9100,192.168.100.103:9100,192.168.100.105:9100   15h
prometheus-k8s          10.244.2.11:9090,10.244.35.5:9090                                 15h
prometheus-operated     10.244.2.11:9090,10.244.35.5:9090                                 15h
prometheus-operator     10.244.35.3:8080                                                  15h
kubectl -n monitoring get servicemonitors
NAME                      AGE
alertmanager              15h
kube-apiserver            15h
kube-controller-manager   15h
kube-scheduler            15h
kube-state-metrics        15h
kubelet                   15h
node-exporter             15h
prometheus                15h
prometheus-operator       15h
kubectl get customresourcedefinitions
NAME AGE
alertmanagers.monitoring.coreos.com 11d
prometheuses.monitoring.coreos.com 11d
servicemonitors.monitoring.coreos.com 11d
Note: during deployment I changed all image addresses to pull from my local registry, but some pods still pulled images from the upstream registry, as shown below.
Here the alertmanager image could not be pulled. The fix is to pull the image to a local machine first, then save it and distribute it to each node:
# docker save 23744b2d645c -o alertmanager-v0.14.0.tar.gz
# ansible node -m copy -a 'src=alertmanager-v0.14.0.tar.gz dest=/root'
# ansible node -a 'docker load -i /root/alertmanager-v0.14.0.tar.gz'
192.168.100.104 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
192.168.100.103 | SUCCESS | rc=0 >>
Loaded image ID: sha256:23744b2d645c0574015adfba4a90283b79251aee3169dbe67f335d8465a8a63f
# ansible node -a 'docker images quay.io/prometheus/alertmanager'
192.168.100.103 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB
192.168.100.104 | SUCCESS | rc=0 >>
REPOSITORY                        TAG       IMAGE ID       CREATED       SIZE
quay.io/prometheus/alertmanager   v0.14.0   23744b2d645c   7 weeks ago   31.9MB
Prometheus Operator ships an etcd dashboard, but extra configuration is needed before etcd is fully monitored. Official documentation: Monitoring external etcd.
1. On the master node, create the secret in the monitoring namespace:
# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/ssl/ca.pem --from-file=/etc/etcd/ssl/etcd.pem --from-file=/etc/etcd/ssl/etcd-key.pem
secret "etcd-certs" created
# kubectl -n monitoring get secrets etcd-certs
NAME         TYPE      DATA      AGE
etcd-certs   Opaque    3         16h
Note: these certificates were created when the etcd cluster was deployed; change the paths to wherever your own certificates are stored.
2. Make the secret available to Prometheus:
# vim manifests/prometheus/prometheus-k8s.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  labels:
    prometheus: k8s
spec:
  replicas: 2
  secrets:
  - etcd-certs
  version: v2.2.1
sed -i '/replicas:/a\ secrets:\n - etcd-certs' manifests/prometheus/prometheus-k8s.yaml
kubectl -n monitoring replace -f manifests/prometheus/prometheus-k8s.yaml
prometheus "k8s" replaced
Note: only the following lines need to be added:
secrets:
- etcd-certs
3. Create the Service, Endpoints, and ServiceMonitor resources:
# vim manifests/prometheus/prometheus-etcd.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 192.168.210.161
    nodeName: 192.168.210.161
  - ip: 192.168.210.162
    nodeName: 192.168.210.162
  - ip: 192.168.210.163
    nodeName: 192.168.210.163
  ports:
  - name: api
    port: 2379
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
      # use insecureSkipVerify only if you cannot use a Subject Alternative Name
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - monitoring
# kubectl create -f manifests/prometheus/prometheus-etcd.yaml
Note 1: change the etcd IP addresses and node names to your own IPs and node names.
Note 2: under tlsConfig, only the trailing file names ca.pem, etcd.pem, and etcd-key.pem need to be changed to your own certificate names. If unsure, exec into a prometheus-k8s pod and check:
# kubectl exec -ti -n monitoring prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem        etcd-key.pem  etcd.pem
After Prometheus Operator is deployed, three NodePorts are exposed: 30900 for Prometheus, 30902 for Grafana, and 30903 for Alertmanager.
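If the NodePorts are not reachable from your workstation, kubectl port-forward is an alternative way to reach the same UIs, for example:

# Forward the Prometheus, Grafana, and Alertmanager pods to local ports
# (pod names taken from the kubectl get pod output above; adjust to your own pod names).
kubectl -n monitoring port-forward prometheus-k8s-0 9090:9090 &
kubectl -n monitoring port-forward grafana-567fcdf7b7-44ldd 3000:3000 &
kubectl -n monitoring port-forward alertmanager-main-0 9093:9093 &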
The Prometheus UI is shown below; if everything is working, all targets should be UP.
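Target health can also be checked from the command line through the Prometheus HTTP API; the node IP below is an assumption, substitute any of your cluster nodes:

# Count the target health states reported by Prometheus (expect only "up").
curl -s http://192.168.210.161:30900/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c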
The Alertmanager UI is shown below:
kubectl get pod -n monitoring
kubectl get svc -n monitoring
kubectl -n monitoring get endpoints
kubectl -n monitoring get servicemonitors
kubectl get customresourcedefinitions
Grafana presents data through Dashboards, and a Dashboard must define:
1) Which multi-dimensional data from Prometheus to display, which requires concrete query-language expressions.
2) How to display it, for example as line charts or gauges, and what each axis means.
Clearly, building a Dashboard is not a trivial task. Fortunately, we can draw on the open-source community and use ready-made Dashboards directly.
Visit https://grafana.com/dashboards?dataSource=prometheus&search=docker and you will see many Dashboards for monitoring Docker.
Grafana's monitoring panels are shown below.
The etcd-related panels are shown below.
The Kubernetes cluster overview is shown below.
Node monitoring is shown below.
Weave Scope provides a complete view of the cluster and its applications. Its excellent interactivity lets users monitor containerized applications and diagnose problems in real time.
Heapster is Kubernetes' native cluster monitoring solution. Its predefined Dashboards monitor Kubernetes at both the Cluster and Pod level.
Prometheus Operator is arguably the most feature-complete open-source Kubernetes monitoring solution available today. Besides Nodes and Pods, it can also monitor the cluster's management components, such as the API Server, Scheduler, and Controller Manager.
Kubernetes monitoring is a fast-moving field, and as Kubernetes adoption grows, more excellent solutions are bound to emerge.