[TOC]
With Kubernetes, managing and scaling web applications, mobile-app backends, and API services has become fairly straightforward. That is because these applications are generally stateless, so basic Kubernetes API objects such as Deployment can scale them and recover from failures without any extra operational work.
Managing stateful applications such as databases, caches, or monitoring systems is a different challenge. These systems need domain-specific knowledge to scale and upgrade correctly, and to reconfigure themselves effectively when data is lost or unavailable. We would like that application-specific operational knowledge to be encoded in software, so that complex applications can be run and managed correctly with the help of Kubernetes.
An Operator is software that extends the Kubernetes API through the TPR mechanism (Third Party Resources, since superseded by CRDs), embedding application-specific knowledge so that users can create, configure, and manage the application. Like the Kubernetes built-in resources, an Operator manages not a single application instance but multiple instances across the cluster.
The Prometheus Operator provides easy monitoring definitions for Kubernetes services, and handles the deployment and management of Prometheus instances.
Once installed, the Prometheus Operator provides the following features:

- Create/Destroy: easily launch a Prometheus instance for a Kubernetes namespace, a specific application, or a team.
- Simple Configuration: configure the fundamentals of Prometheus, such as versions, persistence, retention policies, and replicas, through native Kubernetes resources.
- Target Services via Labels: automatically generate monitoring target configuration from familiar Kubernetes label queries, with no Prometheus-specific configuration language to learn.
The Prometheus Operator architecture is shown in the diagram below:
The components in this architecture run in the Kubernetes cluster as different kinds of resources, each with its own role:

- Operator: the Operator deploys and manages the Prometheus Server according to custom resources (Custom Resource Definitions / CRDs), and watches those custom resources for changes, reacting accordingly; it is the control center of the whole system.
- Prometheus: the Prometheus resource declaratively describes the desired state of a Prometheus deployment.
- Prometheus Server: the Prometheus Server cluster the Operator deploys according to the Prometheus custom resource; that custom resource can be viewed as the handle for managing the Prometheus Server cluster, which runs as a StatefulSet.
- ServiceMonitor: also a custom resource; it describes the list of targets Prometheus monitors. It selects the corresponding Service endpoints by labels, and the Prometheus Server scrapes metrics through the selected Services.
- Service: the Service resource fronts the metrics-exporting Pods in the cluster and is what a ServiceMonitor selects, so that the Prometheus Server can scrape them. Put simply, these are the objects Prometheus monitors, for example a Node Exporter Service or a MySQL Exporter Service.
- Alertmanager: also a custom resource type; the Operator deploys an Alertmanager cluster from the resource description. A minimal sketch of how these resources fit together follows this list.
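Here is a minimal sketch, assuming illustrative names and labels (none of these come from the deployment in this article), of how a Prometheus resource selects ServiceMonitors by label, and how a ServiceMonitor in turn selects Services:

```yaml
# Hypothetical example: names and labels are illustrative only.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2
  # Select every ServiceMonitor carrying this label.
  serviceMonitorSelector:
    matchLabels:
      team: frontend
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend            # matched by the Prometheus resource above
spec:
  # Select every Service carrying this label; Prometheus scrapes its Endpoints.
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web                 # a named port on the selected Service
    interval: 30s
```

The label chain (Prometheus → ServiceMonitor → Service → Pods) is the thread to pull on whenever a target fails to show up.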
Environment:

- Kubernetes 1.12, installed with kubeadm
- v2.11.0
We install with helm, adapting the chart to actual needs: prometheus-operator. It bundles grafana and the exporters for monitoring Kubernetes. Note that I configured grafana to store its data in MySQL; that setup is described in another article, "Using Helm to Deploy Prometheus and Grafana to Monitor Kubernetes" (《使用Helm部署Prometheus和Grafana監控Kubernetes》).
```bash
cd helm/prometheus-operator/
helm install --name prometheus-operator --namespace monitoring -f values.yaml ./
```
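Before adding custom monitors, it can be worth confirming that the release came up cleanly. A quick sanity check (resource names vary with the release name you chose):

```bash
# The operator, Prometheus, Alertmanager and exporter Pods should all be Running
kubectl get pods -n monitoring

# The operator should have registered its custom resource definitions
kubectl get crd | grep monitoring.coreos.com
```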
To get real flexibility out of the Prometheus Operator, adding custom monitoring targets is essential. Here we use ceph-exporter as the example.
The following section of `values.yaml` adds the monitor through a ServiceMonitor:
```yaml
serviceMonitor:
  enabled: true                      # enable monitoring
  # on what port are the metrics exposed by ceph exporter
  exporterPort: 9128
  # for apps that have deployed outside of the cluster, list their addresses here
  endpoints: []
  # Are we talking http or https?
  scheme: http
  # service selector label key to target ceph exporter pods
  serviceSelectorLabelKey: app
  # default rules are in templates/ceph-exporter.rules.yaml
  prometheusRules: {}
  # Custom Labels to be added to ServiceMonitor
  # Tested: adding the prometheus-operator release label to the ServiceMonitor
  # is enough for monitoring to work.
  additionalServiceMonitorLabels:
    release: prometheus-operator
  # Custom Labels to be added to Prometheus Rules CRD
  additionalRulesLabels: {}
```
The most important parameter is `additionalServiceMonitorLabels`: in my testing, the ServiceMonitor must carry a label that the prometheus-operator release already has before the monitor is picked up.
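This is because the Prometheus resource the chart deploys only selects ServiceMonitors whose labels match its `serviceMonitorSelector`. One way to inspect that selector is sketched below (exact output depends on your chart version):

```bash
# Show which ServiceMonitor labels the deployed Prometheus instance selects
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[*].spec.serviceMonitorSelector}'
```

Compare that selector with the labels on the ServiceMonitor and on the operator Pod below: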
```
[root@lab1 prometheus-operator]# kubectl get servicemonitor -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: ceph-exporter
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: prometheus-operator
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13937459"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/servicemonitors/ceph-exporter
  uid: 30569173-dc10-11e8-bcf3-000c293d66a5
spec:
  endpoints:
  - interval: 30s
    port: http
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: ceph-exporter
      release: ceph-exporter
```
```
[root@lab1 prometheus-operator]# kubectl get pod -n monitoring prometheus-operator-operator-7459848949-8dddt -o yaml | more
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2018-10-30T00:39:37Z
  generateName: prometheus-operator-operator-7459848949-
  labels:
    app: prometheus-operator-operator
    chart: prometheus-operator-0.1.6
    heritage: Tiller
    pod-template-hash: "7459848949"
    release: prometheus-operator
```
Key points:

- The ServiceMonitor's labels must include at least one label that matches the labels on the prometheus-operator Pod (here, `release: prometheus-operator`).
- The Service referenced by the ServiceMonitor's spec must be reachable from Prometheus, with all of its endpoints healthy.

After a successful installation, check the related resources:
```
[root@lab1 prometheus-operator]# kubectl get service,servicemonitor,ep -n monitoring
NAME                                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/alertmanager-operated                           ClusterIP   None             <none>        9093/TCP,6783/TCP   12d
service/ceph-exporter                                   ClusterIP   10.100.57.62     <none>        9128/TCP            46h
service/monitoring-mysql-mysql                          ClusterIP   10.108.93.155    <none>        3306/TCP            42d
service/prometheus-operated                             ClusterIP   None             <none>        9090/TCP            12d
service/prometheus-operator-alertmanager                ClusterIP   10.98.42.209     <none>        9093/TCP            6d19h
service/prometheus-operator-grafana                     ClusterIP   10.103.100.150   <none>        80/TCP              6d19h
service/prometheus-operator-kube-state-metrics          ClusterIP   10.110.76.250    <none>        8080/TCP            6d19h
service/prometheus-operator-operator                    ClusterIP   None             <none>        8080/TCP            6d19h
service/prometheus-operator-prometheus                  ClusterIP   10.111.24.83     <none>        9090/TCP            6d19h
service/prometheus-operator-prometheus-node-exporter    ClusterIP   10.97.126.74     <none>        9100/TCP            6d19h

NAME                                                                                AGE
servicemonitor.monitoring.coreos.com/ceph-exporter                                  1d
servicemonitor.monitoring.coreos.com/prometheus-operator                            8d
servicemonitor.monitoring.coreos.com/prometheus-operator-alertmanager               6d
servicemonitor.monitoring.coreos.com/prometheus-operator-apiserver                  6d
servicemonitor.monitoring.coreos.com/prometheus-operator-coredns                    6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-controller-manager    6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-etcd                  6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-scheduler             6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kube-state-metrics         6d
servicemonitor.monitoring.coreos.com/prometheus-operator-kubelet                    6d
servicemonitor.monitoring.coreos.com/prometheus-operator-node-exporter              6d
servicemonitor.monitoring.coreos.com/prometheus-operator-operator                   6d
servicemonitor.monitoring.coreos.com/prometheus-operator-prometheus                 6d

NAME                                                     ENDPOINTS                                                                  AGE
endpoints/alertmanager-operated                          10.244.6.174:9093,10.244.6.174:6783                                        12d
endpoints/ceph-exporter                                  10.244.2.59:9128                                                           46h
endpoints/monitoring-mysql-mysql                         10.244.6.171:3306                                                          42d
endpoints/prometheus-operated                            10.244.2.60:9090,10.244.6.175:9090                                         12d
endpoints/prometheus-operator-alertmanager               10.244.6.174:9093                                                          6d19h
endpoints/prometheus-operator-grafana                    10.244.6.106:3000                                                          6d19h
endpoints/prometheus-operator-kube-state-metrics         10.244.2.163:8080                                                          6d19h
endpoints/prometheus-operator-operator                   10.244.6.113:8080                                                          6d19h
endpoints/prometheus-operator-prometheus                 10.244.2.60:9090,10.244.6.175:9090                                         6d19h
endpoints/prometheus-operator-prometheus-node-exporter   192.168.105.92:9100,192.168.105.93:9100,192.168.105.94:9100 + 4 more...    6d19h
```
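With the Service, ServiceMonitor, and Endpoints all present, one way to confirm that Prometheus is actually scraping ceph-exporter is to port-forward the Prometheus Service and query its targets API. A sketch, assuming the Service name shown above:

```bash
# Forward the Prometheus web port to localhost
kubectl port-forward -n monitoring svc/prometheus-operator-prometheus 9090:9090 &

# A ceph-exporter target should appear in the active targets list
curl -s http://localhost:9090/api/v1/targets | grep -o '"job":"ceph-exporter"'
```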
The `_dashboards` directory in the prometheus-operator chart above contains dashboards I have modified; they are fairly comprehensive. Import them manually through the grafana UI and you can keep adjusting them afterwards, which is very convenient in practice. If you instead drop the dashboard JSON files into the `dashboards` directory and install them with helm, the resulting dashboards cannot be edited directly in grafana, which is awkward to work with. (If you nevertheless want dashboards managed declaratively, see the sketch below.)
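One declarative alternative, sketched under the assumption that your grafana deployment runs the dashboard-watching sidecar (a convention of the grafana chart, not something this article's chart necessarily enables), is to ship each dashboard JSON in a labelled ConfigMap:

```yaml
# Hypothetical example: requires grafana's dashboard sidecar to be enabled.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"    # the label the sidecar watches for
data:
  ceph-dashboard.json: |-
    { "title": "Ceph Overview", "panels": [] }
```

Dashboards loaded this way are also read-only in the UI, so manual import remains the friendlier option if you expect to edit them.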
Next, add a PrometheusRule; here is an example:
```
[root@lab1 ceph-exporter]# kubectl get prometheusrule -n monitoring ceph-exporter -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: 2018-10-30T06:51:12Z
  generation: 1
  labels:
    app: prometheus
    chart: ceph-exporter-0.1.0
    heritage: Tiller
    prometheus: ceph-exporter
    release: ceph-exporter
  name: ceph-exporter
  namespace: monitoring
  resourceVersion: "13965150"
  selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheusrules/ceph-exporter
  uid: 30543ec9-dc10-11e8-bcf3-000c293d66a5
spec:
  groups:
  - name: ceph-exporter.rules
    rules:
    - alert: Ceph
      annotations:
        description: There is no running ceph exporter.
        summary: Ceph exporter is down
      expr: absent(up{job="ceph-exporter"} == 1)
      for: 5m
      labels:
        severity: critical
```
The default rules for monitoring Kubernetes are already numerous and quite comprehensive; adjust them as needed in `prometheus-operator/templates/all-prometheus-rules.yaml`.
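When editing rule expressions, it can save a round-trip to lint them locally with `promtool` before rendering the chart. A sketch, assuming you copy the `groups:` section of a rule template into a plain file such as `ceph-rules.yaml` (a hypothetical local path):

```bash
# promtool ships with Prometheus; "check rules" validates rule syntax and expressions
promtool check rules ceph-rules.yaml
```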
Alert notification rules can be modified in the section of `values.yaml` under `alertmanager:`, shown below:
```yaml
config:
  global:
    resolve_timeout: 5m
    # The smarthost and SMTP sender used for mail notifications.
    smtp_smarthost: 'smtp.163.com:25'
    smtp_from: 'xxxxxx@163.com'
    smtp_auth_username: 'xxxxxx@163.com'
    smtp_auth_password: 'xxxxxx'
    # The API URL to use for Slack notifications.
    slack_api_url: 'https://hooks.slack.com/services/some/api/token'
  route:
    group_by: ["job", "alertname"]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'noemail'
    routes:
    - match:
        severity: critical
      receiver: critical_email_alert
    - match_re:
        alertname: "^KubeJob*"
      receiver: default_email
  receivers:
  - name: 'default_email'
    email_configs:
    - to: 'xxxxxx@163.com'
      send_resolved: true
  - name: 'critical_email_alert'
    email_configs:
    - to: 'xxxxxx@163.com'
      send_resolved: true
  - name: 'noemail'
    email_configs:
    - to: 'null@null.cn'
      send_resolved: false

## Alertmanager template files to format alerts
## ref: https://prometheus.io/docs/alerting/notifications/
##      https://prometheus.io/docs/alerting/notification_examples/
##
templateFiles:
  template_1.tmpl: |-
    {{ define "cluster" }}{{ .ExternalURL | reReplaceAll ".*alertmanager\\.(.*)" "$1" }}{{ end }}
    {{ define "slack.k8s.text" }}
    {{- $root := . -}}
    {{ range .Alerts }}
      *Alert:* {{ .Annotations.summary }} - `{{ .Labels.severity }}`
      *Cluster:* {{ template "cluster" $root }}
      *Description:* {{ .Annotations.description }}
      *Graph:* <{{ .GeneratorURL }}|:chart_with_upwards_trend:>
      *Runbook:* <{{ .Annotations.runbook }}|:spiral_note_pad:>
      *Details:*
      {{ range .Labels.SortedPairs }} • *{{ .Name }}:* `{{ .Value }}`
      {{ end }}
    {{ end }}
    {{ end }}
```
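Since the `config:` block above is ordinary Alertmanager configuration, it can be validated with `amtool` before running a helm upgrade. A sketch, assuming the block has been saved to a local `alertmanager.yaml`:

```bash
# amtool ships with Alertmanager; check-config validates receivers and the routing tree
amtool check-config alertmanager.yaml

# Recent amtool versions can also show which receiver a given alert would hit
amtool config routes test --config.file=alertmanager.yaml severity=critical
```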
By defining ServiceMonitor and PrometheusRule resources, the Prometheus Operator adjusts the prometheus and alertmanager configuration dynamically. This matches the Kubernetes way of operating far better and makes monitoring Kubernetes more elegant.