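Prometheus is an open-source, complete monitoring solution that covers the whole monitoring workflow: data collection, querying, alerting, and visualization. The figure below shows the Prometheus architecture:
Official documentation: https://prometheus.io/docs/introduction/overview/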
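The Prometheus ecosystem consists of multiple components, many of which are optional:
Prometheus Server: required. At its core it is a time-series database, mainly responsible for pulling, storing, and analyzing data, and it provides the PromQL query language.
Pushgateway: optional. An intermediate gateway to which short-lived jobs actively push their metrics.
Exporters: agents deployed on the monitored side, such as node_exporter and mysql_exporter. A component that exposes a monitored system's information over an HTTP endpoint is called an exporter. Ready-made exporters exist for most components commonly used at internet companies, e.g. Varnish, HAProxy, Nginx, MySQL, and Linux system information (disk, memory, CPU, network, and so on); see https://prometheus.io/docs/instrumenting/exporters/
Alertmanager: handles alerting. The Prometheus server evaluates alerting rules and sends fired alerts to Alertmanager, which applies its own rules to send notifications (e.g. email or webhook).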
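To make the division of labor concrete, here is a minimal sketch of a plain prometheus.yml (without the Operator) that wires these components together. The hostnames node1, pushgateway and alertmanager and the rule-file path are hypothetical; the ports are the usual defaults of node_exporter (9100), Pushgateway (9091) and Alertmanager (9093).

```yaml
# prometheus.yml: minimal sketch; hostnames and paths are assumptions
global:
  scrape_interval: 15s        # how often targets are pulled
  evaluation_interval: 15s    # how often alerting rules are evaluated

rule_files:
- /etc/prometheus/rules/*.yml # alerting rules evaluated by the server

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager:9093']   # fired alerts are sent here

scrape_configs:
- job_name: node                       # an exporter (node_exporter)
  static_configs:
  - targets: ['node1:9100']
- job_name: pushgateway                # short-lived jobs push here; Prometheus pulls from it
  honor_labels: true                   # keep the job/instance labels pushed by the jobs
  static_configs:
  - targets: ['pushgateway:9091']
```

`honor_labels: true` is what the Pushgateway documentation recommends, so that the labels pushed by batch jobs are not overwritten by the Pushgateway's own target labels.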
Prometheus-Operator architecture:
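The figure above is the architecture diagram provided by the Prometheus-Operator project. The Operator is the core part: acting as a controller, it manages four CRD resource types, Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule, and continuously watches and maintains the state of these resource objects.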
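A Prometheus resource object represents a Prometheus Server. A ServiceMonitor is an abstraction over exporters: as covered earlier, an exporter is a tool that exposes a dedicated metrics endpoint, and Prometheus pulls data from the metrics endpoints described by ServiceMonitors. Likewise, an Alertmanager resource object is the abstraction of Alertmanager, and a PrometheusRule holds the alerting-rule files used by Prometheus instances.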
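As an illustration of the PrometheusRule resource, the following sketch defines one alerting rule. The object, group and alert names are hypothetical, and the app/release labels are assumed to match the ruleSelector that the prometheus-operator chart generates for a release named prometheus-operator; check the generated Prometheus object (`kubectl -n monitoring get prometheus -o yaml`) to confirm which labels it actually selects.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules              # hypothetical name
  namespace: monitoring
  labels:
    app: prometheus-operator       # assumed to match the chart's default ruleSelector
    release: prometheus-operator
spec:
  groups:
  - name: example.rules
    rules:
    - alert: ExampleInstanceDown   # hypothetical alert name
      expr: up == 0                # the target failed its last scrape
      for: 5m                      # condition must hold for 5 minutes before firing
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.job }}/{{ $labels.instance }} has been down for 5 minutes"
```

Applied with `kubectl apply -f`, the Operator turns the object into a rule file for every Prometheus instance whose ruleSelector matches it.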
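As a result, deciding what to monitor in the cluster becomes a matter of directly manipulating Kubernetes resource objects, which is much more convenient. In the figure, Service and ServiceMonitor are both Kubernetes resources: a ServiceMonitor uses a labelSelector to match a class of Services, and Prometheus in turn can use a labelSelector to match multiple ServiceMonitors.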
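The two levels of label matching can be sketched with three hand-written objects, independent of the Helm chart. All names and labels here (example-app, team: frontend, port web) are hypothetical, and the Prometheus object assumes a ServiceAccount named prometheus with RBAC permission to list Services, Endpoints and Pods.

```yaml
# Prometheus selects ServiceMonitors labeled team=frontend in any namespace.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  serviceAccountName: prometheus        # assumed to exist with suitable RBAC
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  serviceMonitorNamespaceSelector: {}   # empty selector = all namespaces
---
# The ServiceMonitor selects Services labeled app=example-app in "default".
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend                      # matched by serviceMonitorSelector above
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web                           # the Service port *name*, not its number
    interval: 30s
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app                    # matched by the ServiceMonitor selector
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
```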
Official chart: https://github.com/helm/charts/tree/master/stable/prometheus-operator
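Search for the latest chart and download it locally (Helm 2 syntax):

```shell
# Search
helm search prometheus-operator
# NAME                        CHART VERSION  APP VERSION  DESCRIPTION
# stable/prometheus-operator  6.4.0          0.31.0       Provides easy monitoring definitions for Kubernetes servi...

# Fetch the chart and unpack it into ./prometheus-operator
helm fetch stable/prometheus-operator --untar
```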
Install:

```shell
# Create a namespace named monitoring
kubectl create ns monitoring
# Install
helm install -f ./prometheus-operator/values.yaml --name prometheus-operator --namespace=monitoring ./prometheus-operator
# Upgrade
helm upgrade -f ./prometheus-operator/values.yaml prometheus-operator ./prometheus-operator
```
Uninstall prometheus-operator:

```shell
helm delete prometheus-operator --purge
# Delete the CRDs (helm delete leaves them in place)
kubectl delete customresourcedefinitions prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com
kubectl delete customresourcedefinitions alertmanagers.monitoring.coreos.com
kubectl delete customresourcedefinitions podmonitors.monitoring.coreos.com
```
Edit the configuration file values.yaml. Alertmanager configuration (under alertmanager.config):
```yaml
config:
  global:
    resolve_timeout: 5m
    smtp_smarthost: 'smtp.qq.com:465'
    smtp_from: '1xxx@qq.com'
    smtp_auth_username: '1xxx@qq.com'
    smtp_auth_password: 'xxxxxxxxxxxxxxxx'   # QQ Mail SMTP authorization code
    smtp_hello: '163.com'
    smtp_require_tls: false
  route:
    group_by: ['job', 'severity']
    group_wait: 30s
    group_interval: 1m
    repeat_interval: 12h
    receiver: default
    routes:
    - receiver: webhook
      match:
        alertname: TargetDown
  receivers:
  - name: default
    email_configs:
    - to: 'hejianlai@pcidata.cn'
      send_resolved: true
  - name: webhook
    email_configs:            # despite its name, this receiver also sends email
    - to: 'xxx@xxx.cn'
      send_resolved: true
```
There is a pitfall here; see: http://www.javashuo.com/article/p-cqyzpsry-kp.html
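In the configuration above, TargetDown alerts are routed to a receiver named webhook, but that receiver still uses email_configs. If those alerts should actually go to an HTTP endpoint, a sketch of the receiver would use webhook_configs instead; the URL below is a hypothetical in-cluster service. Alertmanager delivers each notification as an HTTP POST with a JSON body.

```yaml
receivers:
- name: webhook
  webhook_configs:
  - url: 'http://alert-webhook.monitoring.svc:8080/alerts'  # hypothetical receiver endpoint
    send_resolved: true                                       # also notify when the alert resolves
```

This entry would replace the existing webhook receiver under alertmanager.config.receivers, followed by a helm upgrade.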
Persistent storage, using a volumeClaimTemplate with the nfs-client StorageClass:

```yaml
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: nfs-client
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```
Grafana persistent storage, in prometheus-operator/charts/grafana/values.yaml:
```yaml
persistence:
  enabled: true
  storageClassName: "nfs-client"
  accessModes:
  - ReadWriteOnce
  size: 10Gi
```
Additional scrape jobs (annotation-based discovery of Kubernetes Services and Pods, plus the Istio components). In the chart these are typically placed under prometheus.prometheusSpec.additionalScrapeConfigs:

```yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-pod'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

- job_name: istio-mesh
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-telemetry;prometheus
    replacement: $1
    action: keep

- job_name: envoy-stats
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /stats/prometheus
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names: []
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_container_port_name]
    separator: ;
    regex: .*-envoy-prom
    replacement: $1
    action: keep
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:15090
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod_name
    replacement: $1
    action: replace
  metric_relabel_configs:
  - source_labels: [cluster_name]
    separator: ;
    regex: (outbound|inbound|prometheus_stats).*
    replacement: $1
    action: drop
  - source_labels: [tcp_prefix]
    separator: ;
    regex: (outbound|inbound|prometheus_stats).*
    replacement: $1
    action: drop
  - source_labels: [listener_address]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [http_conn_manager_listener_prefix]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [http_conn_manager_prefix]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_tls.*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_tcp_downstream.*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_http_(stats|admin).*
    replacement: $1
    action: drop
  - source_labels: [__name__]
    separator: ;
    regex: envoy_cluster_(lb|retry|bind|internal|max|original).*
    replacement: $1
    action: drop

- job_name: istio-policy
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-policy;http-monitoring
    replacement: $1
    action: keep

- job_name: istio-telemetry
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-telemetry;http-monitoring
    replacement: $1
    action: keep

- job_name: pilot
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-pilot;http-monitoring
    replacement: $1
    action: keep

- job_name: galley
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-galley;http-monitoring
    replacement: $1
    action: keep

- job_name: citadel
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - istio-system
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: istio-citadel;http-monitoring
    replacement: $1
    action: keep

- job_name: kubernetes-pods-istio-secure
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: pod
    namespaces:
      names: []
  tls_config:
    ca_file: /etc/istio-certs/root-cert.pem
    cert_file: /etc/istio-certs/cert-chain.pem
    key_file: /etc/istio-certs/key.pem
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status, __meta_kubernetes_pod_annotation_istio_mtls]
    separator: ;
    regex: (([^;]+);([^;]*))|(([^;]*);(true))
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    separator: ;
    regex: (http)
    replacement: $1
    action: drop
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: ([^:]+):(\d+)
    replacement: $1
    action: keep
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod_name
    replacement: $1
    action: replace
```
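The kubernetes-service-endpoints job above only keeps Services that opt in through annotations. Kubernetes service discovery turns an annotation such as prometheus.io/scrape into the meta label `__meta_kubernetes_service_annotation_prometheus_io_scrape`, so a Service could opt in roughly as follows; the Service name, labels and port 8080 are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app                # hypothetical
  namespace: default
  annotations:
    prometheus.io/scrape: "true"   # passes the action: keep rule
    prometheus.io/path: /metrics   # copied into __metrics_path__
    prometheus.io/port: "8080"     # rewrites __address__ to <endpoint IP>:8080
spec:
  selector:
    app: example-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080
```

Annotation values must be strings, hence the quotes. The kubernetes-pod job works the same way, with the same annotations placed on the Pod template instead of the Service.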
An etcd cluster usually has HTTPS certificate authentication enabled for security, so for Prometheus to reach etcd's monitoring data, the corresponding certificates must be provided.
Since the demo environment here is a cluster built with kubeadm, we can use kubectl to find the certificate paths etcd was started with:
```shell
kubectl get pod etcd-cn-hongkong.i-j6caps6av1mtyxyofmrw -n kube-system -o yaml
```
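The output shows that all certificates used by etcd live under /etc/kubernetes/pki/etcd on the node, so first we store the certificates we need in the cluster as a Secret object (run these commands on the node where etcd runs):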
1) Fetch etcd metrics manually:

```shell
curl --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
     https://172.31.182.152:2379/metrics
```
2) Scrape with Prometheus. Store the certificates in a Secret named etcd-certs:

```shell
kubectl -n monitoring create secret generic etcd-certs \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  --from-file=/etc/kubernetes/pki/etcd/ca.crt
```
3) Add the kubeEtcd configuration to values.yaml:

```yaml
## Component scraping etcd
##
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 2379
    targetPort: 2379
    selector:
      component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    serverName: localhost
    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
```
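The chart comment above notes that etcd might not run as a Pod (for example, a binary installation managed by systemd instead of kubeadm). In that case the endpoints list can be filled with the etcd member IPs instead of relying on the component: etcd selector. A sketch, reusing the address from the curl example above as an assumed member IP; the serviceMonitor TLS settings from the previous snippet would stay unchanged.

```yaml
kubeEtcd:
  enabled: true
  ## etcd runs outside Kubernetes: list the member IPs explicitly
  endpoints:
  - 172.31.182.152   # assumed etcd member IP; add one entry per member
  service:
    port: 2379       # only port and targetPort are used when endpoints is set
    targetPort: 2379
```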
4) Mount the etcd-certs Secret created above into Prometheus (this step is essential). In values.yaml this setting sits under prometheus.prometheusSpec:

```yaml
## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.
## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not
## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated
## with the new list of secrets.
##
secrets:
- etcd-certs
```
After installation, the certificates appear inside the Prometheus Pod under /etc/prometheus/secrets/etcd-certs/.
Next we create a ServiceMonitor. `any: true` under namespaceSelector means it matches Services carrying the label app=sscp-transaction in all namespaces.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: sscp-transaction
    release: prometheus-operator   # lets the chart's Prometheus (serviceMonitorSelector) pick this up
  name: springboot
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    path: /actuator/prometheus
    port: health                   # name of the port on the target Service
    scheme: http
  namespaceSelector:
    any: true                      # search Services in all namespaces
    # matchNames:
    # - sscp-dev
  selector:
    matchLabels:
      app: sscp-transaction
      # release: sscp
```
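For the ServiceMonitor above to find anything, some Service must carry the label app: sscp-transaction and expose a port named health, because endpoints[].port refers to the Service port name. A sketch of such a Service follows; the namespace is taken from the commented-out matchNames, while the Service name, Pod selector and port 8080 are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: sscp-transaction      # assumed Service name
  namespace: sscp-dev         # from the commented-out matchNames above
  labels:
    app: sscp-transaction     # matched by spec.selector.matchLabels of the ServiceMonitor
spec:
  selector:
    app: sscp-transaction     # assumed label on the application Pods
  ports:
  - name: health              # must equal endpoints[].port in the ServiceMonitor
    port: 8080                # assumed Spring Boot HTTP port
    targetPort: 8080
```

The /actuator/prometheus path also assumes the Spring Boot application uses Spring Boot Actuator with the micrometer-registry-prometheus dependency, and that prometheus is listed in management.endpoints.web.exposure.include.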
Result: