Prometheus Operator 監控Kubernetes

Prometheus Operator 監控Kubernetes

1. Prometheus的基本架構

​ Prometheus是一個開源的完整監控解決方案,涵蓋數據採集、查詢、告警、展現整個監控流程,下圖是Prometheus的架構圖html

 

官方文檔:https://prometheus.io/docs/introduction/overview/node

 

2. 組件說明

Prometheus生態系統由多個組件組成。其中許多組件都是可選的mysql

  • Promethus  server

必須安裝,本質是一個時序數據庫,主要負責數據pull、存儲、分析,提供 PromQL 查詢語言的支持;git

  • Push Gateway

非必選項,支持臨時性Job主動推送指標的中間網關github

  • exporters

部署在客戶端的agent,node_exporte, mysql_exporterweb

提供被監控組件信息的 HTTP 接口被叫作 exporter ,目前互聯網公司經常使用的組件大部分都有 exporter 能夠直接使用,好比 VarnishHaproxyNginxMySQLLinux 系統信息 (包括磁盤、內存、CPU、網絡等等);如:https://prometheus.io/docs/instrumenting/exporters/spring

  • alertmanager

用來進行報警,Promethus server 通過分析, 把出發的警報發送給 alertmanager 組件,alertmanager 組件經過自身的規則,來發送通知,(郵件,或者webhook)sql

3. Prometheus-Operator

Prometheus-Operator的架構圖:數據庫

 

上圖是Prometheus-Operator官方提供的架構圖,其中Operator是最核心的部分,做爲一個控制器,他會去建立PrometheusServiceMonitorAlertManager以及PrometheusRule4CRD資源對象,而後會一直監控並維持這4個資源對象的狀態。api

其中建立的prometheus這種資源對象就是做爲Prometheus Server存在,而ServiceMonitor就是exporter的各類抽象,exporter前面咱們已經學習了,是用來提供專門提供metrics數據接口的工具,Prometheus就是經過ServiceMonitor提供的metrics數據接口去 pull 數據的,固然alertmanager這種資源對象就是對應的AlertManager的抽象,而PrometheusRule是用來被Prometheus實例使用的報警規則文件。

這樣咱們要在集羣中監控什麼數據,就變成了直接去操做 Kubernetes 集羣的資源對象了,是否是方便不少了。上圖中的 Service ServiceMonitor 都是 Kubernetes 的資源,一個 ServiceMonitor 能夠經過 labelSelector 的方式去匹配一類 ServicePrometheus 也能夠經過 labelSelector 去匹配多個ServiceMonitor

4. Prometheus-Operator部署

官方chart地址:https://github.com/helm/charts/tree/master/stable/prometheus-operator

搜索最新包下載到本地

# 搜索

helm search prometheus-operator

NAME                            CHART VERSION   APP VERSION     DESCRIPTION                                
stable/prometheus-operator      6.4.0           0.31.0          Provides easy monitoring definitions for Kubernetes servi...

# 拉取到本地

helm fetch prometheus-operator

安裝

# 新建一個monitoringnamespaces

Kubectl create ns monitoring

# 安裝

helm install -f ./prometheus-operator/values.yaml --name prometheus-operator --namespace=monitoring ./prometheus-operator

# 更新

helm upgrade -f prometheus-operator/values.yaml prometheus-operator ./prometheus-operator

卸載prometheus-operator

helm delete prometheus-operator --purge

# 刪除crd

kubectl delete customresourcedefinitions prometheuses.monitoring.coreos.com prometheusrules.monitoring.coreos.com servicemonitors.monitoring.coreos.com
kubectl delete customresourcedefinitions alertmanagers.monitoring.coreos.com
kubectl delete customresourcedefinitions podmonitors.monitoring.coreos.com

修改配置文檔values.yaml

4.1. 郵件告警

  config:

    global:

      resolve_timeout: 5m

      smtp_smarthost: 'smtp.qq.com:465'

      smtp_from: '1xxx@qq.com'

      smtp_auth_username: '1xxx@qq.com'

      smtp_auth_password: 'xreqcqffrxtnieff'

      smtp_hello: '163.com'

      smtp_require_tls: false

    route:

      group_by: ['job','severity']

      group_wait: 30s

      group_interval: 1m

      repeat_interval: 12h

      receiver: default

      routes:

      - receiver: webhook

        match:

          alertname: TargetDown

    receivers:

    - name: default

      email_configs:

      - to: 'hejianlai@pcidata.cn'

        send_resolved: true

    - name: webhook

      email_configs:

      - to: 'xxx@xxx.cn'

        send_resolved: true

這裏有個坑請參考:http://www.javashuo.com/article/p-cqyzpsry-kp.html

4.2. prometheus持久化存儲

  storage:

      volumeClaimTemplate:

        spec:

          storageClassName: nfs-client

          accessModes: ["ReadWriteOnce"]

          resources:

            requests:

              storage: 50Gi

4.3. Grafana持久化

路徑:prometheus-operator/charts/grafana/values.yaml

persistence:

  enabled: true

  storageClassName: "nfs-client"

  accessModes:

    - ReadWriteOnce

  size: 10Gi

4.4. 自動發現Service

     - job_name: 'kubernetes-service-endpoints'

       kubernetes_sd_configs:

         - role: endpoints

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

         action: keep

         regex: true

       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]

         action: replace

         target_label: __scheme__

         regex: (https?)

       - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]

         action: replace

         target_label: __metrics_path__

         regex: (.+)

       - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]

         action: replace

         target_label: __address__

         regex: ([^:]+)(?::\d+)?;(\d+)

         replacement: $1:$2

       - action: labelmap

         regex: __meta_kubernetes_service_label_(.+)

       - source_labels: [__meta_kubernetes_namespace]

         action: replace

         target_label: kubernetes_namespace

       - source_labels: [__meta_kubernetes_service_name]

         action: replace

         target_label: kubernetes_name

     - job_name: 'kubernetes-pod'

       kubernetes_sd_configs:

         - role: pod

       relabel_configs:

       - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

         action: keep

         regex: true

       - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]

         action: replace

         target_label: __metrics_path__

         regex: (.+)

       - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

         action: replace

         regex: ([^:]+)(?::\d+)?;(\d+)

         replacement: $1:$2

         target_label: __address__

       - action: labelmap

         regex: __meta_kubernetes_pod_label_(.+)

       - source_labels: [__meta_kubernetes_namespace]

         action: replace

         target_label: kubernetes_namespace

       - source_labels: [__meta_kubernetes_pod_name]

         action: replace

         target_label: kubernetes_pod_name

     - job_name: istio-mesh

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-telemetry;prometheus

         replacement: $1

         action: keep

     - job_name: envoy-stats

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /stats/prometheus

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: pod

         namespaces:

           names: []

       relabel_configs:

       - source_labels: [__meta_kubernetes_pod_container_port_name]

         separator: ;

         regex: .*-envoy-prom

         replacement: $1

         action: keep

       - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

         separator: ;

         regex: ([^:]+)(?::\d+)?;(\d+)

         target_label: __address__

         replacement: $1:15090

         action: replace

       - separator: ;

         regex: __meta_kubernetes_pod_label_(.+)

         replacement: $1

         action: labelmap

       - source_labels: [__meta_kubernetes_namespace]

         separator: ;

         regex: (.*)

         target_label: namespace

         replacement: $1

         action: replace

       - source_labels: [__meta_kubernetes_pod_name]

         separator: ;

         regex: (.*)

         target_label: pod_name

         replacement: $1

         action: replace

       metric_relabel_configs:

       - source_labels: [cluster_name]

         separator: ;

         regex: (outbound|inbound|prometheus_stats).*

         replacement: $1

         action: drop

       - source_labels: [tcp_prefix]

         separator: ;

         regex: (outbound|inbound|prometheus_stats).*

         replacement: $1

         action: drop

       - source_labels: [listener_address]

         separator: ;

         regex: (.+)

         replacement: $1

         action: drop

       - source_labels: [http_conn_manager_listener_prefix]

         separator: ;

         regex: (.+)

         replacement: $1

         action: drop

       - source_labels: [http_conn_manager_prefix]

         separator: ;

         regex: (.+)

         replacement: $1

         action: drop

       - source_labels: [__name__]

         separator: ;

         regex: envoy_tls.*

         replacement: $1

         action: drop

       - source_labels: [__name__]

         separator: ;

         regex: envoy_tcp_downstream.*

         replacement: $1

         action: drop

       - source_labels: [__name__]

         separator: ;

         regex: envoy_http_(stats|admin).*

         replacement: $1

         action: drop

       - source_labels: [__name__]

         separator: ;

         regex: envoy_cluster_(lb|retry|bind|internal|max|original).*

         replacement: $1

         action: drop

     - job_name: istio-policy

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-policy;http-monitoring

         replacement: $1

         action: keep

     - job_name: istio-telemetry

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-telemetry;http-monitoring

         replacement: $1

         action: keep

     - job_name: pilot

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-pilot;http-monitoring

         replacement: $1

         action: keep

     - job_name: galley

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-galley;http-monitoring

         replacement: $1

         action: keep

     - job_name: citadel

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: http

       kubernetes_sd_configs:

       - api_server: null

         role: endpoints

         namespaces:

           names:

           - istio-system

       relabel_configs:

       - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]

         separator: ;

         regex: istio-citadel;http-monitoring

         replacement: $1

         action: keep

     - job_name: kubernetes-pods-istio-secure

       scrape_interval: 15s

       scrape_timeout: 10s

       metrics_path: /metrics

       scheme: https

       kubernetes_sd_configs:

       - api_server: null

         role: pod

         namespaces:

           names: []

       tls_config:

         ca_file: /etc/istio-certs/root-cert.pem

         cert_file: /etc/istio-certs/cert-chain.pem

         key_file: /etc/istio-certs/key.pem

         insecure_skip_verify: true

       relabel_configs:

       - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]

         separator: ;

         regex: "true"

         replacement: $1

         action: keep

       - source_labels: [__meta_kubernetes_pod_annotation_sidecar_istio_io_status, __meta_kubernetes_pod_annotation_istio_mtls]

         separator: ;

         regex: (([^;]+);([^;]*))|(([^;]*);(true))

         replacement: $1

         action: keep

       - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]

         separator: ;

         regex: (http)

         replacement: $1

         action: drop

       - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]

         separator: ;

         regex: (.+)

         target_label: __metrics_path__

         replacement: $1

         action: replace

       - source_labels: [__address__]

         separator: ;

         regex: ([^:]+):(\d+)

         replacement: $1

         action: keep

       - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]

         separator: ;

         regex: ([^:]+)(?::\d+)?;(\d+)

         target_label: __address__

         replacement: $1:$2

         action: replace

       - separator: ;

         regex: __meta_kubernetes_pod_label_(.+)

         replacement: $1

         action: labelmap

       - source_labels: [__meta_kubernetes_namespace]

         separator: ;

         regex: (.*)

         target_label: namespace

         replacement: $1

         action: replace

       - source_labels: [__meta_kubernetes_pod_name]

         separator: ;

         regex: (.*)

         target_label: pod_name

         replacement: $1

         action: replace

4.5. etcd

對於 etcd 集羣通常狀況下,爲了安全都會開啓 https 證書認證的方式,因此要想讓 Prometheus 訪問到 etcd 集羣的監控數據,就須要提供相應的證書校驗。

因爲咱們這裏演示環境使用的是 Kubeadm 搭建的集羣,咱們可使用 kubectl 工具去獲取 etcd 啓動的時候使用的證書路徑:

[root@cn-hongkong ~]# kubectl get pod etcd-cn-hongkong.i-j6caps6av1mtyxyofmrw -n kube-system -o yaml

 

咱們能夠看到 etcd 使用的證書都對應在節點的 /etc/kubernetes/pki/etcd 這個路徑下面,因此首先咱們將須要使用到的證書經過 secret 對象保存到集羣中去:(etcd 運行的節點)

1) 手動獲取etcd信息

curl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key https://172.31.182.152:2379/metrics

2) 使用prometheus抓取

kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt

3) 添加values.yaml文件中kubeEtcd配置

## Component scraping etcd

##

kubeEtcd:

  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on

  ##

  endpoints: []

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used

  ##

  service:

    port: 2379

    targetPort: 2379

    selector:

      component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and

  ## specifying security configuration below. For example, with a secret named etcd-client-cert

  ##

  serviceMonitor:

    scheme: https

    insecureSkipVerify: true

    serverName: localhost

    caFile: /etc/prometheus/secrets/etcd-certs/ca.crt

    certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt

    keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key

4) 將上面建立的etcd-certs對象配置到prometheus中(特別重要)

    ## Secrets is a list of Secrets in the same namespace as the Prometheus object, which shall be mounted into the Prometheus Pods.

    ## The Secrets are mounted into /etc/prometheus/secrets/. Secrets changes after initial creation of a Prometheus object are not

    ## reflected in the running Pods. To change the secrets mounted into the Prometheus Pods, the object must be deleted and recreated

    ## with the new list of secrets.

    ##

    secrets:

    - etcd-certs

安裝後證書就會出如今prometheus目錄下

 

 4.6抓取自定義Server

咱們須要建一個ServiceMonitor,namespaceSelector:的any:true表示匹配 全部命名空間下面的具備 app= sscp-transaction這個 label 標籤的 Service。

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: sscp-transaction
    release: prometheus-operator
  name: springboot
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    path: /actuator/prometheus
    port: health
    scheme: http
  namespaceSelector:
    any: true
#    matchNames:
#    - sscp-dev
  selector:
    matchLabels:
      app: sscp-transaction
#      release: sscp

效果圖:

 

 

 

 

相關文章
相關標籤/搜索