Kubernetes Monitoring: Deploying Prometheus with the Operator (Part 3)

Date: 2019-11-13


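In the first two chapters, configuring Prometheus by hand was costly and tedious. Once we also have to consider high availability for components such as Prometheus and AlertManager, the cost grows further. We could implement all of this ourselves, but Prometheus already has native Kubernetes support in its code and can monitor a cluster automatically through service discovery. We can therefore use a more advanced way to deploy Prometheus: the Operator framework.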

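What is an `Operator`

The Operator pattern was developed by CoreOS. An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems. Operators build on the Kubernetes concepts of resources and controllers, but also encode application-specific domain knowledge. The key to building an Operator is the design of its CRDs (custom resource definitions).

An Operator turns the knowledge that operations staff have about running a piece of software into code, and uses Kubernetes' powerful abstractions to manage software at scale. CoreOS provides several official Operator implementations, including today's subject, the Prometheus Operator. The core of an Operator is built on the following two Kubernetes concepts:

Resource: the definition of an object's state
Controller: observes, analyzes, and acts to reconcile resources toward that state

CoreOS currently provides the following four Operators:

etcd: creates etcd clusters
Rook: file, block, and object storage services for cloud-native environments
Prometheus: creates Prometheus monitoring instances
Tectonic: deploys Kubernetes clusters

Next we will use the Operator to create Prometheus.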

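Installation

Here we install directly from the Prometheus Operator source code. A one-step install with Helm is also possible, but installing from source lets us see more of the implementation details. First, clone the source: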

git clone https://github.com/coreos/prometheus-operator
cd prometheus-operator/contrib/kube-prometheus/manifests

Change into the manifests directory, which contains all of our resource manifest files, and run the apply command in that directory:

kubectl apply -f .

After deployment, a namespace named monitoring is created and all resource objects are deployed into it. In addition, the Operator automatically creates four CRD resource objects:

 kubectl get crd |grep coreos
alertmanagers.monitoring.coreos.com           2019-03-18T02:43:57Z
prometheuses.monitoring.coreos.com            2019-03-18T02:43:58Z
prometheusrules.monitoring.coreos.com         2019-03-18T02:43:58Z
servicemonitors.monitoring.coreos.com         2019-03-18T02:43:58Z

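You can list all Pods in the monitoring namespace. alertmanager and prometheus are managed by StatefulSet controllers, and there is also the core prometheus-operator Pod, which controls the other resource objects and watches them for changes: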

kubectl get pods -n monitoring
NAME                                   READY     STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2       Running   0          37m
alertmanager-main-1                    2/2       Running   0          34m
alertmanager-main-2                    2/2       Running   0          33m
grafana-7489c49998-pkl8w               1/1       Running   0          40m
kube-state-metrics-d6cf6c7b5-7dwpg     4/4       Running   0          27m
node-exporter-dlp25                    2/2       Running   0          40m
node-exporter-fghlp                    2/2       Running   0          40m
node-exporter-mxwdm                    2/2       Running   0          40m
node-exporter-r9v92                    2/2       Running   0          40m
prometheus-adapter-84cd9c96c9-n92n4    1/1       Running   0          40m
prometheus-k8s-0                       3/3       Running   1          37m
prometheus-k8s-1                       3/3       Running   1          37m
prometheus-operator-7b74946bd6-vmbcj   1/1       Running   0          40m

List the Services that were created:

kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       ClusterIP   10.110.43.207   <none>        9093/TCP            40m
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,6783/TCP   38m
grafana                 ClusterIP   10.109.160.0    <none>        3000/TCP            40m
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP   40m
node-exporter           ClusterIP   None            <none>        9100/TCP            40m
prometheus-adapter      ClusterIP   10.105.174.21   <none>        443/TCP             40m
prometheus-k8s          ClusterIP   10.97.195.143   <none>        9090/TCP            40m
prometheus-operated     ClusterIP   None            <none>        9090/TCP            38m
prometheus-operator     ClusterIP   None            <none>        8080/TCP            40m

As shown above, ClusterIP Services were created for both grafana and prometheus. To reach these two services from outside the cluster, we can create corresponding Ingress objects or use NodePort Services. For simplicity we use NodePort here: edit the grafana and prometheus-k8s Services and change their type to NodePort:

kubectl edit svc grafana -n monitoring
kubectl edit svc prometheus-k8s -n monitoring
kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
.....
grafana                 NodePort    10.109.160.0    <none>        3000:31740/TCP      42m
prometheus-k8s          NodePort    10.97.195.143   <none>        9090:31310/TCP      42m
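
As a non-interactive alternative to kubectl edit, the same type change can be sketched with kubectl patch. The helper name to_nodeport and the KUBECTL dry-run switch are illustrative assumptions, not part of kube-prometheus; by default the snippet only prints the commands it would run.

```shell
# Dry run by default: prints the kubectl commands instead of running them.
# Set KUBECTL=kubectl in the environment to run against a real cluster.
KUBECTL="${KUBECTL:-echo kubectl}"

# to_nodeport <service> <namespace>  (hypothetical helper name)
to_nodeport() {
  $KUBECTL patch svc "$1" -n "$2" -p '{"spec":{"type":"NodePort"}}'
}

to_nodeport grafana monitoring
to_nodeport prometheus-k8s monitoring
```

The node port itself is still assigned by Kubernetes, so kubectl get svc -n monitoring is needed afterwards to read the port numbers.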

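Once the change is made, we can reach both services through a node IP and the assigned NodePort. For example, open the Prometheus targets page.

Most targets are healthy, but two or three jobs have no monitoring targets attached, such as the system components kube-controller-manager and kube-scheduler. This is related to how the ServiceMonitor objects are defined. Let us first look at the ServiceMonitor for kube-scheduler: (prometheus-serviceMonitorKubeScheduler.yaml)

Configuring kube-scheduler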

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s  # scrape every 30s
    port: http-metrics  # name of the port on the Service
  jobLabel: k8s-app
  # match Services in the listed namespaces; use any: true to match all namespaces
  namespaceSelector:
    matchNames:
    - kube-system
  # labels of the Services to match. With matchLabels, every listed label must match.
  # matchExpressions adds set-based requirements (In, NotIn, Exists, DoesNotExist), all of which must hold too.
  selector:
    matchLabels:
      k8s-app: kube-scheduler

The above is a typical ServiceMonitor declaration. Through selector.matchLabels it matches Services in the kube-system namespace that carry the label k8s-app=kube-scheduler. Our cluster has no such Service, however, so we need to create one manually: (prometheus-kubeSchedulerService.yaml)

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP

The most important parts above are labels and selector. The labels section must agree with the selector in the ServiceMonitor object above. The selector is set to component=kube-scheduler. Why this label? We can describe the kube-scheduler Pod to find out:

$ kubectl describe pod kube-scheduler-k8s-master -n kube-system
Name:               kube-scheduler-k8s-master
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               k8s-master/172.16.138.40
Start Time:         Tue, 19 Feb 2019 21:15:05 -0500
Labels:             component=kube-scheduler
                    tier=control-plane
......

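The Pod carries the two labels component=kube-scheduler and tier=control-plane. The first is more specific to this component, so it is the better choice. With it, the Service created above is associated with our Pod. Create it directly: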

$ kubectl create -f prometheus-kubeSchedulerService.yaml
$ kubectl get svc -n kube-system -l k8s-app=kube-scheduler
NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
kube-scheduler   ClusterIP   10.103.165.58   <none>        10251/TCP   4m
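
Two selectors are chained here: the ServiceMonitor selects the Service by the Service's own labels, and the Service selects the Pod by the Pod's labels. The sketch below models only the matchLabels rule (every listed pair must be present); labels_match is a hypothetical helper written for illustration, not a kubectl feature.

```shell
# labels_match <selector> <labels>
# Both arguments are comma-separated key=value lists.
# Succeeds (exit 0) only if every selector pair is present in the labels.
labels_match() {
  selector="$1"
  labels=",$2,"
  old_ifs="$IFS"
  IFS=','
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;                   # this pair is present, keep checking
      *) IFS="$old_ifs"; return 1 ;;    # one missing pair means no match
    esac
  done
  IFS="$old_ifs"
  return 0
}

# ServiceMonitor selector against the Service's own labels
labels_match 'k8s-app=kube-scheduler' 'k8s-app=kube-scheduler' && echo "ServiceMonitor selects Service"
# Service selector against the Pod's labels
labels_match 'component=kube-scheduler' 'component=kube-scheduler,tier=control-plane' && echo "Service selects Pod"
# The ServiceMonitor selector does not match the Pod labels directly
labels_match 'k8s-app=kube-scheduler' 'component=kube-scheduler,tier=control-plane' || echo "no direct match with Pod"
```

This is why the Service needs both a k8s-app label in its metadata and a component selector in its spec.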

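After creating it, wait a short while and check the state of kube-scheduler on the Prometheus targets page.

The target has now been discovered, but scraping it fails. The cause is that our cluster was built with kubeadm, where kube-scheduler binds to 127.0.0.1 by default, while Prometheus is trying to reach it through the node IP, so the connection is refused. Changing the kube-scheduler bind address to 0.0.0.0 resolves this. Because kube-scheduler runs as a static Pod, we only need to edit its YAML file (kube-scheduler.yaml) in the static Pod directory: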

$ cd /etc/kubernetes/manifests
# In kube-scheduler.yaml, change the --address value under command to 0.0.0.0
$ vim kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
....

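After editing, move the file out of this directory, wait a moment, then move it back so that the kubelet recreates the Pod with the new setting. Then check whether the kube-scheduler target in Prometheus is healthy.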
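
The manual vim edit can also be sketched as a small sed rewrite. This is a minimal illustration under the assumption that the manifest contains an --address=<ip> flag as in the listing above; other Kubernetes versions may use a different flag name, and set_bind_address_any is a hypothetical helper. The demonstration runs on a throwaway file, not on /etc/kubernetes/manifests.

```shell
# set_bind_address_any <manifest>
# Replaces whatever IPv4 address follows --address= with 0.0.0.0.
set_bind_address_any() {
  manifest="$1"
  tmp="$(mktemp)"
  # Write through a temp file so the edit works with both GNU and BSD sed
  sed 's/--address=[0-9.]*/--address=0.0.0.0/' "$manifest" > "$tmp" && cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Demonstration on a throwaway copy of the relevant lines
demo="$(mktemp)"
printf '%s\n' '    - kube-scheduler' '    - --address=127.0.0.1' '    - --leader-elect=true' > "$demo"
set_bind_address_any "$demo"
cat "$demo"
rm -f "$demo"
```

On a real control-plane node the argument would be the static Pod manifest path, and the move-out, move-back step described above still applies.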

Configuring kube-controller-manager

Let us look at the ServiceMonitor definition for kube-controller-manager:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    metricRelabelings:
    - action: drop
      regex: etcd_(debugging|disk|request|server).*
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager

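As shown above, Services are selected by the label k8s-app: kube-controller-manager, but no such Service exists in the cluster, so we create one manually.

Before creating it, we need to confirm the labels on the Pod: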

$ kubectl describe pod kube-controller-manager-k8s-master -n kube-system
Name:               kube-controller-manager-k8s-master
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               k8s-master/172.16.138.40
Start Time:         Tue, 19 Feb 2019 21:15:16 -0500
Labels:             component=kube-controller-manager
                    tier=control-plane
....

Create the Service:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

After creating it, check the target.

This is the same problem as before, and the same fix applies. Edit kube-controller-manager.yaml:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --node-monitor-grace-period=10s
    - --pod-eviction-timeout=10s
    - --address=0.0.0.0   # changed
......

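After editing, move the file out of this directory, wait a moment, then move it back so that the change takes effect. Then check whether the kube-controller-manager target in Prometheus is healthy.

Configuring coredns

CoreDNS serves metrics on port 9153. Check whether any Service in kube-system exposes this port: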

kubectl get svc -n kube-system
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
heapster                             ClusterIP   10.96.28.220     <none>        80/TCP          19d
kube-controller-manager              ClusterIP   10.99.208.51     <none>        10252/TCP       1h
kube-dns                             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP   188d
kube-scheduler                       ClusterIP   10.103.165.58    <none>        10251/TCP       2h
kubelet                              ClusterIP   None             <none>        10250/TCP       5h
kubernetes-dashboard                 NodePort    10.103.15.27     <none>        443:30589/TCP   131d
monitoring-influxdb                  ClusterIP   10.103.155.57    <none>        8086/TCP        19d
tiller-deploy                        ClusterIP   10.104.114.83    <none>        44134/TCP       18d

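Here kube-dns does not expose a metrics port, although the metrics endpoint is running in the backend Pods, so we need to expose this port through a Service. Create the Service: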

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-prometheus-prometheus-coredns
  labels:
    k8s-app: prometheus-operator-coredns
spec:
  selector:
    k8s-app: kube-dns
  ports:
  - name: metrics
    port: 9153
    targetPort: 9153
    protocol: TCP

The Service created here carries the label k8s-app: prometheus-operator-coredns, so we need to change the selector labels in the CoreDNS ServiceMonitor to match:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: coredns
  name: coredns
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: prometheus-operator-coredns

Create both resources and check them:

$ kubectl apply  -f prometheus-serviceMonitorCoreDNS.yaml
$ kubectl create -f prometheus-KubeDnsSvc.yaml
$ kubectl get svc -n kube-system
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
kube-prometheus-prometheus-coredns   ClusterIP   10.100.205.135   <none>        9153/TCP        1h

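Now check whether the coredns target in Prometheus is healthy.

With the monitoring data above configured, we can look at the dashboards in Grafana, again through its NodePort. Log in the first time with admin:admin. On the home page you can see that Grafana is already connected to our Prometheus data source, and some monitoring charts should be visible.

Custom monitoring targets

Besides the resource objects, nodes, and components of the Kubernetes cluster, we sometimes need to add custom monitoring targets for business needs. Adding one takes only a few steps:

Step 1: create a ServiceMonitor object, which adds the monitoring target to Prometheus
Step 2: associate the ServiceMonitor with a Service object that exposes the metrics endpoint
Step 3: make sure the Service can actually reach the metrics data

Next we demonstrate how to add monitoring for an etcd cluster.

Whether the etcd cluster runs outside Kubernetes or was installed inside the cluster by kubeadm, we treat it here as an independent external cluster, because the procedure is the same in both cases.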

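etcd certificates

etcd clusters usually enable HTTPS certificate authentication for security, so for Prometheus to reach etcd's metrics we must supply the corresponding certificates.

Since our demo environment was built with kubeadm, we can use kubectl to find the certificate paths etcd was started with: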

$ kubectl get pods -n kube-system | grep etcd
etcd-k8s-master                         1/1       Running   2773       188d
etcd-k8s-node01                         1/1       Running   2          104d
$ kubectl get pod etcd-k8s-master -n kube-system -o yaml
.....
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.16.138.40:2379
    - --initial-advertise-peer-urls=https://172.16.138.40:2380
    - --initial-cluster=k8s-master=https://172.16.138.40:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://172.16.138.40:2379
    - --listen-peer-urls=https://172.16.138.40:2380
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --name=k8s-master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.cn-hangzhou.aliyuncs.com/google_containers/etcd-amd64:3.2.18
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
......
  tolerations:
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
.....

The certificates etcd uses are all under /etc/kubernetes/pki/etcd on the node, so first we store the certificates we need in the cluster as a Secret object (run this on a node where etcd is running):

$ kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key --from-file=/etc/kubernetes/pki/etcd/ca.crt
secret/etcd-certs created

Then add the etcd-certs Secret created above to the prometheus resource object and update it:

nodeSelector:
  beta.kubernetes.io/os: linux
replicas: 2
secrets:
- etcd-certs

After the update, the etcd certificate files are available inside the Prometheus Pod. We can enter the Pod to check the exact path:

$ kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/
config_out/         console_libraries/  consoles/           prometheus.yml      rules/              secrets/
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.crt                  healthcheck-client.crt  healthcheck-client.key
/prometheus $

Creating the ServiceMonitor

The certificates Prometheus needs to access the etcd cluster are now in place, so next we create the ServiceMonitor object (prometheus-serviceMonitorEtcd.yaml):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system

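Above we created a ServiceMonitor named etcd-k8s in the monitoring namespace. Its basic attributes are the same as in earlier sections: it matches Services in the kube-system namespace that carry the label k8s-app=etcd, and jobLabel names the label from which the job name is taken. What differs is the endpoints section, which carries the certificates for accessing etcd. Many scrape parameters can be set under endpoints, such as relabel rules and proxyUrl. tlsConfig configures TLS for the scrape endpoint. Because the serverName in the certificate may not match what etcd was issued, insecureSkipVerify=true is added.

Create the ServiceMonitor object: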

$ kubectl create -f prometheus-serviceMonitorEtcd.yaml
servicemonitor.monitoring.coreos.com/etcd-k8s created

Creating the Service

The ServiceMonitor exists, but there is no matching Service object yet, so we create one manually (prometheus-etcdService.yaml):

apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.16.138.40
    nodeName: etcd-k8s-master
  - ip: 172.16.138.41
    nodeName: etcd-k8s-node01
  ports:
  - name: port
    port: 2379
    protocol: TCP

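This Service does not use a label selector to match Pods as before, because, as mentioned, an etcd cluster is often independent of the Kubernetes cluster. In that case we define a custom Endpoints object. Note that its metadata must match the Service, and the Service's clusterIP is set to None. If you are unfamiliar with this, see our earlier discussion of Services.

Fill the Endpoints subsets with the addresses of the etcd cluster. Ours is a high-availability test cluster, and we specified the node host IP addresses when creating it. (Two etcd members is not a recommended size: etcd relies on majority quorum, so a two-member cluster tolerates no more failures than a single member.) Create the Service resource: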

$ kubectl create -f prometheus-etcdService.yaml
service/etcd-k8s created
endpoints/etcd-k8s created
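
The remark about two-member clusters follows from majority quorum arithmetic, which can be sketched as below. The function names are illustrative; the formulas are the usual majority rule (quorum is floor(n/2) + 1).

```shell
# Quorum size for an etcd cluster of n members: floor(n/2) + 1
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of member failures the cluster can tolerate: n - quorum
etcd_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_tolerance "$n")"
done
```

Both one and two members tolerate zero failures, which is why odd cluster sizes such as three or five are normally chosen.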

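After creating it, wait a moment and check the targets page in the Prometheus dashboard; the etcd target will be there.

Once data is being collected, import dashboard number 3070 in Grafana to get etcd monitoring charts.

Configuring PrometheusRule

We now know how to define a custom ServiceMonitor, but what about a custom alerting rule? If we look at the Alerts page of the Prometheus dashboard, some alerting rules already exist, and a few have already fired.

Where do these alerts come from, and how should they notify us? With the hand-written setup earlier, we specified the AlertManager instances and the alerting rules files in the Prometheus configuration file. What about a deployment through the Operator? We can look at the AlertManager-related configuration on the Config page of the Prometheus dashboard: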

alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
  alertmanagers:
  - kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    scheme: http
    path_prefix: /
    timeout: 10s
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: alertmanager-main
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: web
      replacement: $1
      action: keep
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml

The alertmanagers configuration above is obtained through Kubernetes service discovery with role endpoints, matching the Service named alertmanager-main and the port named web. Let us look at the alertmanager-main Service:

kubectl describe svc alertmanager-main -n monitoring
Name:              alertmanager-main
Namespace:         monitoring
Labels:            alertmanager=main
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"alertmanager-main","namespace":"monitoring"},...
Selector:          alertmanager=main,app=alertmanager
Type:              ClusterIP
IP:                10.110.43.207
Port:              web  9093/TCP
TargetPort:        web/TCP
Endpoints:         10.244.0.31:9093,10.244.2.42:9093,10.244.3.40:9093
Session Affinity:  None
Events:            <none>

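The Service name is indeed alertmanager-main and the port is named web, which satisfies the rules above, so Prometheus and AlertManager are correctly connected. The alerting rule files are all the YAML files under /etc/prometheus/rules/prometheus-k8s-rulefiles-0/. We can enter the Prometheus Pod to verify that YAML files exist in that directory: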

$ kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
monitoring-prometheus-k8s-rules.yaml
/prometheus $ cat /etc/prometheus/rules/prometheus-k8s-rulefiles-0/monitoring-prometheus-k8s-rules.yaml
groups:
- name: k8s.rules
  rules:
  - expr: |
      sum(rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container_name!=""}[5m])) by (namespace)
    record: namespace:container_cpu_usage_seconds_total:sum_rate
  - expr: |
      sum by (namespace, pod_name, container_name) (
        rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container_name!=""}[5m])
      )
    record: namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate
...........

This YAML file is in fact the content of a PrometheusRule object we created earlier:

$ cat prometheus-rules.yaml 
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
  namespace: monitoring
spec:
  groups:
  - name: k8s.rules
    rules:
    - expr: |
        sum(rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container_name!=""}[5m])) by (namespace)
      record: namespace:container_cpu_usage_seconds_total:sum_rate
    - expr: |
        sum by (namespace, pod_name, container_name) (
          rate(container_cpu_usage_seconds_total{job="kubelet", image!="", container_name!=""}[5m])
        )
      record: namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate
.....

Our PrometheusRule is named prometheus-k8s-rules and lives in the monitoring namespace. From this we can guess that creating a PrometheusRule resource automatically generates a corresponding <namespace>-<name>.yaml file in the prometheus-k8s-rulefiles-0 directory above. So to define a custom alert in the future, we only need to define a PrometheusRule resource. How does Prometheus recognize a PrometheusRule? Look at the prometheus resource we created: it has an important attribute, ruleSelector, which filters rules and requires PrometheusRule objects to carry the labels prometheus=k8s and role=alert-rules:

ruleSelector:
  matchLabels:
    prometheus: k8s
    role: alert-rules

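So to define a custom alerting rule we just create a PrometheusRule with the labels prometheus=k8s and role=alert-rules. For example, let us add an alert on etcd availability. An etcd cluster stays available as long as more than half of its members are up, so we raise an alert once the number of unavailable members exceeds half the cluster size minus one, that is, when the cluster is at or near the point of losing quorum. Create the file prometheus-etcdRules.yaml: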

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
.....
$ kubectl create -f prometheus-etcdRules.yaml
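
To see when the expression in this rule fires, the comparison can be worked through in integer arithmetic. This is only an illustration of the inequality count(down) > count(all) / 2 - 1; the function name is hypothetical and it does not query Prometheus.

```shell
# etcd_alert_fires <down> <total>
# Mirrors: count(up == 0) > count(up) / 2 - 1
# Multiplying both sides by 2 keeps the comparison in integers: 2*down > total - 2
etcd_alert_fires() {
  down="$1"
  total="$2"
  # In PromQL, count() over an empty result returns no series, so with zero
  # members down the expression yields nothing and the alert cannot fire.
  [ "$down" -gt 0 ] || return 1
  [ $(( 2 * down )) -gt $(( total - 2 )) ]
}

for d in 0 1 2; do
  if etcd_alert_fires "$d" 3; then state="firing"; else state="quiet"; fi
  echo "total=3 down=$d -> $state"
done
```

For a three-member cluster the condition already holds with one member down, which matches the rule's description that one more failure would make the cluster unavailable.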

Note that the labels must include at least prometheus=k8s and role=alert-rules. After creating it, wait a moment and check the rules directory in the container again:

$ kubectl exec -it prometheus-k8s-0 /bin/sh -n monitoring
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
monitoring-etcd-rules.yaml            monitoring-prometheus-k8s-rules.yaml

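The rule file we created has been injected into the rulefiles directory, which confirms our guess. The new alerting rule can now be seen on the Alerts page of the Prometheus dashboard.

Configuring alert notifications

We now know how to add an alerting rule, but how are the alerts delivered? Earlier in this series we configured receivers through the AlertManager configuration file. Now that the component is created through the alertmanager resource provided by the Operator, how do we change its configuration?

First change the alertmanager-main Service to type NodePort. After that, the AlertManager configuration can be viewed on the status page of its web UI: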

$ kubectl edit svc alertmanager-main -n monitoring
......
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: None
  type: NodePort
.....

This configuration actually comes from the alertmanager-secret.yaml file we applied earlier from the prometheus-operator/contrib/kube-prometheus/manifests directory:

apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque

We can base64-decode the value of alertmanager.yaml:

echo Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg== | base64 -d

The decoded result:
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "job"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "DeadMansSwitch"
    "receiver": "null"

The content matches the configuration shown on the status page. So to add our own receivers or message templates, we change this file:

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:587'
  smtp_from: 'zhaikun1992@qq.com'
  smtp_auth_username: 'zhaikun1992@qq.com'
  smtp_auth_password: '***'
  smtp_hello: 'qq.com'
  smtp_require_tls: true
templates:
  - "/etc/alertmanager-tmpl/wechat.tmpl"
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: default
  routes:
  - receiver: 'wechat'
    group_wait: 10s
    match:
      alertname: CoreDNSDown
receivers:
- name: 'default'
  email_configs:
  - to: 'zhai_kun@suixingpay.com'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - corp_id: '***'
    to_party: '*'
    to_user: "**"
    agent_id: '***'
    api_secret: '***'
    send_resolved: true

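Save the file above as alertmanager.yaml, then create a Secret object from it:

# delete the original Secret
kubectl delete secret alertmanager-main -n monitoring
secret "alertmanager-main" deleted
# load our own configuration file into a new Secret
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring

We added two receivers: the default one sends by email, while the CoreDNSDown alert is sent through wechat. Shortly after the steps above, we receive a WeChat message.

The alert also arrives in the mailbox.

Looking at the status page of AlertManager again, the configuration has changed to ours.

AlertManager configuration can also use templates (.tmpl files). These can be added to the Secret object together with the alertmanager.yaml configuration file, for example: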

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example
data:
  alertmanager.yaml: {BASE64_CONFIG}
  template_1.tmpl: {BASE64_TEMPLATE_1}
  template_2.tmpl: {BASE64_TEMPLATE_2}
  ...

The templates are placed in the same path as the configuration file. To use them, they must also be referenced in the alertmanager.yaml configuration file:

templates:
- '*.tmpl'

Once created, the Secret is mounted into the AlertManager Pods created from the AlertManager object.

Example: create a file alertmanager-tmpl.yaml with the following content:

{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
Alert source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Instance: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Description: {{ .Annotations.description }}
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
========end==========
{{ end }}
{{ end }}

Delete the original Secret:

$ kubectl delete secret alertmanager-main -n monitoring
secret "alertmanager-main" deleted

Create the new Secret:

$ kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --from-file=alertmanager-tmpl.yaml -n monitoring
secret/alertmanager-main created

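After a while, the alert arrives in WeChat. Because of how the labels are defined here, some values are missing; adjust the template to your situation.

Automatic discovery configuration

Consider this: if our Kubernetes cluster has many Services and Pods, do we have to create a ServiceMonitor for each one? That would become tedious again.

To address this, the Prometheus Operator lets us supply additional scrape configuration, through which we can use service discovery for automatic monitoring. As with the hand-written setup earlier, we want Prometheus under the Operator to automatically discover and monitor Services that carry the annotation prometheus.io/scrape=true. The Prometheus configuration we defined before is: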

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

For a Service to be discovered automatically, it must declare prometheus.io/scrape=true in its annotations. Save the scrape configuration above as prometheus-additional.yaml, then create a Secret from that file:

$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret/additional-configs created

After creation, the configuration is stored base64-encoded as the value of the key prometheus-additional.yaml:

$ kubectl get secret additional-configs -n monitoring -o yaml
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogJ2t1YmVybmV0ZXMtc2VydmljZS1lbmRwb2ludHMnCiAga3ViZXJuZXRlc19zZF9jb25maWdzOgogIC0gcm9sZTogZW5kcG9pbnRzCiAgcmVsYWJlbF9jb25maWdzOgogIC0gc291cmNlX2xhYmVsczogW19fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3NjcmFwZV0KICAgIGFjdGlvbjoga2VlcAogICAgcmVnZXg6IHRydWUKICAtIHNvdXJjZV9sYWJlbHM6IFtfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19zY2hlbWVdCiAgICBhY3Rpb246IHJlcGxhY2UKICAgIHRhcmdldF9sYWJlbDogX19zY2hlbWVfXwogICAgcmVnZXg6IChodHRwcz8pCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fcGF0aF0KICAgIGFjdGlvbjogcmVwbGFjZQogICAgdGFyZ2V0X2xhYmVsOiBfX21ldHJpY3NfcGF0aF9fCiAgICByZWdleDogKC4rKQogIC0gc291cmNlX2xhYmVsczogW19fYWRkcmVzc19fLCBfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19wb3J0XQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgLSBhY3Rpb246IGxhYmVsbWFwCiAgICByZWdleDogX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9sYWJlbF8oLispCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfbmFtZXNwYWNlXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZXNwYWNlCiAgLSBzb3VyY2VfbGFiZWxzOiBbX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9uYW1lXQogICAgYWN0aW9uOiByZXBsYWNlCiAgICB0YXJnZXRfbGFiZWw6IGt1YmVybmV0ZXNfbmFtZQo=
kind: Secret
metadata:
  creationTimestamp: 2019-03-20T03:38:37Z
  name: additional-configs
  namespace: monitoring
  resourceVersion: "29056864"
  selfLink: /api/v1/namespaces/monitoring/secrets/additional-configs
  uid: a579495b-4ac1-11e9-baf3-005056930126
type: Opaque
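
To confirm what is stored under that key, the value can be decoded back to plain text. The sketch below runs a round trip on a small sample so it works without a cluster; decode_secret_value is a hypothetical helper, and the commented kubectl line assumes the Secret and key names used in this article.

```shell
# Decode one base64-encoded Secret value read from standard input
decode_secret_value() {
  base64 -d
}

# Round trip on a small sample so the snippet runs without a cluster
sample="$(printf '%s\n' "- job_name: 'kubernetes-service-endpoints'" | base64)"
printf '%s\n' "$sample" | decode_secret_value

# Against a cluster:
# kubectl get secret additional-configs -n monitoring \
#   -o go-template='{{index .data "prometheus-additional.yaml"}}' | base64 -d
```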

Then we just reference this additional configuration in the manifest that declares the prometheus resource: (prometheus-prometheus.yaml)

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.5.0

Once it has been added, update the prometheus custom resource directly:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured

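After a short while, open the Prometheus Dashboard to check whether the configuration has taken effect:

The configuration page of the Prometheus Dashboard already shows the new scrape configuration, but the targets page does not list the corresponding job. Check the logs of the Prometheus Pod: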

$ kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
level=error ts=2019-03-20T03:55:01.298281581Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods at the cluster scope"
level=error ts=2019-03-20T03:55:02.29813427Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list services at the cluster scope"
level=error ts=2019-03-20T03:55:02.298431046Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list endpoints at the cluster scope"
level=error ts=2019-03-20T03:55:02.299312874Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods at the cluster scope"
level=error ts=2019-03-20T03:55:03.299674406Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:301: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list services at the cluster scope"
level=error ts=2019-03-20T03:55:03.299757543Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list endpoints at the cluster scope"
level=error ts=2019-03-20T03:55:03.299907982Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:302: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods at the cluster scope"

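The log is full of errors of the form "xxx is forbidden", which points to an RBAC permission problem. The prometheus resource object shows that Prometheus runs under a ServiceAccount named prometheus-k8s, and this ServiceAccount is bound to a ClusterRole that is also named prometheus-k8s: (prometheus-clusterRole.yaml)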

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

The rules above clearly grant no list permission on Services or Pods, which is why the errors appear. To fix this, simply add the permissions that are needed:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
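
To see why the original role failed and the updated one succeeds, the rules can be checked by hand. The Python sketch below is a simplified model of RBAC rule matching, not the Kubernetes authorizer itself: it ignores wildcards, resourceNames and nonResourceURLs, and the `allows` helper is hypothetical.

```python
# Rules copied from the two ClusterRole definitions above
# (resource rules only; the nonResourceURLs rule is left out).
ORIGINAL_RULES = [
    {"apiGroups": [""], "resources": ["nodes/metrics"], "verbs": ["get"]},
]

UPDATED_RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": [""],
        "resources": ["configmaps", "nodes/metrics"],
        "verbs": ["get"],
    },
]


def allows(rules, verb, resource, api_group=""):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


# The three resources named in the "is forbidden" log lines.
for resource in ("pods", "services", "endpoints"):
    before = allows(ORIGINAL_RULES, "list", resource)
    after = allows(UPDATED_RULES, "list", resource)
    print(f"list {resource}: original={before} updated={after}")
```

On a live cluster, a command such as `kubectl auth can-i list pods --as=system:serviceaccount:monitoring:prometheus-k8s` should give the authoritative answer for the same question.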

Apply the updated ClusterRole and then recreate all of the Prometheus Pods. The targets page should now show the kubernetes-service-endpoints job:

$ kubectl apply -f prometheus-clusterRole.yaml
clusterrole.rbac.authorization.k8s.io/prometheus-k8s configured

Two Service endpoints are now monitored automatically, and both belong to coredns. This is because its Service carries two special annotations:

$ kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
....
Annotations:       prometheus.io/port=9153
                   prometheus.io/scrape=true
...

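That is why it was discovered automatically. The same approach can be used to configure automatic discovery for other resource objects such as Pods and Ingresses.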
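
The scrape configuration stored in the Secret uses the prometheus.io/port annotation to rewrite the target address, through a relabel rule with regex `([^:]+)(?::\d+)?;(\d+)` and replacement `$1:$2`. The Python sketch below imitates only that one rule to show its effect. It approximates Prometheus relabeling (source label values joined with ";" and the regex anchored at both ends); the `relabel_address` helper and the sample IP address are made up for illustration.

```python
import re

# Same pattern as the relabel rule; fullmatch() stands in for the
# anchoring that Prometheus applies to relabel regexes.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_address(address: str, port_annotation: str) -> str:
    """Imitate the `replace` rule that targets __address__.

    `address` plays the role of __address__, and `port_annotation` the
    value of the prometheus.io/port annotation (empty string if absent).
    """
    joined = f"{address};{port_annotation}"  # source labels joined by ';'
    match = ADDRESS_RE.fullmatch(joined)
    if match is None:
        # No match: a replace action leaves the target label unchanged.
        return address
    return f"{match.group(1)}:{match.group(2)}"


print(relabel_address("10.244.0.5:53", "9153"))  # port swapped for the annotation
print(relabel_address("10.244.0.5", "9153"))     # port appended
print(relabel_address("10.244.0.5:53", ""))      # no annotation, unchanged
```

With the kube-dns annotations shown above, this is how an endpoint that listens on another port ends up being scraped on 9153.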

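Data persistence

After changing the permissions above, we restarted the Prometheus Pods. A closer look shows that the data collected earlier is gone, because the Prometheus instance created through the prometheus custom resource has no data persistence configured. Inspecting the volume mounts of the generated Prometheus Pod makes this clear: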

............
    volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-k8s-db
    - mountPath: /etc/prometheus/rules/prometheus-k8s-rulefiles-0 
.........
 volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-k8s
  - emptyDir: {}

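The Prometheus data directory /prometheus is mounted from an emptyDir volume. Data in an emptyDir shares the lifecycle of its Pod, so when the Pod goes away the data is lost, which is why the earlier data disappeared after the Pods were recreated. Monitoring data in production needs to be persisted, and the prometheus custom resource provides a way to configure this.

Because Prometheus is ultimately deployed by a StatefulSet controller, persistence here is done through a StorageClass. We already set one up with rook earlier, so it can be used directly. Add the following configuration to the prometheus resource object (prometheus-prometheus.yaml):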

  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: rook-ceph-block
        resources:
          requests:
            storage: 10Gi

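Note that storageClassName must be the name of the StorageClass object created earlier. Then update the prometheus custom resource. Once the update completes, two PVC objects and two PV objects are generated automatically: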

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                           STORAGECLASS      REASON    AGE
pvc-dba11961-4ad6-11e9-baf3-005056930126   10Gi       RWO            Delete           Bound     monitoring/prometheus-k8s-db-prometheus-k8s-0   rook-ceph-block             1m
pvc-dbc6bac5-4ad6-11e9-baf3-005056930126   10Gi       RWO            Delete           Bound     monitoring/prometheus-k8s-db-prometheus-k8s-1   rook-ceph-block             1m
$ kubectl get pvc -n monitoring
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
prometheus-k8s-db-prometheus-k8s-0   Bound     pvc-dba11961-4ad6-11e9-baf3-005056930126   10Gi       RWO            rook-ceph-block   2m
prometheus-k8s-db-prometheus-k8s-1   Bound     pvc-dbc6bac5-4ad6-11e9-baf3-005056930126   10Gi       RWO            rook-ceph-block   2m
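
The PVC names in this output follow the StatefulSet convention of `<volumeClaimTemplate name>-<pod name>`, where the pod name is `<statefulset name>-<ordinal>`. A small Python sketch that reproduces the names listed above; the `pvc_names` helper is hypothetical, and the template name prometheus-k8s-db is read off the output rather than from the operator's source.

```python
def pvc_names(template: str, statefulset: str, replicas: int) -> list:
    """Build the PVC names a StatefulSet produces, one per replica ordinal."""
    return [f"{template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


# replicas: 2 comes from the prometheus resource shown earlier.
for name in pvc_names("prometheus-k8s-db", "prometheus-k8s", 2):
    print(name)
```

Because the name is derived from the pod ordinal, a recreated prometheus-k8s-0 Pod reattaches to the same claim instead of getting a fresh volume.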

Looking at the data directory of the Prometheus Pod again, it is now backed by a PVC object:

.......
volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-k8s-db
      subPath: prometheus-db
    - mountPath: /etc/prometheus/rules/prometheus-k8s-rulefiles-0
      name: prometheus-k8s-rulefiles-0
.........
  volumes:
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
.........

Now the data survives even if a Pod dies. Let's test it.

First, run an arbitrary query to confirm that there is data.

Delete the Pods:

kubectl delete pod prometheus-k8s-1 -n monitoring
kubectl delete pod prometheus-k8s-0 -n monitoring

Check the Pod status:

kubectl get pod -n monitoring
NAME                                   READY     STATUS              RESTARTS   AGE
alertmanager-main-0                    2/2       Running             0          2d
alertmanager-main-1                    2/2       Running             0          2d
alertmanager-main-2                    2/2       Running             0          2d
grafana-7489c49998-pkl8w               1/1       Running             0          2d
kube-state-metrics-d6cf6c7b5-7dwpg     4/4       Running             0          2d
node-exporter-dlp25                    2/2       Running             0          2d
node-exporter-fghlp                    2/2       Running             0          2d
node-exporter-mxwdm                    2/2       Running             0          2d
node-exporter-r9v92                    2/2       Running             0          2d
prometheus-adapter-84cd9c96c9-n92n4    1/1       Running             0          2d
prometheus-k8s-0                       0/3       ContainerCreating   0          3s
prometheus-k8s-1                       3/3       Running             0          9s
prometheus-operator-7b74946bd6-vmbcj   1/1       Running             0          2d