Monitoring Kubernetes with Prometheus in Practice: Introducing Prometheus and Collecting Metrics (Part 1)

Date: 2019-11-12

Tags: kubernetes, monitoring, prometheus

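Introduction to Prometheus

Prometheus is an open-source monitoring system originally built at SoundCloud. It is now a standalone open-source project. To emphasize this, and to clarify the project's governance structure, Prometheus joined the CNCF in 2016 as the second hosted project, after Kubernetes.

Features

A multi-dimensional data model, with time series identified by a metric name and key/value pairs
PromQL, a flexible query language
No reliance on distributed storage; a server depends only on its local disk
Time series are collected by pulling over HTTP
Pushing time series is also supported
Targets are found through service discovery or static configuration
Multiple modes of graphing and dashboarding support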
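
The first feature, the multi-dimensional data model, can be made concrete with a small sketch. The helper name `series_id` and the sample metric and labels below are illustrative assumptions, not part of Prometheus itself; the point is only that each distinct combination of metric name and label set is a separate time series.

```python
def series_id(metric, labels):
    """Render one time series identity in the common name{label="value"} notation."""
    # Labels are sorted so the same label set always yields the same identifier.
    pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{metric}{{{pairs}}}" if pairs else metric


# Hypothetical metric and labels, used only for illustration.
a = series_id("http_requests_total", {"method": "GET", "code": "200"})
b = series_id("http_requests_total", {"method": "POST", "code": "200"})
print(a)
print(b)
```

The two identifiers differ only in the `method` label, yet they denote two independent series that PromQL can select or aggregate separately.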

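Components

The Prometheus ecosystem consists of multiple components, many of which are optional:

Prometheus Server: scrapes metrics and stores time series data
exporter: exposes metrics for Prometheus to scrape
pushgateway: a gateway to which jobs push their metrics
alertmanager: the component that handles alerts
ad hoc tools: used for querying data

Most Prometheus components are written in Go, which makes them easy to build and deploy as static binaries.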

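Architecture

The architecture diagram shows how Prometheus and some of its ecosystem components fit together.

The overall flow is fairly simple. Prometheus scrapes metrics either directly from its targets or from an intermediary Pushgateway, to which jobs push their metrics. It stores all scraped samples locally and runs rules over this data to produce aggregated series or alerts. Grafana or other tools are then used to visualize the data.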

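Installation

Because Prometheus is written in Go, installation is very simple: download the binary and run it. Get the release for your platform from https://prometheus.io/download.

Prometheus is started with a YAML configuration file. When running the binary directly, use the following command: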

./prometheus --config.file=prometheus.yml

The prometheus.yml configuration file:

global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

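The configuration file has three sections: global, rule_files, and scrape_configs.

The global section holds the global settings of Prometheus:

scrape_interval: how often Prometheus scrapes metrics. It is set to 15s here, overriding the built-in default of 1m, and can be overridden again per job
evaluation_interval: how often rules are evaluated. Prometheus uses rules to create new time series or to generate alerts

The rule_files section specifies where rule files are located. Prometheus loads the rules from these locations and uses them to generate new time series or alerts. We have not configured any rules yet.

scrape_configs controls which resources Prometheus monitors. Because Prometheus exposes its own metrics over HTTP, it can also monitor its own health. The default configuration contains a single job named prometheus, which scrapes the time series exposed by the Prometheus server itself. The job has a single, statically configured target: port 9090 on localhost. By default Prometheus scrapes targets at the /metrics path, so this job scrapes http://localhost:9090/metrics. The collected series describe the state and performance of the Prometheus server. To monitor other resources, add them under this section.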
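
How a configured target turns into the URL that is actually scraped can be sketched in a few lines. The function name `scrape_url` is a hypothetical helper; its parameters mirror the `scheme` and `metrics_path` settings of a scrape job and their defaults (`http` and `/metrics`).

```python
def scrape_url(target, scheme="http", metrics_path="/metrics"):
    """Build the URL requested for one target of a scrape job."""
    return f"{scheme}://{target}{metrics_path}"


# The default job from the configuration above.
print(scrape_url("localhost:9090"))
```

A job that sets `scheme: https`, like the kubelet job later in this article, would correspond to `scrape_url(target, scheme="https")`.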

Deploying on Kubernetes

Here we deploy all Prometheus-related services in the kube-ops namespace.

1. Store prometheus.yml in a ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-ops
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']

2. Create the Prometheus Pod resources

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    app: prometheus
spec:
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - image: prom/prometheus:v2.6.0
        name: prometheus
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=7d"
        - "--web.enable-admin-api"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          name: http
        volumeMounts:
        - mountPath: "/prometheus"
          subPath: prometheus
          name: data
        - mountPath: "/etc/prometheus"
          name: config
        resources:
          requests:
            cpu: 1000m
            memory: 2Gi
          limits:
            cpu: 1000m
            memory: 2Gi
      securityContext:
        runAsUser: 0
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      - name: data
        persistentVolumeClaim:
          claimName: prometheus

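storage.tsdb.path sets where the TSDB data is stored
storage.tsdb.retention sets how long data is kept
web.enable-admin-api enables access to the admin API
web.enable-lifecycle is very important: it enables hot reloading. With this flag set, once prometheus.yml has changed, sending an HTTP POST to http://localhost:9090/-/reload applies the new configuration immediately, so be sure to include it

We mount the ConfigMap that holds prometheus.yml into the Pod as a volume. When the ConfigMap is updated, the file inside the Pod is updated as well, after a short propagation delay. We then send the reload request described above and the new Prometheus configuration takes effect.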
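
The reload call can also be issued from a script. The following is a minimal sketch using only the Python standard library; `build_reload_request` is a hypothetical helper name, and the base URL is the localhost address used above. The request is built but not sent, so the snippet runs without a Prometheus server.

```python
import urllib.request


def build_reload_request(base_url):
    """Build (but do not send) the POST request that triggers a config reload."""
    url = base_url.rstrip("/") + "/-/reload"
    # An empty body with method="POST" is enough; the endpoint takes no payload.
    return urllib.request.Request(url, data=b"", method="POST")


req = build_reload_request("http://localhost:9090")
print(req.get_method(), req.full_url)

# To actually send it (needs a reachable Prometheus started with --web.enable-lifecycle):
# urllib.request.urlopen(req, timeout=10)
```

A plain GET to the same path does not trigger a reload, which is why the method is set explicitly.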

To persist the time series data, we bind the data directory to a PVC, so this PVC must be created in advance (we use a StorageClass here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus
  namespace: kube-ops
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: "rook-ceph-block"

Besides the points above, we also need to configure RBAC, because Prometheus needs to read Kubernetes resources. We create a ServiceAccount named prometheus, together with a ClusterRole and a ClusterRoleBinding:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-ops
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-ops

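Because the resources we want to read may exist in any namespace, we use a ClusterRole here. Note the nonResourceURLs attribute in the rules: it grants access to non-resource endpoints, in this case the /metrics path.

Also note that we must add a securityContext with runAsUser set to 0. Prometheus now runs as the user nobody, so without this setting it fails with a permission denied error like the following: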

level=error ts=2018-10-22T14:34:58.632016274Z caller=main.go:617 err="opening storage failed: lock DB directory: open /data/lock: permission denied"

We also need a Service so that Prometheus can be reached from outside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: kube-ops
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      port: 9090
      targetPort: http

Once the files are ready, create the resources with the following command:

kubectl apply -f .

Access

kubectl get svc -n kube-ops
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus   NodePort   10.111.210.47   <none>        9090:31990/TCP   7d

Prometheus is now reachable on port 31990 of any node.

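Monitoring options for Kubernetes clusters

The main options at present are:

Heapster: a cluster-wide monitoring and data aggregation tool that runs as a Pod in the cluster. Besides Kubelet/cAdvisor, other metric sources such as kube-state-metrics can be added to Heapster
cAdvisor: Google's open-source tool for monitoring container resource usage and analyzing performance. It was built specifically for containers and supports Docker containers natively. In Kubernetes it needs no separate installation, because cAdvisor is built into the kubelet.
Kube-state-metrics: listens to the API Server and generates state metrics about resource objects such as Deployments, Nodes, and Pods. Note that kube-state-metrics only exposes metrics and does not store them, so we can use Prometheus to scrape and store them.
metrics-server: also a cluster-wide aggregator of resource data, and the replacement for Heapster. Likewise, metrics-server only serves current data and provides no storage.

kube-state-metrics and metrics-server differ considerably, however. The main differences are:

kube-state-metrics focuses on metadata about the state of Kubernetes objects, such as Deployments, Pods, and replica status
metrics-server focuses on implementing the resource metrics API, which reports resource usage such as CPU and memory.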

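Monitoring Kubernetes cluster nodes

When monitoring a cluster, we generally need to consider the following aspects:

Kubernetes node monitoring: node CPU, load, disk, memory, and similar metrics
State of internal system components: the detailed runtime state of kube-scheduler, kube-controller-manager, kubedns/coredns, and so on
Orchestration-level metrics: Deployment status, resource requests, scheduling, API latency, and so on

Here we use Prometheus to collect node metrics, which node_exporter provides. As the name suggests, node_exporter is mainly used to collect runtime metrics from server nodes. It supports almost all common collectors, such as conntrack, cpu, diskstats, filesystem, loadavg, meminfo, and netstat; see its GitHub repo for the full list.

We can deploy it with a DaemonSet controller so that every node automatically runs one such Pod. When nodes are added to or removed from the cluster, the set of Pods adjusts automatically.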

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-ops
  labels:
    name: node-exporter
spec:
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.17.0
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 0.15
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '^/(sys|proc|dev|host|etc)($|/)'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /

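Because the data we want are host-level metrics while node-exporter runs in a container, the Pod needs some host-related settings. We set hostPID: true, hostIPC: true, and hostNetwork: true so that the Pod uses the host's PID namespace, IPC namespace, and network. These namespaces are the key technology behind container isolation. Note that they are a completely different concept from the namespaces in the cluster.

We also mount the host's /dev, /proc, and /sys directories into the container, because much of the node data is read from files under these directories. For example, the CPU usage shown by the top command comes from /proc/stat, and the memory usage shown by the free command comes from /proc/meminfo.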
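
To illustrate how memory figures can be derived from /proc/meminfo, here is a simplified sketch. It is not node_exporter's actual implementation; `parse_meminfo` is a hypothetical helper and the sample values are made up. Values with a kB unit are converted to bytes, which matches the byte-based metric names discussed later in this article.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Values carrying a "kB" unit are converted to bytes; unit-less values are kept as-is.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:        2048000 kB
MemFree:          512000 kB
HugePages_Total:       0
"""

mem = parse_meminfo(SAMPLE)
print(mem["MemTotal"], mem["MemFree"])
```

On a Linux host the same function could be fed the contents of /proc/meminfo, or of /host/proc/meminfo inside the container defined above.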

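In addition, because our cluster was set up with kubeadm, a corresponding toleration is needed if the master node should be monitored as well.

Create the resource objects above: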

$ kubectl apply -f node-exporter.yaml
$ kubectl get pods -n kube-ops -o wide | grep node-exporter
node-exporter-48b6g           1/1       Running   0          7d        172.16.138.42   k8s-node02
node-exporter-4swrs           1/1       Running   0          7d        172.16.138.43   k8s-node03
node-exporter-4w2dd           1/1       Running   0          7d        172.16.138.40   k8s-master
node-exporter-fcp9x           1/1       Running   0          7d        172.16.138.41   k8s-node01

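Once deployed, a Pod runs on every node (four in the output above, including the master). You may ask whether we need to create a Service, and how we fetch the /metrics data. Because we set hostNetwork: true above, port 9100 is bound on every node, and the metrics can be fetched through that port.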

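Service discovery

Because node-exporter runs on every node, putting a Service in front of the Pods and configuring it in Prometheus as a static target would leave Prometheus with a single target, and we would have to separate each node's data within the metrics ourselves. Is there a way for Prometheus to discover the node-exporter instances automatically and group them by node? There is: the service discovery mentioned earlier.

On Kubernetes, Prometheus integrates with the Kubernetes API and currently supports five service discovery roles: Node, Service, Pod, Endpoints, and Ingress.

To let Prometheus obtain all nodes in the cluster, we use the Node role. Add the following job to prometheus.yml: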

    - job_name: "kubernetes-nodes"
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

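By setting the role in kubernetes_sd_configs to node, Prometheus automatically discovers every node in the cluster and uses it as a target of this job. The discovered address defaults to the kubelet's HTTP endpoint on each node.

Notes on the configuration:

The regular expression matches __address__, keeps the host part, and replaces the port with 9100.

When Prometheus discovers targets with the Node role, the port defaults to 10250, which is the kubelet's port. The node metrics we want are not served there. They are exposed by node-exporter, which binds port 9100 on every node because we set hostNetwork: true, so 10250 must be replaced with 9100.

This is where the replace action of relabel_configs comes in. Relabeling can rewrite label values dynamically from a target's metadata before Prometheus scrapes it. It can also decide, based on that metadata, whether a target is scraped or dropped. Here we match the __address__ label and replace the port in it.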
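
The effect of the replace rule can be simulated outside Prometheus. The sketch below is an approximation in Python: `relabel_replace` is a hypothetical helper, Python's `re` module stands in for the RE2 engine Prometheus uses, and only the simple `${1}` reference style from the configuration above is handled. As in Prometheus, the regex has to match the whole value; otherwise the value is left unchanged.

```python
import re


def relabel_replace(value, regex, replacement):
    """Apply a replace-style relabel rule to one label value.

    The regex must match the whole value; if it does not, the value
    is returned unchanged.
    """
    match = re.fullmatch(regex, value)
    if match is None:
        return value
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    py_replacement = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", replacement)
    return match.expand(py_replacement)


# Address as discovered by the node role, using a node IP from the output above.
new_address = relabel_replace("172.16.138.40:10250", r"(.*):10250", "${1}:9100")
print(new_address)
```

An address that does not end in :10250 passes through untouched, so the rule is safe to apply to every discovered target.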

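The labelmap action adds Kubernetes labels as Prometheus labels. We add a rule whose action is labelmap and whose regular expression is __meta_kubernetes_node_label_(.+). Every label whose name matches is copied, under the name captured by the expression, into the labels of the scraped metrics.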
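
The labelmap behaviour can be sketched the same way. `labelmap` below is a hypothetical Python helper that approximates the action: it copies each label whose name fully matches the regex to the name given by the first capture group. The label values are illustrative.

```python
import re


def labelmap(labels, regex, replacement=r"\1"):
    """Copy every label whose name fully matches regex to a new name."""
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            result[match.expand(replacement)] = value
    return result


# Illustrative labels for one discovered node.
discovered = {
    "__address__": "172.16.138.41:10250",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "k8s-node01",
}
mapped = labelmap(discovered, r"__meta_kubernetes_node_label_(.+)")
print(mapped["kubernetes_io_hostname"])
```

Labels that start with a double underscore are internal and are removed once relabeling is finished, so only the copied label ends up on the stored series.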

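The following meta labels are available for the node role of kubernetes_sd_configs:

__meta_kubernetes_node_name: the name of the node object
__meta_kubernetes_node_label_<labelname>: each label of the node object
__meta_kubernetes_node_annotation_<annotationname>: each annotation of the node object
__meta_kubernetes_node_address_<address_type>: the first address for each node address type, if it exists

After updating the Prometheus ConfigMap, we run the reload operation again so that the configuration takes effect: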

$ kubectl delete -f prome-cm.yaml
$ kubectl create -f prome-cm.yaml
$ kubectl get svc -n kube-ops
$ curl -X POST "http://10.111.210.47:9090/-/reload"

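After the configuration takes effect, check on the Targets page of the Prometheus dashboard that the targets are scraped correctly, by visiting the Prometheus NodePort (31990 in the output above) on any node.

The kubelet also exposes some metrics of its own, so we configure a scrape job for the kubelet as well: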

    - job_name: 'kubernetes-kubelet'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

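This configuration is almost identical to the node-exporter job above. The difference is that it uses the https scheme and references two files, ca.crt and token. Both files are injected automatically when the Pod starts, and with them a process in the Pod can access the apiserver.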
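
What bearer_token_file amounts to on the wire is an Authorization header carrying the token. The sketch below shows this with the Python standard library; `build_authenticated_request` is a hypothetical helper, the token is a dummy value, and the URL assumes the usual in-cluster apiserver name kubernetes.default.svc. The request is only built, not sent.

```python
import urllib.request

# Path where the ServiceAccount token is mounted inside a Pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_authenticated_request(url, token):
    """Build (but do not send) a GET request carrying the ServiceAccount token."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {token.strip()}")
    return req


# Inside a Pod the token would be read from TOKEN_PATH; a dummy value is used here.
req = build_authenticated_request(
    "https://kubernetes.default.svc/api/v1/nodes", "dummy-token\n"
)
print(req.get_header("Authorization"))
```

Sending such a request for real would also need an SSL context that trusts the mounted ca.crt.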

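Now update the configuration again, run the reload operation, and open the Targets page of the Prometheus dashboard to check the new job.

Monitoring applications in the Kubernetes cluster

Monitoring an application with an exporter

Here we monitor a redis service with redis-exporter. For this kind of application, the exporter is usually deployed as a sidecar in the same Pod as the main application. We deploy a redis application and use redis-exporter to collect metrics for Prometheus.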

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: redis
  namespace: kube-ops
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: kube-ops
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: prom
    port: 9121
    targetPort: 9121

As shown above, the redis Pod contains two containers: the redis application itself and redis_exporter. Create the application:

$ kubectl create -f prome-redis.yaml
deployment.extensions "redis" created
service "redis" created

We can check on port 9121 that metrics are being collected:

$ curl 10.104.131.44:9121/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
......
# HELP redis_used_cpu_user_children used_cpu_user_childrenmetric
# TYPE redis_used_cpu_user_children gauge
redis_used_cpu_user_children{addr="redis://localhost:6379",alias=""} 0
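
The output above is the Prometheus text exposition format: comment lines starting with #, followed by sample lines of the form name{labels} value. A minimal parser sketch for such lines follows. `parse_sample` is a hypothetical helper; it deliberately ignores optional timestamps and does not unescape label values, so it is an illustration rather than a complete parser.

```python
import re

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one sample line into (metric_name, labels, value).

    Returns None for comments, blank lines, and lines that do not parse.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


# Lines taken from the curl output above.
sample = parse_sample('redis_used_cpu_user_children{addr="redis://localhost:6379",alias=""} 0')
print(sample)
print(parse_sample("# TYPE go_gc_duration_seconds summary"))
```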

Likewise, we now only need to update the Prometheus configuration file:

- job_name: 'redis'
  static_configs:
  - targets: ['redis:9121']

After updating the configuration file, reload it:

$ kubectl delete -f prome-cm.yaml
$ kubectl create -f prome-cm.yaml
$ curl -X POST "http://10.111.210.47:9090/-/reload"

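Installing and using Grafana

So far we have used Prometheus to collect metrics from the Kubernetes cluster, queried some of them with PromQL, and displayed them in the Prometheus dashboard. The charting in Prometheus is clearly limited, however, so a third-party tool is usually used to display the data. The one we use here is Grafana.

Installation

Grafana is a visualization dashboard with attractive charts and layouts, a full-featured metrics dashboard, and a graph editor. It supports Graphite, Zabbix, InfluxDB, Prometheus, OpenTSDB, Elasticsearch, and others as data sources. It is far more powerful and flexible than the charts built into Prometheus and has a rich set of plugins.

Next we install it. As before, we install Grafana into the Kubernetes cluster. The first step is to read the description of Grafana's Docker image, which can be found on Docker Hub or on the official site. The image is at https://hub.docker.com/r/grafana/grafana/, and the command given there for running a Grafana container is very simple: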

$ docker run -d --name=grafana -p 3000:3000 grafana/grafana

The Pod resources for deploying Grafana:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-ops
  labels:
    app: grafana
spec:
  revisionHistoryLimit: 10
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:5.4.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin321
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana

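We use the grafana/grafana:5.4.2 image (the latest at the time of writing) and add health checks and resource declarations. Two important environment variables, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD, set Grafana's admin user and password. Grafana stores dashboards, plugins, and similar data under /var/lib/grafana, so persisting the data requires a volume mount for that directory. The rest is the same as our earlier Deployments. Because the user ID and group ID used by the Grafana image changed in an earlier release (both are 472 in this image), we add a securityContext that sets them.

To persist the data with a PVC, a PV must be available for the PVC to bind to (here it is provisioned through the StorageClass):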

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-ops
  annotations:
    volume.beta.kubernetes.io/storage-class: "rook-ceph-block"
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: "rook-ceph-block"

Finally, we need to expose the Grafana service, so we need a corresponding Service object. Either a NodePort or an additional Ingress object would work:

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: kube-ops
  labels:
    app: grafana
spec:
  type: NodePort
  ports:
    - port: 3000
  selector:
    app: grafana

Now create all of these resource objects:

$ kubectl create -f .

We can then look up the port assigned to the Service:

kubectl get svc -n kube-ops
NAME         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
grafana      NodePort   10.97.81.127    <none>        3000:30489/TCP   5d
prometheus   NodePort   10.111.210.47   <none>        9090:31990/TCP   7d

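Grafana is exposed on port 30489, so it can be reached on port 30489 of any node.

Because we configured an admin account above, the first visit redirects to the login page, where we can log in with the values of the two environment variables set above.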

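Configuration

Next, click Add data source to open the page for adding a data source.

The data source we configure is Prometheus, so select that Type and give the data source a name: prometheus. The most important part is the HTTP section below it, which configures how the data source is accessed.

The access mode controls how requests to the data source are handled:

Server access mode (default): all requests are sent from the browser to the Grafana backend, which forwards them to the data source. This avoids cross-origin problems, because the Grafana backend simply proxies the request. The URL must be reachable from the Grafana backend server.
Browser access mode: all requests are sent from the browser directly to the data source, which may run into cross-origin restrictions. The URL must be reachable directly from the browser.

Because Prometheus is exposed through a NodePort, we could use Browser access mode with the external address of Prometheus, but that is clearly not the best choice, since the traffic would go over the external network. Prometheus and Grafana both live in the kube-ops namespace, so they can reach each other inside the cluster through DNS, and all traffic stays on the internal network. Server access mode is therefore the better choice, with the data source URL http://prometheus:9090 (the bare Service name works because both are in the same namespace). The remaining settings depend on your environment. We have no Auth configured, so skip it. Click Save & Test at the bottom; a success message confirms that the data source is configured correctly.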
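
The short name http://prometheus:9090 works only from within the same namespace. A small sketch of how in-cluster Service URLs are formed follows; `service_url` is a hypothetical helper, and `cluster.local` is assumed as the cluster domain, which is the common default but may differ in your cluster.

```python
def service_url(name, port, namespace=None, cluster_domain="cluster.local", scheme="http"):
    """Build an in-cluster URL for a Service.

    With namespace=None the bare Service name is used, which resolves only
    from Pods in the same namespace.
    """
    host = name if namespace is None else f"{name}.{namespace}.svc.{cluster_domain}"
    return f"{scheme}://{host}:{port}"


print(service_url("prometheus", 9090))
print(service_url("prometheus", 9090, namespace="kube-ops"))
```

The fully qualified form is the one to use if Grafana is ever moved to a different namespace than Prometheus.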

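Once the data source is added, we can add a Dashboard.

Configuring a Dashboard

Back on the home page, we can create a Dashboard by hand to suit our needs. In addition, the Grafana website hosts many public Dashboards. Here we use Kubernetes cluster monitoring (via Prometheus) (dashboard ID 162) to display the cluster's monitoring data. Click import under Create in the left sidebar.

Then enter 162 to import it (it is already imported in my environment, so Grafana reports that it already exists).

Before running the import, remember to select our data source named prometheus. After the import completes, the dashboard page opens.

If no data appears, there are two likely causes:

1. The time range is set to UTC
2. The queries are incorrect

For example, clicking Edit on a panel shows its query: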

(sum(node_memory_MemTotal) - sum(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached) ) / sum(node_memory_MemTotal) * 100

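This is the kind of PromQL statement we ran in Prometheus earlier, so we can copy it into the Graph page of Prometheus and run it. As expected, it returns no data, because the metric collected by our node_exporter is not named node_memory_MemTotal but node_memory_MemTotal_bytes. Update the PromQL statement accordingly.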
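
The renaming can be done mechanically for the memory metrics used in this query. The following sketch appends the _bytes suffix to node_memory_* names that lack it. `add_bytes_suffix` is a hypothetical helper, and it assumes that every node_memory_* metric in the query is byte-valued, which holds for the four metrics here but not for every metric node_exporter exposes.

```python
import re


def add_bytes_suffix(query):
    """Append _bytes to node_memory_* metric names that lack it."""
    def fix(match):
        name = match.group(0)
        return name if name.endswith("_bytes") else name + "_bytes"
    return re.sub(r"\bnode_memory_[A-Za-z0-9_]+", fix, query)


# The query shown in the panel editor above.
old = ("(sum(node_memory_MemTotal) - sum(node_memory_MemFree_bytes"
       "+node_memory_Buffers_bytes+node_memory_Cached) ) "
       "/ sum(node_memory_MemTotal) * 100")
fixed = add_bytes_suffix(old)
print(fixed)
```

Running the function on an already corrected query leaves it unchanged, so it can be applied to all panels of the dashboard without checking each one first.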

Then fix the other charts in the Dashboard in the same way.

You can also search the Grafana dashboards page at https://grafana.com/dashboards for other Kubernetes dashboards, for example the ones with IDs 747 and 741.
