A Detailed Look at Kubernetes' Native Cluster Monitoring Solution (Heapster + InfluxDB + Grafana)

1. A Brief Look at the Monitoring Solution

Heapster is a tool that monitors cluster resources such as compute, storage, and network. It uses cAdvisor (built into the kubelet on each node) as its data source to collect cluster information and aggregate it into valuable performance metrics: CPU, memory, network, filesystem, and so on. It then writes these metrics to an external storage backend, such as InfluxDB, where they can finally be visualized through a corresponding UI, such as Grafana. Both Heapster's data sources and its storage backends are pluggable, so many monitoring stacks can be assembled flexibly, e.g. Heapster + ElasticSearch + Kibana. Heapster's overall architecture diagram: [figure from the original post, not reproduced in this copy]
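As a sketch of that pluggability, both ends are just command-line flags on the heapster binary. The InfluxDB form below is the one used later in this post; the in-cluster service address shown here is illustrative:

/heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086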

2. Deployment

In this post we will put the Heapster + InfluxDB + Grafana monitoring stack into practice. The yml files provided upstream have a few small issues; please refer to the changes and notes below:

2.1 Create the InfluxDB Resource Objects

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-influxdb
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: influxdb
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: influxdb
    spec:
      containers:
      - name: influxdb
        image: k8s.gcr.io/heapster-influxdb-amd64:v1.3.3
        volumeMounts:
        - mountPath: /data
          name: influxdb-storage
      volumes:
      - name: influxdb-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-influxdb
  name: monitoring-influxdb
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - nodePort: 31001
    port: 8086
    targetPort: 8086
  selector:
    k8s-app: influxdb

Note: here we use a NodePort to expose the monitoring-influxdb service on host port 31001, so the InfluxDB server address is http://[host-ip]:31001. Write this address down; it can be used directly when creating Heapster and when configuring the Grafana data source.
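To sanity-check the deployment, a minimal sketch (assuming the manifest above is saved as influxdb.yaml; the file name is arbitrary):

kubectl apply -f influxdb.yaml
kubectl get pods,svc -n kube-system -l k8s-app=influxdb

# InfluxDB 1.x answers health checks on /ping with HTTP 204
curl -i http://[host-ip]:31001/ping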

2.2 Create the Grafana Resource Objects

apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v4.4.3
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  # In a production setup, we recommend accessing Grafana through an external Loadbalancer
  # or through a public IP.
  # type: LoadBalancer
  # You could also use NodePort to expose the service at a randomly-generated port
  type: NodePort
  ports:
  - nodePort: 30108
    port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana

Although Heapster already preconfigures Grafana's Datasource and Dashboard, for easier access we again use a NodePort to expose the monitoring-grafana service, on host port 30108, so the Grafana address is http://registry.wuling.com:30108. Open it in a browser and edit Grafana's data source as shown in the original post's screenshot; the field highlighted in red is the InfluxDB server address recorded in the previous step.
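For reference, the data source form boils down to the following values (a sketch; the database name k8s and the root/root credentials are the defaults of the bundled heapster-influxdb image, so adjust them if your setup differs):

Type:     InfluxDB
Url:      http://[host-ip]:31001   # the InfluxDB address recorded in step 2.1
Access:   proxy
Database: k8s                      # the database Heapster writes to by default
User:     root
Password: root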

2.3 Create the Heapster Resource Objects

apiVersion: v1
kind: ServiceAccount
metadata:
  name: heapster
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: heapster
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: heapster
    spec:
      serviceAccountName: heapster
      containers:
      - name: heapster
        image: k8s.gcr.io/heapster-amd64:v1.4.2
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default
        - --sink=influxdb:http://150.109.39.33:31001  # the InfluxDB server address recorded earlier
---
apiVersion: v1
kind: Service
metadata:
  labels:
    task: monitoring
    # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
    # If you are NOT using this as an addon, you should comment out this line.
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: Heapster
  name: heapster
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 8082
  selector:
    k8s-app: heapster

--source specifies the data source Heapster collects cluster information from. See: https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md
--sink specifies Heapster's storage backend; here we use InfluxDB. For the other backends, see: https://github.com/kubernetes/heapster/blob/master/docs/sink-owners.md

Heapster leaves a trap here, so read on. After deploying Heapster, I checked the Heapster pod's stdout (screenshot in the original post: the API server was rejecting Heapster's requests). Many people assume this is an HTTPS or cluster configuration problem and rush off to wire up an insecure HTTP connection instead, which only digs the hole deeper and makes the problem murkier. I did the same for quite a while and got nowhere; a long story omitted here. Only after exhausting those dead ends and re-reading the official documentation (quoted as a screenshot in the original post) did I realize it is a permissions problem: Heapster by default authenticates to the API server with a service account token, and heapster.yml declares serviceAccountName: heapster. Now it is clear — the heapster service account simply lacks the necessary permissions. How do we authorize it? Bind a sufficiently privileged role to that service account, as follows:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: heapster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: heapster
  namespace: kube-system

Just add this snippet when creating the Heapster resources and you are done.
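To apply and verify, a minimal sketch (assuming the RBAC snippet above is appended to the heapster.yml from step 2.3):

kubectl apply -f heapster.yml

# the 'forbidden' errors should now be gone from the logs
kubectl logs -n kube-system -l k8s-app=heapster

Note that cluster-admin is a very broad role; on clusters that ship the built-in system:heapster ClusterRole, binding that narrower role instead is a safer choice.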

3. Viewing Application Performance Metrics Across Different Dimensions

In a Kubernetes cluster, application performance metrics need to be aggregated across different dimensions (containers, pods, services, and whole clusters), so that users can gain deep insight into how their applications are performing and where potential bottlenecks lie.
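With Heapster running, the same metrics also back kubectl's top subcommand (in clusters of this era, kubectl top is served by Heapster), which makes for a quick command-line sanity check:

kubectl top node
kubectl top pod --all-namespaces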

3.1 Viewing the Cluster Overview via the Dashboard

Once the whole monitoring stack is deployed successfully, the dashboard can present each object's CPU and memory usage at different granularities/dimensions (see the screenshot in the original post).
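One common way to reach the dashboard for this check is through the API server proxy (the exact URL depends on your dashboard version, so treat this as a sketch):

kubectl proxy --port=8001
# then browse to, e.g.:
#   http://localhost:8001/ui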

3.2 Viewing Cluster Details via Grafana (CPU, memory, filesystem, network)

Through Grafana you can inspect the resource usage of any node or pod, including cluster nodes and individual pods in any namespace; a few screenshots are shown in the original post. As you can see, Heapster integrates seamlessly with Grafana, presenting the data intuitively and pleasantly. You can also dig deeper into Grafana and customize dashboards that are prettier or better suited to your specific business needs.
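If you want to look at the raw series behind those panels, you can also query InfluxDB directly over its HTTP API (a sketch; k8s is the database Heapster writes to by default, and cpu/usage_rate is one of Heapster's typical measurement names):

# list the measurements Heapster has written
curl -G 'http://[host-ip]:31001/query' --data-urlencode 'db=k8s' --data-urlencode 'q=SHOW MEASUREMENTS'

# sample a few recent CPU usage points
curl -G 'http://[host-ip]:31001/query' --data-urlencode 'db=k8s' --data-urlencode 'q=SELECT * FROM "cpu/usage_rate" LIMIT 5'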

4. Summary

In this post we walked through Kubernetes' native monitoring solution in detail. It mainly monitors pods and nodes; for the other Kubernetes components (API Server, Scheduler, Controller Manager, etc.) it falls short, whereas Prometheus (an open-source combination of monitoring, alerting, and a time-series database) is far more comprehensive, and I will cover it hands-on when time permits. Monitoring is a very large topic: the purpose of monitoring is alerting, and the purpose of alerting is to guide the system toward self-healing. Only when all three links of monitoring => alerting => self-healing are in place, automating application performance and fault management, can a system be called a true Application Performance Management (APM) system. This series will keep working toward that goal, so please stay tuned. If you have good ideas, feel free to share them in the comments.

Further Reading

https://github.com/kubernetes/heapster

If you found this article helpful, thank you for clicking 【推薦】. If you are interested in kubernetes, feel free to follow me; I regularly share my learning notes on this blog.
