K8S (13) Monitoring in Practice: Deploying Prometheus


1 Prometheus background

Because of the way Docker containers work, traditional Zabbix cannot monitor the state of the Docker containers inside a k8s cluster, so Prometheus is used for monitoring instead.
Prometheus official site: https://prometheus.io/

1.1 Features of Prometheus

  • A multi-dimensional data model, storing metrics in a time-series database (TSDB) rather than MySQL.
  • A flexible query language, PromQL.
  • No reliance on distributed storage; each server node is autonomous.
  • Time-series data is collected mainly through an HTTP-based pull model.
  • Data pushed to Pushgateway can also be collected from the gateway.
  • Scrape targets are found through service discovery or static configuration.
  • A wide range of charts and dashboards are supported, e.g. Grafana.

1.2 Basic principles

1.2.1 How it works

The basic principle of Prometheus is to periodically scrape the state of monitored components through the HTTP interfaces exposed by various exporters; any component can be monitored as long as it exposes a suitable HTTP endpoint.
No SDK or other integration work is required, which makes it very well suited to monitoring virtualized environments such as VMs, Docker, and Kubernetes.
Most components commonly used by internet companies already have ready-made exporters, e.g. Nginx, MySQL, and Linux system metrics.
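
For example, any exporter can be scraped by hand with a plain HTTP GET, which is exactly what Prometheus does on its schedule. A minimal sketch, assuming a node-exporter is already listening on localhost:9100:

# Fetch the text exposition format that Prometheus parses:
# "# HELP" / "# TYPE" comment lines followed by metric samples
curl -s http://localhost:9100/metrics | head -20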

1.2.2 Architecture diagram

[Prometheus architecture diagram]

1.2.3 The three main components

  • Server: responsible for scraping and storing metrics, and provides the PromQL query language.
  • Alertmanager: the alert manager, responsible for sending alerts.
  • Push Gateway: an intermediate gateway that accepts metrics pushed by short-lived jobs.

1.2.4 Architecture workflow

  1. The Prometheus daemon periodically scrapes metrics from its targets.
    Every scrape target must expose an HTTP endpoint for Prometheus to scrape.
    Targets can be specified through configuration files, text files, Zookeeper, DNS SRV lookups, and so on.
  2. Pushgateway lets clients push metrics to it actively,
    while Prometheus simply scrapes the gateway on its regular schedule.
    This suits one-off, short-lived jobs.
  3. Prometheus stores all scraped data in its TSDB,
    cleans and aggregates it according to rules, and writes the results into new time series.
  4. Prometheus exposes the collected data for visualization through PromQL and other APIs.
    Charting is supported through Grafana, Promdash, and similar tools.
    Prometheus also offers an HTTP API so queries can be issued and the output customized.
  5. Alertmanager is an alerting component independent of Prometheus.
    It supports Prometheus query expressions and provides very flexible alerting methods.
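
As a concrete illustration of the HTTP query API mentioned in step 4, an instant PromQL query can be issued with curl (a sketch, assuming a Prometheus server reachable on localhost:9090):

# Evaluate the PromQL expression `up`; the JSON response contains one sample
# per scrape target (value 1 = last scrape succeeded, 0 = it failed)
curl -s 'http://localhost:9090/api/v1/query?query=up'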

1.2.5 Commonly used exporters

Unlike Zabbix, Prometheus has no agent; it relies on exporters tailored to each service.
To monitor a k8s cluster together with its nodes and pods, four exporters are commonly used:

  • kube-state-metrics
    Collects basic state information about the k8s cluster, such as master and etcd status.
  • node-exporter
    Collects information about the k8s cluster nodes.
  • cadvisor
    Collects resource usage of the Docker containers inside the k8s cluster.
  • blackbox-exporter
    Checks whether the containerized services in the k8s cluster are alive.

2 Deploying the four exporters

The usual routine: pull the Docker image, prepare the resource manifests, then apply them:

2.1 Deploy kube-state-metrics

2.1.1 Prepare the Docker image

docker pull quay.io/coreos/kube-state-metrics:v1.5.0
docker tag  91599517197a harbor.zq.com/public/kube-state-metrics:v1.5.0
docker push harbor.zq.com/public/kube-state-metrics:v1.5.0

Prepare the directory

mkdir /data/k8s-yaml/kube-state-metrics
cd /data/k8s-yaml/kube-state-metrics

2.1.2 Prepare the RBAC manifest

cat >rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
EOF

2.1.3 Prepare the Deployment manifest

cat >dp.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    grafanak8sapp: "true"
    app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      grafanak8sapp: "true"
      app: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        grafanak8sapp: "true"
        app: kube-state-metrics
    spec:
      containers:
      - name: kube-state-metrics
        image: harbor.zq.com/public/kube-state-metrics:v1.5.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
      serviceAccountName: kube-state-metrics
EOF

2.1.4 Apply the manifests

Run on any node:

kubectl apply -f http://k8s-yaml.zq.com/kube-state-metrics/rbac.yaml
kubectl apply -f http://k8s-yaml.zq.com/kube-state-metrics/dp.yaml

Verify:

kubectl get pod -n kube-system -o wide|grep kube-state-metrics
~]# curl http://172.7.21.4:8080/healthz
ok

A response of "ok" means it is running successfully.
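
Beyond the health endpoint, the metrics themselves can be pulled from the same pod IP to confirm that cluster-state data is being exported (a quick sketch; 172.7.21.4 is just the pod IP from the example above):

# kube-state-metrics exposes cluster object state, e.g. pod phases
curl -s http://172.7.21.4:8080/metrics | grep -m 5 '^kube_pod_status_phase'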

2.2 Deploy node-exporter

Since node-exporter monitors the nodes themselves, one instance must run on each node, so a DaemonSet (ds) controller is used.

2.2.1 Prepare the Docker image

docker pull prom/node-exporter:v0.15.0
docker tag 12d51ffa2b22 harbor.zq.com/public/node-exporter:v0.15.0
docker push harbor.zq.com/public/node-exporter:v0.15.0

Prepare the directory

mkdir /data/k8s-yaml/node-exporter
cd /data/k8s-yaml/node-exporter

2.2.2 Prepare the DaemonSet manifest

cat >ds.yaml <<'EOF'
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    daemon: "node-exporter"
    grafanak8sapp: "true"
spec:
  selector:
    matchLabels:
      daemon: "node-exporter"
      grafanak8sapp: "true"
  template:
    metadata:
      name: node-exporter
      labels:
        daemon: "node-exporter"
        grafanak8sapp: "true"
    spec:
      volumes:
      - name: proc
        hostPath: 
          path: /proc
          type: ""
      - name: sys
        hostPath:
          path: /sys
          type: ""
      containers:
      - name: node-exporter
        image: harbor.zq.com/public/node-exporter:v0.15.0
        imagePullPolicy: IfNotPresent
        args:
        - --path.procfs=/host_proc
        - --path.sysfs=/host_sys
        ports:
        - name: node-exporter
          hostPort: 9100
          containerPort: 9100
          protocol: TCP
        volumeMounts:
        - name: sys
          readOnly: true
          mountPath: /host_sys
        - name: proc
          readOnly: true
          mountPath: /host_proc
      hostNetwork: true
EOF

The key point is mounting the host's /proc and /sys directories into the container so that the container can read the node host's information.

2.2.3 Apply the manifest

On any node:

kubectl apply -f http://k8s-yaml.zq.com/node-exporter/ds.yaml
kubectl get pod -n kube-system -o wide|grep node-exporter
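
Because the DaemonSet runs with hostNetwork and hostPort 9100, each node's exporter can be checked directly on the node itself (a quick sketch):

# Count the node_* series exposed by this node's exporter
curl -s http://localhost:9100/metrics | grep -c '^node_'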

2.3 Deploy cadvisor

2.3.1 Prepare the Docker image

docker pull google/cadvisor:v0.28.3
docker tag 75f88e3ec333 harbor.zq.com/public/cadvisor:v0.28.3
docker push harbor.zq.com/public/cadvisor:v0.28.3

Prepare the directory

mkdir /data/k8s-yaml/cadvisor
cd /data/k8s-yaml/cadvisor

2.3.2 Prepare the DaemonSet manifest

Because cadvisor needs to collect pod information on every node, it also has to run as a DaemonSet.

cat >ds.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      hostNetwork: true
#------ The pod's tolerations pair with node Taints to allow scheduling onto tainted (master) nodes ----
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
#-------------------------------------
      containers:
      - name: cadvisor
        image: harbor.zq.com/public/cadvisor:v0.28.3
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
          - name: http
            containerPort: 4194
            protocol: TCP
        readinessProbe:
          tcpSocket:
            port: 4194
          initialDelaySeconds: 5
          periodSeconds: 10
        args:
          - --housekeeping_interval=10s
          - --port=4194
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /data/docker
EOF

2.3.3 Apply the manifest

Before applying the manifest, run the following on every node to create a symlink, otherwise the service may report errors:

mount -o remount,rw /sys/fs/cgroup/
ln -s /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/cpuacct,cpu

Apply the manifest:

kubectl apply -f http://k8s-yaml.zq.com/cadvisor/ds.yaml

Check:

kubectl -n kube-system get pod -o wide|grep cadvisor
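
cadvisor also uses hostNetwork, so its metrics can be spot-checked from any node on port 4194 (a quick sketch):

# Per-container metrics such as container_cpu_usage_seconds_total should appear
curl -s http://localhost:4194/metrics | grep -m 3 '^container_cpu_usage_seconds_total'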

2.4 Deploy blackbox-exporter

2.4.1 Prepare the Docker image

docker pull prom/blackbox-exporter:v0.15.1
docker tag  81b70b6158be  harbor.zq.com/public/blackbox-exporter:v0.15.1
docker push harbor.zq.com/public/blackbox-exporter:v0.15.1

Prepare the directory

mkdir /data/k8s-yaml/blackbox-exporter
cd /data/k8s-yaml/blackbox-exporter

2.4.2 Prepare the ConfigMap manifest

cat >cm.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 2s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: [200,301,302]
          method: GET
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 2s
EOF

2.4.3 Prepare the Deployment manifest

cat >dp.yaml <<'EOF'
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: blackbox-exporter
  namespace: kube-system
  labels:
    app: blackbox-exporter
  annotations:
    deployment.kubernetes.io/revision: 1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
          defaultMode: 420
      containers:
      - name: blackbox-exporter
        image: harbor.zq.com/public/blackbox-exporter:v0.15.1
        imagePullPolicy: IfNotPresent
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=info
        - --web.listen-address=:9115
        ports:
        - name: blackbox-port
          containerPort: 9115
          protocol: TCP
        resources:
          limits:
            cpu: 200m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 50Mi
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
EOF

2.4.4 Prepare the Service manifest

cat >svc.yaml <<'EOF'
kind: Service
apiVersion: v1
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    app: blackbox-exporter
  ports:
    - name: blackbox-port
      protocol: TCP
      port: 9115
EOF

2.4.5 Prepare the Ingress manifest

cat >ingress.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  rules:
  - host: blackbox.zq.com
    http:
      paths:
      - path: /
        backend:
          serviceName: blackbox-exporter
          servicePort: blackbox-port
EOF

2.4.6 Add DNS resolution

A domain name is used here, so add a DNS record:

vi /var/named/zq.com.zone
blackbox       A    10.4.7.10

systemctl restart named

2.4.7 Apply the manifests

kubectl apply -f http://k8s-yaml.zq.com/blackbox-exporter/cm.yaml
kubectl apply -f http://k8s-yaml.zq.com/blackbox-exporter/dp.yaml
kubectl apply -f http://k8s-yaml.zq.com/blackbox-exporter/svc.yaml
kubectl apply -f http://k8s-yaml.zq.com/blackbox-exporter/ingress.yaml

2.4.8 Test by visiting the domain

Visit http://blackbox.zq.com; if the page below is displayed, blackbox is up and running.
[screenshot: blackbox-exporter web UI]
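
The /probe endpoint can also be exercised by hand through the same domain, which is exactly how Prometheus will call it later (a sketch; the target here is just the etcd address used elsewhere in this environment):

# probe_success 1 in the output means the tcp_connect check passed
curl -s 'http://blackbox.zq.com/probe?module=tcp_connect&target=10.4.7.21:2379'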

3 Deploying the Prometheus server

3.1 Prepare the Prometheus server environment

3.1.1 Prepare the Docker image

docker pull prom/prometheus:v2.14.0
docker tag  7317640d555e harbor.zq.com/infra/prometheus:v2.14.0
docker push harbor.zq.com/infra/prometheus:v2.14.0

Prepare the directory

mkdir /data/k8s-yaml/prometheus-server
cd /data/k8s-yaml/prometheus-server

3.1.2 Prepare the RBAC manifest

cat >rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
  namespace: infra
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: infra
EOF

3.1.3 Prepare the Deployment manifest

Adding --web.enable-lifecycle enables remote hot reloading of the configuration file, so Prometheus does not have to be restarted after the config changes;
the reload is triggered with curl -X POST http://localhost:9090/-/reload.
--storage.tsdb.min-block-duration=10m keeps only about 10 minutes of data in memory.
--storage.tsdb.retention=72h retains 72 hours of data.
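
With these flags in place, a config change can be picked up without restarting the pod. A minimal sketch of the reload flow, using the domain set up in section 3.1.6 (any address that reaches port 9090 works):

# Edit the config on the NFS share, then ask the running server to reload it
vim /data/nfs-volume/prometheus/etc/prometheus.yml
curl -X POST http://prometheus.zq.com/-/reload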

cat >dp.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  labels:
    name: prometheus
  name: prometheus
  namespace: infra
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 7
  selector:
    matchLabels:
      app: prometheus
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: harbor.zq.com/infra/prometheus:v2.14.0
        imagePullPolicy: IfNotPresent
        command:
        - /bin/prometheus
        args:
        - --config.file=/data/etc/prometheus.yml
        - --storage.tsdb.path=/data/prom-db
        - --storage.tsdb.min-block-duration=10m
        - --storage.tsdb.retention=72h
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /data
          name: data
        resources:
          requests:
            cpu: "1000m"
            memory: "1.5Gi"
          limits:
            cpu: "2000m"
            memory: "3Gi"
      imagePullSecrets:
      - name: harbor
      securityContext:
        runAsUser: 0
      serviceAccountName: prometheus
      volumes:
      - name: data
        nfs:
          server: hdss7-200
          path: /data/nfs-volume/prometheus
EOF

3.1.4 Prepare the Service manifest

cat >svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: infra
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
EOF

3.1.5 Prepare the Ingress manifest

cat >ingress.yaml <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  name: prometheus
  namespace: infra
spec:
  rules:
  - host: prometheus.zq.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus
          servicePort: 9090
EOF

3.1.6 Add DNS resolution

The domain prometheus.zq.com is used here, so add a DNS record:

vi /var/named/zq.com.zone
prometheus         A    10.4.7.10

systemctl restart named

3.2 Deploy the Prometheus server

3.2.1 Prepare directories and certificates

mkdir -p /data/nfs-volume/prometheus/etc
mkdir -p /data/nfs-volume/prometheus/prom-db
cd /data/nfs-volume/prometheus/etc/

# Copy the certificates referenced in the configuration file:
cp /opt/certs/ca.pem ./
cp /opt/certs/client.pem ./
cp /opt/certs/client-key.pem ./

3.2.2 Create the Prometheus configuration file

Notes on the configuration file:
This is a general-purpose configuration. Apart from the first job, etcd, which uses static configuration, the other eight jobs all use service discovery.
Therefore, after adjusting only the etcd section, it can be used directly in production.

cat >/data/nfs-volume/prometheus/etc/prometheus.yml <<'EOF'
global:
  scrape_interval:     15s
  evaluation_interval: 15s
scrape_configs:
- job_name: 'etcd'
  tls_config:
    ca_file: /data/etc/ca.pem
    cert_file: /data/etc/client.pem
    key_file: /data/etc/client-key.pem
  scheme: https
  static_configs:
  - targets:
    - '10.4.7.12:2379'
    - '10.4.7.21:2379'
    - '10.4.7.22:2379'
    
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
    
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
    
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:10255
    
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: ${1}:4194
    
- job_name: 'kubernetes-kube-state'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
  - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
    regex: .*true.*
    action: keep
  - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
    regex: 'node-exporter;(.*)'
    action: replace
    target_label: nodename
    
- job_name: 'blackbox_http_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: http
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port,  __meta_kubernetes_pod_annotation_blackbox_path]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+);(.+)
    replacement: $1:$2$3
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
    
- job_name: 'blackbox_tcp_pod_probe'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: pod
  params:
    module: [tcp_connect]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_blackbox_scheme]
    action: keep
    regex: tcp
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_blackbox_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __param_target
  - action: replace
    target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
    
- job_name: 'traefik'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
    action: keep
    regex: traefik
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
EOF
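
Before applying, the file can optionally be checked with promtool, which ships inside the Prometheus image (a sketch; it assumes the image keeps the upstream /bin/promtool binary):

# Validate prometheus.yml syntax without starting a server
docker run --rm --entrypoint /bin/promtool \
  -v /data/nfs-volume/prometheus/etc:/data/etc \
  harbor.zq.com/infra/prometheus:v2.14.0 \
  check config /data/etc/prometheus.yml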

3.2.3 Apply the manifests

kubectl apply -f http://k8s-yaml.zq.com/prometheus-server/rbac.yaml
kubectl apply -f http://k8s-yaml.zq.com/prometheus-server/dp.yaml
kubectl apply -f http://k8s-yaml.zq.com/prometheus-server/svc.yaml
kubectl apply -f http://k8s-yaml.zq.com/prometheus-server/ingress.yaml

3.2.4 Verify in a browser

Visit http://prometheus.zq.com; if the page loads, the server started successfully.
Click Status -> Configuration to see the configuration file we provided.
[screenshot: Prometheus Status -> Configuration page]

4 Making services automatically monitored by Prometheus

Click Status -> Targets to see the job names configured in prometheus.yml; these targets basically cover our data-collection needs.
[screenshot: Prometheus Status -> Targets page]

Five of the jobs have already been discovered and are collecting data.
Next, the services behind the remaining four jobs need to be brought under monitoring.
They are brought in by adding annotations to the services whose data should be collected.
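
The same information can also be pulled from the API instead of the web UI; a quick sketch that lists which jobs currently have active targets:

# List the job names that currently have discovered targets
curl -s http://prometheus.zq.com/api/v1/targets | grep -o '"job":"[^"]*"' | sort -u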

4.1 Making traefik automatically monitored

4.1.1 Modify the traefik YAML

Edit the traefik YAML file and add an annotations block at the same level as labels:

vim /data/k8s-yaml/traefik/ds.yaml
........
spec:
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress
        name: traefik-ingress
#-------- additions start --------
      annotations:
        prometheus_io_scheme: "traefik"
        prometheus_io_path: "/metrics"
        prometheus_io_port: "8080"
#-------- additions end --------
    spec:
      serviceAccountName: traefik-ingress-controller
........

Re-apply the configuration on any node:

kubectl delete -f http://k8s-yaml.zq.com/traefik/ds.yaml
kubectl apply  -f http://k8s-yaml.zq.com/traefik/ds.yaml
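
Optionally confirm that the annotations actually landed on the restarted pods (a sketch; it assumes traefik runs in kube-system with the k8s-app=traefik-ingress label shown above):

# Print the annotations of the traefik pods
kubectl -n kube-system get pods -l k8s-app=traefik-ingress \
  -o jsonpath='{.items[*].metadata.annotations}{"\n"}'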

4.1.2 Check the result

Once the pods have restarted, check in Prometheus whether the traefik job is now collecting data.
[screenshot: traefik targets collecting data in Prometheus]

4.2 Using blackbox to check TCP/HTTP service status

blackbox checks the liveness of services inside containers, i.e. port health checks, with two methods: TCP and HTTP.
Use HTTP whenever possible; fall back to TCP only for services that do not expose an HTTP interface.

4.2.1 Prepare the services to be probed

The dubbo services in the test environment are used for this demonstration; other environments work the same way.

  1. In the dashboard, start apollo-portal and the apollo in the test namespace.
  2. dubbo-demo-service will use a TCP annotation.
  3. dubbo-demo-consumer will use an HTTP annotation.

4.2.2 Add the TCP annotation

Once both services are up, first add a TCP annotation to the dubbo-demo-service resource:

vim /data/k8s-yaml/test/dubbo-demo-server/dp.yaml
......
spec:
......
  template:
    metadata:
      labels:
        app: dubbo-demo-service
        name: dubbo-demo-service
#-------- additions start --------
      annotations:
        blackbox_port: "20880"
        blackbox_scheme: "tcp"
#-------- additions end --------
    spec:
      containers:
        image: harbor.zq.com/app/dubbo-demo-service:apollo_200512_0746

Re-apply the configuration on any node:

kubectl delete -f http://k8s-yaml.zq.com/test/dubbo-demo-server/dp.yaml
kubectl apply  -f http://k8s-yaml.zq.com/test/dubbo-demo-server/dp.yaml

Check http://blackbox.zq.com/ and http://prometheus.zq.com/targets in a browser:
the dubbo-demo-service we are running has been discovered and its TCP port 20880 is now being monitored.
[screenshot: dubbo-demo-service TCP port 20880 discovered in Prometheus targets]

4.2.3 Add the HTTP annotation

Next, add an HTTP annotation to the dubbo-demo-consumer resource:

vim /data/k8s-yaml/test/dubbo-demo-consumer/dp.yaml 
spec:
......
  template:
    metadata:
      labels:
        app: dubbo-demo-consumer
        name: dubbo-demo-consumer
#-------- additions start --------
      annotations:
        blackbox_path: "/hello?name=health"
        blackbox_port: "8080"
        blackbox_scheme: "http"
#-------- additions end --------
    spec:
      containers:
      - name: dubbo-demo-consumer
......

Re-apply the configuration on any node:

kubectl delete -f http://k8s-yaml.zq.com/test/dubbo-demo-consumer/dp.yaml
kubectl apply  -f http://k8s-yaml.zq.com/test/dubbo-demo-consumer/dp.yaml

[screenshot: dubbo-demo-consumer HTTP probe discovered in Prometheus targets]

4.3 Adding JVM monitoring

Add the following annotations to both dubbo-demo-service and dubbo-demo-consumer so that the JVM information inside the pods can be monitored:

vim /data/k8s-yaml/test/dubbo-demo-server/dp.yaml
vim /data/k8s-yaml/test/dubbo-demo-consumer/dp.yaml 

      annotations:
#.... existing annotations omitted ....
        prometheus_io_scrape: "true"
        prometheus_io_port: "12346"
        prometheus_io_path: "/"

Port 12346 is the port that the dubbo pods' start command passes to jmx_javaagent, so it can be used to collect JVM metrics.

Re-apply the configuration on any node:

kubectl apply  -f http://k8s-yaml.zq.com/test/dubbo-demo-server/dp.yaml
kubectl apply  -f http://k8s-yaml.zq.com/test/dubbo-demo-consumer/dp.yaml
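
If the annotations were picked up, the agent's metrics endpoint should answer directly on the pod IP (a sketch; <pod-ip> is a placeholder for an IP shown by the first command, and the namespace is assumed to be test):

# Find a dubbo pod IP, then pull the JVM metrics served by jmx_javaagent on 12346
kubectl -n test get pod -o wide | grep dubbo-demo
curl -s http://<pod-ip>:12346/ | grep -m 5 '^jvm_'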

[screenshot: JVM metrics targets collecting data in Prometheus]

At this point, all nine jobs are collecting data.
