1. Introduction to prometheus-operator
2. Reviewing the RBAC authorization configuration
3. Installing prometheus-operator with Helm
4. Monitoring the Kubernetes components
5. Adding a new data source to Grafana
6. Monitoring MySQL
7. Configuring Alertmanager
Finally: uninstalling prometheus-operator
1. Overview
The Prometheus resource declaratively describes the desired state of a Prometheus deployment, while a ServiceMonitor describes the set of targets to be monitored by Prometheus.
Service
ServiceMonitor
Matches Services through its selector (note: the team: frontend label used here comes up again below); by selecting Endpoints through labels it achieves dynamic service discovery.
port: web  # corresponds to the port name defined in the Service
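A minimal sketch of this Service/ServiceMonitor pairing, in the spirit of the upstream getting-started example (the team: frontend label and the web port name come from the text above; the frontend names and port 8080 are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: frontend            # assumed name, for illustration
  labels:
    team: frontend          # label the ServiceMonitor selects on
spec:
  selector:
    app: frontend-app       # assumed pod label
  ports:
  - name: web               # port name referenced by the ServiceMonitor
    port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: frontend
  labels:
    team: frontend          # label the Prometheus resource selects on
spec:
  selector:
    matchLabels:
      team: frontend        # picks up the Service above
  endpoints:
  - port: web               # matches the Service port name, not the number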
Prometheus
Selects ServiceMonitors whose labels match its serviceMonitorSelector (matchLabels).
Rule binding: the ruleSelector (matching the label prometheus: service-prometheus) selects PrometheusRule objects whose labels include prometheus: service-prometheus.
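A minimal sketch of a Prometheus resource combining the two selectors just described (the service-prometheus name reappears in section 5; the ServiceAccount and resource requests are assumptions):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: service-prometheus
  labels:
    prometheus: service-prometheus
spec:
  serviceAccountName: prometheus        # assumed; see the RBAC section below
  serviceMonitorSelector:
    matchLabels:
      team: frontend                    # selects the ServiceMonitor above
  ruleSelector:
    matchLabels:
      prometheus: service-prometheus    # selects matching PrometheusRules
  resources:
    requests:
      memory: 400Mi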
PrometheusRule
Rule configuration (the recording and alerting rules themselves); a minimal sketch follows.
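A minimal PrometheusRule sketch carrying the prometheus: service-prometheus label that the ruleSelector above expects (the alert itself is a made-up example):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: frontend-rules
  labels:
    prometheus: service-prometheus   # matched by the ruleSelector above
spec:
  groups:
  - name: frontend.rules
    rules:
    - alert: FrontendDown            # assumed example alert
      expr: up{job="frontend"} == 0
      for: 5m
      labels:
        severity: critical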
With the architecture above in place, the frontend team can create new ServiceMonitors and Services on their own, allowing Prometheus to be reconfigured dynamically.
Alertmanager
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  generation: 1
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: my-release-prometheus-oper-alertmanager
  namespace: default
spec:
  baseImage: quay.io/prometheus/alertmanager
  externalUrl: http://my-release-prometheus-oper-alertmanager.default:9093
  listenLocal: false
  logLevel: info
  paused: false
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: my-release-prometheus-oper-alertmanager
  version: v0.15.2
2. Reviewing the RBAC authorization configuration (by default none of the following needs to be configured)
If RBAC authorization is enabled, RBAC rules must be created for prometheus and prometheus-operator; a ClusterRole and a ClusterRoleBinding are created for prometheus-operator.
2.1 Granting the prometheus ServiceAccount the required permissions
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
2.2 Granting the prometheus-operator ServiceAccount the required permissions; see the official documentation for details, which are not reproduced here:
https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html
3. Installing prometheus-operator with Helm
Official GitHub link:
https://github.com/helm/charts/tree/master/stable/prometheus-operator
Install command:
$ helm install --name my-release stable/prometheus-operator
To install with specific parameters, for example changing the prometheus Service type from the default ClusterIP to NodePort (the chart's prometheus-operator Service template sets clusterIP: None, which prevents changing the type directly; the workaround is to apply a service.yaml with kubectl apply -f after deployment, as sketched after the command below):
$ helm install --name my-release stable/prometheus-operator --set prometheus.service.type=NodePort --set prometheus.service.nodePort=30090
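A minimal sketch of such a service.yaml, assuming the pod labels the operator applies (app: prometheus plus prometheus: <instance name>); verify them with kubectl get pods --show-labels before applying:

apiVersion: v1
kind: Service
metadata:
  name: my-release-prometheus-oper-prometheus
  namespace: default
spec:
  type: NodePort
  selector:
    app: prometheus
    prometheus: my-release-prometheus-oper-prometheus   # assumed operator-applied label
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30090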
Or install with specified values files:
$ helm install --name my-release stable/prometheus-operator -f values1.yaml,values2.yaml
4. Monitoring the Kubernetes components
4.1 Monitoring the kubelet (not scraped by default, because the ServiceMonitor named kubelet accesses the endpoints over HTTP on port 10255, whereas my Rancher-built Kubernetes serves HTTPS on port 10250).
Following the official documentation at https://coreos.com/operators/prometheus/docs/latest/user-guides/cluster-monitoring.html, modify the ServiceMonitor as follows:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubelet
  labels:
    k8s-app: kubelet
spec:
  jobLabel: k8s-app
  endpoints:   # the default uses plain HTTP with no TLS; change it to the following
  - port: https-metrics
    scheme: https
    interval: 30s
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  - port: https-metrics
    scheme: https
    path: /metrics/cadvisor
    interval: 30s
    honorLabels: true
    tlsConfig:
      insecureSkipVerify: true
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
Apply the change: kubectl apply -f <the-file-above>.yaml
4.2 Monitoring kube-controller-manager
Because my kube-controller-manager is not started as a pod but runs directly as a container, the Service selector cannot match any pod; inspecting the Endpoints object shows it has no subsets ip, so the Prometheus target scrapes no data. The fix is to edit the Endpoints object by hand (adding a subsets section whose address is the host IP of the master, as sketched below) and to remove the Service's selector:
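A minimal sketch of the hand-written Endpoints object, under these assumptions: the Service is named my-release-prometheus-oper-kube-controller-manager in kube-system with a port named http-metrics on 10252, and 1.2.3.4 stands in for the master's host IP:

apiVersion: v1
kind: Endpoints
metadata:
  name: my-release-prometheus-oper-kube-controller-manager   # must match the Service name
  namespace: kube-system
subsets:
- addresses:
  - ip: 1.2.3.4          # replace with the host IP where kube-controller-manager runs
  ports:
  - name: http-metrics   # must match the Service port name
    port: 10252
    protocol: TCP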
kubectl apply -f <the-file-above>.yaml
kubectl edit svc my-release-prometheus-oper-kube-controller-manager, then delete the selector block and save with :wq
4.3 Configure kube-scheduler the same way, with the port changed to 10251; details omitted.
4.4 Monitoring etcd
Service configuration:
ServiceMonitor configuration (both objects are sketched together below):
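A minimal sketch under these assumptions: etcd runs on the master host outside the cluster (so, as in 4.2, the Service is selector-less with a hand-written Endpoints object), its metrics are served over plain HTTP on 2379, and all names and labels are illustrative:

apiVersion: v1
kind: Service
metadata:
  name: etcd
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  ports:
  - name: http-metrics
    port: 2379
    targetPort: 2379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd                # must match the Service name
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 1.2.3.4             # replace with the etcd host IP
  ports:
  - name: http-metrics
    port: 2379
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
  labels:
    k8s-app: etcd
spec:
  jobLabel: k8s-app
  endpoints:
  - port: http-metrics
    interval: 30s
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system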
4.5 What jobLabel does:
I set the jobLabel on the Service to kube-scheduler.
The targets page then shows the following (refresh the page and wait a little while before the result appears):
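In other words, the ServiceMonitor's spec.jobLabel names a label on the Service, and that label's value becomes the job label on the scraped targets; a sketch of the two relevant excerpts (the k8s-app label name is an assumption):

# Excerpt from the Service:
metadata:
  labels:
    k8s-app: kube-scheduler   # this value becomes job="kube-scheduler"

# Excerpt from the ServiceMonitor:
spec:
  jobLabel: k8s-app           # read the job name from the Service's k8s-app label

Without jobLabel, the job name defaults to the name of the Service.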
5. Adding a new data source to Grafana (there is one data source by default; to separate application monitoring from the default monitoring, add another one for the application)
5.1 Defining the Prometheus resource
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app: prometheus
    prometheus: service-prometheus
  name: service-prometheus
  namespace: monitoring
spec:
  ....
5.2 Inspecting the default grafana-datasource ConfigMap
kubectl get configmap my-release-prometheus-oper-grafana-datasource -o yaml
apiVersion: v1
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: service-prometheus
      type: prometheus
      url: http://service-ip:9090/  # untested; to be investigated later
      access: proxy
      isDefault: true
kind: ConfigMap
5.3 Modifying the grafana-datasource ConfigMap, as sketched below:
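A hedged sketch of the modification, assuming you keep the chart's default data source and append the new service-prometheus one (names and URLs here are illustrative; point the new URL at whatever Service exposes your service-prometheus instance):

kubectl edit configmap my-release-prometheus-oper-grafana-datasource

then change the embedded datasource.yaml so the list carries both entries:

datasource.yaml: |-
  apiVersion: 1
  datasources:
  - name: prometheus                 # the chart's default data source
    type: prometheus
    url: http://my-release-prometheus-oper-prometheus:9090/
    access: proxy
    isDefault: true
  - name: service-prometheus         # the new application data source
    type: prometheus
    url: http://service-prometheus.monitoring:9090/   # assumed Service name and namespace
    access: proxy
    isDefault: false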
6. Monitoring MySQL
The default values to override are as follows (values.yaml):
mysqlRootPassword: testing
mysqlUser: mysqlu
mysqlPassword: mysql123
mysqlDatabase: mydb
metrics:
  enabled: true
  image: prom/mysqld-exporter
  imageTag: v0.10.0
  imagePullPolicy: IfNotPresent
  resources: {}
  annotations: {}
  # prometheus.io/scrape: "true"
  # prometheus.io/port: "9104"
  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 5
  readinessProbe:
    initialDelaySeconds: 5
    timeoutSeconds: 1
6.1 Installing MySQL
helm install --name my-release2 -f values.yaml stable/mysql
6.2 Creating the PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-release2-mysql
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: /data
6.3 Creating the ServiceMonitor for MySQL
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: my-release2-mysql
    heritage: Tiller
    release: my-release
  name: my-release2-mysql
  namespace: default
spec:
  endpoints:
  - interval: 15s
    port: metrics
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: my-release2-mysql
      release: my-release2
6.4 Grafana configuration
Download the JSON dashboard template from https://grafana.com/dashboards/6239,
then import it into Grafana; selecting the default data source is fine.
7. Configuring Alertmanager (no configuration is needed by default)
7.1 How does the Prometheus resource discover Alertmanager? Through its alerting field, which matches the Alertmanager Service, as follows:
Prometheus instance
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app: prometheus-operator-prometheus
  name: my-release-prometheus-oper-prometheus
  namespace: default
spec:
  alerting:
    alertmanagers:
    - name: my-release-prometheus-oper-alertmanager  # matches the Service named my-release-prometheus-oper-alertmanager
      namespace: default
      pathPrefix: /
      port: web
  ruleSelector:              # selects PrometheusRules carrying the following labels
    matchLabels:
      app: prometheus-operator
      release: my-release
Alertmanager instance
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: my-release-prometheus-oper-alertmanager  # the secret name below is derived from this name
  namespace: default
spec:
  baseImage: quay.io/prometheus/alertmanager
  externalUrl: http://my-release-prometheus-oper-alertmanager.default:9093
  listenLocal: false
  logLevel: info
  paused: false
  replicas: 1
  retention: 120h
  routePrefix: /
  serviceAccountName: my-release-prometheus-oper-alertmanager
  version: v0.15.2
7.2 How does the Alertmanager instance reload its configuration file? Through the --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1 argument in prometheus-operator/deployment.yaml.
secret22.yaml
apiVersion: v1
data:
  alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0KcmVjZWl2ZXJzOgotIG5hbWU6ICJudWxsIgpyb3V0ZToKICBncm91cF9ieToKICAtIGpvYgogIGdyb3VwX2ludGVydmFsOiA1bQogIGdyb3VwX3dhaXQ6IDMwcwogIHJlY2VpdmVyOiAibnVsbCIKICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJvdXRlczoKICAtIG1hdGNoOgogICAgICBhbGVydG5hbWU6IERlYWRNYW5zU3dpdGNoCiAgICByZWNlaXZlcjogIm51bGwiCg==
kind: Secret  # the base64 data above is the alertmanager configuration; on Linux, decode it with echo "<data above>" | base64 -d
metadata:
  labels:
    app: prometheus-operator-alertmanager
    chart: prometheus-operator-0.1.27
    heritage: Tiller
    release: my-release
  name: alertmanager-my-release-prometheus-oper-alertmanager  # must be named alertmanager-<Alertmanager name>
  namespace: default
type: Opaque
Details: https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/prometheus-operator/deployment.yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: my-release-prometheus-oper-operator
  namespace: default
template:
  spec:
    containers:
    - args:
      - --kubelet-service=kube-system/my-release-prometheus-oper-kubelet
      - --localhost=127.0.0.1
      - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.25.0
      - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1  # this container reloads the alertmanager configuration; the exact mechanism is not documented upstream
      image: quay.io/coreos/prometheus-operator:v0.25.0
PrometheusRule: how rules are loaded
all.rules.yaml, see: https://github.com/helm/charts/blob/master/stable/prometheus-operator/templates/alertmanager/rules/all.rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-operator
  labels:
    app: prometheus-operator  # the Prometheus resource's ruleSelector selects this label
7.3 Key point: reloading the Alertmanager configuration, step by step:
7.3.1: Define the alertmanager.yaml file
global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
Note: do not use tabs for indentation, or the configuration will be rejected with an error.
7.3.2: Delete and recreate the secret named alertmanager-{ALERTMANAGER_NAME} (where {ALERTMANAGER_NAME} is the Alertmanager instance name; in the example above, my-release-prometheus-oper-alertmanager):
kubectl delete secret alertmanager-my-release-prometheus-oper-alertmanager
kubectl create secret generic alertmanager-my-release-prometheus-oper-alertmanager --from-file=alertmanager.yaml
7.3.3: Verify that the change took effect
After a few seconds, the Status page of the Alertmanager UI shows whether the new configuration is active. For other configuration options, see https://prometheus.io/docs/alerting/configuration/
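If the Alertmanager UI is not exposed outside the cluster, a quick way to reach it is a port-forward (the Service name below follows the chart's naming used throughout this article):

kubectl port-forward svc/my-release-prometheus-oper-alertmanager 9093:9093
# then open http://localhost:9093/#/status and check the loaded configuration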
WeChat alerting: https://www.cnblogs.com/jiuchongxiao/p/9024211.html
Finally: how to uninstall prometheus-operator (also a useful reference when reinstalling)
1. Delete directly with helm delete
$ helm delete my-release
2. Delete the related CRDs (helm install installed these CRD resources automatically)
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
3. Purge my-release from Helm
helm del --purge my-release
其餘