In the previous post we talked about extending the native apiserver with the apiservice resource plus a custom apiserver; see http://www.javashuo.com/article/p-vancffxe-nz.html for a refresher. Today let's talk about monitoring a k8s cluster.
In that post we used the custom apiserver metrics-server to extend the native apiserver, so that kubectl top node/pod can return cpu and memory metrics for nodes and for the pods in a namespace. Those metrics let us see, to some extent, how a node or pod is using its resources, which is in essence already a form of monitoring; but metrics-server only collects cpu and memory data, which is not enough when we want to know more about our nodes and pods. So we need a dedicated monitoring system to watch the cluster's nodes and pods. Prometheus is a high-performance monitoring system built around three main components: the retrieval component is responsible for collecting data and can work together with external programs (exporters); the TSDB component, a time-series database, stores the metric data; and the HTTP server component exposes a RESTful API for client queries, listening on port 9090 by default.
Overall topology of the Prometheus monitoring system (figure)
Note: the figure above shows the topology of the Prometheus monitoring system. The Pushgateway component acts as a kind of proxy in front of the Prometheus retrieval component: it receives metrics from targets that actively push their data. Prometheus distinguishes push-style and pull-style monitoring: in push mode the monitored target sends its data to the server, while in pull mode the target waits for the server to come and scrape it; by default Prometheus works in pull mode, i.e. the server actively scrapes the targets. Node-level metrics are collected with node-exporter, while container-level metrics are exposed through the kubelet's cAdvisor endpoint; alertmanager provides alerting for the Prometheus monitoring system; and the Prometheus web UI provides a query page. To make the pull model concrete, a minimal scrape configuration is sketched right after this note.
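Here is a minimal, hypothetical prometheus.yml fragment illustrating the pull model; it is not taken from this deployment, and the job name and target address are placeholders:
scrape_configs:
- job_name: demo-node                  # hypothetical job name
  scrape_interval: 15s                 # the server pulls (scrapes) the target on this interval
  static_configs:
  - targets: ['192.168.0.101:9100']    # placeholder node-exporter endpoint to be scraped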
Components of the Prometheus monitoring system
kube-state-metrics: this component generates metrics about the state of objects in the k8s cluster, for example how many nodes exist, how many pods, and so on;
node-exporter: this component collects metrics from the node it runs on;
alertmanager: this component provides alerting for the Prometheus monitoring system;
prometheus-server: this component stores and processes the metric data and exposes a RESTful API for user queries;
Annotations that control whether Prometheus scrapes a pod
prometheus.io/scrape: describes whether the pod may be scraped; true means scraping is allowed, false means it is not;
prometheus.io/path: describes the URL path used to scrape the metrics, usually /metrics;
prometheus.io/port: describes the port used to scrape the metrics; a minimal example is shown below.
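For illustration, a minimal sketch of a pod carrying these annotations; the pod name, container name and image are hypothetical and only serve to show where the annotations go (the scrape configuration used later in this post honors the same annotations on Service objects as well):
apiVersion: v1
kind: Pod
metadata:
  name: demo-exporter                  # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"       # allow Prometheus to scrape this pod
    prometheus.io/path: "/metrics"     # path the metrics are served on
    prometheus.io/port: "9100"         # port the metrics are served on
spec:
  containers:
  - name: exporter
    image: prom/node-exporter:v1.0.1   # placeholder image that exposes /metrics on 9100
    ports:
    - containerPort: 9100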
Deploying the Prometheus monitoring system
1. Deploy kube-state-metrics
Create the kube-state-metrics RBAC manifest
[root@master01 kube-state-metrics]# cat kube-state-metrics-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions","apps"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions","apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
[root@master01 kube-state-metrics]#
Note: the manifest above creates a ServiceAccount plus two roles (a ClusterRole and a Role) and binds the ServiceAccount to them, so the ServiceAccount gets the permissions defined by those roles; a quick way to double-check the resulting permissions is shown below.
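As a sanity check (a hedged example, not part of the original post), kubectl auth can-i can be used to ask the apiserver whether the ServiceAccount really has the expected permissions:
kubectl auth can-i list pods --as=system:serviceaccount:kube-system:kube-state-metrics
# expected output: yes
kubectl auth can-i create pods --as=system:serviceaccount:kube-system:kube-state-metrics
# expected output: no  (the ClusterRole only grants list/watch on most resources)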
Create the kube-state-metrics Service manifest
[root@master01 kube-state-metrics]# cat kube-state-metrics-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
[root@master01 kube-state-metrics]#
Create the kube-state-metrics Deployment manifest
[root@master01 kube-state-metrics]# cat kube-state-metrics-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.0.0-beta
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v2.0.0-beta
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v2.0.0-beta
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v2.0.0-beta
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.7
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 30Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --container=kube-state-metrics
        - --cpu=100m
        - --extra-cpu=1m
        - --memory=100Mi
        - --extra-memory=2Mi
        - --threshold=5
        - --deployment=kube-state-metrics
      volumes:
      - name: config-volume
        configMap:
          name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
[root@master01 kube-state-metrics]#
Apply the three manifests above to deploy the kube-state-metrics component
[root@master01 kube-state-metrics]# ls
kube-state-metrics-deployment.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-service.yaml
[root@master01 kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
configmap/kube-state-metrics-config created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
role.rbac.authorization.k8s.io/kube-state-metrics-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master01 kube-state-metrics]#
Verification: were the corresponding pod and service created successfully?
Note: the corresponding pod and svc have been created normally;
Verification: access port 8080 of the service at the /metrics URL; is data returned?
Note: accessing the /metrics URL on port 8080 of the service returns the expected data, which means the kube-state-metrics component is installed and working; a command-line spot check is sketched below.
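The original post verifies this in a browser; as a hedged alternative, the endpoint can also be checked from inside the cluster with a throwaway curl pod (curl-test is an arbitrary pod name, the service DNS name and port come from the Service manifest above):
kubectl run curl-test -n kube-system --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -s http://kube-state-metrics.kube-system.svc:8080/metrics
# expect a long list of kube_* metrics such as kube_pod_info and kube_node_info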
2. Deploy node-exporter
Create the node-exporter Service manifest
[root@master01 node_exporter]# cat node-exporter-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "NodeExporter"
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  selector:
    k8s-app: node-exporter
[root@master01 node_exporter]#
Create the node-exporter DaemonSet manifest
[root@master01 node_exporter]# cat node-exporter-ds.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.0.1
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v1.0.1
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v1.0.1
    spec:
      priorityClassName: system-node-critical
      containers:
      - name: prometheus-node-exporter
        image: "prom/node-exporter:v1.0.1"
        imagePullPolicy: "IfNotPresent"
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - name: metrics
          containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 50Mi
      hostNetwork: true
      hostPID: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
[root@master01 node_exporter]#
Note: the manifest above runs node-exporter with a DaemonSet controller; the pods share the host's network and PID namespaces and tolerate the master node's taint, so node-exporter runs one pod on every node of the k8s cluster and collects that node's metrics through it;
Apply the two manifests above to deploy node-exporter
[root@master01 node_exporter]# ls
node-exporter-ds.yml  node-exporter-service.yaml
[root@master01 node_exporter]# kubectl apply -f .
daemonset.apps/node-exporter created
service/node-exporter created
[root@master01 node_exporter]#
Verification: were the corresponding pods and svc created normally?
[root@master01 node_exporter]# kubectl get pods -l "k8s-app=node-exporter" -n kube-system
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-6zgkz   1/1     Running   0          107s
node-exporter-9mvxr   1/1     Running   0          107s
node-exporter-jbll7   1/1     Running   0          107s
node-exporter-s7vvt   1/1     Running   0          107s
node-exporter-xmrjh   1/1     Running   0          107s
[root@master01 node_exporter]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        20m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  46h
node-exporter        ClusterIP   None             <none>        9100/TCP                 116s
[root@master01 node_exporter]#
Verification: access port 9100 on any node at the /metrics URL; is metric data returned?
Note: the /metrics URL on that port returns the expected data, which means the node-exporter component was deployed successfully; a curl-based spot check is sketched below.
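Because the DaemonSet uses hostNetwork and hostPort 9100, the exporter can be queried directly on any node's IP; a hedged example (192.168.0.41 is a placeholder, substitute one of your own node IPs):
curl -s http://192.168.0.41:9100/metrics | grep -E '^node_load1|^node_memory_MemAvailable_bytes'
# node_load1 and node_memory_MemAvailable_bytes are standard node-exporter 1.x metric names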
3. Deploy alertmanager
Create the alertmanager PVC manifest
[root@master01 alertmanager]# cat alertmanager-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
spec:
  # storageClassName: standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
[root@master01 alertmanager]#
Create PVs (the PVC above does not specify a storageClassName, so matching PVs need to exist in advance)
[root@master01 ~]# cat pv-demo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v1
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v1
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v2
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v2
    server: 192.168.0.99
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-v3
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes: ["ReadWriteOnce","ReadWriteMany","ReadOnlyMany"]
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
  - hard
  - nfsvers=4.1
  nfs:
    path: /data/v3
    server: 192.168.0.99
[root@master01 ~]#
Apply the manifest to create the PVs
[root@master01 ~]# kubectl apply -f pv-demo.yaml
persistentvolume/nfs-pv-v1 created
persistentvolume/nfs-pv-v2 created
persistentvolume/nfs-pv-v3 created
[root@master01 ~]# kubectl get pv
NAME        CAPACITY   ACCESS MODES    RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
nfs-pv-v1   5Gi        RWO,ROX,RWX     Retain           Available                                   4s
nfs-pv-v2   5Gi        RWO,ROX,RWX     Retain           Available                                   4s
nfs-pv-v3   5Gi        RWO,ROX,RWX     Retain           Available                                   4s
[root@master01 ~]#
Create the alertmanager Service manifest
[root@master01 alertmanager]# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Alertmanager"
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 9093
    nodePort: 30093
  selector:
    k8s-app: alertmanager
  type: "NodePort"
[root@master01 alertmanager]#
Create the alertmanager ConfigMap manifest
[root@master01 alertmanager]# cat alertmanager-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  alertmanager.yml: |
    global: null
    receivers:
    - name: default-receiver
    route:
      group_interval: 5m
      group_wait: 10s
      receiver: default-receiver
      repeat_interval: 3h
[root@master01 alertmanager]#
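Note that default-receiver above has no notification channel attached, so alerts are routed but delivered nowhere. As a hedged sketch (not part of the original deployment), the alertmanager.yml content could be extended with, for example, a webhook receiver; the URL below is a placeholder:
    receivers:
    - name: default-receiver
      webhook_configs:
      - url: http://alert-webhook.example.com/hook   # placeholder endpoint that would receive the alert payload
    route:
      receiver: default-receiver
      group_wait: 10s
      group_interval: 5m
      repeat_interval: 3h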
Create the alertmanager Deployment manifest
[root@master01 alertmanager]# cat alertmanager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: kube-system
  labels:
    k8s-app: alertmanager
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.14.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: alertmanager
      version: v0.14.0
  template:
    metadata:
      labels:
        k8s-app: alertmanager
        version: v0.14.0
    spec:
      priorityClassName: system-cluster-critical
      containers:
      - name: prometheus-alertmanager
        image: "prom/alertmanager:v0.14.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --config.file=/etc/config/alertmanager.yml
        - --storage.path=/data
        - --web.external-url=/
        ports:
        - containerPort: 9093
        readinessProbe:
          httpGet:
            path: /#/status
            port: 9093
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: storage-volume
          mountPath: "/data"
          subPath: ""
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
#      - name: prometheus-alertmanager-configmap-reload
#        image: "jimmidyson/configmap-reload:v0.1"
#        imagePullPolicy: "IfNotPresent"
#        args:
#        - --volume-dir=/etc/config
#        - --webhook-url=http://localhost:9093/-/reload
#        volumeMounts:
#        - name: config-volume
#          mountPath: /etc/config
#          readOnly: true
#        resources:
#          limits:
#            cpu: 10m
#            memory: 10Mi
#          requests:
#            cpu: 10m
#            memory: 10Mi
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager-config
      - name: storage-volume
        persistentVolumeClaim:
          claimName: alertmanager
[root@master01 alertmanager]#
Apply the four manifests above to deploy alertmanager
[root@master01 alertmanager]# ls
alertmanager-configmap.yaml  alertmanager-deployment.yaml  alertmanager-pvc.yaml  alertmanager-service.yaml
[root@master01 alertmanager]# kubectl apply -f .
configmap/alertmanager-config created
deployment.apps/alertmanager created
persistentvolumeclaim/alertmanager created
service/alertmanager created
[root@master01 alertmanager]#
Verification: were the corresponding pod and svc created normally?
[root@master01 alertmanager]# kubectl get pods -l "k8s-app=alertmanager" -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
alertmanager-6546bf7676-lt9jq   1/1     Running   0          85s
[root@master01 alertmanager]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             92s
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        31m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 13m
[root@master01 alertmanager]#
Verification: access port 30093 on any node; can alertmanager be reached?
Note: reaching the alertmanager UI on that port indicates that alertmanager was deployed successfully;
4. Deploy prometheus-server
Create the Prometheus RBAC manifest
[root@master01 prometheus-server]# cat prometheus-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
[root@master01 prometheus-server]#
Create the Prometheus Service manifest
[root@master01 prometheus-server]# cat prometheus-service.yaml
kind: Service
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    kubernetes.io/name: "Prometheus"
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
    nodePort: 30090
  selector:
    k8s-app: prometheus
  type: NodePort
[root@master01 prometheus-server]#
Create the Prometheus ConfigMap manifest
[root@master01 prometheus-server]# cat prometheus-configmap.yaml
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090

    - job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-kubelet
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __metrics_path__
        replacement: /metrics/cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    - job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: kubernetes_name

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name

    alerting:
      alertmanagers:
      - kubernetes_sd_configs:
        - role: pod
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace]
          regex: kube-system
          action: keep
        - source_labels: [__meta_kubernetes_pod_label_k8s_app]
          regex: alertmanager
          action: keep
        - source_labels: [__meta_kubernetes_pod_container_port_number]
          regex:
          action: drop
[root@master01 prometheus-server]#
Create the Prometheus StatefulSet manifest
[root@master01 prometheus-server]# cat prometheus-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: kube-system
  labels:
    k8s-app: prometheus
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v2.24.0
spec:
  serviceName: "prometheus"
  replicas: 1
  podManagementPolicy: "Parallel"
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      containers:
#      - name: prometheus-server-configmap-reload
#        image: "jimmidyson/configmap-reload:v0.1"
#        imagePullPolicy: "IfNotPresent"
#        args:
#        - --volume-dir=/etc/config
#        - --webhook-url=http://localhost:9090/-/reload
#        volumeMounts:
#        - name: config-volume
#          mountPath: /etc/config
#          readOnly: true
#        resources:
#          limits:
#            cpu: 10m
#            memory: 10Mi
#          requests:
#            cpu: 10m
#            memory: 10Mi
      - name: prometheus-server
        image: "prom/prometheus:v2.24.0"
        imagePullPolicy: "IfNotPresent"
        args:
        - --config.file=/etc/config/prometheus.yml
        - --storage.tsdb.path=/data
        - --web.console.libraries=/etc/prometheus/console_libraries
        - --web.console.templates=/etc/prometheus/consoles
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        # based on 10 running nodes with 30 pods each
        resources:
          limits:
            cpu: 200m
            memory: 1000Mi
          requests:
            cpu: 200m
            memory: 1000Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: prometheus-data
          mountPath: /data
          subPath: ""
      terminationGracePeriodSeconds: 300
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      # storageClassName: standard
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: "5Gi"
[root@master01 prometheus-server]#
Note: before applying the manifest above, make sure there is a PV with enough capacity to satisfy the 5Gi request in the volumeClaimTemplates;
Apply the four manifests above to deploy the Prometheus server
[root@master01 prometheus-server]# ls
prometheus-configmap.yaml  prometheus-rbac.yaml  prometheus-service.yaml  prometheus-statefulset.yaml
[root@master01 prometheus-server]# kubectl apply -f .
configmap/prometheus-config created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
statefulset.apps/prometheus created
[root@master01 prometheus-server]#
Verification: were the corresponding pod and svc created successfully?
[root@master01 prometheus-server]# kubectl get pods -l "k8s-app=prometheus" -n kube-system
NAME           READY   STATUS    RESTARTS   AGE
prometheus-0   1/1     Running   0          2m20s
[root@master01 prometheus-server]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             10m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        40m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  47h
node-exporter        ClusterIP   None             <none>        9100/TCP                 22m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           2m27s
[root@master01 prometheus-server]#
Verification: access port 30090 on any node; can the Prometheus UI be reached?
Note: being able to reach that page means the Prometheus server deployment is fine;
Querying metric data through the web UI
Note: pick the metric you want to look at and click Execute, and the corresponding graph is rendered; a few example queries are listed below. With this, the Prometheus monitoring system is fully deployed; next we deploy grafana and configure it to use Prometheus as a data source to display the monitoring data;
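For reference, a few hedged example expressions that should work against the scrape jobs configured above; the metric names are the standard kube-state-metrics, node-exporter 1.x and cAdvisor names, not anything taken from the original screenshots:
# pods per namespace, from kube-state-metrics
sum(kube_pod_info) by (namespace)

# per-node CPU usage ratio over the last 5 minutes, from node-exporter
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)

# per-pod memory working set, from the kubernetes-nodes-cadvisor job
sum(container_memory_working_set_bytes{container!=""}) by (namespace, pod)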
Deploy grafana
Create the grafana deployment manifest
[root@master01 grafana]# cat grafana.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
#        - name: INFLUXDB_HOST
#          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: monitoring-grafana
  namespace: kube-system
spec:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: "NodePort"
[root@master01 grafana]#
Apply the manifest to deploy grafana
[root@master01 grafana]# ls
grafana.yaml
[root@master01 grafana]# kubectl apply -f .
deployment.apps/monitoring-grafana created
service/monitoring-grafana created
[root@master01 grafana]#
Verification: were the corresponding pod and svc created?
[root@master01 grafana]# kubectl get pods -l "k8s-app=grafana" -n kube-system
NAME                                  READY   STATUS    RESTARTS   AGE
monitoring-grafana-6c74ccc5dd-grjzf   1/1     Running   0          87s
[root@master01 grafana]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
alertmanager         NodePort    10.99.246.148    <none>        80:30093/TCP             82m
kube-dns             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   39d
kube-state-metrics   ClusterIP   10.110.110.216   <none>        8080/TCP,8081/TCP        112m
metrics-server       ClusterIP   10.98.59.116     <none>        443/TCP                  2d
monitoring-grafana   NodePort    10.100.230.71    <none>        80:30196/TCP             92s
node-exporter        ClusterIP   None             <none>        9100/TCP                 94m
prometheus           NodePort    10.111.155.1     <none>        9090:30090/TCP           74m
[root@master01 grafana]#
Note: the grafana svc is exposed on node port 30196;
Verification: access the port exposed by the grafana service; can the pod be reached?
Note: being able to reach the grafana page means grafana was deployed successfully;
Configuring grafana
1. Configure Prometheus as grafana's data source; a typical data source URL is suggested below.
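The data source is configured through the grafana UI (the original screenshots are omitted here). Since the Service created above is named prometheus in the kube-system namespace and listens on port 9090, the in-cluster data source URL would typically be:
http://prometheus.kube-system.svc.cluster.local:9090
# or, from outside the cluster, http://<node-ip>:30090 through the NodePort service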
2. Create a monitoring dashboard
Note: go to the grafana.com website and download a dashboard template;
After downloading the template file, import it into grafana
Note: select the downloaded template file, then choose the matching data source and click Import. If the panels show no data, it is because the metric names used by the template differ from the metric names in our Prometheus; we can edit the template file (or the panel queries) to match the metric names in our own environment, as in the example below.
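As a hedged example of such a mismatch: many older node dashboards on grafana.com were written against pre-0.16 node-exporter metric names, while the node-exporter v1.0.1 deployed above exposes the renamed ones, so a panel query may need to be rewritten roughly like this:
# query as found in an old dashboard template (pre-0.16 metric name)
sum(rate(node_cpu{mode!="idle"}[5m])) by (instance)

# equivalent query for node-exporter v1.0.1
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)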