Environment:

- Kubernetes 1.11 cluster, deployed with kubeadm
- Docker 17.3.2
- CentOS 7
- Alibaba Cloud servers
Clone the prometheus-operator repository:

```
$ git clone https://github.com/coreos/kube-prometheus.git
$ cd kube-prometheus/manifests
```
The manifests directory contains all the resource manifests. We need to make a small change to prometheus-serviceMonitorKubelet.yaml: by default, this ServiceMonitor scrapes node data from the kubelet's port 10250, but as mentioned earlier, for security reasons the metrics have been moved to the read-only port 10255. We only need to change `https-metrics` in the file to `http-metrics`. This mapping is defined in the Prometheus-Operator code that syncs node endpoints; the relevant snippet is below (the full code is in the Prometheus Operator repository):
```go
Subsets: []v1.EndpointSubset{
    {
        Ports: []v1.EndpointPort{
            {
                Name: "https-metrics",
                Port: 10250,
            },
            {
                Name: "http-metrics",
                Port: 10255,
            },
            {
                Name: "cadvisor",
                Port: 4194,
            },
        },
    },
},
```
Note that the insecureSkipVerify parameter must be set to `false` for the http scheme to take effect: `insecureSkipVerify: false`
```yaml
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  honorLabels: true
  interval: 30s
  port: http-metrics
  scheme: http
  tlsConfig:
    insecureSkipVerify: false
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  honorLabels: true
  interval: 30s
  metricRelabelings:
  - action: drop
    regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
    sourceLabels:
    - __name__
  path: /metrics/cadvisor
  port: http-metrics
  scheme: http
  tlsConfig:
    insecureSkipVerify: false
```
Next, configure the DingTalk routing file and create it as a Secret object, which is then mounted via the prometheus-prometheus.yaml file. To persist Prometheus data, we also need to define a StorageClass or PVC and mount it.
alertmanager-main.yaml:
```yaml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://prometheus-webhook-dingtalk.monitors.svc.cluster.local:8060/dingtalk/ops_dingding/send'
```
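The webhook URL above assumes a prometheus-webhook-dingtalk relay running in the `monitors` namespace with a profile named `ops_dingding`. A minimal sketch of that deployment is below; the image tag and the `--ding.profile` flag follow the timonwong/prometheus-webhook-dingtalk project, and `<your-dingtalk-token>` is a placeholder you must replace with your own robot's access token:

```yaml
# Hypothetical deployment of the DingTalk webhook relay referenced in the
# Alertmanager config above; adjust namespace, image tag, and token to your setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-webhook-dingtalk
  namespace: monitors
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-webhook-dingtalk
  template:
    metadata:
      labels:
        app: prometheus-webhook-dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v0.3.0
        args:
        # profile name "ops_dingding" must match the URL path in the Alertmanager config
        - --ding.profile=ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=<your-dingtalk-token>
        ports:
        - containerPort: 8060
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-webhook-dingtalk
  namespace: monitors
spec:
  selector:
    app: prometheus-webhook-dingtalk
  ports:
  - port: 8060
    targetPort: 8060
```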
Create the Alertmanager configuration file as a Secret object:
```
$ kubectl -n monitoring create secret generic alertmanager-main --from-file=alertmanager-main.yaml
```
Create a StorageClass object to provide persistent storage for Prometheus. Here we use the cloud disk or NAS service offered by Alibaba Cloud to define a custom StorageClass; in this case we pick the cloud-disk-backed alicloud-disk-ssd storage class.
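A sketch of such a StorageClass is below. The provisioner name `alicloud/disk` and the `type: cloud_ssd` parameter follow the Alibaba Cloud flexvolume disk driver and are assumptions — check which disk driver is actually installed in your cluster before applying:

```yaml
# Sketch of an SSD cloud-disk StorageClass for Alibaba Cloud (assumes the
# flexvolume "alicloud/disk" provisioner; verify against your cluster's driver).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-disk-ssd
provisioner: alicloud/disk
parameters:
  type: cloud_ssd       # provision SSD cloud disks
reclaimPolicy: Retain   # keep the disk when the PVC is deleted
```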
To enable service auto-discovery for Prometheus, create the prometheus-additional.yaml file as a Secret object.
prometheus-additional.yaml:
```yaml
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

- job_name: 'kubernetes-services'
  kubernetes_sd_configs:
  - role: service
  metrics_path: /probe
  params:
    module: [http_2xx]
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__]
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.example.com:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name

- job_name: 'kubernetes-ingresses'
  kubernetes_sd_configs:
  - role: ingress
  relabel_configs:
  - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
    regex: (.+);(.+);(.+)
    replacement: ${1}://${2}${3}
    target_label: __param_target
  - target_label: __address__
    replacement: blackbox-exporter.example.com:9115
  - source_labels: [__param_target]
    target_label: instance
  - action: labelmap
    regex: __meta_kubernetes_ingress_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_ingress_name]
    target_label: kubernetes_name

- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
```
Create the Secret object additional-configs:
```
$ kubectl -n monitoring create secret generic additional-configs --from-file=prometheus-additional.yaml
```
Now write the custom configuration and storage class defined above into the Prometheus resource to customize the monitoring setup:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  storage:                     # persistent storage configuration
    volumeClaimTemplate:
      spec:
        storageClassName: alicloud-disk-ssd   # use the alicloud-disk-ssd StorageClass
        resources:
          requests:
            storage: 50Gi
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:                     # etcd certificate Secret
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:     # service discovery configuration
    name: additional-configs   # name of the Secret object
    key: prometheus-additional.yaml   # key within the Secret
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
```
The certificates used by etcd all live under /etc/kubernetes/pki/etcd on the node, so first save the certificates we need into the cluster as a Secret object (run this on the node where etcd runs):
```
$ kubectl create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/ca.pem --from-file=/etc/kubernetes/pki/etcd/etcd-client.pem --from-file=/etc/kubernetes/pki/etcd/etcd-client-key.pem -n monitoring
```
Now that the certificates Prometheus needs to access the etcd cluster are ready, create the ServiceMonitor object (prometheus-serviceMonitorEtcd.yaml):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitoring
  labels:
    k8s-app: etcd-k8s
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
      certFile: /etc/prometheus/secrets/etcd-certs/etcd-client.pem
      keyFile: /etc/prometheus/secrets/etcd-certs/etcd-client-key.pem
      insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
    - kube-system
```
The ServiceMonitor is now created, but there is no matching Service object yet, so we need to create one manually (prometheus-etcdService.yaml):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 172.16.23.231
    nodeName: etcd-master
  ports:
  - name: port
    port: 2379
    protocol: TCP
```
The Service we create here does not match Pods via label selectors as before, because, as mentioned earlier, an etcd cluster is often deployed outside the Kubernetes cluster; in that case we need to define an Endpoints object ourselves. Note that its metadata must match the Service, and the Service's clusterIP is set to None. If you are unfamiliar with this, see our earlier discussion of Services.
Fill the etcd cluster addresses into the Endpoints' subsets; since this is a single-node setup, one address is enough. Note that if etcd's configuration file binds the listen address to 127.0.0.1, monitoring may fail; change it to 0.0.0.0.
The kube-scheduler and kube-controller-manager components also bind to 127.0.0.1 by default; their configuration must be changed to 0.0.0.0 before their metrics ports can be reached for monitoring.
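On a kubeadm cluster these components run as static pods, so the change is made in their manifests under /etc/kubernetes/manifests and the kubelet restarts them automatically on save. The excerpt below is a sketch; the flag name varies by Kubernetes version (`--address` in older releases such as 1.11, `--bind-address` in later ones), so adjust it to whatever your manifest already contains:

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-scheduler.yaml (static pod).
# Only the bind-address flag changes; other flags are illustrative.
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=0.0.0.0          # was 127.0.0.1; use --bind-address on newer versions
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
```

Apply the same change to /etc/kubernetes/manifests/kube-controller-manager.yaml.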
```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  clusterIP: None
  ports:
  - name: http-metrics
    targetPort: 10251
    port: 10251
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  clusterIP: None
  ports:
  - name: http-metrics
    targetPort: 10252
    port: 10252
```

(The kubelet-service.yaml file is omitted here.)
The prometheus-k8s ClusterRole (prometheus-clusterRole.yaml) needs permission to list and watch the resources used by service discovery:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
```
$ kubectl create -f .
```
At this point the production deployment of prometheus-operator is complete; the Grafana dashboard configuration and the Alertmanager notification template optimization still remain to be filled in.
1. Create a StorageClass object to provide persistent storage for Prometheus, and write it into the prometheus-prometheus.yaml file.
2. Modify the configuration data in the alertmanager-secret.yaml Secret object to your custom DingTalk alert route or email account configuration.
3. Create the etcd, scheduler, and controller Service objects.
4. Configure service alerting rules in the prometheus-etcdRules.yaml file, or add them to the source files.
5. Create the Prometheus service auto-discovery Secret configuration file, and write it into the prometheus-prometheus.yaml file.
6. Create the etcd certificate Secret and the serviceMonitorEtcd object file.
7. Adjust the permissions in prometheus-clusterRole.yaml.
8. Run the deployment.