Earlier we covered monitoring with Heapster + cAdvisor, the Kubernetes monitoring approach that predates the Prometheus Operator. The Prometheus Operator came later, but on its own it no longer provides the complete feature set; the full solution is now kube-prometheus. Project address: https://github.com/coreos/kube-prometheus
The repository contains Kubernetes manifests, Grafana dashboards, and Prometheus rules, together with easy-to-use installation scripts.
The stack includes the following components: the Prometheus Operator, a highly available Prometheus and Alertmanager, node-exporter, Prometheus Adapter, kube-state-metrics, and Grafana.
The figure above is the architecture diagram provided by the Prometheus Operator project. The Operator is the core piece: acting as a controller, it creates the Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule CRD resource objects, and then continuously watches these four kinds of resources and maintains their state.
The Prometheus resource object it creates represents a Prometheus server, while a ServiceMonitor is an abstraction over exporters. As we learned earlier, an exporter is a tool whose sole job is to expose a metrics endpoint; Prometheus pulls data from the metrics endpoints that ServiceMonitors point it at. Likewise, the Alertmanager resource object is the abstraction of an Alertmanager instance, and a PrometheusRule holds the alerting rules consumed by the Prometheus instances.
This way, deciding what to monitor in the cluster becomes a matter of manipulating Kubernetes resource objects directly, which is far more convenient. The Service and ServiceMonitor in the diagram are both Kubernetes resources: a ServiceMonitor matches a class of Services through a labelSelector, and Prometheus in turn matches multiple ServiceMonitors through a labelSelector.
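As a small illustration of that relationship (the names and labels below, such as my-app, are made up for this example and are not part of the stack we are about to install), a Service advertises labels and a ServiceMonitor selects it purely through those labels:

# Hypothetical application Service exposing a metrics port
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app            # the label the ServiceMonitor matches on
spec:
  selector:
    app: my-app
  ports:
  - name: metrics
    port: 8080
---
# ServiceMonitor picking up every Service labelled app=my-app in the default namespace
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
spec:
  endpoints:
  - port: metrics          # refers to the Service port name above
    interval: 30s
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: my-app

The Operator turns every matched Service into a scrape configuration for the Prometheus instances it manages, so no prometheus.yml editing is needed.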
Download the source:
git clone https://github.com/coreos/kube-prometheus.git
All of the installation manifests are under the kube-prometheus/manifests/ directory.
Upstream keeps every file in a single directory; here I copy them out and sort them into categories:
mkdir prometheus
cp kube-prometheus/manifests/* prometheus/
cd prometheus/
mkdir -p operator node-exporter alertmanager grafana kube-state-metrics prometheus serviceMonitor adapter
mv *-serviceMonitor* serviceMonitor/
mv 0prometheus-operator* operator/
mv grafana-* grafana/
mv kube-state-metrics-* kube-state-metrics/
mv alertmanager-* alertmanager/
mv node-exporter-* node-exporter/
mv prometheus-adapter* adapter/
mv prometheus-* prometheus/
Note: newer versions changed the default node label, so the node selector needs to be changed to beta.kubernetes.io/os, otherwise the installation will hang.
Modify the selector:
sed -ri '/linux/s#kubernetes.io#beta.&#' \
  alertmanager/alertmanager-alertmanager.yaml \
  prometheus/prometheus-prometheus.yaml \
  node-exporter/node-exporter-daemonset.yaml \
  kube-state-metrics/kube-state-metrics-deployment.yaml
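After the substitution, the node selector in the affected manifests (node-exporter-daemonset.yaml and friends) should read roughly as below; this is only the expected fragment, so check your copies after running the command:

# before
nodeSelector:
  kubernetes.io/os: linux

# after
nodeSelector:
  beta.kubernetes.io/os: linux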
Use the images from Docker Hub instead:
sed -ri '/quay.io/s#quay.io/prometheus#prom#' \
  alertmanager/alertmanager-alertmanager.yaml \
  prometheus/prometheus-prometheus.yaml \
  node-exporter/node-exporter-daemonset.yaml
Use a reachable mirror for the Google images:
find -type f -exec sed -ri 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#' {} \;
The resulting directory layout:
1. Create the namespace (after the moves above, the namespace manifest is the only file left in the top-level directory):
kubectl apply -f .
2. Install the operator:
kubectl apply -f operator/
3. Install the remaining components in order:
kubectl apply -f adapter/
kubectl apply -f alertmanager/
kubectl apply -f node-exporter/
kubectl apply -f kube-state-metrics/
kubectl apply -f grafana/
kubectl apply -f prometheus/
kubectl apply -f serviceMonitor/
4. Check the overall status:
kubectl -n monitoring get all
[root@vm10-0-0-12 prometheus]# kubectl -n monitoring get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          27h
pod/alertmanager-main-1                    2/2     Running   0          27h
pod/alertmanager-main-2                    2/2     Running   0          27h
pod/grafana-7b86fd9ffd-sslwf               1/1     Running   0          27h
pod/kube-state-metrics-688965c565-knjbz    4/4     Running   0          27h
pod/node-exporter-4vtgl                    2/2     Running   0          27h
pod/node-exporter-5bnfw                    2/2     Running   0          27h
pod/node-exporter-9nnsp                    2/2     Running   0          27h
pod/node-exporter-fd8ng                    2/2     Running   0          27h
pod/node-exporter-nh5q9                    2/2     Running   0          27h
pod/node-exporter-z69fb                    2/2     Running   0          27h
pod/prometheus-adapter-66fc7797fd-qpt4d    1/1     Running   0          27h
pod/prometheus-k8s-0                       3/3     Running   1          27h
pod/prometheus-k8s-1                       3/3     Running   1          27h
pod/prometheus-operator-78678c7494-6954s   1/1     Running   0          27h

NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
service/alertmanager-main       ClusterIP      10.254.160.2     <none>           9093/TCP                     27h
service/alertmanager-operated   ClusterIP      None             <none>           9093/TCP,9094/TCP,9094/UDP   27h
service/grafana                 LoadBalancer   10.254.2.98      120.92.212.201   3000:31423/TCP               27h
service/kube-state-metrics      ClusterIP      None             <none>           8443/TCP,9443/TCP            27h
service/node-exporter           ClusterIP      None             <none>           9100/TCP                     27h
service/prometheus-adapter      ClusterIP      10.254.225.221   <none>           443/TCP                      27h
service/prometheus-k8s          LoadBalancer   10.254.23.154    120.92.92.56     9090:32361/TCP               27h
service/prometheus-operated     ClusterIP      None             <none>           9090/TCP                     27h
service/prometheus-operator     ClusterIP      None             <none>           8080/TCP                     27h

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.apps/node-exporter   6         6         6       6            6           beta.kubernetes.io/os=linux   27h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           27h
deployment.apps/kube-state-metrics    1/1     1            1           27h
deployment.apps/prometheus-adapter    1/1     1            1           27h
deployment.apps/prometheus-operator   1/1     1            1           27h

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-7b86fd9ffd               1         1         1       27h
replicaset.apps/kube-state-metrics-688965c565    1         1         1       27h
replicaset.apps/kube-state-metrics-758f8b9855    0         0         0       27h
replicaset.apps/prometheus-adapter-66fc7797fd    1         1         1       27h
replicaset.apps/prometheus-operator-78678c7494   1         1         1       27h

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     27h
statefulset.apps/prometheus-k8s      2/2     27h
1. kube-controller-manager and kube-scheduler configuration
Most of the targets are healthy; only two or three have no monitoring target behind them, notably the kube-controller-manager and kube-scheduler system components. This comes down to how their ServiceMonitors are defined, so let's first look at the ServiceMonitor for the kube-scheduler component (prometheus-serviceMonitorKubeScheduler.yaml):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s      # scrape every 30s
    port: http-metrics # must match the port name in the Service
  jobLabel: k8s-app
  namespaceSelector:   # match Services in the listed namespace(s); use "any: true" to match every namespace
    matchNames:
    - kube-system
  selector:            # labels of the Services to match; with matchLabels all listed labels must match, with matchExpressions a Service matching at least one expression is selected
    matchLabels:
      k8s-app: kube-scheduler
This is a typical ServiceMonitor declaration: through selector.matchLabels it matches Services in the kube-system namespace that carry the label k8s-app=kube-scheduler. However, no such Service exists in the cluster, so we need to create one manually (prometheus-kubeSchedulerService.yaml):
cat prometheus-kubeSchedulerService.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics   # must match the port name referenced by the ServiceMonitor
    port: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.5
  - ip: 10.0.0.15
  - ip: 10.0.0.20
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
Similarly, prometheus-kubeControllerManagerService.yaml needs the same treatment:
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.5
  - ip: 10.0.0.15
  - ip: 10.0.0.20
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
2. CoreDNS configuration
In both my test and production environments, CoreDNS could not be discovered.
The cause lies in the file /data/monitor/kube-prometheus/manifests/prometheus-serviceMonitorCoreDNS.yaml:
its matchLabels is set to kube-dns, but that is not the label actually in use:
pod/coredns-58d6869b44-ddczz   1/1   Running   0   4d4h   10.8.5.2   10.0.0.5    <none>   <none>
pod/coredns-58d6869b44-zrkx4   1/1   Running   0   4d4h   10.8.4.2   10.0.0.15   <none>   <none>
service/coredns   ClusterIP   10.254.0.10   <none>   53/UDP,53/TCP,9153/TCP   4d4h   k8s-app=coredns
The label is actually k8s-app=coredns, so the ServiceMonitor needs to be changed.
The bundled rules also need a matching update.
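As a sketch, the fix in prometheus-serviceMonitorCoreDNS.yaml is to make the selector match the label the Service actually carries; only the matchLabels value changes and the rest of the file stays as shipped:

  selector:
    matchLabels:
      k8s-app: coredns   # was kube-dns; must match the label on service/coredns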
After making the change, the CoreDNS target returns to normal.
3. Custom monitoring targets
Besides the Kubernetes resource objects, nodes, and components, we sometimes need to add custom monitoring targets driven by actual business requirements. Adding one is straightforward.
For example, in this environment the business team is testing a microservice and has started 2 pods.
The service exposes port 7000 as its metrics endpoint:
Create a new ServiceMonitor file:
cat prometheus-serviceMonitorJx3recipe.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jx3recipe
  namespace: monitoring
  labels:
    k8s-app: jx3recipe
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      run: jx3recipe
  namespaceSelector:
    matchNames:
    - default
Create a new Service file:
cat prometheus-jx3recipeService.yaml
apiVersion: v1
kind: Service
metadata:
  name: jx3recipe
  namespace: default
  labels:
    run: jx3recipe
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 7000
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: jx3recipe
  namespace: default
  labels:
    run: jx3recipe
subsets:
- addresses:
  - ip: 10.8.0.19
    nodeName: jx3recipe-01
  - ip: 10.8.2.17
    nodeName: jx3recipe-02
  ports:
  - name: port
    port: 7000
    protocol: TCP
One problem here: the Endpoints are hard-coded, so if the pods restart and their IP addresses change, scraping will break.
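For now the static Endpoints stay in place, but if the pods carry a stable label (I am assuming here that the Deployment labels its pods run: jx3recipe; adjust to whatever label the pods actually have), a Service with a selector is the cleaner fix, because Kubernetes then maintains the Endpoints object automatically as pods come and go:

apiVersion: v1
kind: Service
metadata:
  name: jx3recipe
  namespace: default
  labels:
    run: jx3recipe          # matched by the ServiceMonitor above
spec:
  clusterIP: None
  selector:
    run: jx3recipe          # assumed pod label; Kubernetes manages the Endpoints for us
  ports:
  - name: port
    port: 7000
    protocol: TCP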
kubectl apply -f prometheus-jx3recipeService.yaml
kubectl apply -f prometheus-serviceMonitorJx3recipe.yaml
The new target shows up on the Prometheus targets page:
Pick any item to inspect its data:
Everything looks good; from here we can build custom Grafana dashboards.
With the environment set up, all of the services are internal-only, so we need to expose them. Since this environment runs on Kingsoft Cloud, we can simply use the public cloud's load balancer (LB).
Exposing Grafana
Just change the Service type to LoadBalancer.
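A minimal sketch of the Grafana Service after the change; the surrounding fields follow the upstream grafana-service.yaml and may differ slightly in your copy, and only spec.type is actually being changed:

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: LoadBalancer   # was ClusterIP; the cloud provider now allocates an external IP
  ports:
  - name: http
    port: 3000
    targetPort: http
  selector:
    app: grafana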
Grafana can then be reached directly at its external IP, e.g. http://120.92.*.*:3000.
After logging in you can see that it already ships with a rich set of dashboards, covering cluster, node, pod, and the individual Kubernetes components.
Node dashboard
Prometheus itself
Of course, we can also import fancier dashboards into Grafana to keep track of resource usage in real time.
The related files have been uploaded to GitHub and can be imported directly:
https://github.com/loveqx/k8s-study