The ultimate Kubernetes monitoring solution: kube-prometheus

Introduction to kube-prometheus

Earlier we looked at monitoring with Heapster + cAdvisor, which was the Kubernetes monitoring approach before the Prometheus Operator appeared. The Prometheus Operator came later, but on its own it no longer provides the complete feature set; the full solution has become kube-prometheus. The project lives at https://github.com/coreos/kube-prometheus

The repository contains Kubernetes manifests, Grafana dashboards, and Prometheus rules, together with installation scripts that make it easy to get started.

The components include: the Prometheus Operator, highly available Prometheus and Alertmanager, node-exporter, kube-state-metrics, the Prometheus Adapter for the Kubernetes metrics APIs, and Grafana.

kube-prometheus architecture

In the architecture diagram provided by the Prometheus Operator project, the Operator is the core component. Acting as a controller, it creates the Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule CRD resource objects, and then continuously watches and maintains the state of these four kinds of resources.

A Prometheus resource object is what actually runs as the Prometheus Server, while a ServiceMonitor is the abstraction over the various exporters. As we learned earlier, an exporter is a tool whose sole job is to expose a metrics endpoint, and Prometheus pulls its data through the metrics endpoints that ServiceMonitors point at. Likewise, the Alertmanager resource object is the abstraction for AlertManager, and a PrometheusRule holds the alerting rule files consumed by the Prometheus instances.

With this in place, deciding what to monitor in the cluster becomes a matter of operating on Kubernetes resource objects directly, which is far more convenient. The Service and ServiceMonitor in the diagram are both Kubernetes resources: a ServiceMonitor matches a class of Services through a labelSelector, and Prometheus in turn matches multiple ServiceMonitors through a labelSelector.
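For instance, the two levels of matching can be sketched like this (a trimmed illustration only; the team: frontend and app: example-app labels are made up, the field names follow the monitoring.coreos.com/v1 API):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  serviceMonitorSelector:   # Prometheus picks up every ServiceMonitor carrying these labels
    matchLabels:
      team: frontend
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend          # matched by the Prometheus object above
spec:
  endpoints:
  - port: web               # port name on the matched Service to scrape
  selector:
    matchLabels:
      app: example-app      # the ServiceMonitor in turn matches Services carrying this label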

Deploying kube-prometheus

Download the source

git clone https://github.com/coreos/kube-prometheus.git

All the installation manifests live under the kube-prometheus/manifests/ directory.

Upstream keeps all the files in a single directory; here I copy them and sort them into categories:

mkdir prometheus
cp kube-prometheus/manifests/* prometheus/
cd prometheus/
mkdir -p operator node-exporter alertmanager grafana kube-state-metrics prometheus serviceMonitor adapter
mv *-serviceMonitor* serviceMonitor/
mv 0prometheus-operator* operator/
mv grafana-* grafana/
mv kube-state-metrics-* kube-state-metrics/
mv alertmanager-* alertmanager/
mv node-exporter-* node-exporter/
mv prometheus-adapter* adapter/
mv prometheus-* prometheus/

Note: the default label used by the newer manifests has changed, so the node selector needs to be switched to beta.kubernetes.io/os, otherwise the installation will get stuck.

Modify the node selector

sed -ri '/linux/s#kubernetes.io#beta.&#' \
    alertmanager/alertmanager-alertmanager.yaml \
    prometheus/prometheus-prometheus.yaml \
    node-exporter/node-exporter-daemonset.yaml \
    kube-state-metrics/kube-state-metrics-deployment.yaml

 

Use images from Docker Hub

sed -ri '/quay.io/s#quay.io/prometheus#prom#' \
  alertmanager/alertmanager-alertmanager.yaml \
  prometheus/prometheus-prometheus.yaml \
  node-exporter/node-exporter-daemonset.yaml

  

Use a reachable mirror for the Google images

find -type f -exec sed -ri 's#k8s.gcr.io#gcr.azk8s.cn/google_containers#' {} \;

  

Current directory layout:
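Based on the commands above it should look roughly like this (assuming the namespace manifest, e.g. 00namespace-namespace.yaml, stays at the top level):

prometheus/
├── 00namespace-namespace.yaml
├── adapter/
├── alertmanager/
├── grafana/
├── kube-state-metrics/
├── node-exporter/
├── operator/
├── prometheus/
└── serviceMonitor/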


1. Create the namespace

kubectl apply -f .
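After the earlier file moves, essentially only the namespace manifest is left in the top-level directory, so this step creates the monitoring namespace. A quick check:

kubectl get ns monitoring   # should show the namespace with STATUS Active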

 

2. Install the operator

kubectl apply -f operator/
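The operator registers the CRDs that the following steps depend on, so it is worth waiting until they exist and the operator Deployment is available. A rough check (CRD names may vary slightly between versions):

kubectl get crd | grep monitoring.coreos.com   # expect alertmanagers, prometheuses, prometheusrules, servicemonitors
kubectl -n monitoring get deploy prometheus-operator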

  

3. Install the remaining components in turn

kubectl apply -f adapter/
kubectl apply -f alertmanager/
kubectl apply -f node-exporter/
kubectl apply -f kube-state-metrics/
kubectl apply -f grafana/
kubectl apply -f prometheus/
kubectl apply -f serviceMonitor/

  

4. Check the overall status

kubectl -n monitoring get all

  

[root@vm10-0-0-12 prometheus]# kubectl -n monitoring get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          27h
pod/alertmanager-main-1                    2/2     Running   0          27h
pod/alertmanager-main-2                    2/2     Running   0          27h
pod/grafana-7b86fd9ffd-sslwf               1/1     Running   0          27h
pod/kube-state-metrics-688965c565-knjbz    4/4     Running   0          27h
pod/node-exporter-4vtgl                    2/2     Running   0          27h
pod/node-exporter-5bnfw                    2/2     Running   0          27h
pod/node-exporter-9nnsp                    2/2     Running   0          27h
pod/node-exporter-fd8ng                    2/2     Running   0          27h
pod/node-exporter-nh5q9                    2/2     Running   0          27h
pod/node-exporter-z69fb                    2/2     Running   0          27h
pod/prometheus-adapter-66fc7797fd-qpt4d    1/1     Running   0          27h
pod/prometheus-k8s-0                       3/3     Running   1          27h
pod/prometheus-k8s-1                       3/3     Running   1          27h
pod/prometheus-operator-78678c7494-6954s   1/1     Running   0          27h

NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
service/alertmanager-main       ClusterIP      10.254.160.2     <none>           9093/TCP                     27h
service/alertmanager-operated   ClusterIP      None             <none>           9093/TCP,9094/TCP,9094/UDP   27h
service/grafana                 LoadBalancer   10.254.2.98      120.92.212.201   3000:31423/TCP               27h
service/kube-state-metrics      ClusterIP      None             <none>           8443/TCP,9443/TCP            27h
service/node-exporter           ClusterIP      None             <none>           9100/TCP                     27h
service/prometheus-adapter      ClusterIP      10.254.225.221   <none>           443/TCP                      27h
service/prometheus-k8s          LoadBalancer   10.254.23.154    120.92.92.56     9090:32361/TCP               27h
service/prometheus-operated     ClusterIP      None             <none>           9090/TCP                     27h
service/prometheus-operator     ClusterIP      None             <none>           8080/TCP                     27h

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
daemonset.apps/node-exporter   6         6         6       6            6           beta.kubernetes.io/os=linux   27h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana               1/1     1            1           27h
deployment.apps/kube-state-metrics    1/1     1            1           27h
deployment.apps/prometheus-adapter    1/1     1            1           27h
deployment.apps/prometheus-operator   1/1     1            1           27h

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-7b86fd9ffd               1         1         1       27h
replicaset.apps/kube-state-metrics-688965c565    1         1         1       27h
replicaset.apps/kube-state-metrics-758f8b9855    0         0         0       27h
replicaset.apps/prometheus-adapter-66fc7797fd    1         1         1       27h
replicaset.apps/prometheus-operator-78678c7494   1         1         1       27h

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     27h
statefulset.apps/prometheus-k8s      2/2     27h

  

 

Configuring kube-prometheus

1. kube-controller-manager and kube-scheduler configuration

Looking at the targets, most of them are fine; only two or three have no monitoring target behind them, namely the kube-controller-manager and kube-scheduler system components. This comes down to how their ServiceMonitors are defined. Let's first look at the ServiceMonitor definition for the kube-scheduler component (prometheus-serviceMonitorKubeScheduler.yaml):

 

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s # scrape every 30 seconds
    port: http-metrics  # name of the port on the target Service
  jobLabel: k8s-app
  namespaceSelector: # which namespace(s) to look for the Service in; use any: true to match all namespaces
    matchNames:
    - kube-system
  selector:  # labels the Service must carry; with matchLabels every listed label must match, with matchExpressions a Service matching at least one expression is selected
    matchLabels:
      k8s-app: kube-scheduler

The above is a typical ServiceMonitor declaration. Via selector.matchLabels it looks in the kube-system namespace for a Service carrying the label k8s-app=kube-scheduler, but no such Service exists in the cluster, so we need to create one manually (prometheus-kubeSchedulerService.yaml):

cat prometheus-kubeSchedulerService.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics  # must match the port name referenced by the ServiceMonitor and the Endpoints below
    port: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.5
  - ip: 10.0.0.15
  - ip: 10.0.0.20
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP

  

Similarly, prometheus-kubeControllerManagerService.yaml needs the same kind of addition:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  # no pod selector: the Endpoints object below is maintained by hand,
  # so the endpoints controller must not manage this Service
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.0.0.5
  - ip: 10.0.0.15
  - ip: 10.0.0.20
  ports:
  - name: http-metrics
    port: 10252
    protocol: TCP
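Apply the two manifests and confirm that the Endpoints carry the master IPs; Prometheus will pick the new targets up on its next scrape:

kubectl apply -f prometheus-kubeSchedulerService.yaml
kubectl apply -f prometheus-kubeControllerManagerService.yaml
kubectl -n kube-system get endpoints kube-scheduler kube-controller-manager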

  

2. CoreDNS configuration

In both my test environment and the production environment I ran into the problem that CoreDNS could not be discovered.

The cause lies in the file /data/monitor/kube-prometheus/manifests/prometheus-serviceMonitorCoreDNS.yaml: its matchLabels is kube-dns, but that is not the label actually in use:

pod/coredns-58d6869b44-ddczz                    1/1     Running   0          4d4h   10.8.5.2    10.0.0.5    <none>           <none>
pod/coredns-58d6869b44-zrkx4                    1/1     Running   0          4d4h   10.8.4.2    10.0.0.15   <none>           <none>

service/coredns                   ClusterIP   10.254.0.10      <none>        53/UDP,53/TCP,9153/TCP    4d4h   k8s-app=coredns

It is actually coredns, so the selector has to be changed.
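The fix in prometheus-serviceMonitorCoreDNS.yaml is just the selector (excerpt; the rest of the file stays unchanged):

  selector:
    matchLabels:
      k8s-app: coredns   # was kube-dns; must match the label on the coredns Service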

The rules also need the corresponding change. After both changes are applied, the CoreDNS target comes back up and everything returns to normal.

 

3. Custom monitoring targets

Besides the resource objects, nodes, and components of the Kubernetes cluster itself, we sometimes also need to add custom monitoring targets according to actual business requirements. Adding one only takes a few steps:

  • Step 1: create a ServiceMonitor object, which adds the monitoring target to Prometheus
  • Step 2: have the ServiceMonitor select a Service object that fronts the metrics endpoint
  • Step 3: make sure the Service can actually serve the metrics data

For example, in my environment the business team is testing a microservice and has started 2 pods.

It exposes port 7000 as its metrics endpoint.

Create a new ServiceMonitor manifest:

cat prometheus-serviceMonitorJx3recipe.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jx3recipe
  namespace: monitoring
  labels:
    k8s-app: jx3recipe
spec:
  jobLabel: k8s-app
  endpoints:
  - port: port
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      run: jx3recipe
  namespaceSelector:
    matchNames:
    - default

  

Create a new Service manifest:

cat prometheus-jx3recipeService.yaml

apiVersion: v1
kind: Service
metadata:
  name: jx3recipe
  namespace: default
  labels:
    run: jx3recipe
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 7000
    protocol: TCP

---
apiVersion: v1
kind: Endpoints
metadata:
  name: jx3recipe 
  namespace: default
  labels:
    run: jx3recipe
subsets:
- addresses:
  - ip: 10.8.0.19
    nodeName: jx3recipe-01
  - ip: 10.8.2.17
    nodeName: jx3recipe-02
  ports:
  - name: port
    port: 7000
    protocol: TCP

One problem here: the Endpoints are hard-coded, so if the containers restart and their IP addresses change, scraping will break.
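One way around the hard-coded Endpoints, assuming the jx3recipe pods themselves carry the run=jx3recipe label and listen on port 7000, is to give the Service a selector and let Kubernetes maintain the Endpoints automatically:

apiVersion: v1
kind: Service
metadata:
  name: jx3recipe
  namespace: default
  labels:
    run: jx3recipe
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    run: jx3recipe   # assumes the pods are labelled run=jx3recipe
  ports:
  - name: port
    port: 7000
    protocol: TCP

With a selector in place the separate Endpoints object is no longer needed, and pod restarts no longer break the scrape targets.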

kubectl apply -f  prometheus-jx3recipeService.yaml
kubectl apply -f  prometheus-serviceMonitorJx3recipe.yaml

  

The result can be seen on the Prometheus targets page.
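If Prometheus is not exposed externally in your environment, a port-forward is enough to take a look:

kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
# then open http://localhost:9090/targets and look for the jx3recipe job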


Pick any of the metrics and query it to confirm the data is coming through.

Everything is OK; from here we can build custom dashboards in Grafana.

Exposing kube-prometheus externally

With the setup above everything is internal-only, so we need to expose the services externally. Since this environment runs on Kingsoft Cloud, we can simply use the public cloud load balancer.

Exposing Grafana

All that is needed is to change the Service type to LoadBalancer.
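Either edit grafana-service.yaml and re-apply it, or patch the live Service directly, for example:

kubectl -n monitoring patch svc grafana -p '{"spec": {"type": "LoadBalancer"}}'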


Grafana can then be accessed directly via the external IP: http://120.92.*.*:3000.

Once inside you can see that a rich set of dashboards ships out of the box, covering the cluster, nodes, and pods as well as each of the Kubernetes components, for example the Node dashboard and the dashboard for Prometheus itself.

Of course, we can also import fancier dashboards into Grafana so that we can keep track of resource usage in real time.


The related files have been uploaded to GitHub and can be imported directly:

https://github.com/loveqx/k8s-study
