Deploying Prometheus on Kubernetes

Date: 2020-04-03

Tags: kubernetes, deployment, prometheus

Tags (space-separated): kubernetes series

1: Component overview

2: Deploying Prometheus

3: HPA and resource limits

1: Component overview

1.1 Related links

Prometheus
GitHub repository: https://github.com/coreos/kube-prometheus

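1.2 Components

1. MetricServer: an aggregator of resource usage data for the Kubernetes cluster. It collects data for consumers inside the cluster, such as kubectl, HPA, and the scheduler.

2. PrometheusOperator: deploys and manages Prometheus, the system monitoring and alerting toolkit that stores the monitoring data.

3. NodeExporter: exposes the key metrics that describe the state of each node.

4. KubeStateMetrics: collects data about the resource objects in the Kubernetes cluster, which alerting rules can then be written against.

5. Prometheus: collects data from the apiserver, scheduler, controller-manager, and kubelet components using a pull model, over HTTP.

6. Grafana: a platform for visualizing statistics and monitoring data.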

2: Deploying Prometheus

cd /root

git clone https://github.com/coreos/kube-prometheus.git

cd /root/kube-prometheus/manifests

Edit grafana-service.yaml so that Grafana is exposed through a NodePort:

vim grafana-service.yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100
  selector:
    app: grafana
---

Edit prometheus-service.yaml and change the service type to NodePort:

vim prometheus-service.yaml

---
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
---

Edit alertmanager-service.yaml and change the service type to NodePort:

vim alertmanager-service.yaml
---
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
---
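
The three services above pin their nodePort values to 30100, 30200, and 30300. Unless the API server was started with a custom --service-node-port-range, Kubernetes only accepts NodePort values between 30000 and 32767. The sketch below checks the chosen ports against that range before anything is applied. The helper name in_nodeport_range is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: succeeds when PORT lies inside the NodePort range.
# The defaults assume the stock range 30000-32767; pass MIN and MAX if the
# API server uses a custom --service-node-port-range.
in_nodeport_range() {
  local port min max
  port=$1
  min=${2:-30000}
  max=${3:-32767}
  [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

# Check the three ports used in the service manifests above.
for p in 30100 30200 30300; do
  if in_nodeport_range "$p"; then
    echo "$p: ok"
  else
    echo "$p: outside the NodePort range"
  fi
done
```

Each port must also be unused by every other NodePort service in the cluster, which this check does not cover.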

Import the images (on every node)

Upload load-images.sh and prometheus.tar.gz to /root

tar -zxvf prometheus.tar.gz

chmod +x load-images.sh 

./load-images.sh

kubectl apply -f kube-prometheus/manifests/

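Run the command twice in a row. The first run reports errors, typically because some manifests use custom resource types whose definitions are still being registered: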

kubectl apply -f kube-prometheus/manifests/
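
Rather than rerunning the command by hand, the apply step can be wrapped in a small retry loop. This is only a sketch: the retry function and the RETRY_DELAY variable are hypothetical names introduced here, not part of kubectl or kube-prometheus.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times and stop at the
# first success. RETRY_DELAY (seconds, default 5) is an illustrative knob.
retry() {
  local attempts i
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i of $attempts failed" >&2
    i=$((i + 1))
    # Pause only when another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "${RETRY_DELAY:-5}"
    fi
  done
  return 1
}

# Usage (run from /root, like the commands above):
#   retry 3 kubectl apply -f kube-prometheus/manifests/
```

kubectl apply is safe to repeat, because objects that already match the manifests are left unchanged.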

kubectl get pod -n monitoring 

kubectl get svc -n monitoring 

kubectl top node

The Prometheus NodePort is 30200, so the UI is reachable at http://MasterIP:30200, for example:

http://192.168.100.11:30200/graph

The Prometheus web UI supports basic queries, such as the CPU usage of each pod in the cluster. The query is:

sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )
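
The query above groups by the pod_name label. On Kubernetes 1.16 and later the container metrics carry a pod label instead, so the same query may come back empty there. The sketch below builds the query for either label and shows, in a comment, how it could be sent to the Prometheus HTTP API (/api/v1/query) with curl. The function name cpu_query is hypothetical, and the address is the one used in this article.

```shell
# Hypothetical helper: print the per-pod CPU query.
#   $1 = label to group by (default: pod_name, as in the article)
#   $2 = rate window       (default: 1m)
cpu_query() {
  local label window
  label=${1:-pod_name}
  window=${2:-1m}
  printf 'sum by (%s)( rate(container_cpu_usage_seconds_total{image!="", %s!=""}[%s] ) )\n' \
    "$label" "$label" "$window"
}

# Show both variants of the query.
cpu_query
cpu_query pod 1m

# To run the query against the NodePort service from this article:
#   curl -sG 'http://192.168.100.11:30200/api/v1/query' \
#     --data-urlencode "query=$(cpu_query pod 1m)"
```

If the graph stays empty, checking which of the two labels appears on container_cpu_usage_seconds_total is a quick first step.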

Check the port that the Grafana service exposes:

kubectl get service -n monitoring | grep grafana

grafana NodePort 10.107.56.143 <none> 3000:30100/TCP 20h

The default username and password are both admin.

Change the password after the first login.

3: HPA and resource limits

Upload hpa-example.tar and load it (on all nodes)

docker load -i hpa-example.tar

3.1 Horizontal Pod Autoscaling

Horizontal Pod Autoscaling automatically scales the number of pods in a Replication Controller, Deployment, or Replica Set based on CPU utilization.

kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80

kubectl get deploy 

kubectl edit deploy php-apache
---
Change
imagePullPolicy: Always
to
imagePullPolicy: IfNotPresent
---

kubectl get pod

Create the HPA controller

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
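
With --cpu-percent=50, --min=1, and --max=10, the autoscaler aims for ceil(currentReplicas * currentUtilization / targetUtilization) replicas, kept within the min and max bounds. The sketch below reproduces only that arithmetic. It ignores the controller's tolerance band and stabilization behaviour, and the function name hpa_desired_replicas is hypothetical.

```shell
# Hypothetical helper: desired replica count for a CPU-based HPA.
#   $1 = current replicas
#   $2 = current average CPU utilization, in percent of the request
#   $3 = target utilization in percent (--cpu-percent)
#   $4 = minimum replicas (--min)
#   $5 = maximum replicas (--max)
hpa_desired_replicas() {
  local current util target min max desired
  current=$1
  util=$2
  target=$3
  min=$4
  max=$5
  # Integer ceiling of current * util / target.
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# One pod running at 250% of its 200m request, target 50%:
hpa_desired_replicas 1 250 50 1 10   # prints 5
```

Utilization is measured against the CPU request (200m in the kubectl run command above), which is why the request has to be set for the HPA to work.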

kubectl top pod

Increase the load and watch the number of pods

kubectl run -i --tty load-generator --image=busybox /bin/sh
while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

The pods begin to scale out

kubectl get hpa

kubectl get pod

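Kubernetes removes pods slowly (very slowly); by default the HPA controller waits through a five-minute stabilization window before scaling down.
This guards against bursty traffic: if pods were removed quickly, a sudden surge of requests could easily overwhelm one of the remaining pods.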

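3.2 Resource limits in k8s

Resource limits - Pod

Kubernetes enforces resource limits through cgroups. A cgroup is a set of attributes that controls how the kernel runs a container's processes. There are cgroups for memory, CPU, and various devices.

By default, a pod runs with no CPU or memory limits, which means any pod can consume as much CPU and memory as the node it runs on can provide. Resource limits are usually set for the pods of specific applications, through the requests and limits fields under resources: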
---
spec:
  containers:
  - image: xxxx
    imagePullPolicy: Always
    name: auth
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      limits:
        cpu: "4"
        memory: 2Gi
      requests:
        cpu: 250m
        memory: 250Mi
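---

requests is the amount of resources reserved for the container, and limits is the maximum it may use. Roughly speaking, they are the initial value and the maximum value.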
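
The example mixes two CPU notations: the limit is written as "4" (whole CPUs) and the request as 250m (millicores, where 1 CPU equals 1000m). A small conversion makes the two comparable. This is a sketch only; the function name cpu_to_millicores is hypothetical, and it handles just the plain and m-suffixed forms.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Accepts whole or fractional CPUs ("4", "0.5") and millicores ("250m").
cpu_to_millicores() {
  awk -v v="$1" 'BEGIN {
    if (v ~ /m$/) {
      sub(/m$/, "", v)          # already millicores: strip the suffix
      printf "%.0f\n", v
    } else {
      printf "%.0f\n", v * 1000 # whole CPUs: 1 CPU = 1000m
    }
  }'
}

cpu_to_millicores 250m   # the request above: prints 250
cpu_to_millicores 4      # the limit above: prints 4000
```

In the manifest above the container is therefore guaranteed a quarter of a CPU and may burst to sixteen times that amount.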

Resource limits - Namespace

1. Compute resource quota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: spark-cluster
spec:
  hard:
    pods: "20"
    requests.cpu: "20"
    requests.memory: 100Gi
    limits.cpu: "40"
    limits.memory: 200Gi
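
With this quota in place, a new pod is admitted only while the namespace totals stay within the hard values. The sketch below shows that check for requests.cpu, expressed in millicores. The function name quota_admits and the usage figures are made up for illustration.

```shell
# Hypothetical helper: succeeds when a new request still fits the quota.
#   $1 = amount already used in the namespace
#   $2 = amount the new pod asks for
#   $3 = hard limit from the ResourceQuota
quota_admits() {
  local used requested hard
  used=$1
  requested=$2
  hard=$3
  [ $((used + requested)) -le "$hard" ]
}

# requests.cpu: "20" is 20000m. Assume 19500m is already in use.
if quota_admits 19500 250 20000; then
  echo "250m request fits"
fi
if ! quota_admits 19500 1000 20000; then
  echo "1000m request would exceed the quota"
fi
```

The real figures for a namespace can be read with kubectl describe quota compute-resources -n spark-cluster.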

2. Object count quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: spark-cluster
spec:
  hard:
    configmaps: "10"
    persistentvolumeclaims: "4"
    replicationcontrollers: "20"
    secrets: "10"
    services: "10"
    services.loadbalancers: "2"

3. CPU and memory LimitRange

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 50Gi
      cpu: 5
    defaultRequest:
      memory: 1Gi
      cpu: 1
    type: Container
---
default is the limit applied to a container that does not set one.
defaultRequest is the request applied to a container that does not set one.
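
How the two defaults combine depends on what the container already specifies. The sketch below models the memory case with the values from the LimitRange above (1Gi and 50Gi): a container with neither field gets both defaults, a container with only a limit gets a request equal to that limit, and a container with only a request gets the default limit. The function name effective_memory is hypothetical, and the sketch does not model admission failures such as a request larger than the limit.

```shell
# Hypothetical helper: print "REQUEST LIMIT" after LimitRange defaulting.
#   $1 = request set on the container ("" if unset)
#   $2 = limit set on the container   ("" if unset)
#   $3 = defaultRequest (default: 1Gi, as in the LimitRange above)
#   $4 = default limit  (default: 50Gi, as in the LimitRange above)
effective_memory() {
  local request limit def_request def_limit
  request=$1
  limit=$2
  def_request=${3:-1Gi}
  def_limit=${4:-50Gi}
  if [ -z "$request" ]; then
    if [ -n "$limit" ]; then
      request=$limit          # limit only: the request follows the limit
    else
      request=$def_request    # nothing set: use defaultRequest
    fi
  fi
  if [ -z "$limit" ]; then
    limit=$def_limit          # no limit set: use default
  fi
  echo "$request $limit"
}

effective_memory "" ""        # prints: 1Gi 50Gi
effective_memory "" "2Gi"     # prints: 2Gi 2Gi
effective_memory "512Mi" ""   # prints: 512Mi 50Gi
```

Defaults are filled in when a pod is created, so pods that were already running before the LimitRange existed keep their old settings.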