After deploying kube-prometheus and k8s-prometheus-adapter (see the earlier post "k8s 安裝 prometheus 過程記錄" for details), deploying an HPA (Horizontal Pod Autoscaler) with the following configuration file failed.
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: blog-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: blog-web
  minReplicas: 2
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests
      target:
        type: AverageValue
        averageValue: 100
```
The error message was:
```
unable to get metric http_requests: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests for pods
```
The following command lists which http_requests (requests per second, QPS) metrics the custom.metrics.k8s.io API supports:
```shell
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | egrep pods/.*http_requests
"name": "pods/alertmanager_http_requests_in_flight",
"name": "pods/prometheus_http_requests"
```
Only the prometheus_http_requests metric was there; the metric we needed, named http_requests, did not exist.
Opening the Prometheus console, the /service-discovery page did not show blog-web, the application we wanted to monitor. After searching online, I learned that a ServiceMonitor must be deployed so that Prometheus can discover the Service to be monitored.
Add the following ServiceMonitor configuration file:
```yaml
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor
  labels:
    app: blog-web-monitor
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints:
  - port: http
```
After deploying it, Prometheus still did not discover the service. The Prometheus logs showed the following error:
```
Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:prometheus-k8s" cannot list resource "pods" in API group "" at the cluster scope
```
I found the solution in the cnblogs post "PrometheusOperator服務自動發現-監控redis樣例": change prometheus-clusterRole.yaml to the following configuration:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
```
Then redeploy it:

```shell
kubectl apply -f prometheus-clusterRole.yaml
```
Note 1: If the service is still not discovered after the steps above, force Prometheus to reload its configuration; see "部署 ServiceMonitor 以後如何讓 Prometheus 當即發現".
Note 2: Prometheus can also be configured to auto-discover Services and Pods; see the cnblogs posts "prometheus配置pod和svc的自動發現和監控" and "PrometheusOperator服務自動發現-監控redis樣例".
But a problem remained: although Prometheus had discovered the Service, none of the Service's pods were discovered:
```
production/blog-web-monitor/0 (0/19 active targets)
```
Troubleshooting showed that the ServiceMonitor and Service configurations did not match: the Service manifest was missing the label referenced by the ServiceMonitor's matchLabels, and the ServiceMonitor's port did not correspond to any entry in the Service's ports. The corrected configurations are below:
service-blog-web.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: blog-web
  labels:
    app: blog-web
spec:
  type: NodePort
  selector:
    app: blog-web
  ports:
  - name: http-blog-web
    nodePort: 30080
    port: 80
    targetPort: 80
```
servicemonitor-blog-web.yaml
```yaml
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: blog-web-monitor
  labels:
    app: blog-web
spec:
  selector:
    matchLabels:
      app: blog-web
  endpoints:
  - port: http-blog-web
```
After deploying the corrected configuration, the pods were finally discovered:
```
production/blog-web-monitor/0 (0/5 up)
```
But all of these pods were in the DOWN state:
```
Endpoint                            State   Scrape Duration   Error
http://192.168.107.233:80/metrics   DOWN                      server returned HTTP status 400 Bad Request
```
From the cnblogs post "使用Kubernetes演示金絲雀發佈" I learned that the application itself has to expose metrics data for Prometheus to scrape:
A stock Tomcat application has no /metrics path, so Prometheus cannot obtain data in a format it can recognize, yet /metrics is exactly where metric data is scraped from. So a stock Tomcat will not work, and even if you do have a /metrics path, it still will not work if the response does not follow the Prometheus format.
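For reference, a Prometheus-compatible /metrics endpoint returns plain-text exposition data of roughly this shape (an illustrative sample, not output from our application):

```
# HELP http_requests_received_total Number of HTTP requests received.
# TYPE http_requests_received_total counter
http_requests_received_total{code="200",method="GET"} 42
```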
Our application is built with ASP.NET Core, so we chose prometheus-net to provide the metrics data for Prometheus to scrape:
```shell
dotnet add package prometheus-net.AspNetCore
```
```csharp
app.UseRouting();
app.UseHttpMetrics();

app.UseEndpoints(endpoints =>
{
    endpoints.MapMetrics();
});
```
Once the following command confirmed that monitoring data was available at the /metrics path,
```shell
$ docker exec -t $(docker ps -f name=blog-web_blog-web -q | head -1) curl 127.0.0.1/metrics | grep http_request_duration_seconds_sum
http_request_duration_seconds_sum{code="200",method="GET",controller="AggSite",action="SiteHome"} 0.44973779999999997
http_request_duration_seconds_sum{code="200",method="GET",controller="",action=""} 0.0631272
```
the /targets page of the Prometheus console showed all of blog-web's pods in the UP state:
```
production/blog-web-monitor/0 (5/5 up)
```
Now some http_requests-related metrics could be queried through the custom metrics API:
```shell
$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/ | jq . | egrep pods/*/http_requests
"name": "pods/http_requests_in_progress",
"name": "pods/http_requests_received"
```
The http_requests_received here is the QPS (requests per second) metric. Query the custom metrics API for its data with the following command:
```shell
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_received | jq .
```
One pod's http_requests_received data looked like this:
```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/%2A/http_requests_received"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "production",
        "name": "blog-web-65f7bdc996-8qp5c",
        "apiVersion": "/v1"
      },
      "metricName": "http_requests_received",
      "timestamp": "2020-01-18T14:35:34Z",
      "value": "133m",
      "selector": null
    }
  ]
}
```
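The value is a per-second rate rather than a raw request count because k8s-prometheus-adapter's default rules turn Prometheus `_total` counters into rates and strip the suffix. A rule of roughly this shape (an illustrative sketch in the adapter's rule syntax, not our exact deployed config) is what maps prometheus-net's http_requests_received_total counter to the pods/http_requests_received custom metric:

```yaml
rules:
# Match per-pod counter series ending in _total ...
- seriesQuery: 'http_requests_received_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # ... expose them without the _total suffix ...
  name:
    matches: "^(.*)_total$"
    as: "${1}"
  # ... and serve the per-second rate over a short window.
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```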
The 133m in the value field means 0.133.
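The m suffix is the Kubernetes milli notation, i.e. one thousandth. A small Python sketch of the conversion (a hypothetical helper, just to make the notation concrete; real Kubernetes quantities also allow suffixes like Ki or M that are not handled here):

```python
def parse_milli_quantity(value: str) -> float:
    """Convert a Kubernetes quantity like '133m' to a float.

    Handles only the plain-number and milli ('m') forms.
    """
    if value.endswith("m"):
        return float(value[:-1]) / 1000
    return float(value)

print(parse_milli_quantity("133m"))  # 0.133
```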
Then the HPA configuration file can autoscale based on this metric:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: blog-web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: blog-web
  minReplicas: 5
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_received
      target:
        type: AverageValue
        averageValue: 100
```
Finally, it works!
```shell
# kubectl get hpa
NAME       REFERENCE             TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
blog-web   Deployment/blog-web   133m/100   5         12        5          4d
```
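As a sanity check on the TARGETS column (current average 133m against a target of 100): the HPA controller scales with desiredReplicas = ceil(currentReplicas × currentValue / targetValue), clamped to [minReplicas, maxReplicas]. A sketch of that arithmetic (a simplification; the real controller also applies a tolerance and stabilization windows):

```python
import math

def desired_replicas(current_replicas, current_avg, target_avg,
                     min_replicas, max_replicas):
    # Core HPA scaling formula, ignoring tolerance and stabilization.
    desired = math.ceil(current_replicas * current_avg / target_avg)
    return max(min_replicas, min(desired, max_replicas))

# 5 replicas at an average of 0.133 req/s each, target 100 per pod:
print(desired_replicas(5, 0.133, 100, 5, 12))  # 5 (clamped up to minReplicas)
```

At 0.133 requests per second per pod the formula alone would suggest a single replica, so the deployment simply stays at minReplicas until real load arrives.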