kubernetes 1.11+ / openshift 3.11
First, an APIService (the custom metrics API) needs to be registered.
When the HPA requests metrics, the kube-aggregator (the controller for APIServices) forwards the request to the adapter. The adapter runs as a pod in the Kubernetes cluster and implements the Kubernetes resource metrics API and custom metrics API: based on the configured rules it scrapes metrics from Prometheus, processes them (e.g. renaming metrics), and returns them to the HPA through the custom metrics API. The HPA then uses the metric values to scale the Deployment/ReplicaSet up or down.
The adapter acts as an extension-apiserver (i.e. a self-hosted pod), proxying kube-apiserver requests to Prometheus.
Below is the APIService definition for k8s-prometheus-adapter; the kube-aggregator forwards requests to the adapter through the service referenced in it. The name v1beta1.custom.metrics.k8s.io is hard-coded in the k8s-prometheus-adapter source and therefore cannot be changed arbitrarily.
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
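After creating the APIService, a quick sanity check (assuming kubectl points at the cluster) is to inspect its status conditions:

kubectl get apiservice v1beta1.custom.metrics.k8s.io -o yaml
# status.conditions should report Available=True once the adapter pod is serving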
Download k8s-prometheus-adapter from GitHub and deploy the adapter following the official documentation:
Pull the image directxman12/k8s-prometheus-adapter:latest, retag it, and push it to the local image registry.
Generate certificates: run the following shell script (from the official repo) to generate cm-adapter-serving-certs.yaml and copy it into the manifests/ directory. This certificate is used by the kube-aggregator to authenticate the adapter when talking to it. Note that the certificate below is valid for 5 years (43800h) and only for the listed DNS names.
#!/usr/bin/env bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u

# Detect if we are on mac or should use GNU base64 options
case $(uname) in
  Darwin)
    b64_opts='-b=0'
    ;;
  *)
    b64_opts='--wrap=0'
esac

go get -v -u github.com/cloudflare/cfssl/cmd/...

export PURPOSE=metrics
echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "ca-config.json"

export SERVICE_NAME=custom-metrics-apiserver
export ALT_NAMES='"custom-metrics-apiserver.custom-metrics","custom-metrics-apiserver.custom-metrics.svc"'
echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
  cfssl gencert -ca=ca.crt -ca-key=ca.key -config=ca-config.json - | cfssljson -bare apiserver

cat <<-EOF > cm-adapter-serving-certs.yaml
apiVersion: v1
kind: Secret
metadata:
  name: cm-adapter-serving-certs
data:
  serving.crt: $(base64 ${b64_opts} < apiserver.pem)
  serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
EOF
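As an optional sanity check (assuming the script above ran in the current directory, so apiserver.pem exists), the SANs on the generated certificate can be inspected with openssl:

openssl x509 -in apiserver.pem -noout -text | grep -A1 'Subject Alternative Name'
# expect custom-metrics-apiserver.custom-metrics and custom-metrics-apiserver.custom-metrics.svc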
When insecureSkipTLSVerify: true is set in custom-metrics-apiservice.yaml, the kube-aggregator does not verify the adapter's serving certificate. To enable verification, add the OpenShift cluster's CA certificate to caBundle (a self-signed certificate from outside the cluster would be treated as untrusted): base64-encode /etc/origin/master/ca.crt from an OpenShift master node and paste the result into the caBundle field:
base64 ca.crt
Alternatively, paste the clusters.cluster.certificate-authority-data field from /root/.kube/config on an OpenShift master node.
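With verification enabled, the relevant part of the APIService spec would look roughly like this (the caBundle value is a placeholder for the base64-encoded CA):

spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: false
  caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...   # placeholder: base64 of /etc/origin/master/ca.crt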
kubectl create namespace custom-metrics
On OpenShift, kube-system may not contain the role extension-apiserver-authentication-reader; if it is missing, create it:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: extension-apiserver-authentication-reader
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resourceNames:
  - extension-apiserver-authentication
  resources:
  - configmaps
  verbs:
  - get
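For context, this role is consumed by a RoleBinding in the official adapter manifests that grants the adapter's service account read access to the authentication ConfigMap; a sketch (names as in the official manifests — verify against your copy):

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: custom-metrics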
Edit the --prometheus-url flag in custom-metrics-apiserver-deployment.yaml so that it points at the correct Prometheus instance.
Create the remaining components: kubectl create -f manifests/
The deployment also creates a ClusterRole named custom-metrics-resource-reader, which authorizes the adapter to read Kubernetes cluster resources; as shown below, the readable resources are namespaces/pods/services.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  verbs:
  - get
  - list
Deploy the demo (the official sample app):
# cat sample-app.deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - image: docker-local.art.aliocp.csvw.com/openshift3/autoscale-demo:v0.1.2
        name: metrics-provider
        ports:
        - name: http
          containerPort: 8080
Create a service for it:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: sample-app
  name: sample-app
  namespace: custom-metrics
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: sample-app
  type: ClusterIP
Verify that metrics can be fetched from within the custom-metrics namespace:
curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
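The output should look roughly like this (the demo app exports a counter named http_requests_total, which the adapter later renames; values are illustrative):

# HELP http_requests_total The amount of requests served by the server in total
# TYPE http_requests_total counter
http_requests_total 1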
Deploy a ServiceMonitor.
Because the HPA needs Kubernetes resource information such as namespace and pod, the metrics must be registered via a ServiceMonitor so that these labels get attached to them.
The OpenShift Prometheus Operator restricts which ServiceMonitors it picks up as follows:
serviceMonitorNamespaceSelector:
  matchExpressions:
  - key: openshift.io/cluster-monitoring
    operator: Exists
serviceMonitorSelector:
  matchExpressions:
  - key: k8s-app
    operator: Exists
The custom-metrics namespace therefore needs a matching label:
oc label namespace custom-metrics openshift.io/cluster-monitoring=true
Create the ServiceMonitor in the openshift-monitoring namespace:
# cat service-monitor.yaml
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: sample-app
  labels:
    k8s-app: testsample
    app: sample-app
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: sample-app
  endpoints:
  - port: http
Grant permissions:

oc adm policy add-cluster-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s
oc adm policy add-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s -n custom-metrics
Test the HPA.
Create an HPA that scales up once the rate exceeds 0.5 requests per second:
# cat sample-app-hpa.yaml
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: sample-app
spec:
  scaleTargetRef:
    # point the HPA at the sample application
    # you created above
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  # autoscale between 1 and 10 replicas
  minReplicas: 1
  maxReplicas: 10
  metrics:
  # use a "Pods" metric, which takes the average of the
  # given metric across all pods controlled by the autoscaling target
  - type: Pods
    pods:
      # use the metric that you used above: pods/http_requests
      metricName: http_requests_per_second
      # target 500 milli-requests per second,
      # which is 1 request every two seconds
      targetAverageValue: 500m
Check that the HPA is running correctly with oc describe hpa sample-app.
Generate load by repeatedly running curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics.
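For convenience, a simple load-generation loop (a sketch; assumes the current kubectl namespace is custom-metrics and that the ClusterIP is reachable from this shell, e.g. on a cluster node):

SVC_IP=$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')
while true; do
  curl -s "http://${SVC_IP}/metrics" > /dev/null
  sleep 0.1   # ~10 requests/second, well above the 500m threshold
done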
Watch the corresponding value with kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"; once it exceeds 500m, scale-up begins:
# oc get pod
NAME                          READY     STATUS    RESTARTS   AGE
sample-app-6d55487cdd-dc6qz   1/1       Running   0          18h
sample-app-6d55487cdd-w6bbb   1/1       Running   0          5m
sample-app-6d55487cdd-zbdbr   1/1       Running   0          5m
After a while, when the value returned by kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second" stays below 500m, the deployment is scaled back down. The scale-down delay is controlled by --horizontal-pod-autoscaler-downscale-stabilization and defaults to 5 minutes.
The TARGETS column of oc get hpa shows the ratio of the current metric value to the target:
# oc get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
sample-app   Deployment/sample-app   66m/500m   1         10        1          3h
Before deploying the adapter, its rules must be configured; they preprocess the metrics, and the default configuration lives in manifests/custom-metrics-config-map.yaml. An adapter rule consists of four parts:
Discovery: selects the Prometheus metrics to be handled. seriesQuery picks the candidate set of metrics, and seriesFilters can filter them more precisely. seriesQuery can match on labels (as below) or directly on a metric name.
seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
seriesFilters:
  - isNot: "^container_.*_seconds_total"
seriesFilters supports two operators: is: <regex> keeps metrics whose name matches the regex, and isNot: <regex> keeps metrics whose name does not match it.
Association: maps metric labels to Kubernetes resources (listable with kubectl api-resources). overrides associates a Prometheus metric label with a Kubernetes resource (a deployment in the example below). Note that the label must reference a real Kubernetes resource: a metric's pod_name label can be mapped to the Kubernetes pod resource, for example, but container_image cannot be mapped to the pod resource; a wrong mapping prevents the custom metrics API from returning correct values. This also means the metric must carry a label holding a real resource name that can be mapped to a Kubernetes resource.
resources:
  overrides:
    microservice: {group: "apps", resource: "deployment"}
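With this mapping, a metric carrying a microservice label would be queried through the deployments resource path; a hypothetical example (the metric name is for illustration only):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/somens/deployments/*/http_requests_per_second"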
Naming: converts the Prometheus metric name into the name exposed by the custom metrics API, without changing the underlying metric name, i.e. curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics still returns the old name. This step can be skipped if renaming is not needed.
# match turn any name <name>_total to <name>_per_second
# e.g. http_requests_total becomes http_requests_per_second
name:
  matches: "^(.*)_total$"
  as: "${1}_per_second"
In this example the HPA can then fetch the metric via /apis/{APIService-name}/v1beta1/namespaces/{namespaces-name}/pods/*/http_requests_per_second.
Querying: computes the value returned by the custom metrics API; this is the value the HPA ultimately uses for its scaling decision.
# convert cumulative cAdvisor metrics into rates calculated over 2 minutes
metricsQuery: "sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)"
The metricsQuery field uses a Go template to turn the API request into a Prometheus query. It extracts the fields of the custom metrics API request — the metric name, the group-resource, and one or more objects of that group-resource — and exposes them as the following template fields:
- Series: the metric name
- LabelMatchers: a comma-separated list of label matchers for the requested objects; currently this is a label matching the particular group-resource, plus the namespace label if the group-resource is namespaced
- GroupBy: a comma-separated list of labels to group by; currently this is the group-resource label used in LabelMatchers
Suppose the metric http_requests_per_second has the following series:
http_requests_per_second{pod="pod1",service="nginx1",namespace="somens"}
http_requests_per_second{pod="pod2",service="nginx2",namespace="somens"}
When kubectl get --raw "/apis/{APIService-name}/v1beta1/namespaces/somens/pods/*/http_requests_per_second" is called, the metricsQuery template fields are instantiated as:
Series: "http_requests_total"
LabelMatchers: "pod=~\"pod1|pod2",namespace="somens"
GroupBy:pod
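Substituting these fields into a metricsQuery template such as sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>) yields the PromQL actually sent to Prometheus, roughly:

sum(rate(http_requests_total{pod=~"pod1|pod2",namespace="somens"}[2m])) by (pod)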
In the adapter's configuration, the rules and externalRules fields hold custom metrics rules and external metrics rules respectively, as in this example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    externalRules:
    - seriesQuery: '{namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[22m])) by (<<.GroupBy>>)
    rules:
    - seriesQuery: '{namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "^(.*)_total"
        as: "${1}_per_second"
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
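Note that metrics produced by externalRules are served under a separate API group, so querying them would look roughly as follows (this assumes a v1beta1.external.metrics.k8s.io APIService is registered as well):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{metric-name}"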
The HPA pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io) according to the metric type.
The HPA supports 4 metric types (the examples below use the v2beta2 format):
resource: currently only cpu and memory are supported. The target can be specified as an absolute value (targetAverageValue) or as a utilization ratio (targetAverageUtilization); see the sketch below.
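For comparison with the examples of the other types below, a minimal v2beta2 resource entry might look like this (the 60% utilization target is illustrative):

type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60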
The HPA fetches resource metrics from metrics.k8s.io.
pods: custom metrics describing pods. The target only supports an absolute value (targetAverageValue), which is compared against the average of the metric across all relevant pods.
type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
The HPA fetches custom metrics from custom.metrics.k8s.io.
object: custom metrics describing a single (non-pod) object in the same namespace. The target supports Value and AverageValue: the former compares the metric directly with the target, while the latter divides the metric by the number of relevant pods before comparing with the target.
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
external: Kubernetes 1.10+. These metrics are not tied to the Kubernetes cluster (whereas pods and object metrics must be associated with some Kubernetes object). Like object, the target supports Value and AverageValue. Because external metrics attempt to match metrics against all Kubernetes resources, this type is not recommended in practice.
The HPA fetches external metrics from external.metrics.k8s.io.
- type: External
  external:
    metric:
      name: queue_messages_ready
      selector: "queue=worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
Version 1.6 added support for scaling on multiple metrics: as soon as any one metric reaches its scale-up threshold, new replicas are created (as long as current replicas < maxReplicas).
Note: one unit of a target value can be subdivided into 1000 parts, each written with the m suffix; e.g. 500m means 1/2 of a unit. See Quantity.
The Kubernetes HPA algorithm is:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
When targetAverageValue or targetAverageUtilization is used, currentMetricValue is the average of the metric across all pods targeted by the HPA.
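For example, with 3 current replicas and a measured average of 800m against a target of 500m (values chosen purely for illustration), the desired replica count would be:

desiredReplicas = ceil[3 * (800m / 500m)] = ceil[4.8] = 5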
Assume the registered APIService is custom.metrics.k8s.io/v1beta1. After registration, the HorizontalPodAutoscaler controller fetches metrics from paths rooted at /apis/custom.metrics.k8s.io/v1beta1. The metrics API paths come in namespaced and non-namespaced variants. Verify that the HPA can fetch metrics as follows:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}"
E.g. fetch the start_time_seconds metric of the pod named grafana in the monitor namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}"
E.g. fetch the start_time_seconds metric of all pods in the monitor namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds"
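Each of these calls returns a MetricValueList, with one entry per matching object; its shape is roughly as follows (values are illustrative):

{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "monitor",
        "name": "grafana",
        "apiVersion": "/v1"
      },
      "metricName": "start_time_seconds",
      "timestamp": "2019-01-01T00:00:00Z",
      "value": "1546300800"
    }
  ]
}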
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}?labelSelector={label-name}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}?labelSelector={label-name}"
Non-namespaced paths are similar to the namespaced ones and mainly cover node, namespace, PersistentVolume, etc. Some non-namespaced access paths do not match what the custom metrics API documentation describes.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/metrics/{metric-name...}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/{metric-name...}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/{node-name}/{metric-name...}"
List all rules discovered by the registered APIService as follows:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
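The response lists each discovered metric as an API resource; roughly like this (the entries depend on the configured rules):

{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/http_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": ["get"]
    }
  ]
}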
If this call fails, inspect the status and message fields via oc get apiservice v1beta1.custom.metrics.k8s.io -oyaml.
If the returned resource list is empty, verify that the Prometheus URL in the deployment is correct and that the adapter has the required permissions.
Inspect the full request flow by adding --v=8:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}" --v=8
If the above works but the returned items list is empty, check:
- whether --metrics-relist-interval is set to a value greater than Prometheus' scrape_interval
- whether the seriesQuery of the rules actually matches series in Prometheus
- whether the metricsQuery of the rules actually computes data; note that when the query calculates over a time range, a range that is too short may produce no data
The official End-to-end walkthrough is available, but the scraped metrics must carry pod and namespace labels, otherwise nothing can be collected under the official default configuration.
The Configuration Walkthroughs explain step by step how to write an adapter config.
The adapter can be debugged remotely from GoLand with the following arguments:
--secure-port=6443 --tls-cert-file=D:\adapter\serving.crt --tls-private-key-file=D:\adapter\serving.key --logtostderr=true --prometheus-url=${prometheus-url} --metrics-relist-interval=70s --v=10 --config=D:\adapter\config.yaml --lister-kubeconfig=D:\adapter\k8s-config.yaml --authorization-kubeconfig=D:\adapter\k8s-config.yaml --authentication-kubeconfig=D:\adapter\k8s-config.yaml