Implementing HPA with k8s-prometheus-adapter

Environment:

Kubernetes 1.11+ / OpenShift 3.11


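How custom-metric HPA works:

First, an APIService (the custom metrics API) must be registered.

When the HPA requests metrics, kube-aggregator (the controller behind APIService objects) forwards the request to the adapter. The adapter runs as a pod in the Kubernetes cluster and implements the Kubernetes resource metrics API and custom metrics API. Following its configured rules, it queries Prometheus for metrics, processes them (for example, renames them), and returns them to the HPA through the custom metrics API. Finally, the HPA scales the Deployment/ReplicaSet based on the metric values it receives.

The adapter is an extension API server (that is, a pod you deploy yourself) and acts as the proxy through which kube-apiserver queries Prometheus.

Below is the APIService definition for k8s-prometheus-adapter; kube-aggregator forwards requests to the adapter through the service named in it. The name v1beta1.custom.metrics.k8s.io is hard-coded in k8s-prometheus-adapter, so it cannot be changed arbitrarily.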

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

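Deployment:

  • Download k8s-prometheus-adapter from GitHub

  • Deploy the adapter following the official documentation:

    • Pull the image directxman12/k8s-prometheus-adapter:latest, retag it, and push it to the local image registry

    • Generate the certificate: run the following shell script (from the official documentation) to produce cm-adapter-serving-certs.yaml, then copy it into the manifests/ directory. kube-aggregator uses this certificate to authenticate the adapter when talking to it. Note the certificate lifetime of 5 years (43800h) and the authorized host names in the script.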

      #!/usr/bin/env bash
      # exit immediately when a command fails
      set -e
      # only exit with zero if all commands of the pipeline exit successfully
      set -o pipefail
      # error on unset variables
      set -u
      
      # Detect if we are on mac or should use GNU base64 options
      case $(uname) in
              Darwin)
                  b64_opts='-b=0'
                  ;; 
              *)
                  b64_opts='--wrap=0'
      esac
      
      go get -v -u github.com/cloudflare/cfssl/cmd/...
      
      export PURPOSE=metrics
      echo '{"signing":{"default":{"expiry":"43800h","usages":["signing","key encipherment","'${PURPOSE}'"]}}}' > "ca-config.json"
      
      export SERVICE_NAME=custom-metrics-apiserver
      export ALT_NAMES='"custom-metrics-apiserver.custom-metrics","custom-metrics-apiserver.custom-metrics.svc"'
      echo "{\"CN\":\"${SERVICE_NAME}\", \"hosts\": [${ALT_NAMES}], \"key\": {\"algo\": \"rsa\",\"size\": 2048}}" | \
             	cfssl gencert -ca=ca.crt -ca-key=ca.key -config=ca-config.json - | cfssljson -bare apiserver
      
      cat <<-EOF > cm-adapter-serving-certs.yaml
      apiVersion: v1
      kind: Secret
      metadata:
        name: cm-adapter-serving-certs
      data:
        serving.crt: $(base64 ${b64_opts} < apiserver.pem)
        serving.key: $(base64 ${b64_opts} < apiserver-key.pem)
      EOF

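      When insecureSkipTLSVerify: true is set in custom-metrics-apiservice.yaml, kube-aggregator does not verify the adapter certificate generated above. To enable verification, add the OpenShift cluster's CA certificate to caBundle (a self-signed certificate not issued by the OpenShift cluster CA is treated as untrusted): base64-encode /etc/origin/master/ca.crt from an OpenShift master node as a single line and paste the result into the caBundle field.

      base64 --wrap=0 ca.crt

      Alternatively, paste the value of clusters.cluster.certificate-authority-data from /root/.kube/config on an OpenShift master node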

    • Create the namespace: kubectl create namespace custom-metrics
    • On OpenShift, kube-system may lack the Role extension-apiserver-authentication-reader; if it does not exist, create it

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        annotations:
          rbac.authorization.kubernetes.io/autoupdate: "true"
        labels:
          kubernetes.io/bootstrapping: rbac-defaults
        name: extension-apiserver-authentication-reader
        namespace: kube-system
      rules:
      - apiGroups:
        - ""
        resourceNames:
        - extension-apiserver-authentication
        resources:
        - configmaps
        verbs:
        - get
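    • Edit the --prometheus-url argument in custom-metrics-apiserver-deployment.yaml so that it points at the correct Prometheus

    • Create the remaining components: kubectl create -f manifests/

      The deployment creates a ClusterRole named custom-metrics-resource-reader, which authorizes the adapter to read cluster resources; as shown below, it may read namespaces, pods, and services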

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: custom-metrics-resource-reader
      rules:
      - apiGroups:
        - ""
        resources:
        - namespaces
        - pods
        - services
        verbs:
        - get
        - list
  • Deploy the demo:

    • Deploy the official demo

      # cat sample-app.deploy.yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: sample-app
        labels:
          app: sample-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: sample-app
        template:
          metadata:
            labels:
              app: sample-app
          spec:
            containers:
            - image: docker-local.art.aliocp.csvw.com/openshift3/autoscale-demo:v0.1.2
              name: metrics-provider
              ports:
              - name: http
                containerPort: 8080
    • Create the service

      apiVersion: v1
      kind: Service
      metadata:
        labels:
          app: sample-app
        name: sample-app
        namespace: custom-metrics
      spec:
        ports:
        - name: http
          port: 80
          protocol: TCP
          targetPort: 8080
        selector:
          app: sample-app
        type: ClusterIP

      Verify in the custom-metrics namespace that the metrics can be fetched

      curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
  • Deploy the ServiceMonitor

    The HPA needs Kubernetes resource information such as namespace and pod, so the target must be registered through a ServiceMonitor, which adds these labels to the metrics

    • The OpenShift Prometheus operator restricts ServiceMonitors as follows

      serviceMonitorNamespaceSelector:
        matchExpressions:
        - key: openshift.io/cluster-monitoring
          operator: Exists
      serviceMonitorSelector:
        matchExpressions:
        - key: k8s-app
          operator: Exists
    • The custom-metrics namespace therefore needs the matching label

      oc label namespace custom-metrics openshift.io/cluster-monitoring=true
    • Create the ServiceMonitor in the openshift-monitoring namespace

      # cat service-monitor.yaml
      kind: ServiceMonitor
      apiVersion: monitoring.coreos.com/v1
      metadata:
        name: sample-app
        labels:
          k8s-app: testsample
          app: sample-app
      spec:
        namespaceSelector:
          any: true
        selector:
          matchLabels:
            app: sample-app
        endpoints:
        - port: http
    • Grant permissions

      oc adm policy add-cluster-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s
      
      oc adm policy add-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s -n custom-metrics
  • Test the HPA

    • Create the HPA; it scales out when the average request rate per pod exceeds 0.5 requests per second

      # cat sample-app-hpa.yaml
      kind: HorizontalPodAutoscaler
      apiVersion: autoscaling/v2beta1
      metadata:
        name: sample-app
      spec:
        scaleTargetRef:
          # point the HPA at the sample application
          # you created above
          apiVersion: apps/v1
          kind: Deployment
          name: sample-app
        # autoscale between 1 and 10 replicas
        minReplicas: 1
        maxReplicas: 10
        metrics:
        # use a "Pods" metric, which takes the average of the
        # given metric across all pods controlled by the autoscaling target
        - type: Pods
          pods:
            # use the metric that you used above: pods/http_requests
            metricName: http_requests_per_second
            # target 500 milli-requests per second,
            # which is 1 request every two seconds
            targetAverageValue: 500m

      Check that the HPA is working with oc describe hpa sample-app

    • Generate requests by repeatedly running curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics

    • Check the metric's value with kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second"; scale-out starts once the value exceeds 500m

      # oc get pod
      NAME                          READY     STATUS    RESTARTS   AGE
      sample-app-6d55487cdd-dc6qz   1/1       Running   0          18h
      sample-app-6d55487cdd-w6bbb   1/1       Running   0          5m
      sample-app-6d55487cdd-zbdbr   1/1       Running   0          5m
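    • After a while, once the value returned by kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/custom-metrics/pods/*/http_requests_per_second" stays below 500m, the HPA scales in. The scale-in delay is set by the kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization and defaults to 5 minutes.

      The TARGETS column of oc get hpa shows the current metric value against the target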

      # oc get hpa
      NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
      sample-app   Deployment/sample-app   66m/500m   1         10        1          3h
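
The check above can also be scripted. The following is a minimal sketch, not part of the adapter: it parses a custom metrics API response and averages the per-pod values, which is what a Pods-type target is compared against. The JSON sample is hypothetical (the pod names come from the listing above, the values are invented), and `milli` is an illustrative helper that only handles plain numbers and the `m` suffix.

```python
import json

# Hypothetical response body, shaped like the MetricValueList returned by
# /apis/custom.metrics.k8s.io/v1beta1/.../pods/*/http_requests_per_second.
SAMPLE = """
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {"describedObject": {"kind": "Pod", "namespace": "custom-metrics",
                         "name": "sample-app-6d55487cdd-dc6qz"},
     "metricName": "http_requests_per_second", "value": "900m"},
    {"describedObject": {"kind": "Pod", "namespace": "custom-metrics",
                         "name": "sample-app-6d55487cdd-w6bbb"},
     "metricName": "http_requests_per_second", "value": "1"}
  ]
}
"""


def milli(quantity: str) -> int:
    """Convert a quantity such as "900m" or "1" to integer milli-units."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def average_milli(body: str) -> float:
    """Average the metric over all items (pods) in the response."""
    items = json.loads(body)["items"]
    return sum(milli(item["value"]) for item in items) / len(items)


avg = average_milli(SAMPLE)
print(f"average: {avg:.0f}m, above 500m target: {avg > 500}")
```

In practice the body would come from the kubectl get --raw command shown above rather than from an embedded string.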

Adapter config

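Before deploying the adapter, configure its rules, which preprocess the metrics; the default configuration is manifests/custom-metrics-config-map.yaml. Each rule has four parts:

  • Discovery: specifies which Prometheus metrics to process. seriesQuery selects the set of series to handle, and seriesFilters narrows that set down further.

    seriesQuery can select series by label (as below) or directly by metric name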

    seriesQuery: '{__name__=~"^container_.*_total",container_name!="POD",namespace!="",pod_name!=""}'
    seriesFilters:
      - isNot: "^container_.*_seconds_total"

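    seriesFilters accepts:

    is: <regex>, keeps metrics whose name matches the regular expression.
    isNot: <regex>, keeps metrics whose name does not match the regular expression.
  • Association: defines how metric labels map to Kubernetes resources; the available resources can be listed with kubectl api-resources. overrides associates a Prometheus metric label with a Kubernetes resource (deployment in the example below). The label must identify a real Kubernetes resource: the pod_name label of a metric can be mapped to the pod resource, but container_image cannot, and a wrong mapping makes it impossible to get correct values through the custom metrics API. This also means the metric must carry a label holding a real resource name that can be mapped to a Kubernetes resource.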

    resources:
      overrides:
        microservice: {group: "apps", resource: "deployment"}
  • Naming: converts the Prometheus metric name into the name used by the custom metrics API. The metric's own name does not change, so curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics still returns the old name. This step can be skipped if no renaming is needed.

    # turn any name <name>_total into <name>_per_second
    # e.g. http_requests_total becomes http_requests_per_second
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"

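    In this example, the HPA can then fetch the metric from /apis/{APIService-name}/v1beta1/namespaces/{namespaces-name}/pods/*/http_requests_per_second

  • Querying: defines the Prometheus query that produces the value returned by the custom metrics API; this value is what the HPA ultimately uses to scale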

    # convert cumulative cAdvisor metrics into rates calculated over 2 minutes
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,container_name!="POD"}[2m])) by (<<.GroupBy>>)'

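    The metricsQuery field is a Go template that turns a custom metrics API request into a Prometheus query. The adapter breaks the API request into a metric name, a group-resource, and one or more objects of that group-resource, which fill the following template fields:

    • Series: the metric name
    • LabelMatchers: a comma-separated list of label matchers for the requested objects; currently this is the label of the particular group-resource, plus the namespace label if the group-resource is namespaced
    • GroupBy: a comma-separated list of labels to group by; currently this is the group-resource label used in LabelMatchers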

    Suppose Prometheus holds the following http_requests_total series, which the naming rule exposes as http_requests_per_second

    http_requests_total{pod="pod1",service="nginx1",namespace="somens"}
    http_requests_total{pod="pod2",service="nginx2",namespace="somens"}

    When kubectl get --raw "/apis/{APIService-name}/v1beta1/namespaces/somens/pods/*/http_requests_per_second" is called, the template fields of metricsQuery take the following values:

    • Series: http_requests_total
    • LabelMatchers: pod=~"pod1|pod2",namespace="somens"
    • GroupBy: pod

    The adapter uses the rules and externalRules fields for custom metrics and external metrics respectively, as in this example

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: adapter-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        externalRules:
        - seriesQuery: '{namespace!="",pod!=""}'
          seriesFilters: []
          resources:
            overrides:
              namespace:
                resource: namespace
              pod:
                resource: pod
          metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[22m])) by (<<.GroupBy>>)
        rules:
        - seriesQuery: '{namespace!="",pod!=""}'
          seriesFilters: []
          resources:
            overrides:
              namespace:
                resource: namespace
              pod:
                resource: pod
          name:
            matches: "^(.*)_total"
            as: "${1}_per_second"
          metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
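
To make the four rule parts more concrete, here is a small sketch that imitates what a rule does to one series. It is an illustration only: the adapter itself is written in Go and renders metricsQuery with Go templates, whereas this sketch uses Python regular expressions and plain string replacement. The function names are hypothetical; the patterns are copied from the examples above.

```python
import re

# Rule fragments copied from the examples in this section.
SERIES_FILTER_IS_NOT = r"^container_.*_seconds_total"
NAME_MATCHES = r"^(.*)_total$"
NAME_AS = r"\1_per_second"  # Python spelling of Go's "${1}_per_second"
METRICS_QUERY = "sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)"


def keep_series(name):
    """Discovery: an isNot filter drops series whose name matches the regex."""
    return re.search(SERIES_FILTER_IS_NOT, name) is None


def api_name(series):
    """Naming: derive the custom metrics API name from the series name."""
    return re.sub(NAME_MATCHES, NAME_AS, series)


def render_query(series, namespace, pods):
    """Querying: fill the three template fields for a pods request."""
    pod_regex = "|".join(pods)
    matchers = f'pod=~"{pod_regex}",namespace="{namespace}"'
    return (METRICS_QUERY
            .replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", matchers)
            .replace("<<.GroupBy>>", "pod"))


print(keep_series("container_cpu_usage_seconds_total"))  # False
print(api_name("http_requests_total"))  # http_requests_per_second
print(render_query("http_requests_total", "somens", ["pod1", "pod2"]))
```

The last call prints the PromQL that would be sent to Prometheus for pod1 and pod2 in the somens namespace, which is a convenient string to paste into the Prometheus UI when a rule returns no data.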

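HPA configuration

The HPA normally pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io), depending on the metric type

The HPA supports four metric types (the examples below use the v2beta2 format):

  • resource: currently only cpu and memory are supported. The target can be an absolute value (targetAverageValue) or a utilization percentage (targetAverageUtilization); these are the v2beta1 field names, which v2beta2 expresses as target.averageValue and target.averageUtilization

    The HPA gets resource metrics from metrics.k8s.io

  • pods: custom metrics that describe pods. The target can only be an average value (targetAverageValue): the metric is averaged across all relevant pods and compared with the target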

    type: Pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k

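    The HPA gets custom metrics from custom.metrics.k8s.io

  • object: custom metrics that describe a single (non-pod) object in the same namespace. The target can be a Value or an AverageValue: the former compares the metric directly with the target, the latter divides the metric by the number of relevant pods before comparing it with the target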

    type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: extensions/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 2k
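  • external: Kubernetes 1.10+. These metrics are not tied to any object in the Kubernetes cluster (pods and object metrics must be associated with some Kubernetes resource). As with object, the target can be a Value or an AverageValue. Because external tries to match metrics across all Kubernetes resources, this type is not recommended in practice.

    The HPA gets external metrics from external.metrics.k8s.io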

    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker_tasks
        target:
          type: AverageValue
          averageValue: 30
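  • Since version 1.6, scaling on multiple metrics is supported: pod replicas are created as soon as any one metric reaches its scale-out threshold (while the current replica count is below maxReplicas)

Note: one unit of a target value is divided into 1000 parts, each written with the suffix m; 500m, for example, is half a unit. See Quantity.

The Kubernetes HPA algorithm is: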

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

When targetAverageValue or targetAverageUtilization is used, currentMetricValue is the average of the metric across all pods targeted by the HPA
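
As a worked example of the formula and of the m suffix, the sketch below computes desiredReplicas from quantity strings. It is a simplification under stated assumptions: `parse_quantity` is a hypothetical helper covering only the m, k, and M suffixes, and the function applies the bare formula. The real controller also skips scaling when the ratio is within its tolerance (0.1 by default) and clamps the result to minReplicas and maxReplicas.

```python
import math
from decimal import Decimal

# Subset of Quantity suffixes, enough for the values used in this article.
SUFFIXES = {"m": Decimal("0.001"), "k": Decimal(1000), "M": Decimal(1000000)}


def parse_quantity(text):
    """Parse strings such as "500m", "2k" or "30" into a Decimal."""
    if text[-1] in SUFFIXES:
        return Decimal(text[:-1]) * SUFFIXES[text[-1]]
    return Decimal(text)


def desired_replicas(current_replicas, current, target):
    """desiredReplicas = ceil(currentReplicas * current / target)."""
    ratio = parse_quantity(current) / parse_quantity(target)
    return math.ceil(current_replicas * ratio)


print(desired_replicas(1, "1500m", "500m"))  # 3
print(desired_replicas(3, "66m", "500m"))    # 1
```

The second call mirrors the 66m/500m reading in the oc get hpa output earlier: with load that low, three replicas shrink back to one.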


Fetching Kubernetes metrics

Assume the registered APIService is custom.metrics.k8s.io/v1beta1. Once it is registered, the HorizontalPodAutoscaler controller fetches metrics from paths rooted at /apis/custom.metrics.k8s.io/v1beta1. Metric API paths are either namespaced or non-namespaced. Use the following requests to check whether the HPA can obtain metrics:

namespaced

  • Get a metric for an object of a given type and name in a given namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}"

For example, get the start_time_seconds metric of the pod named grafana in the monitor namespace

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/grafana/start_time_seconds"
  • Get a metric for all objects of a given type in a given namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}"

For example, get the start_time_seconds metric of all pods in the monitor namespace

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds"
  • Use labelSelector to select objects that carry particular labels
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/{object-type}/{object-name}/{metric-name...}?labelSelector={label-name}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}?labelSelector={label-name}"

non-namespaced

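Non-namespaced paths work like namespaced ones; the main objects are node, namespace, PersistentVolume, and so on. Some non-namespaced access paths differ from what the custom metrics API describes.

  • Access metrics that describe a namespace as follows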
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/metrics/{metric-name...}"
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/{metric-name...}"
  • Access metrics that describe a node as follows
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/nodes/{node-name}/{metric-name...}"
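
The path shapes listed in this section can be summarized in a small helper. This is a sketch for building request paths by hand; `metric_path` and its parameters are hypothetical names, and the path layouts are taken only from the examples above.

```python
from urllib.parse import urlencode

ROOT = "/apis/custom.metrics.k8s.io/v1beta1"


def metric_path(resource, metric, namespace=None, name="*", label_selector=None):
    """Build a custom metrics API path for kubectl get --raw."""
    if resource == "namespaces":
        # Metrics describing a namespace itself use the fixed "metrics" segment.
        path = f"{ROOT}/namespaces/{name}/metrics/{metric}"
    elif namespace is not None:
        # Namespaced objects such as pods or services.
        path = f"{ROOT}/namespaces/{namespace}/{resource}/{name}/{metric}"
    else:
        # Other non-namespaced objects such as nodes.
        path = f"{ROOT}/{resource}/{name}/{metric}"
    if label_selector:
        path += "?" + urlencode({"labelSelector": label_selector})
    return path


print(metric_path("pods", "start_time_seconds", namespace="monitor"))
print(metric_path("pods", "start_time_seconds", namespace="monitor", name="grafana"))
print(metric_path("nodes", "start_time_seconds", name="node1"))
```

Each printed path can be passed to kubectl get --raw to confirm that the adapter serves the metric before pointing an HPA at it.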

DEBUG:

  • List everything the registered APIService has discovered through its rules

    kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1

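    If the request fails, inspect the status and message fields in the output of oc get apiservice v1beta1.custom.metrics.k8s.io -oyaml

    If the returned resources list is empty, check that the Prometheus URL in the deployment is correct, that the adapter has the required permissions, and so on

  • Trace the full request with --v=8

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/{namespace-name}/pods/*/{metric-name...}" --v=8
  • If everything above is correct but the returned items list is empty

    • First make sure the k8s-prometheus-adapter argument --metrics-relist-interval is set to a value greater than the Prometheus scrape_interval
    • Make sure the seriesQuery in the k8s-prometheus-adapter rules actually matches data in Prometheus
    • Make sure the metricsQuery in the k8s-prometheus-adapter rules actually returns computed data; if the query computes over a time window and the window is too short, it may produce no data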
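
The interval checks in the list above can be written down as a quick sanity test. This is a heuristic sketch, not an adapter feature: `seconds` and `check` are hypothetical helpers, durations are limited to a single s, m, or h unit, and the window rule relies on rate() needing at least two samples inside its range to return a value.

```python
import re

UNITS = {"s": 1, "m": 60, "h": 3600}


def seconds(duration):
    """Parse simple durations such as "70s" or "2m" (single unit only)."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def check(relist_interval, scrape_interval, rate_window):
    """Return a list of likely reasons for empty items; empty means OK."""
    problems = []
    if seconds(relist_interval) <= seconds(scrape_interval):
        problems.append("--metrics-relist-interval should exceed scrape_interval")
    # rate() needs at least two samples inside the window to return a value.
    if seconds(rate_window) < 2 * seconds(scrape_interval):
        problems.append("rate window shorter than two scrape intervals")
    return problems


print(check("70s", "30s", "2m"))  # []
print(check("30s", "60s", "1m"))
```

The first call uses the 70s relist interval from the debug arguments in this article together with an assumed 30s scrape interval and the 2m window from the rules; the second shows both problems being reported.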

TIPS:

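  • The project provides an End-to-end walkthrough, but the scraped metrics must carry pod and namespace labels; otherwise no metrics are collected under the default configuration.

  • Configuration Walkthroughs explains step by step how to write the adapter config

  • The following arguments can be used in GoLand to debug the adapter remotely: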

    --secure-port=6443 --tls-cert-file=D:\adapter\serving.crt --tls-private-key-file=D:\adapter\serving.key --logtostderr=true --prometheus-url=${prometheus-url} --metrics-relist-interval=70s --v=10 --config=D:\adapter\config.yaml --lister-kubeconfig=D:\adapter\k8s-config.yaml --authorization-kubeconfig=D:\adapter\k8s-config.yaml --authentication-kubeconfig=D:\adapter\k8s-config.yaml


