k8s series --- Resource Metrics API and Custom Metrics API

I have to say: do not change versions casually. I'm running 1.13, and when I reached this step I was stuck for a long time because the yaml files were different; it took a lot of googling to find the solution.

 https://www.linuxea.com/2112.html

Previously heapster was needed to collect resource metrics before you could view them; now heapster is being deprecated.

    Starting with k8s v1.8, a new mechanism was introduced: exposing resource metrics through the API.

    Resource metrics: metrics-server

    Custom metrics: prometheus, k8s-prometheus-adapter

    So the new-generation architecture is:

    1) Core metrics pipeline: composed of the kubelet, metrics-server, and the APIs exposed by the API server; it covers cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.

    2) Monitoring pipeline: collects all kinds of metric data from the system and serves it to end users, storage systems, and the HPA. It contains the core metrics plus many non-core metrics; the non-core metrics cannot be interpreted by k8s itself.

    metrics-server is itself an API server that collects only CPU usage, memory usage, and the like.

[root@master ~]# kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1

  

 Go to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server to get the yaml files, but the yaml files there have been updated and differ from the ones in the video.
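For reference, fetching the six upstream files can be scripted roughly like this; it's only a sketch and assumes the manifests still live under cluster/addons/metrics-server on the master branch, so adjust BASE if the layout has moved:

BASE=https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server
for f in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml \
         metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml; do
    wget "$BASE/$f"    # download each manifest into the current directory
done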

Below are my modified yaml files, kept here for future reference.

[root@master metrics-server]# cat auth-delegator.yaml 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
[root@master metrics-server]# cat auth-reader.yaml 
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
[root@master metrics-server]# cat metrics-apiservice.yaml 
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

The key one is this file:

[root@master metrics-server]# cat metrics-server-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.1
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.1
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.1
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.1
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10250
        #- --deprecated-kubelet-completely-insecure=true

        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: mirrorgooglecontainers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --cpu=100m
          - --extra-cpu=0.5m
          - --memory=100Mi
          - --extra-memory=50Mi
          - --threshold=5
          - --deployment=metrics-server-v0.3.1
          - --container=metrics-server
          - --poll-period=300000
          - --estimator=exponential
          # Specifies the smallest cluster (defined in number of nodes)
          #           # resources will be scaled to.
          - --minClusterSize=10

      volumes:
        - name: metrics-server-config-volume
          configMap:
            name: metrics-server-config
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
[root@master metrics-server]# cat metrics-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: https
[root@master metrics-server]# cat resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system

If applying the files downloaded from github fails, use my metrics-server-deployment.yaml above instead; delete the old objects and re-apply and it will work.
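Concretely, the clean-up step is just a delete against the same directory (a sketch, run from the metrics-server directory holding the yaml files); the re-apply follows below:

kubectl delete -f ./    # remove the previously applied metrics-server objects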

[root@master metrics-server]# kubectl apply -f ./

  

[root@master ~]#  kubectl proxy --port=8080

  

Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. I initially got an Error and found the problem was in the yaml; after going back and forth I ended up with the final version above, and the pod went to Running.
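If you would rather wait on the rollout than keep re-running kubectl get pods, something like this works (the deployment name comes from the manifest above):

kubectl -n kube-system rollout status deployment/metrics-server-v0.3.1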

[root@master metrics-server]# kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
canal-mgbc2                             3/3     Running   12         3d23h
canal-s4xgb                             3/3     Running   23         3d23h
canal-z98bc                             3/3     Running   15         3d23h
coredns-78d4cf999f-5shdq                1/1     Running   0          6m4s
coredns-78d4cf999f-xj5pj                1/1     Running   0          5m53s
etcd-master                             1/1     Running   13         17d
kube-apiserver-master                   1/1     Running   13         17d
kube-controller-manager-master          1/1     Running   19         17d
kube-flannel-ds-amd64-8xkfn             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-t7jpc             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-vlbjz             1/1     Running   0          <invalid>
kube-proxy-ggcbf                        1/1     Running   11         17d
kube-proxy-jxksd                        1/1     Running   11         17d
kube-proxy-nkkpc                        1/1     Running   12         17d
kube-scheduler-master                   1/1     Running   19         17d
kubernetes-dashboard-76479d66bb-zr4dd   1/1     Running   0          <invalid>
metrics-server-v0.3.1-76b796b-4xgvp     2/2     Running   0          9s

  

To view the error log, use -c to specify the container name. This pod has two containers and metrics-server is only one of them; checking the other one works the same way, just change the name.

[root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp   -c metrics-server -n kube-system
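For the other container in the pod (the addon-resizer nanny defined in the deployment above), the same command applies with the container name swapped:

kubectl logs metrics-server-v0.3.1-76b796b-4xgvp -c metrics-server-nanny -n kube-system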

  

The error log roughly contained entries like the following:

403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)

E0903  1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host


no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )


E1109 09:54:49.509521       1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]

  

At the time I tried modifying the coredns config following a method from the internet, which made the logs show "unable" for every pod it tried to fetch. I then reverted the change, deleted the coredns pods, and let them be recreated as two fresh coredns containers.

The --kubelet-insecure-tls flag disables TLS verification and is generally not recommended in production. And because DNS cannot resolve those hostnames, --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP is used to work around it. There is another approach, modifying coredns, but I don't recommend that.

See this issue: https://github.com/kubernetes-incubator/metrics-server/issues/131

metrics-server unable to fetch pod metrics for pod

  

Those are the problems I ran into; in any case, the yaml above resolves all of them. One other thing: why does flannel stop working every time the cluster machines reboot after switching it to DirectRouting? I have to delete flannel and regenerate it; that issue was covered in an earlier post.

At this point the following commands all succeed, and the items arrays now contain values:

[root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeMetrics",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "PodMetrics",
      "verbs": [
        "get",
        "list"
      ]
    }
  ]

  

[root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14868    0 14868    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1613k
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "pod1",
        "namespace": "prod",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
        "creationTimestamp": "2019-01-29T02:39:12Z"
      },

  

[root@master metrics-server]# kubectl top pods
NAME                CPU(cores)   MEMORY(bytes)   
filebeat-ds-4llpp   1m           2Mi             
filebeat-ds-dv49l   1m           5Mi             
myapp-0             0m           1Mi             
myapp-1             0m           2Mi             
myapp-2             0m           1Mi             
myapp-3             0m           1Mi             
myapp-4             0m           2Mi    

  

[root@master metrics-server]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master   206m         5%     1377Mi          72%       
node1    88m          8%     534Mi           28%       
node2    78m          7%     935Mi           49% 
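As an aside, the same data can be pulled through kubectl without running kubectl proxy and curl; for example (the pod path matches the selfLink shown above):

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1"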

  

 

Custom metrics (prometheus)

    As you can see, our metrics pipeline is now working. However, metrics-server only covers CPU and memory; other metrics, such as user-defined ones, it simply cannot monitor. That is where another component, prometheus, comes in.

    Deploying prometheus is fairly involved.

    node_exporter is the agent;

    PromQL is the query language (the SQL equivalent) used to query the data;

    k8s-prometheus-adapter: k8s cannot consume prometheus's metrics directly, so k8s-prometheus-adapter is needed to convert them into an API that k8s understands;

    kube-state-metrics is used to aggregate cluster state data.

    Now let's start the deployment.

    Go to https://github.com/ikubernetes/k8s-prom

 

[root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git

  

First create a namespace called prom:

[root@master k8s-prom]# kubectl apply -f namespace.yaml 
namespace/prom created

  

 Deploy node_exporter:

 

[root@master k8s-prom]# cd node_exporter/
[root@master node_exporter]# ls
node-exporter-ds.yaml  node-exporter-svc.yaml
[root@master node_exporter]# kubectl apply -f .
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created

  

[root@master node_exporter]# kubectl get pods -n prom
NAME                             READY     STATUS    RESTARTS   AGE
prometheus-node-exporter-dmmjj   1/1       Running   0          7m
prometheus-node-exporter-ghz2l   1/1       Running   0          7m
prometheus-node-exporter-zt2lw   1/1       Running   0          7m

  

    Deploy prometheus:

 

[root@master k8s-prom]# cd prometheus/
[root@master prometheus]# ls
prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
[root@master prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created

  

Check all the resources in the prom namespace: pod/prometheus-server-76dc8df7b-hw8xc is stuck in Pending, and the events show the nodes have insufficient memory:

 [root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom  
Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.

  

Edit prometheus-deploy.yaml and delete the three memory-limit lines:

        resources:
          limits:
            memory: 2Gi
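Deleting the limit is the quick fix; an alternative sketch is to keep a cap but shrink it to something the nodes can actually accommodate. Note that the scheduler decides based on requests, so those have to fit as well (the values below are only an example):

        resources:
          requests:
            cpu: 100m
            memory: 200Mi    # must fit on a node, otherwise the pod stays Pending
          limits:
            memory: 512Mi    # a smaller cap instead of the original 2Gi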

  

Re-apply:

[root@master prometheus]# kubectl apply -f prometheus-deploy.yaml

  

[root@master prometheus]# kubectl get all -n prom
NAME                                     READY     STATUS    RESTARTS   AGE
pod/prometheus-node-exporter-dmmjj       1/1       Running   0          10m
pod/prometheus-node-exporter-ghz2l       1/1       Running   0          10m
pod/prometheus-node-exporter-zt2lw       1/1       Running   0          10m
pod/prometheus-server-65f5d59585-6l8m8   1/1       Running   0          55s
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   56s
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         10m
NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          10m
NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-server   1         1         1            1           56s
NAME                                           DESIRED   CURRENT   READY     AGE
replicaset.apps/prometheus-server-65f5d59585   1         1         1         56s

  

As shown above, via the NodePort service we can reach the prometheus application inside the container on port 30090 of the host.
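To sanity-check it from the shell, you can also hit prometheus's HTTP API through that NodePort; a quick sketch (172.16.1.100 is my master's IP, and node_load1 is a standard node_exporter metric):

curl http://172.16.1.100:30090/api/v1/targets                      # list the scrape targets
curl 'http://172.16.1.100:30090/api/v1/query?query=node_load1'     # run a simple PromQL query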

 

    It's best to mount PVC-backed storage as well, otherwise the monitoring data will be gone before long.
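A minimal sketch of what that could look like, assuming the cluster has a default StorageClass and that the server keeps its TSDB under /prometheus (the upstream default); the names here are only illustrative, and the volume would replace whatever emptyDir prometheus-deploy.yaml currently uses:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data          # illustrative name
  namespace: prom
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
# then, inside prometheus-deploy.yaml:
#   volumeMounts:
#   - name: prometheus-storage
#     mountPath: /prometheus
#   volumes:
#   - name: prometheus-storage
#     persistentVolumeClaim:
#       claimName: prometheus-data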

    Deploy kube-state-metrics, which aggregates the data:

[root@master k8s-prom]# cd kube-state-metrics/
[root@master kube-state-metrics]# ls
kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
[root@master kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created

  

[root@master kube-state-metrics]# kubectl get all -n prom
NAME                                      READY     STATUS    RESTARTS   AGE
pod/kube-state-metrics-58dffdf67d-v9klh   1/1       Running   0          14m
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         14m

  

Deploy k8s-prometheus-adapter; this needs a self-signed certificate:

[root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
...........................................................................................+++
...............+++
e is 65537 (0x10001)

  

    Create the certificate signing request:

[root@master pki]#  openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"

  

    Sign the certificate:

 

[root@master pki]# openssl  x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key

  

    Create the secret holding the certificate:

 

[root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key  -n prom
secret/cm-adapter-serving-certs created

  

    Note: cm-adapter-serving-certs is the name referenced inside custom-metrics-apiserver-deployment.yaml.

 

[root@master pki]# kubectl get secrets -n prom
NAME                             TYPE                                  DATA      AGE
cm-adapter-serving-certs         Opaque                                2         51s
default-token-knsbg              kubernetes.io/service-account-token   3         4h
kube-state-metrics-token-sccdf   kubernetes.io/service-account-token   3         3h
prometheus-token-nqzbz           kubernetes.io/service-account-token   3         3h

  

  Deploy k8s-prometheus-adapter:

[root@master k8s-prom]# cd k8s-prometheus-adapter/
[root@master k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml

  

 Because the latest k8s-prometheus-adapter is incompatible with k8s v1.11.2 (and with 1.13 as well), the fix is to go to https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests, download the latest custom-metrics-apiserver-deployment.yaml, and change the namespace inside it to prom; also download custom-metrics-config-map.yaml locally and change its namespace to prom as well.
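A rough way to pull those two files and swap the namespace in one pass (this assumes the upstream manifests use the custom-metrics namespace; open the files and adjust the sed pattern if yours differ):

BASE=https://raw.githubusercontent.com/DirectXMan12/k8s-prometheus-adapter/master/deploy/manifests
wget $BASE/custom-metrics-apiserver-deployment.yaml $BASE/custom-metrics-config-map.yaml
sed -i 's/namespace: custom-metrics/namespace: prom/' \
    custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml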

[root@master k8s-prometheus-adapter]# kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
deployment.apps/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
serviceaccount/custom-metrics-apiserver created
service/custom-metrics-apiserver created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created

  

[root@master k8s-prometheus-adapter]# kubectl get all -n prom
NAME                                           READY     STATUS    RESTARTS   AGE
pod/custom-metrics-apiserver-65f545496-64lsz   1/1       Running   0          6m
pod/kube-state-metrics-58dffdf67d-v9klh        1/1       Running   0          4h
pod/prometheus-node-exporter-dmmjj             1/1       Running   0          4h
pod/prometheus-node-exporter-ghz2l             1/1       Running   0          4h
pod/prometheus-node-exporter-zt2lw             1/1       Running   0          4h
pod/prometheus-server-65f5d59585-6l8m8         1/1       Running   0          4h
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/custom-metrics-apiserver   ClusterIP   10.103.87.246   <none>        443/TCP          36m
service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         4h
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   4h
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         4h
NAME                                      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3         3            3           <none>          4h
NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/custom-metrics-apiserver   1         1         1            1           36m
deployment.apps/kube-state-metrics         1         1         1            1           4h
deployment.apps/prometheus-server          1         1         1            1           4h
NAME                                                  DESIRED   CURRENT   READY     AGE
replicaset.apps/custom-metrics-apiserver-5f6b4d857d   0         0         0         36m
replicaset.apps/custom-metrics-apiserver-65f545496    1         1         1         6m
replicaset.apps/custom-metrics-apiserver-86ccf774d5   0         0         0         17m
replicaset.apps/kube-state-metrics-58dffdf67d         1         1         1         4h
replicaset.apps/prometheus-server-65f5d59585          1         1         1         4h

  

 

  Finally we see that all resources in the prom namespace are in the Running state.

 

[root@master k8s-prometheus-adapter]# kubectl api-versions
custom.metrics.k8s.io/v1beta1

  

  You can now see the custom.metrics.k8s.io/v1beta1 API. (In my case it didn't show up in the list, but that didn't affect anything.)
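If the group does not show up in kubectl api-versions, it's worth checking whether the APIService object registered and reports Available; for example:

kubectl get apiservice v1beta1.custom.metrics.k8s.io
kubectl describe apiservice v1beta1.custom.metrics.k8s.io    # the conditions explain why it is unavailable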

  Open a proxy:

[root@master k8s-prometheus-adapter]# kubectl proxy --port=8080

  

     Now the metric data is visible:

 

[root@master pki]# curl  http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
 {
      "name": "pods/ceph_rocksdb_submit_transaction_sync",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_deployment_created",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_pod_owner",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },

  

  Now we can happily create HPAs (horizontal pod autoscaling).

    In addition, prometheus can be integrated with grafana, as follows.

    First download grafana.yaml from https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml

[root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml

  

    Modify the contents of grafana.yaml:

 

Change namespace: kube-system to prom (there are two occurrences);
Comment out the following two entries under env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
Add type: NodePort at the end of the Service spec:
 ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort

  

[root@master pro]# kubectl apply -f grafana.yaml 
deployment.extensions/monitoring-grafana created
service/monitoring-grafana created

  

[root@master pro]# kubectl get pods -n prom
NAME                                       READY     STATUS    RESTARTS   AGE
monitoring-grafana-ffb4d59bd-gdbsk         1/1       Running   0          5s

  

If there are still problems, delete those objects and apply them again.

    The grafana pod is now up and running.

 

[root@master pro]# kubectl get svc -n prom
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
monitoring-grafana         NodePort    10.106.164.205   <none>        80:32659/TCP     19m

  

 We can now access it via the master host's IP: http://172.16.1.100:32659

 

 

 

The port in the screenshot above is 9090; fill in whatever port your own svc actually uses. Apart from changing 80 to 9090, nothing else changes. The address can be written in that form because everything lives in the same namespace, so the service is reachable by its service name.
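In other words, when adding the prometheus data source in grafana, a URL of roughly this form works from inside the prom namespace (assuming the default cluster.local domain; the short form http://prometheus:9090 also resolves since grafana runs in the same namespace):

http://prometheus.prom.svc.cluster.local:9090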

[root@master pro]# kubectl get svc -n prom     
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
custom-metrics-apiserver   ClusterIP   10.109.58.249   <none>        443/TCP          52m
kube-state-metrics         ClusterIP   10.103.52.45    <none>        8080/TCP         69m
monitoring-grafana         NodePort    10.110.240.31   <none>        80:31128/TCP     17m
prometheus                 NodePort    10.110.19.171   <none>        9090:30090/TCP   145m
prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         146m

  

    After that, the corresponding data shows up in the UI.

    Go to the site below and download a grafana dashboard template for monitoring k8s with prometheus: https://grafana.com/dashboards/6417

    Then import the downloaded template in the grafana UI.

    Once the template is imported, the monitoring data is visible.

 

 I didn't actually go through the HPA part this time since I had done it before; the content below is copied over from earlier. If you hit problems, work through them separately.

HPA (horizontal pod autoscaling)

    When pods come under heavy load, the number of pods is automatically scaled out based on that load to spread the pressure.

    Currently HPA comes in two versions; v1 only supports core metrics (it can only scale pods based on CPU utilization);

[root@master pro]# kubectl explain hpa.spec.scaleTargetRef
scaleTargetRef: specifies the workload object (e.g. a Deployment) whose replica count the HPA will scale

  

[root@master pro]# kubectl api-versions |grep auto
autoscaling/v1
autoscaling/v2beta1

  

    The output shows both HPA v1 (autoscaling/v1) and HPA v2 (autoscaling/v2beta1) are supported.

    Now let's use the command line to re-create a myapp deployment whose pod carries resource limits:

[root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
service/myapp created
deployment.apps/myapp created

  

[root@master ~]# kubectl get pods
NAME                     READY     STATUS    RESTARTS   AGE
myapp-6985749785-fcvwn   1/1       Running   0          58s

  

    Next, let's make the myapp pods scale horizontally on their own using kubectl autoscale, which is really just a way of creating the HPA object.

 

[root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
horizontalpodautoscaler.autoscaling/myapp autoscaled

  

 --min: the minimum number of pods

    --max: the maximum number of pods

    --cpu-percent: the target CPU utilization
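For reference, kubectl autoscale just creates an HPA object; the equivalent declarative manifest would look roughly like this (autoscaling/v1):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 8
  targetCPUUtilizationPercentage: 60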

[root@master ~]# kubectl get hpa
NAME      REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp     Deployment/myapp   0%/60%    1         8         1          4m

  

[root@master ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
myapp        ClusterIP   10.105.235.197   <none>        80/TCP              19

  

    Now let's change the service to NodePort:

 

[root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
service/myapp patched

  

[root@master ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
myapp        NodePort    10.105.235.197   <none>        80:31990/TCP        22m

  

[root@master ~]# yum install httpd-tools # mainly to install the ab load-testing tool

  

[root@master ~]# kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
myapp-6985749785-fcvwn   1/1       Running   0          25m       10.244.2.84   node2

  

    Start load testing with ab:

 

[root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 172.16.1.100 (be patient)

  

    Wait a while and you'll see the pod's CPU utilization hit 98%, so it needs to scale out to 2 pods:

 

[root@master ~]# kubectl describe hpa
resource cpu on pods  (as a percentage of request):  98% (49m) / 60%
Deployment pods:                                       1 current / 2 desired

  

[root@master ~]# kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)   
myapp-6985749785-fcvwn   49m (the total cpu we set was 50m)         3Mi

  

[root@master ~]#  kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
myapp-6985749785-fcvwn   1/1       Running   0          32m       10.244.2.84    node2
myapp-6985749785-sr4qv   1/1       Running   0          2m        10.244.1.105   node1

  

    We can see it has automatically scaled out to 2 pods; wait a bit longer and, as CPU pressure rises, it will scale out to 4 or more pods:

 

[root@master ~]#  kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
myapp-6985749785-2mjrd   1/1       Running   0          1m        10.244.1.107   node1
myapp-6985749785-bgz6p   1/1       Running   0          1m        10.244.1.108   node1
myapp-6985749785-fcvwn   1/1       Running   0          35m       10.244.2.84    node2
myapp-6985749785-sr4qv   1/1       Running   0          5m        10.244.1.105   node1

  

    Once the load test stops, the pod count shrinks back to the normal number.

    Above we used HPA v1 to do horizontal pod autoscaling; as mentioned before, HPA v1 can only scale pods based on CPU utilization.

    Next let's look at HPA v2, which can also scale pods horizontally based on custom metrics.

    Before using HPA v2, delete the v1 HPA created earlier so it doesn't conflict with the v2 one we're about to test:

[root@master hpa]# kubectl delete hpa myapp
horizontalpodautoscaler.autoscaling "myapp" deleted

  

OK, now let's create an HPA v2:

[root@master hpa]# cat hpa-v2-demo.yaml 
apiVersion: autoscaling/v2beta1   # this apiVersion tells you it is HPA v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef: # the workload object this HPA scales
    apiVersion: apps/v1 # apiVersion of that target object
    kind: Deployment
    name: myapp
  minReplicas: 1 # minimum number of replicas
  maxReplicas: 10
  metrics: # the metrics used for the scaling decision
  - type: Resource # evaluate a resource metric
    resource:
      name: cpu
      targetAverageUtilization: 55 # scale out once average pod CPU utilization exceeds 55%
  - type: Resource
    resource:
      name: memory # HPA v1 could only evaluate cpu; with v2 memory can be used as well
      targetAverageValue: 50Mi # scale out once average pod memory usage exceeds 50Mi

  

[root@master hpa]# kubectl apply -f hpa-v2-demo.yaml 
horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created

  

[root@master hpa]# kubectl get hpa
NAME           REFERENCE          TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa-v2   Deployment/myapp   3723264/50Mi, 0%/55%   1         10        1          37s

  

    We can see there is only one pod right now:

 

[root@master hpa]# kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP            NODE
myapp-6985749785-fcvwn   1/1       Running   0          57m       10.244.2.84   node2

  

    Start the load test:

 

[root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html

  

    Check what HPA v2 observes:

 

[root@master hpa]# kubectl describe hpa
Metrics:                                               ( current / target )
  resource memory on pods:                             3756032 / 50Mi
  resource cpu on pods  (as a percentage of request):  82% (41m) / 55%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 2 desired

  

[root@master hpa]# kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP             NODE
myapp-6985749785-8frq4   1/1       Running   0          1m        10.244.1.109   node1
myapp-6985749785-fcvwn   1/1       Running   0          1h        10.244.2.84    node2

  

  We can see it automatically scaled out to 2 pods. Once the load test stops, the pod count will shrink back to normal.

    Going forward, with HPA v2 we can scale the pod count not only on CPU and memory utilization but also on things like HTTP request concurrency.

    For example:

[root@master hpa]# cat hpa-v2-custom.yaml 
apiVersion: autoscaling/v2beta1  # HPA v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef: # the workload object this HPA scales
    apiVersion: apps/v1 # apiVersion of that target object
    kind: Deployment
    name: myapp
  minReplicas: 1 # minimum number of replicas
  maxReplicas: 10
  metrics: # the metrics used for the scaling decision
  - type: Pods # evaluate a per-pod custom metric
    pods:
      metricName: http_requests # the custom metric name exposed through the adapter
      targetAverageValue: 800m # target average value per pod; note 800m is the quantity 0.8, not 800

  

For an HPA driven by request counts, see the example image at https://hub.docker.com/r/ikubernetes/metrics-app/
