I have to say: do not casually change versions. I am on 1.13, and when I reached this step I was stuck for a long time because the yaml files are different; only after a lot of googling did I find a fix:
https://www.linuxea.com/2112.html
Previously, heapster collected the resource metrics you could view; heapster is now being deprecated.
Starting with k8s v1.8, a new mechanism was introduced: resource metrics are exposed through the API.
Resource metrics: metrics-server
Custom metrics: prometheus, k8s-prometheus-adapter
Hence the new-generation architecture:
1) Core metrics pipeline: made up of the kubelet, metrics-server, and the API exposed by the API server; it covers cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.
2) Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage systems, and the HPA. It carries the core metrics plus many non-core metrics; the non-core metrics cannot be interpreted by k8s itself.
metrics-server is itself an API server and only collects CPU usage, memory usage, and the like.
[root@master ~]# kubectl api-versions
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
apps/v1beta1
apps/v1beta2
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
Go to https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/metrics-server to get the yaml files, but the files there have been updated and differ from the ones in the video.
Below are my modified yaml files, kept here for reference.
[root@master metrics-server]# cat auth-delegator.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metrics-server:system:auth-delegator
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
[root@master metrics-server]# cat auth-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metrics-server-auth-reader
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
[root@master metrics-server]# cat metrics-apiservice.yaml
apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
The key file is this one:
[root@master metrics-server]# cat metrics-server-deployment.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.1
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.1
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.1
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.1
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10250
        #- --deprecated-kubelet-completely-insecure=true
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: mirrorgooglecontainers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --cpu=100m
        - --extra-cpu=0.5m
        - --memory=100Mi
        - --extra-memory=50Mi
        - --threshold=5
        - --deployment=metrics-server-v0.3.1
        - --container=metrics-server
        - --poll-period=300000
        - --estimator=exponential
        # Specifies the smallest cluster (defined in number of nodes)
        # resources will be scaled to.
        - --minClusterSize=10
      volumes:
      - name: metrics-server-config-volume
        configMap:
          name: metrics-server-config
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
[root@master metrics-server]# cat metrics-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Metrics-server"
spec:
  selector:
    k8s-app: metrics-server
  ports:
  - port: 443
    protocol: TCP
    targetPort: https
[root@master metrics-server]# cat resource-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - namespaces
  - nodes/stats
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
If applying the files downloaded from GitHub fails, use my metrics-server-deployment.yaml above instead: delete the resources and apply again and it will work.
[root@master metrics-server]# kubectl apply -f ./
[root@master ~]# kubectl proxy --port=8080
Make sure metrics-server-v0.3.1-76b796b-4xgvp is in the Running state. Mine was in Error at first because of problems in the yaml; after changing it back and forth I ended up with the final version above and it went to Running.
[root@master metrics-server]# kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS   AGE
canal-mgbc2                             3/3     Running   12         3d23h
canal-s4xgb                             3/3     Running   23         3d23h
canal-z98bc                             3/3     Running   15         3d23h
coredns-78d4cf999f-5shdq                1/1     Running   0          6m4s
coredns-78d4cf999f-xj5pj                1/1     Running   0          5m53s
etcd-master                             1/1     Running   13         17d
kube-apiserver-master                   1/1     Running   13         17d
kube-controller-manager-master          1/1     Running   19         17d
kube-flannel-ds-amd64-8xkfn             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-t7jpc             1/1     Running   0          <invalid>
kube-flannel-ds-amd64-vlbjz             1/1     Running   0          <invalid>
kube-proxy-ggcbf                        1/1     Running   11         17d
kube-proxy-jxksd                        1/1     Running   11         17d
kube-proxy-nkkpc                        1/1     Running   12         17d
kube-scheduler-master                   1/1     Running   19         17d
kubernetes-dashboard-76479d66bb-zr4dd   1/1     Running   0          <invalid>
metrics-server-v0.3.1-76b796b-4xgvp     2/2     Running   0          9s
To look at the error log, use -c to specify the container name: this pod has two containers and metrics-server is only one of them. Checking the other one works the same way, just change the name.
[root@master metrics-server]# kubectl logs metrics-server-v0.3.1-76b796b-4xgvp -c metrics-server -n kube-system
大體出錯的日誌內容以下幾條;
403 Forbidden", response: "Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=stats)

E0903 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:<hostname>: unable to fetch metrics from Kubelet <hostname> (<hostname>): Get https://<hostname>:10250/stats/summary/: dial tcp: lookup <hostname> on 10.96.0.10:53: no such host

no response from https://10.101.248.96:443: Get https://10.101.248.96:443: Proxy Error ( Connection refused )

E1109 09:54:49.509521 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:linuxea.node-2.com: unable to fetch metrics from Kubelet linuxea.node-2.com (10.10.240.203): Get https://10.10.240.203:10255/stats/summary/: dial tcp 10.10.240.203:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-3.com: unable to fetch metrics from Kubelet linuxea.node-3.com (10.10.240.143): Get https://10.10.240.143:10255/stats/summary/: dial tcp 10.10.240.143:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-4.com: unable to fetch metrics from Kubelet linuxea.node-4.com (10.10.240.142): Get https://10.10.240.142:10255/stats/summary/: dial tcp 10.10.240.142:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.master-1.com: unable to fetch metrics from Kubelet linuxea.master-1.com (10.10.240.161): Get https://10.10.240.161:10255/stats/summary/: dial tcp 10.10.240.161:10255: connect: connection refused, unable to fully scrape metrics from source kubelet_summary:linuxea.node-1.com: unable to fetch metrics from Kubelet linuxea.node-1.com (10.10.240.202): Get https://10.10.240.202:10255/stats/summary/: dial tcp 10.10.240.202:10255: connect: connection refused]
At the time I followed suggestions found online and tried editing the coredns config, which only made the log report every pod as "unable ...". So I reverted the change and deleted the coredns pods so that two new ones would be recreated.
- --kubelet-insecure-tls
This flag disables TLS verification, which is generally not recommended in production. And because DNS cannot resolve those node hostnames, the flag
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
is used to work around that. There is another approach, modifying coredns, but I don't recommend it. Both flags are shown in the excerpt below.
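For context, this is where the two workarounds sit in the metrics-server container's command in the metrics-server-deployment.yaml above, with comments on what each one does:

      containers:
      - name: metrics-server
        image: mirrorgooglecontainers/metrics-server-amd64:v0.3.1
        command:
        - /metrics-server
        - --metric-resolution=30s
        # skip verification of the kubelet's self-signed serving certificate
        - --kubelet-insecure-tls
        # try the node InternalIP first, so scraping never depends on DNS resolving node hostnames
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP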
See this issue: https://github.com/kubernetes-incubator/metrics-server/issues/131
("metrics-server unable to fetch pod metrics for pod")
Those were the problems I ran into; using my yaml above should resolve all of them. One other thing: why does flannel lose the DirectRouting setting every time the cluster machines are rebooted? I have to delete flannel and let it regenerate; I covered that in an earlier post.
At this point the following commands all succeed, and items is no longer empty:
[root@master ~]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "nodes",
      "singularName": "",
      "namespaced": false,
      "kind": "NodeMetrics",
      "verbs": [
        "get",
        "list"
      ]
    },
    {
      "name": "pods",
      "singularName": "",
      "namespaced": true,
      "kind": "PodMetrics",
      "verbs": [
        "get",
        "list"
      ]
    }
  ]
}
[root@master metrics-server]# curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods | more
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14868    0 14868    0     0  1521k      0 --:--:-- --:--:-- --:--:-- 1613k
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "pod1",
        "namespace": "prod",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/prod/pods/pod1",
        "creationTimestamp": "2019-01-29T02:39:12Z"
      },
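If you would rather not keep kubectl proxy running, the same aggregated API can be queried straight through the API server with kubectl; piping through python -m json.tool (if python is installed) is only for pretty-printing:

# Query the metrics API via the API server, no proxy needed
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | python -m json.tool
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods" | python -m json.tool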
[root@master metrics-server]# kubectl top pods
NAME                CPU(cores)   MEMORY(bytes)
filebeat-ds-4llpp   1m           2Mi
filebeat-ds-dv49l   1m           5Mi
myapp-0             0m           1Mi
myapp-1             0m           2Mi
myapp-2             0m           1Mi
myapp-3             0m           1Mi
myapp-4             0m           2Mi
[root@master metrics-server]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   206m         5%     1377Mi          72%
node1    88m          8%     534Mi           28%
node2    78m          7%     935Mi           49%
As you can see, metrics is now working properly. However, metrics-server only covers CPU and memory; for anything else, such as user-defined metrics, it cannot help. That is where another component, prometheus, comes in.
Deploying prometheus is quite involved.
node_exporter is the agent;
PromQL is the query language, roughly prometheus's equivalent of SQL, used to query the collected data (a small query example appears further below, once prometheus is running);
k8s-prometheus-adapter: k8s cannot consume prometheus metrics directly, so k8s-prometheus-adapter converts them into an API;
kube-state-metrics aggregates cluster object state data.
Now for the deployment.
Visit https://github.com/ikubernetes/k8s-prom
[root@master pro]# git clone https://github.com/iKubernetes/k8s-prom.git
First create a namespace called prom:
[root@master k8s-prom]# kubectl apply -f namespace.yaml
namespace/prom created
Deploy node_exporter:
[root@master k8s-prom]# cd node_exporter/
[root@master node_exporter]# ls
node-exporter-ds.yaml  node-exporter-svc.yaml
[root@master node_exporter]# kubectl apply -f .
daemonset.apps/prometheus-node-exporter created
service/prometheus-node-exporter created
[root@master node_exporter]# kubectl get pods -n prom
NAME                             READY   STATUS    RESTARTS   AGE
prometheus-node-exporter-dmmjj   1/1     Running   0          7m
prometheus-node-exporter-ghz2l   1/1     Running   0          7m
prometheus-node-exporter-zt2lw   1/1     Running   0          7m
Deploy prometheus:
[root@master k8s-prom]# cd prometheus/
[root@master prometheus]# ls
prometheus-cfg.yaml  prometheus-deploy.yaml  prometheus-rbac.yaml  prometheus-svc.yaml
[root@master prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
clusterrole.rbac.authorization.k8s.io/prometheus created
serviceaccount/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
service/prometheus created
Checking all resources in the prom namespace: pod/prometheus-server-76dc8df7b-hw8xc was stuck in Pending, and the output showed there was not enough memory:
[root@master prometheus]# kubectl logs prometheus-server-556b8896d6-dfqkp -n prom
Warning  FailedScheduling  2m52s (x2 over 2m52s)  default-scheduler  0/3 nodes are available: 3 Insufficient memory.
Edit prometheus-deploy.yaml and delete the three lines that set the memory limit:
        resources:
          limits:
            memory: 2Gi
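Alternatively, if your nodes have a bit of headroom, you could lower the limit instead of removing it outright; a minimal sketch, where the 512Mi/1Gi values are my own guesses and not from the original manifest:

        resources:
          requests:
            memory: 512Mi   # hypothetical request; size it to what your nodes can actually offer
          limits:
            memory: 1Gi     # hypothetical limit, smaller than the original 2Gi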
Re-apply:
[root@master prometheus]# kubectl apply -f prometheus-deploy.yaml
[root@master prometheus]# kubectl get all -n prom
NAME                                     READY   STATUS    RESTARTS   AGE
pod/prometheus-node-exporter-dmmjj       1/1     Running   0          10m
pod/prometheus-node-exporter-ghz2l       1/1     Running   0          10m
pod/prometheus-node-exporter-zt2lw       1/1     Running   0          10m
pod/prometheus-server-65f5d59585-6l8m8   1/1     Running   0          55s

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   56s
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         10m

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3       3            3           <none>          10m

NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-server   1         1         1            1           56s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-server-65f5d59585   1         1         1       56s
As shown above, with the NodePort service type the prometheus application inside the pod is reachable on port 30090 of the host.
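This is also a handy place to try the PromQL mentioned earlier, since prometheus's HTTP API is exposed on that NodePort. A quick sketch: the node IP is the master IP used elsewhere in this post, and the metric name assumes a recent node_exporter (older releases call it node_cpu instead of node_cpu_seconds_total):

# Per-node CPU usage rate over the last 5 minutes, as reported by node_exporter
curl -s 'http://172.16.1.100:30090/api/v1/query' \
  --data-urlencode 'query=sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)'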
It is best to mount a PVC for prometheus's storage, otherwise all of this monitoring data disappears after a while.
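A minimal sketch of what that could look like, assuming a PVC named prometheus-data and that the prometheus container keeps its data under /prometheus (check the --storage.tsdb.path argument in your prometheus-deploy.yaml before using this):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data            # hypothetical claim name
  namespace: prom
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi                # example size only
---
# Then, in prometheus-deploy.yaml, mount the claim into the prometheus container:
#   volumeMounts:
#   - name: prometheus-storage
#     mountPath: /prometheus       # must match --storage.tsdb.path
#   volumes:
#   - name: prometheus-storage
#     persistentVolumeClaim:
#       claimName: prometheus-data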
Deploy kube-state-metrics, which aggregates object state data:
[root@master k8s-prom]# cd kube-state-metrics/
[root@master kube-state-metrics]# ls
kube-state-metrics-deploy.yaml  kube-state-metrics-rbac.yaml  kube-state-metrics-svc.yaml
[root@master kube-state-metrics]# kubectl apply -f .
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@master kube-state-metrics]# kubectl get all -n prom
NAME                                      READY   STATUS    RESTARTS   AGE
pod/kube-state-metrics-58dffdf67d-v9klh   1/1     Running   0          14m

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/kube-state-metrics   ClusterIP   10.111.41.139   <none>        8080/TCP   14m
Deploy k8s-prometheus-adapter; this one needs a self-signed certificate:
[root@master k8s-prometheus-adapter]# cd /etc/kubernetes/pki/
[root@master pki]# (umask 077; openssl genrsa -out serving.key 2048)
Generating RSA private key, 2048 bit long modulus
...........................................................................................+++
...............+++
e is 65537 (0x10001)
Generate the certificate signing request:
[root@master pki]# openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
Sign the certificate:
[root@master pki]# openssl x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
Signature ok
subject=/CN=serving
Getting CA Private Key
Create a secret from the certificate and key:
[root@master pki]# kubectl create secret generic cm-adapter-serving-certs --from-file=serving.crt=./serving.crt --from-file=serving.key=./serving.key -n prom
secret/cm-adapter-serving-certs created
Note: cm-adapter-serving-certs is the name referenced inside custom-metrics-apiserver-deployment.yaml.
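For reference, this is roughly how the upstream custom-metrics-apiserver-deployment.yaml consumes that secret (recalled from the DirectXMan12 manifests, so verify the exact volume name and mount path in your copy):

        volumeMounts:
        - name: volume-serving-cert
          mountPath: /var/run/serving-cert      # the adapter is pointed at serving.crt/serving.key in here
      volumes:
      - name: volume-serving-cert
        secret:
          secretName: cm-adapter-serving-certs  # must match the secret created above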
[root@master pki]# kubectl get secrets -n prom
NAME                             TYPE                                  DATA   AGE
cm-adapter-serving-certs         Opaque                                2      51s
default-token-knsbg              kubernetes.io/service-account-token   3      4h
kube-state-metrics-token-sccdf   kubernetes.io/service-account-token   3      3h
prometheus-token-nqzbz           kubernetes.io/service-account-token   3      3h
Deploy k8s-prometheus-adapter:
[root@master k8s-prom]# cd k8s-prometheus-adapter/
[root@master k8s-prometheus-adapter]# ls
custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml   custom-metrics-apiserver-service.yaml
custom-metrics-apiserver-auth-reader-role-binding.yaml              custom-metrics-apiservice.yaml
custom-metrics-apiserver-deployment.yaml                            custom-metrics-cluster-role.yaml
custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml  custom-metrics-resource-reader-cluster-role.yaml
custom-metrics-apiserver-service-account.yaml                       hpa-custom-metrics-cluster-role-binding.yaml
Because k8s v1.11.2 (and 1.13 as well) is not compatible with the latest k8s-prometheus-adapter as shipped here, the fix is to download the latest custom-metrics-apiserver-deployment.yaml from https://github.com/DirectXMan12/k8s-prometheus-adapter/tree/master/deploy/manifests and change its namespace to prom, and likewise download custom-metrics-config-map.yaml locally and change its namespace to prom; a sed one-liner for this is sketched below.
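A quick way to make the namespace edit, assuming the upstream files use namespace: custom-metrics (check the files first, that value is an assumption):

# Replace the upstream namespace with prom in the two downloaded files
sed -i 's/namespace: custom-metrics/namespace: prom/g' \
    custom-metrics-apiserver-deployment.yaml custom-metrics-config-map.yaml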
[root@master k8s-prometheus-adapter]# kubectl apply -f .
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/custom-metrics-auth-reader created
deployment.apps/custom-metrics-apiserver created
clusterrolebinding.rbac.authorization.k8s.io/custom-metrics-resource-reader created
serviceaccount/custom-metrics-apiserver created
service/custom-metrics-apiserver created
apiservice.apiregistration.k8s.io/v1beta1.custom.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/custom-metrics-server-resources created
clusterrole.rbac.authorization.k8s.io/custom-metrics-resource-reader created
clusterrolebinding.rbac.authorization.k8s.io/hpa-controller-custom-metrics created
[root@master k8s-prometheus-adapter]# kubectl get all -n prom
NAME                                           READY   STATUS    RESTARTS   AGE
pod/custom-metrics-apiserver-65f545496-64lsz   1/1     Running   0          6m
pod/kube-state-metrics-58dffdf67d-v9klh        1/1     Running   0          4h
pod/prometheus-node-exporter-dmmjj             1/1     Running   0          4h
pod/prometheus-node-exporter-ghz2l             1/1     Running   0          4h
pod/prometheus-node-exporter-zt2lw             1/1     Running   0          4h
pod/prometheus-server-65f5d59585-6l8m8         1/1     Running   0          4h

NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/custom-metrics-apiserver   ClusterIP   10.103.87.246   <none>        443/TCP          36m
service/kube-state-metrics         ClusterIP   10.111.41.139   <none>        8080/TCP         4h
service/prometheus                 NodePort    10.111.127.64   <none>        9090:30090/TCP   4h
service/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         4h

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-node-exporter   3         3         3       3            3           <none>          4h

NAME                                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/custom-metrics-apiserver   1         1         1            1           36m
deployment.apps/kube-state-metrics         1         1         1            1           4h
deployment.apps/prometheus-server          1         1         1            1           4h

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/custom-metrics-apiserver-5f6b4d857d   0         0         0       36m
replicaset.apps/custom-metrics-apiserver-65f545496    1         1         1       6m
replicaset.apps/custom-metrics-apiserver-86ccf774d5   0         0         0       17m
replicaset.apps/kube-state-metrics-58dffdf67d         1         1         1       4h
replicaset.apps/prometheus-server-65f5d59585          1         1         1       4h
At this point every resource in the prom namespace is in the Running state.
[root@master k8s-prometheus-adapter]# kubectl api-versions
custom.metrics.k8s.io/v1beta1
Now the custom.metrics.k8s.io/v1beta1 API shows up. (In my case it did not actually appear in the list, but that did not affect anything.)
Open a proxy:
[root@master k8s-prometheus-adapter]# kubectl proxy --port=8080
The metric data is now visible:
[root@master pki]# curl http://localhost:8080/apis/custom.metrics.k8s.io/v1beta1/
    {
      "name": "pods/ceph_rocksdb_submit_transaction_sync",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_deployment_created",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "jobs.batch/kube_pod_owner",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
Now we can happily create HPAs (horizontal pod autoscaling).
In addition, prometheus can be integrated with grafana, as follows.
First download grafana.yaml; see https://github.com/kubernetes/heapster/blob/master/deploy/kube-config/influxdb/grafana.yaml
[root@master pro]# wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
Edit grafana.yaml as follows:
- Change namespace: kube-system to prom (two places);
- Comment out these two env entries:
    - name: INFLUXDB_HOST
      value: monitoring-influxdb
- Add type: NodePort at the very end:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort
[root@master pro]# kubectl apply -f grafana.yaml
deployment.extensions/monitoring-grafana created
service/monitoring-grafana created
[root@master pro]# kubectl get pods -n prom
NAME                                 READY   STATUS    RESTARTS   AGE
monitoring-grafana-ffb4d59bd-gdbsk   1/1     Running   0          5s
If there are still problems, delete the resources created above and apply again.
The grafana pod is now running.
[root@master pro]# kubectl get svc -n prom
NAME                 TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
monitoring-grafana   NodePort   10.106.164.205   <none>        80:32659/TCP   19m
We can now browse to the master host IP: http://172.16.1.100:32659
In the screenshot the data source port is 9090; fill in whatever port your own svc actually uses. Other than changing 80 to 9090, nothing else changes. The URL can take that form because grafana and prometheus sit in the same namespace, so prometheus is reachable by its service name, e.g. http://prometheus:9090.
[root@master pro]# kubectl get svc -n prom
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
custom-metrics-apiserver ClusterIP 10.109.58.249 <none> 443/TCP 52m
kube-state-metrics ClusterIP 10.103.52.45 <none> 8080/TCP 69m
monitoring-grafana NodePort 10.110.240.31 <none> 80:31128/TCP 17m
prometheus NodePort 10.110.19.171 <none> 9090:30090/TCP 145m
prometheus-node-exporter ClusterIP None <none> 9100/TCP 146m
Then you can see the corresponding data in the grafana UI.
Download a grafana dashboard template for monitoring k8s with prometheus from https://grafana.com/dashboards/6417
Then import the downloaded template in the grafana UI.
Once the template is imported, the monitoring data shows up.
I did not actually run through the HPA part this time, since I have done it before; I am copying my earlier notes over directly. If you hit problems, work them out on your own.
When pods come under pressure, the HPA automatically scales the number of pods to spread the load.
Currently HPA has two versions: v1 only supports the core metrics (it can only scale pods on CPU utilization); v2, covered further below, can also use memory and custom metrics.
[root@master pro]# kubectl explain hpa.spec.scaleTargetRef
scaleTargetRef: specifies which object (for example a Deployment) the HPA scales.
[root@master pro]# kubectl api-versions |grep auto
autoscaling/v1
autoscaling/v2beta1
As seen above, both hpa v1 and hpa v2 are supported.
Now let's create a pod named myapp from the command line, with resource requests and limits:
[root@master ~]# kubectl run myapp --image=ikubernetes/myapp:v1 --replicas=1 --requests='cpu=50m,memory=256Mi' --limits='cpu=50m,memory=256Mi' --labels='app=myapp' --expose --port=80
service/myapp created
deployment.apps/myapp created
[root@master ~]# kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-6985749785-fcvwn   1/1     Running   0          58s
Now let's make the myapp pods scale horizontally and automatically using kubectl autoscale, which is really just a way of creating an HPA controller:
[root@master ~]# kubectl autoscale deployment myapp --min=1 --max=8 --cpu-percent=60
horizontalpodautoscaler.autoscaling/myapp autoscaled
--min: the minimum number of pods
--max: the maximum number of pods
--cpu-percent: the target CPU utilization (a declarative equivalent is sketched below)
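For reference, a declarative equivalent of that kubectl autoscale command, written as an autoscaling/v1 manifest (a sketch; the object name myapp-hpa-v1 is made up):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v1                     # hypothetical name
spec:
  scaleTargetRef:                        # what gets scaled: the Deployment created above
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                         # --min=1
  maxReplicas: 8                         # --max=8
  targetCPUUtilizationPercentage: 60     # --cpu-percent=60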
[root@master ~]# kubectl get hpa
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
myapp   Deployment/myapp   0%/60%    1         8         1          4m
[root@master ~]# kubectl get svc
NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
myapp   ClusterIP   10.105.235.197   <none>        80/TCP    19
Now let's change the service to the NodePort type:
[root@master ~]# kubectl patch svc myapp -p '{"spec":{"type": "NodePort"}}'
service/myapp patched
[root@master ~]# kubectl get svc
NAME    TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
myapp   NodePort   10.105.235.197   <none>        80:31990/TCP   22m
[root@master ~]# yum install httpd-tools    # mainly to get the ab load-testing tool
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-6985749785-fcvwn   1/1     Running   0          25m   10.244.2.84   node2
Start load testing with ab:
[root@master ~]# ab -c 1000 -n 5000000 http://172.16.1.100:31990/index.html
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.16.1.100 (be patient)
Wait a while and you will see the pods' CPU utilization at 98%, so the deployment needs to scale to 2 pods:
[root@master ~]# kubectl describe hpa
  resource cpu on pods  (as a percentage of request):  98% (49m) / 60%
Deployment pods:  1 current / 2 desired
[root@master ~]# kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
myapp-6985749785-fcvwn   49m          3Mi
(the CPU limit we set for the pod is 50m)
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-fcvwn   1/1     Running   0          32m   10.244.2.84    node2
myapp-6985749785-sr4qv   1/1     Running   0          2m    10.244.1.105   node1
We can see it has already scaled out to 2 pods automatically. Wait a bit longer and, as CPU pressure keeps rising, it will scale out further to 4 pods or more:
[root@master ~]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-2mjrd   1/1     Running   0          1m    10.244.1.107   node1
myapp-6985749785-bgz6p   1/1     Running   0          1m    10.244.1.108   node1
myapp-6985749785-fcvwn   1/1     Running   0          35m   10.244.2.84    node2
myapp-6985749785-sr4qv   1/1     Running   0          5m    10.244.1.105   node1
Once the load test stops, the pod count shrinks back to normal.
Above we used hpa v1 for horizontal pod autoscaling; as mentioned, hpa v1 can only scale pods on CPU utilization.
Next let's look at hpa v2, which can also scale pods on custom metric utilization.
Before trying hpa v2, delete the hpa v1 object created earlier so it does not conflict with the hpa v2 test:
[root@master hpa]# kubectl delete hpa myapp
horizontalpodautoscaler.autoscaling "myapp" deleted
OK, now let's create an hpa v2:
[root@master hpa]# cat hpa-v2-demo.yaml
apiVersion: autoscaling/v2beta1     # this tells us it is hpa v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                   # the object whose pods are scaled
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                    # minimum number of replicas
  maxReplicas: 10
  metrics:                          # which metrics drive the scaling decision
  - type: Resource                  # evaluate a resource metric
    resource:
      name: cpu
      targetAverageUtilization: 55  # scale out once pod CPU utilization exceeds 55%
  - type: Resource
    resource:
      name: memory                  # hpa v1 only understands cpu; with hpa v2 we can scale on memory too
      targetAverageValue: 50Mi      # scale out once pod memory usage exceeds 50Mi
[root@master hpa]# kubectl apply -f hpa-v2-demo.yaml
horizontalpodautoscaler.autoscaling/myapp-hpa-v2 created
[root@master hpa]# kubectl get hpa
NAME           REFERENCE          TARGETS                MINPODS   MAXPODS   REPLICAS   AGE
myapp-hpa-v2   Deployment/myapp   3723264/50Mi, 0%/55%   1         10        1          37s
We can see there is only one pod at the moment:
[root@master hpa]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE
myapp-6985749785-fcvwn   1/1     Running   0          57m   10.244.2.84   node2
Start the load test:
[root@master ~]# ab -c 100 -n 5000000 http://172.16.1.100:31990/index.html
Check what hpa v2 observes:
[root@master hpa]# kubectl describe hpa
Metrics:                                               ( current / target )
  resource memory on pods:                             3756032 / 50Mi
  resource cpu on pods (as a percentage of request):   82% (41m) / 55%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 2 desired
[root@master hpa]# kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE
myapp-6985749785-8frq4   1/1     Running   0          1m    10.244.1.109   node1
myapp-6985749785-fcvwn   1/1     Running   0          1h    10.244.2.84    node2
It has automatically scaled out to 2 pods. Once the load test stops, the pod count shrinks back to normal.
Going forward, hpa v2 lets us scale pods not only on CPU and memory utilization but also on things like HTTP request concurrency.
For example:
[root@master hpa]# cat hpa-v2-custom.yaml
apiVersion: autoscaling/v2beta1     # again hpa v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:                   # the object whose pods are scaled
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1                    # minimum number of replicas
  maxReplicas: 10
  metrics:                          # which metrics drive the scaling decision
  - type: Pods                      # evaluate a per-pod custom metric
    pods:
      metricName: http_requests     # the custom metric name
      targetAverageValue: 800m      # target average value per pod (note: the m suffix means milli, so 800m is 0.8, not 800)
For a concrete image that exposes such a concurrency metric, see https://hub.docker.com/r/ikubernetes/metrics-app/
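For the custom metric to exist at all, prometheus has to scrape the application. With the prometheus-cfg.yaml from the k8s-prom repo, pod scraping is usually driven by prometheus.io/* annotations; a rough sketch follows, where the annotation convention and the port are assumptions, so check the relabel rules in your prometheus-cfg.yaml and the image's documentation:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-app                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metrics-app
  template:
    metadata:
      labels:
        app: metrics-app
      annotations:
        prometheus.io/scrape: "true" # ask prometheus to scrape this pod
        prometheus.io/port: "80"     # port where the app exposes its metrics (assumed)
    spec:
      containers:
      - name: metrics-app
        image: ikubernetes/metrics-app
        ports:
        - containerPort: 80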