Among the observability signals, metrics are the ones that can reflect a system's runtime behaviour from the most angles. Because metrics come in many varieties, multi-dimensional data analysis lets us measure and monitor each dimension of the system.
By default, Istio collects and visualizes metrics with its bundled Prometheus and Grafana components. However, a monitoring system is the kind of foundational tool that almost every production environment already has, so deploying Istio's bundled components would only duplicate it.
The best solution is therefore to integrate the existing monitoring system with Istio. This section demonstrates how to do exactly that for metrics collection.
First, we need to understand how Istio exposes its metrics. It mainly provides the following two metrics endpoints:
- /metrics: exposes metrics about how Istio itself is running
- /stats/prometheus: an endpoint provided by Envoy that exposes metrics about network traffic

We can query the /stats/prometheus endpoint to see the metrics it provides:
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl http://httpbin.demo:15090/stats/prometheus
The /metrics endpoint of the istiod service exposes some control-plane metrics, which we can fetch as follows:
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl http://istiod.istio-system:15014/metrics
The basis for the dynamic configuration we will use later is Prometheus's service discovery mechanism:

- The kubernetes_sd_configs role setting defines which kinds of targets to collect metrics from.
- The relabel_configs setting defines filtering rules that decide which of the discovered endpoints are actually scraped.

Let's first set up a monitoring system and then integrate it with Istio. Start by deploying Prometheus; the manifest is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccount: appmesh-prometheus
      serviceAccountName: appmesh-prometheus
      containers:
      - image: prom/prometheus:latest
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        - "--web.enable-admin-api"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
          name: http
        volumeMounts:
        - mountPath: /etc/prometheus
          name: config-volume
        - mountPath: /prometheus/data
          name: data-volume
        resources:
          requests:
            cpu: 100m
            memory: 512Mi
          limits:
            cpu: 100m
            memory: 512Mi
        securityContext:
          runAsUser: 0
      volumes:
      - configMap:
          name: prometheus-config
        name: config-volume
      - emptyDir: {}
        name: data-volume
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: http
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: appmesh-prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  namespace: monitoring
  name: appmesh-prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics
  - services
  - endpoints
  - pods
  - ingresses
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  - "networking.k8s.io"
  resources:
  - ingresses/status
  - ingresses
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: appmesh-prometheus
subjects:
- kind: ServiceAccount
  name: appmesh-prometheus
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: appmesh-prometheus
Create the ConfigMap for Prometheus:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
Next, deploy Grafana; the manifest is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GRAFANA_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
      volumes:
      - name: grafana-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  selector:
    app: grafana
  type: NodePort
  ports:
  - name: http
    port: 3000
    targetPort: 3000
    nodePort: 32000
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana
  namespace: monitoring
Confirm that everything has started up correctly:
[root@m1 ~]# kubectl get all -n monitoring
NAME                             READY   STATUS    RESTARTS   AGE
pod/grafana-86f5dc96d-6hsmz      1/1     Running   0          20m
pod/prometheus-9dd6bd8bb-wcdrw   1/1     Running   0          2m30s

NAME                 TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/grafana      NodePort   10.101.215.111   <none>        3000:32000/TCP   20m
service/prometheus   NodePort   10.101.113.122   <none>        9090:31053/TCP   13m

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/grafana      1/1     1            1           20m
deployment.apps/prometheus   1/1     1            1           13m

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/grafana-86f5dc96d      1         1         1       20m
replicaset.apps/prometheus-9dd6bd8bb   1         1         1       13m
[root@m1 ~]#
Check which worker node Prometheus and Grafana were scheduled onto:
[root@m1 ~]# kubectl get po -l app=grafana -n monitoring -o jsonpath='{.items[0].status.hostIP}'
192.168.243.139
[root@m1 ~]# kubectl get po -l app=prometheus -n monitoring -o jsonpath='{.items[0].status.hostIP}'
192.168.243.139
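The Prometheus UI is reachable on that node through the NodePort shown in the service listing above (31053 here); you can confirm it responds before opening it in a browser:

[root@m1 ~]# curl -s http://192.168.243.139:31053/-/ready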
Open Prometheus in a browser and check that its configuration matches what we expect, i.e. that it corresponds to the contents of the ConfigMap:
As the screenshot above shows, Prometheus currently uses only a static configuration. Next we will switch it to dynamic service discovery by updating its ConfigMap as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      scrape_timeout: 15s
    scrape_configs:
    - job_name: 'prometheus'
      static_configs:
      - targets: ['localhost:9090']
    # The scrape job below integrates Istio
    - job_name: envoy-stats
      honor_timestamps: true
      metrics_path: /stats/prometheus
      scheme: http
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        separator: ;
        regex: .*-envoy-prom
        replacement: $1
        action: keep
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        separator: ;
        regex: ([^:]+)(?::\d+)?;(\d+)
        target_label: __address__
        replacement: $1:15090
        action: replace
      - separator: ;
        regex: __meta_kubernetes_pod_label_(.+)
        replacement: $1
        action: labeldrop
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod_name
        replacement: $1
        action: replace
The Istio-related scrape configuration above is adapted from the Prometheus addon that ships with Istio, $ISTIO_HOME/samples/addons/prometheus.yaml.
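Apply the updated ConfigMap first (assuming it was saved to a file named prometheus-config.yaml; the file name is only an example):

$ kubectl apply -f prometheus-config.yaml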
Then use the patch command to force Prometheus to be recreated:
[root@m1 ~]# kubectl patch deployment prometheus -n monitoring -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
deployment.apps/prometheus patched
[root@m1 ~]#
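You can wait for the restarted Pod to become ready before checking:

$ kubectl rollout status deployment/prometheus -n monitoring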
Check whether the new configuration has taken effect:
At this point you can query Istio's metrics in Prometheus:
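If you prefer the command line, the same metrics can also be pulled through Prometheus's HTTP API. A small sketch, assuming the NodePort 31053 from the service listing above and that some traffic has already passed through the mesh:

$ curl -s 'http://192.168.243.139:31053/api/v1/query?query=istio_requests_total'
$ curl -s 'http://192.168.243.139:31053/api/v1/query?query=envoy_cluster_upstream_rq_total'

istio_requests_total is the standard request-count metric reported by the sidecars; if it returns no data yet, send a few requests to any meshed service first.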
On the Grafana side, all you need to do is export the dashboards bundled with Istio's built-in Grafana and import them into your own Grafana instance. It is straightforward, so it is not demonstrated here.
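If you want to script that step, the Grafana HTTP API can do the import. A rough sketch, assuming a dashboard JSON file exported from the built-in Grafana and saved as istio-mesh-dashboard.json (the file name is only an example), and the Grafana deployed above, which is reachable on NodePort 32000 with anonymous admin access as configured in its manifest:

$ curl -X POST http://192.168.243.139:32000/api/dashboards/db \
    -H 'Content-Type: application/json' \
    -d "{\"dashboard\": $(cat istio-mesh-dashboard.json), \"overwrite\": true}"

Remember to point the imported dashboards at a Prometheus data source that scrapes the envoy-stats job defined earlier.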
In a distributed system, the logs produced by applications are scattered across the various nodes, which makes them very inconvenient to view and manage. The usual approach is therefore a centralized logging architecture: log data from all nodes is aggregated into a single logging platform for unified management. The best known of these platforms is the ELK Stack.
Its main components are Elasticsearch, which stores and indexes the log data, Logstash/Beats, which collect and ship it, and Kibana, which visualizes it.
Next we will install the ELK stack to collect the log data of Istio's Envoy sidecars. First, create a namespace in the cluster:
[root@m1 ~]# kubectl create ns elk
namespace/elk created
[root@m1 ~]#
Then deploy Elasticsearch and Kibana using the following manifest:
kind: List
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: kibana
  spec:
    selector:
      matchLabels:
        app: kibana
    replicas: 1
    template:
      metadata:
        name: kibana
        labels:
          app: kibana
      spec:
        containers:
        - image: docker.elastic.co/kibana/kibana:6.4.0
          name: kibana
          env:
          - name: ELASTICSEARCH_URL
            value: "http://elasticsearch:9200"
          ports:
          - name: http
            containerPort: 5601
- apiVersion: v1
  kind: Service
  metadata:
    name: kibana
  spec:
    type: NodePort
    ports:
    - name: http
      port: 5601
      targetPort: 5601
      nodePort: 32001
    selector:
      app: kibana
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: elasticsearch
  spec:
    selector:
      matchLabels:
        app: elasticsearch
    replicas: 1
    template:
      metadata:
        name: elasticsearch
        labels:
          app: elasticsearch
      spec:
        initContainers:
        - name: init-sysctl
          image: busybox
          command:
          - sysctl
          - -w
          - vm.max_map_count=262144
          securityContext:
            privileged: true
        containers:
        - image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
          name: elasticsearch
          env:
          - name: network.host
            value: "_site_"
          - name: node.name
            value: "${HOSTNAME}"
          - name: discovery.zen.ping.unicast.hosts
            value: "${ELASTICSEARCH_NODEPORT_SERVICE_HOST}"
          - name: cluster.name
            value: "test-single"
          - name: ES_JAVA_OPTS
            value: "-Xms128m -Xmx128m"
          volumeMounts:
          - name: es-data
            mountPath: /usr/share/elasticsearch/data
        volumes:
        - name: es-data
          emptyDir: {}
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch-nodeport
  spec:
    type: NodePort
    ports:
    - name: http
      port: 9200
      targetPort: 9200
      nodePort: 32002
    - name: tcp
      port: 9300
      targetPort: 9300
      nodePort: 32003
    selector:
      app: elasticsearch
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch
  spec:
    clusterIP: None
    ports:
    - name: http
      port: 9200
    - name: tcp
      port: 9300
    selector:
      app: elasticsearch
Deploy it into that namespace:
[root@m1 ~]# kubectl apply -f elk/deploy.yaml -n elk
deployment.apps/kibana created
service/kibana created
deployment.apps/elasticsearch created
service/elasticsearch-nodeport created
service/elasticsearch created
[root@m1 ~]#
The steps above only deploy Elasticsearch and Kibana. To actually collect Envoy's logs we also need Logstash or Filebeat; here we use Filebeat as the example, with the following manifest:
kind: List
apiVersion: v1
items:
- apiVersion: v1
  kind: ConfigMap
  metadata:
    name: filebeat-config
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
      app: filebeat-config
  data:
    filebeat.yml: |
      processors:
      - add_cloud_metadata:
      filebeat.modules:
      - module: system
      filebeat.inputs:
      - type: log
        paths:
        - /var/log/containers/*.log
        symlinks: true
      output.elasticsearch:
        hosts: ['elasticsearch:9200']
      logging.level: info
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
      kubernetes.io/cluster-service: "true"
  spec:
    selector:
      matchLabels:
        app: filebeat
    replicas: 1
    template:
      metadata:
        name: filebeat
        labels:
          app: filebeat
          k8s-app: filebeat
          kubernetes.io/cluster-service: "true"
      spec:
        containers:
        - image: docker.elastic.co/beats/filebeat:6.4.0
          name: filebeat
          args: [
            "-c", "/home/filebeat-config/filebeat.yml",
            "-e",
          ]
          securityContext:
            runAsUser: 0
          volumeMounts:
          - name: filebeat-storage
            mountPath: /var/log/containers
          - name: varlogpods
            mountPath: /var/log/pods
          - name: varlibdockercontainers
            mountPath: /var/lib/docker/containers
          - name: "filebeat-volume"
            mountPath: "/home/filebeat-config"
        volumes:
        - name: filebeat-storage
          hostPath:
            path: /var/log/containers
        - name: varlogpods
          hostPath:
            path: /var/log/pods
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: filebeat-volume
          configMap:
            name: filebeat-config
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    name: filebeat
  subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elk
  roleRef:
    kind: ClusterRole
    name: filebeat
    apiGroup: rbac.authorization.k8s.io
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    name: filebeat
    labels:
      k8s-app: filebeat
  rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
    - namespaces
    - pods
    verbs:
    - get
    - watch
    - list
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: filebeat
    namespace: elk
    labels:
      k8s-app: filebeat
Confirm that all components have been deployed successfully:
[root@m1 ~]# kubectl get all -n elk
NAME                                 READY   STATUS    RESTARTS   AGE
pod/elasticsearch-697c88cd76-xvn4j   1/1     Running   0          4m53s
pod/filebeat-8646b847b7-f58zg        1/1     Running   0          32s
pod/kibana-fc98677d7-9z5dl           1/1     Running   0          8m14s

NAME                             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
service/elasticsearch            ClusterIP   None            <none>        9200/TCP,9300/TCP               8m14s
service/elasticsearch-nodeport   NodePort    10.96.106.229   <none>        9200:32002/TCP,9300:32003/TCP   8m14s
service/kibana                   NodePort    10.105.91.140   <none>        5601:32001/TCP                  8m14s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/elasticsearch   1/1     1            1           8m14s
deployment.apps/filebeat        1/1     1            1           32s
deployment.apps/kibana          1/1     1            1           8m14s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/elasticsearch-697c88cd76   1         1         1       4m53s
replicaset.apps/filebeat-8646b847b7        1         1         1       32s
replicaset.apps/kibana-fc98677d7           1         1         1       8m14s
[root@m1 ~]#
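Before wiring up Kibana, you can verify that Filebeat is already writing into Elasticsearch by listing the indices through the elasticsearch-nodeport service (32002 is the HTTP NodePort from the manifest above; <node-ip> is a placeholder for any node's IP):

$ curl 'http://<node-ip>:32002/_cat/indices?v'

With Filebeat's default settings, an index named roughly filebeat-6.4.0-<date> should appear once logs start flowing.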
In Kibana, create a simple index pattern:
After it is created:
Then, on the Discover page, you can browse the log data that Filebeat collected and stored in Elasticsearch:
The flow of distributed tracing implemented on top of Envoy is as follows:
Next we will install Jaeger with its Operator to demonstrate how Istio can be integrated with an existing distributed tracing system. First, a quick look at what an Operator is:
Start by cloning the jaeger-operator repository:
[root@m1 ~]# cd /usr/local/src
[root@m1 /usr/local/src]# git clone https://github.com/jaegertracing/jaeger-operator.git
Edit the configuration file and set the value of WATCH_NAMESPACE to empty, so the operator watches all namespaces and can trace requests across them:
[root@m1 /usr/local/src]# vim jaeger-operator/deploy/operator.yaml
...
        env:
        - name: WATCH_NAMESPACE
          value:
...
Create the Jaeger CRD:
[root@m1 /usr/local/src]# kubectl apply -f jaeger-operator/deploy/crds/jaegertracing.io_jaegers_crd.yaml
customresourcedefinition.apiextensions.k8s.io/jaegers.jaegertracing.io created
[root@m1 /usr/local/src]#
Then create a namespace and create the rest of the Jaeger resources under it:
$ kubectl create ns observability
$ kubectl apply -f jaeger-operator/deploy/role.yaml -n observability
$ kubectl apply -f jaeger-operator/deploy/role_binding.yaml -n observability
$ kubectl apply -f jaeger-operator/deploy/service_account.yaml -n observability
$ kubectl apply -f jaeger-operator/deploy/cluster_role.yaml -n observability
$ kubectl apply -f jaeger-operator/deploy/cluster_role_binding.yaml -n observability
$ kubectl apply -f jaeger-operator/deploy/operator.yaml -n observability
Confirm that the operator is up and running:
[root@m1 /usr/local/src]# kubectl get all -n observability
NAME                                   READY   STATUS    RESTARTS   AGE
pod/jaeger-operator-7f76698d98-x9wkh   1/1     Running   0          105s

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/jaeger-operator-metrics   ClusterIP   10.100.189.227   <none>        8383/TCP,8686/TCP   11s

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator   1/1     1            1           105s

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-7f76698d98   1         1         1       105s
[root@m1 /usr/local/src]#
Create a jaegers custom resource; the operator will use it to deploy Jaeger for us automatically:
[root@m1 /usr/local/src]# kubectl apply -f jaeger-operator/examples/simplest.yaml -n observability
jaeger.jaegertracing.io/simplest created
[root@m1 /usr/local/src]# kubectl get jaegers -n observability
NAME       STATUS    VERSION   STRATEGY   STORAGE   AGE
simplest   Running   1.21.0    allinone   memory    3m8s
[root@m1 /usr/local/src]#
With Jaeger deployed, the next step is to integrate it with Istio. The integration is simple: we only need to set a few configuration values through the istioctl tool, as follows:
[root@m1 ~]# istioctl install --set profile=demo -y \
  --set values.global.tracer.zipkin.address=simplest-collector.observability:9411 \
  --set values.pilot.traceSampling=100
Note that the profile value must match the profile used when Istio was originally installed; otherwise Istio will be reinstalled with the default profile.

Once the Jaeger/Istio integration is done, one last step remains: injecting the Jaeger agent into our applications. The Jaeger Operator supports automatic injection, so all we need to do is add an injection flag to the workload's annotations.
As mentioned earlier, Istio's tracing is not completely transparent to applications: the application itself has to propagate the trace headers (x-request-id and the B3 headers such as x-b3-traceid, x-b3-spanid, x-b3-parentspanid and x-b3-sampled) on its outbound requests. For convenience we therefore use the official Bookinfo application, which already does this, as the demo. Deploy Bookinfo:
[root@m1 ~]# kubectl apply -f /usr/local/istio-1.8.1/samples/bookinfo/platform/kube/bookinfo.yaml
[root@m1 ~]# kubectl apply -f /usr/local/istio-1.8.1/samples/bookinfo/networking/bookinfo-gateway.yaml
Jaeger supports injection at the namespace or Deployment level. Taking the productpage-v1 Deployment as an example, we only need to add the annotation sidecar.jaegertracing.io/inject: "true" to it:
[root@m1 ~]# kubectl edit deployments.apps/productpage-v1
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    sidecar.jaegertracing.io/inject: "true"
...
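If you prefer not to open an editor, the same annotation can be added with a one-off patch, equivalent to the edit above:

[root@m1 ~]# kubectl patch deployment productpage-v1 --type merge -p '{"metadata":{"annotations":{"sidecar.jaegertracing.io/inject":"true"}}}'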
Then use the patch command to recreate productpage:
[root@m1 ~]# kubectl patch deployment productpage-v1 -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"
deployment.apps/productpage-v1 patched
[root@m1 ~]#
Now the productpage Pod shows three containers, which means the Jaeger agent has been injected:
[root@m1 ~]# kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
productpage-v1-5c75dcd69f-g9sjh   3/3     Running   0          96s
...
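Before opening the Jaeger UI, generate some traffic so there are traces to look at. A minimal example, assuming the Bookinfo gateway is exposed on this environment's ingress gateway address (192.168.243.140, as istioctl x describe reports further below; adjust to your own setup):

[root@m1 ~]# for i in $(seq 1 20); do curl -s -o /dev/null http://192.168.243.140/productpage; done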
Use the following command to expose the Jaeger Web UI port:
[root@m1 ~]# kubectl port-forward svc/simplest-query -n observability 16686:16686 --address 192.168.243.138
Forwarding from 192.168.243.138:16686 -> 16686
On the page you can see that Jaeger has already detected the productpage service:
The most common ways to debug Istio are the following: the istioctl command-line tool, the ControlZ self-inspection tool for the control plane, the Envoy admin API, and the Pilot debug interface.
We can view istioctl's help information with the --help flag:
$ istioctl --help
Commonly used subcommands include:

- istioctl verify-install: verifies whether the current Kubernetes cluster can host an Istio deployment
- istioctl install [flags]: installs Istio into the current cluster
- istioctl profile [list / diff / dump]: works with Istio configuration profiles
- istioctl kube-inject: injects the Envoy sidecar into Pod manifests
- istioctl dashboard [command]: opens the Web UI of the specified Istio dashboard (controlz / envoy / grafana / jaeger / kiali / prometheus / zipkin)
To inspect mesh and proxy state:

- istioctl ps <pod-name>: shows the proxy configuration sync status, which can be SYNCED, NOT SENT, or STALE
- istioctl pc [cluster/route/...] <pod-name.namespace>: dumps the detailed mesh configuration of the specified resource for a given pod
- istioctl x describe pod <pod-name> (x = experimental): verifies that the Pod meets the mesh's requirements and describes the Istio configuration that affects it

For example:
[root@m1 ~]# istioctl x describe pod productpage-v1-65576bb7bf-4bwwr
Pod: productpage-v1-65576bb7bf-4bwwr
   Pod Ports: 9080 (productpage), 15090 (istio-proxy)
--------------------
Service: productpage
   Port: http 9080/HTTP targets pod port 9080

Exposed on Ingress Gateway http://192.168.243.140
VirtualService: bookinfo
   /productpage, /static*, /login, /logout, /api/v1/products*
[root@m1 ~]#
For configuration analysis:

- istioctl analyze [-n <namespace> / --all-namespaces]: analyses the mesh configuration in the given namespace(s) and reports warnings or errors when problems are found
- istioctl analyze a.yaml b.yaml my-app-config/: analyses individual configuration files or every file under a directory
- istioctl analyze --use-kube=false a.yaml: analyses the given configuration files without consulting the live cluster

ControlZ is a visual self-inspection tool for the control plane. Its main capabilities include adjusting logging levels and inspecting memory usage, environment variables, and process information.
It is used as follows:
istioctl d controlz <istiod-podname> -n istio-system
The Envoy admin API lets you inspect and operate on the data plane. Its main functions include dumping the Envoy configuration, viewing runtime statistics and upstream cluster state, and adjusting logging levels.
Use the following command to open the Envoy admin API of a given Pod:
istioctl d envoy <pod-name>.[namespace] --address ${ip}
Or expose its port like this:
kubectl port-forward <pod-name> 15000:15000 --address ${ip}
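Once the port is forwarded, the admin API can also be queried directly. These are standard Envoy admin endpoints; the address assumes the port-forward above:

$ curl http://${ip}:15000/config_dump                       # dump the full Envoy configuration
$ curl http://${ip}:15000/clusters                          # upstream clusters and endpoint health
$ curl -X POST "http://${ip}:15000/logging?level=debug"     # raise the log level of all loggers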
Its page looks like this:
The Pilot (istiod) debug interface exposes the control plane's internal state, for example the xDS sync status of each proxy and the service registry and configuration that istiod has loaded.
Expose its port with the following command:
kubectl port-forward service/istiod -n istio-system 15014:15014 --address ${ip}
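With the port forwarded, the debug handlers can also be queried directly. The paths below are istiod's standard debug endpoints, though which port serves them can vary between Istio versions, so treat this as a sketch:

$ curl http://${ip}:15014/debug/syncz       # xDS sync status of every connected proxy
$ curl http://${ip}:15014/debug/registryz   # services in istiod's internal registry
$ curl http://${ip}:15014/debug/configz     # Istio configuration resources istiod has loaded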
Its page looks like this: