Below we describe how to build a monitoring environment for Kubernetes with the Elastic Stack. The goal of observability is to give operations teams tools to detect service unavailability in production (outages, errors, slow responses, and so on) and to retain enough troubleshooting data to help locate the problem. Overall this comes down to three aspects: metrics, logs, and traces.

In this article we will monitor the environment with an Elastic Stack running inside the Kubernetes cluster, composed of ElasticSearch, Kibana, Filebeat, Metricbeat, and APM-Server. To better understand how these components are configured, we will install them by writing the resource manifests by hand; of course, tools such as Helm can also be used to install and configure them quickly.

Let's learn how to build a Kubernetes monitoring stack with the Elastic Stack. The test environment here is a Kubernetes v1.16.3 cluster (already set up). For easier management, we deploy all resource objects into a namespace called elastic:
```shell
$ kubectl create ns elastic
namespace/elastic created
```
First we deploy a sample application built with Spring Boot and MongoDB. Start with the MongoDB application; the corresponding resource manifest is shown below:
```yaml
# mongo.yml
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: elastic
  labels:
    app: mongo
spec:
  ports:
  - port: 27017
    protocol: TCP
  selector:
    app: mongo
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elastic
  name: mongo
  labels:
    app: mongo
spec:
  serviceName: "mongo"
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: data
          mountPath: /data/db
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rook-ceph-block  # use a StorageClass that supports RWO
      resources:
        requests:
          storage: 1Gi
```
Here we use a StorageClass named rook-ceph-block to provision the PV automatically; replace it with any StorageClass in your own cluster that supports RWO. The storage here follows a rook-ceph setup. Apply the manifest above directly:
```shell
$ kubectl apply -f mongo.yml
service/mongo created
statefulset.apps/mongo created
$ kubectl get pods -n elastic -l app=mongo
NAME      READY   STATUS    RESTARTS   AGE
mongo-0   1/1     Running   0          34m
```
Once the Pod reaches the Running state, MongoDB has been deployed successfully. Next, deploy the Spring Boot API application; here we expose it through a NodePort-type Service. The corresponding resource manifest is shown below:
```yaml
# spring-boot-simple.yml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  type: NodePort
  ports:
  - port: 8080
    protocol: TCP
  selector:
    app: spring-boot-simple
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spring-boot-simple
  template:
    metadata:
      labels:
        app: spring-boot-simple
    spec:
      containers:
      - image: cnych/spring-boot-simple:0.0.1-SNAPSHOT
        name: spring-boot-simple
        env:
        - name: SPRING_DATA_MONGODB_HOST  # MongoDB address
          value: mongo
        ports:
        - containerPort: 8080
```
Similarly, just apply the application manifest above:
```shell
$ kubectl apply -f spring-boot-simple.yml
service/spring-boot-simple created
deployment.apps/spring-boot-simple created
$ kubectl get pods -n elastic -l app=spring-boot-simple
NAME                                  READY   STATUS    RESTARTS   AGE
spring-boot-simple-64795494bf-hqpcj   1/1     Running   0          24m
$ kubectl get svc -n elastic -l app=spring-boot-simple
NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
spring-boot-simple   NodePort   10.109.55.134   <none>        8080:31847/TCP   84s
```
After the application is deployed, we can access it at http://<node-ip>:31847 and run a quick test with the following command:
```shell
$ curl -X GET http://k8s.qikqiak.com:31847/
Greetings from Spring Boot!
```
Send a POST request:
```shell
$ curl -X POST http://k8s.qikqiak.com:31847/message -d 'hello world'
{"id":"5ef55c130d53190001bf74d2","message":"hello+world=","postedAt":"2020-06-26T02:23:15.860+0000"}
```
Get all message data:
```shell
$ curl -X GET http://k8s.qikqiak.com:31847/message
[{"id":"5ef55c130d53190001bf74d2","message":"hello+world=","postedAt":"2020-06-26T02:23:15.860+0000"}]
```
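To double-check that the message really landed in MongoDB, we can query the database directly from inside the mongo-0 Pod. This is only a sketch: the database and collection names (test and message) are assumptions based on the Spring Data MongoDB defaults, and newer mongo images ship mongosh instead of the mongo shell, so adjust accordingly.

```shell
# Hypothetical check: database "test" and collection "message" are assumed defaults.
$ kubectl exec -n elastic mongo-0 -- mongo --quiet --eval 'db.getSiblingDB("test").message.find().pretty()'
```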
To build an Elastic monitoring stack, the first thing to deploy is ElasticSearch, the database that stores all of the metrics, logs, and traces. Here we build a cluster out of three scalable nodes with different roles.

The first node of the cluster is the master node, responsible for controlling the whole cluster. First create a ConfigMap object describing the cluster configuration; it configures the ElasticSearch master node for the cluster and enables security authentication. The corresponding resource manifest is shown below:
```yaml
# elasticsearch-master.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-master-config
  labels:
    app: elasticsearch
    role: master
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: true
      data: false
      ingest: false

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
```
Then create a Service object; for the master node we only need port 9300, which is used for inter-node (transport) communication. The resource manifest is shown below:
```yaml
# elasticsearch-master.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: master
---
```
Finally, define the master node application with a Deployment object; the resource manifest is shown below:
```yaml
# elasticsearch-master.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: elasticsearch-master
  labels:
    app: elasticsearch
    role: master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
      role: master
  template:
    metadata:
      labels:
        app: elasticsearch
        role: master
    spec:
      containers:
      - name: elasticsearch-master
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-master
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms512m -Xmx512m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-master-config
      - name: "storage"
        emptyDir:
          medium: ""
---
```
Just apply the three resource objects above:
```shell
$ kubectl apply -f elasticsearch-master.configmap.yaml \
                -f elasticsearch-master.service.yaml \
                -f elasticsearch-master.deployment.yaml
configmap/elasticsearch-master-config created
service/elasticsearch-master created
deployment.apps/elasticsearch-master created
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                   READY   STATUS    RESTARTS   AGE
elasticsearch-master-6f666cbbd-r9vtx   1/1     Running   0          111m
```
Once the Pod reaches the Running state, the master node has been installed successfully.

Now we need to install the data nodes of the cluster, which are responsible for hosting the data and executing queries. As with the master node, we use a ConfigMap object to configure the data node:
```yaml
# elasticsearch-data.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-data-config
  labels:
    app: elasticsearch
    role: data
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: false
      data: true
      ingest: false

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
```
As you can see, this is very similar to the master configuration above; the key difference is that node.data is set to true.

Likewise, the data node only needs port 9300 to communicate with the other nodes:
```yaml
# elasticsearch-data.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  ports:
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: data
---
```
Finally, create a StatefulSet controller. Since there may be multiple data nodes and each node holds different data that must be stored separately, we use volumeClaimTemplates to create a dedicated storage volume for each replica. The corresponding resource manifest is shown below:
```yaml
# elasticsearch-data.statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: elastic
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: "elasticsearch-data"
  selector:
    matchLabels:
      app: elasticsearch
      role: data
  template:
    metadata:
      labels:
        app: elasticsearch
        role: data
    spec:
      containers:
      - name: elasticsearch-data
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-data
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms1024m -Xmx1024m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: elasticsearch-data-persistent-storage
          mountPath: /data/db
      volumes:
      - name: config
        configMap:
          name: elasticsearch-data-config
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-data-persistent-storage
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rook-ceph-block
      resources:
        requests:
          storage: 50Gi
---
```
Just apply the resource objects above:
```shell
$ kubectl apply -f elasticsearch-data.configmap.yaml \
                -f elasticsearch-data.service.yaml \
                -f elasticsearch-data.statefulset.yaml
configmap/elasticsearch-data-config created
service/elasticsearch-data created
statefulset.apps/elasticsearch-data created
```
Once the Pod reaches the Running state, the data node has started successfully:
```shell
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                   READY   STATUS    RESTARTS   AGE
elasticsearch-data-0                   1/1     Running   0          90m
elasticsearch-master-6f666cbbd-r9vtx   1/1     Running   0          111m
```
Finally, install and configure the ElasticSearch client node, which is responsible for exposing an HTTP interface and passing queries on to the data nodes.

Again, a ConfigMap object configures the node:
```yaml
# elasticsearch-client.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: elasticsearch-client-config
  labels:
    app: elasticsearch
    role: client
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}

    network.host: 0.0.0.0

    node:
      master: false
      data: false
      ingest: true

    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
---
```
The client node needs to expose two ports: 9300 for communication with the other cluster nodes, and 9200 for the HTTP API. The corresponding Service object is shown below:
```yaml
# elasticsearch-client.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  ports:
  - port: 9200
    name: client
  - port: 9300
    name: transport
  selector:
    app: elasticsearch
    role: client
---
```
A Deployment object describes the client node:
```yaml
# elasticsearch-client.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: elasticsearch-client
  labels:
    app: elasticsearch
    role: client
spec:
  selector:
    matchLabels:
      app: elasticsearch
      role: client
  template:
    metadata:
      labels:
        app: elasticsearch
        role: client
    spec:
      containers:
      - name: elasticsearch-client
        image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-client
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms256m -Xmx256m"
        ports:
        - containerPort: 9200
          name: client
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
        - name: storage
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: elasticsearch-client-config
      - name: "storage"
        emptyDir:
          medium: ""
---
```
Similarly, apply the resource objects above to deploy the client node:
```shell
$ kubectl apply -f elasticsearch-client.configmap.yaml \
                -f elasticsearch-client.service.yaml \
                -f elasticsearch-client.deployment.yaml
configmap/elasticsearch-client-config created
service/elasticsearch-client created
deployment.apps/elasticsearch-client created
```
Once all of the nodes have been deployed successfully, the cluster installation is complete:
```shell
$ kubectl get pods -n elastic -l app=elasticsearch
NAME                                    READY   STATUS    RESTARTS   AGE
elasticsearch-client-788bffcc98-hh2s8   1/1     Running   0          83m
elasticsearch-data-0                    1/1     Running   0          91m
elasticsearch-master-6f666cbbd-r9vtx    1/1     Running   0          112m
```
You can watch the cluster status change with the following command:
```shell
$ kubectl logs -f -n elastic \
    $(kubectl get pods -n elastic | grep elasticsearch-master | sed -n 1p | awk '{print $1}') \
    | grep "Cluster health status changed from"
{"type": "server", "timestamp": "2020-06-26T03:31:21,353Z", "level": "INFO", "component": "o.e.c.r.a.AllocationService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-master", "message": "Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.monitoring-es-7-2020.06.26][0]]]).", "cluster.uuid": "SS_nyhNiTDSCE6gG7z-J4w", "node.id": "BdVScO9oQByBHR5rfw-KDA" }
```
Because we enabled the xpack security module to protect the cluster, we need to bootstrap the initial passwords. We can run the bin/elasticsearch-setup-passwords command inside the client node container to generate the default usernames and passwords:
```shell
$ kubectl exec $(kubectl get pods -n elastic | grep elasticsearch-client | sed -n 1p | awk '{print $1}') \
    -n elastic \
    -- bin/elasticsearch-setup-passwords auto -b
Changed password for user apm_system
PASSWORD apm_system = 3Lhx61s6woNLvoL5Bb7t
Changed password for user kibana_system
PASSWORD kibana_system = NpZv9Cvhq4roFCMzpja3
Changed password for user kibana
PASSWORD kibana = NpZv9Cvhq4roFCMzpja3
Changed password for user logstash_system
PASSWORD logstash_system = nNnGnwxu08xxbsiRGk2C
Changed password for user beats_system
PASSWORD beats_system = fen759y5qxyeJmqj6UPp
Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = mCP77zjCATGmbcTFFgOX
Changed password for user elastic
PASSWORD elastic = wmxhvsJFeti2dSjbQEAH
```
Note that the elastic username and password also need to be stored in a Kubernetes Secret object (it will be referenced later):
```shell
$ kubectl create secret generic elasticsearch-pw-elastic \
    -n elastic \
    --from-literal password=wmxhvsJFeti2dSjbQEAH
secret/elasticsearch-pw-elastic created
```
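Before moving on, it can be useful to confirm that the client node accepts authenticated requests and that the cluster is green. The sketch below is only a convenience check, not part of the original deployment: it port-forwards the elasticsearch-client Service locally and reads the password back out of the Secret we just created.

```shell
# Port-forward the HTTP port of the client node to localhost (run in a separate terminal).
$ kubectl port-forward -n elastic svc/elasticsearch-client 9200:9200

# Read the elastic password back from the Secret and query the cluster health endpoint.
$ ES_PW=$(kubectl get secret -n elastic elasticsearch-pw-elastic -o jsonpath='{.data.password}' | base64 -d)
$ curl -s -u elastic:${ES_PW} http://localhost:9200/_cluster/health?pretty
```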
With the ElasticSearch cluster installed, we can now deploy Kibana, the data visualization tool for ElasticSearch, which provides a variety of features for managing the ElasticSearch cluster and visualizing its data.

Again, we first use a ConfigMap object to provide the configuration file, which includes the ElasticSearch access settings (host, username, and password), all supplied via environment variables. The corresponding resource manifest is shown below:
```yaml
# kibana.configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: kibana-config
  labels:
    app: kibana
data:
  kibana.yml: |-
    server.host: 0.0.0.0

    elasticsearch:
      hosts: ${ELASTICSEARCH_HOSTS}
      username: ${ELASTICSEARCH_USER}
      password: ${ELASTICSEARCH_PASSWORD}
---
```
Then expose the Kibana service through a NodePort-type Service:
```yaml
# kibana.service.yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: kibana
  labels:
    app: kibana
spec:
  type: NodePort
  ports:
  - port: 5601
    name: webinterface
  selector:
    app: kibana
---
```
Finally, deploy the Kibana service with a Deployment. Since the password has to be provided through an environment variable, we reference the Secret object created above:
```yaml
# kibana.deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: kibana
  labels:
    app: kibana
spec:
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.8.0
        ports:
        - containerPort: 5601
          name: webinterface
        env:
        - name: ELASTICSEARCH_HOSTS
          value: "http://elasticsearch-client.elastic.svc.cluster.local:9200"
        - name: ELASTICSEARCH_USER
          value: "elastic"
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:  # reference the Secret created earlier so the password is injected as a variable
              name: elasticsearch-pw-elastic
              key: password
        volumeMounts:
        - name: config
          mountPath: /usr/share/kibana/config/kibana.yml
          readOnly: true
          subPath: kibana.yml
      volumes:
      - name: config
        configMap:
          name: kibana-config
---
```
Similarly, just apply the manifests above to deploy:
```shell
$ kubectl apply -f kibana.configmap.yaml \
                -f kibana.service.yaml \
                -f kibana.deployment.yaml
configmap/kibana-config created
service/kibana created
deployment.apps/kibana created
```
After the deployment succeeds, you can check the Pod logs to learn the status of Kibana:
```shell
$ kubectl logs -f -n elastic $(kubectl get pods -n elastic | grep kibana | sed -n 1p | awk '{print $1}') \
    | grep "Status changed from yellow to green"
{"type":"log","@timestamp":"2020-06-26T04:20:38Z","tags":["status","plugin:elasticsearch@7.8.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
```
Once the status turns green, we can access the Kibana service in a browser through the NodePort 30474:
```shell
$ kubectl get svc kibana -n elastic
NAME     TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
kibana   NodePort   10.101.121.31   <none>        5601:30474/TCP   8m18s
```
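If the NodePort is not reachable from your workstation (for example, the nodes sit behind a firewall), a port-forward is a quick alternative for opening the UI locally; the command below is just a convenience sketch, not part of the original setup:

```shell
# Forward the Kibana Service to localhost, then open http://localhost:5601 in a browser.
$ kubectl port-forward -n elastic svc/kibana 5601:5601
```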
As shown below, log in with the elastic user stored in the Secret object we created above and the generated password:

After a successful login you are redirected to the Kibana home page:

You can also create a new superuser of your own under Management → Stack Management → Create User:
Create the new user with its own username and password and assign it the superuser role:

Once created, you can log in to Kibana with this new user. Finally, you can check the health of the whole cluster on the Management → Stack Monitoring page:
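As an alternative to clicking through the UI, users can also be created against the ElasticSearch security API. This is only a sketch: the user name, password, and full name below are made-up examples, and it assumes the port-forward to the client node and the ES_PW variable from the earlier check are still in place.

```shell
# Hypothetical example: create an "admin2" superuser via the security API.
$ curl -s -u elastic:${ES_PW} -X POST http://localhost:9200/_security/user/admin2 \
    -H 'Content-Type: application/json' \
    -d '{"password": "ChangeMe123!", "roles": ["superuser"], "full_name": "Second Admin"}'
```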
At this point we have successfully installed ElasticSearch and Kibana; together they will store and visualize our application data (metrics, logs, and traces).
Now that the ElasticSearch cluster is installed and configured, we will use Metricbeat to monitor the Kubernetes cluster. Metricbeat is a lightweight agent running on the servers that periodically collects monitoring metrics from the host and its services. This is the first building block of our full-stack Kubernetes monitoring.

By default Metricbeat collects system metrics, but it also ships with a large number of modules for collecting service metrics, for example Nginx, Kafka, MySQL, Redis, and so on. The complete list of supported modules is available on the Elastic website: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-modules.html.

First we need to install kube-state-metrics, a component that listens to the Kubernetes API and exposes metrics about the state of every resource object.

Installing kube-state-metrics is straightforward; the installation manifests are available in its GitHub repository:
```shell
$ git clone https://github.com/kubernetes/kube-state-metrics.git
$ cd kube-state-metrics
# run the install command
$ kubectl apply -f examples/standard/
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics configured
clusterrole.rbac.authorization.k8s.io/kube-state-metrics configured
deployment.apps/kube-state-metrics configured
serviceaccount/kube-state-metrics configured
service/kube-state-metrics configured
$ kubectl get pods -n kube-system -l app.kubernetes.io/name=kube-state-metrics
NAME                                  READY   STATUS    RESTARTS   AGE
kube-state-metrics-6d7449fc78-mgf4f   1/1     Running   0          88s
```
Once the Pod reaches the Running state, the installation has succeeded.
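To confirm that kube-state-metrics is actually serving metrics before wiring Metricbeat to it, you can hit its /metrics endpoint directly. The port-forward below is only a convenience sketch and assumes the Service lives in kube-system on port 8080, as in the standard manifests:

```shell
# Forward the kube-state-metrics Service and sample a few Pod-state metrics.
$ kubectl port-forward -n kube-system svc/kube-state-metrics 8080:8080 &
$ curl -s http://localhost:8080/metrics | grep kube_pod_status_phase | head
```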
Since we need to monitor every node, we use a DaemonSet controller to install Metricbeat.

First, Metricbeat is configured with a ConfigMap, which is then mounted into the container at /etc/metricbeat.yml through a Volume. The configuration file contains the ElasticSearch address, username, and password, the Kibana settings, and the modules to enable together with the scrape period.
```yaml
# metricbeat.settings.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: metricbeat-config
  labels:
    app: metricbeat
data:
  metricbeat.yml: |-
    # module configuration
    metricbeat.modules:
    - module: system
      period: ${PERIOD}  # the scrape interval
      metricsets: ["cpu", "load", "memory", "network", "process", "process_summary", "core", "diskio", "socket"]
      processes: ['.*']
      process.include_top_n:
        by_cpu: 5      # top 5 processes by CPU
        by_memory: 5   # top 5 processes by memory

    - module: system
      period: ${PERIOD}
      metricsets: ["filesystem", "fsstat"]
      processors:
      - drop_event.when.regexp:  # exclude some system mount points from monitoring
          system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib)($|/)'

    - module: docker  # scrape docker metrics (containerd is not supported)
      period: ${PERIOD}
      hosts: ["unix:///var/run/docker.sock"]
      metricsets: ["container", "cpu", "diskio", "healthcheck", "info", "memory", "network"]

    - module: kubernetes  # scrape kubelet metrics
      period: ${PERIOD}
      node: ${NODE_NAME}
      hosts: ["https://${NODE_NAME}:10250"]  # connect to the kubelet metrics port; monitoring api-server/controller-manager etc. requires their ports as well
      metricsets: ["node", "system", "pod", "container", "volume"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"

    - module: kubernetes  # scrape kube-state-metrics data
      period: ${PERIOD}
      node: ${NODE_NAME}
      metricsets: ["state_node", "state_deployment", "state_replicaset", "state_pod", "state_container"]
      hosts: ["kube-state-metrics.kube-system.svc.cluster.local:8080"]

    # configure service-specific modules based on k8s labels (mongo here)
    metricbeat.autodiscover:
      providers:
      - type: kubernetes
        node: ${NODE_NAME}
        templates:
        - condition.equals:
            kubernetes.labels.app: mongo
          config:
          - module: mongodb
            period: ${PERIOD}
            hosts: ["mongo.elastic:27017"]
            metricsets: ["dbstats", "status", "collstats", "metrics", "replstatus"]

    # ElasticSearch connection settings
    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    # connect to Kibana
    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'

    # import the pre-built dashboards
    setup.dashboards.enabled: true

    # configure the indice lifecycle
    setup.ilm:
      policy_file: /etc/indice-lifecycle.json
---
```
An ElasticSearch indice lifecycle is a set of rules applied to your indices based on their size or age. For example, you can roll an indice over every day, or every time it exceeds 1GB, and configure different phases with their own rules. Because monitoring produces a large amount of data, easily tens of gigabytes a day, we can use the indice lifecycle to configure data retention and avoid storing everything forever, similar to what is done in Prometheus. In the file below, we configure the indice to roll over every day or whenever it exceeds 5GB, and to delete all indice files older than 10 days; keeping 10 days of monitoring data is more than enough here.
```yaml
# metricbeat.indice-lifecycle.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: metricbeat-indice-lifecycle
  labels:
    app: metricbeat
data:
  indice-lifecycle.json: |-
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_size": "5GB" ,
                "max_age": "1d"
              }
            }
          },
          "delete": {
            "min_age": "10d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
---
```
Next we can write the Metricbeat DaemonSet resource manifest, shown below:
```yaml
# metricbeat.daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: elastic
  name: metricbeat
  labels:
    app: metricbeat
spec:
  selector:
    matchLabels:
      app: metricbeat
  template:
    metadata:
      labels:
        app: metricbeat
    spec:
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:7.8.0
        args: [
          "-c", "/etc/metricbeat.yml",
          "-e",
          "-system.hostfs=/hostfs"
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:  # reference the Secret created earlier
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: PERIOD
          value: "10s"
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/metricbeat.yml
          readOnly: true
          subPath: metricbeat.yml
        - name: indice-lifecycle
          mountPath: /etc/indice-lifecycle.json
          readOnly: true
          subPath: indice-lifecycle.json
        - name: dockersock
          mountPath: /var/run/docker.sock
        - name: proc
          mountPath: /hostfs/proc
          readOnly: true
        - name: cgroup
          mountPath: /hostfs/sys/fs/cgroup
          readOnly: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: config
        configMap:
          defaultMode: 0600
          name: metricbeat-config
      - name: indice-lifecycle
        configMap:
          defaultMode: 0600
          name: metricbeat-indice-lifecycle
      - name: data
        hostPath:
          path: /var/lib/metricbeat-data
          type: DirectoryOrCreate
---
```
Note that both ConfigMaps above are mounted into the container. Because Metricbeat needs information from the host, we also mount some host paths into the container, namely the proc directory, the cgroup directory, and the docker.sock file.

Since Metricbeat has to read resource object information from the Kubernetes cluster, it needs the corresponding RBAC permissions. As these are cluster-scoped, we declare them with a ClusterRole:
```yaml
# metricbeat.permissions.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
- kind: ServiceAccount
  name: metricbeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: metricbeat
  labels:
    app: metricbeat
rules:
- apiGroups: [""]
  resources:
  - nodes
  - namespaces
  - events
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs: ["get", "list", "watch"]
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: elastic
  name: metricbeat
  labels:
    app: metricbeat
---
```
Just apply the resource objects above:
```shell
$ kubectl apply -f metricbeat.settings.configmap.yml \
                -f metricbeat.indice-lifecycle.configmap.yml \
                -f metricbeat.daemonset.yml \
                -f metricbeat.permissions.yml
configmap/metricbeat-config configured
configmap/metricbeat-indice-lifecycle configured
daemonset.extensions/metricbeat created
clusterrolebinding.rbac.authorization.k8s.io/metricbeat created
clusterrole.rbac.authorization.k8s.io/metricbeat created
serviceaccount/metricbeat created
$ kubectl get pods -n elastic -l app=metricbeat
NAME               READY   STATUS    RESTARTS   AGE
metricbeat-2gstq   1/1     Running   0          18m
metricbeat-99rdb   1/1     Running   0          18m
metricbeat-9bb27   1/1     Running   0          18m
metricbeat-cgbrg   1/1     Running   0          18m
metricbeat-l2csd   1/1     Running   0          18m
metricbeat-lsrgv   1/1     Running   0          18m
```
Once the Metricbeat Pods reach the Running state, we should be able to see the corresponding monitoring data in Kibana.
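Besides looking in Kibana, we can also ask ElasticSearch directly whether the lifecycle policy was loaded and whether metricbeat indices are receiving documents. This is a hedged verification sketch: it reuses the port-forward and the ES_PW variable from the earlier check, and the policy and index names simply follow Metricbeat's default naming.

```shell
# Check that the ILM policy shipped by Metricbeat exists and that its indices are growing.
$ curl -s -u elastic:${ES_PW} http://localhost:9200/_ilm/policy/metricbeat?pretty
$ curl -s -u elastic:${ES_PW} 'http://localhost:9200/_cat/indices/metricbeat-*?v'
```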
In Kibana, go to Observability → Metrics in the left-hand menu to open the metrics monitoring page; you should already see some monitoring data:

You can also filter the data to your needs, for example group by Kubernetes Namespace and use that as the view for the monitoring data:
Because we set setup.dashboards.enabled=true in the configuration file, Kibana imports a set of pre-built dashboards. From the left menu go to Kibana → Dashboard and you will see a list of roughly 50 Metricbeat dashboards, which you can filter as needed. For example, to inspect the cluster nodes, open the [Metricbeat Kubernetes] Overview ECS dashboard:

Since we also enabled the mongodb module, the [Metricbeat MongoDB] Overview ECS dashboard shows its monitoring data:

And because the docker module is enabled as well, the [Metricbeat Docker] Overview ECS dashboard shows the Docker monitoring data:

That completes monitoring the Kubernetes cluster with Metricbeat; next we will learn how to use Filebeat to collect logs from the Kubernetes cluster.
We will now install and configure Filebeat to collect log data from the Kubernetes cluster and ship it to ElasticSearch. Filebeat is a lightweight log collection agent that can also be configured with specific modules to parse and visualize the log formats of applications such as databases, Nginx, and so on.

Similar to Metricbeat, Filebeat needs a configuration file that defines the ElasticSearch connection information, the Kibana connection, and the way logs are collected and parsed.

The ConfigMap shown below contains the log collection configuration we use here (the complete list of configuration options is available on the official website):
```yaml
# filebeat.settings.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: filebeat-config
  labels:
    app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
      - /var/log/containers/*.log
      processors:
      - add_kubernetes_metadata:
          in_cluster: true
          host: ${NODE_NAME}
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"

    filebeat.autodiscover:
      providers:
      - type: kubernetes
        templates:
        - condition.equals:
            kubernetes.labels.app: mongo
          config:
          - module: mongodb
            enabled: true
            log:
              input:
                type: docker
                containers.ids:
                - ${data.kubernetes.container.id}

    processors:
    - drop_event:
        when.or:
        - and:
          - regexp:
              message: '^\d+\.\d+\.\d+\.\d+ '
          - equals:
              fileset.name: error
        - and:
          - not:
              regexp:
                message: '^\d+\.\d+\.\d+\.\d+ '
          - equals:
              fileset.name: access
    - add_cloud_metadata:
    - add_kubernetes_metadata:
        matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
    - add_docker_metadata:

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'

    setup.dashboards.enabled: true
    setup.template.enabled: true

    setup.ilm:
      policy_file: /etc/indice-lifecycle.json
---
```
We collect all the logs under /var/log/containers/, use the inCluster mode to access the Kubernetes APIServer and enrich the log events with metadata, and send the logs directly to Elasticsearch.

In addition, the indice retention policy is again defined through policy_file:
```yaml
# filebeat.indice-lifecycle.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: filebeat-indice-lifecycle
  labels:
    app: filebeat
data:
  indice-lifecycle.json: |-
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_size": "5GB" ,
                "max_age": "1d"
              }
            }
          },
          "delete": {
            "min_age": "30d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }
---
```
To collect the log data on every node, we again use a DaemonSet controller together with the configuration above.
```yaml
# filebeat.daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: elastic
  name: filebeat
  labels:
    app: filebeat
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.8.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: filebeat-indice-lifecycle
          mountPath: /etc/indice-lifecycle.json
          readOnly: true
          subPath: indice-lifecycle.json
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: dockersock
          mountPath: /var/run/docker.sock
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: filebeat-config
      - name: filebeat-indice-lifecycle
        configMap:
          defaultMode: 0600
          name: filebeat-indice-lifecycle
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: data
        hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
```
Our cluster here was built with kubeadm, where the master nodes are tainted by default, so if you also want to collect logs from the master nodes you must add the corresponding tolerations; we do not collect them here, so no tolerations are added. In addition, because Filebeat needs the Kubernetes metadata of each log entry, such as the Pod name and namespace, it must access the APIServer and therefore needs the matching RBAC permissions, which we declare as well:
```yaml
# filebeat.permission.yml
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    app: filebeat
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - watch
  - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: elastic
  name: filebeat
  labels:
    app: filebeat
---
```
Then install and deploy the few resource objects above:
```shell
$ kubectl apply -f filebeat.settings.configmap.yml \
                -f filebeat.indice-lifecycle.configmap.yml \
                -f filebeat.daemonset.yml \
                -f filebeat.permissions.yml
configmap/filebeat-config created
configmap/filebeat-indice-lifecycle created
daemonset.apps/filebeat created
clusterrolebinding.rbac.authorization.k8s.io/filebeat created
clusterrole.rbac.authorization.k8s.io/filebeat created
serviceaccount/filebeat created
```
When all of the Filebeat Pods are in the Running state, the deployment is complete. Now we can go to Kibana to view the logs, via the left menu Observability → Logs.
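If the Logs page stays empty, it can help to check from the ElasticSearch side whether filebeat indices are being written at all. As before, this is only a hedged convenience check that reuses the local port-forward and the ES_PW variable; the filebeat-* index pattern is Filebeat's default naming.

```shell
# Confirm that filebeat indices exist and that their document counts are increasing.
$ curl -s -u elastic:${ES_PW} 'http://localhost:9200/_cat/indices/filebeat-*?v'
```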
You can also reach a Pod's logs from the Metrics page mentioned in the previous section:

Click Kubernetes Pod logs to open the logs of the Pod you want to inspect:
If the volume of logs to collect in the cluster is very large, sending the data straight to ElasticSearch puts a lot of pressure on ES. In that case it is common to add a buffer such as Kafka in between, or to have Logstash collect the logs from Filebeat first.

This completes collecting the Kubernetes cluster logs with Filebeat; next we continue with tracing applications in the Kubernetes cluster using Elastic APM.
Elastic APM is the application performance monitoring tool of the Elastic Stack. It lets us monitor application performance in real time by collecting incoming requests, database queries, cache calls, and so on, which makes it much easier and faster to pinpoint performance problems.

Elastic APM is OpenTracing compatible, so we can use a large number of existing libraries to trace application performance.

For example, we can follow a request through a distributed environment (a microservices architecture) and easily find potential performance bottlenecks.

Elastic APM works through a component called APM-Server, which receives trace data from the agents running alongside the applications and forwards it to ElasticSearch.

First we need to install APM-Server on the Kubernetes cluster to collect the agents' trace data and forward it to ElasticSearch; once again, a ConfigMap provides the configuration:
```yaml
# apm.configmap.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: elastic
  name: apm-server-config
  labels:
    app: apm-server
data:
  apm-server.yml: |-
    apm-server:
      host: "0.0.0.0:8200"

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}

    setup.kibana:
      host: '${KIBANA_HOST:kibana}:${KIBANA_PORT:5601}'
---
```
APM-Server needs to expose port 8200 so that the agents can forward their trace data; create a matching Service object:
```yaml
# apm.service.yml
---
apiVersion: v1
kind: Service
metadata:
  namespace: elastic
  name: apm-server
  labels:
    app: apm-server
spec:
  ports:
  - port: 8200
    name: apm-server
  selector:
    app: apm-server
---
```
Then manage it with a Deployment resource object:
```yaml
# apm.deployment.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: apm-server
  labels:
    app: apm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: apm-server
  template:
    metadata:
      labels:
        app: apm-server
    spec:
      containers:
      - name: apm-server
        image: docker.elastic.co/apm/apm-server:7.8.0
        env:
        - name: ELASTICSEARCH_HOST
          value: elasticsearch-client.elastic.svc.cluster.local
        - name: ELASTICSEARCH_PORT
          value: "9200"
        - name: ELASTICSEARCH_USERNAME
          value: elastic
        - name: ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: elasticsearch-pw-elastic
              key: password
        - name: KIBANA_HOST
          value: kibana.elastic.svc.cluster.local
        - name: KIBANA_PORT
          value: "5601"
        ports:
        - containerPort: 8200
          name: apm-server
        volumeMounts:
        - name: config
          mountPath: /usr/share/apm-server/apm-server.yml
          readOnly: true
          subPath: apm-server.yml
      volumes:
      - name: config
        configMap:
          name: apm-server-config
---
```
Deploy the few resource objects above directly:
```shell
$ kubectl apply -f apm.configmap.yml \
                -f apm.service.yml \
                -f apm.deployment.yml
configmap/apm-server-config created
service/apm-server created
deployment.extensions/apm-server created
```
When the Pod is in the Running state, APM-Server is up:
```shell
$ kubectl get pods -n elastic -l app=apm-server
NAME                          READY   STATUS    RESTARTS   AGE
apm-server-667bfc5cff-zj8nq   1/1     Running   0          12m
```
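Before wiring any agents to it, you can quickly confirm that APM-Server answers on port 8200. The port-forward below is a hedged convenience check, assuming the default setup where the root endpoint returns a small JSON document with build information:

```shell
# Forward the apm-server Service and hit its root endpoint.
$ kubectl port-forward -n elastic svc/apm-server 8200:8200 &
$ curl -s http://localhost:8200/
```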
Next we can attach an agent to the Spring Boot application deployed in the first section.

We will configure an Elastic APM Java agent for the spring-boot-simple sample application. First, the elastic-apm-agent-1.8.0.jar needs to be baked into the application container; add the following line to the Dockerfile used to build the image to download the JAR:
```dockerfile
RUN wget -O /apm-agent.jar https://search.maven.org/remotecontent?filepath=co/elastic/apm/elastic-apm-agent/1.8.0/elastic-apm-agent-1.8.0.jar
```
The complete Dockerfile is shown below:
```dockerfile
FROM openjdk:8-jdk-alpine

ENV ELASTIC_APM_VERSION "1.8.0"
RUN wget -O /apm-agent.jar https://search.maven.org/remotecontent?filepath=co/elastic/apm/elastic-apm-agent/$ELASTIC_APM_VERSION/elastic-apm-agent-$ELASTIC_APM_VERSION.jar

COPY target/spring-boot-simple.jar /app.jar

CMD java -jar /app.jar
```
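If you rebuild the image yourself instead of reusing cnych/spring-boot-simple:0.0.1-SNAPSHOT, the usual build-and-push flow looks like the sketch below; the registry name is a placeholder and assumes a registry the cluster can pull from:

```shell
# Hypothetical registry/repository name; replace with your own.
$ docker build -t <your-registry>/spring-boot-simple:0.0.1-SNAPSHOT .
$ docker push <your-registry>/spring-boot-simple:0.0.1-SNAPSHOT
```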
Then add the following dependencies to the sample application, so that we can integrate the open-tracing libraries or instrument manually with the Elastic APM API:
```xml
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>apm-agent-api</artifactId>
    <version>${elastic-apm.version}</version>
</dependency>
<dependency>
    <groupId>co.elastic.apm</groupId>
    <artifactId>apm-opentracing</artifactId>
    <version>${elastic-apm.version}</version>
</dependency>
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-cloud-mongo-starter</artifactId>
    <version>${opentracing-spring-cloud.version}</version>
</dependency>
```
Then modify the Spring Boot Deployment from the first part so that the Java agent is enabled and connects to APM-Server:
```yaml
# spring-boot-simple.deployment.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: elastic
  name: spring-boot-simple
  labels:
    app: spring-boot-simple
spec:
  selector:
    matchLabels:
      app: spring-boot-simple
  template:
    metadata:
      labels:
        app: spring-boot-simple
    spec:
      containers:
      - image: cnych/spring-boot-simple:0.0.1-SNAPSHOT
        imagePullPolicy: Always
        name: spring-boot-simple
        command:
        - "java"
        - "-javaagent:/apm-agent.jar"
        - "-Delastic.apm.active=$(ELASTIC_APM_ACTIVE)"
        - "-Delastic.apm.server_urls=$(ELASTIC_APM_SERVER)"
        - "-Delastic.apm.service_name=spring-boot-simple"
        - "-jar"
        - "app.jar"
        env:
        - name: SPRING_DATA_MONGODB_HOST
          value: mongo
        - name: ELASTIC_APM_ACTIVE
          value: "true"
        - name: ELASTIC_APM_SERVER
          value: http://apm-server.elastic.svc.cluster.local:8200
        ports:
        - containerPort: 8080
---
```
Then redeploy the sample application:
```shell
$ kubectl apply -f spring-boot-simple.yml
$ kubectl get pods -n elastic -l app=spring-boot-simple
NAME                                 READY   STATUS    RESTARTS   AGE
spring-boot-simple-fb5564885-tf68d   1/1     Running   0          5m11s
$ kubectl get svc -n elastic -l app=spring-boot-simple
NAME                 TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
spring-boot-simple   NodePort   10.109.55.134   <none>        8080:31847/TCP   9d
```
After the sample application has been redeployed, run the following requests.

get messages — retrieve all posted messages:
```shell
$ curl -X GET http://k8s.qikqiak.com:31847/message
```
get messages (slow request) — use sleep=<ms> to simulate a slow request:
```shell
$ curl -X GET http://k8s.qikqiak.com:31847/message?sleep=3000
```
get messages (error) — use error=true to trigger an exception:
```shell
$ curl -X GET http://k8s.qikqiak.com:31847/message?error=true
```
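A single request of each kind only produces a handful of traces; to make the APM charts in Kibana more interesting it can help to generate a small, mixed load. The loop below is just a hedged sketch around the three endpoints shown above:

```shell
# Generate a mixed load: mostly normal requests, plus occasional slow and failing ones.
$ for i in $(seq 1 50); do
    curl -s http://k8s.qikqiak.com:31847/message > /dev/null
    if [ $((i % 10)) -eq 0 ]; then
      curl -s "http://k8s.qikqiak.com:31847/message?sleep=3000" > /dev/null
      curl -s "http://k8s.qikqiak.com:31847/message?error=true" > /dev/null
    fi
  done
```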
Now go to the APM page in Kibana; you should see the data for the spring-boot-simple application.

Click the application to see its various performance trace data:

You can inspect the current error data:

And also view the JVM monitoring data:

On top of that, you can add alerts so that you learn about the application's performance state as early as possible.

This concludes full-stack monitoring of a Kubernetes environment with the Elastic Stack: metrics, logs, and performance traces together show how our applications behave in every respect and speed up troubleshooting and problem solving.
A note on Kibana reading the password from the Secret: when logging into the Kibana container and inspecting the password variable, the value showed up garbled; so far this has only been observed for the variable injected into the Kibana container. The workaround is to set the container variable to the password value directly.

When ES auto-creates indices from the index template, the default template generates too many tags/fields; you can reduce the indices being created by modifying the index template.