A Pitfall Guide to Building a Unified Logging Platform on Kubernetes with Fluentd + Elasticsearch + Kibana

 

After standing up the initial Kubernetes cluster and deploying a few monitoring components, we can already:

  • monitor the status and resource usage of every node and pod graphically
  • scale a ReplicaSet out and in with kubectl scale
  • view each Pod's log with kubectl logs or the dashboard

However, in a distributed architecture the number of nodes is often large — a typical production environment may have tens or hundreds of minion nodes — so a centralized log collection and management system is needed. My first idea was to mount a volume to shared storage so that the WebLogic logs would be written there, but this approach has several problems:

  • We scale WebLogic in Docker using a single-domain model, which means every container writes its log to the same path and file name (/u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log). Mapping that path to shared storage via a volumeMount would therefore cause file conflicts.
  • Pod and container metadata cannot be captured.
  • Logs from the other nodes in the cluster are not visible.

So a platform-level solution is still needed. In the official Kubernetes documentation, https://kubernetes.io/docs/concepts/cluster-administration/logging/

Kubernetes describes several logging approaches and gives a reference architecture for cluster-level logging:

In other words, the container processes inside our Pods stream their logs to the minion host; a logging-agent pod running on the same host picks them up and ships them to a backend. The backend can be implemented in different ways, for example Elasticsearch for log aggregation plus Kibana for visualization.

 

Kubernetes recommends this node-level logging-agent approach and ships two options: Stackdriver Logging for Google Cloud Platform, and Elasticsearch. Both use fluentd as the agent running on each node.

Using a node-level logging agent is the most common and encouraged approach for a Kubernetes cluster, because it creates only one agent per node, and it doesn’t require any changes to the applications running on the node. However, node-level logging only works for applications’ standard output and standard error.

Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node.

With that out of the way, let's start the pitfall guide.

1. Preparation

  • First, the environment

OS: CentOS 7.3

Kubernetes version: 1.5

[root@k8s-master fluentd-elasticsearch]# kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

  • Clone the Kubernetes source from GitHub onto the master node:
git clone https://github.com/kubernetes/kubernetes
  • Configure a ServiceAccount. The fluentd image we will pull later connects to the API Server over SSL, so unless you plan to modify and rebuild the image, the ServiceAccount has to be in place. Configuration guide:
http://www.cnblogs.com/ericnie/p/6894688.html
  • Configure DNS. Kibana locates the elasticsearch-logging Service through DNS; without DNS you would have to hard-code the Service's cluster IP in kibana-controller.yaml. Configuration guide:
http://www.cnblogs.com/ericnie/p/6897142.html

 

  • Pull the images

Go to /root/kubernetes/cluster/addons/fluentd-elasticsearch, where all the YAML files live.

fluentd-es-ds.yaml builds the fluentd DaemonSet that runs on every node and plays the logging-agent role; es-controller.yaml and es-service.yaml build the Elasticsearch logging backend that aggregates the logs; kibana-controller.yaml and kibana-service.yaml provide the UI.

Pull the images referenced in these controller YAML files onto every minion node:

docker pull gcr.io/google_containers/elasticsearch:v2.4.1-2
docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22
docker pull gcr.io/google_containers/kibana:v4.6.1-1
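
A quick sanity check that the images actually landed on a node is just a grep over docker images (nothing more than a sketch):

docker images | grep -E 'elasticsearch|fluentd-elasticsearch|kibana'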

 

2. Start the fluentd DaemonSet

  • Label the nodes (pit #1)

Fluentd is supposed to run on every minion node. Create the DaemonSet with:

# kubectl create -f fluentd-es-ds.yaml 
daemonset "fluentd-es-v1.22" created

Then I went to a minion node to check with tail -f /var/log/fluentd.log — and there was no fluentd.log file on the minion at all!

A quick

kubectl get pods -n kube-system

showed that no fluentd-related pod was running or even pending! :(

Running

kubectl get -f fluentd-es-ds.yaml
NAME               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR                              AGE
fluentd-es-v1.22   0         0         0         0            0           beta.kubernetes.io/fluentd-ds-ready=true   2m

revealed the cause: the DaemonSet carries a NODE-SELECTOR of beta.kubernetes.io/fluentd-ds-ready=true.

 

Then running

kubectl describe nodes k8s-node-1

showed that my minion node does not carry this label at all. Add it with:

kubectl label node k8s-node-1 beta.kubernetes.io/fluentd-ds-ready=true

After recreating the DaemonSet, /var/log/fluentd.log finally showed up on k8s-node-1.
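
With more than a handful of minions, labeling each node by hand gets tedious. A minimal sketch that labels every node in the cluster (assuming fluentd should run everywhere):

for n in $(kubectl get nodes -o name); do
    kubectl label "$n" beta.kubernetes.io/fluentd-ds-ready=true --overwrite
done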

  • Create a ConfigMap (pit #2)
# tail -f /var/log/fluentd.log

2017-03-02 02:27:01 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2017-03-02 02:27:01 +0000 [info]: starting fluentd-0.12.31
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.5.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kafka' version '0.4.1'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.16'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-s3' version '0.8.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-03-02 02:27:01 +0000 [info]: adding match pattern="fluent.**" type="null"
2017-03-02 02:27:01 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-03-02 02:27:02 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Invalid Kubernetes API v1 endpoint https://192.168.0.105:443/api: 401 Unauthorized"
2017-03-02 02:27:02 +0000 [info]: process finished code=256
2017-03-02 02:27:02 +0000 [warn]: process died within 1 second. exit.

The fluentd image connects to my API Server on port 443, and since the API Server has its security mechanisms enabled, fluentd needs ca_file, client_cert and client_key configured. If you don't want to rebuild the image, Kubernetes offers the powerful ConfigMap: we can turn a revised td-agent.conf into a ConfigMap resource and mount it into the fluentd pod at the right place, replacing the default td-agent.conf baked into the image.

td-agent.conf lives in

/root/kubernetes/cluster/addons/fluentd-elasticsearch/fluentd-es-image

After adding the CA and client settings it looks like this:

// td-agent.conf
... ...
<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>
... ...

Two things to note:

  • Before creating the ConfigMap from td-agent.conf, delete all comment lines from the file, and keep a backup copy first (it came in handy later) — see the sketch below.
  • The fluentd pods are created in kube-system, so the ConfigMap must also be created in the kube-system namespace; otherwise the DaemonSet cannot find the corresponding ConfigMap.
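Stripping the comment lines can be done with sed, for example (just a sketch — keep the backup, as suggested above):

cp td-agent.conf td-agent.conf.bak     # the backup proves useful later on
sed -i '/^\s*#/d' td-agent.conf        # drop full-line comments before building the ConfigMap
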
# kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system
configmap "td-agent-config" created

# kubectl get configmaps td-agent-config -o yaml
apiVersion: v1
data:
  td-agent.conf: |
    <match fluent.**>
      type null
    </match>

    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
... ...

fluentd-es-ds.yaml needs corresponding changes, mainly two extra mounts:

one mounts the td-agent-config ConfigMap created above,

the other mounts the hostPath /srv/kubernetes so the pod can read the client certificates:

[root@k8s-master fluentd-elasticsearch]# cat fluentd-es-ds.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-es-v1.22
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.22
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: fluentd-es
        image: gcr.io/google_containers/fluentd-elasticsearch:1.22
        command:
          - '/bin/sh'
          - '-c'
          - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: td-agent-config
          mountPath: /etc/td-agent
        - name: tls-files
          mountPath: /srv/kubernetes
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: td-agent-config
        configMap:
          name: td-agent-config
      - name: tls-files
        hostPath:
          path: /srv/kubernetes
[root@k8s-master fluentd-elasticsearch]# 
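
To apply these changes the DaemonSet has to be deleted and recreated; a sketch using the same file name and the k8s-app label from the manifest:

kubectl delete -f fluentd-es-ds.yaml
kubectl create -f fluentd-es-ds.yaml
kubectl get pods -n kube-system -l k8s-app=fluentd-es -o wide    # expect one pod per labeled node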

 

After recreating fluentd-es-ds.yaml, check /var/log/fluentd.log on the minion:

......
    client_cert /srv/kubernetes/kubecfg.crt
    client_key /srv/kubernetes/kubecfg.key
  </filter>
  <match **>
    type elasticsearch
    log_level info
    include_tag_key true
    host elasticsearch-logging
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 32
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit 
    num_threads 8
  </match>
</ROOT>

 

Seeing this output basically counts as success — it looks fine, but there is actually another pit hiding here. For now, let's continue and configure Elasticsearch logging.

 

3. Configure Elasticsearch

Create the Elasticsearch controller and service:

# kubectl create -f es-controller.yaml
replicationcontroller "elasticsearch-logging-v1" created

# kubectl create -f es-service.yaml
service "elasticsearch-logging" created

get pods:

kube-system                  elasticsearch-logging-v1-3bzt6          1/1       Running    0          7s        172.16.57.8    10.46.181.146
kube-system                  elasticsearch-logging-v1-nvbe1          1/1       Running    0          7s        172.16.99.10   10.47.136.60

 

Check the logs:

# kubectl logs -f elasticsearch-logging-v1-3bzt6 -n kube-system
F0302 03:59:41.036697       8 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: the server has asked for the client to provide credentials (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x19a8100, 0xc400000000, 0xc2, 0x186)
... ...
main.main()
    elasticsearch_logging_discovery.go:60 +0xb53

[2017-03-02 03:59:42,587][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] version[2.4.1], pid[16], build[c67dc32/2016-09-27T18:57:55Z]
[2017-03-02 03:59:42,588][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initializing ...
[2017-03-02 03:59:44,396][INFO ][plugins                  ] [elasticsearch-logging-v1-3bzt6] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
... ...
[2017-03-02 03:59:44,441][INFO ][env                      ] [elasticsearch-logging-v1-3bzt6] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initialized
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] starting ...
[2017-03-02 03:59:48,507][INFO ][transport                ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9300}, bound_addresses {[::]:9300}
[2017-03-02 03:59:48,547][INFO ][discovery                ] [elasticsearch-logging-v1-3bzt6] kubernetes-logging/7_f_M2TKRZWOw4NhBc4EqA
[2017-03-02 04:00:18,552][WARN ][discovery                ] [elasticsearch-logging-v1-3bzt6] waited for 30s and no initial state was set by the discovery
[2017-03-02 04:00:18,562][INFO ][http                     ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 04:00:18,562][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] started

There is an error: the pod cannot provide valid credentials. Following Tony Bai's write-up, this turns out to be a problem with the default ServiceAccount — the underlying mechanism still deserves a closer look.

Let's get it running first. The workaround:

Create a new ServiceAccount in the kube-system namespace:

# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-efk

# kubectl create -f serviceaccount.yaml -n kube-system
serviceaccount "k8s-efk" created

# kubectl get serviceaccount -n kube-system
NAME      SECRETS   AGE
default   1         139d
k8s-efk   1         17s

 

Modify es-controller.yaml to use the service account "k8s-efk":

[root@k8s-master fluentd-elasticsearch]# cat es-controller.yaml 
apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: k8s-efk
      containers:
      - image: gcr.io/google_containers/elasticsearch:v2.4.1-2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}
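
The running controller does not pick up the new ServiceAccount by itself, so delete and recreate it — a sketch with the same file names:

kubectl delete -f es-controller.yaml
kubectl create -f es-controller.yaml
kubectl get pods -n kube-system -l k8s-app=elasticsearch-logging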

 

After recreating the elasticsearch-logging service, look at the elasticsearch-logging pod's log again. It looks OK — but this too hides a pit, which I will come back to shortly:

[2017-05-22 06:09:12,155][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] version[2.4.1], pid[1], build[c67dc32/2016-09-27T18:57:55Z]
[2017-05-22 06:09:12,156][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initializing ...
[2017-05-22 06:09:13,657][INFO ][plugins                  ] [elasticsearch-logging-v1-9jjf1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-22 06:09:13,733][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] using [1] data paths, mounts [[/data (/dev/mapper/cl-root)]], net usable_space [25gb], net total_space [37.2gb], spins? [possibly], types [xfs]
[2017-05-22 06:09:13,738][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] heap size [1015.6mb], compressed ordinary object pointers [true]
[2017-05-22 06:09:21,946][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initialized
[2017-05-22 06:09:21,980][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] starting ...
[2017-05-22 06:09:22,442][INFO ][transport                ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9300}, bound_addresses {[::]:9300}
[2017-05-22 06:09:22,560][INFO ][discovery                ] [elasticsearch-logging-v1-9jjf1] kubernetes-logging/RY_IOcwSSSeuJNtC2E0W7A
[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

 

OK, moving on...

 

4. Configure Kibana

 

Following others' experience, explicitly assign the newly created ServiceAccount k8s-efk in kibana-controller.yaml as well:

[root@k8s-master fluentd-elasticsearch]# cat kibana-controller.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      serviceAccount: k8s-efk
      containers:
      - name: kibana-logging
        image: gcr.io/google_containers/kibana:v4.6.1-1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
          - name: "ELASTICSEARCH_URL"
            value: "http://elasticsearch-logging:9200"
          - name: "KIBANA_BASE_URL"
            value: "/api/v1/proxy/namespaces/kube-system/services/kibana-logging"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
[root@k8s-master fluentd-elasticsearch]# 

Start Kibana and watch the pod log:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}

Kibana takes a good ten-plus minutes to start — apologies, I am running this in VirtualBox VMs on an 8 GB laptop. Eventually you will see logs like the following:

# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["info","optimize"],"pid":6,"message":"Optimization of bundles for kibana and statusPage complete in 588.60 seconds"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["status","plugin:kibana@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:kbn_vislib_vis_types@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:markdown_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:metric_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:spyModes@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:statusPage@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:table_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["listening","info"],"pid":6,"message":"Server running at http://0.0.0.0:5601"}
{"type":"log","@timestamp":"2017-03-02T08:40:11Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2017-03-02T08:40:14Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Kibana index ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}

Things to note (these are pits too):

  • DNS must be configured; otherwise Kibana will report that http://elasticsearch-logging:9200 cannot be reached (see the quick check right after this list).
  • If you really don't want DNS, the only alternative is to edit the controller file and replace elasticsearch-logging with the concrete cluster IP of the elasticsearch-logging Service.
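
A quick way to confirm in-cluster DNS resolution is a throwaway busybox pod (a sketch, assuming kube-dns is up and a busybox image is available on the node):

kubectl run -i --tty dns-test --image=busybox --restart=Never -- nslookup elasticsearch-logging.kube-system
kubectl delete pod dns-test    # clean up afterwards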

Running

kubectl cluster-info

gives you the Kibana service address, which is essentially:

https://{API Server external IP}:{API Server secure port}/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana#/settings/indices/
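
Before fighting with the UI, it is worth checking through the API Server proxy whether Elasticsearch has received any logstash-* indices at all — a sketch (host and port depend on how your API Server is exposed; compare the cluster-info output further down):

curl http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_cat/indices?v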

On the screen below, however, no matter what I tried, the Create button never appeared and no index pattern could be added — except by typing * directly, which could be created but showed no pod information at all. Big problem!!!!

 

 

 

5. Tracking down the problem

I carefully compared my setup against Tony Bai's document and read countless posts from others — I even considered starting over on CentOS 6.5, but I had not managed to get a Kubernetes cluster running on CentOS 6.5 anyway, so that was out.

Comparing the logs, the problem most likely lies in one of these areas:

  • no logs are being produced at all
  • the fluentd service is not working properly
  • elasticsearch-logging is failing to collect the logs

 

  • Change the elasticsearch-logging image (what a pit)

A closer look at the fluentd log /var/log/fluentd.log showed that no log records were being emitted at all, which rules out a mere connection failure from fluentd to elasticsearch-logging:9200.

感受是elasticseach logging的本身的問題,而後對比tonybai的elasticsearch的日誌,發現個人只有

[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started

whereas Tony Bai's continues:

[2017-03-02 08:26:56,955][INFO ][http                     ] [elasticsearch-logging-v1-dklui] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 08:26:56,956][INFO ][node                     ] [elasticsearch-logging-v1-dklui] started
[2017-03-02 08:26:57,157][INFO ][gateway                  ] [elasticsearch-logging-v1-dklui] recovered [0] indices into cluster_state
[2017-03-02 08:27:05,378][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:06,360][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:07,163][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-2017.03.01][3], [logstash-2017.03.01][3]] ...]).
[2017-03-02 08:27:07,354][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] create_mapping [fluentd]
[2017-03-02 08:27:07,988][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] create_mapping [fluentd]
[2017-03-02 08:27:09,578][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2017.03.02][4]] ...]).

The differences:

  • messages about recovering indices into the cluster
  • messages about creating logstash-* indices

感受頗有多是image問題,所以把原來的image修改爲和tonybai保持一致,把es-control.yaml中官方的v2.4.1-2修改成

bigwhite/elasticsearch:v2.4.1-1

After restarting, the cluster-recovery messages did appear — but still no logstash index messages.

 

  • Track down where the logs go (another big pit)

This circles back to the first possibility — yet kubectl logs clearly shows screenfuls of output:

[root@k8s-master fluentd-elasticsearch]# kubectl logs helloworld-service-4d72j
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
<May 24, 2017 2:27:39 AM GMT> <Info> <Security> <BEA-090905> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.> 
<May 24, 2017 2:27:41 AM GMT> <Info> <Security> <BEA-090906> <Changing the default Random Number Generator in RSA CryptoJ from ECDRBG128 to FIPS186PRNG. To disable this change, specify -Dweblogic.security.allowCryptoJDefaultPRNG=true.> 
<May 24, 2017 2:27:44 AM GMT> <Info> <WebLogicServer> <BEA-000377> <Starting WebLogic Server with Java HotSpot(TM) 64-Bit Server VM Version 25.101-b13 from Oracle Corporation.> 
<May 24, 2017 2:27:47 AM GMT> <Info> <Management> <BEA-141107> <Version: WebLogic Server 12.1.3.0.0  Wed May 21 18:53:34 PDT 2014 1604337 > 
<May 24, 2017 2:27:59 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.> 
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool.> 
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002942> <CMM memory level becomes 0. Setting standby thread pool size to 256.> 
<May 24, 2017 2:28:02 AM GMT> <Notice> <Log Management> <BEA-170019> <The server log file /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log is opened. All server side log events will be written to this file.> 
<May 24, 2017 2:28:18 AM GMT> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.> 
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STANDBY.> 
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.> 
May 24, 2017 2:28:31 AM weblogic.wsee.WseeCoreMessages logWseeServiceStarting
INFO: The Wsee Service is starting
<May 24, 2017 2:28:34 AM GMT> <Warning> <Munger> <BEA-2156203> <A version attribute was not found in element "web-app" in the deployment descriptor /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/upload/HelloWorld.war/WEB-INF/web.xml. A version attribute is required, but this version of the WebLogic Server will assume that the latest version is used. Future versions of WebLogic Server will reject descriptors that do not specify the Java EE version. To eliminate this warning, add an appropriate "version=" to element "web-app" in the deployment descriptor.> 
<May 24, 2017 2:28:39 AM GMT> <Notice> <Log Management> <BEA-170027> <The server has successfully established a connection with the Domain level Diagnostic Service.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to ADMIN.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RESUMING.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[3]" is now listening on 127.0.0.1:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default" is now listening on 192.168.53.3:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[2]" is now listening on 0:0:0:0:0:0:0:1:7001 for protocols iiop, t3, ldap, snmp, http.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000329> <Started the WebLogic Server Administration Server "AdminServer" for domain "base_domain" running in production mode.> 
<May 24, 2017 2:28:41 AM GMT> <Error> <Server> <BEA-002606> <The server is unable to create a server socket for listening on channel "Default[1]". The address fe80:0:0:0:42:c0ff:fea8:3503 might be incorrect or another process is using port 7001: java.net.BindException: Cannot assign requested address> 
<May 24, 2017 2:28:41 AM GMT> <Warning> <Server> <BEA-002611> <The hostname "localhost", maps to multiple IP addresses: 127.0.0.1, 0:0:0:0:0:0:0:1.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000360> <The server started in RUNNING mode.> 
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING.> 
[root@k8s-master fluentd-elasticsearch]# 

 

And on the minion, docker logs <container-id> is just as full of output!

[root@k8s-node-1 ~]# docker logs bec3e02b2490
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server   -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy  -Dweblogic.ProductionModeEnabled=true   -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed  -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server     -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true  weblogic.Server
...........

 

Since /var/log/fluentd.log never advanced past the configuration dump, I suspected the ConfigMap, so I dug out the backed-up td-agent.conf and looked at its comments:

# The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
# record & add labels to the log record if properly configured. This enables users
# to filter & search logs on any metadata.
# For example a Docker container's logs might be in the directory:
#
#  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
#
# and in the file:
#
#  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
#
#    synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log

Finally, the problem: fluentd reads container logs from the /var/lib/docker/containers/ directory, yet on my nodes there were no log files under the container directories at all.

A closer look at Docker showed that all container logs were being sent through journald and ending up in the system log /var/log/messages. Why? Because container logs are often blamed for making containers grow too quickly, everything had been routed through the system journal for unified handling.

Edit the /etc/sysconfig/docker configuration file and switch the log driver back from journald to json-file:

#OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false'
OPTIONS='--selinux-enabled --log-driver=json-file --signature-verification=false'
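
Docker has to be restarted for the new log driver to take effect — a minimal sketch, assuming systemd manages Docker on CentOS 7 (note that only containers created after the restart use the new driver):

systemctl restart docker
docker info | grep -i 'logging driver'    # should now report json-file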

Once that was done, plenty of log files appeared under the container directories.

Back in /var/log/fluentd.log, the log finally started scrolling — output looked normal:

 

2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-aca728523bc307598917d78b2526e718e6c7fdbb38b70c05900d2439399efa10.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_POD-ca013e9ab31b825cd4b85ab4700fad2fcaafd5f39c572778d10d438012ea4435.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-2eb78ece8c2b5c222313ab4cfb53ea6ec32f54e1b7616f729daf48b01d393b65.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_POD-1dcbbc2ef71f7f542018069a1043a122117a97378c19f03ddb95b8a71dab4637.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_weblogichelloworld-d7229e5c23c6bf7582ed6559417ba24d99e33e44a68a6079159b4792fe05a673.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_weblogichelloworld-71d1d7252dd7504fd45351d714d21c3c615facc5e2650553c68c0bf359e8434a.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_kube2sky-c77121c354459f22712b0a99623eff1590f4fdb1a5d3ad2db09db000755f9c2c.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_skydns-f3c0fbf4ea5cd840c968a807a40569042c90de08f7722e7344282845d5782a20.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_fluentd-es-93795904ff4870758441dd7288972d4967ffac18f2f25272b12e99ea6b692d44.log
2017-05-24 05:45:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-05-24 05:44:24 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-logging\", :port=>9200, :scheme=>\"http\"})!" plugin_id="object:3f986d0a5150"
  2017-05-24 05:45:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:122:in `client'

 

The buffer-flush warning above simply means elasticsearch-logging was not reachable yet. Bring all the components up:

[root@k8s-master fluentd-elasticsearch]# kubectl get pods -n kube-system
NAME                              READY     STATUS    RESTARTS   AGE
elasticsearch-logging-v1-1xwnq    1/1       Running   0          29s
elasticsearch-logging-v1-gx6lc    1/1       Running   0          29s
fluentd-es-v1.22-351lz            1/1       Running   1          3h
kibana-logging-3659310023-gcwrn   1/1       Running   0          15s
kube-dns-v11-x0vr3                4/4       Running   28         1d

 

Access Kibana:

[root@k8s-master fluentd-elasticsearch]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
Elasticsearch is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Kibana is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

 

Finally the green Create button appears.

After clicking Create, go in.

 

Expand the kubernetes pod name field and you can see the logs of every pod in the cluster.

If we only care about helloworld-service, click the + next to it, and every WebLogic log line shows up.

That's it — the setup is complete so far.

 

6. Pit-filling summary:

  • To get EFK working I ended up configuring DNS and ServiceAccounts along the way — a gain in itself.
  • I also gained a deeper understanding of how the Kubernetes cluster works internally, which should gradually help us solve problems faster.

 

Finally, sincere thanks to the masters online whom I have yet to meet:

http://tonybai.com/2017/03/03/implement-kubernetes-cluster-level-logging-with-fluentd-and-elasticsearch-stack/

http://rootsongjc.github.io/blogs/kubernetes-fluentd-elasticsearch-installation/

http://www.tothenew.com/blog/how-to-install-kubernetes-on-centos/

http://blog.csdn.net/wenwst/article/details/53908144
