After getting the basic Kubernetes cluster architecture up and adding a few monitoring components, we can already keep an eye on the overall health of the cluster.
However, in a distributed architecture the number of nodes tends to be large; a typical production environment may have tens or even hundreds of minion nodes, so a centralized log collection and management system is needed. In my earlier thinking I also considered mounting a volume onto shared storage so that WebLogic could write its logs there, but that approach has its own problems.
So a platform-level solution is still needed. In the official documentation, https://kubernetes.io/docs/concepts/cluster-administration/logging/
Kubernetes describes several logging approaches and gives a reference architecture for cluster-level logging.
In other words, the container processes inside the Pods we run stream their logs out to the minion host; another pod running on the same host, the logging-agent pod, picks those logs up and forwards them to a backend. The backend can be one of several implementations, for example Elasticsearch (elasticsearch-logging) together with Kibana for presentation.
Kubernetes recommends this node-level logging-agent approach and ships two variants: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. Both use fluentd as the agent running on the node. Quoting the official docs:
Using a node-level logging agent is the most common and encouraged approach for a Kubernetes cluster, because it creates only one agent per node, and it doesn’t require any changes to the applications running on the node. However, node-level logging only works for applications’ standard output and standard error.
Kubernetes doesn’t specify a logging agent, but two optional logging agents are packaged with the Kubernetes release: Stackdriver Logging for use with Google Cloud Platform, and Elasticsearch. You can find more information and instructions in the dedicated documents. Both use fluentd with custom configuration as an agent on the node.
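To make the node-level pattern concrete, the official logging doc uses a trivial pod that just writes a counter to stdout; that stdout stream is exactly what the node-level agent later picks up. Shown here only as an illustration, it is not part of my setup:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    # write one line per second to stdout; the node agent tails the resulting
    # /var/log/containers/*.log file that the kubelet symlinks for this container
    args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']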
OK, the rest of this post is my guide to the potholes I hit along the way.
1. 準備工做express
操做系統: CentOS 7.3
Kubernetes version: 1.5
[root@k8s-master fluentd-elasticsearch]# kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"a55267932d501b9fbd6d73e5ded47d79b5763ce5", GitTreeState:"clean", BuildDate:"2017-04-14T13:36:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Clone the Kubernetes source to get the add-on manifests:

git clone https://github.com/kubernetes/kubernetes

The cluster itself was set up as described in my earlier posts:
http://www.cnblogs.com/ericnie/p/6894688.html
http://www.cnblogs.com/ericnie/p/6897142.html
Go into the /root/kubernetes/cluster/addons/fluentd-elasticsearch directory, where all the yaml files live.
fluentd-es-ds.yaml builds the fluentd DaemonSet that runs on every node and plays the logging-agent role; es-controller.yaml and es-service.yaml build elasticsearch-logging, the backend that aggregates the logs; kibana-controller.yaml and kibana-service.yaml provide the presentation layer.
Pull the images referenced in the controller yaml files onto every minion node:
docker pull gcr.io/google_containers/elasticsearch:v2.4.1-2
docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22
docker pull gcr.io/google_containers/kibana:v4.6.1-1
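With more than a couple of minions, a small loop saves some typing. This is just a sketch; it assumes the node hostnames (k8s-node-1, k8s-node-2) and passwordless ssh as root from the master:

for node in k8s-node-1 k8s-node-2; do
  for img in gcr.io/google_containers/elasticsearch:v2.4.1-2 \
             gcr.io/google_containers/fluentd-elasticsearch:1.22 \
             gcr.io/google_containers/kibana:v4.6.1-1; do
    # pull each image on each minion so the pods start without waiting on gcr.io
    ssh root@$node "docker pull $img"
  done
done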
2. Start the fluentd DaemonSet
Fluentd runs on every minion node. Create it with:
# kubectl create -f fluentd-es-ds.yaml
daemonset "fluentd-es-v1.22" created
Then I tried tail -f /var/log/fluentd.log on a minion node to watch the agent, only to find that fluentd.log did not exist on the minion at all!
A quick check with

kubectl get pods -n kube-system

showed that there was no fluentd-related pod running, or even pending! :(
Looking at the DaemonSet itself:

# kubectl get -f fluentd-es-ds.yaml
NAME               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR                              AGE
fluentd-es-v1.22   0         0         0         0            0           beta.kubernetes.io/fluentd-ds-ready=true   2m

there is a NODE-SELECTOR, beta.kubernetes.io/fluentd-ds-ready=true (the output above is what it normally looks like).
Then

kubectl describe nodes k8s-node-1

showed that my minion node did not carry this label at all, so I added it by hand:
kubectl label node k8s-node-1 beta.kubernetes.io/fluentd-ds-ready=true
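If you have several minions, you can label and verify them all in one pass; a sketch assuming the node names k8s-node-1 and k8s-node-2:

for node in k8s-node-1 k8s-node-2; do
  # --overwrite makes the command safe to re-run if the label already exists
  kubectl label node $node beta.kubernetes.io/fluentd-ds-ready=true --overwrite
done
# confirm the label is present on every node
kubectl get nodes --show-labels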
After re-creating the DaemonSet, /var/log/fluentd.log finally showed up on k8s-node-1:
# tail -f /var/log/fluentd.log
2017-03-02 02:27:01 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2017-03-02 02:27:01 +0000 [info]: starting fluentd-0.12.31
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.5.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kafka' version '0.4.1'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.16'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-s3' version '0.8.0'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2'
2017-03-02 02:27:01 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-03-02 02:27:01 +0000 [info]: adding match pattern="fluent.**" type="null"
2017-03-02 02:27:01 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-03-02 02:27:02 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Invalid Kubernetes API v1 endpoint https://192.168.0.105:443/api: 401 Unauthorized"
2017-03-02 02:27:02 +0000 [info]: process finished code=256
2017-03-02 02:27:02 +0000 [warn]: process died within 1 second. exit.
So the fluentd image talks to my API Server on port 443, and the API Server has its secure mode enabled, which means the agent needs ca_file, client_cert and client_key settings. Rather than rebuilding the image, we can use Kubernetes' ConfigMap mechanism: package a new td-agent.conf as a ConfigMap and mount it into the fluentd pod at the right location, replacing the default td-agent.conf baked into the image.
td-agent.conf lives in:
/root/kubernetes/cluster/addons/fluentd-elasticsearch/fluentd-es-image
After adding the ca and client settings, the relevant section looks like this:
// td-agent.conf
... ...
<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>
... ...
Note that the ConfigMap has to be created in the kube-system namespace:
# kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system
configmap "td-agent-config" created

# kubectl get configmaps td-agent-config -o yaml
apiVersion: v1
data:
  td-agent.conf: |
    <match fluent.**>
      type null
    </match>

    <source>
      type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag kubernetes.*
      format json
      read_from_head true
    </source>
... ...
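If you need to tweak td-agent.conf again later, the simplest (if crude) way on this version is to delete and re-create the ConfigMap; roughly:

# kubectl delete configmap td-agent-config -n kube-system
# kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system

Keep in mind that td-agent only reads its configuration at startup, so the fluentd pods have to be restarted (or the DaemonSet re-created) before the updated ConfigMap takes effect.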
fluentd-es-ds.yaml needs matching changes, mainly two additional mounts:
one mounts the td-agent-config ConfigMap created above,
the other mounts hostPath /srv/kubernetes so the pod can reach the client certificates:
[root@k8s-master fluentd-elasticsearch]# cat fluentd-es-ds.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-es-v1.22
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.22
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v1.22
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
      - name: fluentd-es
        image: gcr.io/google_containers/fluentd-elasticsearch:1.22
        command:
        - '/bin/sh'
        - '-c'
        - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: td-agent-config
          mountPath: /etc/td-agent
        - name: tls-files
          mountPath: /srv/kubernetes
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: td-agent-config
        configMap:
          name: td-agent-config
      - name: tls-files
        hostPath:
          path: /srv/kubernetes
[root@k8s-master fluentd-elasticsearch]#
Re-create the DaemonSet from fluentd-es-ds.yaml and watch /var/log/fluentd.log on the minion again.
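The "re-create" step is nothing more than a delete followed by a create, with the tail run on the minion node; roughly:

[root@k8s-master fluentd-elasticsearch]# kubectl delete -f fluentd-es-ds.yaml
[root@k8s-master fluentd-elasticsearch]# kubectl create -f fluentd-es-ds.yaml
[root@k8s-node-1 ~]# tail -f /var/log/fluentd.log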
... ...
      client_cert /srv/kubernetes/kubecfg.crt
      client_key /srv/kubernetes/kubecfg.key
    </filter>
    <match **>
      type elasticsearch
      log_level info
      include_tag_key true
      host elasticsearch-logging
      port 9200
      logstash_format true
      buffer_chunk_limit 2M
      buffer_queue_limit 32
      flush_interval 5s
      max_retry_wait 30
      disable_retry_limit
      num_threads 8
    </match>
</ROOT>
Seeing this output basically means the agent came up, and it looks fine, but there is still a pitfall lurking here; we will come back to it. For now, continue with configuring elasticsearch logging.
3. Configure elasticsearch
Create the elasticsearch controller and service:
# kubectl create -f es-controller.yaml
replicationcontroller "elasticsearch-logging-v1" created

# kubectl create -f es-service.yaml
service "elasticsearch-logging" created

get pods:
kube-system   elasticsearch-logging-v1-3bzt6   1/1   Running   0   7s   172.16.57.8    10.46.181.146
kube-system   elasticsearch-logging-v1-nvbe1   1/1   Running   0   7s   172.16.99.10   10.47.136.60
Check the pod logs:
# kubectl logs -f elasticsearch-logging-v1-3bzt6 -n kube-system
F0302 03:59:41.036697       8 elasticsearch_logging_discovery.go:60] kube-system namespace doesn't exist: the server has asked for the client to provide credentials (get namespaces kube-system)
goroutine 1 [running]:
k8s.io/kubernetes/vendor/github.com/golang/glog.stacks(0x19a8100, 0xc400000000, 0xc2, 0x186)
... ...
main.main()
    elasticsearch_logging_discovery.go:60 +0xb53
[2017-03-02 03:59:42,587][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] version[2.4.1], pid[16], build[c67dc32/2016-09-27T18:57:55Z]
[2017-03-02 03:59:42,588][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initializing ...
[2017-03-02 03:59:44,396][INFO ][plugins                  ] [elasticsearch-logging-v1-3bzt6] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
... ...
[2017-03-02 03:59:44,441][INFO ][env                      ] [elasticsearch-logging-v1-3bzt6] heap size [1007.3mb], compressed ordinary object pointers [true]
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] initialized
[2017-03-02 03:59:48,355][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] starting ...
[2017-03-02 03:59:48,507][INFO ][transport                ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9300}, bound_addresses {[::]:9300}
[2017-03-02 03:59:48,547][INFO ][discovery                ] [elasticsearch-logging-v1-3bzt6] kubernetes-logging/7_f_M2TKRZWOw4NhBc4EqA
[2017-03-02 04:00:18,552][WARN ][discovery                ] [elasticsearch-logging-v1-3bzt6] waited for 30s and no initial state was set by the discovery
[2017-03-02 04:00:18,562][INFO ][http                     ] [elasticsearch-logging-v1-3bzt6] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 04:00:18,562][INFO ][node                     ] [elasticsearch-logging-v1-3bzt6] started
There is an error: the client cannot provide valid credentials. Following Tony Bai's write-up, this turns out to be a problem with the default Service Account; the underlying mechanics deserve further study.
Let's get it running first. The workaround is as follows.
Create a new ServiceAccount in the kube-system namespace:
// serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8s-efk

# kubectl create -f serviceaccount.yaml -n kube-system
serviceaccount "k8s-efk" created

# kubectl get serviceaccount -n kube-system
NAME      SECRETS   AGE
default   1         139d
k8s-efk   1         17s
Modify es-controller.yaml to use the service account "k8s-efk":
[root@k8s-master fluentd-elasticsearch]# cat es-controller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v1
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 2
  selector:
    k8s-app: elasticsearch-logging
    version: v1
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: k8s-efk
      containers:
      - image: gcr.io/google_containers/elasticsearch:v2.4.1-2
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: es-persistent-storage
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: es-persistent-storage
        emptyDir: {}
After re-creating the elasticsearch-logging controller and service, check the elasticsearch-logging pod logs again. It looks OK, but this is another pitfall; more on that in a moment:
[2017-05-22 06:09:12,155][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] version[2.4.1], pid[1], build[c67dc32/2016-09-27T18:57:55Z]
[2017-05-22 06:09:12,156][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initializing ...
[2017-05-22 06:09:13,657][INFO ][plugins                  ] [elasticsearch-logging-v1-9jjf1] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2017-05-22 06:09:13,733][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] using [1] data paths, mounts [[/data (/dev/mapper/cl-root)]], net usable_space [25gb], net total_space [37.2gb], spins? [possibly], types [xfs]
[2017-05-22 06:09:13,738][INFO ][env                      ] [elasticsearch-logging-v1-9jjf1] heap size [1015.6mb], compressed ordinary object pointers [true]
[2017-05-22 06:09:21,946][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] initialized
[2017-05-22 06:09:21,980][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] starting ...
[2017-05-22 06:09:22,442][INFO ][transport                ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9300}, bound_addresses {[::]:9300}
[2017-05-22 06:09:22,560][INFO ][discovery                ] [elasticsearch-logging-v1-9jjf1] kubernetes-logging/RY_IOcwSSSeuJNtC2E0W7A
[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started
OK, moving on...
4. Configure kibana
Following the same advice, explicitly assign the newly created serviceaccount k8s-efk in kibana-controller.yaml as well:
[root@k8s-master fluentd-elasticsearch]# cat kibana-controller.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
    spec:
      serviceAccount: k8s-efk
      containers:
      - name: kibana-logging
        image: gcr.io/google_containers/kibana:v4.6.1-1
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 100m
          requests:
            cpu: 100m
        env:
        - name: "ELASTICSEARCH_URL"
          value: "http://elasticsearch-logging:9200"
        - name: "KIBANA_BASE_URL"
          value: "/api/v1/proxy/namespaces/kube-system/services/kibana-logging"
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
[root@k8s-master fluentd-elasticsearch]#
Start kibana and watch the pod logs:
# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
Kibana takes a good ten-plus minutes to start; apologies, I am running all of this in VirtualBox VMs on an 8GB laptop. Eventually you will see logs like these:
# kubectl logs -f kibana-logging-3604961973-jby53 -n kube-system
ELASTICSEARCH_URL=http://elasticsearch-logging:9200
server.basePath: /api/v1/proxy/namespaces/kube-system/services/kibana-logging
{"type":"log","@timestamp":"2017-03-02T08:30:15Z","tags":["info","optimize"],"pid":6,"message":"Optimizing and caching bundles for kibana and statusPage. This may take a few minutes"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["info","optimize"],"pid":6,"message":"Optimization of bundles for kibana and statusPage complete in 588.60 seconds"}
{"type":"log","@timestamp":"2017-03-02T08:40:04Z","tags":["status","plugin:kibana@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:kbn_vislib_vis_types@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:markdown_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:05Z","tags":["status","plugin:metric_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:spyModes@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:statusPage@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["status","plugin:table_vis@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"}
{"type":"log","@timestamp":"2017-03-02T08:40:06Z","tags":["listening","info"],"pid":6,"message":"Server running at http://0.0.0.0:5601"}
{"type":"log","@timestamp":"2017-03-02T08:40:11Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"yellow","message":"Status changed from yellow to yellow - No existing Kibana index found","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"}
{"type":"log","@timestamp":"2017-03-02T08:40:14Z","tags":["status","plugin:elasticsearch@1.0.0","info"],"pid":6,"state":"green","message":"Status changed from yellow to green - Kibana index ready","prevState":"yellow","prevMsg":"No existing Kibana index found"}
Note the following (this is another pitfall):
Running

kubectl cluster-info

gives you the kibana service address, which is essentially
https://{API Server external IP}:{API Server secure port}/api/v1/proxy/namespaces/kube-system/services/kibana-logging/app/kibana#/settings/indices/
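If you do not want to fight with client certificates in the browser, an alternative (just a sketch, not how I originally accessed it) is to run kubectl proxy on a machine with a browser and go through its local port:

[root@k8s-master fluentd-elasticsearch]# kubectl proxy --port=8001
Starting to serve on 127.0.0.1:8001

and then open http://127.0.0.1:8001/api/v1/proxy/namespaces/kube-system/services/kibana-logging/ on that machine.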
On the index settings screen, no matter what I tried, the Create button never appeared and I could not add an index pattern. The only thing that could be created was a bare *, and even then there was no pod information inside at all. Big problem!
5. Tracking down the problem
I carefully compared my setup against Tony Bai's article and countless posts from other people, and even considered starting over on CentOS 6.5, but since I had not managed to get a Kubernetes cluster running on CentOS 6.5 either, I gave up on that idea.
Comparing the logs, the problem most likely lay here:
Looking closely at the fluentd log /var/log/fluentd.log, there was simply no log output at all after startup, which rules out fluentd failing to connect to elasticsearch-logging:9200.
It felt like a problem on the elasticsearch-logging side itself. Comparing against tonybai's elasticsearch logs, mine only showed
[2017-05-22 06:09:30,446][INFO ][cluster.service          ] [elasticsearch-logging-v1-9jjf1] detected_master {elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}, added {{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true},}, reason: zen-disco-receive(from master [{elasticsearch-logging-v1-sbcgt}{9--uDYJOTqegj5ctbbCx_A}{192.168.10.8}{192.168.10.8:9300}{master=true}])
[2017-05-22 06:09:30,453][INFO ][http                     ] [elasticsearch-logging-v1-9jjf1] publish_address {192.168.10.6:9200}, bound_addresses {[::]:9200}
[2017-05-22 06:09:30,465][INFO ][node                     ] [elasticsearch-logging-v1-9jjf1] started
and then stopped, while tonybai's continued with:
[2017-03-02 08:26:56,955][INFO ][http                     ] [elasticsearch-logging-v1-dklui] publish_address {172.16.57.8:9200}, bound_addresses {[::]:9200}
[2017-03-02 08:26:56,956][INFO ][node                     ] [elasticsearch-logging-v1-dklui] started
[2017-03-02 08:26:57,157][INFO ][gateway                  ] [elasticsearch-logging-v1-dklui] recovered [0] indices into cluster_state
[2017-03-02 08:27:05,378][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:06,360][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings []
[2017-03-02 08:27:07,163][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash-2017.03.01][3], [logstash-2017.03.01][3]] ...]).
[2017-03-02 08:27:07,354][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.02] create_mapping [fluentd]
[2017-03-02 08:27:07,988][INFO ][cluster.metadata         ] [elasticsearch-logging-v1-dklui] [logstash-2017.03.01] create_mapping [fluentd]
[2017-03-02 08:27:09,578][INFO ][cluster.routing.allocation] [elasticsearch-logging-v1-dklui] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[logstash-2017.03.02][4]] ...]).
The difference: my log stops right after "started", while his goes on to recover indices and create the logstash-* indices.
That smelled like an image problem, so I changed the image in es-controller.yaml to match tonybai's, replacing the official v2.4.1-2 with
bigwhite/elasticsearch:v2.4.1-1
After starting it again, the "recovered ... indices into cluster_state" message did indeed show up, but there was still no logstash information.
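A quick way to see whether any logstash-* index exists at all is to ask elasticsearch directly through the API Server proxy (a sketch; it uses the insecure localhost:8080 endpoint that kubectl cluster-info reports further below):

[root@k8s-master fluentd-elasticsearch]# curl 'http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging/_cat/indices?v'

If fluentd were actually shipping logs, logstash-YYYY.MM.DD indices would show up in that output.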
Which brings us back to the first question: kubectl logs clearly shows screenfuls of output:
[root@k8s-master fluentd-elasticsearch]# kubectl logs helloworld-service-4d72j
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true weblogic.Server
<May 24, 2017 2:27:39 AM GMT> <Info> <Security> <BEA-090905> <Disabling the CryptoJ JCE Provider self-integrity check for better startup performance. To enable this check, specify -Dweblogic.security.allowCryptoJDefaultJCEVerification=true.>
<May 24, 2017 2:27:41 AM GMT> <Info> <Security> <BEA-090906> <Changing the default Random Number Generator in RSA CryptoJ from ECDRBG128 to FIPS186PRNG. To disable this change, specify -Dweblogic.security.allowCryptoJDefaultPRNG=true.>
<May 24, 2017 2:27:44 AM GMT> <Info> <WebLogicServer> <BEA-000377> <Starting WebLogic Server with Java HotSpot(TM) 64-Bit Server VM Version 25.101-b13 from Oracle Corporation.>
<May 24, 2017 2:27:47 AM GMT> <Info> <Management> <BEA-141107> <Version: WebLogic Server 12.1.3.0.0 Wed May 21 18:53:34 PDT 2014 1604337 >
<May 24, 2017 2:27:59 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.>
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002900> <Initializing self-tuning thread pool.>
<May 24, 2017 2:28:00 AM GMT> <Info> <WorkManager> <BEA-002942> <CMM memory level becomes 0. Setting standby thread pool size to 256.>
<May 24, 2017 2:28:02 AM GMT> <Notice> <Log Management> <BEA-170019> <The server log file /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/logs/AdminServer.log is opened. All server side log events will be written to this file.>
<May 24, 2017 2:28:18 AM GMT> <Notice> <Security> <BEA-090082> <Security initializing using security realm myrealm.>
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STANDBY.>
<May 24, 2017 2:28:31 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to STARTING.>
May 24, 2017 2:28:31 AM weblogic.wsee.WseeCoreMessages logWseeServiceStarting
INFO: The Wsee Service is starting
<May 24, 2017 2:28:34 AM GMT> <Warning> <Munger> <BEA-2156203> <A version attribute was not found in element "web-app" in the deployment descriptor /u01/oracle/user_projects/domains/base_domain/servers/AdminServer/upload/HelloWorld.war/WEB-INF/web.xml. A version attribute is required, but this version of the WebLogic Server will assume that the latest version is used. Future versions of WebLogic Server will reject descriptors that do not specify the Java EE version. To eliminate this warning, add an appropriate "version=" to element "web-app" in the deployment descriptor.>
<May 24, 2017 2:28:39 AM GMT> <Notice> <Log Management> <BEA-170027> <The server has successfully established a connection with the Domain level Diagnostic Service.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to ADMIN.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RESUMING.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[3]" is now listening on 127.0.0.1:7001 for protocols iiop, t3, ldap, snmp, http.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default" is now listening on 192.168.53.3:7001 for protocols iiop, t3, ldap, snmp, http.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <Server> <BEA-002613> <Channel "Default[2]" is now listening on 0:0:0:0:0:0:0:1:7001 for protocols iiop, t3, ldap, snmp, http.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000329> <Started the WebLogic Server Administration Server "AdminServer" for domain "base_domain" running in production mode.>
<May 24, 2017 2:28:41 AM GMT> <Error> <Server> <BEA-002606> <The server is unable to create a server socket for listening on channel "Default[1]". The address fe80:0:0:0:42:c0ff:fea8:3503 might be incorrect or another process is using port 7001: java.net.BindException: Cannot assign requested address>
<May 24, 2017 2:28:41 AM GMT> <Warning> <Server> <BEA-002611> <The hostname "localhost", maps to multiple IP addresses: 127.0.0.1, 0:0:0:0:0:0:0:1.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000360> <The server started in RUNNING mode.>
<May 24, 2017 2:28:41 AM GMT> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to RUNNING.>
[root@k8s-master fluentd-elasticsearch]#
And running docker logs <container id> on the minion host is likewise full of output!
[root@k8s-node-1 ~]# docker logs bec3e02b2490
.
.
JAVA Memory arguments: -Djava.security.egd=file:/dev/./urandom
.
CLASSPATH=/u01/oracle/wlserver/../oracle_common/modules/javax.persistence_2.1.jar:/u01/oracle/wlserver/../wlserver/modules/com.oracle.weblogic.jpa21support_1.0.0.0_2-1.jar:/usr/java/jdk1.8.0_101/lib/tools.jar:/u01/oracle/wlserver/server/lib/weblogic_sp.jar:/u01/oracle/wlserver/server/lib/weblogic.jar:/u01/oracle/wlserver/../oracle_common/modules/net.sf.antcontrib_1.1.0.0_1-0b3/lib/ant-contrib.jar:/u01/oracle/wlserver/modules/features/oracle.wls.common.nodemanager_2.0.0.0.jar:/u01/oracle/wlserver/../oracle_common/modules/com.oracle.cie.config-wls-online_8.1.0.0.jar:/u01/oracle/wlserver/common/derby/lib/derbyclient.jar:/u01/oracle/wlserver/common/derby/lib/derby.jar:/u01/oracle/wlserver/server/lib/xqrl.jar
.
PATH=/u01/oracle/wlserver/server/bin:/u01/oracle/wlserver/../oracle_common/modules/org.apache.ant_1.9.2/bin:/usr/java/jdk1.8.0_101/jre/bin:/usr/java/jdk1.8.0_101/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/oracle_common/common/bin:/u01/oracle/wlserver/common/bin:/u01/oracle/user_projects/domains/base_domain/bin:/u01/oracle
.
***************************************************
*  To start WebLogic Server, use a username and   *
*  password assigned to an admin-level user.  For *
*  server administration, use the WebLogic Server *
*  console at http://hostname:port/console        *
***************************************************
starting weblogic with Java version:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Starting WLS with line:
/usr/java/jdk1.8.0_101/bin/java -server -Djava.security.egd=file:/dev/./urandom -Dweblogic.Name=AdminServer -Djava.security.policy=/u01/oracle/wlserver/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Djava.endorsed.dirs=/usr/java/jdk1.8.0_101/jre/lib/endorsed:/u01/oracle/wlserver/../oracle_common/modules/endorsed -da -Dwls.home=/u01/oracle/wlserver/server -Dweblogic.home=/u01/oracle/wlserver/server -Dweblogic.utils.cmm.lowertier.ServiceDisabled=true weblogic.Server
...........
Since /var/log/fluentd.log just sat there after printing the configuration and never moved, I suspected a ConfigMap configuration problem and went back to the td-agent.conf I had backed up earlier:
# The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
# record & add labels to the log record if properly configured. This enables users
# to filter & search logs on any metadata.
# For example a Docker container's logs might be in the directory:
#
#  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
#
# and in the file:
#
#  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
#
#  synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
Finally, the problem: fluentd looks for logs under the /var/lib/docker/containers/ directory, yet on my hosts the container directories under docker contained no log files at all.
A closer look at docker showed that all container logs were being sent to journald and ended up in the system log /var/log/messages. Why? Because people often complain that heavy container logging makes the container directories grow too fast, so everything gets routed through the system journal for centralized handling.
Edit the /etc/sysconfig/docker configuration file and switch the log driver from journald back to json-file:
#OPTIONS='--selinux-enabled --log-driver=journald --signature-verification=false'
OPTIONS='--selinux-enabled --log-driver=json-file --signature-verification=false'
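The new log driver only applies after the docker daemon is restarted, which also restarts the containers on that node; roughly:

[root@k8s-node-1 ~]# systemctl restart docker
[root@k8s-node-1 ~]# docker info | grep -i 'logging driver'
Logging Driver: json-file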
After the change, the container directories are full of log files.
Back in /var/log/fluentd.log, the log finally starts scrolling and the output looks normal:
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-aca728523bc307598917d78b2526e718e6c7fdbb38b70c05900d2439399efa10.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_POD-ca013e9ab31b825cd4b85ab4700fad2fcaafd5f39c572778d10d438012ea4435.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_POD-2eb78ece8c2b5c222313ab4cfb53ea6ec32f54e1b7616f729daf48b01d393b65.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_POD-1dcbbc2ef71f7f542018069a1043a122117a97378c19f03ddb95b8a71dab4637.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-n5f0s_default_weblogichelloworld-d7229e5c23c6bf7582ed6559417ba24d99e33e44a68a6079159b4792fe05a673.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/helloworld-service-4d72j_default_weblogichelloworld-71d1d7252dd7504fd45351d714d21c3c615facc5e2650553c68c0bf359e8434a.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_kube2sky-c77121c354459f22712b0a99623eff1590f4fdb1a5d3ad2db09db000755f9c2c.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/kube-dns-v11-x0vr3_kube-system_skydns-f3c0fbf4ea5cd840c968a807a40569042c90de08f7722e7344282845d5782a20.log
2017-05-24 05:44:17 +0000 [info]: following tail of /var/log/containers/fluentd-es-v1.22-351lz_kube-system_fluentd-es-93795904ff4870758441dd7288972d4967ffac18f2f25272b12e99ea6b692d44.log
2017-05-24 05:45:03 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2017-05-24 05:44:24 +0000 error_class="Fluent::ElasticsearchOutput::ConnectionFailure" error="Can not reach Elasticsearch cluster ({:host=>\"elasticsearch-logging\", :port=>9200, :scheme=>\"http\"})!" plugin_id="object:3f986d0a5150"
2017-05-24 05:45:03 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:122:in `client'
Start all the components:
[root@k8s-master fluentd-elasticsearch]# kubectl get pods -n kube-system
NAME                              READY     STATUS    RESTARTS   AGE
elasticsearch-logging-v1-1xwnq    1/1       Running   0          29s
elasticsearch-logging-v1-gx6lc    1/1       Running   0          29s
fluentd-es-v1.22-351lz            1/1       Running   1          3h
kibana-logging-3659310023-gcwrn   1/1       Running   0          15s
kube-dns-v11-x0vr3                4/4       Running   28         1d
Access kibana:
[root@k8s-master fluentd-elasticsearch]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
Elasticsearch is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Kibana is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The green Create button finally appears on the index pattern screen.
Once the index pattern is created, go in,
click on the kubernetes pod name field, and you can see the logs of every pod in the cluster.
If we only care about helloworld-service, click the + next to it to filter, and every WebLogic log line shows up.
And with that, the setup is complete.
6. Wrap-up:
Finally, sincere thanks to the authors I have never met whose posts helped along the way:
http://tonybai.com/2017/03/03/implement-kubernetes-cluster-level-logging-with-fluentd-and-elasticsearch-stack/
http://rootsongjc.github.io/blogs/kubernetes-fluentd-elasticsearch-installation/
http://www.tothenew.com/blog/how-to-install-kubernetes-on-centos/
http://blog.csdn.net/wenwst/article/details/53908144