ELK Log Analysis Platform (Part 1): ELK Overview and an ElasticSearch Cluster

Date: 2019-11-05

* ELK overview:

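ELK is an acronym for three open-source projects: Elasticsearch, Logstash and Kibana. Starting with version 5.0 the stack added the Beats tools, and the ELK Stack was renamed the Elastic Stack. Beats are lightweight log collection agents: they use few resources, which makes them well suited to collecting logs on every server and forwarding them to Logstash, and they are the officially recommended collectors.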

The Elastic Stack consists of:

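Elasticsearch: an open-source distributed search engine built on Lucene that collects, analyzes and stores data. Features: distributed, near-zero configuration, automatic node discovery, automatic index sharding, index replicas, a RESTful API, multiple data sources, automatic search load balancing and more. As the core of ELK, it is the central data store.
Logstash: a tool for collecting, parsing and filtering logs. It can receive data over TCP, UDP, HTTP and other inputs (including data shipped by Beats) and enrich it or extract fields from it. It typically runs in a client/server arrangement: the client side is installed on every host whose logs are collected, and the server side filters and modifies the logs received from all nodes before sending them on to Elasticsearch.
Kibana: an open-source analytics and visualization tool that provides a log-analysis-friendly web interface for Logstash and Elasticsearch, helping you aggregate, analyze and search important log data.
Beats: lightweight data shippers. Early ELK deployments used Logstash to collect and parse logs, but Logstash consumes a fair amount of memory, CPU and I/O. Compared with Logstash, the CPU and memory used by Beats are almost negligible. Beats currently comprises six tools:
Packetbeat: network data (captures network traffic)
Metricbeat: metrics (collects CPU and memory usage and similar data at the system, process and file-system level)
Filebeat: log files (collects file data)
Winlogbeat: Windows event logs (collects Windows event log data)
Auditbeat: audit data (collects audit logs)
Heartbeat: uptime monitoring (probes services to check whether they are available)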

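X-Pack

X-Pack is an extension pack for the Elastic Stack that bundles security, alerting, monitoring, reporting and graph features; most of these features require a paid subscription.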

Elastic Stack official site: https://www.elastic.co/cn/

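ELK architecture

When installing ELK, use the same version for every component to avoid hard-to-diagnose problems, such as communication failures between the three applications caused by mismatched versions.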

ElasticSearch cluster

Lab environment:

OS:

[root@localhost soft]# uname -r
3.10.0-862.el7.x86_64
[root@localhost soft]#

IPs: 10.15.97.136 to 10.15.97.138 (three nodes)

* Installing Java

Java 1.8 or later is required.

[root@localhost soft]# rpm -ivh jdk-8u161-linux-x64.rpm

* Installing ElasticSearch

Installing with yum

[root@localhost soft]# rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
[root@localhost soft]# cat /etc/yum.repos.d/elastic.repo
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
[root@localhost soft]# yum install -y elasticsearch

Manual installation

[root@localhost soft]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.1.tar.gz
[root@localhost soft]# tar -zxvf elasticsearch-6.5.1.tar.gz -C /app/
[root@localhost soft]# cd /app/elasticsearch-6.5.1/
[root@localhost elasticsearch-6.5.1]# mkdir data                      # directory for the data files
[root@localhost elasticsearch-6.5.1]# useradd elasticsearch
[root@localhost elasticsearch-6.5.1]# echo "elasticsearch"|passwd --stdin elasticsearch

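Edit the elasticsearch configuration file

In elasticsearch 6.x the config directory contains elasticsearch.yml (the main ES settings), jvm.options (JVM flags, including the heap size) and log4j2.properties (logging; ES uses Log4j 2). Older releases used logging.yml instead.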

[root@localhost elasticsearch-6.5.1]# cd config/
[root@localhost config]# vim elasticsearch.yml 
# ---------------------------------- Cluster -----------------------------------
cluster.name: ES-Justin
# ------------------------------------ Node ------------------------------------
node.name: ES-Justin-master
# ----------------------------------- Paths ------------------------------------
path.data: /app/elasticsearch-6.5.1/data
path.logs: /app/elasticsearch-6.5.1/logs
# ----------------------------------- Memory -----------------------------------
bootstrap.memory_lock: false
# ---------------------------------- Network -----------------------------------
network.host: 10.15.97.136
http.port: 9200
transport.tcp.port: 9300
# --------------------------------- Discovery ----------------------------------
discovery.zen.ping.unicast.hosts: ["10.15.97.136", "10.15.97.137","10.15.97.138"]
discovery.zen.minimum_master_nodes: 2
# ---------------------------------- Various -----------------------------------
#action.destructive_requires_name: true
discovery.zen.commit_timeout: 100s  # cluster tuning parameters below
discovery.zen.publish_timeout: 100s
discovery.zen.ping_timeout: 100s
discovery.zen.fd.ping_timeout: 100s
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_retries: 10
action.destructive_requires_name: true # cluster safety setting

http.cors.enabled: true
http.cors.allow-origin: "*"
[root@localhost config]# chown -R elasticsearch:elasticsearch /app/elasticsearch-6.5.1/

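Copy the /app/elasticsearch-6.5.1 directory above to the other two machines, create the elasticsearch user there as well, and on each machine change node.name and network.host to that node's own name and IP.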
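After copying, node.name and network.host still differ per machine, so the per-node files can also be rendered from one template. The Python sketch below is one way to do that, not part of the original procedure: it assumes the lab IPs above, the names ES-Justin-slave1 and ES-Justin-slave2 are hypothetical, and only a subset of the settings is included.

```python
"""Render one elasticsearch.yml per node from a shared template (sketch)."""
from string import Template

# Subset of the shared settings above; only $node_name and $ip vary per node.
TEMPLATE = Template("""\
cluster.name: ES-Justin
node.name: $node_name
path.data: /app/elasticsearch-6.5.1/data
path.logs: /app/elasticsearch-6.5.1/logs
network.host: $ip
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["10.15.97.136", "10.15.97.137", "10.15.97.138"]
discovery.zen.minimum_master_nodes: 2
""")

# Lab IPs from this article; the two slave names are hypothetical.
NODES = {
    "ES-Justin-master": "10.15.97.136",
    "ES-Justin-slave1": "10.15.97.137",
    "ES-Justin-slave2": "10.15.97.138",
}


def render_configs(nodes):
    """Return {node_name: elasticsearch.yml text} for every node."""
    return {
        name: TEMPLATE.substitute(node_name=name, ip=ip)
        for name, ip in nodes.items()
    }


if __name__ == "__main__":
    # Print each file; copy it to config/elasticsearch.yml on that machine.
    for name, text in render_configs(NODES).items():
        print(f"# ---- {name} ----")
        print(text)
```

Templating keeps the shared settings in one place, so a change such as a new timeout cannot drift between nodes.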

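elasticsearch.yml settings explained:

cluster.name: ES-Justin             # cluster name, default "elasticsearch". A node only joins a cluster with the same name, so give different clusters on the same network different names.
node.name: ES-Justin-master         # node name; in 6.x it defaults to the first seven characters of the randomly generated node ID.
node.master: true                   # whether the node is eligible to be elected master, default true. If the elected master goes down, a new master is elected from the remaining eligible nodes.
node.data: true                     # whether the node stores index data, default true.
node.ingest: true                   # whether the node can run ingest pipelines, default true.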

node.master, node.data and node.ingest are commonly combined in four ways:
node.master: true
node.data: true
node.ingest: true
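This is the elasticsearch default: the node is master-eligible, stores data and can also act as an ingest (pre-processing) node. If such a node is elected the actual master, it still has to store data, so the master and data roles are mixed on one node and the load on that node is relatively high.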
node.master: false
node.data: true
node.ingest: false
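The node is not master-eligible, so it takes no part in elections and only stores data. This is called a data node. A cluster needs a few such nodes dedicated to storing data; they provide storage and query services.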
node.master: true
node.data: false
node.ingest: false
The node stores no data but is master-eligible, takes part in elections and may become the actual master. This is called a master node.
node.master: false
node.data: false
node.ingest: true
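The node is neither master-eligible nor stores data. Its purpose is to act as a client node that load-balances a heavy volume of requests. Since ElasticSearch 5.x such a node is called a coordinating node, and 5.x also introduced the ingest node, which pre-processes documents before they are indexed. Strictly speaking, a coordinating-only node sets all three flags to false, while node.ingest: true additionally lets the node run ingest pipelines. Ordinary applications need no ingest pre-processing, so this node can be treated as equivalent to a client node, and client code can simply be configured to connect to these ingest nodes.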

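It is recommended to configure 3 or more nodes as master nodes [node.master: true node.data: false node.ingest: false]; they only act as master-eligible nodes and maintain the cluster state. Then, depending on data volume, add a set of data nodes [node.master: false node.data: true node.ingest: false] that only store data and serve indexing and search requests. If user requests are frequent, these nodes can also come under heavy load, so it is further recommended to add a set of ingest nodes, also called client nodes [node.master: false node.data: false node.ingest: true], which only handle user requests, forwarding them and balancing the load.
master nodes: ordinary servers are enough (moderate CPU and memory use)
data nodes: mainly consume disk and memory
client | ingest nodes: ordinary servers are enough (if you run grouping or aggregation queries, give these nodes more memory as well)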
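The flag combinations and the recommended layout above can be expressed as a small check. This is an illustrative Python sketch, not an Elasticsearch tool: the NodeSpec type, the kind names and the warning rules are assumptions that simply encode the recommendations in this section.

```python
"""Classify node.master/node.data/node.ingest combinations and check a layout (sketch)."""
from dataclasses import dataclass

# Names used in this article for the four common combinations.
KINDS = {
    (True, True, True): "default",        # master-eligible + data + ingest
    (False, True, False): "data",
    (True, False, False): "master",
    (False, False, True): "client/ingest",
}


@dataclass(frozen=True)
class NodeSpec:
    name: str
    master: bool
    data: bool
    ingest: bool

    def kind(self) -> str:
        """Return the article's name for this flag combination."""
        return KINDS.get((self.master, self.data, self.ingest), "other")


def layout_warnings(nodes):
    """Warn where a cluster plan departs from the recommended layout."""
    warnings = []
    masters = sum(1 for n in nodes if n.kind() == "master")
    if masters < 3:
        warnings.append(f"{masters} dedicated master node(s); 3 or more recommended")
    if not any(n.kind() == "data" for n in nodes):
        warnings.append("no dedicated data nodes")
    return warnings


if __name__ == "__main__":
    # The lab cluster: three nodes left at the default settings.
    lab = [NodeSpec(f"node{i}", True, True, True) for i in range(1, 4)]
    print([n.kind() for n in lab])
    print(layout_warnings(lab))
```

The three-node lab cluster gets both warnings, which is expected for a test setup; a plan with three dedicated masters, some data nodes and a client node passes cleanly.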

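index.number_of_shards: 5            # default number of primary shards per index, default 5. Since 5.x index-level settings can no longer be set in elasticsearch.yml; use an index template instead.
index.number_of_replicas: 1          # default number of replicas per index, default 1. Same restriction as above.
path.conf: /path/to/conf             # config directory, default config/ under the ES home. Removed in 6.x; set the ES_PATH_CONF environment variable instead.
path.data: /path/to/data             # index data directory, default data/ under the ES home. Several paths can be given, separated by commas, e.g.:
path.data: /path/to/data1,/path/to/data2
path.work: /path/to/work             # temporary file directory, default work/ under the ES home (old releases only).
path.logs: /path/to/logs             # log directory, default logs/ under the ES home.
path.plugins: /path/to/plugins       # plugin directory, default plugins/ under the ES home (no longer configurable in 6.x).
bootstrap.mlockall: true             # lock the process memory. ES performance drops once the JVM starts swapping, so make sure it never swaps: set the minimum and maximum heap to the same value and make sure the machine has enough memory for ES. The ES process must also be allowed to lock memory; on Linux use `ulimit -l unlimited`. In 5.x and later this setting is named bootstrap.memory_lock, and the heap is set with -Xms/-Xmx in jvm.options instead of the old ES_MIN_MEM and ES_MAX_MEM environment variables.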
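network.bind_host: 192.168.0.1        # IP address to bind to, IPv4 or IPv6. In 6.x the default is the loopback address (_local_).
network.publish_host: 192.168.0.1     # IP address other nodes use to talk to this node; detected automatically if unset, and must be a real IP address.
network.host: 192.168.0.1             # sets both bind_host and publish_host at once.
transport.tcp.port: 9300              # TCP port for node-to-node communication, default 9300.
transport.tcp.compress: true          # whether to compress data sent over the transport layer, default false.
http.port: 9200                       # HTTP port for clients, default 9200.
http.max_content_length: 100mb        # maximum size of an HTTP request body, default 100mb.
http.enabled: false                   # whether to serve HTTP at all, default true.
gateway.type: local                   # gateway type, default local (local file system); old releases also supported distributed file systems, Hadoop HDFS and Amazon S3. This setting is not available in 6.x.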
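gateway.recover_after_nodes: 1       # start recovering data once N nodes in the cluster have started (example value).
gateway.recover_after_time: 5m       # how long to wait before starting the initial recovery (example value: 5 minutes).
gateway.expected_nodes: 2            # expected number of nodes in the cluster; as soon as N nodes are up, recovery starts immediately (example value).
cluster.routing.allocation.node_initial_primaries_recoveries: 4  # number of concurrent recoveries during initial primary recovery, default 4.
cluster.routing.allocation.node_concurrent_recoveries: 2         # number of concurrent recoveries when adding or removing nodes or rebalancing, default 2.
indices.recovery.max_size_per_sec: 0                             # bandwidth limit during recovery, e.g. 100mb; 0 meant unlimited. In 6.x the setting is indices.recovery.max_bytes_per_sec (default 40mb).
indices.recovery.concurrent_streams: 5                           # maximum number of concurrent streams when recovering from other shards, default 5 (old releases only).
discovery.zen.minimum_master_nodes: 1                            # minimum number of master-eligible nodes a node must see before a master can be elected, default 1. To prevent split brain, set it to (number of master-eligible nodes / 2) + 1, e.g. 2 for three nodes.
discovery.zen.ping_timeout: 3s                                   # ping timeout used when discovering other nodes, default 3s; raise it on poor networks to avoid discovery errors.
discovery.zen.ping.multicast.enabled: false                      # multicast discovery switch; multicast discovery is no longer part of 6.x, which uses unicast only.
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]  # initial list of master-eligible nodes, used to discover nodes that join the cluster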
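node.max_local_storage_nodes: 2       # allow several nodes to start from the same installation path
bootstrap.memory_lock: false          # false leaves the memory unlocked, so the OS may swap it out
http.cors.enabled: true               # enable CORS so the head plugin can access ES (5.x and later; add it manually if missing)
http.cors.allow-origin: "*"           # "*" allows all origins, so the head plugin can access ES (5.x and later; add it manually if missing)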
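The split-brain rule for discovery.zen.minimum_master_nodes is a majority quorum over the master-eligible nodes. A minimal Python sketch of the arithmetic; the function names are illustrative, not an Elasticsearch API.

```python
"""Quorum arithmetic behind discovery.zen.minimum_master_nodes (sketch)."""


def minimum_master_nodes(master_eligible: int) -> int:
    """Majority of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect(visible: int, master_eligible: int) -> bool:
    """Can a group that sees `visible` master-eligible nodes elect a master?"""
    return visible >= minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    n = 3  # the three lab nodes are all master-eligible
    q = minimum_master_nodes(n)
    print(f"{n} master-eligible nodes -> minimum_master_nodes: {q}")
    # A network split into 2 + 1 nodes: only the larger side can elect a master.
    print(can_elect(2, n), can_elect(1, n))
```

Two disjoint groups can never both hold a majority, so at most one side of a network partition can elect a master. With three nodes the result is 2, and the value has to be updated whenever master-eligible nodes are added or removed.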

Start ElasticSearch

[root@localhost bin]# su - elasticsearch
[elasticsearch@localhost ~]$ cd /app/elasticsearch-6.5.1/bin/
[elasticsearch@localhost bin]$ ./elasticsearch -d

Note: ES must be started as a non-root user, otherwise it exits with an error. The -Des.insecure.allow.root=true flag only worked in 2.x; since 5.x there is no option to run ES as root.

Viewing the logs

The ElasticSearch log file is named after the cluster.

[elasticsearch@localhost bin]$ tail -500f ../logs/ES-Justin.log

Verification

[root@localhost bin]# curl http://10.15.97.136:9200     # node information
[root@localhost bin]# curl 'http://10.15.97.136:9200/_cluster/health?pretty'       # cluster health check
{
  "cluster_name" : "ES-Justin",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 6,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
[root@localhost bin]# curl 'http://10.15.97.136:9200/_cluster/state?pretty'            # detailed cluster state
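The health check can also be scripted, for example from a cron job. A minimal Python sketch using only the standard library; the default URL is the lab's first node, and the rules for what counts as a problem (status other than green, fewer nodes than expected, unassigned shards) are assumptions to adapt.

```python
"""Query /_cluster/health and report problems (sketch)."""
import json
import sys
import urllib.request


def fetch_health(base_url="http://10.15.97.136:9200", timeout=5.0):
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def health_problems(health, expected_nodes=3):
    """Return human-readable problems found in a health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned:
        problems.append(f"{unassigned} unassigned shard(s)")
    return problems


if __name__ == "__main__":
    issues = health_problems(fetch_health())
    print("\n".join(issues) if issues else "cluster OK")
    sys.exit(1 if issues else 0)
```

Keeping the evaluation in a separate function means it can be tested against saved responses such as the green output above without a running cluster.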

Installing the ElasticSearch-head plugin

[root@localhost app]# yum install -y npm
[root@localhost app]# git clone https://github.com/mobz/elasticsearch-head.git
[root@localhost app]# cd elasticsearch-head
[root@localhost elasticsearch-head]# npm install -g grunt --registry=https://registry.npm.taobao.org       # install grunt
[root@localhost elasticsearch-head]# npm install

If elasticsearch-head/node_modules/grunt contains no grunt binary, run: npm install grunt --save
Edit elasticsearch-head/Gruntfile.js

[root@localhost elasticsearch-head]# vim Gruntfile.js
connect: {
    server: {
        options: {
            hostname: '10.15.97.136',    // added
            port: 9100,
            base: '.',
            keepalive: true
        }
    }
}

In elasticsearch-head/_site/app.js, change http://localhost:9200 to the IP and port of the local ES node

[root@localhost elasticsearch-head]# vim _site/app.js
init: function(parent) {
                        this._super();
                        this.prefs = services.Preferences.instance();
                        this.base_uri = this.config.base_uri || this.prefs.get("app-base_uri") || "http://10.15.97.136:9200";
                        if( this.base_uri.charAt( this.base_uri.length - 1 ) !== "/" ) {
                                // XHR request fails if the URL is not ending with a "/"
                                this.base_uri += "/";
                        }

Start head

[root@localhost elasticsearch-head]# cd /app/elasticsearch-head/node_modules/grunt/bin
[root@localhost bin]# ./grunt server &
[1] 17780
        [root@localhost bin]# Running "connect:server" (connect) task
        Waiting forever...
        Started connect web server on http://10.15.97.136:9100
[root@localhost bin]# netstat -antp |grep 9100
tcp        0      0 10.15.97.136:9100       0.0.0.0:*               LISTEN      17780/grunt         
[root@localhost bin]#

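* ElasticSearch performance tuning

System-level tuning

Raise the file handle limit

On Linux the default maximum number of open file handles per process is 1024, far below the 65536 that ElasticSearch requires.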

[root@localhost bin]# vim /etc/security/limits.conf
elasticsearch soft nofile 65536
elasticsearch hard nofile 131072
elasticsearch hard nproc 4096
elasticsearch soft nproc 4096
* hard memlock unlimited
* soft memlock unlimited
[root@localhost ~]# vim /etc/pam.d/login
session    required     pam_limits.so
[root@localhost ~]# ulimit -l
64
[root@localhost ~]# reboot
[root@localhost ~]# ulimit -l
unlimited
[root@localhost ~]#
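To confirm the new limits actually apply to the process that starts ES, they can be read back with Python's resource module (run it as the elasticsearch user). A sketch assuming the 6.x bootstrap-check minimums of 65536 open files and 4096 threads; it also flags a soft limit above the hard limit, which is invalid.

```python
"""Check per-process limits against the ES bootstrap-check minimums (sketch, Linux)."""
import math
import resource

# Minimum soft limits checked by ES 6.x bootstrap checks.
MINIMUMS = {"nofile": 65536, "nproc": 4096}


def _as_number(value):
    """Map RLIM_INFINITY ("unlimited") to math.inf so comparisons work."""
    return math.inf if value == resource.RLIM_INFINITY else value


def limit_problems(limits):
    """limits: {"nofile": (soft, hard), "nproc": (soft, hard)}."""
    problems = []
    for name, (soft, hard) in limits.items():
        soft_n, hard_n = _as_number(soft), _as_number(hard)
        if soft_n > hard_n:
            problems.append(f"{name}: soft limit {soft} exceeds hard limit {hard}")
        if soft_n < MINIMUMS[name]:
            problems.append(f"{name}: soft limit {soft} is below {MINIMUMS[name]}")
    return problems


def current_limits():
    """Read the limits of the current process."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


if __name__ == "__main__":
    for line in limit_problems(current_limits()) or ["limits OK"]:
        print(line)
```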

Virtual memory setting

vm.max_map_count limits the number of memory map areas a process may have. With the default value, ES reports at startup:
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

[root@localhost ~]# vim /etc/sysctl.conf
vm.max_map_count=655360
[root@localhost ~]# sysctl -p

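Avoid swapping

If swap is not disabled entirely, reduce vm.swappiness, which controls how aggressively the kernel swaps memory out. A low value prevents swapping under normal conditions while still letting the OS swap in an emergency.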

[root@localhost ~]# vim /etc/sysctl.conf
vm.swappiness = 1
[root@localhost ~]#

A swappiness of 1 is better than 0, because on some kernel versions a value of 0 can trigger the OOM killer.
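Both kernel settings can be read back from /proc/sys after sysctl -p. A small Python sketch (Linux only); the thresholds come from this section: vm.max_map_count of at least 262144, as demanded by the startup message, and vm.swappiness of 1.

```python
"""Check vm.max_map_count and vm.swappiness against the values above (sketch, Linux)."""
from pathlib import Path

# name -> (predicate, description of the recommended value)
CHECKS = {
    "vm.max_map_count": (lambda v: v >= 262144, "at least 262144"),
    "vm.swappiness": (lambda v: v == 1, "1"),
}


def read_sysctl(name, root=Path("/proc/sys")):
    """Read an integer sysctl: vm.swappiness -> /proc/sys/vm/swappiness."""
    return int((root / name.replace(".", "/")).read_text().strip())


def sysctl_problems(values):
    """Return a message for every value that misses its recommendation."""
    return [
        f"{name} = {value}, recommended {CHECKS[name][1]}"
        for name, value in values.items()
        if not CHECKS[name][0](value)
    ]


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in CHECKS}
    print("\n".join(sysctl_problems(current)) or "sysctl settings OK")
```

If swap is disabled entirely, the swappiness warning can be ignored.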

ElasticSearch-level tuning

ElasticSearch memory

[root@localhost ~]# vim /app/elasticsearch-6.5.1/config/jvm.options
-Xms2g
-Xmx2g
[root@localhost ~]#

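Give the ES heap about half of the machine's physical memory, but no more than 32 GB. Lucene relies on the operating system to cache its data structures in memory, and its performance depends on that interaction; if all available memory went to the Elasticsearch heap, Lucene would have nothing left and performance would suffer badly. To check whether the heap setting is appropriate, look for this line in the ES startup log: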

[elasticsearch@localhost bin]$ grep "compressed ordinary object pointers" ../logs/ES-Justin.log 
[2018-12-05T12:41:58,098][INFO ][o.e.e.NodeEnvironment    ] [ES-Justin-Salve2] heap size [1.9gb], compressed ordinary object pointers [true]
[elasticsearch@localhost bin]$

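If it shows [true], the setting is fine. A heap above roughly 32 GB generally shows [false]; in that case reduce the heap step by step and check again.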
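Checking the log line can be automated. A Python sketch that extracts the heap size and the compressed-oops flag from the NodeEnvironment line shown above; the regular expression is written against that single example line and assumes the 6.x message format.

```python
"""Extract heap size and compressed-oops status from an ES log (sketch)."""
import re
import sys
from pathlib import Path
from typing import Optional, Tuple

# Written against lines like:
# ... heap size [1.9gb], compressed ordinary object pointers [true]
OOPS_RE = re.compile(
    r"heap size \[([^\]]+)\], compressed ordinary object pointers \[(true|false)\]"
)


def oops_status(log_text: str) -> Optional[Tuple[str, bool]]:
    """Return (heap_size, compressed) from the last matching line, or None."""
    matches = OOPS_RE.findall(log_text)
    if not matches:
        return None
    heap, flag = matches[-1]
    return heap, flag == "true"


if __name__ == "__main__":
    # e.g. python oops_check.py ../logs/ES-Justin.log
    status = oops_status(Path(sys.argv[1]).read_text(errors="replace"))
    if status is None:
        print("no heap size line found")
    else:
        heap, compressed = status
        print(f"heap {heap}: compressed oops {'ON' if compressed else 'OFF, lower the heap'}")
```

The last match is used because the log accumulates one such line per restart, and only the most recent heap setting matters.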

Why shouldn't the heap exceed half of physical memory?

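The heap is critical to Elasticsearch: many in-memory data structures use it to provide fast operations. But there is another very important consumer of memory: Lucene.
Lucene is designed to use the underlying operating system to cache its in-memory data structures. Lucene segments are stored in individual files. Because segments are immutable, these files never change. That makes them very cache-friendly, and the OS will happily keep hot segments in memory for faster access. The segments include both the inverted index (for full-text search) and doc values (for aggregations).
Lucene's performance depends on this interaction with the OS. If you give all available memory to the Elasticsearch heap, there is none left for Lucene, which seriously hurts performance. The standard advice is to give 50% of the available memory to the Elasticsearch heap and leave the other 50% free. It will not sit idle; Lucene will happily consume whatever is left.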

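Why shouldn't the heap exceed 32 GB?

In Java, all objects are allocated on the heap and referenced by pointers. Ordinary object pointers (OOPs) point to these objects and are traditionally the size of the CPU's native word: 32 or 64 bits, depending on the processor.
On a 32-bit system this limits the heap to 4 GB. On a 64-bit system the heap can grow much larger, but 64-bit pointers waste more space simply because they are bigger. Worse than the wasted space, larger pointers consume more bandwidth when values move between main memory and the various caches (LLC, L1 and so on).
Java solves this with a trick called compressed oops. Instead of pointing at exact byte locations in memory, the pointers reference object offsets. Because objects are aligned on 8-byte boundaries, a 32-bit pointer can reference four billion objects rather than four billion bytes, so the heap can grow to about 32 GB (2^32 × 8 bytes) while still using 32-bit pointers.
Once you cross that magic ~32 GB boundary, the JVM switches back to ordinary 64-bit object pointers. Every pointer gets bigger, more CPU memory bandwidth is used, and you effectively lose memory. In fact, you need a heap of roughly 40 to 50 GB before you have the same effective memory as a heap just under 32 GB using compressed oops.
Summary: even if you have plenty of memory, avoid crossing the 32 GB heap boundary; doing so wastes memory, reduces CPU performance and makes the GC struggle with a large heap.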
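The ~32 GB figure follows from simple arithmetic: a 32-bit compressed pointer stores an offset in units of the object alignment, which is 8 bytes by default in HotSpot (-XX:ObjectAlignmentInBytes). A short Python check of the numbers:

```python
"""Address range of raw 32-bit pointers vs. compressed oops (worked arithmetic)."""
GIB = 2 ** 30
OBJECT_ALIGNMENT = 8  # HotSpot default object alignment, in bytes


def addressable_bytes(pointer_bits: int, unit_bytes: int) -> int:
    """Bytes reachable when each pointer value counts `unit_bytes`-sized units."""
    return (2 ** pointer_bits) * unit_bytes


raw_32bit = addressable_bytes(32, 1)                  # byte addresses
compressed = addressable_bytes(32, OBJECT_ALIGNMENT)  # 8-byte object offsets

if __name__ == "__main__":
    print(f"raw 32-bit pointers: {raw_32bit // GIB} GiB")   # 4 GiB
    print(f"compressed oops:     {compressed // GIB} GiB")  # 32 GiB
```

In practice the usable limit sits somewhat below 32 GB, partly because the JVM needs some of that address range for itself, which is why the heap is kept under 32 GB and about 26 GB is suggested for zero-based compressed oops.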

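Heap memory tuning advice (keeping the heap out of swap):
Option 1: the best approach is to disable swap on the system entirely.
Option 2: control how aggressively the OS tries to swap memory out.
Option 3: mlockall lets the JVM lock its memory and prevents the OS from swapping it out (in 5.x and later the setting is bootstrap.memory_lock: true).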

The ES heap has a "magic" ceiling of about 26 GB: at or below it, zero-based compressed oops are enabled on most systems, which gives the best performance.
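The sizing rules in this section (half of physical memory, capped at about 26 GB, with -Xms equal to -Xmx) can be combined into one helper. A minimal Python sketch; the function names are illustrative, and working in MB is an assumption made so that small virtual machines are handled too.

```python
"""Recommend an ES heap size from physical memory (sketch of the rules above)."""
CEILING_MB = 26 * 1024  # ~26 GB: zero-based compressed oops on most systems


def recommended_heap_mb(physical_mb: int) -> int:
    """Half of physical memory, never above the ~26 GB ceiling."""
    if physical_mb <= 0:
        raise ValueError("physical memory must be positive")
    return min(physical_mb // 2, CEILING_MB)


def jvm_heap_flags(heap_mb: int) -> list:
    """-Xms and -Xmx set to the same value, as in jvm.options."""
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    for ram_gb in (1, 4, 64, 128):
        heap = recommended_heap_mb(ram_gb * 1024)
        print(f"{ram_gb:>4} GB RAM -> {' '.join(jvm_heap_flags(heap))}")
```

For a 1 GB virtual machine this gives 512 MB, the same value used in the FAQ fix.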

* FAQ

Starting elasticsearch: Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000085330000, 2060255232, 0) failed; error='Cannot allocate memory' (errno=12)
        #
        # There is insufficient memory for the Java Runtime Environment to continue.
        # Native memory allocation (mmap) failed to map 2060255232 bytes for committing reserved memory.
        # An error report file with more information is saved as:
        # /tmp/hs_err_pid2616.log

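The JVM tried to reserve about 2 GB for the heap (the 2060255232 bytes in the error), more than the virtual machine had available.
Fix: vim /etc/elasticsearch/jvm.options (the yum/rpm install path) and lower the heap: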
-Xms512m
-Xmx512m
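Editing the heap by hand is easy to get half-right (changing -Xms but not -Xmx). A Python sketch that rewrites both lines in a jvm.options file; it assumes the flags sit on their own uncommented lines, as in the stock file, and the script itself is a hypothetical helper, not part of ES.

```python
"""Set -Xms and -Xmx in a jvm.options file to the same value (sketch)."""
import re
import sys
from pathlib import Path

# Uncommented heap lines only: "-Xms2g", "-Xmx2g" (comment lines start with "#").
HEAP_LINE = re.compile(r"^-Xm([sx])\S*$", re.MULTILINE)
SIZE = re.compile(r"\d+[kKmMgG]")


def set_heap(jvm_options: str, size: str) -> str:
    """Return the file text with both heap flags set to `size`, e.g. "512m"."""
    if not SIZE.fullmatch(size):
        raise ValueError(f"bad heap size: {size!r}")
    return HEAP_LINE.sub(lambda m: f"-Xm{m.group(1)}{size}", jvm_options)


if __name__ == "__main__":
    # e.g. python set_heap.py /etc/elasticsearch/jvm.options 512m  (as root)
    path, size = Path(sys.argv[1]), sys.argv[2]
    path.write_text(set_heap(path.read_text(), size))
```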

[2018-12-5T19:19:01,641][INFO ][o.e.b.BootstrapChecks    ] [elk-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-12-5T19:19:01,658][ERROR][o.e.b.Bootstrap          ] [elk-1] node validation exception
        [1] bootstrap checks failed
        [1]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

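Fix: vim /etc/elasticsearch/elasticsearch.yml and add the line below. This check usually fails on kernels without seccomp support (for example CentOS 6); the setting skips installing the system call filters, which are a security hardening feature, so use it only where they cannot be installed.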
bootstrap.system_call_filter: false
