Previously we introduced and tried out the es 6.5 series on a single node; now let's build an es 6.5 cluster.
Environment: three nodes: master (172.16.23.128), node1 (172.16.23.129), node2 (172.16.23.130). First, check the status of the elasticsearch service:
[root@master ~]# ansible all_nodes -m shell -a "systemctl status elasticsearch"|grep -i running Active: active (running) since 六 2018-12-29 12:06:55 CST; 3h 33min ago Active: active (running) since 六 2018-12-29 12:07:43 CST; 3h 32min ago Active: active (running) since 六 2018-12-29 15:38:47 CST; 1min 42s ago
Check the elasticsearch configuration file on each node:
[root@master ~]# ansible all_nodes -m shell -a 'cat /etc/elasticsearch/elasticsearch.yml|egrep -v "^$|^#"'
172.16.23.128 | CHANGED | rc=0 >>
cluster.name: estest
node.name: esnode2
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.23.128", "172.16.23.131"]
172.16.23.130 | CHANGED | rc=0 >>
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
172.16.23.129 | CHANGED | rc=0 >>
cluster.name: es
node.name: node1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
The three configs above are inconsistent, so now configure the cluster based on discovery.zen, following https://www.elastic.co/guide/en/elasticsearch/reference/6.5/modules-discovery-zen.html. The resulting configuration is as follows:
[root@master ~]# ansible all_nodes -m shell -a 'cat /etc/elasticsearch/elasticsearch.yml|egrep -v "^$|^#"'
172.16.23.128 | CHANGED | rc=0 >>
cluster.name: estest
node.name: master
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.23.128", "172.16.23.129", "172.16.23.130"]
172.16.23.130 | CHANGED | rc=0 >>
cluster.name: estest
node.name: node2
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.23.128", "172.16.23.129", "172.16.23.130"]
172.16.23.129 | CHANGED | rc=0 >>
cluster.name: estest
node.name: node1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["172.16.23.128", "172.16.23.129", "172.16.23.130"]
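One zen-discovery setting worth calling out, although it is not in the configs above: with three master-eligible nodes, the referenced docs recommend setting discovery.zen.minimum_master_nodes to a quorum, (3 / 2) + 1 = 2, to guard against split brain. A sketch of the extra line you would add to each node's elasticsearch.yml:

# quorum of master-eligible nodes: (3 / 2) + 1 = 2, prevents split brain
discovery.zen.minimum_master_nodes: 2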
Restart the elasticsearch service:
[root@master ~]# ansible all_nodes -m shell -a "systemctl restart elasticsearch" 172.16.23.130 | CHANGED | rc=0 >> 172.16.23.128 | CHANGED | rc=0 >> 172.16.23.129 | CHANGED | rc=0 >>
Then check the cluster status:
[root@master ~]# curl -X GET "localhost:9200/_cluster/health" -s|python -m json.tool { "active_primary_shards": 0, "active_shards": 0, "active_shards_percent_as_number": 100.0, "cluster_name": "estest", "delayed_unassigned_shards": 0, "initializing_shards": 0, "number_of_data_nodes": 3, "number_of_in_flight_fetch": 0, "number_of_nodes": 3, "number_of_pending_tasks": 0, "relocating_shards": 0, "status": "green", "task_max_waiting_in_queue_millis": 0, "timed_out": false, "unassigned_shards": 0 }
Check the nodes in the cluster:
[root@master ~]# curl -X GET "localhost:9200/_cat/nodes?v" ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name 172.16.23.128 28 71 3 0.04 0.11 0.08 mdi * master 172.16.23.130 29 67 4 0.04 0.11 0.10 mdi - node2 172.16.23.129 28 58 4 0.12 0.20 0.13 mdi - node1
Look at just the master node:
[root@master ~]# curl -X GET "localhost:9200/_cat/master?v" id host ip node hVY-U_ocQueMtcryoGGbTg 172.16.23.128 172.16.23.128 master
Check cluster health:
[root@master ~]# curl -X GET "localhost:9200/_cat/health?v" epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1546070536 08:02:16 estest green 3 3 0 0 0 0 0 0 - 100.0%
Check node attributes:
[root@master ~]# curl -X GET "localhost:9200/_cat/nodeattrs?v" node host ip attr value master 172.16.23.128 172.16.23.128 ml.machine_memory 3956293632 master 172.16.23.128 172.16.23.128 xpack.installed true master 172.16.23.128 172.16.23.128 ml.max_open_jobs 20 master 172.16.23.128 172.16.23.128 ml.enabled true node2 172.16.23.130 172.16.23.130 ml.machine_memory 3956293632 node2 172.16.23.130 172.16.23.130 ml.max_open_jobs 20 node2 172.16.23.130 172.16.23.130 xpack.installed true node2 172.16.23.130 172.16.23.130 ml.enabled true node1 172.16.23.129 172.16.23.129 ml.machine_memory 3956293632 node1 172.16.23.129 172.16.23.129 ml.max_open_jobs 20 node1 172.16.23.129 172.16.23.129 xpack.installed true node1 172.16.23.129 172.16.23.129 ml.enabled true
Now manually create an index named test:
# curl -X PUT "localhost:9200/test"
Then check the index from each node:
[root@master ~]# ansible all_nodes -m shell -a 'curl -X GET "localhost:9200/_cat/indices?v" -s'
 [WARNING]: Consider using the get_url or uri module rather than running curl. If you need to use command because get_url or uri is insufficient you can add warn=False to this command task or set command_warnings=False in ansible.cfg to get rid of this message.
172.16.23.128 | CHANGED | rc=0 >>
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test  l0Js1PJLTPSFEdXhanVSHA   5   1          0            0      1.7kb          1.1kb
172.16.23.130 | CHANGED | rc=0 >>
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test  l0Js1PJLTPSFEdXhanVSHA   5   1          0            0      1.7kb          1.1kb
172.16.23.129 | CHANGED | rc=0 >>
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test  l0Js1PJLTPSFEdXhanVSHA   5   1          0            0      1.7kb          1.1kb
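As the warning suggests, the same check can be done with ansible's uri module instead of shelling out to curl; a minimal sketch:

# query each node's _cat/indices via the uri module rather than curl
ansible all_nodes -m uri -a "url=http://localhost:9200/_cat/indices?v return_content=yes"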
Check the index's shard allocation:
[root@master ~]# curl -X GET "localhost:9200/_cat/shards?v" index shard prirep state docs store ip node test 3 p STARTED 0 230b 172.16.23.128 master test 3 r STARTED 0 230b 172.16.23.130 node2 test 2 r STARTED 0 230b 172.16.23.129 node1 test 2 p STARTED 0 230b 172.16.23.130 node2 test 1 p STARTED 0 230b 172.16.23.129 node1 test 1 r UNASSIGNED test 4 p STARTED 0 230b 172.16.23.129 node1 test 4 r UNASSIGNED test 0 p STARTED 0 230b 172.16.23.128 master test 0 r STARTED 0 230b 172.16.23.130 node2
From the above we can see that two shards are in the UNASSIGNED state. Check cluster health:
[root@master ~]# curl -X GET "localhost:9200/_cat/health?v" epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1546071645 08:20:45 estest yellow 3 3 8 5 0 0 2 0 - 80.0%
Use the following command to locate the problem shards and the reason they are unassigned:
[root@master ~]# curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason -s | grep UNASSIGNED
test 1 r UNASSIGNED INDEX_CREATED
test 4 r UNASSIGNED INDEX_CREATED
Get more detail on a problem shard:
[root@master ~]# curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
  "index" : "test",
  "shard" : 1,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2018-12-29T08:14:47.378Z",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "hVY-U_ocQueMtcryoGGbTg",
      "node_name" : "master",
      "transport_address" : "172.16.23.128:9300",
      "node_attributes" : {
        "ml.machine_memory" : "3956293632",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "node_version",
          "decision" : "NO",
          "explanation" : "cannot allocate replica shard to a node with version [6.5.2] since this is older than the primary version [6.5.4]"
        }
      ]
    },
    {
      "node_id" : "q95yZ4W4Tj6PaXyzLZZYDQ",
      "node_name" : "node1",
      "transport_address" : "172.16.23.129:9300",
      "node_attributes" : {
        "ml.machine_memory" : "3956293632",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[test][1], node[q95yZ4W4Tj6PaXyzLZZYDQ], [P], s[STARTED], a[id=j7V8PBUvQnOZzISPAxK9Uw]]"
        }
      ]
    },
    {
      "node_id" : "_ADSWG04TEqNfX_88ejtzQ",
      "node_name" : "node2",
      "transport_address" : "172.16.23.130:9300",
      "node_attributes" : {
        "ml.machine_memory" : "3956293632",
        "ml.max_open_jobs" : "20",
        "xpack.installed" : "true",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "weight_ranking" : 3,
      "deciders" : [
        {
          "decider" : "node_version",
          "decision" : "NO",
          "explanation" : "cannot allocate replica shard to a node with version [6.5.2] since this is older than the primary version [6.5.4]"
        }
      ]
    }
  ]
}
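Note that without a request body, _cluster/allocation/explain reports on the first unassigned shard it finds; to ask about a specific shard (say shard 4's replica here) you can pass one, for example:

curl -XGET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "test",
  "shard": 4,
  "primary": false
}'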
From the above results we can see that node1's es version differs from the version on master and node2, and a replica cannot be allocated to a node running an older version than its primary:
[root@master ~]# ansible all_nodes -m shell -a 'rpm -qa|grep elasticsearch'
 [WARNING]: Consider using the yum, dnf or zypper module rather than running rpm. If you need to use command because yum, dnf or zypper is insufficient you can add warn=False to this command task or set command_warnings=False in ansible.cfg to get rid of this message.
172.16.23.128 | CHANGED | rc=0 >>
elasticsearch-6.5.2-1.noarch
172.16.23.130 | CHANGED | rc=0 >>
elasticsearch-6.5.2-1.noarch
172.16.23.129 | CHANGED | rc=0 >>
elasticsearch-6.5.4-1.noarch
After replacing the mismatched package so that all three nodes run the same elasticsearch version, start the es service again, then watch the cluster and shard status:
[root@master ~]# curl -X GET "localhost:9200/_cat/health?v" epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1546073143 08:45:43 estest red 1 1 2 2 0 0 8 0 - 20.0% [root@master ~]# curl -X GET "localhost:9200/_cat/health?v" epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1546073274 08:47:54 estest green 3 3 10 5 0 0 0 0 - 100.0%
[root@master ~]# curl -X GET "localhost:9200/_cat/shards?v" index shard prirep state docs store ip node test 3 p STARTED 0 261b 172.16.23.128 master test 3 r STARTED 0 261b 172.16.23.130 node2 test 4 r STARTED 0 261b 172.16.23.128 master test 4 p STARTED 0 261b 172.16.23.129 node1 test 2 r STARTED 0 261b 172.16.23.129 node1 test 2 p STARTED 0 261b 172.16.23.130 node2 test 1 p STARTED 0 261b 172.16.23.129 node1 test 1 r STARTED 0 261b 172.16.23.130 node2 test 0 p STARTED 0 261b 172.16.23.128 master test 0 r STARTED 0 261b 172.16.23.130 node2
The test index consists of 10 shards: 5 primary shards and 5 replica shards. A replica shard is a copy of its primary shard; it provides fault tolerance and serves read requests. The number of primary shards is fixed when the index is created, while the number of replica shards can be changed at any time. The defaults are 5 primary shards and 1 replica (an example of setting these explicitly at creation time follows the outputs below):
[root@master ~]# curl -XGET localhost:9200/test?pretty
{
  "test" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1546071287243",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "l0Js1PJLTPSFEdXhanVSHA",
        "version" : {
          "created" : "6050299"
        },
        "provided_name" : "test"
      }
    }
  }
}
[root@master ~]# curl -XGET localhost:9200/_cat/indices?v
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test  l0Js1PJLTPSFEdXhanVSHA   5   1          0            0      2.5kb          1.2kb
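Since number_of_shards is fixed at creation time, you set it (and the initial replica count) in the create-index request. A minimal sketch, using a hypothetical index name test2:

# create an index with 3 primaries and 2 replicas instead of the 5/1 defaults
curl -X PUT "localhost:9200/test2" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}'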
A primary shard cannot be placed on the same node as its own replica shard (otherwise, if that node went down, both the primary and its replica would be lost and the replica would provide no fault tolerance), but it can share a node with replicas of other primary shards.
For more on allocating nodes and shard counts, see: https://blog.csdn.net/qq_38486203/article/details/80077844
Finally, let's go over some basic concepts in es:
1.cluster:
Cluster: an ES cluster consists of one or more nodes (Node), and each cluster is identified by a cluster name.
2.node:
Node: a single ES instance is a node. One machine can run multiple instances, and a cluster is made up of multiple nodes; in most cases each node runs in its own environment or virtual machine.
3.index:
Index: a collection of documents.
4.shard:
Shard: ES is a distributed search engine, and each index has one or more shards. The index's data is spread across its shards, like one bucket of water poured into N cups.
Shards enable horizontal scaling. N shards are spread as evenly as possible across the nodes (rebalancing). For example, with 2 nodes and 4 primary shards (ignoring replicas), each node holds 2 shards; if you later add 2 more nodes, each of the 4 nodes ends up holding 1 shard. This process is called relocation, and ES performs it automatically once it detects the change.
Shards are independent: a search request is executed on every shard. Each shard is also a Lucene index, so a single shard can hold at most Integer.MAX_VALUE - 128 = 2,147,483,519 docs.
5.replica:
Replica: can be understood as a backup shard, as opposed to the primary shard.
A primary shard and its replica are never placed on the same node (to avoid a single point of failure). By default an index is created with 5 primary shards and 1 replica each (i.e. 5 primary + 5 replica = 10 shards).
If you have only one node, none of the 5 replicas can be allocated (unassigned) and the cluster status turns Yellow (on a single-node box you can drop the replica count to 0, as sketched below).
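A minimal sketch of that single-node workaround, reusing the test index from above (the replica count, unlike the primary count, can be updated live):

# hypothetical single-node dev box: drop replicas so the index can go green
curl -X PUT "localhost:9200/test/_settings" -H 'Content-Type: application/json' -d'
{
  "index": { "number_of_replicas": 0 }
}'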
The three ES cluster states:
Green: all primary and replica shards are allocated and ready. Even if one machine fails (assuming one instance per machine), no data is lost, though the cluster then turns yellow.
Yellow: all primary shards are ready, but at least one primary shard (call it A) has a replica that is not ready. The cluster is in a warning state: high availability and disaster tolerance are degraded. If the machine hosting A then fails and you configured only one replica (which is still unassigned), A's data is lost (queries become incomplete) and the cluster turns Red.
Red: at least one primary shard is not ready (the direct cause being that no replica shard could be found to promote to a new primary); query results will be missing data (incomplete).
Disaster tolerance: when a primary shard is lost, one of its replica shards is promoted to become the new primary, and a new replica is then created from it; the cluster's data stays intact.
Better query performance: a replica holds the same data as its primary, so a query can be answered by either; within reason, more replicas improve read performance (at the cost of more resource usage: cpu/disk/heap). Index requests, however, are executed only on primary shards; a replica cannot serve an index request.
For a given index, the number of primary shards (number_of_shards) cannot be changed without rebuilding the index, but the number of replicas (number_of_replicas) can be adjusted at any time.
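So to change the primary shard count you create a new index with the desired number_of_shards and copy the data over; a minimal sketch with a hypothetical target index test_v2:

# 1. create the target index with the new primary shard count
curl -X PUT "localhost:9200/test_v2" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 10, "number_of_replicas": 1 }
}'
# 2. copy the documents from the old index into it
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "test" },
  "dest":   { "index": "test_v2" }
}'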