Elasticsearch: Deploying a Hot-Cold (Hot & Warm) Architecture

Date: 2020-04-18


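Background

I have recently been working on storing order data in Elasticsearch. Because the data volume is fairly large, we use a hot-cold architecture: a new index is created every month, data is written to the hot index first, and a tool automatically migrates indices older than 3 months to the cold nodes.

Elasticsearch version: 6.2.4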

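Hot-Cold Architecture

Official name: the "Hot-Warm" architecture.

In plain terms: hot nodes hold the hot data that users care about most; warm or cold nodes hold the warm or cold data that users care about less, or that has lower priority.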

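1.1 The official explanation of the hot-cold architecture

To guarantee Elasticsearch read and write performance, the official recommendation is to use SSDs. However, Elasticsearch exists to store and search massive data sets, and massive data means a great deal of storage space. Putting all of it on SSDs makes cost a serious problem, and this is one of the factors that keeps many companies and individuals from using Elasticsearch. The hot-cold separation architecture emerged to solve this problem.

The hot-cold architecture is a powerful feature that lets you split an Elasticsearch deployment into "hot" data nodes and "cold" data nodes.

Hot data nodes handle all newly ingested data and use faster storage, so that data can be ingested and retrieved quickly.
Cold nodes have higher storage density and are a cost-effective way to keep log data over a long retention period.

Combining the two types of data node lets you process incoming data efficiently and make it available for queries, while still retaining data for a long time at lower cost. This architecture is especially helpful for logging use cases, where most attention goes to recent logs (for example the last two weeks), while older logs (which still have to be kept for compliance or other reasons) can tolerate slower queries.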

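1.2 Typical use case

In one sentence: on a limited budget, isolate the real-time data and the historical data that customers care about on separate hardware, to do as much as possible about the slow response times that customers report. Business scenario:
Log data grows by 6 TB per day. Write and query rates are both high at peak times, the cluster is under heavy load, and queries against ES are often slow.

Index write and query speed in an ES cluster depends mainly on disk I/O speed. The key to separating hot and cold data is to store the hot data on SSDs, which improves query efficiency.
Using SSDs everywhere costs too much and is wasteful for cold data, so mixing ordinary SATA disks with SSDs makes full use of the hardware while still improving performance substantially.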

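How It Works

The architecture relies on Elasticsearch's shard allocation policy. More precisely:

First: at the cluster level, nodes can be tagged with a type, which is the prerequisite for dividing nodes into hot and warm.

Specifically, add the following setting to elasticsearch.yml: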

node.attr.{attribute}: {value}

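Here attribute is any user-defined tag name, and value is this node's value for that tag. For hot-cold separation, for example, you can use the following settings:

node.attr.temperature: hot     # on hot nodes
node.attr.temperature: cold    # on cold nodes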

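Second: at the index level, data can be routed to given nodes, which guarantees that data is written to the hot or cold nodes as intended.

Specifically, set one of these properties when creating a template or an index:

index.routing.allocation.include.{attribute}    # the index may be allocated to nodes whose attribute matches at least one of the listed values
index.routing.allocation.require.{attribute}    # the index must be allocated to nodes whose attribute matches all of the listed values (usually a single value is set)
index.routing.allocation.exclude.{attribute}    # the index may only be allocated to nodes whose attribute matches none of the listed values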
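The three rules above can be modelled in a few lines of Python. This is only a sketch of the matching logic as described here, not Elasticsearch's implementation: the function name `node_allowed` is made up for illustration, and wildcard values (which Elasticsearch also accepts) are deliberately left out.

```python
def node_allowed(node_attrs, index_settings):
    """Return True if a node with these attributes may hold shards of the index.

    node_attrs: e.g. {"hotwarm_type": "hot"} (the node.attr.* values of one node)
    index_settings: flat index settings, values given as comma-separated strings
    """
    prefix = "index.routing.allocation."
    for key, raw in index_settings.items():
        if not key.startswith(prefix):
            continue
        rule, _, attribute = key[len(prefix):].partition(".")
        if rule not in ("include", "require", "exclude"):
            continue
        wanted = {v.strip() for v in raw.split(",") if v.strip()}
        actual = node_attrs.get(attribute)
        if rule == "include" and actual not in wanted:
            return False  # node must match at least one listed value
        if rule == "require" and any(v != actual for v in wanted):
            return False  # node must match every listed value
        if rule == "exclude" and actual in wanted:
            return False  # node must match none of the listed values
    return True


hot_node = {"hotwarm_type": "hot"}
cold_node = {"hotwarm_type": "cold"}
settings = {"index.routing.allocation.require.hotwarm_type": "hot"}
print(node_allowed(hot_node, settings), node_allowed(cold_node, settings))
```

With a single-valued node attribute, `require` only makes sense with one value, which is why the text recommends setting just one.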

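Implementation

1.1 Cluster design:

Node name	Server type	Data stored
es-master1	4C 16G 1T SATA	Metadata
es-master2	4C 16G 1T SATA	Metadata
es-master3	4C 16G 1T SATA	Metadata
es-hot1	16C 64G 1T SSD	Hot
es-hot2	16C 64G 1T SSD	Hot
es-hot3	16C 64G 1T SSD	Hot
es-cold1	8C 32G 5T SATA	Cold
es-cold2	8C 32G 5T SATA	Cold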

2.1 Configuring the master nodes

Master1 node configuration (the other master nodes are configured similarly)

[root@es-master1 ~]# cd /etc/elasticsearch/
[root@es-master1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-master1.linuxplus.com
node.attr.rack: r6
node.master: true
node.data: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300","es-hot1.linuxplus.com:9300","es-hot2.linuxplus.com:9300","es-hot3.linuxplus.com:9300","es-stale1.linuxplus.com:9300","es-stale2.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2     # quorum for 3 master-eligible nodes (3 / 2 + 1); a value of 1 allows split brain
bootstrap.system_call_filter: false
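For the zen discovery used by Elasticsearch 6.x, discovery.zen.minimum_master_nodes is normally set to a quorum of the master-eligible nodes, that is (number of master-eligible nodes / 2) + 1 with integer division. The helper below is a small sketch of that arithmetic; the function name is an assumption chosen for this example.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# The cluster in this article has three dedicated master nodes.
print(minimum_master_nodes(3))
```

With the three master nodes used here the result is 2, so a single isolated master cannot elect itself.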

2.2 Configuring the hot nodes

Hot1 node configuration (the other hot nodes are configured similarly)

[root@es-hot1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-hot1.linuxplus.com     # note: change the name on the other nodes
node.attr.rack: r1
node.master: false
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.10.10.24           # note: change the IP on the other nodes
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2     # quorum for 3 master-eligible nodes; keep it identical on every node
bootstrap.system_call_filter: false
node.attr.hotwarm_type: hot         # marks this node as a hot data node
[root@es-hot1 elasticsearch]# /etc/init.d/elasticsearch start

2.3 Configuring the cold nodes

Cold1 node configuration (the cold nodes use the hostnames es-stale1 and es-stale2; the other cold node is configured similarly)

[root@es-stale1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-stale1.linuxplus.com   # note: change the name on the other nodes
node.attr.rack: r1
node.master: false
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.10.10.27           # note: change the IP on the other nodes
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2     # quorum for 3 master-eligible nodes; keep it identical on every node
bootstrap.system_call_filter: false
node.attr.hotwarm_type: cold        # marks this node as a cold data node
[root@es-stale1 elasticsearch]# /etc/init.d/elasticsearch start
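Once the nodes are up, it is worth checking that each one reports the expected hotwarm_type. The `_cat/nodeattrs` API lists node attributes, and `GET _cat/nodeattrs?h=node,attr,value` limits the output to three columns. The sketch below groups such output by attribute value; the sample text is hypothetical (written for this example, not captured from a real cluster), and the function name is an assumption.

```python
def group_nodes_by_attr(cat_output, attr_name):
    """Map attribute value -> sorted node names.

    cat_output: text returned by GET _cat/nodeattrs?h=node,attr,value
    """
    groups = {}
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        node, attr, value = parts
        if attr == attr_name:
            groups.setdefault(value, []).append(node)
    return {value: sorted(nodes) for value, nodes in groups.items()}


# Hypothetical output for the cluster described above.
SAMPLE = """\
es-hot1.linuxplus.com   hotwarm_type hot
es-hot1.linuxplus.com   rack         r1
es-hot2.linuxplus.com   hotwarm_type hot
es-stale1.linuxplus.com hotwarm_type cold
"""

tiers = group_nodes_by_attr(SAMPLE, "hotwarm_type")
print(tiers)
```

A node missing from both groups was most likely started without the node.attr.hotwarm_type line, in which case an index that requires "hot" or "cold" can never be placed on it.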

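3.1 Writing data

Option 1: assign the hot or cold data nodes through a template

# New order_* indices default to the hot data nodes
PUT _template/order_template
{
    "index_patterns": ["order_*"],
    "settings": {
        "index.routing.allocation.require.hotwarm_type": "hot",
        "index.number_of_replicas": "0"
    }
}

Note: every index whose name starts with "order_" will have its data placed on the hot nodes.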
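To see which index names a pattern such as order_* will pick up, the wildcard can be emulated locally. This is a rough sketch that treats `*` as "any run of characters", which is how the pattern is used here; the function name is made up, and it is not the matcher Elasticsearch itself uses.

```python
import re


def matches_pattern(index_name, pattern):
    """Return True if index_name matches a pattern where '*' means any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, index_name) is not None


for name in ("order_2019-12", "order_stpprdinf_2019-12", "orders_2019-12"):
    print(name, matches_pattern(name, "order_*"))
```

The last name is a reminder that the underscore is part of the prefix: an index called orders_2019-12 would not receive the template and would therefore not be pinned to the hot nodes.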

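Option 2: assign the hot or cold data nodes on the index itself

# Place this index on the hot data nodes
PUT /order_2019-12
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "hot",
    "number_of_replicas": 0
  }
}

Figure: result on the hot nodes after creating two indices, each with 3 shards and 1 replica.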

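4.1 Migrating data to the cold nodes

Option 1: manually change the index routing to cold

Once ES sees the new tag value, it automatically relocates the index to the cold data nodes.

# Run in Kibana; this moves the index to the cold data nodes:

PUT /order_stpprdinf_2019-12/_settings
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "cold"
  }
}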
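The same settings update can be issued from a program. Below is a minimal sketch using only the Python standard library; it builds the PUT request without sending it, so it can be inspected first. The base URL http://localhost:9200 and the function name are assumptions for illustration.

```python
import json
import urllib.request


def build_move_request(base_url, index, tier="cold"):
    """Build (but do not send) a PUT <index>/_settings request that retags the index."""
    body = {
        "settings": {
            "index.routing.allocation.require.hotwarm_type": tier,
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed address; replace with a node of your own cluster.
request = build_move_request("http://localhost:9200", "order_stpprdinf_2019-12")
print(request.get_method(), request.full_url)
# To apply it against a live cluster: urllib.request.urlopen(request)
```

Passing tier="hot" builds the reverse request, which is handy if an index turns out to be queried more often than expected after it has been moved.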

Option 2: migrate data on a schedule with a shell script

#!/bin/bash
# Move hot data to the cold nodes (hot data is kept for 7 days)

Time=$(date -d "1 week ago" +"%Y.%m.%d")
Hostname=$(hostname)
arr=("order_stpprdinf" "order_stppayinf")
for var in "${arr[@]}"
do
    # Braces are required: "$var_$Time" would expand an unset variable named "var_"
    curl -H "Content-Type: application/json" -XPUT "http://${Hostname}:9200/${var}_${Time}/_settings?pretty" -d '
    {
       "settings": {
             "index.routing.allocation.require.hotwarm_type": "cold"
        }
    }'
done
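Before scheduling the script, it helps to preview which index names it will target on a given day. The sketch below reproduces the naming logic only (prefix, underscore, date of one week ago in %Y.%m.%d form) and makes no request; the function name is an assumption for illustration.

```python
from datetime import date, timedelta

PREFIXES = ("order_stpprdinf", "order_stppayinf")


def indices_to_move(today, days=7, prefixes=PREFIXES):
    """Return the index names whose date suffix is `days` before `today`."""
    suffix = (today - timedelta(days=days)).strftime("%Y.%m.%d")
    return [f"{prefix}_{suffix}" for prefix in prefixes]


print(indices_to_move(date(2020, 4, 18)))
```

Note that this scheme assumes daily indices with a dotted date suffix, so an index is retagged only on the single day that is exactly seven days after its date. If a run is missed, that index stays on the hot nodes until it is moved by hand.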

Option 3: migrate data on a schedule with Curator

Step 1: create config.yml and fill in the connection details of the Elasticsearch cluster.

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts: ["10.0.101.100", "10.0.101.101", "10.0.101.102"]
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /opt/elasticsearch-curator/logs/run.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Step 2: create action.yml

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
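actions:
  1:
    action: allocation                 # the action type is allocation (shard allocation routing)
    description: >-
      Apply shard allocation routing to 'require' 'hotwarm_type=cold' for hot/cold node
      setup for order_ indices older than 3 months, based on the date in the index name.
    options:
      key: hotwarm_type                # the attribute defined on the ES nodes
      value: cold                      # the value to set, which moves the index to the cold nodes
      allocation_type: require         # the allocation rule type
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix                     # match indices whose names start with "order_"; regex matching and others are also supported, see the official documentation
      value: order_
    - filtertype: age                  # match by age
      source: name                     # derive the age from the index name; other sources such as a field are also possible, see the official documentation
      direction: older
      timestring: "%Y-%m"              # used to match and extract the timestamp in the index or snapshot name
      unit: months                     # months here; days, weeks and so on are also available; the total age is unit * unit_count
      unit_count: 3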
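To get a feel for which indices the two filters select on a given day, here is a rough model of them. It is a sketch only, not Curator's code: it assumes a month is counted as a fixed 30 days (my understanding of how Curator treats the months unit), finds the %Y-%m timestamp with a regular expression, and uses a made-up function name.

```python
import re
from datetime import datetime, timedelta

DATE_IN_NAME = re.compile(r"\d{4}-\d{2}")  # corresponds to timestring "%Y-%m"


def selected_for_cold(index_name, now, prefix="order_", unit_count=3):
    """Return True if the index passes the prefix filter and the age filter."""
    if not index_name.startswith(prefix):
        return False
    match = DATE_IN_NAME.search(index_name)
    if match is None:
        return False
    try:
        index_time = datetime.strptime(match.group(), "%Y-%m")
    except ValueError:
        return False  # e.g. month 13
    cutoff = now - timedelta(days=30 * unit_count)  # assumption: 1 month = 30 days
    return index_time < cutoff


now = datetime(2020, 4, 18)
for name in ("order_2019-12", "order_2020-01", "order_2020-02", "logs_2019-01"):
    print(name, selected_for_cold(name, now))
```

Because the timestamp parsed from a monthly index name is the first day of that month, an index can qualify while its newest documents are still well under three months old. If that matters, a larger unit_count is the simplest adjustment.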

Step 3: run Curator

Single run:

cd /opt/elasticsearch-curator
curator --config config.yml action.yml

Run as a cron job:

crontab -e
# Add the following entry; it runs once a day at 00:00
0 0 */1 * * curator --config /opt/elasticsearch-curator/config.yml /opt/elasticsearch-curator/action.yml

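Figure: result after migration to the cold nodes.

Application

Because the data is split into multiple indices by time, a query can span several indices. Scoring, sorting, and paging work the same way as when searching a single index.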

    /**
     * Query.
     *
     * @param indexName    index names
     * @param type         index type
     * @param conditionMap map of query conditions
     * @param orderByMap   map of sort fields
     * @param page         paging information
     * @return query results
     */
    @Override
    public List<Map<String, Object>> query(final String[] indexName, final String type,
                                           final Map<String, Object> conditionMap, final Map<String, String> orderByMap,
                                           final Page page) {
        logger.info("Querying Elasticsearch data...");
        logger.info("indexName={}", Arrays.toString(indexName));
        logger.info("conditionMap={}", conditionMap.toString());
        logger.info("orderByMap={}", orderByMap.toString());

        final long currentTimeMillis = System.currentTimeMillis();
        RestHighLevelClient client = null;
        List<Map<String, Object>> resultList = new ArrayList<>();
        try {
            // 1. Create the connection
            client = createConnect();

            // 2. Build the search request
            SearchRequest searchRequest = new SearchRequest(indexName);
            searchRequest.types(type);


            // ... several hundred lines of code omitted here ...

}
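With one index per month, the caller has to work out which indices cover the requested time range before passing them in as indexName. The sketch below lists the monthly index names between two dates, using the order_YYYY-MM naming from the index creation example earlier; the function name is an assumption for illustration.

```python
from datetime import date


def monthly_indices(start, end, prefix="order_"):
    """Return the monthly index names covering start..end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


names = monthly_indices(date(2019, 11, 15), date(2020, 2, 1))
print(names)
print(",".join(names))  # comma-separated form usable in a search URL path
```

Listing the exact indices keeps a query on recent data away from the cold nodes entirely, whereas a blanket order_* pattern would touch every index, including those already migrated to SATA disks.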