Elasticsearch Cross-Cluster Data Backup and Migration

Date: 2021-04-24


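(1) Overview
Elasticsearch (ES) clusters may be deployed in different environments, in either public or private clouds, and you can choose a migration approach that fits your business needs. If the service can be stopped, or writes can be paused, offline migration is an option. There are four offline migration methods to choose from: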

Elasticsearch-dump
snapshot
reindex
logstash(golang)
(2) Usage
1. Installing and using elasticsearch-dump

1.1 Installing Node.js (required by elasticdump)
#### Download Node.js
wget https://nodejs.org/dist/v10.13.0/node-v10.13.0-linux-x64.tar.xz
#### Extract
xz -d node-v10.13.0-linux-x64.tar.xz
tar xvf node-v10.13.0-linux-x64.tar -C /opt
#### Configure the environment variables (add the export lines to /etc/profile) and apply them
vim /etc/profile
export NODE_HOME=/opt/node-v10.13.0-linux-x64
export PATH=$PATH:$NODE_HOME/bin
export NODE_PATH=$NODE_HOME/lib/node_modules
source /etc/profile
npm -v
1.2 Installing the elasticdump tool
#### Install elasticdump globally with npm
npm install elasticdump -g
elasticdump --help

1.3 Migrating a single index
#### Migrate settings, mapping, and data, in that order
elasticdump --input=http://10.16.0.8:9200/companydatabase --output=http://172.16.0.20:9200/companydatabase --type=settings
elasticdump --input=http://10.16.0.8:9200/companydatabase --output=http://172.16.0.20:9200/companydatabase --type=mapping
elasticdump --input=http://10.16.0.8:9200/companydatabase --output=http://172.16.0.20:9200/companydatabase --type=data

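### Notes

--input: source address; may be an ES cluster URL, a file, or stdin. An index may be specified, in the format {protocol}://{host}:{port}/{index}
--input-index: the index in the source ES cluster
--output: target address; may be an ES cluster URL, a file, or stdout. An index may be specified, in the format {protocol}://{host}:{port}/{index}
--output-index: the index in the target ES cluster
--type: what to migrate; defaults to data, meaning only documents are migrated. Options include settings, analyzer, data, mapping, alias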
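
The three commands in 1.3 differ only in --type, so they can be generated per index. The following is a minimal sketch, not part of the original procedure: the function name elasticdump_commands and the SRC/DST variables are illustrative assumptions, and the script only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: print the elasticdump commands for one index in the order
# settings -> mapping -> data used above.
# SRC, DST and the function name are assumptions made for this example.

SRC="http://10.16.0.8:9200"     # source cluster (address from the example above)
DST="http://172.16.0.20:9200"   # target cluster (address from the example above)

elasticdump_commands() {
  local index="$1"
  local type
  for type in settings mapping data; do
    printf 'elasticdump --input=%s/%s --output=%s/%s --type=%s\n' \
      "$SRC" "$index" "$DST" "$index" "$type"
  done
}

# Print the commands for review; pipe the output to `bash` to run them.
elasticdump_commands companydatabase
```

Generating the commands instead of executing them directly keeps the order explicit and makes it easy to loop over several index names.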

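2. snapshot

Suitable for large data volumes.
The snapshot API is a set of Elasticsearch APIs for backing up and restoring data. It can be used to migrate data across clusters: create a snapshot on the source ES cluster, then restore it on the target ES cluster. Pay attention to ES version compatibility:
The major version of the target ES cluster (for example, 5 in 5.6.4) must be greater than or equal to the major version of the source ES cluster;
Snapshots created on a 1.x cluster cannot be restored on a 5.x cluster; in general, a snapshot cannot be restored across skipped major versions.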
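
As a quick pre-flight check, the two version rules above can be written as a small shell function. This is only a sketch under the assumption that versions are given as dotted strings such as 5.6.4; the function name can_restore is hypothetical, and it encodes nothing beyond the two rules listed here, so the official compatibility matrix remains the authority.

```shell
#!/usr/bin/env bash
# Sketch: return 0 if a snapshot taken on version $1 may be restored on
# version $2 according to the two rules stated above, 1 otherwise.
# can_restore is a hypothetical helper, not an Elasticsearch tool.

can_restore() {
  local src_major="${1%%.*}"   # text before the first dot, e.g. 5 from 5.6.4
  local dst_major="${2%%.*}"

  # Rule 1: the target major version must be >= the source major version.
  if [ "$dst_major" -lt "$src_major" ]; then
    return 1
  fi
  # Rule 2: snapshots from 1.x cannot be restored on 5.x.
  if [ "$src_major" -eq 1 ] && [ "$dst_major" -eq 5 ]; then
    return 1
  fi
  return 0
}

# Example: if can_restore 5.6.4 6.8.0; then echo "restore allowed"; fi
```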
The steps are as follows:

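1. Create a repository in the source ES cluster
A repository must be created before taking a snapshot, and one repository can hold multiple snapshots. The main repository types are:
fs: shared file system; snapshot files are stored on the file system
url: URL path of a file system (read-only); supported protocols: http, https, ftp, file, jar
s3: AWS S3 object storage; snapshots are stored in S3; supported via a plugin
hdfs: snapshots are stored in HDFS; supported via a plugin
cos: snapshots are stored in Tencent Cloud COS object storage; supported via a plugin
To migrate from a self-hosted ES cluster to a Tencent Cloud ES cluster, an fs repository can be used directly. Note that the repository path must be set in the Elasticsearch configuration file elasticsearch.yml:
path.repo: ["/data/es/backup"]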

1. Edit the configuration file
vim elasticsearch.yml
path.repo: ["/data/es/backup"]
2. Register the snapshot repository with ES
PUT /_snapshot/es_backup
{
  "type": "fs",
  "settings": {
    "location": "/data/es/backup/"
  }
}
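
Outside the Kibana console, the same registration can be done with curl. The sketch below assumes the source cluster is reachable at the address used earlier (overridable through an ES_URL variable, which is an assumption of this example); the helper names fs_repo_body and register_fs_repo are hypothetical. After registering, it calls the repository verify endpoint so that a path that is not writable shows up immediately.

```shell
#!/usr/bin/env bash
# Sketch: register an fs repository with curl and verify it.
# ES_URL and both function names are assumptions made for this example.

ES_URL="${ES_URL:-http://10.16.0.8:9200}"

# Build the JSON body for an fs repository stored at the given path.
fs_repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}' "$1"
}

# Register the repository, then ask the cluster to verify it.
register_fs_repo() {
  local name="$1" location="$2"
  curl -sS -X PUT "$ES_URL/_snapshot/$name" \
    -H 'Content-Type: application/json' \
    -d "$(fs_repo_body "$location")"
  echo
  curl -sS -X POST "$ES_URL/_snapshot/$name/_verify"
  echo
}

# Usage (needs a reachable cluster and path.repo configured as in step 1):
#   register_fs_repo es_backup /data/es/backup/
```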
3. Create an index and add a document
DELETE test
PUT test/_doc/1
{
  "key": "value1",
  "name": "lqbyz",
  "age":30
}
4. Create snapshots
# Create a snapshot of all indices
PUT _snapshot/es_backup/snapshot4?wait_for_completion=true
# Create a snapshot of specific indices
PUT /_snapshot/es_backup/snapshot_3?wait_for_completion=true
{
  "indices": "test",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "lqbyz",
    "taken_because": "backup before delete"
  }
}
5. View the snapshots
GET _snapshot/es_backup/_all
GET _snapshot/es_backup/
GET _cat/indices
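6. Delete a snapshot
# A snapshot is addressed as <repository>/<snapshot>; this removes the all-indices snapshot created in step 4
DELETE _snapshot/es_backup/snapshot4
GET test/_search
7. Restore a snapshot
# The restore fails if an open index with the same name already exists; delete or close that index first
POST _snapshot/es_backup/snapshot_3/_restore
{}
# Restore specific indices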
POST /_snapshot/es_backup/snapshot1/_restore
{
  "indices": "elk-info-test-2020-06-26",
  "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}
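
When the target cluster already has an open index with the same name, one option is to restore under a different name using the rename_pattern and rename_replacement parameters of the restore request. The following sketch only builds the request body and wraps the call in a function; ES_URL, the function names, and the restored_ prefix in the usage line are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: restore selected indices under a new name prefix.
# ES_URL, restore_body and restore_renamed are assumptions of this example.

ES_URL="${ES_URL:-http://172.16.0.20:9200}"

# Build a restore body that renames every restored index to <prefix><name>.
# The literal $1 in the JSON is the regex capture group, not a shell variable.
restore_body() {
  local indices="$1" prefix="$2"
  printf '{"indices":"%s","rename_pattern":"(.+)","rename_replacement":"%s$1"}' \
    "$indices" "$prefix"
}

# Send the restore request for one snapshot in one repository.
restore_renamed() {
  local repo="$1" snapshot="$2" indices="$3" prefix="$4"
  curl -sS -X POST "$ES_URL/_snapshot/$repo/$snapshot/_restore" \
    -H 'Content-Type: application/json' \
    -d "$(restore_body "$indices" "$prefix")"
  echo
}

# Usage (needs a reachable cluster):
#   restore_renamed es_backup snapshot_3 test restored_
```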
8. Delete the index and the snapshot
DELETE test
DELETE _snapshot/es_backup/snapshot_3

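##### Reference: related operations
##### Delete the es_backup repository
DELETE _snapshot/es_backup
### View repository information
GET _snapshot/_all
#### View the status of a snapshot (the path is <repository>/<snapshot>/_status)
GET _snapshot/es_backup/snapshot4/_status
### Create a snapshot (including all indices)
PUT _snapshot/es_backup/snapshot4?wait_for_completion=true
### Create a snapshot that contains only the test2 index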
PUT _snapshot/es_backup/snapshot2
{
  "indices": "test2"
}
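### View the snapshot just created
GET /_snapshot/es_backup/snapshot2
### View all snapshots
GET _snapshot/es_backup/_all
### Delete a snapshot
DELETE /_snapshot/es_backup/snapshot2
### Delete a repository
DELETE /_snapshot/es_backup
To stop a running snapshot, delete that snapshot; to cancel a running restore, delete the indices being restored.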

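3. reindex

reindex is an API provided by Elasticsearch that imports data from a source ES cluster into the current ES cluster, which also achieves data migration. Because of how Tencent Cloud ES is implemented, its current version does not support the reindex operation. The following briefly describes how to use the reindex API.

1. Configure the reindex.remote.whitelist parameter

This parameter must be configured on the target ES cluster; it lists the remote clusters (as host:port entries) that reindex is allowed to read from.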
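
A minimal sketch of setting this parameter, assuming it is added to elasticsearch.yml on each node of the target cluster and that the nodes are restarted afterwards (it is a static setting). The helper name add_reindex_whitelist and the config path in the usage line are assumptions; the entry uses host:port without a protocol prefix.

```shell
#!/usr/bin/env bash
# Sketch: append reindex.remote.whitelist to a config file unless a
# whitelist line is already present.
# add_reindex_whitelist is a hypothetical helper name.

add_reindex_whitelist() {
  local config_file="$1" remote="$2"
  if grep -q '^reindex\.remote\.whitelist:' "$config_file" 2>/dev/null; then
    echo "reindex.remote.whitelist already set in $config_file" >&2
    return 0
  fi
  printf 'reindex.remote.whitelist: "%s"\n' "$remote" >> "$config_file"
}

# Usage (the path is an assumption; adjust to your installation):
#   add_reindex_whitelist /etc/elasticsearch/elasticsearch.yml 10.16.0.8:9200
```

The guard keeps the function safe to run repeatedly; if a whitelist already exists it has to be edited by hand so that existing entries are not lost.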

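2. Call the reindex API
The following request reads the index test1 from the source ES cluster, selects the documents whose title field matches elasticsearch, and writes the results to the test2 index in the current cluster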
 POST _reindex
 {
     "source": {
         "remote": {
             "host": "http://10.16.0.8:9200"
         },
         "index": "test1",
         "query": {
             "match": {
                 "title": "elasticsearch"
             }
         }
     },
     "dest": {
         "index": "test2"
     }
 }
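
For large indices the request above can take a long time, so it may be preferable to start it in the background with wait_for_completion=false and poll the returned task. The sketch below assumes the target cluster address is supplied through ES_URL; the function names are hypothetical, and the query filter from the example above is left out to keep the body short.

```shell
#!/usr/bin/env bash
# Sketch: start a remote reindex as a background task with curl.
# ES_URL, reindex_body and start_reindex are assumptions of this example.

ES_URL="${ES_URL:-http://172.16.0.20:9200}"   # target cluster, where _reindex runs

# Build a reindex body that copies a whole index from a remote cluster.
reindex_body() {
  local remote_host="$1" src_index="$2" dst_index="$3"
  printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}' \
    "$remote_host" "$src_index" "$dst_index"
}

# Start the reindex without waiting; the response contains a task id.
start_reindex() {
  curl -sS -X POST "$ES_URL/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$@")"
  echo
}

# Usage (needs a reachable cluster and the whitelist from step 1):
#   start_reindex http://10.16.0.8:9200 test1 test2
#   curl -sS "$ES_URL/_tasks/<task_id>"    # check progress with the returned id
```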

4. logstash (filebeat, golang)
logstash can read data from one ES cluster and write it to another, so it can be used for data migration. The configuration file is as follows:

input {
    elasticsearch {
        hosts => ["http://10.16.0.8:9200"]
        index => "*"
        docinfo => true
    }
}
output {
    elasticsearch {
        hosts => ["http://10.16.0.9:9200"]
        index => "%{[@metadata][_index]}"
    }
}
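
After a logstash run with a configuration like the one above, one way to sanity-check the result is to compare document counts per index on both clusters using the _count API. This is a sketch, not part of the original procedure: the function names and the SRC/DST variables are assumptions, and matching counts only indicate that nothing was dropped, not that the documents are identical.

```shell
#!/usr/bin/env bash
# Sketch: compare the document count of an index on two clusters.
# SRC, DST and all function names are assumptions of this example.

SRC="${SRC:-http://10.16.0.8:9200}"
DST="${DST:-http://10.16.0.9:9200}"

# Extract the numeric "count" field from a _count response read on stdin.
parse_count() {
  sed -n 's/.*"count"[ ]*:[ ]*\([0-9][0-9]*\).*/\1/p'
}

# Ask one cluster for the document count of one index.
doc_count() {
  local es_url="$1" index="$2"
  curl -sS "$es_url/$index/_count" | parse_count
}

# Report whether source and target hold the same number of documents.
verify_index() {
  local index="$1" src dst
  src="$(doc_count "$SRC" "$index")"
  dst="$(doc_count "$DST" "$index")"
  if [ -n "$src" ] && [ "$src" = "$dst" ]; then
    echo "OK   $index: $src documents"
  else
    echo "DIFF $index: source=$src target=$dst"
    return 1
  fi
}

# Usage (needs both clusters to be reachable):
#   verify_index test
```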

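Summary:
1. When elasticsearch-dump or logstash is used for cross-cluster migration, the machine running the migration task must be able to reach both clusters at the same time; without network connectivity the migration cannot be carried out. The snapshot approach has no such restriction because it is entirely offline. elasticsearch-dump and logstash are therefore better suited to cases where the source and target ES clusters are on the same network. For migration across cloud vendors, for example from an Alibaba Cloud ES cluster to a Tencent Cloud ES cluster, the snapshot approach is a good choice. Connecting the two networks so that the clusters can reach each other is also possible, but the cost is higher.
2. elasticsearch-dump is similar to mysqldump, the backup tool for MySQL databases: both perform logical backups, exporting records one by one and then importing them, so elasticsearch-dump is suited to migrating small data volumes.
3. The snapshot approach is suited to migrating large data volumes and is the recommended method.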