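etcd is a distributed key-value database that Kubernetes uses to store cluster metadata. This article describes how to back up and restore the data of an etcd cluster, step by step.
Run the backup script from crontab every half hour, and keep both local and offsite copies (tentatively: the 10 most recent backups locally, and one month of backups offsite).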
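#!/bin/bash
#
ETCDCTL_PATH='/usr/etcd/bin/etcdctl'
ENDPOINTS='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379'
BACKUP_DIR='/home/apps/backup'
DATE=`date +%Y%m%d-%H%M%S`
# Create the backup directory if it does not exist yet
[ ! -d $BACKUP_DIR ] && mkdir -p $BACKUP_DIR
# Save a timestamped snapshot (some etcdctl releases accept only a single endpoint here)
export ETCDCTL_API=3;$ETCDCTL_PATH --endpoints=$ENDPOINTS snapshot save $BACKUP_DIR/snapshot-$DATE\.db
# Keep the 10 newest files; line 1 of "ls -lt" is the "total" line, hence NR>11
cd $BACKUP_DIR;ls -lt $BACKUP_DIR|awk '{if(NR>11){print "rm -rf "$9}}'|sh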
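The retention step in the script above depends on the column layout of `ls -lt`. As a rough alternative sketch, the pruning can be keyed on the file names instead. The helper name `prune_snapshots` is hypothetical, and the sketch assumes every backup follows the `snapshot-YYYYmmdd-HHMMSS.db` naming used above, so that sorting by name is the same as sorting by time.

```shell
#!/bin/bash
# Hypothetical helper: keep only the newest $2 snapshot files in directory $1.
# Assumes snapshot-YYYYmmdd-HHMMSS.db names, so name order equals time order.
prune_snapshots() {
    local dir=$1 keep=$2
    # List snapshots newest first, skip the first $keep entries, delete the rest.
    ls -1 "$dir"/snapshot-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

In the backup script this could be called as `prune_snapshots "$BACKUP_DIR" 10` right after the snapshot is saved.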
Deploy the mirror cluster in advance.
nohup etcdctl make-mirror <destination> &> /apps/logs/etcdmirror.log &
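/apps/logs/etcdmirror.log records the number of keys synchronized so far; the count is printed every 30s.
Alternatively, synchronize the data periodically from crontab. If data is deleted or wiped by mistake, the mirror still holds the state from before the last sync cycle and can be used for recovery, which avoids a catastrophic loss.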
#!/bin/bash
#
ETCDCTL_PATH='/apps/svr/etcd/bin/etcdctl'
ENDPOINTS='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379'
DEST_ENDPOINT='1.1.1.1:2389'
CMD="$ETCDCTL_PATH make-mirror --endpoints=$ENDPOINTS $DEST_ENDPOINT"
BaseName=$(basename $BASH_SOURCE)
export ETCDCTL_API=3
# Start mirroring in the background and record its PID
$CMD &
echo $! > /tmp/$BaseName\.pid
# Let the mirror run for 90 seconds, then stop it until the next cron run
sleep 90
kill `cat /tmp/$BaseName\.pid`
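An etcd cluster is similar to a ZooKeeper (zk) cluster: build it from an odd number of machines. A cluster of N members tolerates floor((N-1)/2) failed members, so a cluster of N = 3 machines can lose at most 1 machine without affecting cluster use.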
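The arithmetic behind the odd-size recommendation can be checked with a short sketch. The function names `quorum` and `fault_tolerance` are illustrative only; they are not etcd commands.

```shell
# A write needs a majority (quorum) of floor(N/2)+1 members.
quorum() { echo $(( $1 / 2 + 1 )); }

# The cluster keeps working as long as a quorum survives,
# so it tolerates N - quorum failures, which equals floor((N-1)/2).
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

A 4-member cluster tolerates the same single failure as a 3-member cluster while needing a larger quorum, which is why adding an even-numbered member buys no extra availability.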
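When the leader fails, the etcd cluster automatically elects a new leader. Failure detection is timeout-based (see the heartbeat-interval and election-timeout settings), so electing a new leader takes about one election timeout.
During the leader election the cluster cannot process any writes. Write requests sent during the election are queued until a new leader is elected.
Writes that were already sent to the old leader but not yet committed may be lost, because the new leader has the power to rewrite any uncommitted entries of the old leader. From the user's perspective, some write requests may time out; however, no committed write is ever lost.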
[root@ETCD-CLUSTER-001 bin]# export ETCDCTL_API=3;./etcdctl --write-out="table" --endpoints='10.201.46.112:2379,10.201.46.113:2379,10.201.46.114:2379' endpoint status
+--------------------+------------------+---------+---------+-----------+-----------+------------+
|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------+------------------+---------+---------+-----------+-----------+------------+
| 10.201.46.112:2379 | fd86741fb271523a | 3.1.10  | 25 kB   | false     |         2 |         10 |
| 10.201.46.113:2379 | fe2dd3624258de7  | 3.1.10  | 25 kB   | false     |         2 |         10 |
| 10.201.46.114:2379 | c649fd5192da5ca1 | 3.1.10  | 25 kB   | true      |         2 |         10 |
+--------------------+------------------+---------+---------+-----------+-----------+------------+
[root@ETCD-CLUSTER-001 bin]# export ETCDCTL_API=3;./etcdctl put hello nihao
OK
[root@ETCD-CLUSTER-001 bin]# export ETCDCTL_API=3;./etcdctl get hello nihao
hello
nihao
[root@ETCD-CLUSTER-003 ~]# sh /apps/sh/etcd.sh stop
Stopping etcd: [ OK ]
[root@ETCD-CLUSTER-001 bin]# export ETCDCTL_API=3;./etcdctl get hello nihao
hello
nihao
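To recover from a catastrophic failure, etcd v3 provides snapshot and restore facilities, so that a cluster can be recreated without losing v3 key data.
Restoring a cluster requires only a single snapshot "db" file. A cluster restore with etcdctl snapshot restore creates new etcd data directories; all members should restore from the same snapshot. Restoring overwrites some snapshot metadata (specifically the member ID and cluster ID), so the member loses its former identity. This metadata overwrite prevents the new member from inadvertently joining an existing cluster. Therefore, to start a cluster from a snapshot, the restore must start a new logical cluster.
Snapshot integrity can optionally be verified at restore time. If the snapshot was taken with etcdctl snapshot save, it carries an integrity hash that etcdctl snapshot restore checks. If the snapshot was copied from the data directory (data-dir in the configuration file), it has no integrity hash and can only be restored with --skip-hash-check.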
[root@ETCD-CLUSTER-001 bin]# export ETCDCTL_API=3;./etcdctl --endpoints='10.201.46.112:2379,10.201.46.113:2379,10.201.46.114:2379' snapshot save ~/.snapshot.db
Snapshot saved at /root/.snapshot.db
Run the restore on each machine, using that machine's own name and peer URL:
$ etcdctl snapshot restore snapshot.db \
  --name ETCD-CLUSTER-001 \
  --initial-cluster ETCD-CLUSTER-001=http://10.201.46.112:2380,ETCD-CLUSTER-002=http://10.201.46.113:2380,ETCD-CLUSTER-003=http://10.201.46.114:2380 \
  --initial-advertise-peer-urls http://10.201.46.112:2380
$ etcdctl snapshot restore snapshot.db \
  --name ETCD-CLUSTER-002 \
  --initial-cluster ETCD-CLUSTER-001=http://10.201.46.112:2380,ETCD-CLUSTER-002=http://10.201.46.113:2380,ETCD-CLUSTER-003=http://10.201.46.114:2380 \
  --initial-advertise-peer-urls http://10.201.46.113:2380
$ etcdctl snapshot restore snapshot.db \
  --name ETCD-CLUSTER-003 \
  --initial-cluster ETCD-CLUSTER-001=http://10.201.46.112:2380,ETCD-CLUSTER-002=http://10.201.46.113:2380,ETCD-CLUSTER-003=http://10.201.46.114:2380 \
  --initial-advertise-peer-urls http://10.201.46.114:2380
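The three restore commands differ only in the member name and the advertised peer URL, so they can be generated from one member list. The following is a dry-run sketch: the variable `MEMBERS` and the helper `print_restore_cmds` are hypothetical names, and the script only prints the commands for review, it does not run them.

```shell
# Assumed "name=peer-url" list, matching the cluster in the commands above.
MEMBERS="ETCD-CLUSTER-001=http://10.201.46.112:2380,ETCD-CLUSTER-002=http://10.201.46.113:2380,ETCD-CLUSTER-003=http://10.201.46.114:2380"

# Hypothetical helper: print one restore command per member for snapshot $1.
print_restore_cmds() {
    snapshot=$1
    # Split the comma-separated list into one "name=url" entry per line,
    # then split each entry at the "=" into the member name and its peer URL.
    echo "$MEMBERS" | tr ',' '\n' | while IFS='=' read -r name url; do
        echo "etcdctl snapshot restore $snapshot --name $name --initial-cluster $MEMBERS --initial-advertise-peer-urls $url"
    done
}

print_restore_cmds snapshot.db
```

Each printed line is meant to be run on the machine whose name it carries, so that every member restores from the same snapshot with the same --initial-cluster value.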
sudo sh /apps/sh/etcd.sh start
[root@ETCD-CLUSTER-001 etcd-v3.2.17-linux-amd64]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --write-out="table" --endpoints='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379' member list
+------------------+---------+------------------+---------------------------+-----------------------+
|        ID        | STATUS  |       NAME       |        PEER ADDRS         |     CLIENT ADDRS      |
+------------------+---------+------------------+---------------------------+-----------------------+
| 28c987fd1ef634f8 | started | ETCD-CLUSTER-003 | http://1.1.1.3:2380       | http://localhost:2379 |
| 635b8eabdf3280ef | started | ETCD-CLUSTER-002 | http://1.1.1.2:2380       | http://localhost:2379 |
| e9a434659e36d3bc | started | ETCD-CLUSTER-001 | http://1.1.1.1:2380       | http://localhost:2379 |
+------------------+---------+------------------+---------------------------+-----------------------+
[root@ETCD-CLUSTER-001 etcd-v3.2.17-linux-amd64]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --endpoints='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379' member remove e9a434659e36d3bc
Member e9a434659e36d3bc removed from cluster 7055108fef63cdab
[root@ETCD-CLUSTER-001 etcd-v3.2.17-linux-amd64]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --write-out="table" --endpoints='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379' member list
+------------------+---------+------------------+---------------------------+-----------------------+
|        ID        | STATUS  |       NAME       |        PEER ADDRS         |     CLIENT ADDRS      |
+------------------+---------+------------------+---------------------------+-----------------------+
| 28c987fd1ef634f8 | started | ETCD-CLUSTER-003 | http://1.1.1.3:2380       | http://localhost:2379 |
| 635b8eabdf3280ef | started | ETCD-CLUSTER-002 | http://1.1.1.2:2380       | http://localhost:2379 |
+------------------+---------+------------------+---------------------------+-----------------------+
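Mind the order of the steps: register the new member with member add first, and only then start its etcd service.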
[root@ETCD-CLUSTER-001 ]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --endpoints='1.1.1.2:2379,1.1.1.3:2379' member add ETCD-CLUSTER-001 --peer-urls=http://1.1.1.1:2380
Member 433fd69a958b8432 added to cluster 7055108fef63cdab

ETCD_NAME="ETCD-CLUSTER-001"
ETCD_INITIAL_CLUSTER="ETCD-CLUSTER-003=http://1.1.1.3:2380,ETCD-CLUSTER-001=http://1.1.1.1:2380,ETCD-CLUSTER-002=http://1.1.1.2:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
[root@ETCD-CLUSTER-001 ]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --write-out="table" --endpoints='1.1.1.2:2379,1.1.1.3:2379' member list
+------------------+-----------+------------------+---------------------------+-----------------------+
|        ID        |  STATUS   |       NAME       |        PEER ADDRS         |     CLIENT ADDRS      |
+------------------+-----------+------------------+---------------------------+-----------------------+
| 28c987fd1ef634f8 | started   | ETCD-CLUSTER-003 | http://1.1.1.3:2380       | http://localhost:2379 |
| 433fd69a958b8432 | unstarted |                  | http://1.1.1.1:2380       |                       |
| 635b8eabdf3280ef | started   | ETCD-CLUSTER-002 | http://1.1.1.2:2380       | http://localhost:2379 |
+------------------+-----------+------------------+---------------------------+-----------------------+
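Delete the old data directory of the newly added member, then start its etcd service. Before it joins the cluster, edit its configuration file and change the initial cluster state from new to existing.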
[root@ETCD-CLUSTER-001 ~]# vim /apps/conf/etcd/etcd.conf
initial-cluster-state: "existing"
[root@ETCD-CLUSTER-001 ~]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --write-out="table" --endpoints='1.1.1.2:2379,1.1.1.3:2379' member list
+------------------+---------+------------------+---------------------------+-----------------------+
|        ID        | STATUS  |       NAME       |        PEER ADDRS         |     CLIENT ADDRS      |
+------------------+---------+------------------+---------------------------+-----------------------+
| 28c987fd1ef634f8 | started | ETCD-CLUSTER-003 | http://1.1.1.3:2380       | http://localhost:2379 |
| 433fd69a958b8432 | started | ETCD-CLUSTER-001 | http://1.1.1.1:2380       | http://localhost:2379 |
| 635b8eabdf3280ef | started | ETCD-CLUSTER-002 | http://1.1.1.2:2380       | http://localhost:2379 |
+------------------+---------+------------------+---------------------------+-----------------------+
[root@ETCD-CLUSTER-001 ~]# export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl --write-out="table" --endpoints='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379' endpoint status
+--------------------+------------------+---------+---------+-----------+-----------+------------+
|      ENDPOINT      |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------+------------------+---------+---------+-----------+-----------+------------+
| 1.1.1.1:2379       | 433fd69a958b8432 | 3.1.10  | 98 kB   | false     |         3 |         13 |
| 1.1.1.2:2379       | 635b8eabdf3280ef | 3.1.10  | 98 kB   | false     |         3 |         13 |
| 1.1.1.3:2379       | 28c987fd1ef634f8 | 3.1.10  | 98 kB   | true      |         3 |         13 |
+--------------------+------------------+---------+---------+-----------+-----------+------------+
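# Before migrating (etcdctl migrate converts v2 data to v3), stop the service on each machine in turn
sh /apps/sh/etcd.sh stop
# Migrate the data; set the data directories to match the actual deployment
export ETCDCTL_API=3
etcdctl --endpoints=$ENDPOINTS migrate --data-dir="default.etcd" --wal-dir="default.etcd/member/wal"
# Start the service on each machine in turn
sh /apps/sh/etcd.sh start
# Check that the data has been migrated
export ETCDCTL_API=3;etcdctl --endpoints=$ENDPOINTS get /foo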
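--snapshot-count: the number of committed transactions that triggers a snapshot to disk. The default was 10,000 before v3.2 and was raised to 100,000 in v3.2.
Do not set this value too high or too low: too low a value causes frequent I/O pressure, while too high a value causes high memory usage and slows down etcd garbage collection. A value between 100,000 and 200,000 is recommended.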
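If the keyspace is used for a long time without compaction, it eventually reaches its upper limit (the storage quota). The cluster then only accepts reads and deletes and rejects writes, so compacting the cluster's history is necessary.
Compaction does not remove current data. It only discards historical versions of the data: after compaction those historical versions can no longer be read, but access to the latest data is unaffected.
Compacting with the client tool: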
# Compact away the history before revision 10
[apps@test ~]$ export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl compaction 10
compacted revision 10
# Reading data from before revision 10 now reports that it no longer exists
[apps@test ~]$ export ETCDCTL_API=3;/apps/svr/etcd/bin/etcdctl get aa --rev=9
Error: etcdserver: mvcc: required revision has been compacted
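In practice the revision to compact to is usually the current one rather than a fixed number. The sketch below reads it from the JSON output of `etcdctl endpoint status`. The helper name `extract_revision` is hypothetical, and the `sample` string is a made-up, shortened example of that JSON, used only so the parsing can be tried without a running cluster.

```shell
# Hypothetical helper: read `etcdctl endpoint status --write-out=json`
# output on stdin and print the first "revision" value found in it.
extract_revision() {
    grep -o '"revision":[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Made-up sample of the JSON shape, for trying the parser offline.
sample='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"cluster_id":1,"member_id":2,"revision":1516,"raft_term":2},"version":"3.2.17"}}]'
echo "$sample" | extract_revision

# Intended use against a live cluster (not executed here):
#   rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | extract_revision)
#   ETCDCTL_API=3 etcdctl compaction "$rev"
```

Compacting to the current revision discards all history, so it only suits deployments in which no client needs to read or watch from older revisions.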
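Starting etcd with --auto-compaction-retention=1 enables automatic compaction with a retention window of one hour, so history is compacted about once an hour.
After compaction, the old revisions are gone but the backend database is left with internal fragmentation: space that is free for the backend to reuse but still consumes disk storage. Defragmentation releases this space back to the file system.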
[apps@test ~]$ etcdctl defrag
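Defragmentation is carried out per member, and a member is busy while it rebuilds its database, so it is common to defragment one endpoint at a time. The following dry-run sketch prints one command per endpoint. The helper name `print_defrag_cmds` is hypothetical and the endpoint list reuses the placeholder addresses from the scripts above.

```shell
ENDPOINTS='1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379'

# Hypothetical helper: print one defrag command per endpoint in the
# comma-separated list $1, so the members can be handled one by one.
print_defrag_cmds() {
    echo "$1" | tr ',' '\n' | while IFS= read -r ep; do
        echo "ETCDCTL_API=3 etcdctl --endpoints=$ep defrag"
    done
}

print_defrag_cmds "$ENDPOINTS"
```

After reviewing the printed commands, they can be run one at a time, checking the DB SIZE column of endpoint status between runs.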