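# Use the v3 API: etcdctl in etcd 3.3 (the version in the logs below) defaults to v2
export ETCDCTL_API=3
# Check the status of the cluster's members
etcdctl --endpoints=${EXIST_ADVERTISE_CLIENT_URLS} member list
# Scale out at runtime (add a member)
etcdctl --endpoints=${EXIST_ADVERTISE_CLIENT_URLS} member add infra4 --peer-urls=${NEW_ADVERTISE_PEER_URLS}
# Scale in at runtime (remove a member by its member ID)
etcdctl --endpoints=${EXIST_ADVERTISE_CLIENT_URLS} member remove ${MEMBER_ID}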
Use the member remove command to scale in. It takes the ID of the member to remove, as shown by member list.
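member remove expects the member's hex ID (the first column of member list), not its name. A minimal bash sketch that looks the ID up by name, assuming the simple comma-separated member list output of etcdctl v3 (ID, status, name, peer URLs, client URLs); the function names member_id_by_name and remove_member_by_name are hypothetical. ETCDCTL_API=3 is set explicitly because etcdctl in etcd 3.3 defaults to the v2 API.

```bash
#!/usr/bin/env bash
# Read `etcdctl member list` output on stdin and print the ID of the member
# named $1. Assumes the simple format "ID, status, name, peerURLs, clientURLs".
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Hypothetical helper: remove the member named $2, talking to the
# client URL(s) of healthy members given in $1.
remove_member_by_name() {
  local endpoints=$1 name=$2 id
  id=$(ETCDCTL_API=3 etcdctl --endpoints="$endpoints" member list | member_id_by_name "$name")
  if [ -z "$id" ]; then
    echo "no member named $name" >&2
    return 1
  fi
  ETCDCTL_API=3 etcdctl --endpoints="$endpoints" member remove "$id"
}
```

For example, remove_member_by_name http://127.0.0.1:2379 infra4, where the endpoint is a placeholder for a healthy member's client URL. Newer etcdctl releases append an extra column to member list, which does not affect the name in the third field.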
Use the member add command to scale out. The console then prints the key startup parameters the new node needs to join the cluster.
When starting the new instance, --name, --initial-advertise-peer-urls, --initial-cluster-state and --initial-cluster must match the console output exactly, otherwise startup fails.
Parameter details:
--initial-cluster-state: set it to existing when starting a new instance added during scale-out; the other members must then be alive (reachable on their peer ports) when it starts, otherwise startup fails. Set it to new when starting the initial, already-known members of a cluster.
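Because existing requires the other members' peer ports to be up, a rough pre-flight check can run before the new instance starts. A bash sketch, assuming --initial-cluster uses the name=url,name=url form and URLs look like scheme://host:port; the helper names other_peer_urls, peer_port_open and check_peers are hypothetical, GNU timeout is assumed to be installed, and a successful TCP connect only shows the port is open, not that the member is healthy.

```bash
#!/usr/bin/env bash
# Print the peer URLs in an --initial-cluster string ($2, "name=url,name=url,...")
# for every member except the one named $1 (the instance about to start).
other_peer_urls() {
  local self=$1 cluster=$2 entry
  local IFS=','
  for entry in $cluster; do
    [ "${entry%%=*}" = "$self" ] || printf '%s\n' "${entry#*=}"
  done
}

# Return 0 if a TCP connection to the host:port of URL $1 opens within 2 seconds.
# Relies on bash's /dev/tcp redirection; IPv6 literals are not handled.
peer_port_open() {
  local hostport=${1#*://}
  hostport=${hostport%%/*}
  timeout 2 bash -c ': > "/dev/tcp/$0/$1"' "${hostport%:*}" "${hostport##*:}" 2>/dev/null
}

# Check every other member's peer port; return 1 if any is unreachable.
check_peers() {
  local url rc=0
  while IFS= read -r url; do
    if peer_port_open "$url"; then
      echo "reachable:   $url"
    else
      echo "unreachable: $url"
      rc=1
    fi
  done < <(other_peer_urls "$1" "$2")
  return "$rc"
}
```

Usage before launching with --initial-cluster-state existing: check_peers "$ETCD_NAME" "$ETCD_INITIAL_CLUSTER" || exit 1.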
The key startup parameters for the new node to join the cluster; start it with exactly these values:
ETCD_NAME="infra1"
ETCD_INITIAL_CLUSTER="infra3=http://127.0.0.1:32380,infra2=http://127.0.0.1:22380,infra1=http://127.0.0.1:12380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:12380"
ETCD_INITIAL_CLUSTER_STATE="existing"
Example:
etcd \
  --name "${ETCD_NAME}" \
  --listen-client-urls http://127.0.0.1:42379 \
  --advertise-client-urls http://127.0.0.1:42379 \
  --listen-peer-urls http://127.0.0.1:42380 \
  --initial-advertise-peer-urls "${ETCD_INITIAL_ADVERTISE_PEER_URLS}" \
  --initial-cluster-state "${ETCD_INITIAL_CLUSTER_STATE}" \
  --initial-cluster "${ETCD_INITIAL_CLUSTER}"
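The example above reads ETCD_NAME, ETCD_INITIAL_CLUSTER, ETCD_INITIAL_ADVERTISE_PEER_URLS and ETCD_INITIAL_CLUSTER_STATE from the shell. Instead of copying them by hand, a minimal bash sketch can load them straight from the member add output, assuming it prints KEY="value" lines like the ones shown; load_member_env is a hypothetical name. It accepts only those four keys rather than sourcing the output, so no other line can run code or overwrite variables such as PATH.

```bash
#!/usr/bin/env bash
# Set ETCD_NAME, ETCD_INITIAL_CLUSTER, ETCD_INITIAL_ADVERTISE_PEER_URLS and
# ETCD_INITIAL_CLUSTER_STATE in the current shell from `member add` output
# read on stdin. Any other line is ignored.
load_member_env() {
  local line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ETCD_NAME=* | ETCD_INITIAL_CLUSTER=* | \
      ETCD_INITIAL_ADVERTISE_PEER_URLS=* | ETCD_INITIAL_CLUSTER_STATE=*) ;;
      *) continue ;;
    esac
    key=${line%%=*}
    value=${line#*=}
    value=${value#\"}   # strip the surrounding double quotes
    value=${value%\"}
    printf -v "$key" '%s' "$value"   # assign by name, without eval
  done
}
```

Feed it through process substitution in the shell that will launch etcd, because a plain pipe would set the variables in a subshell: load_member_env < <(ETCDCTL_API=3 etcdctl --endpoints="$ENDPOINTS" member add infra4 --peer-urls="$PEER_URL"), with ENDPOINTS and PEER_URL as placeholders.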
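Member information is persisted to disk, so a node whose data was lost must rejoin the cluster as a new member. Follow these steps strictly:
1. Remove the failed node: use member remove to evict it, so that the remaining cluster stays healthy.
2. Wipe the data directory completely: stop etcd on the failed node, then delete its data dir, so that the old member information (the member directory) is fully cleared.
3. Scale out the cluster: use member add to add back the node removed in step 1 (see 3.2).
4. Restart: start the node from step 1 again with the parameters printed in step 3 (see 3.2).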
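Steps 1 to 3 can be sketched as a bash function, assuming it runs on the failed node (so the data dir is local) with network access to a healthy member's client URL. The names recover_member, run and DRY_RUN are hypothetical; with DRY_RUN set, the commands are only printed. Step 4 is left as a comment because its values come from the output of step 3.

```bash
#!/usr/bin/env bash
export ETCDCTL_API=3   # etcdctl in etcd 3.3 defaults to the v2 API

# Execute the command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Args: healthy client endpoints, failed member ID, name, peer URL, data dir.
recover_member() {
  local endpoints=$1 member_id=$2 name=$3 peer_url=$4 data_dir=$5
  [ -n "$data_dir" ] || { echo "data dir not set" >&2; return 1; }
  # Step 1: evict the failed member through a healthy endpoint.
  run etcdctl --endpoints="$endpoints" member remove "$member_id" || return
  # Step 2: etcd on this node must already be stopped (e.g. systemctl stop etcd
  # if it runs as etcd.service); wipe the data dir so no old member info remains.
  run rm -rf -- "$data_dir" || return
  # Step 3: add the node back as a brand-new member; keep the printed ETCD_* lines.
  run etcdctl --endpoints="$endpoints" member add "$name" --peer-urls="$peer_url" || return
  # Step 4 (manual): start etcd with those ETCD_* values and
  # --initial-cluster-state existing.
}
```

A dry run such as DRY_RUN=1 recover_member http://127.0.0.1:22379 ddd67b312462fd7b infra1 http://127.0.0.1:12380 ./infra1.etcd prints the three commands without touching the cluster; the URLs, ID and path here are placeholders.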
If, after the data loss, the node is started with --initial-cluster-state="new", it fails with the following log, reporting: member ddd67b312462fd7b has already been bootstrapped
2019-07-09 00:24:55.880988 I | etcdmain: etcd Version: 3.3.10
2019-07-09 00:24:55.881077 I | etcdmain: Git SHA: 27fc7e2
2019-07-09 00:24:55.881082 I | etcdmain: Go Version: go1.10.4
2019-07-09 00:24:55.881089 I | etcdmain: Go OS/Arch: darwin/amd64
2019-07-09 00:24:55.881093 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
2019-07-09 00:24:55.881099 N | etcdmain: failed to detect default host (default host not supported on darwin_amd64)
2019-07-09 00:24:55.881106 W | etcdmain: no data-dir provided, using default data-dir ./infra1.etcd
2019-07-09 00:24:55.881236 I | embed: listening for peers on http://127.0.0.1:12380
2019-07-09 00:24:55.881254 I | embed: pprof is enabled under /debug/pprof
2019-07-09 00:24:55.881299 I | embed: listening for client requests on 127.0.0.1:2380
2019-07-09 00:24:55.883626 C | etcdmain: member ddd67b312462fd7b has already been bootstrapped
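If, after the data loss, the node is started with --initial-cluster-state="existing" without removing and re-adding the member first, it fails with the following log, reporting: Was the raft log corrupted, truncated, or lost?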
tocommit(10) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(10) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?

goroutine 135 [running]:
github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc42000a660, 0x1c0cad8, 0x5d, 0xc42000a160, 0x2, 0x2)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x162
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raftLog).commitTo(0xc420277500, 0xa)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/log.go:191 +0x15c
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).handleHeartbeat(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1194 +0x54
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.stepFollower(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:1140 +0x3ff
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*raft).Step(0xc420244300, 0x8, 0xddd67b312462fd7b, 0x9e737febb6b99eee, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/raft.go:868 +0x12f1
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*node).run(0xc4201df320, 0xc420244300)
	/tmp/etcd-release-3.3.10/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft/node.go:323 +0x1059
Nov 9 19:14:20 kubernetes-65 systemd: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 9 19:14:20 kubernetes-65 systemd: Failed to start Etcd Server.
Nov 9 19:14:20 kubernetes-65 systemd: Unit etcd.service entered failed state.
Nov 9 19:14:20 kubernetes-65 systemd: etcd.service failed.
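If steps 1 and 3 are performed correctly but step 2 is skipped, and the node was started incorrectly in between, stale member information is left on disk. It fails with the following log, reporting: failed to find member ... in cluster ...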
2019-07-09 01:24:19.311630 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
2019-07-09 01:24:19.311710 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a9097c907
2019-07-09 01:24:19.410573 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
2019-07-09 01:24:19.410616 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a9097c907
2019-07-09 01:24:19.410678 E | rafthttp: failed to find member 9e737febb6b99eee in cluster 73841b4a9097c907
2019-07-09 01:24:19.410767 E | rafthttp: failed to find member 628170c800dbcee in cluster 73841b4a
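The three failures above can be told apart by their key messages. A rough bash heuristic that maps a start-up log to the matching case, assuming the messages appear verbatim as in the logs above; diagnose_etcd_log is a hypothetical name, and the hints only paraphrase the recovery steps in this section rather than covering every cause.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an etcd start-up log on stdin and print which of
# the three failure cases described above it matches (first match wins).
diagnose_etcd_log() {
  local log
  log=$(cat)
  case $log in
    *"has already been bootstrapped"*)
      echo "case 1: data lost, started with --initial-cluster-state=new; redo the four recovery steps" ;;
    *"Was the raft log corrupted, truncated, or lost?"*)
      echo "case 2: data lost, started with --initial-cluster-state=existing; redo the four recovery steps" ;;
    *"failed to find member"*)
      echo "case 3: stale member info on disk (step 2 skipped); stop etcd, do step 2, then step 4" ;;
    *)
      echo "unrecognized" ;;
  esac
}
```

If etcd runs as etcd.service under systemd, as in the log above, something like journalctl -u etcd -n 200 --no-pager | diagnose_etcd_log feeds it the most recent lines.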