ZooKeeper Notes: Implementing Cluster Leader Election with ZooKeeper

 

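1. Requirements

In a master/slave cluster, we assume the hardware is fragile and may go down at any moment. When the master dies, one of the slaves has to be elected as the new master. With ZooKeeper, this kind of leader election is straightforward to implement.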

 

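2. Analysis

For convenience, the rest of this post uses the more common terminology: leader for master and follower for slave.

Leader election involves two questions:

1. Who becomes the leader?

2. How do the followers notice that the leader has died?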

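Start with the first question: who becomes the leader? It can be viewed as contention for a mutex in multithreaded code: there is only one lock, and only one contender can hold it. Here a ZooKeeper node, /leader-info, plays the role of the lock, and every machine in the cluster tries to create it (that is, to grab the lock). Because node creation in ZooKeeper is atomic, exactly one machine succeeds and all the others fail. The machine that succeeds becomes the leader and the rest become followers. After winning, the leader usually stores some information about itself in /leader-info, such as its hostname or id, so that followers know who won. Followers can then connect to the leader to exchange data or receive commands, but that happens after the election and is outside the scope of this post.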
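To make the lock analogy concrete, here is a minimal sketch that does not use ZooKeeper at all. FakeStore and its create method are hypothetical stand-ins: ConcurrentHashMap.putIfAbsent plays the part of ZooKeeper's atomic node creation. Ten threads race for /leader-info, and only one of them should come out as the winner.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// In-memory illustration of "creating the node = grabbing the lock".
// FakeStore is hypothetical; it is NOT the ZooKeeper client API.
class CreateAsLockSketch {

    static class FakeStore {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        // Atomic create: returns true only for the one caller that actually created the path.
        boolean create(String path, byte[] data) {
            return nodes.putIfAbsent(path, data) == null;
        }

        // Returns the stored data as a string, or null if the path does not exist.
        String read(String path) {
            byte[] data = nodes.get(path);
            return data == null ? null : new String(data, StandardCharsets.UTF_8);
        }
    }

    // Lets `contenders` threads race to create /leader-info and returns the names of the winners.
    static List<String> race(FakeStore store, int contenders) {
        List<String> winners = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < contenders; i++) {
            String name = "node-" + i;
            Thread t = new Thread(() -> {
                try {
                    start.await(); // hold every thread until the race begins
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (store.create("/leader-info", name.getBytes(StandardCharsets.UTF_8))) {
                    winners.add(name);
                }
            }, name);
            threads.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        List<String> winners = race(store, 10);
        System.out.println("winners: " + winners + ", /leader-info holds: " + store.read("/leader-info"));
    }
}
```

Which thread wins varies from run to run, but the winners list should always contain a single name, and that name is what /leader-info ends up holding.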

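The second question is how followers learn that the leader has died. By lifetime, ZooKeeper nodes are either persistent or ephemeral. An ephemeral node is bound to a session, and a session represents one client: when the client goes offline its session expires, and when the session expires the ephemeral nodes bound to it are deleted. This property can be used to detect whether a machine is still alive, which gives followers a way to notice that the leader is gone. The only requirement is that /leader-info be created as an ephemeral node and that every follower set a watcher on it for the delete event. When the leader dies, its session expires, the ephemeral node /leader-info bound to that session is deleted, and every follower receives an event notification. Seeing that the leader is gone, each follower sets its own status to looking and starts a new round of election.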
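The relationship between sessions, ephemeral nodes and delete watchers can be sketched with a small in-memory model. Everything here (FakeServer, closeSession, watchDelete, the PERSISTENT marker) is a hypothetical illustration and not the ZooKeeper API; it only shows that closing a session removes the nodes it owns, leaves persistent nodes alone, and fires each registered watcher once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory model of session-bound (ephemeral) nodes and one-shot delete watchers.
class EphemeralNodeSketch {

    static final long PERSISTENT = -1L; // marker: the node is not tied to any session

    static class FakeServer {
        private final Map<String, Long> owner = new HashMap<>(); // path -> owning session id
        private final Map<String, List<Consumer<String>>> deleteWatchers = new HashMap<>();

        // Creates the node if absent; pass PERSISTENT or the id of the owning session.
        synchronized boolean create(String path, long sessionId) {
            return owner.putIfAbsent(path, sessionId) == null;
        }

        synchronized boolean exists(String path) {
            return owner.containsKey(path);
        }

        // Registers a delete watcher only if the node currently exists.
        synchronized boolean watchDelete(String path, Consumer<String> watcher) {
            if (!owner.containsKey(path)) {
                return false;
            }
            deleteWatchers.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
            return true;
        }

        // The session ends: delete every node it owns, then notify the watchers of those nodes.
        void closeSession(long sessionId) {
            Map<String, List<Consumer<String>>> fired = new LinkedHashMap<>();
            synchronized (this) {
                if (sessionId == PERSISTENT) {
                    return;
                }
                List<String> owned = new ArrayList<>();
                for (Map.Entry<String, Long> e : owner.entrySet()) {
                    if (e.getValue() == sessionId) {
                        owned.add(e.getKey());
                    }
                }
                for (String path : owned) {
                    owner.remove(path);
                    List<Consumer<String>> ws = deleteWatchers.remove(path); // one-shot
                    if (ws != null) {
                        fired.put(path, ws);
                    }
                }
            }
            // Callbacks run outside the lock so they may call back into the server.
            fired.forEach((path, ws) -> ws.forEach(w -> w.accept(path)));
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        server.create("/config", PERSISTENT);
        server.create("/leader-info", 1L); // session 1 is the leader
        server.watchDelete("/leader-info", p -> System.out.println("follower saw delete of " + p));
        server.closeSession(1L);           // leader goes away
        System.out.println("/leader-info exists: " + server.exists("/leader-info"));
        System.out.println("/config exists: " + server.exists("/config"));
    }
}
```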

 

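To summarize the election flow:

1. Every machine in the cluster sets its status to looking and gets ready for the election.

2. Every machine in the looking state tries to create the /leader-info node.

3. The machine that succeeds changes its status to leader and writes some information about itself into /leader-info.

4. Each machine that fails sets its status to follower and tries to read the leader's information from /leader-info in order to run whatever logic a leader change requires (not the focus here, so it is simply printed). While reading the data of /leader-info, the follower also sets a watcher for the node's delete event and starts a new round of election when that event fires. Reading the data and setting the watcher are a single atomic operation, so either the node exists, in which case the read succeeds and the watcher is set, or the node does not exist and KeeperException.NoNodeException is thrown.

5. Why can an exception be thrown when the follower sets the watcher, if the leader has already created the node? Because the sequence from failing to create /leader-info to reading its data and setting the watcher is not atomic, and things can change in between. The leader may die right after becoming leader (or lose its session because of network jitter). Once its session expires, ZooKeeper deletes the ephemeral node it created. So if a follower gets KeeperException.NoNodeException while setting the watcher, the previous leader is gone and the cluster currently has no leader; the follower sets its status to looking and starts a new round of election.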
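The five steps above can be sketched as a small state machine. This is a single-process illustration under stated assumptions, not the implementation from the next section: LeaderSlot is a hypothetical stand-in for the /leader-info ephemeral node, and crash() stands in for a session expiring. The retry in step 5 is written as a loop here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory walk-through of the election flow (steps 1 to 5).
class ElectionFlowSketch {

    enum Status { LOOKING, LEADER, FOLLOWER }

    // Stand-in for the /leader-info ephemeral node; not the ZooKeeper API.
    static class LeaderSlot {
        private Member holder;
        private final List<Member> watchers = new ArrayList<>();

        // Step 2: atomic create, only one member can succeed.
        synchronized boolean tryCreate(Member m) {
            if (holder != null) {
                return false;
            }
            holder = m;
            return true;
        }

        // Step 4: read the leader and register a watch in one step; null means "no such node".
        synchronized String readAndWatch(Member m) {
            if (holder == null) {
                return null;
            }
            watchers.add(m);
            return holder.name;
        }

        // The holder's session ended: drop the node and wake every watcher.
        void release(Member m) {
            List<Member> toNotify;
            synchronized (this) {
                if (holder != m) {
                    return;
                }
                holder = null;
                toNotify = new ArrayList<>(watchers);
                watchers.clear();
            }
            for (Member w : toNotify) {
                w.elect(); // each follower starts a new round
            }
        }
    }

    static class Member {
        final String name;
        final LeaderSlot slot;
        volatile Status status;
        volatile String knownLeader;

        Member(String name, LeaderSlot slot) {
            this.name = name;
            this.slot = slot;
        }

        void elect() {
            status = Status.LOOKING;                          // step 1
            while (status == Status.LOOKING) {
                if (slot.tryCreate(this)) {                   // steps 2 and 3
                    knownLeader = name;
                    status = Status.LEADER;
                } else {
                    String leader = slot.readAndWatch(this);  // step 4
                    if (leader != null) {
                        knownLeader = leader;
                        status = Status.FOLLOWER;
                    }
                    // step 5: leader == null means the node vanished in between, so loop again
                }
            }
        }

        // Simulates this member's session expiring.
        void crash() {
            slot.release(this);
        }
    }

    public static void main(String[] args) {
        LeaderSlot slot = new LeaderSlot();
        Member a = new Member("node-a", slot);
        Member b = new Member("node-b", slot);
        Member c = new Member("node-c", slot);
        a.elect();
        b.elect();
        c.elect();
        System.out.println(b.name + " is " + b.status + ", leader is " + b.knownLeader);
        a.crash();
        System.out.println(b.name + " is " + b.status + ", " + c.name + " follows " + c.knownLeader);
    }
}
```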

 

3. Implementation

Node.java:

package cc11001100.zookeeper.leaderElection;

import cc11001100.zookeeper.utils.ZooKeeperUtil;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.io.IOException;
import java.io.UnsupportedEncodingException;

/**
 * Represents one node in the cluster.
 *
 * @author CC11001100
 */
public class Node {

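	// volatile: written by ZooKeeper's event thread (watcher callback), read by the node's own thread
	private volatile Status status;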
	private String nodeForLeaderInfo;
	private ZooKeeper zooKeeper;

	public Node(String listenerNodeForLeader) throws IOException {
		this.nodeForLeaderInfo = listenerNodeForLeader;
		this.zooKeeper = ZooKeeperUtil.getZooKeeper();
		lookingForLeader();
	}

	public void lookingForLeader() {
		status = Status.LOOKING;
		try {
			String leaderInfo = Thread.currentThread().getName();
			// Note that the node created here is an ephemeral node
			zooKeeper.create(nodeForLeaderInfo, leaderInfo.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
			// If the previous call did not throw, this node is now the leader
			status = Status.LEADER;
			String logMsg = Thread.currentThread().getName() + " is leader";
			System.out.println(logMsg);
		} catch (KeeperException.NodeExistsException e) {
			// The node already exists, so another machine has registered as leader; this node is a follower
			status = Status.FOLLOWER;
			try {
				byte[] leaderInfoBytes = zooKeeper.getData(nodeForLeaderInfo, event -> {
					if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
						lookingForLeader();
					}
				}, null);
				String logMsg = Thread.currentThread().getName() + " is follower, master is " + new String(leaderInfoBytes, "UTF-8");
				System.out.println(logMsg);
			} catch (KeeperException.NoNodeException e1) {
				// NoNode while reading the leader info means the leader was short-lived: it died right after winning
				lookingForLeader();
			} catch (KeeperException | InterruptedException | UnsupportedEncodingException e1) {
				e1.printStackTrace();
			}
		} catch (KeeperException | InterruptedException e) {
			e.printStackTrace();
		}
	}

	public void shutdown() {
		try {
			if (zooKeeper != null) {
				zooKeeper.close();
			}
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}

	public Status getStatus() {
		return status;
	}

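	// Current status of this node; it is always exactly one of these three
	public enum Status {
		LOOKING, // election in progress
		LEADER, // election finished, this node is the leader
		FOLLOWER; // election finished, this node is a follower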
	}

}

LeaderElectionTest.java:

package cc11001100.zookeeper.leaderElection;

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * @author CC11001100
 */
public class LeaderElectionTest {

	private static void sleep(long mils) {
		try {
			TimeUnit.MILLISECONDS.sleep(mils);
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}

	public static void main(String[] args) throws IOException {

		final String LEADER_INFO_NODE = "/leader-info";
		int nodeNum = 10;
		AtomicLong idGenerator = new AtomicLong();
		AtomicInteger activeNodeCount = new AtomicInteger();
		while (true) {
			if (activeNodeCount.get() >= nodeNum) {
				sleep(10);
				continue;
			}

			// Starting a thread takes some time; treat thread startup as a machine booting, and count the machine as joined before it boots
			activeNodeCount.incrementAndGet();
			new Thread(() -> {
				try {
					Node node = new Node(LEADER_INFO_NODE);
					while (true) {
						sleep(1000);
						// For the experiment, give the leader a slight tendency to shut itself down...
						if (node.getStatus() == Node.Status.LEADER && Math.random() < 0.3) {
							String logMsg = "----------------------------- " + Thread.currentThread().getName() + " shutdown -----------------------------";
							System.out.println(logMsg);
							node.shutdown();
							break;
						}
					}
				} catch (IOException e) {
					e.printStackTrace();
				} finally {
					activeNodeCount.decrementAndGet();
				}
			}, "node-" + idGenerator.getAndIncrement()).start();
		}
	}

}

Console output:

...
node-4 is leader
node-3 is follower, master is node-4
node-0 is follower, master is node-4
node-9 is follower, master is node-4
node-7 is follower, master is node-4
node-5 is follower, master is node-4
node-1 is follower, master is node-4
node-6 is follower, master is node-4
node-8 is follower, master is node-4
node-2 is follower, master is node-4
----------------------------- node-4 shutdown -----------------------------
node-0-EventThread is leader
node-6-EventThread is follower, master is node-0-EventThread
node-3-EventThread is follower, master is node-0-EventThread
node-7-EventThread is follower, master is node-0-EventThread
node-1-EventThread is follower, master is node-0-EventThread
node-5-EventThread is follower, master is node-0-EventThread
node-9-EventThread is follower, master is node-0-EventThread
node-2-EventThread is follower, master is node-0-EventThread
node-8-EventThread is follower, master is node-0-EventThread
node-10 is follower, master is node-0-EventThread
----------------------------- node-0 shutdown -----------------------------
node-6-EventThread is leader
node-7-EventThread is follower, master is node-6-EventThread
node-1-EventThread is follower, master is node-6-EventThread
node-3-EventThread is follower, master is node-6-EventThread
node-10-EventThread is follower, master is node-6-EventThread
node-9-EventThread is follower, master is node-6-EventThread
node-5-EventThread is follower, master is node-6-EventThread
node-2-EventThread is follower, master is node-6-EventThread
node-8-EventThread is follower, master is node-6-EventThread
node-11 is follower, master is node-6-EventThread
----------------------------- node-6 shutdown -----------------------------
node-1-EventThread is leader
node-10-EventThread is follower, master is node-1-EventThread
node-7-EventThread is follower, master is node-1-EventThread
node-11-EventThread is follower, master is node-1-EventThread
node-8-EventThread is follower, master is node-1-EventThread
node-5-EventThread is follower, master is node-1-EventThread
node-9-EventThread is follower, master is node-1-EventThread
node-3-EventThread is follower, master is node-1-EventThread
node-2-EventThread is follower, master is node-1-EventThread
node-12 is follower, master is node-1-EventThread
----------------------------- node-1 shutdown -----------------------------
node-3-EventThread is leader
node-12-EventThread is follower, master is node-3-EventThread
node-11-EventThread is follower, master is node-3-EventThread
node-5-EventThread is follower, master is node-3-EventThread
node-7-EventThread is follower, master is node-3-EventThread
node-9-EventThread is follower, master is node-3-EventThread
node-2-EventThread is follower, master is node-3-EventThread
node-10-EventThread is follower, master is node-3-EventThread
node-8-EventThread is follower, master is node-3-EventThread
node-13 is follower, master is node-3-EventThread
----------------------------- node-3 shutdown -----------------------------
node-5-EventThread is leader
node-13-EventThread is follower, master is node-5-EventThread
node-12-EventThread is follower, master is node-5-EventThread
node-7-EventThread is follower, master is node-5-EventThread
node-11-EventThread is follower, master is node-5-EventThread
node-10-EventThread is follower, master is node-5-EventThread
node-9-EventThread is follower, master is node-5-EventThread
node-2-EventThread is follower, master is node-5-EventThread
node-8-EventThread is follower, master is node-5-EventThread
node-14 is follower, master is node-5-EventThread
----------------------------- node-5 shutdown -----------------------------
node-7-EventThread is leader
node-13-EventThread is follower, master is node-7-EventThread
node-12-EventThread is follower, master is node-7-EventThread
node-9-EventThread is follower, master is node-7-EventThread
node-11-EventThread is follower, master is node-7-EventThread
node-14-EventThread is follower, master is node-7-EventThread
node-10-EventThread is follower, master is node-7-EventThread
node-8-EventThread is follower, master is node-7-EventThread
node-2-EventThread is follower, master is node-7-EventThread
node-15 is follower, master is node-7-EventThread
----------------------------- node-7 shutdown -----------------------------
node-14-EventThread is leader
node-13-EventThread is follower, master is node-14-EventThread
node-11-EventThread is follower, master is node-14-EventThread
node-2-EventThread is follower, master is node-14-EventThread
node-12-EventThread is follower, master is node-14-EventThread
node-15-EventThread is follower, master is node-14-EventThread
node-10-EventThread is follower, master is node-14-EventThread
node-9-EventThread is follower, master is node-14-EventThread
node-8-EventThread is follower, master is node-14-EventThread
node-16 is follower, master is node-14-EventThread
----------------------------- node-14 shutdown -----------------------------
node-13-EventThread is leader
node-12-EventThread is follower, master is node-13-EventThread
node-15-EventThread is follower, master is node-13-EventThread
node-9-EventThread is follower, master is node-13-EventThread
node-10-EventThread is follower, master is node-13-EventThread
node-2-EventThread is follower, master is node-13-EventThread
node-8-EventThread is follower, master is node-13-EventThread
node-11-EventThread is follower, master is node-13-EventThread
node-16-EventThread is follower, master is node-13-EventThread
node-17 is follower, master is node-13-EventThread
...

 

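One last point to consider: should the leader node be a sequential ephemeral node, as in a distributed lock? Then only one node would need to be woken up each time, which looks like a possible optimization.

In fact it is not. When the leader dies, every follower must be able to notice the leader change; even followers that do not take part in the race for leadership still have to wake up and run their leader-change logic.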
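For comparison, the sequential-node scheme mentioned above can be sketched as follows. This is an illustration under assumptions, not code from this post: the member-NNNNNNNNNN names are hypothetical, and the list of children is passed in directly instead of being fetched from ZooKeeper. The member with the smallest sequence number is the leader, and every other member watches only its immediate predecessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the sequential-node alternative, for comparison only.
class SequentialElectionSketch {

    // The member whose zero-padded sequence suffix is smallest is the leader.
    static String leaderOf(List<String> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("no members");
        }
        return Collections.min(children);
    }

    // The node `self` would watch: its immediate predecessor, or null if `self` is the leader.
    static String nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException(self + " is not a member");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "member-0000000003", "member-0000000001", "member-0000000002");
        System.out.println("leader: " + leaderOf(children));
        for (String member : children) {
            System.out.println(member + " watches " + nodeToWatch(children, member));
        }
    }
}
```

With this layout, when member-0000000001 goes away only member-0000000002 is notified. member-0000000003 keeps watching member-0000000002 and hears nothing about the leader change, which is the problem described above.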

 
