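In an HA stack, corosync is the membership layer: it handles cluster membership and the communication mode (unicast, broadcast, multicast), while pacemaker is the CRM layer. While running corosync+pacemaker in active/standby mode I ran into one problem, split-brain. What is split-brain? In an HA cluster the nodes communicate over a heartbeat link. Once the heartbeat network fails, the members no longer recognize each other, each node makes itself the DC (Designated Controller) of its own cluster, and the resources end up started on both the active and the standby node at the same time. Is split-brain caused by corosync or by pacemaker? At first I thought it was corosync, because the heartbeat failure is what stops corosync from communicating normally. Later I found that the pacemaker site documents an approach to split-brain. As the CRM, pacemaker's main job is managing resources, and it is also responsible for electing a leader.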
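[1](http://drbd.linbit.com/users-guide-emb/s-configure-split-brain-behavior.html) gives one solution. This post discusses another: giving pacemaker a resource that the nodes must compete for. The idea is that pacemaker can define the order in which resources are started. If an exclusive resource is placed first, the start of every later resource depends on it, so everything succeeds or fails with that exclusive resource. When the heartbeat network fails, whichever node grabs the resource first takes over the service resources and provides the service. This approach has to solve two problems: first, an exclusive resource must be defined; second, a custom pacemaker RA must be written to grab it.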
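The ordering idea can be illustrated with a small Python sketch. It is a toy model, not Pacemaker's engine or API: `start_in_order`, `ExclusiveLock`, `resources_for`, and the resource names are all hypothetical, and the in-memory lock merely stands in for the exclusive resource.

```python
def start_in_order(resources):
    """Start (name, start_fn) pairs in order; stop at the first failure.

    Toy model of an ordered resource chain, not Pacemaker's actual engine.
    """
    started = []
    for name, start_fn in resources:
        if not start_fn():
            break  # everything after a failed resource stays stopped
        started.append(name)
    return started


class ExclusiveLock:
    """First-come-first-served lock standing in for the exclusive resource."""

    def __init__(self):
        self.holder = None

    def acquire(self, node):
        if self.holder is None:
            self.holder = node
        return self.holder == node


def resources_for(node, lock):
    # Hypothetical resource names; the lock comes first so the rest depend on it.
    return [
        ("exclusive-lock", lambda: lock.acquire(node)),
        ("virtual-ip", lambda: True),
        ("service", lambda: True),
    ]


lock = ExclusiveLock()
print(start_in_order(resources_for("node-a", lock)))  # ['exclusive-lock', 'virtual-ip', 'service']
print(start_in_order(resources_for("node-b", lock)))  # []
```

Because the lock is the first link in the chain, the node that loses the race starts nothing at all, which is the behavior wanted during a heartbeat failure.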
This post uses a mutex to implement the exclusive resource. Concretely, a simple web service written in Python provides lock, unlock, and updatelock operations.
    __author__ = 'ZHANGTIANJIONG629'

    import threading
    import time

    try:
        from http.server import BaseHTTPRequestHandler, HTTPServer  # Python 3
    except ImportError:
        from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer  # Python 2

    lock_timeout_seconds = 8
    lock = threading.Lock()
    lock_client_ip = ""
    lock_time = 0


    class LockService(BaseHTTPRequestHandler):

        def do_GET(self):
            '''define url route'''
            # The original left this as a stub; the three paths below are assumed.
            routes = {'/lock': self.lock,
                      '/updatelock': self.update_lock,
                      '/unlock': self.unlock}
            handler = routes.get(self.path)
            if handler is None:
                self.reply(404, 'unknown path')
            else:
                handler(self.client_address[0])

        def reply(self, code, message):
            # end_headers() terminates the header block so the client gets a complete response
            self.send_response(code, message)
            self.end_headers()

        def lock(self, client_ip):
            global lock_client_ip
            global lock_time
            # if lock is free; non-blocking, because a blocking acquire() would hang the server
            if lock.acquire(False):
                lock_client_ip = client_ip
                lock_time = time.time()
                self.reply(200, 'ok')
            # if current client holds the lock, update lock time
            elif lock_client_ip == client_ip:
                lock_time = time.time()
                self.reply(200, 'ok,update')
            # lock timeout, grab lock
            elif time.time() - lock_time > lock_timeout_seconds:
                lock_client_ip = client_ip
                lock_time = time.time()
                self.reply(200, 'ok,grab lock')
            else:
                self.reply(403, 'lock is held by other')

        def update_lock(self, client_ip):
            global lock_time
            if lock_client_ip == client_ip:
                lock_time = time.time()
                self.reply(200, 'ok,update')
            else:
                self.reply(403, 'lock is held by other')

        def unlock(self, client_ip):
            global lock_client_ip
            global lock_time
            # lock is already free: nothing to release
            if lock.acquire(False):
                lock.release()
                self.reply(200, 'ok,unlock')
            elif lock_client_ip == client_ip:
                lock.release()
                lock_time = 0
                lock_client_ip = ''
                self.reply(200, 'ok,unlock')
            else:
                self.reply(403, 'lock is held by other')


    if __name__ == '__main__':
        # The original passed the string '88888', which is not a valid port;
        # 8888 is an assumed replacement.
        # 127.0.0.1 is kept from the original; other nodes can only reach the
        # service if it binds to an address they can route to.
        http_server = HTTPServer(('127.0.0.1', 8888), LockService)
        http_server.serve_forever()
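For illustration, here is a minimal sketch of how a cluster node might call this service, written for Python 3. The base URL, the `/lock`, `/updatelock`, and `/unlock` paths, and the helper names `http_status` and `call_lock_service` are assumptions rather than anything taken from the original. Only a 200 answer counts as success, matching the 200/403 convention of the service.

```python
import urllib.error
import urllib.request

# Hypothetical address of the lock service; adjust to the real deployment.
LOCK_SERVICE_URL = "http://127.0.0.1:8888"


def http_status(url, timeout=3):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when another node holds the lock
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, and similar failures


def call_lock_service(action, get_status=http_status):
    """Return True only when the service answers 200 for the given action."""
    if action not in ("lock", "updatelock", "unlock"):
        raise ValueError("unknown action: %s" % action)
    return get_status("%s/%s" % (LOCK_SERVICE_URL, action)) == 200
```

A node would call `call_lock_service("lock")` before starting its resources and `call_lock_service("updatelock")` periodically, more often than the 8-second lock timeout. Treating an unreachable service as a failed attempt means a node that cannot reach the arbiter does not start the service.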
The next post covers the custom RA script.