When a mnesia cluster suffers a network partition, each side of the partition may keep accepting writes, so the replicas diverge and the data becomes inconsistent. After the partition heals, mnesia reports an inconsistent_database system event, but the data itself remains inconsistent. This post briefly analyzes how the inconsistent_database event is produced.
1. How mnesia detects the up/down state of other nodes
Within the mnesia application, the mnesia_monitor process is responsible for monitoring the connection state of the nodes in the cluster. During initialization it calls net_kernel:monitor_nodes(true) to subscribe to node status changes, so whenever a node connects or disconnects, mnesia_monitor receives a corresponding message.
handle_call(init, _From, State) ->
    net_kernel:monitor_nodes(true),
    EarlyNodes = State#state.early_connects,
    State2 = State#state{tm_started = true},
    {reply, EarlyNodes, State2};

handle_info({nodeup, Node}, State) ->
    ...
handle_info({nodedown, _Node}, State) ->
    ...
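This subscription mechanism is not specific to mnesia; any process can use it. A minimal standalone sketch (the function names here are made up for illustration):

%% Minimal sketch: subscribing to node status changes the same way
%% mnesia_monitor does. {nodeup,Node}/{nodedown,Node} messages are
%% delivered to the subscribing process's mailbox.
start_node_watcher() ->
    spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true),
                  node_watcher_loop()
          end).

node_watcher_loop() ->
    receive
        {nodeup, Node} ->
            io:format("~p connected~n", [Node]),
            node_watcher_loop();
        {nodedown, Node} ->
            io:format("~p disconnected~n", [Node]),
            node_watcher_loop()
    end.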
In addition, the nodes of a mnesia cluster negotiate a protocol version with each other; once the negotiation succeeds, the mnesia_monitor processes on the two nodes link to each other.
%% From remote monitor..
handle_call({negotiate_protocol, Mon, Version, Protocols}, From, State)
  when node(Mon) /= node() ->
    Protocol = protocol_version(),
    MyVersion = mnesia:system_info(version),
    case lists:member(Protocol, Protocols) of
        true ->
            accept_protocol(Mon, MyVersion, Protocol, From, State);
        false ->
            %% in this release we should be able to handle the previous
            %% protocol
            case hd(Protocols) of
                ?previous_protocol_version ->
                    accept_protocol(Mon, MyVersion, ?previous_protocol_version,
                                    From, State);
                {7,6} ->
                    accept_protocol(Mon, MyVersion, {7,6}, From, State);
                _ ->
                    verbose("Connection with ~p rejected. "
                            "version = ~p, protocols = ~p, "
                            "expected version = ~p, expected protocol = ~p~n",
                            [node(Mon), Version, Protocols, MyVersion, Protocol]),
                    {reply, {node(), {reject, self(), MyVersion, Protocol}}, State}
            end
    end;

accept_protocol(Mon, Version, Protocol, From, State) ->
    ...
    case lists:member(Node, State#state.going_down) of
        true ->
            ...
        false ->
            link(Mon),  %% link to remote Monitor
            ...
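Why does this link matter? When the connection between two nodes breaks, the runtime turns a link to a remote process into an exit signal with reason noconnection, and that is exactly the signal that drives the partition handling described in the next section. A minimal standalone illustration (not mnesia code; watch_remote is a made-up name):

%% Minimal illustration: a link to a remote process turns a lost
%% connection into an 'EXIT' message with reason noconnection
%% (when the watcher traps exits).
watch_remote(RemotePid) ->
    process_flag(trap_exit, true),
    link(RemotePid),
    receive
        {'EXIT', RemotePid, noconnection} ->
            %% node/1 still works: the node name is encoded in the pid
            io:format("lost connection to ~p~n", [node(RemotePid)])
    end.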
2. What mnesia does when a network partition occurs
When a network partition occurs, the mnesia_monitor process receives both a {nodedown, Node} message and an {'EXIT', Pid, _Reason} message. It does nothing for the {nodedown, Node} message. For the {'EXIT', Pid, _Reason} message, it checks whether the pid belongs to the local node; if not, it assumes that the mnesia_monitor process on another node of the cluster has terminated. In that case, mnesia_monitor notifies the mnesia_recover, mnesia_controller, mnesia_tm and mnesia_locker processes of mnesia_down, in that order, so that each can react accordingly (a simplified sketch of this fan-out follows the list below).
On receiving mnesia_down, the mnesia_recover process deals with transactions whose outcome is still undecided.
On receiving mnesia_down, the mnesia_controller process records the mnesia_down information in the latest log file and in the mnesia_decision table, resets per-table properties such as where_to_commit, active_replicas, where_to_write and where_to_wlock for all tables, and performs other related bookkeeping.
On receiving mnesia_down, the mnesia_tm process handles both the transactions initiated by the local node and those it participates in.
The mnesia_locker process releases the locks held by the corresponding node.
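For orientation, the fan-out can be pictured roughly as below. This is a hypothetical sketch only: the real mnesia_monitor goes through each module's internal API rather than raw messages, but the notification order is the one described above.

%% Hypothetical sketch of the mnesia_down fan-out; illustrative only,
%% not the actual mnesia_monitor code.
broadcast_mnesia_down(Node) ->
    lists:foreach(
      fun(Proc) -> Proc ! {mnesia_down, Node} end,  %% shape is illustrative
      [mnesia_recover, mnesia_controller, mnesia_tm, mnesia_locker]).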
Finally, the mnesia_monitor process records the mnesia_down event, sends it to the subscribers, and then cleans up its own state.
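The mnesia_down event (like the inconsistent_database event discussed below) can be observed from application code through mnesia's event subscription API. A minimal sketch, assuming mnesia is already started when we subscribe:

%% Sketch: receive mnesia system events in an ordinary process.
watch_mnesia_events() ->
    {ok, _LocalNode} = mnesia:subscribe(system),
    receive
        {mnesia_system_event, {mnesia_down, Node}} ->
            io:format("mnesia went down on ~p~n", [Node]);
        {mnesia_system_event, Event} ->
            io:format("other system event: ~p~n", [Event])
    end.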
3. What mnesia does after the network partition heals
When the network partition heals, the mnesia_monitor process receives a {nodeup, Node} message. If the local node and the remote node each regard the other as having been down, i.e. each side has a recorded mnesia_down event for the other, the inconsistent_database event is reported, and mnesia will neither merge the schema nor synchronize table data.
handle_info({nodeup, Node}, State) ->
    %% Ok, we are connected to yet another Erlang node
    %% Let's check if Mnesia is running there in order
    %% to detect if the network has been partitioned
    %% due to communication failure.
    HasDown   = mnesia_recover:has_mnesia_down(Node),
    ImRunning = mnesia_lib:is_running(),
    if
        %% If I'm not running the test will be made later.
        HasDown == true, ImRunning == yes ->
            spawn_link(?MODULE, detect_partitioned_network, [self(), Node]);
        true ->
            ignore
    end,
    {noreply, State};

detect_partitioned_network(Mon, Node) ->
    detect_inconcistency([Node], running_partitioned_network),
    unlink(Mon),
    exit(normal).

detect_inconcistency([], _Context) ->
    ok;
detect_inconcistency(Nodes, Context) ->
    Downs = [N || N <- Nodes, mnesia_recover:has_mnesia_down(N)],
    {Replies, _BadNodes} =
        rpc:multicall(Downs, ?MODULE, has_remote_mnesia_down, [node()]),
    report_inconsistency(Replies, Context, ok).

report_inconsistency([{true, Node} | Replies], Context, _Status) ->
    %% Oops, Mnesia is already running on the
    %% other node AND we both regard each
    %% other as down. The database is
    %% potentially inconsistent and we has to
    %% do tell the applications about it, so
    %% they may perform some clever recovery
    %% action.
    Msg = {inconsistent_database, Context, Node},
    mnesia_lib:report_system_event(Msg),
    report_inconsistency(Replies, Context, inconsistent_database);
report_inconsistency([{false, _Node} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([{badrpc, _Reason} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([], _Context, Status) ->
    Status.
After a network partition, recovery can be done naively with mnesia:set_master_nodes(Nodes) or mnesia:change_config(extra_db_nodes, Nodes), but this approach can easily lose data. A better approach is to subscribe to mnesia's system events and implement your own recovery handler that reacts to the inconsistent_database message.
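As a concrete illustration, here is a minimal sketch of such a handler under the simplest possible policy: when the partition is detected, treat the remote node as the master and reload the local tables from it, which deliberately discards the local side's divergent writes. The module name partition_handler and the master-selection rule are made up for this example; a real system needs a domain-specific rule for deciding which replica wins, and possibly an application-level merge of the diverged records.

%% Sketch of a recovery handler: subscribe to mnesia system events and,
%% on inconsistent_database, reload this node's tables from the other
%% side of the partition.
-module(partition_handler).
-export([start/0]).

start() ->
    spawn(fun() ->
                  {ok, _} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, _Context, Node}} ->
            error_logger:warning_msg(
              "partition against ~p detected, reloading from it~n", [Node]),
            %% Arbitrary policy for this sketch: the remote node wins.
            %% Local writes made during the partition are lost.
            ok = mnesia:set_master_nodes([Node]),
            stopped = mnesia:stop(),
            ok = mnesia:start(),
            %% The subscription does not survive a mnesia restart.
            {ok, _} = mnesia:subscribe(system),
            loop();
        _Other ->
            loop()
    end.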