Mnesia's inconsistent_database event

When a mnesia cluster suffers a network partition, each side of the partition may write different data, leaving the database inconsistent. After the partition heals, mnesia reports an inconsistent_database system event, and the data remains in its inconsistent state. This article briefly analyzes how the inconsistent_database event is produced.

1. How mnesia detects the up/down status of other nodes

Within the mnesia application, the mnesia_monitor process is responsible for monitoring the connection status of the nodes in the cluster. During initialization it calls net_kernel:monitor_nodes(true) to subscribe to node status changes; when a node connects or disconnects, mnesia_monitor receives the corresponding message.

handle_call(init, _From, State) ->
    net_kernel:monitor_nodes(true),
    EarlyNodes = State#state.early_connects,
    State2 = State#state{tm_started = true},
    {reply, EarlyNodes, State2};


handle_info({nodeup,Node}, State) ->
    ...

handle_info({nodedown, _Node}, State) ->
    ...

In addition, the nodes of a mnesia cluster negotiate a protocol with each other; once negotiation succeeds, their mnesia_monitor processes link to one another.

%% From remote monitor..
handle_call({negotiate_protocol, Mon, Version, Protocols}, From, State)
  when node(Mon) /= node() ->
    Protocol = protocol_version(),
    MyVersion = mnesia:system_info(version),
    case lists:member(Protocol, Protocols) of
	true ->
	    accept_protocol(Mon, MyVersion, Protocol, From, State);
	false ->
	    %% in this release we should be able to handle the previous
	    %% protocol
	    case hd(Protocols) of
		?previous_protocol_version ->
		    accept_protocol(Mon, MyVersion, ?previous_protocol_version, From, State);
		{7,6} ->
		    accept_protocol(Mon, MyVersion, {7,6}, From, State);
		_ ->
		    verbose("Connection with ~p rejected. "
			    "version = ~p, protocols = ~p, "
			    "expected version = ~p, expected protocol = ~p~n",
			    [node(Mon), Version, Protocols, MyVersion, Protocol]),
		    {reply, {node(), {reject, self(), MyVersion, Protocol}}, State}
	    end
    end;

accept_protocol(Mon, Version, Protocol, From, State) ->
    ...
    case lists:member(Node, State#state.going_down) of
    true ->
        ...
    false ->
        link(Mon),  %% link to remote Monitor
        ...

 

2. What mnesia does when a network partition occurs

When a network partition occurs, the mnesia_monitor process receives {nodedown, Node} and {'EXIT', Pid, _Reason} messages. It ignores the {nodedown, Node} message. For the {'EXIT', Pid, _Reason} message, it checks whether the pid belongs to the local node; if not, it assumes the mnesia_monitor process on another node of the cluster has terminated. It then sends a mnesia_down message, in turn, to the mnesia_recover, mnesia_controller, mnesia_tm, and mnesia_locker processes, notifying each to handle the event.


On receiving mnesia_down, the mnesia_recover process handles any transactions whose outcome has not yet been decided.

On receiving mnesia_down, the mnesia_controller process records the mnesia_down information in the latest log file and in the mnesia_decision table, resets the where_to_commit, active_replicas, where_to_write, where_to_wlock, and related properties of all tables, and performs other associated cleanup.

On receiving mnesia_down, the mnesia_tm process handles the transactions the local node initiated or participated in.

The mnesia_locker process releases the locks held by the departed node.

Finally, the mnesia_monitor process records the mnesia_down event, delivers it to subscribers, and then cleans up its own related state.
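The mnesia_down event delivered to subscribers in the last step is observable from ordinary application code. A minimal sketch (the module name is hypothetical; it assumes mnesia is already started on the local node) that subscribes to system events and logs mnesia_down notifications:

```erlang
%% Sketch: watch for mnesia_down system events.
%% Assumes mnesia is already running on this node.
-module(down_watcher).
-export([start/0]).

start() ->
    %% mnesia:subscribe/1 subscribes the *calling* process,
    %% so subscribe inside the spawned watcher itself.
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        %% System events arrive wrapped in a
        %% {mnesia_system_event, Event} tuple.
        {mnesia_system_event, {mnesia_down, Node}} ->
            io:format("mnesia went down on ~p~n", [Node]),
            loop();
        {mnesia_system_event, _Other} ->
            loop()
    end.
```

The same subscription also delivers the inconsistent_database event discussed in the next section.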

3. What mnesia does after the network partition heals

When the network partition heals, the mnesia_monitor process receives a {nodeup, Node} message. If the local node and the remote node each consider the other to have been down, i.e. each has a recorded mnesia_down event for the other, an inconsistent_database event is reported, and mnesia will not merge schemas or synchronize table data on its own.

handle_info({nodeup, Node}, State) ->
    %% Ok, we are connected to yet another Erlang node
    %% Let's check if Mnesia is running there in order
    %% to detect if the network has been partitioned
    %% due to communication failure.

    HasDown   = mnesia_recover:has_mnesia_down(Node),
    ImRunning = mnesia_lib:is_running(),

    if
	%% If I'm not running the test will be made later.
	HasDown == true, ImRunning == yes ->
	    spawn_link(?MODULE, detect_partitioned_network, [self(), Node]);
	true ->
	    ignore
    end,
    {noreply, State};

detect_partitioned_network(Mon, Node) ->
    detect_inconcistency([Node], running_partitioned_network),
    unlink(Mon),
    exit(normal).

detect_inconcistency([], _Context) ->
    ok;
detect_inconcistency(Nodes, Context) ->
    Downs = [N || N <- Nodes, mnesia_recover:has_mnesia_down(N)],
    {Replies, _BadNodes} =
	rpc:multicall(Downs, ?MODULE, has_remote_mnesia_down, [node()]),
    report_inconsistency(Replies, Context, ok).

report_inconsistency([{true, Node} | Replies], Context, _Status) ->
    %% Oops, Mnesia is already running on the
    %% other node AND we both regard each
    %% other as down. The database is
    %% potentially inconsistent and we has to
    %% do tell the applications about it, so
    %% they may perform some clever recovery
    %% action.
    Msg = {inconsistent_database, Context, Node},
    mnesia_lib:report_system_event(Msg),
    report_inconsistency(Replies, Context, inconsistent_database);
report_inconsistency([{false, _Node} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([{badrpc, _Reason} | Replies], Context, Status) ->
    report_inconsistency(Replies, Context, Status);
report_inconsistency([], _Context, Status) ->
    Status.

After a network partition, recovery can be done simply via mnesia:set_master_nodes(Nodes) or mnesia:change_config(extra_db_nodes, Nodes), but this approach can easily lose data. A better approach is to subscribe to mnesia's system events and write your own recovery handler, performing the appropriate processing when the inconsistent_database message is received.
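A minimal sketch of that approach follows. The module name and the recovery policy are assumptions: this version simply declares the local node the winner, forcing the remote node to reload its tables from the local replica, which discards whatever the remote partition wrote. A real handler would apply application-specific reconciliation instead.

```erlang
%% Sketch: handle inconsistent_database by keeping the local
%% replica (an assumed, data-losing policy -- the remote
%% partition's writes are discarded).
-module(partition_handler).
-export([start/0]).

start() ->
    spawn(fun() ->
                  {ok, _Node} = mnesia:subscribe(system),
                  loop()
          end).

loop() ->
    receive
        {mnesia_system_event,
         {inconsistent_database, Context, Node}} ->
            io:format("partition detected (~p) with ~p~n",
                      [Context, Node]),
            %% Tell the remote node to treat us as master,
            %% then restart its mnesia so it reloads tables
            %% from the local replica.
            rpc:call(Node, mnesia, set_master_nodes, [[node()]]),
            rpc:call(Node, mnesia, stop, []),
            rpc:call(Node, mnesia, start, []),
            loop();
        {mnesia_system_event, _Other} ->
            loop()
    end.
```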
