I've been reading about the Paxos protocol lately. It feels complicated, so I'm taking it one step at a time; that's how learning goes.
Mike Burrows, inventor of the Chubby service at Google, says that "there is only one consensus protocol, and that's Paxos" – all other approaches are just broken versions of Paxos.
Quoted from: Consensus Protocols: Two-Phase Commit
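By comparison, Raft simplifies Paxos by organizing it around a replicated state machine, while riak_ensemble implements Multi-Paxos. In theory, Multi-Paxos allows more concurrency than Raft.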
Today I was reading the file verification code in riak_ensemble, which uses a CRC32 checksum.
    %% In riak_ensemble_save.erl
    write(File, Data) ->
        CRC = erlang:crc32(Data),
        Size = byte_size(Data),
        Meta = <<CRC:32/integer, Size:32/integer>>,
        Out = [Meta, Data,  %% copy 1
               Data, Meta], %% copy 2
        ok = filelib:ensure_dir(File),
        try
            _ = Out,
            ok = riak_ensemble_util:replace_file(File, Out),
            ok = riak_ensemble_util:replace_file(File ++ ".backup", Out),
            ok
        catch
            _:Err ->
                {error, Err}
        end.

    %% In riak_ensemble_util.erl
    replace_file(FN, Data) ->
        TmpFN = FN ++ ".tmp",
        {ok, FH} = file:open(TmpFN, [write, raw]),
        try
            ok = file:write(FH, Data),
            ok = file:sync(FH),
            ok = file:close(FH),
            ok = file:rename(TmpFN, FN),
            {ok, Contents} = read_file(FN),
            true = (Contents == iolist_to_binary(Data)),
            ok
        catch
            _:Err ->
                {error, Err}
        end.

    read_file(FName) ->
        case file:open(FName, [read, raw, binary]) of
            {ok, FD} ->
                Result = read_file(FD, []),
                ok = file:close(FD),
                case Result of
                    {ok, IOList} ->
                        {ok, iolist_to_binary(IOList)};
                    {error, _}=Err ->
                        Err
                end;
            {error,_}=Err ->
                Err
        end.

    read_file(FD, Acc) ->
        case file:read(FD, 4096) of
            {ok, Data} ->
                read_file(FD, [Data|Acc]);
            eof ->
                {ok, lists:reverse(Acc)};
            {error, _}=Err ->
                Err
        end.
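As you can see, before writing, the code computes the CRC32 and the byte size of the data and packs them into an 8-byte Meta value. The output is [Meta, Data, Data, Meta]: two copies of the data, with the metadata at both the head and the tail. That output is written first to the file and then to a ".backup" file. replace_file writes to a ".tmp" file, syncs it, closes it, renames it over the target, then reads the result back in 4096-byte chunks and compares it with what was supposed to be written. The try/catch in each function is not a transaction; it only turns any failed step into {error, Err}. What protects against a half-written file is the temp-file-then-rename sequence (on POSIX filesystems the rename replaces the target atomically). The file and its backup are written one after the other, so a failure between the two calls leaves only the first one updated.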
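To make the layout concrete, here is a minimal sketch that builds the same [Meta, Data, Data, Meta] output in memory and checks its shape. The module name frame_sketch and the functions encode/1 and demo/0 are made up for illustration; they are not part of riak_ensemble.

```erlang
-module(frame_sketch).
-export([encode/1, demo/0]).

%% Hypothetical helper: builds the layout that write/2 produces,
%% flattened into one binary so it can be inspected.
encode(Data) when is_binary(Data) ->
    CRC = erlang:crc32(Data),
    Size = byte_size(Data),
    Meta = <<CRC:32/integer, Size:32/integer>>,
    iolist_to_binary([Meta, Data, Data, Meta]).

demo() ->
    Data = <<"hello">>,
    Frame = encode(Data),
    %% 8-byte header + 5-byte copy 1 + 5-byte copy 2 + 8-byte trailer
    26 = byte_size(Frame),
    <<Head:8/binary, Data:5/binary, Data:5/binary, Tail:8/binary>> = Frame,
    %% The header and the trailer carry the same CRC and size.
    true = (Head =:= Tail),
    ok.
```

Calling frame_sketch:demo() returns ok if every match holds. The overhead is fixed at 16 bytes of metadata, but the payload is stored twice, so this format trades disk space for a second chance at recovery.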
Next comes reading the file, which is verified as well:
    %% In riak_ensemble_save.erl
    do_read(File) ->
        case riak_ensemble_util:read_file(File) of
            {ok, Binary} ->
                safe_read(Binary);
            {error, _} ->
                not_found
        end.

    safe_read(<<CRC:32/integer, Size:32/integer, Data:Size/binary, Rest/binary>>) ->
        case erlang:crc32(Data) of
            CRC ->
                {ok, Data};
            _ ->
                safe_read_backup(Rest)
        end;
    safe_read(Binary) ->
        safe_read_backup(Binary).

    safe_read_backup(Binary) when byte_size(Binary) =< 8 ->
        not_found;
    safe_read_backup(Binary) ->
        BinSize = byte_size(Binary),
        Skip = BinSize - 8,
        <<_:Skip/binary, CRC:32/integer, Size:32/integer>> = Binary,
        Skip2 = Skip - Size,
        case Binary of
            <<_:Skip2/binary, Data:Size/binary, _:8/binary>> ->
                case erlang:crc32(Data) of
                    CRC ->
                        {ok, Data};
                    _ ->
                        not_found
                end;
            _ ->
                not_found
        end.
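As you can see, reading follows the same layout used for writing and verifies the content with CRC32. safe_read parses the header and checks the first copy of the data. If that CRC does not match, the rest of the binary (after the first copy) is passed to safe_read_backup; if the header cannot be parsed at all, the whole binary is passed instead. safe_read_backup works from the end: it takes the CRC and size from the last 8 bytes, then checks the Size bytes immediately before them, which is the second copy. If that also fails, the result is not_found.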
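The offset arithmetic in safe_read_backup is easier to follow with numbers. The sketch below walks through one case, assuming a 5-byte payload whose header has a garbage size field, so that the first safe_read clause cannot match and the whole binary reaches the fallback. The module name recover_sketch and demo/0 are hypothetical, used only for this walk-through.

```erlang
-module(recover_sketch).
-export([demo/0]).

demo() ->
    Data = <<"hello">>,
    CRC = erlang:crc32(Data),
    Meta = <<CRC:32/integer, 5:32/integer>>,
    %% Damaged header: the size field claims far more bytes than exist,
    %% so a pattern like Data:Size/binary cannot match.
    BadHead = <<CRC:32/integer, 16#FFFFFFFF:32/integer>>,
    Frame = <<BadHead/binary, Data/binary, Data/binary, Meta/binary>>,
    %% Same steps as safe_read_backup/1, on the full 26-byte binary.
    Skip = byte_size(Frame) - 8,      % offset of the trailing Meta
    <<_:Skip/binary, TailCRC:32/integer, Size:32/integer>> = Frame,
    Skip2 = Skip - Size,              % offset of copy 2
    Copy2 = binary:part(Frame, Skip2, Size),
    %% The second copy still matches the CRC stored in the trailer.
    true = (erlang:crc32(Copy2) =:= TailCRC),
    {Skip, Skip2, Copy2}.
```

Here recover_sketch:demo() returns {18, 13, <<"hello">>}: the trailing Meta starts at byte 18 and the second copy at byte 13. Keeping the metadata at the tail as well as the head is what makes this work, since the second copy can be located without trusting anything near the start of the file.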