[Erlang PR Hunter] Optimize v3_kernel for thousands of clauses #2165

In this PR, Jose Valim optimizes match_con/4, the function in v3_kernel that handles match clauses. With thousands of clauses, compilation becomes about 10% faster.

The following logic was removed. It made one full pass over Cs:

%% old
match_con(Us, Cs0, Def, St) ->
    %% Expand literals at the top level.
    Cs = [expand_pat_lit_clause(C) || C <- Cs0],
    match_con_1(Us, Cs, Def, St).

It first applied expand_pat_lit_clause/1 to every clause in Cs. The body of match_con_1/4 that followed also differs slightly from the new code:

%% old
match_con_1([U|_Us] = L, Cs, Def, St0) ->
    %% Extract clauses for different constructors (types).
    %%ok = io:format("match_con ~p~n", [Cs]),
    Ttcs0 = select_types([k_binary], Cs) ++ select_bin_con(Cs) ++
        select_types([k_cons,k_tuple,k_map,k_atom,k_float,
                      k_int,k_nil], Cs),
    Ttcs = opt_single_valued(Ttcs0),

%% new
match_con([U|_Us] = L, Cs, Def, St0) ->
    Ttcs0 = select_types(Cs, [], [], [], [], [], [], [], [], []),
    Ttcs1 = [{T, Types} || {T, [_ | _] = Types} <- Ttcs0],
    Ttcs = opt_single_valued(Ttcs1),
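
The Ttcs1 line in the new match_con/4 relies on a generator pattern to drop groups whose clause list is empty. A minimal sketch of that idiom on its own, where the module name nonempty_sketch and the function drop_empty/1 are hypothetical names chosen for illustration and are not part of v3_kernel:

```erlang
-module(nonempty_sketch).
-export([drop_empty/1]).

%% Keep only {Tag, List} pairs whose List has at least one element.
%% An element that does not match the generator pattern is skipped
%% silently, so {Tag, []} never reaches the result.
drop_empty(Groups) ->
    [{T, L} || {T, [_ | _] = L} <- Groups].
```

For example, nonempty_sketch:drop_empty([{k_atom, [a]}, {k_int, []}, {k_nil, [x, y]}]) returns [{k_atom,[a]},{k_nil,[x,y]}].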

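Note that, before reaching the last line, both versions run Cs through select_types. The old code scans Cs once for the k_binary type, again in select_bin_con, and again for the remaining types; as the old select_types below shows, that last call in fact rescans Cs once per listed type. The new code walks Cs only once. select_types is where this PR changes the most, so let's take a look: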

%% old
select_types(Types, Cs) ->
    [{T,Tcs} || T <- Types, begin Tcs = select(T, Cs), Tcs =/= [] end].

%% select(Con, [Clause]) -> [Clause].

select(T, Cs) -> [ C || C <- Cs, clause_con(C) =:= T ].


%% new
select_types([NoExpC | Cs], Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil) ->
    C = expand_pat_lit_clause(NoExpC),
    case clause_con(C) of
	k_binary ->
	    select_types(Cs, [C |Bin], BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil);
	k_bin_seg ->
	    select_types(Cs, Bin, [C | BinCon], Cons, Tuple, Map, Atom, Float, Int, Nil);
	k_bin_end ->
	    select_types(Cs, Bin, [C | BinCon], Cons, Tuple, Map, Atom, Float, Int, Nil);
	k_cons ->
	    select_types(Cs, Bin, BinCon, [C | Cons], Tuple, Map, Atom, Float, Int, Nil);
	k_tuple ->
	    select_types(Cs, Bin, BinCon, Cons, [C | Tuple], Map, Atom, Float, Int, Nil);
	k_map ->
	    select_types(Cs, Bin, BinCon, Cons, Tuple, [C | Map], Atom, Float, Int, Nil);
	k_atom ->
	    select_types(Cs, Bin, BinCon, Cons, Tuple, Map, [C | Atom], Float, Int, Nil);
	k_float ->
	    select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, [C | Float], Int, Nil);
	k_int ->
	    select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, Float, [C | Int], Nil);
	k_nil ->
	    select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, [C | Nil])
    end;
select_types([], Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil) ->
    [{k_binary, reverse(Bin)}] ++ handle_bin_con(reverse(BinCon)) ++
	[
	    {k_cons, reverse(Cons)},
	    {k_tuple, reverse(Tuple)},
	    {k_map, reverse(Map)},
	    {k_atom, reverse(Atom)},
	    {k_float, reverse(Float)},
	    {k_int, reverse(Int)},
	    {k_nil, reverse(Nil)}
	].

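Note that although the new code walks Cs only once, each of the small result lists still has to be reversed at the end, because the accumulators are built by prepending. So the gain in this PR comes from collapsing the repeated traversals of one list (four, counting the literal expansion and each of the three selection steps as one pass) into a single traversal. The longer the list, the more noticeable the improvement.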
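
The difference between the two strategies can be sketched on plain Erlang terms. This is only an illustration of the technique, not the compiler's code: the module partition_sketch and the function kind/1 (a stand-in for clause_con/1 that handles just atoms, floats, and integers) are assumptions made for the example.

```erlang
-module(partition_sketch).
-export([multi_pass/1, single_pass/1]).

%% Hypothetical stand-in for clause_con/1: classify a plain term.
%% Terms of any other type are not handled and will crash.
kind(X) when is_atom(X) -> atom;
kind(X) when is_float(X) -> float;
kind(X) when is_integer(X) -> int.

%% Old style: one full scan of Xs per kind, empty groups dropped.
multi_pass(Xs) ->
    [{K, Sel} || K <- [atom, float, int],
                 begin Sel = [X || X <- Xs, kind(X) =:= K], Sel =/= [] end].

%% New style: one scan of Xs with one accumulator per kind.
single_pass(Xs) ->
    single_pass(Xs, [], [], []).

single_pass([X | Xs], Atoms, Floats, Ints) ->
    case kind(X) of
        atom -> single_pass(Xs, [X | Atoms], Floats, Ints);
        float -> single_pass(Xs, Atoms, [X | Floats], Ints);
        int -> single_pass(Xs, Atoms, Floats, [X | Ints])
    end;
single_pass([], Atoms, Floats, Ints) ->
    %% The accumulators were built by prepending, so reverse each one
    %% to restore the input order, then drop the empty groups.
    Groups = [{atom, lists:reverse(Atoms)},
              {float, lists:reverse(Floats)},
              {int, lists:reverse(Ints)}],
    [{K, L} || {K, [_ | _] = L} <- Groups].
```

Both partition_sketch:multi_pass([a, 1, 2.0, b]) and partition_sketch:single_pass([a, 1, 2.0, b]) return [{atom,[a,b]},{float,[2.0]},{int,[1]}]. The single-pass version touches each input element once and then pays for one reversal per group, which is the trade-off described above.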
