The poolboy max_overflow pitfall

Problem

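A service node was running at a fairly low load (2000 database accesses per second) with a pool of 100 worker processes and max_overflow set to 100. Performance suddenly dropped: the node could handle only 1500 database accesses per second, request latency rose from a few milliseconds to several hundred milliseconds, and then it gradually recovered.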

Cause

I gradually narrowed the problem down to checkout in the poolboy process pool used for mongodb:

check out

handle_call({checkout, CRef, Block}, {FromPid, _} = From, State) ->
    #state{supervisor = Sup,
           workers = Workers,
           monitors = Monitors,
           overflow = Overflow,
           max_overflow = MaxOverflow} = State,
    case Workers of
        [Pid | Left] ->
            MRef = erlang:monitor(process, FromPid),
            true = ets:insert(Monitors, {Pid, CRef, MRef}),
            {reply, Pid, State#state{workers = Left}};
        [] when MaxOverflow > 0, Overflow < MaxOverflow ->
            {Pid, MRef} = new_worker(Sup, FromPid),
            true = ets:insert(Monitors, {Pid, CRef, MRef}),
            {reply, Pid, State#state{overflow = Overflow + 1}};
        [] when Block =:= false ->
            {reply, full, State};
        [] ->
            MRef = erlang:monitor(process, FromPid),
            Waiting = queue:in({From, CRef, MRef}, State#state.waiting),
            {noreply, State#state{waiting = Waiting}}
    end;

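As you can see, when max_overflow is not 0, a momentary overload creates new workers. Each of these workers connects to mongodb, which takes 1-2 ms, and the cost of that creation blocks the master process. Because the master process handles one call at a time, that cost is paid serially: at 1-2 ms each, it can create at most roughly 500 to 1000 workers per second, and every other checkout waits behind it.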

check in

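On check in, the overflow worker is destroyed again, so connections keep being created and destroyed, and all of that work happens in the master process. Every request then blocks on the master process's connection setup and teardown, and qps collapses.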

handle_checkin(Pid, State) ->
    #state{supervisor = Sup,
           waiting = Waiting,
           monitors = Monitors,
           overflow = Overflow,
           strategy = Strategy} = State,
    case queue:out(Waiting) of
        {{value, {From, CRef, MRef}}, Left} ->
            true = ets:insert(Monitors, {Pid, CRef, MRef}),
            gen_server:reply(From, Pid),
            State#state{waiting = Left};
        {empty, Empty} when Overflow > 0 ->
            ok = dismiss_worker(Sup, Pid),
            State#state{waiting = Empty, overflow = Overflow - 1};
        {empty, Empty} ->
            Workers = case Strategy of
                lifo -> [Pid | State#state.workers];
                fifo -> State#state.workers ++ [Pid]
            end,
            State#state{workers = Workers, waiting = Empty, overflow = 0}
    end.
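One way to observe this from the caller's side is to time the checkout call itself. The following is only a minimal sketch: the module name checkout_timing and the function timed_checkout/1 are hypothetical, and it assumes a pool is already running under the name passed in. It uses timer:tc/3 from the standard library together with poolboy:checkout/1 and poolboy:checkin/2.

```erlang
-module(checkout_timing).
-export([timed_checkout/1]).

%% Hypothetical helper, not part of poolboy.
%% Times one blocking checkout from Pool and returns the elapsed
%% time in microseconds. The worker is checked straight back in.
timed_checkout(Pool) ->
    %% timer:tc/3 returns {ElapsedMicroseconds, ReturnValue}.
    {Micros, Worker} = timer:tc(poolboy, checkout, [Pool]),
    %% checkin is a cast to the pool's master process and returns ok.
    ok = poolboy:checkin(Pool, Worker),
    Micros.
```

If the master process is busy creating or dismissing overflow workers, the measured time should rise well above the cost of a plain message round trip. Note that the probe itself checks a worker out and in, so with max_overflow enabled and the pool exhausted it adds to the same churn it is measuring.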

Conclusion

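Do not use poolboy's max_overflow. If creating or destroying child processes has a noticeable cost, it is easy to block the poolboy master process, and frequent creation and destruction of workers leads to an avalanche.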
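As a rough illustration of that advice, a pool can be declared with a fixed size and overflow disabled. This is a sketch under assumptions, not the configuration used on the node described above: the supervisor module db_pool_sup, the pool name db_pool, and the worker module db_worker are hypothetical, and the size of 200 is simply the original 100 workers plus the 100 overflow slots folded into a fixed pool.

```erlang
-module(db_pool_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    PoolArgs = [{name, {local, db_pool}},      % hypothetical pool name
                {worker_module, db_worker},    % hypothetical worker module
                {size, 200},                   % assumed size; all workers start up front
                {max_overflow, 0}],            % never create workers at checkout time
    WorkerArgs = [],                           % passed to db_worker:start_link/1
    PoolSpec = poolboy:child_spec(db_pool, PoolArgs, WorkerArgs),
    {ok, {{one_for_one, 10, 10}, [PoolSpec]}}.
```

With max_overflow at 0, an exhausted pool queues blocking callers (or returns full to non-blocking ones) instead of opening new connections inside the master process, so the connection cost is paid once at startup.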

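Every time I track down a bug, the answer looks obvious in hindsight, but finding it takes real effort. The monitoring data is not suitable for posting on a personal blog, so much of the reasoning is inevitably left out. I hope this conclusion is helpful to you.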
