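The OTP platform is highly fault-tolerant because it provides mechanisms for monitoring the state of all processes. If a process fails, the error is detected promptly, and the process can then be restarted or otherwise handled.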
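This detection rests on Erlang process links. A supervisor links to each child and traps exits, so a child's crash arrives as an ordinary message that the supervisor can act on. The sketch below shows only that primitive. The module name `ExitWatch` is hypothetical and is not part of OTP.

```elixir
defmodule ExitWatch do
  # Hypothetical helper, for illustration only: spawn a linked process that
  # exits with `reason`, then return the exit reason the caller observed.
  def crash_and_observe(reason) do
    # Trapping exits turns a linked process's death into an ordinary
    # {:EXIT, pid, reason} message instead of killing the caller as well.
    Process.flag(:trap_exit, true)
    pid = spawn_link(fn -> exit(reason) end)

    receive do
      {:EXIT, ^pid, observed} -> observed
    after
      1_000 -> :timeout
    end
  end
end

IO.inspect(ExitWatch.crash_and_observe(:boom))
# prints :boom
```

A real supervisor does the same thing internally. It then decides, according to its strategy, which children to restart.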
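Supervisors can greatly improve a system's availability. A supervisor watches one or more child processes. A supervisor can itself be supervised by another supervisor, which produces a supervision tree and improves the availability of the whole system.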
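Note that a supervisor should ideally do nothing but supervise, and should contain no business logic. The closer a supervisor sits to the root of the supervision tree, the simpler it should be. A simple supervisor is unlikely to fail itself, and that reliability is the key to keeping the system highly available.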
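A supervisor that follows this advice holds nothing but a child list and a strategy. The sketch below uses the module-based form: `use Supervisor` together with an `init/1` callback. The module name `QuietSupervisor` and the `Agent` child registered as `:counter` are hypothetical stand-ins for real workers.

```elixir
defmodule QuietSupervisor do
  # Hypothetical module name. The supervisor contains no business logic:
  # it only declares its children and its restart strategy.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    children = [
      # A plain Agent stands in for a real worker in this sketch.
      %{id: :counter, start: {Agent, :start_link, [fn -> 0 end, [name: :counter]]}}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _sup} = QuietSupervisor.start_link(name: :quiet_supervisor)
Agent.update(:counter, &(&1 + 1))
IO.inspect(Agent.get(:counter, & &1))
# prints 1
```

All of the work lives in the worker. If the worker crashes, the supervisor has almost nothing of its own that could go wrong.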
Below, we use Elixir's Supervisor module to build simple supervision examples that show how supervisors improve system availability.
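There are 4 supervision strategies:

- `:one_for_one`: if a child process terminates, only that process is restarted.
- `:one_for_all`: if a child process terminates, all other children are terminated, and then all of them are restarted.
- `:rest_for_one`: if a child process terminates, the children started after it are terminated, and then the failed child and those children are restarted.
- `:simple_one_for_one`: like `:one_for_one`, but for dynamically added children of a single type. This strategy has been deprecated since Elixir 1.6 in favor of `DynamicSupervisor`.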
Switching between strategies is very simple. The following examples demonstrate two of them:
```elixir
defmodule PseudoServerA do
  use GenServer

  def start_link(state, opts \\ []) do
    GenServer.start_link(__MODULE__, state, opts)
  end

  # Explicit init/1; newer Elixir versions warn when it is missing.
  @impl true
  def init(state), do: {:ok, state}

  # Reply with this server's PID so that restarts are visible.
  @impl true
  def handle_call(:display, _from, []) do
    {:reply, 'ServerA PID: ' ++ :erlang.pid_to_list(self()), []}
  end

  # Stopping with a non-normal reason counts as a crash for the supervisor.
  @impl true
  def handle_cast(:err, []) do
    {:stop, "stop ServerA", []}
  end
end

defmodule PseudoServerB do
  use GenServer

  def start_link(state, opts \\ []) do
    GenServer.start_link(__MODULE__, state, opts)
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:display, _from, []) do
    {:reply, 'ServerB PID: ' ++ :erlang.pid_to_list(self()), []}
  end

  @impl true
  def handle_cast(:err, []) do
    {:stop, "stop ServerB", []}
  end
end

defmodule PseudoServerC do
  use GenServer

  def start_link(state, opts \\ []) do
    GenServer.start_link(__MODULE__, state, opts)
  end

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:display, _from, []) do
    {:reply, 'ServerC PID: ' ++ :erlang.pid_to_list(self()), []}
  end

  @impl true
  def handle_cast(:err, []) do
    {:stop, "stop ServerC", []}
  end
end

defmodule SupervisorTest do
  def init() do
    # Map child specs; the Supervisor.Spec helpers (worker, supervisor) are deprecated.
    children = [
      %{id: :server_a, start: {PseudoServerA, :start_link, [[], [name: :server_a]]}},
      %{id: :server_b, start: {PseudoServerB, :start_link, [[], [name: :server_b]]}},
      %{id: :server_c, start: {PseudoServerC, :start_link, [[], [name: :server_c]]}}
    ]

    # Start the supervisor with children
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```
To test it:
```
$ iex -S mix
# Start the supervisor and the 3 processes it supervises
iex(1)> SupervisorTest.init
{:ok, #PID<0.145.0>}
# After startup, the PIDs of the 3 processes are:
iex(2)> GenServer.call(:server_a, :display)
'ServerA PID: <0.146.0>'
iex(3)> GenServer.call(:server_b, :display)
'ServerB PID: <0.147.0>'
iex(4)> GenServer.call(:server_c, :display)
'ServerC PID: <0.148.0>'
# Make ServerA fail by sending it the :err message
iex(5)> GenServer.cast(:server_a, :err)
:ok
iex(6)>
14:47:53.119 [error] GenServer :server_a terminating
** (stop) "stop ServerA"
Last message: {:"$gen_cast", :err}
State: []
nil
# After ServerA fails, check the 3 PIDs again: only ServerA was restarted, as :one_for_one specifies
iex(7)> GenServer.call(:server_a, :display)
'ServerA PID: <0.155.0>'
iex(8)> GenServer.call(:server_b, :display)
'ServerB PID: <0.147.0>'
iex(9)> GenServer.call(:server_c, :display)
'ServerC PID: <0.148.0>'
```
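The `:display` call exists only to expose each server's PID. `Process.whereis/1` offers another route: it returns the PID currently registered under a name, so a script can check the restart without a custom call. The sketch below makes these assumptions:

- The children are plain `Agent` processes registered as `:agent_a` and `:agent_b` (hypothetical names).
- A hypothetical helper, `RestartCheck.wait_for_new_pid`, polls until the supervisor has registered a replacement. Polling is needed because the restart happens asynchronously.

```elixir
defmodule RestartCheck do
  # Hypothetical helper: poll until `name` is registered to a PID other than
  # `old_pid`, i.e. until the supervisor has restarted that child.
  def wait_for_new_pid(name, old_pid, attempts \\ 100) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ when attempts > 0 ->
        Process.sleep(10)
        wait_for_new_pid(name, old_pid, attempts - 1)

      _ ->
        raise "#{inspect(name)} was not restarted"
    end
  end
end

children = [
  %{id: :a, start: {Agent, :start_link, [fn -> :a end, [name: :agent_a]]}},
  %{id: :b, start: {Agent, :start_link, [fn -> :b end, [name: :agent_b]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

a_before = Process.whereis(:agent_a)
b_before = Process.whereis(:agent_b)

# Kill only agent_a; under :one_for_one, only it comes back with a new PID.
Process.exit(a_before, :kill)
a_after = RestartCheck.wait_for_new_pid(:agent_a, a_before)

IO.inspect({a_after != a_before, Process.whereis(:agent_b) == b_before})
# prints {true, true}
```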
Let's try a different supervision strategy. We only need to change this line in the code above:
```elixir
# Start the supervisor with children
Supervisor.start_link(children, strategy: :one_for_one)
```
to:
```elixir
# Start the supervisor with children
Supervisor.start_link(children, strategy: :one_for_all)
```
The test steps are the same as for `:one_for_one`:
```
$ iex -S mix
# Start the supervisor and the 3 processes it supervises
iex(1)> SupervisorTest.init
{:ok, #PID<0.145.0>}
# After startup, the PIDs of the 3 processes are:
iex(2)> GenServer.call(:server_a, :display)
'ServerA PID: <0.146.0>'
iex(3)> GenServer.call(:server_b, :display)
'ServerB PID: <0.147.0>'
iex(4)> GenServer.call(:server_c, :display)
'ServerC PID: <0.148.0>'
# Make ServerA fail by sending it the :err message
iex(5)> GenServer.cast(:server_a, :err)
:ok
iex(6)>
14:55:16.183 [error] GenServer :server_a terminating
** (stop) "stop ServerA"
Last message: {:"$gen_cast", :err}
State: []
nil
# After ServerA fails, check the 3 PIDs again: the supervisor restarted every process, as :one_for_all specifies
iex(7)> GenServer.call(:server_a, :display)
'ServerA PID: <0.153.0>'
iex(8)> GenServer.call(:server_b, :display)
'ServerB PID: <0.154.0>'
iex(9)> GenServer.call(:server_c, :display)
'ServerC PID: <0.156.0>'
```
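The same one-line switch also works for `:rest_for_one`. The sketch below compares `:one_for_one`, `:one_for_all` and `:rest_for_one` side by side. It kills the middle of three children, which is the case where the three strategies differ. The sketch uses plain `Agent` children instead of the PseudoServer modules, and the module name `StrategyDemo` is hypothetical.

```elixir
defmodule StrategyDemo do
  # Hypothetical module: start three Agents under `strategy`, kill the middle
  # one, and report for each child whether it now has a new PID.
  @ids [:first, :second, :third]

  def run(strategy) do
    children =
      for id <- @ids do
        %{id: id, start: {Agent, :start_link, [fn -> id end]}}
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = pids(sup)

    Process.exit(before[:second], :kill)
    after_restart = wait_for_restart(sup, before[:second])

    Supervisor.stop(sup)
    Enum.map(@ids, fn id -> after_restart[id] != before[id] end)
  end

  # which_children/1 is answered between supervisor messages, so once the
  # killed child shows a new PID, every restart it triggered has finished.
  defp wait_for_restart(sup, old_pid) do
    current = pids(sup)

    if current[:second] != old_pid and is_pid(current[:second]) do
      current
    else
      Process.sleep(10)
      wait_for_restart(sup, old_pid)
    end
  end

  defp pids(sup) do
    Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
  end
end

for strategy <- [:one_for_one, :one_for_all, :rest_for_one] do
  IO.inspect({strategy, StrategyDemo.run(strategy)})
end
# prints {:one_for_one, [false, true, false]}
#        {:one_for_all, [true, true, true]}
#        {:rest_for_one, [false, true, true]}
```

Under `:rest_for_one`, the child started before the failed one keeps its PID, while the failed child and the one started after it are restarted. `:simple_one_for_one` is left out because it is meant for dynamically added children.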
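Supervision is not limited to a single level. A supervisor can also supervise other supervisors, forming a tree of supervision relationships.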
Modify the test code above as follows (only the Supervisor part changes):
```elixir
# PseudoServerA, PseudoServerB and PseudoServerC are unchanged from above.

defmodule SupervisorBranch do
  # Branch supervisor: watches ServerA and ServerB with :one_for_one.
  def start_link(opts) do
    children = [
      %{id: :server_a, start: {PseudoServerA, :start_link, [[], [name: :server_a]]}},
      %{id: :server_b, start: {PseudoServerB, :start_link, [[], [name: :server_b]]}}
    ]

    # Pass opts through so that the :name option is actually applied.
    Supervisor.start_link(children, [strategy: :one_for_one] ++ opts)
  end
end

defmodule SupervisorRoot do
  def init() do
    children = [
      # type: :supervisor marks the branch as a supervisor child.
      %{
        id: :supervisor_branch,
        start: {SupervisorBranch, :start_link, [[name: :supervisor_branch]]},
        type: :supervisor
      },
      %{id: :server_c, start: {PseudoServerC, :start_link, [[], [name: :server_c]]}}
    ]

    # Start the supervisor with children
    Supervisor.start_link(children, strategy: :one_for_all)
  end
end
```
The test runs as follows:
```
# Start the root supervisor
iex(1)> SupervisorRoot.init
{:ok, #PID<0.149.0>}
# After startup, check the PIDs of the 3 processes
iex(2)> GenServer.call(:server_a, :display)
'ServerA PID: <0.151.0>'
iex(3)> GenServer.call(:server_b, :display)
'ServerB PID: <0.152.0>'
iex(4)> GenServer.call(:server_c, :display)
'ServerC PID: <0.153.0>'
# Make ServerA fail by sending it the :err message
iex(5)> GenServer.cast(:server_a, :err)
:ok
iex(6)>
15:31:15.846 [error] GenServer :server_a terminating
** (stop) "stop ServerA"
Last message: {:"$gen_cast", :err}
State: []
nil
# After ServerA fails, only ServerA is restarted, because its supervisor SupervisorBranch uses :one_for_one
iex(7)> GenServer.call(:server_a, :display)
'ServerA PID: <0.158.0>'
iex(8)> GenServer.call(:server_b, :display)
'ServerB PID: <0.152.0>'
iex(9)> GenServer.call(:server_c, :display)
'ServerC PID: <0.153.0>'
# Make ServerC fail by sending it the :err message
iex(10)> GenServer.cast(:server_c, :err)
:ok
15:31:35.264 [error] GenServer :server_c terminating
** (stop) "stop ServerC"
Last message: {:"$gen_cast", :err}
State: []
# After ServerC fails, its supervisor SupervisorRoot (:one_for_all) restarts all of its children,
# including SupervisorBranch, so every process gets a new PID
iex(11)> GenServer.call(:server_a, :display)
'ServerA PID: <0.166.0>'
iex(12)> GenServer.call(:server_c, :display)
'ServerC PID: <0.168.0>'
iex(13)> GenServer.call(:server_b, :display)
'ServerB PID: <0.167.0>'
```
With a supervision tree, we can divide processes into groups and give each group its own supervision strategy.
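The grouping can also be inspected from outside the tree. `Supervisor.which_children/1` lists each child with its id, PID and type, and `Supervisor.count_children/1` summarizes one level of the tree. The sketch below rebuilds the shape of `SupervisorRoot` with plain `Agent` children and walks the tree one level down. The ids `:branch`, `:a`, `:b` and `:c` are hypothetical.

```elixir
# Hypothetical two-level tree mirroring SupervisorRoot: the root (:one_for_all)
# supervises a branch supervisor (:one_for_one) plus one worker.
branch_children = [
  %{id: :a, start: {Agent, :start_link, [fn -> :a end]}},
  %{id: :b, start: {Agent, :start_link, [fn -> :b end]}}
]

root_children = [
  %{
    id: :branch,
    start: {Supervisor, :start_link, [branch_children, [strategy: :one_for_one]]},
    type: :supervisor
  },
  %{id: :c, start: {Agent, :start_link, [fn -> :c end]}}
]

{:ok, root} = Supervisor.start_link(root_children, strategy: :one_for_all)

# Counts cover the root level only: :specs, :active, :supervisors, :workers.
IO.inspect(Supervisor.count_children(root))

# Walk one level down into every child that is itself a supervisor.
for {id, pid, type, _modules} <- Supervisor.which_children(root) do
  IO.puts("#{id} (#{type}) #{inspect(pid)}")

  if type == :supervisor do
    for {child_id, child_pid, child_type, _} <- Supervisor.which_children(pid) do
      IO.puts("  #{child_id} (#{child_type}) #{inspect(child_pid)}")
    end
  end
end
```

The root sees only two children: one supervisor and one worker. `:a` and `:b` belong to the branch, which is why the root's `:one_for_all` restarts them only indirectly, by restarting the branch.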
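The supervision mechanism keeps us informed of the state of every process, and a supervision tree lets us combine different recovery mechanisms. Used well, the Supervisor module can therefore greatly improve system availability.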
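One recovery mechanism a tree adds is escalation. Each supervisor tolerates only a limited number of restarts, set by the `:max_restarts` and `:max_seconds` options (3 restarts in 5 seconds by default). Beyond that limit, the supervisor terminates its children and then itself, which leaves the problem to the supervisor one level up. The sketch below assumes a hypothetical module `IntensityDemo` that repeatedly kills the only child of a supervisor until the supervisor gives up.

```elixir
defmodule IntensityDemo do
  # Hypothetical module: keep killing the supervisor's only child until the
  # supervisor itself gives up, then return the supervisor's exit reason.
  def run(max_restarts) do
    children = [%{id: :flaky, start: {Agent, :start_link, [fn -> :ok end]}}]

    {:ok, sup} =
      Supervisor.start_link(children,
        strategy: :one_for_one,
        max_restarts: max_restarts,
        max_seconds: 5
      )

    # Unlink and monitor instead, so the supervisor's exit does not kill the caller.
    Process.unlink(sup)
    ref = Process.monitor(sup)
    kill_until_down(sup, ref)
  end

  defp kill_until_down(sup, ref) do
    receive do
      {:DOWN, ^ref, :process, ^sup, reason} -> reason
    after
      20 ->
        kill_child(sup)
        kill_until_down(sup, ref)
    end
  end

  defp kill_child(sup) do
    case Supervisor.which_children(sup) do
      [{:flaky, pid, _type, _mods}] when is_pid(pid) -> Process.exit(pid, :kill)
      _ -> :ok
    end
  catch
    # The supervisor may already be shutting down; its :DOWN message follows.
    :exit, _ -> :ok
  end
end

IO.inspect(IntensityDemo.run(2))
# prints :shutdown
```

With `max_restarts: 2`, the first two crashes are restarted and the third makes the supervisor exit with reason `:shutdown`. In a real tree, that exit is what the parent supervisor would see and handle according to its own strategy.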
For full details on the Supervisor module, see: http://elixir-lang.org/docs/stable/elixir/Supervisor.html