Spark Launcher is a handy tool for submitting Spark jobs, but sometimes you will find that after submitting a job the launcher just keeps waiting. If you then kill the Spark Launcher process, the Spark job's state changes from RUNNING to FINISHED. The launcher is not actually waiting for the Spark job to finish; the launcher process has become blocked.
Under the hood, Spark Launcher uses Java's ProcessBuilder, and the answer turns out to be in the ProcessBuilder documentation.
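A minimal sketch of what that means in practice (the master, jar path, and main class below are placeholders, not values from this article): the launcher assembles a spark-submit command line and starts it via ProcessBuilder, so launch() hands back an ordinary java.lang.Process whose standard I/O is piped to the parent JVM.

```java
import org.apache.spark.launcher.SparkLauncher;

// launch() returns a plain java.lang.Process created by ProcessBuilder;
// its stdout/stderr pipes flow back into this (the parent) JVM.
Process proc = new SparkLauncher()
        .setMaster("yarn")                          // assumed master
        .setAppResource("/path/to/your-app.jar")    // hypothetical jar
        .setMainClass("com.example.YourSparkApp")   // hypothetical main class
        .launch();
```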
The documentation notes:
By default, the created subprocess does not have its own terminal or console. All its standard I/O (i.e. stdin, stdout, stderr) operations will be redirected to the parent process, where they can be accessed via the streams obtained using the methods getOutputStream(), getInputStream(), and getErrorStream(). The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, or even deadlock.
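That is exactly the hang described at the top of this post. Roughly, the failing pattern looks like the sketch below (a simplified illustration, not the original code); nothing drains the child's pipes, so once the OS pipe buffer fills, spark-submit blocks on a write and waitFor() never returns.

```java
// Sketch of the failure mode. "proc" is assumed to be the Process returned
// by launcher.launch() in the previous sketch.
static void waitWithoutDraining(Process proc) throws InterruptedException {
    // Nothing reads proc.getInputStream() or proc.getErrorStream() here.
    // spark-submit keeps writing log output to the pipes; once the buffer
    // is full, the child blocks on write() and this waitFor() hangs.
    int exitCode = proc.waitFor();
    System.out.println("exit code: " + exitCode); // not reached while blocked
}
```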
Note in particular that with Spark Launcher, Spark's INFO-level log output arrives on the error stream.
So when consuming Spark's output, spawn child threads to read the buffered stream contents; otherwise the buffer fills up and the process blocks. A sketch of this is shown below.
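Here is one way to do it (a sketch under the same placeholder master/jar/class assumptions as above): start a thread per stream that keeps reading until the child exits, then call waitFor().

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.spark.launcher.SparkLauncher;

public class LaunchAndDrain {

    // Start a daemon thread that keeps reading one of the child's streams
    // so its pipe buffer never fills up.
    private static void drain(InputStream in, String tag) {
        Thread t = new Thread(() -> {
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println("[" + tag + "] " + line);
                }
            } catch (Exception ignored) {
                // stream closes when the child process exits
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws Exception {
        Process proc = new SparkLauncher()
                .setMaster("yarn")                          // assumed master
                .setAppResource("/path/to/your-app.jar")    // hypothetical jar
                .setMainClass("com.example.YourSparkApp")   // hypothetical main class
                .launch();

        // Drain both pipes in child threads. Spark's INFO logging shows up on
        // the error stream, so reading stderr is the important part.
        drain(proc.getInputStream(), "stdout");
        drain(proc.getErrorStream(), "stderr");

        int exitCode = proc.waitFor(); // no longer blocks on a full buffer
        System.out.println("spark-submit exited with " + exitCode);
    }
}
```

If you do not need the raw output at all, an alternative is to use SparkLauncher's startApplication(), which monitors the job through a SparkAppHandle instead of handing you the child's streams directly.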