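In production we use Java to schedule Hive tasks, launching each Hive job through the exec method of the Java Runtime class. Recently some jobs have been hanging: the scheduling script stalls, and as a side effect the Hive lock may not be released properly.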
Printing the Java thread information with jstack:
The main thread is waiting on the monitor shown below, and the trace leads to the waitFor function.
"main" prio=10 tid=0x000000005b24c800 nid=0x280e in Object.wait() [0x00002b3dee8e7000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000eb6f88b8> (a java.lang.UNIXProcess)
at java.lang.Object.wait(Object.java:485)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:165)
- locked <0x00000000eb6f88b8> (a java.lang.UNIXProcess)
at calltest.main(calltest.java:8)
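The Java API documentation describes the Runtime class as follows. Every JVM has a single Runtime instance, and Runtime.exec starts a separate process to run the given command: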
Every Java application has a single instance of class Runtime that allows the application to interface with
the environment in which the application is running.
Executes the specified string command in a separate process with the specified environment and working directory.
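The documentation also gives the following explanation.
In other words, a process created by exec has no console of its own; all of its I/O goes through pipes to the parent process. When the child produces a large amount of output and the parent does not read it, the pipe buffer fills up and the child blocks on its next write. The parent, having called waitFor, keeps waiting for the child to exit, so neither side can make progress and the two are deadlocked.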
The created subprocess does not have its own terminal or console. All its standard io (i.e. stdin, stdout, stderr) operations will be redirected to the parent process through three streams (getOutputStream(), getInputStream(), getErrorStream()). The parent process uses these streams to feed input to and get output from the subprocess. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
The waitFor method:
Causes the current thread to wait, if necessary, until the process represented by this Process object has terminated. This method returns immediately if the subprocess has already terminated. If the subprocess has not yet terminated, the calling thread will be blocked until the subprocess exits.
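The hang can be reproduced in a few lines. The following is a minimal sketch, assuming a Unix-like system where `head` and `/dev/zero` are available, a pipe buffer well under 1 MB (typical, but platform dependent), and Java 8 or later for the timed waitFor overload. The class name HangDemo, the 1 MB payload, and the 2 second timeout are illustrative choices, not taken from the original Hive job.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class HangDemo {

    /**
     * Starts a child that writes about 1 MB to stdout while the parent never reads it.
     * Returns true if the child exited within the timeout, false if it was still running.
     */
    static boolean exitsWithoutReader(long timeoutSeconds)
            throws IOException, InterruptedException {
        // Assumption: "head -c 1000000 /dev/zero" stands in for a job with a lot of output.
        Process child = Runtime.getRuntime()
                .exec(new String[] {"head", "-c", "1000000", "/dev/zero"});
        try {
            // Nobody reads child.getInputStream(), so once the pipe is full the child
            // blocks in write(). A plain waitFor() would never return here; the timed
            // overload is used only so that this demo can finish.
            return child.waitFor(timeoutSeconds, TimeUnit.SECONDS);
        } finally {
            // Clean up the blocked child.
            child.destroyForcibly();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("child exited on its own: " + exitsWithoutReader(2));
    }
}
```

If the child is still running when the timeout expires, the parent is in the same state as the "main" thread in the jstack output above, except that the untimed waitFor would stay there indefinitely.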
Solution:
Before calling waitFor, start two threads for the child process: one that reads its standard output and one that reads its standard error.
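The fix can be sketched as follows. This is one possible shape rather than the code actually deployed: the names ExecWithDrain, StreamDrainer, and run are hypothetical, and the command in main is a placeholder for the real Hive invocation. It assumes a Unix-like system with `sh` on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ExecWithDrain {

    /** Reads one stream of the child to the end on its own thread and keeps the text. */
    static class StreamDrainer extends Thread {
        private final InputStream in;
        private final StringBuilder captured = new StringBuilder();

        StreamDrainer(InputStream in) {
            this.in = in;
        }

        @Override
        public void run() {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    captured.append(line).append('\n');
                }
            } catch (IOException e) {
                // The stream was closed underneath us (for example, the child was
                // destroyed); keep whatever was read so far.
            }
        }

        /** Call only after join(), so the reader thread has finished writing. */
        String output() {
            return captured.toString();
        }
    }

    /** Runs the command, draining stdout and stderr concurrently, and returns the exit code. */
    static int run(String[] command, StringBuilder out, StringBuilder err)
            throws IOException, InterruptedException {
        Process process = Runtime.getRuntime().exec(command);

        // Start both readers BEFORE waitFor so the child can never block on a full pipe.
        StreamDrainer stdout = new StreamDrainer(process.getInputStream());
        StreamDrainer stderr = new StreamDrainer(process.getErrorStream());
        stdout.start();
        stderr.start();

        int exitCode = process.waitFor();

        // Wait for the readers to consume any output still buffered in the pipes.
        stdout.join();
        stderr.join();
        out.append(stdout.output());
        err.append(stderr.output());
        return exitCode;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder out = new StringBuilder();
        StringBuilder err = new StringBuilder();
        // Placeholder command; a real scheduler would pass the Hive command line here.
        int code = run(new String[] {"sh", "-c", "echo hello; echo oops 1>&2"}, out, err);
        System.out.println("exit code: " + code);
        System.out.print("stdout: " + out);
        System.out.print("stderr: " + err);
    }
}
```

Two readers are needed because stdout and stderr are separate pipes, and either one filling up is enough to block the child. If the two streams do not need to be kept apart, ProcessBuilder.redirectErrorStream(true) merges stderr into stdout so that a single reader suffices.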