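Further investigation showed that the program was using multithreading, multiprocessing and the logging module at the same time, and that this combination caused a deadlock in the child process.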
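At the moment the child process was created, the logging module, running in a background thread, happened to be holding a lock (threading.RLock) while writing a log record. On Unix/Linux, Python's multiprocessing has traditionally created child processes with fork, so the lock inside logging was copied into the child along with everything else. fork copies only the calling thread, so the thread that owns the lock does not exist in the child. When the child later needs to log something, it finds logging's lock permanently held and deadlocks. The copied lock will never be released: its owner is a thread in the parent process, and when that thread releases its lock, the copy in the child is unaffected. The following script reproduces the problem with a plain RLock: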
```python
# fork.py
import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))
        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)  # hold the lock while the main thread forks
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))
        time.sleep(600000)


# verbose=True was a Python 2 option and has no effect on Python 3's RLock,
# so it is dropped here.
global_rlock = threading.RLock()
worker = ThreadWorker()
worker.start()
time.sleep(1)  # give the worker time to acquire the lock
print('forking')
pid = os.fork()
if pid != 0:  # pid != 0: we are in the parent
    print('parent: running (rlock = {0})'.format(global_rlock))
else:  # pid == 0: we are in the child
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()  # blocks forever: the copied lock is never released
    print('child: got the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)
```
```
$ python fork.py
ThreadWorker: init
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>
forking
parent: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>)
child: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>), getting the lock...
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40> and sleeping forever
```

The child never prints "child: got the lock". It stays blocked even after the worker thread in the parent releases its lock: both processes show the same address, but they hold two independent copies of the lock.
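fork.py hangs forever in the child, so the effect is awkward to observe more than once. The sketch below reproduces the same situation without hanging. It assumes a Unix platform where os.fork() exists. The child calls acquire(timeout=...) instead of blocking, then reports the result to the parent over a pipe. child_can_acquire is a hypothetical helper name. On Python 3.12+, the fork may also emit a DeprecationWarning, because the process is multi-threaded at that point.

```python
import os
import threading


def child_can_acquire(lock, timeout=1.0):
    """Hypothetical helper: fork while a background thread holds `lock`,
    then report whether the child can acquire its copy within `timeout`."""
    holding = threading.Event()
    release = threading.Event()

    def hold_lock():
        # The background thread owns the lock at the moment of the fork.
        with lock:
            holding.set()
            release.wait()

    holder = threading.Thread(target=hold_lock)
    holder.start()
    holding.wait()

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, so the lock's owner does
        # not exist here; acquire(timeout=...) gives up instead of hanging.
        os.close(r)
        got = lock.acquire(timeout=timeout)
        os.write(w, b"1" if got else b"0")
        os._exit(0)  # skip interpreter cleanup in the child

    # Parent: read the child's verdict, reap it, then let the holder finish.
    os.close(w)
    verdict = os.read(r, 1)
    os.close(r)
    os.waitpid(pid, 0)
    release.set()
    holder.join()
    return verdict == b"1"


if __name__ == "__main__":
    print("child got RLock:", child_can_acquire(threading.RLock()))
    print("child got Lock: ", child_can_acquire(threading.Lock()))
```

Both calls should report False. A plain threading.Lock behaves the same way as an RLock here: the child's copy is locked and has no thread that could ever release it.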
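So how do we fix this? There are at least three ways. The first is to create child processes before starting any threads, so that no other thread can be holding a lock at the moment of the fork. In the version below, the worker thread object is created before the fork but only started in the parent afterwards: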
```python
# fork2.py
import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))
        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))
        time.sleep(600000)


global_rlock = threading.RLock()  # verbose=True dropped: no effect in Python 3
worker = ThreadWorker()  # created, but not started until after the fork
print('forking')
pid = os.fork()
if pid != 0:  # pid != 0: we are in the parent
    print('parent: running (rlock = {0})'.format(global_rlock))
    worker.start()  # the thread exists only in the parent, after the fork
else:  # pid == 0: we are in the child
    time.sleep(1)  # let the parent's worker grab its own copy of the lock
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()  # succeeds: the child's copy was unlocked at fork time
    print('child: got the lock {0}'.format(global_rlock))
    global_rlock.release()
    print('child: release the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)
```
```
$ python fork2.py
ThreadWorker: init
forking
parent: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10f24cb70>
child: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>), getting the lock...
child: got the lock <locked _thread.RLock object owner=140735162044416 count=1 at 0x10f24cb70>
child: release the lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70> and sleeping forever
```

This time the child's copy of the lock was unlocked when the fork happened. The child acquires and releases it normally while the parent's worker holds its own copy.
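fork2.py applies the first solution with a raw os.fork(). The same ordering with multiprocessing and logging, closer to the setup described at the start, might look like the sketch below. It assumes a Unix platform, since it explicitly requests the "fork" start method via multiprocessing.get_context. It also assumes the caller has not started any other threads before run() is called. child_task, thread_task and run are illustrative names.

```python
import logging
import multiprocessing
import threading
import time


def child_task(n):
    # Runs in a forked child; logging's locks were unlocked when it was copied.
    logging.getLogger("demo").info("child %d: logging works", n)


def thread_task():
    # Background thread in the parent that logs continuously.
    log = logging.getLogger("demo")
    for i in range(50):
        log.info("thread: message %d", i)
        time.sleep(0.001)


def run():
    logging.basicConfig(level=logging.INFO,
                        format="%(processName)s %(message)s")
    ctx = multiprocessing.get_context("fork")  # Unix only
    # 1) Fork every child first, while (by assumption) no other thread runs...
    procs = [ctx.Process(target=child_task, args=(n,)) for n in range(3)]
    for p in procs:
        p.start()
    # 2) ...and only then start threads in the parent.
    t = threading.Thread(target=thread_task)
    t.start()
    for p in procs:
        p.join(timeout=10)
    t.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run())
```

The fix is purely about ordering. Every fork happens before any parent thread exists, so no thread can be holding logging's locks when they are copied.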
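The second solution: do not mix threading, multiprocessing and logging (or any other module that uses thread locks) in the same program. Use either threads only or processes only.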
A third approach is to configure logging to use a handler that does not take a lock when it writes log records.