threading + multiprocessing + logging = deadlock?

A while ago, one of our programs suddenly started showing child processes that stopped doing any work.

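After some investigation, the cause turned out to be that the program used multithreading, multiprocessing, and the logging module at the same time, which led to a deadlock in the child process.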

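At the moment the child process was created, the logging module in a background thread happened to be holding a lock (threading.RLock) while writing a log record. On Unix/Linux, Python creates child processes with fork, so the child received a copy of logging's lock in its locked state. When the child later needed to log something, it found the lock permanently held and deadlocked. The copied lock can never be released: its owner is a thread of the parent process that does not exist in the child, and when that thread releases the lock in the parent, the child's copy is not affected.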

The following code reproduces the problem:

import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))

        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))

        time.sleep(600000)

# RLock takes no arguments in Python 3; the old `verbose` flag is a Python 2 leftover.
global_rlock = threading.RLock()
worker = ThreadWorker()
worker.start()

time.sleep(1)
print('forking')
pid = os.fork()
if pid != 0:    # pid != 0: we are in the parent process
    print('parent: running (rlock = {0})'.format(global_rlock))
else:      # pid == 0: we are in the child process
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()
    print('child: got the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)

Running the code above produces the following output:

$ python fork.py
ThreadWorker: init
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>
forking
parent: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>)
child: running (rlock = <locked _thread.RLock object owner=123145307557888 count=1 at 0x10116cb40>), getting the lock...
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10116cb40> and sleeping forever

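The output shows that although the thread later released the lock it had acquired, the child process stays stuck forever at the point where it tries to acquire the lock.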
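Because the script above hangs in the child, a variant that terminates can be easier to experiment with. The following is a minimal sketch, assuming a Unix-like platform where os.fork is available. The function name child_can_acquire_after_fork and the timeout value are made up for illustration. The child uses acquire(timeout=...) instead of blocking forever and reports the result through its exit code.

```python
import os
import threading


def child_can_acquire_after_fork(timeout=0.5):
    """Fork while a background thread holds an RLock.

    Returns True if the child managed to acquire its copy of the lock
    within `timeout` seconds, False otherwise.
    (Hypothetical helper, not part of the original script.)
    """
    lock = threading.RLock()
    held = threading.Event()      # set once the thread owns the lock
    release = threading.Event()   # tells the thread to let go

    def holder():
        with lock:
            held.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()                   # the lock is held from here on

    pid = os.fork()
    if pid == 0:
        # Child: the holder thread was not copied, but the lock state was.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: releasing here only affects the parent's copy of the lock.
    release.set()
    thread.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

On such a platform the call is expected to return False: the parent thread releases its lock, yet the child's copy stays locked until the timeout expires.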

So how can this be solved? There are at least three options:

  • Create the child process first, and only then start the thread:

import os
import sys
import threading
import time


class ThreadWorker(threading.Thread):
    def __init__(self):
        print('ThreadWorker: init')
        super().__init__()

    def run(self):
        print('ThreadWorker: running (rlock = {0})'.format(global_rlock))

        global_rlock.acquire()
        print('ThreadWorker: i got lock {0}'.format(global_rlock))
        time.sleep(5)
        global_rlock.release()
        print('ThreadWorker: release lock {0} and '
              'sleeping forever'.format(global_rlock))

        time.sleep(600000)

# RLock takes no arguments in Python 3; the old `verbose` flag is a Python 2 leftover.
global_rlock = threading.RLock()
worker = ThreadWorker()

print('forking')
pid = os.fork()
if pid != 0:    # pid != 0: we are in the parent process
    print('parent: running (rlock = {0})'.format(global_rlock))
    worker.start()
else:      # pid == 0: we are in the child process
    time.sleep(1)
    print('child: running (rlock = {0}), '
          'getting the lock...'.format(global_rlock))
    global_rlock.acquire()
    print('child: got the lock {0}'.format(global_rlock))
    global_rlock.release()
    print('child: release the lock {0}'.format(global_rlock))
    sys.exit(0)

time.sleep(10)

Output:

$ python fork2.py
ThreadWorker: init
forking
parent: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>)
ThreadWorker: i got lock <locked _thread.RLock object owner=123145307557888 count=1 at 0x10f24cb70>
child: running (rlock = <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>), getting the lock...
child: got the lock <locked _thread.RLock object owner=140735162044416 count=1 at 0x10f24cb70>
child: release the lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70>
ThreadWorker: release lock <unlocked _thread.RLock object owner=0 count=0 at 0x10f24cb70> and sleeping forever

Here both the child process and the thread acquire the lock normally.

  • Do not mix threading and multiprocessing with logging or any other module that relies on thread locks. Use either threads only or processes only.

  • Another option is to configure logging to use a lock-free handler for writing log records.
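As an illustration of the last option, here is a minimal sketch of a handler that never takes a real lock. It relies on logging.Handler.createLock(), the hook the standard library calls to set up the handler's lock. The names NullLock, LockFreeStreamHandler and build_logger are made up for this example. It is an assumption-laden sketch, not a drop-in fix: without a lock, records written by concurrent threads may interleave, and only the per-handler lock is replaced.

```python
import io
import logging


class NullLock:
    """Stand-in for a lock that never blocks (hypothetical helper)."""

    def acquire(self, *args, **kwargs):
        return True

    def release(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


class LockFreeStreamHandler(logging.StreamHandler):
    """StreamHandler whose per-handler lock is replaced by a no-op."""

    def createLock(self):
        # Called by Handler.__init__; install the no-op lock instead of an RLock.
        self.lock = NullLock()


def build_logger(stream):
    """Return a logger that writes to `stream` through the lock-free handler."""
    logger = logging.getLogger("lockfree-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = LockFreeStreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    build_logger(buf).info("written without taking a real lock")
    print(buf.getvalue(), end="")
```

Since the handler holds no real lock, a fork cannot copy it in a locked state, so a child process logging through this handler has no handler lock to get stuck on.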

References

Original article: https://mozillazg.com/2016/09...
