Python: Why Do Threads Have setDaemon?

Date: 2019-11-07


Preface

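Nobody who uses Python gets by without learning about threads, yet whenever threads come up, people reflexively start talking about the GIL (global interpreter lock).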

But beyond that well-worn topic there is plenty of other interesting material to explore, for example setDaemon().

Using threads, and the problem that comes with it

We typically write code like this to start multiple threads:

import time
import threading

def test():
    while True:
        # print the Thread object running this loop, once a second
        print(threading.current_thread())
        time.sleep(1)

if __name__ == '__main__':
    t1 = threading.Thread(target=test)
    t2 = threading.Thread(target=test)
    t1.start()
    t2.start()

Output:

^C<Thread(Thread-2, started 123145414086656)>
<Thread(Thread-1, started 123145409880064)>
^C^C^C^C^C^C<Thread(Thread-2, started 123145414086656)>    # repeated Ctrl-C presses cannot interrupt it
 <Thread(Thread-1, started 123145409880064)>
^C<Thread(Thread-1, started 123145409880064)>
 <Thread(Thread-2, started 123145414086656)>
<Thread(Thread-1, started 123145409880064)>
 <Thread(Thread-2, started 123145414086656)>
<Thread(Thread-2, started 123145414086656)><Thread(Thread-1, started 123145409880064)>
... (the two threads keep racing to print)

Threading makes concurrency easy to achieve, but it also hands us a big problem: how do we exit?

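While the program above was running, I pressed Ctrl-C many times, and none of it dampened the program's enthusiasm for work! In the end I had no choice but to kill it.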

So how can we avoid this? Put another way: how can the child threads exit automatically when the main thread exits?

Daemon threads

Old hands who have been here before already know the answer: just call setDaemon() to turn the threads into daemon threads:

import time
import threading

def test():
    while True:
        print(threading.current_thread())
        time.sleep(1)

if __name__ == '__main__':
    t1 = threading.Thread(target=test)
    t1.setDaemon(True)   # same as t1.daemon = True; setDaemon() is deprecated since Python 3.10
    t1.start()

    t2 = threading.Thread(target=test)
    t2.setDaemon(True)
    t2.start()

Output:

python2.7 1.py
<Thread(Thread-1, started daemon 123145439883264)>
<Thread(Thread-2, started daemon 123145444089856)>
(exits immediately)

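It exits immediately? Naturally: the main thread has run to completion, so the program really is finished, and because the threads were marked as daemon threads, they are torn down along with it.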
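To see the difference side by side, here is a small sketch (Python 3.7+, not from the original article) that runs the same worker once as a non-daemon thread and once as a daemon thread, each in a fresh interpreter via subprocess. The child script and the `run_child` helper are hypothetical; the point is only that the daemon worker is cut off as soon as the main thread finishes.

```python
import subprocess
import sys

# Child script: one worker thread sleeps briefly, then prints.
# The {daemon} placeholder is filled in with True or False by run_child().
CHILD = """
import threading, time

def worker():
    time.sleep(0.5)
    print("worker finished", flush=True)

t = threading.Thread(target=worker)
t.daemon = {daemon}
t.start()
print("main finished", flush=True)
"""


def run_child(daemon: bool) -> str:
    """Run the child script in a new interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(daemon=daemon)],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout


if __name__ == "__main__":
    print("non-daemon:", run_child(False).splitlines())
    print("daemon:    ", run_child(True).splitlines())
```

The non-daemon child waits for its worker and prints both lines; the daemon child prints only `main finished`.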

Where did this daemon come from?

Here's the question, though: back when we learned C, we never seemed to need a daemon flag. Take this example:

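#include <stdio.h>
#include <unistd.h>        /* sleep(), syscall() */
#include <sys/syscall.h>   /* SYS_gettid */
#include <pthread.h>

void *test(void *args)
{
    while (1)
    {
        /* syscall() returns long, so print it with %ld */
        printf("ThreadID: %ld\n", (long)syscall(SYS_gettid));
        sleep(1);
    }
    return NULL;
}

int main()
{
    pthread_t t1;
    int ret = pthread_create(&t1, NULL, test, NULL);
    if (ret != 0)
    {
        printf("Thread create failed\n");
        return 1;
    }

    // sleep a little so main does not return immediately
    sleep(2);
    printf("Main run..\n");
    return 0;   // returning from main calls exit(), which ends every thread in the process
}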

Output:

# gcc test_python.c -lpthread && ./a.out
ThreadID: 31233
ThreadID: 31233
Main run.. (exits right away, no questions asked)

Since Python itself is written in C, why does a multithreaded Python program need setDaemon in order to exit?

To answer that, we will have to start from the moment the main thread exits. Once upon a time...

Following the trail back to the source

When the Python interpreter shuts down, it calls wait_for_thread_shutdown to do some routine cleanup:

// python2.7/python/pythonrun.c

static void
wait_for_thread_shutdown(void)
{
#ifdef WITH_THREAD
    PyObject *result;
    PyThreadState *tstate = PyThreadState_GET();
    PyObject *threading = PyMapping_GetItemString(tstate->interp->modules,
                                                  "threading");
    if (threading == NULL) {
        /* threading not imported */
        PyErr_Clear();
        return;
    }
    result = PyObject_CallMethod(threading, "_shutdown", "");
    if (result == NULL)
        PyErr_WriteUnraisable(threading);
    else
        Py_DECREF(result);
    Py_DECREF(threading);
#endif
}

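The #ifdef WITH_THREAD guard means this cleanup only exists in interpreters built with thread support. At runtime, the function looks up threading in the interpreter's module table and simply returns if that module was never imported.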

Our script above clearly takes this path: it imported threading, so the function finds the already-loaded module and calls its _shutdown function.
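One observable consequence, sketched for Python 3 under the assumption (true of CPython 2.7 and, as far as I know, of current 3.x releases) that finalization calls wait_for_thread_shutdown before it runs atexit hooks: an atexit hook only fires after every non-daemon thread has finished. The child script and the `run_child` helper are illustrative, not part of CPython.

```python
import subprocess
import sys

# Child script: a non-daemon worker that finishes a bit later than the
# main module, plus an atexit hook that announces when it runs.
CHILD = """
import atexit, threading, time

atexit.register(lambda: print("atexit hook", flush=True))

def worker():
    time.sleep(0.3)
    print("worker done", flush=True)

threading.Thread(target=worker).start()
print("end of main module", flush=True)
"""


def run_child() -> list:
    """Run CHILD in a fresh interpreter and return its stdout lines."""
    out = subprocess.run([sys.executable, "-c", CHILD],
                         capture_output=True, text=True, timeout=10).stdout
    return out.splitlines()


if __name__ == "__main__":
    print(run_child())
```

The worker's line appears before the atexit hook's, because the interpreter first joins the worker during shutdown.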

What _shutdown does can be read straight from the threading module:

# /usr/lib/python2.7/threading.py

_shutdown = _MainThread()._exitfunc

class _MainThread(Thread):

    def __init__(self):
        Thread.__init__(self, name="MainThread")
        self._Thread__started.set()
        self._set_ident()
        with _active_limbo_lock:
            _active[_get_ident()] = self

    def _set_daemon(self):
        return False

    def _exitfunc(self):
        self._Thread__stop()
        t = _pickSomeNonDaemonThread()
        if t:
            if __debug__:
                self._note("%s: waiting for other threads", self)
        while t:
            t.join()
            t = _pickSomeNonDaemonThread()
        if __debug__:
            self._note("%s: exiting", self)
        self._Thread__delete()

def _pickSomeNonDaemonThread():
    for t in enumerate():
        if not t.daemon and t.is_alive():
            return t
    return None

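_shutdown is really just _MainThread()._exitfunc. It marks the main thread as stopped, then repeatedly picks an alive, non-daemon thread from enumerate() and join()s it, until no such thread is left.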
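The same join loop can be written with the public API. This is a hypothetical re-creation, not the threading module's own code: `pick_some_non_daemon_thread` and `wait_for_non_daemon_threads` are made-up names, and instead of stopping the main thread first (as _exitfunc does), the helper simply skips the calling thread.

```python
import threading
import time


def pick_some_non_daemon_thread():
    """Hypothetical mirror of _pickSomeNonDaemonThread: return any alive,
    non-daemon thread other than the caller, or None."""
    me = threading.current_thread()
    for t in threading.enumerate():
        if t is not me and not t.daemon and t.is_alive():
            return t
    return None


def wait_for_non_daemon_threads():
    """Hypothetical re-creation of the loop in _MainThread._exitfunc."""
    t = pick_some_non_daemon_thread()
    while t is not None:
        t.join()                      # blocks until that thread finishes
        t = pick_some_non_daemon_thread()


if __name__ == "__main__":
    workers = [threading.Thread(target=time.sleep, args=(0.2,)) for _ in range(3)]
    for w in workers:
        w.start()
    wait_for_non_daemon_threads()
    print(all(not w.is_alive() for w in workers))   # True
```

If any worker looped forever, wait_for_non_daemon_threads would never return, which is exactly the hang seen earlier.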

And what is enumerate()?

We use it all the time: it returns every Python Thread object in the current process that meets certain conditions:

>>> print threading.enumerate()
[<_MainThread(MainThread, started 140691994822400)>]

# /usr/lib/python2.7/threading.py

def enumerate():
    """Return a list of all Thread objects currently alive.

    The list includes daemonic threads, dummy thread objects created by
    current_thread(), and the main thread. It excludes terminated threads and
    threads that have not yet been started.

    """
    with _active_limbo_lock:
        return _active.values() + _limbo.values()
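The docstring's rules are easy to check with the public API. A minimal Python 3 sketch; the Event exists only so the worker stays alive until we release it:

```python
import threading

release = threading.Event()            # lets the worker finish on demand
t = threading.Thread(target=release.wait)

print(t in threading.enumerate())      # False: created but not yet started
t.start()
print(t in threading.enumerate())      # True: started and not yet finished
release.set()
t.join()
print(t in threading.enumerate())      # False: terminated
```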

Meets which conditions, exactly? No rush, let me walk through it:

Liveness, starting from how a thread is born

In Python's threading model, the GIL notwithstanding, threads are real native (OS-level) threads.

Python merely adds a wrapper layer, t_bootstrap (in the C thread module), and runs the real target function inside that wrapper.
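A quick way to convince yourself of this, assuming Python 3.8+ on a platform where `threading.get_native_id()` is available: each thread reports a distinct OS-level thread id. The barrier keeps both workers alive at the same time so the OS cannot reuse an id; the names `record` and `native_ids` are just for illustration.

```python
import threading

native_ids = {}                     # thread name -> OS-level thread id
lock = threading.Lock()
barrier = threading.Barrier(3)      # two workers + the main thread


def record():
    with lock:
        native_ids[threading.current_thread().name] = threading.get_native_id()
    barrier.wait()                  # stay alive until everyone has recorded


workers = [threading.Thread(target=record, name=f"worker-{i}") for i in range(2)]
for w in workers:
    w.start()
barrier.wait()
for w in workers:
    w.join()

native_ids["main"] = threading.get_native_id()
print(native_ids)
print(len(set(native_ids.values())) == 3)   # True: three distinct OS threads
```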

Inside the threading module we can see a similar wrapper around the target function:

# /usr/lib/python2.7/threading.py

class Thread(_Verbose):
    def start(self):
        # ... (omitted)
        with _active_limbo_lock:
            _limbo[self] = self             # key line
        try:
            _start_new_thread(self.__bootstrap, ())
        except Exception:
            with _active_limbo_lock:
                del _limbo[self]            # key line
            raise
        self.__started.wait()

    def __bootstrap(self):
        try:
            self.__bootstrap_inner()
        except:
            if self.__daemonic and _sys is None:
                return
            raise

    def __bootstrap_inner(self):
        try:
            # ... (omitted)
            with _active_limbo_lock:
                _active[self.__ident] = self # key line
                del _limbo[self]             # key line
            # ... (omitted)

In the code above, every change to _limbo and _active is marked as a key line, which gives us the following definitions:

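_limbo: thread objects whose start() has been called, but whose new native thread has not yet moved them into _active (in __bootstrap_inner)
_active: thread objects that are actually alive and running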
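To make the bookkeeping concrete, here is a toy re-creation built on the low-level `_thread` module. `ToyThread`, `limbo`, `active`, and `_registry_lock` are hypothetical stand-ins for `Thread`, `_limbo`, `_active`, and `_active_limbo_lock`; error handling (such as removing the object from `limbo` if `start_new_thread` fails) is left out.

```python
import _thread
import threading

# Hypothetical, simplified version of threading's bookkeeping.
_registry_lock = threading.Lock()
limbo = {}    # start() called, new OS thread has not registered itself yet
active = {}   # OS thread is running, keyed by its ident


class ToyThread:
    def __init__(self, target):
        self.target = target
        self.ident = None
        self._started = threading.Event()
        self._finished = threading.Event()

    def start(self):
        with _registry_lock:
            limbo[self] = self                    # step 1: into limbo
        _thread.start_new_thread(self._bootstrap, ())
        self._started.wait()                      # like Thread.start()

    def _bootstrap(self):
        # Runs on the new OS thread, like Thread.__bootstrap_inner.
        self.ident = _thread.get_ident()
        with _registry_lock:
            active[self.ident] = self             # step 2: limbo -> active
            del limbo[self]
        self._started.set()
        try:
            self.target()
        finally:
            with _registry_lock:
                del active[self.ident]            # step 3: leave active
            self._finished.set()

    def join(self):
        self._finished.wait()


if __name__ == "__main__":
    go = threading.Event()
    t = ToyThread(go.wait)
    t.start()
    print(t in active.values(), t in limbo)       # True False
    go.set()
    t.join()
    print(t in active.values())                   # False
```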

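So, back to the earlier question: when _MainThread()._exitfunc runs, it looks through every thread object in _limbo + _active (that is, enumerate()).

As long as even one of them is alive and not a daemon, it calls join() on it, and that is why the program blocks.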

What setDaemon is for

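Blocking forever is unacceptable, and presumptuously killing the user's threads on their behalf is no solution either. So what is the graceful way out?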

Give users a way to mark which threads should go down with the process. That is setDaemon:

class Thread(_Verbose):
    # ... (omitted)
    def setDaemon(self, daemonic):
        self.daemon = daemonic

    # ... (omitted)

# already shown above; repeated here for convenience
def _pickSomeNonDaemonThread():
    for t in enumerate():
        if not t.daemon and t.is_alive():
            return t
    return None

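As long as every child thread is set with setDaemon(True), _exitfunc finds nothing to join(), so the interpreter finishes exiting as soon as the main thread is done, and the operating system tears down those native threads along with the process.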
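A few properties of the flag are worth knowing, shown here as a Python 3 sketch: `t.daemon = True` sets the same flag as `setDaemon(True)`, the flag must be decided before `start()`, and a new thread inherits its default from the thread that creates it. The names `release`, `spawn_child`, `inherited`, and `late_change_rejected` are just for illustration.

```python
import threading

release = threading.Event()

# Attribute form of setDaemon(True); setDaemon() is deprecated since 3.10.
t = threading.Thread(target=release.wait)
t.daemon = True
t.start()

# Changing the flag after start() is rejected with RuntimeError.
late_change_rejected = False
try:
    t.daemon = False
except RuntimeError as exc:
    late_change_rejected = True
    print("RuntimeError:", exc)

# A thread created inside a daemon thread defaults to daemon as well.
inherited = []


def spawn_child():
    inherited.append(threading.Thread(target=lambda: None).daemon)


parent = threading.Thread(target=spawn_child, daemon=True)
parent.start()
parent.join()
print(inherited)                 # [True]

release.set()
t.join()
```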

I had long wondered: pthreads have no daemon attribute, so why does Python?

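It turns out the flag really does exist only at the Python level: all it decides is whether threading's _shutdown() will join() that thread before the interpreter exits (wry smile).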
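This also means the waiting can be bypassed entirely. A sketch, assuming Python 3.7+: the child script below starts a non-daemon thread that never ends and then calls `os._exit(0)`, which ends the process without running interpreter finalization, so `_shutdown` is never called and the OS reclaims every thread, much as in the C example. It is a demonstration only: `os._exit` also skips atexit hooks and buffer flushing. Without that call, the child would hang until the `timeout` fires.

```python
import subprocess
import sys
import time

# Child: a non-daemon thread that loops forever, then a hard process exit.
CHILD = """
import os, threading, time

def forever():
    while True:
        time.sleep(1)

threading.Thread(target=forever).start()   # non-daemon on purpose
print("main done", flush=True)
os._exit(0)   # skips finalization, so threading's _shutdown never runs
"""

start = time.monotonic()
result = subprocess.run([sys.executable, "-c", CHILD],
                        capture_output=True, text=True, timeout=10)
elapsed = time.monotonic() - start
print(result.returncode, result.stdout.strip(), f"{elapsed:.2f}s")
```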