Python multithreading and multiprocessing, and how they differ

Date: 2019-11-24

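I have always believed that concepts are crucial when learning anything. Once the concepts and principles are understood, the details can be worked out in practice. The key to mastery is understanding, and working through concrete examples and hands-on exercises is a good way to build an intuitive feel for concepts and principles. This article uses a few concrete examples to give a brief introduction to multithreading and multiprocessing in Python; later articles will cover communication between processes and between threads.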

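Python multithreading

Python 2 ships two standard-library modules for working with threads, thread and threading. Python 3 renamed the former to _thread and no longer intends it for direct use; the latter is a higher-level threading library, and all examples below use it.

Creating threads

There are two ways to create a thread in Python:

Instantiate a threading.Thread object and pass in a function object (the initial function) to serve as the thread's entry point;
Subclass threading.Thread and override the run method.

Method 1: create a threading.Thread object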

import threading
import time

def tstart(arg):
    time.sleep(0.5)
    print("%s running...." % arg)

if __name__ == '__main__':
    t1 = threading.Thread(target=tstart, args=('This is thread 1',))
    t2 = threading.Thread(target=tstart, args=('This is thread 2',))
    t1.start()
    t2.start()
    print("This is main function")

Output:

This is main function
This is thread 2 running....
This is thread 1 running....

Method 2: subclass threading.Thread and override run

import threading
import time

class CustomThread(threading.Thread):
    def __init__(self, thread_name):
        # step 1: call base __init__ function
        super(CustomThread, self).__init__(name=thread_name)
        self._tname = thread_name

    def run(self):
        # step 2: override the run method
        time.sleep(0.5)
        print("This is %s running...." % self._tname)

if __name__ == "__main__":
    t1 = CustomThread("thread 1")
    t2 = CustomThread("thread 2")
    t1.start()
    t2.start()
    print("This is main function")

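The output is the same as for method 1.

threading.Thread

Both approaches above use the threading.Thread class, directly or indirectly:

threading.Thread(group=None, target=None, name=None, args=(), kwargs={})

The following example ties the two ways of creating a thread together: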

import threading
import time

class CustomThread(threading.Thread):
    def __init__(self, thread_name, target = None):
        # step 1: call base __init__ function
        super(CustomThread, self).__init__(name=thread_name, target=target, args = (thread_name,))
        self._tname = thread_name

    def run(self):
        # step 2: override the run method
        # time.sleep(0.5)
        # print("This is %s running....@run" % self._tname)
        super(CustomThread, self).run()

def target(arg):
    time.sleep(0.5)
    print("This is %s running....@target" % arg)

if __name__ == "__main__":
    t1 = CustomThread("thread 1", target)
    t2 = CustomThread("thread 2", target)
    t1.start()
    t2.start()
    print("This is main function")

Output:

This is main function
This is thread 1 running....@target
This is thread 2 running....@target

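The code above shows that:

Whichever way the thread is created, the arguments you specify end up being passed to the threading.Thread class;
The target function passed to the thread is called inside the run method of the base class Thread, so it runs only if run is not overridden or if the override calls the base class's run.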
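The second point can be checked with a small experiment. The following is only a sketch; the names record, KeepsTarget, DropsTarget and demo are made up for illustration. It starts one plain thread, one subclass whose run delegates to the base class, and one subclass whose run does not, and records which targets were actually called.

```python
import threading

calls = []

def record(tag):
    # Hypothetical target function: it only records that it was called.
    calls.append(tag)

class KeepsTarget(threading.Thread):
    def run(self):
        # Delegating to the base class lets Thread.run() call the target.
        super().run()

class DropsTarget(threading.Thread):
    def run(self):
        # run() is overridden without calling super().run(),
        # so the target passed to the constructor is never invoked.
        pass

def demo():
    calls.clear()
    threads = [
        threading.Thread(target=record, args=("plain",)),
        KeepsTarget(target=record, args=("keeps",)),
        DropsTarget(target=record, args=("drops",)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(calls)

if __name__ == "__main__":
    print(demo())  # ['keeps', 'plain']
```

The "drops" tag never appears, because nothing in DropsTarget.run calls the target.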

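For the module-level attributes and functions of threading, see the official documentation. The focus here is on the methods of threading.Thread objects.

Below are the methods and attributes that a threading.Thread object provides:

start(): starts the thread after it has been created; the thread then waits to be scheduled by the CPU, and run is invoked in the new thread;

run(): the entry point of the thread; the default implementation calls the user-supplied target function, and a subclass can override it;

join([timeout]): blocks the thread that calls it until the thread on which join was called finishes or the timeout expires. It is usually called in the main thread to wait for other threads to finish;

name, getName() & setName(): the thread's name; the getter and setter methods are deprecated since Python 3.10 in favor of the name attribute;

ident: an integer thread identifier; it is None until the thread has been started;

isAlive(), is_alive(): True from just before run starts until just after run finishes; isAlive() was removed in Python 3.9, so use is_alive();

daemon, isDaemon() & setDaemon(): the daemon-thread flag; the getter and setter methods are deprecated since Python 3.10 in favor of the daemon attribute.

These are the methods and attributes used to manage a thread and query its state through the thread object once it has been created.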
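As a rough illustration of these attributes, the sketch below inspects a thread before it starts, while it is running, and after it has finished. The function names worker and inspect_thread and the delays are arbitrary choices for this example.

```python
import threading
import time

def worker(delay):
    # Stand-in for real work.
    time.sleep(delay)

def inspect_thread():
    t = threading.Thread(target=worker, args=(0.5,), name="worker-1")

    # Before start(): the thread has a name but no identifier and is not alive.
    before = (t.name, t.ident, t.is_alive())

    t.start()
    # join() with a timeout returns after at most 0.05 s,
    # while the worker is still sleeping.
    t.join(timeout=0.05)
    during = (t.ident is not None, t.is_alive())

    # join() without a timeout blocks until the worker finishes.
    t.join()
    after = t.is_alive()
    return before, during, after

if __name__ == "__main__":
    print(inspect_thread())
```

Note that join(timeout) returns None whether or not the thread finished, so is_alive() has to be checked afterwards to tell the two cases apart.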

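Running multiple threads

After the main thread has created several threads, there is no cooperation or synchronization between them by default. Every thread other than the main thread starts executing at run and continues until run finishes.

join

We can call join to make the main thread block and wait until the threads it created have finished.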

import threading
import time

def tstart(arg):
    print("%s running....at: %s" % (arg,time.time()))
    time.sleep(1)
    print("%s is finished! at: %s" % (arg,time.time()))

if __name__ == '__main__':
    t1 = threading.Thread(target=tstart, args=('This is thread 1',))
    t1.start()
    t1.join()   # block the current thread until t1 has finished
    print("This is main function at：%s" % time.time())

Output:

This is thread 1 running....at: 1564906617.43
This is thread 1 is finished! at: 1564906618.43
This is main function at：1564906618.43
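With several worker threads, a common pattern is to start all of them first and only then join each one, so that the threads overlap instead of running one after another. The sketch below assumes a made-up task function that squares its index; task and run_all are illustrative names.

```python
import threading
import time

def task(results, index, delay):
    # Each thread writes to its own slot, so no lock is needed here.
    time.sleep(delay)
    results[index] = index * index

def run_all(n=4, delay=0.1):
    results = [None] * n
    threads = [
        threading.Thread(target=task, args=(results, i, delay))
        for i in range(n)
    ]
    # Start every thread before joining any of them.
    for t in threads:
        t.start()
    # Now wait for all of them to finish.
    for t in threads:
        t.join()
    return results, [t.is_alive() for t in threads]

if __name__ == "__main__":
    print(run_all())
```

Calling start and join inside the same loop would instead wait for each thread before starting the next one.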

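By default, the program does not exit when the main thread finishes; the process ends only after all non-daemon threads have finished.

Removing t1.join() from the program above gives the following output:

This is thread 1 running....at: 1564906769.52
This is main function at：1564906769.52
This is thread 1 is finished! at: 1564906770.52

A thread can be marked as a daemon thread. In that case, once the main thread finishes, the program exits and any daemon threads that are still running are stopped abruptly.

Daemon threads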

import threading
import time

def tstart(arg):
    print("%s running....at: %s" % (arg,time.time()))
    time.sleep(1)
    print("%s is finished! at: %s" % (arg,time.time()))

if __name__ == '__main__':
    t1 = threading.Thread(target=tstart, args=('This is thread 1',))
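    t1.daemon = True   # must be set before start(); setDaemon() is deprecated since Python 3.10
    t1.start()
    # t1.join()   # would block the current thread until t1 has finished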
    print("This is main function at：%s" % time.time())

Output:

This is thread 1 running....at: 1564906847.85
This is main function at：1564906847.85
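The daemon flag has to be decided before the thread starts. The sketch below, with the illustrative names sleeper and daemon_rules, sets the flag through the daemon keyword argument available in Python 3 and then shows that changing it on a started thread raises RuntimeError.

```python
import threading
import time

def sleeper():
    time.sleep(0.2)

def daemon_rules():
    # Python 3 accepts the flag directly as a keyword argument.
    t = threading.Thread(target=sleeper, daemon=True)
    t.start()
    try:
        # Changing the flag after start() is not allowed.
        t.daemon = False
        changed = True
    except RuntimeError:
        changed = False
    t.join()
    return t.daemon, changed

if __name__ == "__main__":
    print(daemon_rules())  # (True, False)
```

A thread created without an explicit flag inherits the daemon status of the thread that creates it, and the main thread is not a daemon.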

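Python multiprocessing

Just as the threading module is used to create threads, Python provides the multiprocessing module for creating processes. Let's first look at the two ways of creating a process.

"The multiprocessing package mostly replicates the API of the threading module." (Python documentation)

Creating processes

Creating a process is similar to creating a thread:

Instantiate a multiprocessing.Process object and pass in a function object (the initial function) to serve as the new process's entry point;
Subclass multiprocessing.Process and override the run method.

Method 1: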

from multiprocessing import Process  
import os, time

def pstart(name):
    # time.sleep(0.1)
    print("Process name: %s, pid: %s "%(name, os.getpid()))

if __name__ == "__main__": 
    subproc = Process(target=pstart, args=('subprocess',))  
    subproc.start()  
    subproc.join()
    print("subprocess pid: %s"%subproc.pid)
    print("current process pid: %s" % os.getpid())

Output:

Process name: subprocess, pid: 4888 
subprocess pid: 4888
current process pid: 9912

Method 2:

from multiprocessing import Process  
import os, time

class CustomProcess(Process):
    def __init__(self, p_name, target=None):
        # step 1: call base __init__ function()
        super(CustomProcess, self).__init__(name=p_name, target=target, args=(p_name,))

    def run(self):
        # step 2:
        # time.sleep(0.1)
        print("Custom Process name: %s, pid: %s "%(self.name, os.getpid()))

if __name__ == '__main__':
    p1 = CustomProcess("process_1")
    p1.start()
    p1.join()
    print("subprocess pid: %s"%p1.pid)
    print("current process pid: %s" % os.getpid())

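Here is a question worth thinking about: if, as with multithreading, there is a global variable share_data, is there a problem when different processes access share_data at the same time?

Each process has its own isolated memory address space, so each process sees a different share_data located in its own address space, and simultaneous access causes no conflict. Keep in mind that this also means a change made in one process is not visible in the others.

The subprocess module

Since we are talking about multiple processes, another way of creating a process is worth a mention.

Python provides the subprocess module for calling external programs while a program is running.

For example, on Windows a Python program can open Notepad, open cmd, or shut the machine down at some point: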
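The isolation can be observed directly. In the sketch below, which uses the illustrative names child and run_isolation_demo, a child process increments its own copy of share_data and reports the value back through a multiprocessing.Queue, while the parent's copy stays unchanged.

```python
from multiprocessing import Process, Queue

share_data = 0  # module-level variable; every process gets its own copy

def child(queue):
    global share_data
    share_data += 1          # changes only the child's copy
    queue.put(share_data)    # the value must be sent back explicitly

def run_isolation_demo():
    queue = Queue()
    p = Process(target=child, args=(queue,))
    p.start()
    child_value = queue.get(timeout=10)  # read the result before joining
    p.join()
    # The parent's copy was never touched by the child.
    return child_value, share_data

if __name__ == "__main__":
    print(run_isolation_demo())  # (1, 0)
```

The `if __name__ == "__main__":` guard matters here: on platforms that start child processes by re-importing the main module, such as Windows, unguarded top-level code would run again in every child.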

>>> import subprocess
>>> subprocess.Popen(['cmd'])
<subprocess.Popen object at 0x0339F550>
>>> subprocess.Popen(['notepad'])
<subprocess.Popen object at 0x03262B70>
>>> subprocess.Popen(['shutdown', '-p'])

Or use ping to test network connectivity:

>>> res = subprocess.Popen(['ping', 'www.cnblogs.com'], stdout=subprocess.PIPE).communicate()[0]
>>> print(res.decode(errors='replace'))  # res is bytes in Python 3
Pinging www.cnblogs.com [101.37.113.127] with 32 bytes of data:

Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91
Reply from 101.37.113.127: bytes=32 time=1ms TTL=91

Ping statistics for 101.37.113.127:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 1ms, Maximum = 1ms, Average = 1ms
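In Python 3, subprocess.run is often a more convenient way to run a command and collect its result than calling Popen directly. The sketch below is a portable variant that runs the current Python interpreter as the external program, so it does not depend on Windows tools; the helper name run_and_capture is made up for this example.

```python
import subprocess
import sys

def run_and_capture(args, timeout=10):
    # capture_output=True collects stdout and stderr;
    # text=True decodes them to str instead of returning bytes.
    completed = subprocess.run(
        args, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout

if __name__ == "__main__":
    code, out = run_and_capture(
        [sys.executable, "-c", "print('hello from child')"]
    )
    print(code, out.strip())  # 0 hello from child
```

Passing the command as a list avoids going through a shell, so arguments are not reinterpreted by shell quoting rules.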

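Comparing Python multithreading and multiprocessing

Let's look at two examples first.

Start two Python threads that each perform one hundred million increments, then have a single thread perform one hundred million increments on its own: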

import threading
import time

def tstart(arg):
    var = 0
    for i in range(100000000):  # xrange in Python 2
        var += 1

if __name__ == '__main__':
    t1 = threading.Thread(target=tstart, args=('This is thread 1',))
    t2 = threading.Thread(target=tstart, args=('This is thread 2',))
    start_time = time.time()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print("Two thread cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    tstart("This is thread 0")
    print("Main thread cost time: %s" % (time.time() - start_time))

Output:

Two thread cost time: 20.6570000648
Main thread cost time: 2.52800011635

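If only one of the two threads t1 and t2 is started in the example above, its running time is about the same as the main thread's. The reason is explained later.

The same work using two processes: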

from multiprocessing import Process
import time

def pstart(arg):
    var = 0
    for i in range(100000000):  # xrange in Python 2
        var += 1

if __name__ == '__main__':
    p1 = Process(target = pstart, args = ("1", ))
    p2 = Process(target = pstart, args = ("2", ))
    start_time = time.time()
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print("Two process cost time: %s" % (time.time() - start_time))
    start_time = time.time()
    pstart("0")
    print("Current process cost time: %s" % (time.time() - start_time))

Output:

Two process cost time: 2.91599988937
Current process cost time: 2.52400016785

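Comparison and analysis

Two processes running the same computation in parallel take about as long as a single process running it once. The two-process run takes slightly longer, possibly because creating and destroying processes involves system calls that add overhead.

For Python threads, however, running two threads concurrently takes far longer than running one thread: about 8 times as long in the run above (20.66 s versus 2.53 s). If the two threads are run one after the other instead, that is: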

    t1.start()
    t1.join()
    t2.start()
    t2.join()
    #Two thread cost time: 5.12199997902
    #Main thread cost time: 2.54200005531

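With the three threads (t1, t2 and the main thread) running one after another, each takes about the same time, roughly 2.5 seconds.

The underlying reason is that the two threads run concurrently rather than truly in parallel, and the cause is the GIL.

The GIL

Any discussion of Python multithreading has to mention the GIL (Global Interpreter Lock), a lock that CPython, currently the dominant Python interpreter, uses to keep its internal data safe. No matter how many threads a process has, only the thread holding the GIL can execute Python bytecode, even on a multi-core processor. Within one process, then, only one thread is executing Python code at any moment. For CPU-bound threads, multithreading therefore brings no speedup and may even be slower. A thread releases the GIL while it waits on blocking I/O, which is why Python threads suit I/O-bound programs. Programs that really need to run in parallel can use multiple processes.

Contention among threads for the lock, CPU scheduling of threads, and switching between threads all add time overhead as well.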
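The I/O-bound case can be sketched as follows. The example assumes that time.sleep is an acceptable stand-in for a blocking I/O call; like real blocking I/O, it releases the GIL while waiting. The names fake_io, timed_serial and timed_threaded are illustrative.

```python
import threading
import time

def fake_io(delay):
    # Stand-in for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(delay)

def timed_serial(n, delay):
    # Run the n waits one after another in the current thread.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def timed_threaded(n, delay):
    # Run the n waits in n threads so that they overlap.
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print("serial:   %.2f s" % timed_serial(4, 0.1))
    print("threaded: %.2f s" % timed_threaded(4, 0.1))
```

The serial version takes roughly the sum of the delays, while the threaded version takes roughly one delay, because the waits overlap. Replacing fake_io with a pure-Python loop would remove that benefit, as the measurements earlier in the article show.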

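Differences between threads and processes

Below is a brief comparison of threads and processes.

A process is the basic unit of resource allocation; a thread is the basic unit of CPU execution and scheduling;
Communication and synchronization mechanisms:

Processes:
- Communication: pipes, FIFOs, message queues, signals, shared memory, sockets, streams;
- Synchronization: semaphores (P/V operations), monitors
Threads:
- Synchronization: mutexes, recursive locks, condition variables, semaphores
- Communication: threads in the same process share the process's resources, so they need no data-transfer mechanisms like those used between processes; communication between threads mainly serves synchronization.

What actually executes on the CPU is a thread; threads are lighter than processes, and switching and scheduling them costs less;
Threads must take thread safety into account when they access shared process data; processes are isolated from one another and have separate memory, so they are comparatively safe and can exchange data only through the IPC (Inter-Process Communication) mechanisms listed above;
A system is made up of processes; each process comprises a code segment, a data segment, a heap and a stack, plus the part shared with the operating system, and is in one of three states: waiting, ready or running;
A process can contain multiple threads; the threads share the process's resources (file descriptors, global variables, the heap and so on), while registers and the stack are private to each thread;
A process crashing does not affect other processes; if a thread inside a process crashes and the OS supports threads with a many-to-one model, the whole process crashes;
If the CPU and the operating system support both multithreading and multiprocessing, multiple processes can run in parallel while the threads inside each process also run in parallel, which is how to get the most out of the hardware.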
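As a small illustration of thread safety for shared data, the sketch below protects a shared counter with a mutex (threading.Lock). The Counter class and the count_with_threads function are made up for this example.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "value += 1" is a read-modify-write sequence; holding the lock
        # ensures that only one thread performs it at a time.
        with self._lock:
            self.value += 1

def count_with_threads(n_threads=4, n_increments=10000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(count_with_threads())  # 40000
```

With the lock in place the result is always n_threads * n_increments. Without it, updates from different threads can interleave and some increments may be lost.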