Python Concurrent Programming: Thread Pools and Process Pools

Date: 2019-12-14

Introduction

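The Python standard library provides the threading and multiprocessing modules for writing multithreaded and multiprocess code. Once a project reaches a certain scale, however, frequently creating and destroying threads or processes becomes very expensive, and at that point we would have to write our own thread pool or process pool, trading space for time. Since Python 3.2, the standard library has provided the concurrent.futures module, whose ThreadPoolExecutor and ProcessPoolExecutor classes are a further abstraction over threading and multiprocessing and give direct support for thread pools and process pools.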

Executor and Future

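The foundation of concurrent.futures is Executor, an abstract class that cannot be used directly. Its two subclasses, ThreadPoolExecutor and ProcessPoolExecutor, are the useful part: as the names suggest, they create thread pools and process pools respectively. We can submit tasks directly to the pool without maintaining a Queue or worrying about deadlocks; the pool schedules the tasks for us.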
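To make the shared interface concrete, here is a minimal sketch. The helper total_length is a hypothetical name used only for illustration; it relies on nothing but submit, so it should work with either kind of pool. Only the thread pool is exercised here.

```python
from concurrent.futures import Executor, ThreadPoolExecutor, ProcessPoolExecutor

# Both pool classes derive from the abstract Executor base class.
print(issubclass(ThreadPoolExecutor, Executor))
print(issubclass(ProcessPoolExecutor, Executor))

def total_length(executor, words):
    # Hypothetical helper: it only calls submit(), which every Executor provides.
    futures = [executor.submit(len, word) for word in words]
    return sum(future.result() for future in futures)

with ThreadPoolExecutor(max_workers=2) as pool:
    total = total_length(pool, ["thread", "process"])

print(total)
```

Because the helper depends only on the Executor interface, switching pool types is a matter of changing the class that is instantiated.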

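Anyone with experience in Java or Node.js will be familiar with the concept of a Future: think of it as an operation that will complete in the future. It is the basis of asynchronous programming. In the traditional model, a call such as queue.get blocks until the result is available, and the calling thread cannot do anything else in the meantime. A Future lets us do other work while we wait. For asynchronous I/O in Python, after finishing this article you can refer to my post Python Concurrent Programming: Coroutines and Asynchronous I/O.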
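The idea of "doing other work while waiting" can be sketched as follows. The function slow_square and the variable names are assumptions made up for this illustration, not part of the original examples.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_square(n):
    # Hypothetical task that simulates a slow operation.
    time.sleep(0.5)
    return n * n

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_square, 7)      # returns a Future immediately
    done_right_after_submit = future.done()   # the task is still sleeping
    other_work = sum(range(10))               # the main thread is free meanwhile
    value = future.result()                   # blocks only here, until the task ends

print(done_right_after_submit, other_work, value)
```

The only blocking call is result(); everything between submit and result runs while the task is still in progress.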

P.S. If you are still on Python 2.x, install the futures package first.

pip install futures

Using submit with thread pools and process pools

Let's start with the code below to get a feel for how a thread pool works.

# example1.py
from concurrent.futures import ThreadPoolExecutor
import time

def return_future_result(message):
    time.sleep(2)
    return message

pool = ThreadPoolExecutor(max_workers=2)  # create a thread pool with at most 2 worker threads

future1 = pool.submit(return_future_result, "hello")  # add a task to the thread pool
future2 = pool.submit(return_future_result, "world")  # add another task to the thread pool

print(future1.done())  # check whether task 1 has finished
time.sleep(3)
print(future2.done())  # check whether task 2 has finished

print(future1.result())  # get the value returned by task 1
print(future2.result())  # get the value returned by task 2

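Let's walk through the output. submit adds a task to the thread pool and returns a Future object, which can be thought of as an operation that will complete in the future. At the first print, future1 has clearly not finished because of the time.sleep(2) inside the task. By the second print, the main thread has paused for time.sleep(3), so every task in the pool has finished.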

ziwenxie :: ~ » python example1.py
False
True
hello
world

# While the program above is running, ps shows three threads running in the background at the same time
ziwenxie :: ~ » ps -eLf | grep python
ziwenxie      8361  7557  8361  3    3 19:45 pts/0    00:00:00 python example1.py
ziwenxie      8361  7557  8362  0    3 19:45 pts/0    00:00:00 python example1.py
ziwenxie      8361  7557  8363  0    3 19:45 pts/0    00:00:00 python example1.py
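As a rough sketch of what max_workers controls: it limits the number of worker threads, not the number of tasks that can be submitted. Extra tasks wait in the pool's internal queue. The shorter sleep and the three messages below are assumptions chosen to keep the illustration quick.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def return_future_result(message):
    time.sleep(0.3)
    return message

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    # Three tasks, two workers: the third task waits until a worker is free.
    futures = [pool.submit(return_future_result, m) for m in ("a", "b", "c")]
    results = [future.result() for future in futures]
elapsed = time.monotonic() - start

# Two rounds of 0.3 s each are needed, so elapsed is about 0.6 s rather than 0.3 s.
print(results, round(elapsed, 1))
```

The with statement also shuts the pool down once the block exits, so no explicit cleanup is needed.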

The code above can also be rewritten to use a process pool. The API is identical to the thread pool's, so I won't belabor it.

# example2.py
from concurrent.futures import ProcessPoolExecutor
import time

def return_future_result(message):
    time.sleep(2)
    return message

if __name__ == "__main__":  # guard so worker processes can import this module safely
    pool = ProcessPoolExecutor(max_workers=2)
    future1 = pool.submit(return_future_result, "hello")
    future2 = pool.submit(return_future_result, "world")

    print(future1.done())
    time.sleep(3)
    print(future2.done())

    print(future1.result())
    print(future2.result())

Here is the output:

ziwenxie :: ~ » python example2.py
False
True
hello
world

ziwenxie :: ~ » ps -eLf | grep python
ziwenxie      8560  7557  8560  3    3 19:53 pts/0    00:00:00 python example2.py
ziwenxie      8560  7557  8563  0    3 19:53 pts/0    00:00:00 python example2.py
ziwenxie      8560  7557  8564  0    3 19:53 pts/0    00:00:00 python example2.py
ziwenxie      8561  8560  8561  0    1 19:53 pts/0    00:00:00 python example2.py
ziwenxie      8562  8560  8562  0    1 19:53 pts/0    00:00:00 python example2.py

Using map/wait with thread pools and process pools

Besides submit, Executor also provides a map method, which is used much like the built-in map. The two examples below compare submit and map.

Review: using submit

# example3.py
import concurrent.futures
import urllib.request

URLS = ['http://httpbin.org', 'http://example.com/', 'https://api.github.com/']

def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

As the output shows, as_completed yields futures in the order they finish, not in the order of the URLS list.

ziwenxie :: ~ » python example3.py
'http://example.com/' page is 1270 bytes
'https://api.github.com/' page is 2039 bytes
'http://httpbin.org' page is 12150 bytes
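The completion-order behaviour can be reproduced without network access. In this sketch, sleep_then_return and the delay values are assumptions standing in for requests that take different amounts of time.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def sleep_then_return(delay):
    # Hypothetical stand-in for a request that takes `delay` seconds.
    time.sleep(delay)
    return delay

delays = [0.6, 0.2, 0.4]
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(sleep_then_return, d) for d in delays]
    # as_completed yields each future as soon as it finishes.
    completion_order = [future.result() for future in as_completed(futures)]

print(completion_order)
```

With delays this far apart, the shortest task should come out first regardless of where it sits in the input list.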

Using map

# example4.py
import concurrent.futures
import urllib.request

URLS = ['http://httpbin.org', 'http://example.com/', 'https://api.github.com/']

def load_url(url):
    with urllib.request.urlopen(url, timeout=60) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for url, data in zip(URLS, executor.map(load_url, URLS)):
        print('%r page is %d bytes' % (url, len(data)))

As the output shows, map returns results in the order of the URLS list, and the code is more concise and readable. Choose whichever fits your needs.

ziwenxie :: ~ » python example4.py
'http://httpbin.org' page is 12150 bytes
'http://example.com/' page is 1270 bytes
'https://api.github.com/' page is 2039 bytes
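Like the built-in map, Executor.map accepts several iterables and passes one item from each to every call, which is one way to supply a second argument such as a timeout. The sketch below assumes a hypothetical two-argument function, sleep_then_tag, and also shows that results come back in input order even when later tasks finish first.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
import time

def sleep_then_tag(delay, tag):
    # Hypothetical task: wait `delay` seconds, then label the result.
    time.sleep(delay)
    return "{}:{}".format(tag, delay)

delays = [0.6, 0.2, 0.4]
with ThreadPoolExecutor(max_workers=3) as executor:
    # Each call receives (delay, "job"); iteration stops with the shortest iterable.
    results = list(executor.map(sleep_then_tag, delays, repeat("job")))

print(results)
```

The task with the 0.2 s delay finishes first, but its result still appears second, matching its position in delays.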

A third option: wait

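wait returns a named tuple containing two sets: done (the futures that have completed) and not_done (the futures that have not). The advantage of wait is greater control: its return_when parameter accepts one of three values, FIRST_COMPLETED, FIRST_EXCEPTION, or ALL_COMPLETED, and defaults to ALL_COMPLETED.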

The example below shows how these options differ.

# example5.py
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from time import sleep
from random import randint

def return_after_random_secs(num):
    sleep(randint(1, 5))
    return "Return of {}".format(num)

pool = ThreadPoolExecutor(5)
futures = []
for x in range(5):
    futures.append(pool.submit(return_after_random_secs, x))

print(wait(futures))

# Alternative: return as soon as any one task has finished
# print(wait(futures, timeout=None, return_when=FIRST_COMPLETED))

With the default ALL_COMPLETED, the program blocks until every task in the thread pool has finished.

ziwenxie :: ~ » python example5.py
DoneAndNotDoneFutures(done={
<Future at 0x7f0b06c9bc88 state=finished returned str>,
<Future at 0x7f0b06cbaa90 state=finished returned str>,
<Future at 0x7f0b06373898 state=finished returned str>,
<Future at 0x7f0b06352ba8 state=finished returned str>,
<Future at 0x7f0b06373b00 state=finished returned str>}, not_done=set())

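With FIRST_COMPLETED, the program does not wait for every task in the pool to finish; wait returns as soon as at least one task is done. In the run below, three tasks had already finished by the time wait returned, presumably because they drew the same random delay.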

ziwenxie :: ~ » python example5.py
DoneAndNotDoneFutures(done={
<Future at 0x7f84109edb00 state=finished returned str>,
<Future at 0x7f840e2e9320 state=finished returned str>,
<Future at 0x7f840f25ccc0 state=finished returned str>},
not_done={<Future at 0x7f840e2e9ba8 state=running>,
<Future at 0x7f840e2e9940 state=running>})
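The remaining option, FIRST_EXCEPTION, is not demonstrated above. Here is a minimal sketch, assuming a hypothetical work function that can be told to fail; the delays are chosen so that the failing task finishes well before the others.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION
import time

def work(delay, fail=False):
    # Hypothetical task: sleep, then either raise or return the delay.
    time.sleep(delay)
    if fail:
        raise ValueError("task failed after {}s".format(delay))
    return delay

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(work, 0.2, True),  # raises quickly
        pool.submit(work, 1.0),
        pool.submit(work, 1.0),
    ]
    # Returns as soon as any task raises; otherwise behaves like ALL_COMPLETED.
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)
    n_done, n_pending = len(done), len(not_done)
    failed = [f for f in done if f.exception() is not None]

print(n_done, n_pending, type(failed[0].exception()).__name__)
```

wait itself does not raise; the exception stays stored on the future and can be read with exception() or re-raised by calling result().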