Py西遊攻關之多進程(multiprocessing模塊)

時間 2019-11-24

標籤攻關之多進程 multiprocessing 模塊简体版

原文原文鏈接

多進程

`一多進程的概念`

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency,effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.html

因爲GIL的存在，python中的多線程其實並非真正的多線程，若是想要充分地使用多核CPU的資源，在python中大部分狀況須要使用多進程。Python提供了很是好用的多進程包multiprocessing，只須要定義一個函數，Python會完成其餘全部事情。藉助這個包，能夠輕鬆完成從單進程到併發執行的轉換。multiprocessing支持子進程、通訊和共享數據、執行不一樣形式的同步，提供了Process、Queue、Pipe、Lock等組件。python

　　multiprocessing包是Python中的多進程管理包。與threading.Thread相似，它能夠利用multiprocessing.Process對象來建立一個進程。該進程能夠運行在Python程序內部編寫的函數。該Process對象與Thread對象的用法相同，也有start(), run(), join()的方法。此外multiprocessing包中也有Lock/Event/Semaphore/Condition類 (這些對象能夠像多線程那樣，經過參數傳遞給各個進程)，用以同步進程，其用法與threading包中的同名類一致。因此，multiprocessing的很大一部份與threading使用同一套API，只不過換到了多進程的情境。多線程

但在使用這些共享API的時候，咱們要注意如下幾點:併發

在UNIX平臺上，當某個進程終結以後，該進程須要被其父進程調用wait，不然進程成爲殭屍進程(Zombie)。因此，有必要對每一個Process對象調用join()方法 (實際上等同於wait)。對於多線程來講，因爲只有一個進程，因此不存在此必要性。
multiprocessing提供了threading包中沒有的IPC(好比Pipe和Queue)，效率上更高。應優先考慮Pipe和Queue，避免使用Lock/Event/Semaphore/Condition等同步方式 (由於它們佔據的不是用戶進程的資源)。
多進程應該避免共享資源。在多線程中，咱們能夠比較容易地共享資源，好比使用全局變量或者傳遞參數。在多進程狀況下，因爲每一個進程有本身獨立的內存空間，以上方法並不合適。此時咱們能夠經過共享內存和Manager的方法來共享資源。但這樣作提升了程序的複雜度，並由於同步的須要而下降了程序的效率。

Process.PID中保存有PID，若是進程尚未start()，則PID爲None。app

window系統下，須要注意的是要想啓動一個子進程，必須加上那句if __name__ == "main"，進程相關的要寫在這句下面。async

實例： ide

from multiprocessing import Process
import time
def f(name):
    time.sleep(1)
    print('hello', name,time.ctime())

if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = Process(target=f, args=('alvin',))
        p_list.append(p)
        p.start()
    for i in p_list:
        p.join()
    print('end')

類式調用函數

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self):
        super(MyProcess, self).__init__()
        #self.name = name

    def run(self):
        time.sleep(1)
        print ('hello', self.name,time.ctime())


if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = MyProcess()
        p.start()
        p_list.append(p)

    for p in p_list:
        p.join()

    print('end')

To show the individual process IDs involved, here is an expanded example:ui

from multiprocessing import Process
import os
import time
def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())


def f(name):
    info('\033[31;1mfunction f\033[0m')
    print('hello', name)

if __name__ == '__main__':
    info('\033[32;1mmain process line\033[0m')
    time.sleep(100)
    p = Process(target=info, args=('bob',))
    p.start()
    p.join()

二 Process類

構造方法：this

Process([group [, target [, name [, args [, kwargs]]]]])

　　group: 線程組，目前尚未實現，庫引用中提示必須是None；
　　target: 要執行的方法；
　　name: 進程名；
　　args/kwargs: 要傳入方法的參數。

實例方法：

　　is_alive()：返回進程是否在運行。

　　join([timeout])：阻塞當前上下文環境的進程程，直到調用此方法的進程終止或到達指定的timeout（可選參數）。

　　start()：進程準備就緒，等待CPU調度

　　run()：strat()調用run方法，若是實例進程時未制定傳入target，這star執行t默認run()方法。

　　terminate()：無論任務是否完成，當即中止工做進程

屬性：

　　authkey

　　daemon：和線程的setDeamon功能同樣

　　exitcode(進程在運行時爲None、若是爲–N，表示被信號N結束）

　　name：進程名字。

　　pid：進程號。

import time
from  multiprocessing import Process

def foo(i):
    time.sleep(1)
    print (p.is_alive(),i,p.pid)
    time.sleep(1)

if __name__ == '__main__':
    p_list=[]
    for i in range(10):
        p = Process(target=foo, args=(i,))
        #p.daemon=True
        p_list.append(p)

    for p in p_list:
        p.start()
    # for p in p_list:
    #     p.join()

    print('main process end')

二進程間通信　

不一樣進程間內存是不共享的，要想實現兩個進程間的數據交換，能夠用如下方法：

Queues

使用方法跟threading裏的queue相似：

from multiprocessing import Process, Queue

def f(q,n):
    q.put([42, n, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p_list=[]
    for i in range(3):
        p = Process(target=f, args=(q,i))
        p_list.append(p)
        p.start()
    print(q.get())
    print(q.get())
    print(q.get())
    for i in p_list:
            i.join()

Pipes

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

from multiprocessing import Process, Pipe
 
def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()
 
if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())   # prints "[42, None, 'hello']"
    p.join()

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.

Managers

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,

from multiprocessing import Process, Manager

def f(d, l,n):
    d[n] = '1'
    d['2'] = 2
    d[0.25] = None
    l.append(n)
    print(l)

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()

        l = manager.list(range(5))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l,i))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()

        print(d)
        print(l)

三進程同步

Without using the lock output from the different processes is liable to get all mixed up.

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()

if __name__ == '__main__':
    lock = Lock()

    for num in range(10):
        Process(target=f, args=(lock, num)).start()

四進程池

進程池內部維護一個進程序列，當使用時，則去進程池中獲取一個進程，若是進程池序列中沒有可供使用的進進程，那麼程序就會等待，直到進程池中有可用進程爲止。

進程池中有兩個方法：

apply
apply_async

from  multiprocessing import Process,Pool
import time
 
def Foo(i):
    time.sleep(2)
    return i+100
 
def Bar(arg):
    print('-->exec done:',arg)
 
pool = Pool(5)
 
for i in range(10):
    pool.apply_async(func=Foo, args=(i,),callback=Bar)
    #pool.apply(func=Foo, args=(i,))
 
print('end')
pool.close()
pool.join()