Concurrent Programming: Processes, Threads, Coroutines

Date: 2020-06-20

Tags: concurrent programming, processes, threads


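Process operations in Python programs

We have already covered a good deal of theory about processes, so it should no longer be hard to say what a process is: a running program is a process. Every process is created by its parent process. A running Python program is therefore also a process, and it can create further processes of its own. Several processes can run concurrently, which means that in some situations a program made up of several processes runs faster. Nothing we have learned so far lets us create a process, so we need help from one of Python's powerful modules.

With multiple threads in Python:
- CPU-bound work: inefficient (because of the GIL).
- I/O work: efficient.

With multiple processes in Python:
- CPU-bound work: efficient, at the cost of extra resources. A last resort.
- I/O work: efficient, at the cost of extra resources.

So when writing Python:
- Use threads for I/O-bound work: files, input/output, socket network communication.
- Use processes for CPU-bound work.

For comparison, with multiple threads in Java:
- CPU-bound work: efficient.
- I/O work: efficient.

With multiple processes:
- CPU-bound work: efficient, at the cost of extra resources.
- I/O work: efficient, at the cost of extra resources.

Comparison of multithreading in Python and Java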
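As a rough illustration of the I/O-bound case, the sketch below runs the same blocking calls one after another and then in threads, using `concurrent.futures.ThreadPoolExecutor` from the standard library. `fake_io` is a hypothetical stand-in that only sleeps, so the timings are indicative, not a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    """Hypothetical stand-in for a blocking I/O call (file, socket, ...)."""
    time.sleep(seconds)   # sleeping releases the GIL, like real blocking I/O
    return seconds

def run_sequential(delays):
    """Run the calls one after another; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io(d) for d in delays]
    return results, time.perf_counter() - start

def run_threaded(delays):
    """Run the calls in one thread each; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_io, delays))
    return results, time.perf_counter() - start

if __name__ == '__main__':
    delays = [0.2] * 5
    print('sequential: %.2fs' % run_sequential(delays)[1])
    print('threaded:   %.2fs' % run_threaded(delays)[1])
```

The threaded run should take roughly as long as the slowest single call, because the threads spend their time waiting rather than competing for the GIL.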

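1. The multiprocessing module

Strictly speaking, multiprocessing is not a module but a package for operating on and managing processes. The "multi" comes from "multiple". The package contains almost every submodule related to processes. Because there are so many of them, I group them roughly into four parts to make them easier to remember: creating processes, process synchronization, process pools, and sharing data between processes.

1.1 The multiprocessing.process module

1.1.1 Introduction to the process module

The process module is the module used to create processes.

Process([group [, target [, name [, args [, kwargs]]]]]): an object instantiated from this class represents a task in a child process (not yet started).

Points to stress:
1. Arguments should be passed by keyword.
2. args holds the positional arguments for the target function. It is a tuple, so a single argument needs a trailing comma.

Parameters:
1 group is unused and is always None
2 target is the callable, that is, the task the child process will execute
3 args is the tuple of positional arguments for the callable, e.g. args=(1,2,'egon',)
4 kwargs is the dictionary of keyword arguments for the callable, e.g. kwargs={'name':'egon','age':18}
5 name is the name of the child process

1 p.start(): starts the process and calls p.run() in the child process
2 p.run(): the method that runs when the process starts; it is what calls the function given as target. A custom subclass must implement this method
3 p.terminate(): forcibly terminates process p without any cleanup. If p has created child processes of its own, they are not terminated and become orphaned, so use this method with care. If p holds a lock, the lock is not released either, which can lead to deadlock
4 p.is_alive(): returns True if p is still running
5 p.join([timeout]): the main process waits for p to terminate (the main process is the one waiting; p keeps running). timeout is an optional timeout. Note that p.join only works on a process started with start, not on one run through run

Methods

1 p.daemon: defaults to False. If set to True, p is a daemon process running in the background: when p's parent terminates, p terminates too, and p may not create child processes of its own. It must be set before p.start()
2 p.name: the name of the process
3 p.pid: the pid of the process
4 p.exitcode: None while the process is running; a value of -N means it was terminated by signal N (for reference only)
5 p.authkey: the process's authentication key, by default a 32-byte string generated by os.urandom(). It secures the low-level inter-process communication that involves network connections; such a connection succeeds only if both ends have the same authentication key (for reference only)

Attributes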
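To make the methods and attributes above concrete, here is a minimal sketch that inspects a Process object before it is started and after it has finished. The helper `snapshot` and the process name `'greeter'` are illustrative choices, not part of the library.

```python
from multiprocessing import Process

def greet(name):
    print('hello', name)

def snapshot(p):
    """Collect the attributes discussed above into a dict (illustrative helper)."""
    return {'name': p.name, 'daemon': p.daemon, 'pid': p.pid,
            'alive': p.is_alive(), 'exitcode': p.exitcode}

if __name__ == '__main__':
    p = Process(target=greet, args=('bob',), name='greeter')
    print(snapshot(p))   # not started yet: pid and exitcode are None
    p.start()
    p.join()             # wait for the child to finish
    print(snapshot(p))   # finished: no longer alive, exitcode filled in
```

Before start() the object is only a description of the task, which is why pid and exitcode are still None at that point.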

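On Windows there is no fork (the mechanism Linux uses to create processes), so creating a child process automatically imports the file that started it, and importing the file executes all of it. If Process() is called directly at the top level of the file, child processes are created recursively without end and an error is raised. The part that creates child processes must therefore be protected with if __name__ == '__main__': so that it does not run again on import.

Notes on using the process module on Windows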

1.1.2 Creating processes with the process module

Starting a child process from a Python process: the start method and the concurrency it gives.

import time
from multiprocessing import Process

def f(name):
    print('hello', name)
    print('I am the child process')

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    time.sleep(1)
    print('now running the main process')

The first child process started in Python

import time
from multiprocessing import Process

def f(name):
    print('hello', name)
    time.sleep(1)
    print('I am the child process')

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    # p.join()  # uncomment to make the parent wait for the child
    print('I am the parent process')

The join method

import os
from multiprocessing import Process

def f(x):
    print('child process id:', os.getpid())
    return x * x

if __name__ == '__main__':
    print('main process id:', os.getpid())
    p_list = []
    for i in range(5):
        p = Process(target=f, args=(i,))
        p.start()
        p_list.append(p)

Viewing the ids of the main process and the child processes

Going further: several processes running at once (note that the order in which child processes run is not determined by the order in which they were started)

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p_list = []
    for i in range(5):
        p = Process(target=f, args=('boo',))
        p.start()
        p_list.append(p)

Several processes running at once

import time
from multiprocessing import Process

def f(name):
    print('hello', name)
    time.sleep(1)

if __name__ == '__main__':
    p_lst = []
    for i in range(5):
        p = Process(target=f, args=('bob',))
        p.start()
        p_lst.append(p)
        p.join()  # joining inside the loop makes the children run one at a time
    # [p.join() for p in p_lst]
    print('the parent process is running')

Several processes running at once: join revisited (1)

import time
from multiprocessing import Process

def f(name):
    print('hello', name)
    time.sleep(1)

if __name__ == '__main__':
    p_lst = []
    for i in range(5):
        p = Process(target=f, args=('bob',))
        p.start()
        p_lst.append(p)
    # [p.join() for p in p_lst]  # uncomment to wait for all children after starting them
    print('the parent process is running')

Several processes running at once: join revisited (2)

Besides the ways of starting a process shown above, a process can also be started by subclassing the Process class

import os
from multiprocessing import Process

class MyProcess(Process):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        print(os.getpid())
        print('%s is chatting with a streamer' % self.name)

if __name__ == '__main__':
    p1 = MyProcess('wupeiqi')
    p2 = MyProcess('yuanhao')
    p3 = MyProcess('nezha')

    p1.start()  # start calls run automatically
    p2.start()
    # p2.run()
    p3.start()

    p1.join()
    p2.join()
    p3.join()

    print('main thread')

Starting processes by subclassing Process

Data isolation between processes

from multiprocessing import Process

def work():
    global n
    n = 0
    print('in the child process: ', n)

if __name__ == '__main__':
    n = 100
    p = Process(target=work)
    p.start()
    print('in the main process: ', n)

Data isolation between processes

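1.2 Daemon processes

A daemon process ends when the main process ends.

A daemon process created by the main process:

First: a daemon process terminates as soon as the main process's code has finished executing.

Second: a daemon process cannot start child processes; doing so raises AssertionError: daemonic processes are not allowed to have children

Note: processes are independent of each other; once the main process's code has finished running, the daemon process is terminated.

import os
import time
from multiprocessing import Process

class Myprocess(Process):
    def __init__(self, person):
        super().__init__()
        self.person = person

    def run(self):
        print(os.getpid(), self.name)
        print('%s is chatting with a streamer' % self.person)

if __name__ == '__main__':
    p = Myprocess('Nezha')
    p.daemon = True  # must be set before p.start(); makes p a daemon, forbids p from creating children, and p ends when the parent's code finishes
    p.start()
    time.sleep(10)  # while sleeping, look up the process by id: ps -ef|grep id
    print('main')

Starting a daemon process

import time
from multiprocessing import Process

def foo():
    print(123)
    time.sleep(1)
    print("end123")

def bar():
    print(456)
    time.sleep(3)
    print("end456")

if __name__ == '__main__':
    p1 = Process(target=foo)
    p2 = Process(target=bar)

    p1.daemon = True
    p1.start()
    p2.start()
    time.sleep(0.1)
    print("main-------")  # once this line prints, the main process's code is done and daemon p1 is terminated
    # 123 may still be printed, because p1 was already running when main---- was printed, but p1 is terminated right after

The daemon process ends as soon as the main process's code finishes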

A concurrent socket chat example:

from socket import *
from multiprocessing import Process

def talk(conn, client_addr):
    while True:
        try:
            msg = conn.recv(1024)
            if not msg:
                break
            conn.send(msg.upper())
        except Exception:
            break

if __name__ == '__main__':  # on Windows, processes must be started under this guard
    server = socket(AF_INET, SOCK_STREAM)
    server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 8080))
    server.listen(5)
    while True:
        conn, client_addr = server.accept()
        p = Process(target=talk, args=(conn, client_addr))
        p.start()

Concurrent socket chat with multiple processes: server

from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue
    client.send(msg.encode('utf-8'))
    msg = client.recv(1024)
    print(msg.decode('utf-8'))

Client side

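Other methods of process objects:

from multiprocessing import Process
import time
import random

class Myprocess(Process):
    def __init__(self, person):
        super().__init__()
        self.name = person

    def run(self):
        print('%s is chatting with an influencer' % self.name)
        time.sleep(random.randrange(1, 5))
        print('%s is still chatting with an influencer' % self.name)

if __name__ == '__main__':
    p1 = Myprocess('Nezha')
    p1.start()

    p1.terminate()  # asks the process to stop; it does not stop instantly, so is_alive may still report it alive
    print(p1.is_alive())  # usually True

    print('start')
    time.sleep(0.1)  # give the operating system a moment to end the process
    print(p1.is_alive())  # False

Other methods of process objects: terminate, is_alive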

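from multiprocessing import Process
import time
import random

class Myprocess(Process):
    def __init__(self, person):
        self.name = person   # name is an attribute of Process that identifies the process
        super().__init__()   # running the parent initializer overwrites name
        # self.name = person    # set it here instead to actually change the process name
        # self.person = person  # or use a different attribute name to leave the process name alone

    def run(self):
        print('%s is chatting with an influencer' % self.name)
        # print('%s is chatting with an influencer' % self.person)
        time.sleep(random.randrange(1, 5))
        print('%s is chatting with an influencer' % self.name)
        # print('%s is chatting with an influencer' % self.person)

if __name__ == '__main__':
    p1 = Myprocess('Nezha')
    p1.start()
    print(p1.pid)  # the process id of the child

Other attributes of process objects: pid and name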

2. Process synchronization (multiprocessing.Lock, multiprocessing.Semaphore, multiprocessing.Event)

2.1 Locks: multiprocessing.Lock

So far we have made the program asynchronous, so that several tasks are handled concurrently in several processes. They run in no particular order, and once started they are outside our control. Concurrent programming lets us make fuller use of I/O resources, but it also brings a new problem.

When several processes use the same data resource, data safety is at risk and output can get out of order.

import os
import time
import random
from multiprocessing import Process

def work(n):
    print('%s: %s is running' % (n, os.getpid()))
    time.sleep(random.random())
    print('%s:%s is done' % (n, os.getpid()))

if __name__ == '__main__':
    for i in range(3):
        p = Process(target=work, args=(i,))
        p.start()

Several processes competing for the output

# concurrency becomes serial execution: slower, but without the race
import os
import time
import random
from multiprocessing import Process, Lock

def work(lock, n):
    lock.acquire()
    print('%s: %s is running' % (n, os.getpid()))
    time.sleep(random.random())
    print('%s: %s is done' % (n, os.getpid()))
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    for i in range(3):
        p = Process(target=work, args=(lock, i))
        p.start()

Using a lock to keep execution in order
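One caveat with the acquire()/release() pair above: if the code in between raises, the lock is never released. A minimal sketch of the safer form uses the lock as a context manager. The helpers `guarded` and `is_free` are illustrative names, not library functions.

```python
from multiprocessing import Lock

def guarded(lock, action):
    """Run action() while holding lock; the lock is released even on error."""
    with lock:                # equivalent to acquire() ... release() in try/finally
        return action()

def is_free(lock):
    """Report whether the lock can be taken right now, without blocking."""
    if lock.acquire(False):   # non-blocking attempt
        lock.release()
        return True
    return False

if __name__ == '__main__':
    lock = Lock()
    try:
        guarded(lock, lambda: 1 / 0)   # raises inside the critical section
    except ZeroDivisionError:
        pass
    print(is_free(lock))               # the lock was released despite the error
```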

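Locking as above does give ordered execution, but the program is serial again. That costs time, yet it keeps the data safe.

Next, a ticket-grabbing simulation shows why data safety matters.

# The file db contains: {"count":1}
# Be sure to use double quotes, otherwise json cannot parse it
# Runs concurrently and fast, but the processes race to write the same file and the data gets corrupted
from multiprocessing import Process
import time, json

def search():
    dic = json.load(open('db'))
    print('\033[43mtickets left: %s\033[0m' % dic['count'])

def get():
    dic = json.load(open('db'))
    time.sleep(0.1)  # simulate network latency when reading
    if dic['count'] > 0:
        dic['count'] -= 1
        time.sleep(0.2)  # simulate network latency when writing
        with open('db', 'w') as f:
            json.dump(dic, f)
        print('\033[43mticket purchased\033[0m')

def task():
    search()
    get()

if __name__ == '__main__':
    for i in range(100):  # simulate 100 clients grabbing tickets concurrently
        p = Process(target=task)
        p.start()

Several processes grabbing the remaining tickets at once

# The file db contains: {"count":5}
# Be sure to use double quotes, otherwise json cannot parse it
# Searching stays concurrent; the purchase step is serialized by the lock, so the file stays consistent
from multiprocessing import Process, Lock
import time, json, random

def search():
    dic = json.load(open('db'))
    print('\033[43mtickets left: %s\033[0m' % dic['count'])

def get():
    dic = json.load(open('db'))
    time.sleep(random.random())  # simulate network latency when reading
    if dic['count'] > 0:
        dic['count'] -= 1
        time.sleep(random.random())  # simulate network latency when writing
        with open('db', 'w') as f:
            json.dump(dic, f)
        print('\033[32mticket purchased\033[0m')
    else:
        print('\033[31mpurchase failed\033[0m')

def task(lock):
    search()
    lock.acquire()
    get()
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    for i in range(100):  # simulate 100 clients grabbing tickets concurrently
        p = Process(target=task, args=(lock,))
        p.start()

Using a lock to keep the data safe

# A lock guarantees that when several processes modify the same data, only one task can modify it at a time. The modifications are serial: slower, but the data stays safe.
Sharing data through a file does give inter-process communication, but it has two problems: 1. it is slow (the shared data lives in a file, and files are on disk); 2. you have to handle the locking yourself.
# So we would like a solution that 1. is efficient (several processes share data in memory) and 2. handles the locking for us.
This is what the multiprocessing module offers with its message-based IPC mechanisms: queues and pipes.
Queues and pipes both keep data in memory. A queue is built from a pipe plus locks, which frees us from the complicated locking problem. We should avoid shared data wherever possible and prefer message passing and queues. That avoids complicated synchronization and locking, and it usually scales better as the number of processes grows.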

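2.2 Semaphores: multiprocessing.Semaphore (for reference)

A mutex lets only one thread change the data at a time, whereas a Semaphore lets a fixed number of threads change it at the same time.
Suppose a shopping mall has 4 mini karaoke booths, so 4 people can be inside at once. A fifth person has to wait outside until someone comes out.
Implementation:
Semaphore synchronization is based on an internal counter. Each call to acquire() decrements the counter by 1 and each call to release() increments it by 1. When the counter is 0, acquire() blocks. This is the Python implementation of Dijkstra's semaphore operations P() and V(). Semaphores are suited to guarding access to a limited resource such as a server.
A semaphore resembles a process pool, but the two should be kept apart: a semaphore involves the concept of locking.

Introduction to Semaphore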
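The counter behavior described above can be observed in a single process with non-blocking acquire calls. This is a minimal sketch; `try_enter` is an illustrative helper name.

```python
from multiprocessing import Semaphore

def try_enter(sem):
    """Try to take one slot without blocking; True means a slot was free."""
    return sem.acquire(False)

if __name__ == '__main__':
    booths = Semaphore(4)                          # four slots, as in the karaoke example
    print([try_enter(booths) for _ in range(5)])   # the fifth attempt finds the counter at 0
    booths.release()                               # one person leaves: counter goes back up
    print(try_enter(booths))                       # a slot is available again
```

With the default blocking acquire(), the fifth caller would wait instead of getting False back.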

from multiprocessing import Process, Semaphore
import time, random

def go_ktv(sem, user):
    sem.acquire()
    print('%s got a KTV booth' % user)
    time.sleep(random.randint(0, 3))  # each person stays in the booth for a different time
    sem.release()

if __name__ == '__main__':
    sem = Semaphore(4)
    p_l = []
    for i in range(13):
        p = Process(target=go_ktv, args=(sem, 'user%s' % i,))
        p.start()
        p_l.append(p)

    for i in p_l:
        i.join()
    print('============>')

Example

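2.3 Events: multiprocessing.Event (for reference)

An event is used by the main thread to control the execution of other threads. It provides three main methods: set, wait and clear.

How it works: a global "Flag" is defined. If the Flag is False, a call to event.wait blocks; if the Flag is True, event.wait no longer blocks.

clear: sets the Flag to False
set: sets the Flag to True

Introduction to events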
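The flag semantics can be tried out in one process before moving to the multi-process example. A minimal sketch, with `wait_for_green` as an illustrative helper name:

```python
from multiprocessing import Event

def wait_for_green(light, timeout):
    """Block for up to `timeout` seconds; return True if the flag was set."""
    return light.wait(timeout)

if __name__ == '__main__':
    light = Event()                      # the flag starts as False (red)
    print(wait_for_green(light, 0.1))    # nobody sets the flag, so this times out
    light.set()                          # flag becomes True (green)
    print(wait_for_green(light, 0.1))    # returns without waiting
    light.clear()                        # flag becomes False again
    print(light.is_set())
```

wait() with a timeout reports whether the flag was set, which is what lets a caller decide what to do when the wait runs out.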

from multiprocessing import Process, Event
import time, random

def car(e, n):
    while True:
        if not e.is_set():  # when the process starts, is_set() is False: the light is red
            print('\033[31mred light\033[0m, car%s waits' % n)
            e.wait()  # block until is_set() becomes True: the light is green
            print('\033[32mcar%s sees the green light\033[0m' % n)
            time.sleep(random.randint(3, 6))
            if not e.is_set():  # if is_set() is False (red again), go back to the top of the loop
                continue
            print('car drove off, car', n)
            break

def police_car(e, n):
    while True:
        if not e.is_set():  # when the process starts, is_set() is False: the light is red
            print('\033[31mred light\033[0m, car%s waits' % n)
            e.wait(0.1)  # block with a timeout: after 0.1s without a green light, run the red light
            if not e.is_set():
                print('\033[33mred light, police car goes first\033[0m, car %s' % n)
            else:
                print('\033[33;46mgreen light, police car goes\033[0m, car %s' % n)
        break

def traffic_lights(e, interval):
    while True:
        time.sleep(interval)
        if e.is_set():
            print('######', e.is_set())
            e.clear()  # set is_set() to False
        else:
            e.set()  # set is_set() to True
            print('***********', e.is_set())

if __name__ == '__main__':
    e = Event()
    for i in range(10):
        p = Process(target=car, args=(e, i,))  # 10 processes controlling 10 cars
        p.start()

    for i in range(5):
        p = Process(target=police_car, args=(e, i,))  # 5 processes controlling 5 police cars
        p.start()
    t = Process(target=traffic_lights, args=(e, 10))  # one process controlling the traffic light
    t.start()

    print('============>')

Traffic light example

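3. Inter-process communication: queues and pipes (multiprocessing.Queue, multiprocessing.Pipe)

3.1 Inter-process communication

3.1.1 IPC (Inter-Process Communication)

Queues

Overview

Queue creates a shared process queue. It is safe to use from multiple processes and can be used to pass data between them.

Queue([maxsize]): creates a shared process queue.
Parameter: maxsize is the maximum number of items allowed in the queue. If omitted, there is no size limit.
The underlying queue is implemented with a pipe and locks.

In addition, a supporting thread runs to transfer queued data into the underlying pipe.
An instance q of Queue has the following methods:

q.get([block [, timeout]]): returns an item from q. If q is empty, this method blocks until an item is available. block controls the blocking behavior and defaults to True. If it is False and no item is available, queue.Empty is raised (defined in the queue module). timeout is an optional timeout used in blocking mode. If no item becomes available within that interval, queue.Empty is raised.
q.get_nowait(): same as q.get(False).
q.put(item [, block [, timeout]]): puts item into the queue. If the queue is full, this method blocks until space is available. block controls the blocking behavior and defaults to True. If it is False and the queue is full, queue.Full is raised (defined in the queue module). timeout specifies how long to wait for free space in blocking mode; queue.Full is raised on timeout.
q.qsize(): returns the approximate number of items currently in the queue. The result is not reliable, because items may be added or removed between the call returning and the result being used. On some systems this method may raise NotImplementedError.
q.empty(): returns True if q is empty at the time of the call. If other processes or threads are adding items to the queue, the result is not reliable: new items may have arrived between the call returning and the result being used.
q.full(): returns True if q is full. Because of threads the result may also be unreliable (see q.empty()).

Methods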
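As a small illustration of the timeout and non-blocking variants listed above, the sketch below wraps them in two helpers. `drain` and `offer` are illustrative names. In Python 3 the Empty and Full exceptions come from the standard `queue` module.

```python
import queue                      # Empty and Full are defined here
from multiprocessing import Queue

def drain(q, timeout=0.5):
    """Take items until none arrives within `timeout` seconds."""
    items = []
    while True:
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            return items

def offer(q, item):
    """Put without blocking; return False instead of raising when the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False

if __name__ == '__main__':
    q = Queue(2)
    print([offer(q, n) for n in (1, 2, 3)])   # the third put finds the queue full
    print(drain(q))
```

drain uses get(timeout=...) and not get_nowait() because a supporting thread moves items into the pipe, so an item that was just put may not be readable for a brief moment.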

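q.close(): closes the queue so that no more data can be added. When it is called, the background thread keeps writing data that has been queued but not yet written, and shuts down as soon as that is done. It is called automatically if q is garbage collected. Closing a queue does not generate any end-of-data signal or exception in the queue's consumers. For example, if a consumer is blocked in get(), closing the queue in the producer does not make get() return an error.
q.cancel_join_thread(): prevents the background thread from being joined automatically when the process exits. This keeps join_thread() from blocking.
q.join_thread(): joins the queue's background thread. It is used after q.close() to wait until all queued items have been consumed. By default it is called by every process that is not the original creator of q. Calling q.cancel_join_thread() disables this behavior.

Other methods (for reference)

Code:

'''
The multiprocessing module supports two main forms of inter-process communication: pipes and queues.
Both are built on message passing, but the queue interface
'''

import queue
from multiprocessing import Queue

q = Queue(3)

# put, get, put_nowait, get_nowait, full, empty
q.put(3)
q.put(3)
q.put(3)
# q.put(3)  # if the queue is full, the program stops here until someone takes data out, then puts the item in
#           # if nothing is ever taken out, the program stays here forever
try:
    q.put_nowait(3)  # put_nowait does not block when the queue is full, but raises queue.Full
except queue.Full:   # a try statement handles the error; the program does not block, but this message is lost
    print('the queue is full')

# so before putting data in, we can first check the state of the queue and stop putting if it is full
print(q.full())  # full

print(q.get())
print(q.get())
print(q.get())
# print(q.get())  # as with put, getting from an empty queue blocks
try:
    q.get_nowait()   # get_nowait does not block when the queue is empty, but raises queue.Empty
except queue.Empty:  # a try statement handles the error, so the program does not block
    print('the queue is empty')

print(q.empty())  # empty

Queue usage on its own

The example above involves no inter-process communication yet. It only shows the methods a queue provides and how they behave.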

import time
from multiprocessing import Process, Queue

def f(q):
    q.put([time.asctime(), 'from Jacob', 'hello'])  # q is the queue passed in by the main process; put adds one item to it

if __name__ == '__main__':
    q = Queue()  # create a Queue object
    p = Process(target=f, args=(q,))  # create a process
    p.start()
    print(q.get())
    p.join()

The child process sends data to the parent process

The above is a simple use of a queue: calling get on the queue object q returns the item that entered the queue first. Now a slightly more involved example:

import os
import time
import multiprocessing

# puts data into the queue
def inputQ(queue):
    info = str(os.getpid()) + '(put):' + str(time.asctime())
    queue.put(info)

# takes data out of the queue
def outputQ(queue):
    info = queue.get()
    print('%s%s\033[32m%s\033[0m' % (str(os.getpid()), '(get):', info))

# Main
if __name__ == '__main__':
    multiprocessing.freeze_support()
    record1 = []   # store input processes
    record2 = []   # store output processes
    queue = multiprocessing.Queue(3)

    # input processes
    for i in range(10):
        process = multiprocessing.Process(target=inputQ, args=(queue,))
        process.start()
        record1.append(process)

    # output processes
    for i in range(10):
        process = multiprocessing.Process(target=outputQ, args=(queue,))
        process.start()
        record2.append(process)

    for p in record1:
        p.join()

    for p in record2:
        p.join()

Producing data into the queue in bulk, then fetching the results in bulk

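3.1.2 The producer-consumer model

The producer-consumer pattern solves the great majority of concurrency problems. It raises a program's overall data-processing speed by balancing the working capacity of producing threads and consuming threads.

Why use the producer-consumer pattern

In the world of threads, a producer is a thread that produces data and a consumer is a thread that consumes data. In multithreaded development, if the producer is fast and the consumer is slow, the producer has to wait for the consumer to finish before it can produce more. By the same reasoning, if the consumer is faster than the producer, the consumer has to wait for the producer. The producer-consumer pattern was introduced to solve this problem.

What the producer-consumer pattern is

The pattern uses a container to remove the tight coupling between producer and consumer. Producers and consumers do not talk to each other directly; they communicate through a blocking queue. A producer does not wait for a consumer after producing data; it just drops the data into the blocking queue. A consumer does not ask a producer for data; it takes it from the blocking queue. The blocking queue acts as a buffer that balances the capacity of producers and consumers.

A producer-consumer model based on a queue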

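from multiprocessing import Process, Queue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))

def producer(q):
    for i in range(10):
        time.sleep(random.randint(1, 3))
        res = 'bun%s' % i
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))

if __name__ == '__main__':
    q = Queue()
    # producers: the cooks
    p1 = Process(target=producer, args=(q,))

    # consumers: the eaters
    c1 = Process(target=consumer, args=(q,))

    # start
    p1.start()
    c1.start()
    print('main')

Queue-based implementation

The problem now is that the main process never ends. The producer p1 finishes once it has produced everything, but the consumer c1, after emptying q, stays in its infinite loop, stuck at q.get().

The fix is simply to have the producer send one more item, an end signal, after it has finished producing. When the consumer receives the end signal it can break out of the loop.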

from multiprocessing import Process, Queue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        if res is None:
            break  # stop on the end signal
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))

def producer(q):
    for i in range(10):
        time.sleep(random.randint(1, 3))
        res = 'bun%s' % i
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))
    q.put(None)  # send the end signal

if __name__ == '__main__':
    q = Queue()
    # producers: the cooks
    p1 = Process(target=producer, args=(q,))

    # consumers: the eaters
    c1 = Process(target=consumer, args=(q,))

    # start
    p1.start()
    c1.start()
    print('main')

Improved version: producer-consumer model

Note: the end signal None does not have to be sent by the producer. The main process can send it too, but it should do so only after the producer has finished.

from multiprocessing import Process, Queue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        if res is None:
            break  # stop on the end signal
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))

def producer(q):
    for i in range(2):
        time.sleep(random.randint(1, 3))
        res = 'bun%s' % i
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))

if __name__ == '__main__':
    q = Queue()
    # producers: the cooks
    p1 = Process(target=producer, args=(q,))

    # consumers: the eaters
    c1 = Process(target=consumer, args=(q,))

    # start
    p1.start()
    c1.start()

    p1.join()
    q.put(None)  # send the end signal
    print('main')

The main process sends the end signal None after the producer has finished

With several producers and several consumers, however, this approach forces a rather crude solution

from multiprocessing import Process, Queue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        if res is None:
            break  # stop on the end signal
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))

def producer(name, q):
    for i in range(2):
        time.sleep(random.randint(1, 3))
        res = '%s%s' % (name, i)
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))

if __name__ == '__main__':
    q = Queue()
    # producers: the cooks
    p1 = Process(target=producer, args=('bun', q))
    p2 = Process(target=producer, args=('bone', q))
    p3 = Process(target=producer, args=('swill', q))

    # consumers: the eaters
    c1 = Process(target=consumer, args=(q,))
    c2 = Process(target=consumer, args=(q,))

    # start
    p1.start()
    p2.start()
    p3.start()
    c1.start()
    c2.start()

    p1.join()  # every producer must have finished before the end signal is sent
    p2.join()
    p3.join()
    q.put(None)  # send the end signal None once per consumer
    q.put(None)  # send the end signal
    print('main')

An example with several consumers: one end signal has to be sent per consumer
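The "one end signal per consumer" rule can be written as a loop over the consumers instead of repeated put calls. The sketch below deliberately uses threads and `queue.Queue` so that it runs inside one process; this substitution is an assumption made for illustration, and the same shutdown logic is meant to carry over to `multiprocessing.Queue` and `Process`. `SENTINEL` and `run` are illustrative names.

```python
import queue
import threading

SENTINEL = None   # end-of-stream marker, as in the examples above

def producer(name, count, q):
    for i in range(count):
        q.put('%s%s' % (name, i))

def consumer(q, eaten):
    while True:
        item = q.get()
        if item is SENTINEL:      # stop on the end signal
            break
        eaten.append(item)        # list.append is safe across threads in CPython

def run(producer_names, n_consumers, per_producer=2):
    """Start producers and consumers, shut down cleanly, return what was consumed."""
    q, eaten = queue.Queue(), []
    producers = [threading.Thread(target=producer, args=(n, per_producer, q))
                 for n in producer_names]
    consumers = [threading.Thread(target=consumer, args=(q, eaten))
                 for _ in range(n_consumers)]
    for t in producers + consumers:
        t.start()
    for t in producers:           # every producer must finish first
        t.join()
    for _ in consumers:           # then exactly one sentinel per consumer
        q.put(SENTINEL)
    for t in consumers:
        t.join()
    return sorted(eaten)

if __name__ == '__main__':
    print(run(['bun', 'bone', 'swill'], n_consumers=2))
```

Because the number of sentinels is derived from the consumer list, adding or removing a consumer cannot leave one of them blocked in get().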

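JoinableQueue([maxsize])
Creates a joinable shared process queue. It is like a Queue object, but the queue lets the consumer of an item notify the producer that the item has been processed successfully. The notification is implemented with a shared semaphore and condition variable.

Besides the methods of a Queue object, an instance q of JoinableQueue has the following methods:

q.task_done(): a consumer calls this to signal that the item returned by q.get() has been processed. If it is called more times than items have been removed from the queue, ValueError is raised.
q.join(): a producer calls this to block until every item in the queue has been processed. The block lasts until q.task_done() has been called for every item in the queue.

The following example shows how to set up processes that run forever, consuming and processing items on a queue. The producers put items into the queue and wait for them to be processed.

Methods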
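Before the multi-process example, the task_done()/join() bookkeeping can be checked in a single process. A minimal sketch, with `process_all` as an illustrative helper name:

```python
from multiprocessing import JoinableQueue

def process_all(q, n_items):
    """Take n_items from q and acknowledge each one with task_done()."""
    handled = []
    for _ in range(n_items):
        item = q.get(timeout=1)   # allow a moment for the item to reach the pipe
        handled.append(item)
        q.task_done()             # one acknowledgement per get()
    return handled

if __name__ == '__main__':
    q = JoinableQueue()
    for n in range(3):
        q.put(n)
    print(process_all(q, 3))
    q.join()                      # returns because every item was acknowledged
    try:
        q.task_done()             # one acknowledgement too many
    except ValueError as exc:
        print('ValueError:', exc)
```

If any task_done() call were skipped, q.join() would block, which is exactly the property the producers rely on in the example that follows.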

from multiprocessing import Process, JoinableQueue
import time, random, os

def consumer(q):
    while True:
        res = q.get()
        time.sleep(random.randint(1, 3))
        print('\033[45m%s ate %s\033[0m' % (os.getpid(), res))
        q.task_done()  # signal q.join() once: one item has been taken and processed

def producer(name, q):
    for i in range(10):
        time.sleep(random.randint(1, 3))
        res = '%s%s' % (name, i)
        q.put(res)
        print('\033[44m%s produced %s\033[0m' % (os.getpid(), res))
    q.join()  # production is done; block until every item in the queue has been processed

if __name__ == '__main__':
    q = JoinableQueue()
    # producers: the cooks
    p1 = Process(target=producer, args=('bun', q))
    p2 = Process(target=producer, args=('bone', q))
    p3 = Process(target=producer, args=('swill', q))

    # consumers: the eaters
    c1 = Process(target=consumer, args=(q,))
    c2 = Process(target=consumer, args=(q,))
    c1.daemon = True
    c2.daemon = True

    # start
    p_l = [p1, p2, p3, c1, c2]
    for p in p_l:
        p.start()

    p1.join()
    p2.join()
    p3.join()
    print('main')

    # the main process waits for p1, p2, p3, which in turn wait for c1, c2
    # once p1, p2, p3 have ended, c1 and c2 must have received everything p1, p2, p3 put into the queue
    # so c1 and c2 have no further purpose and should not keep blocking and hold up the main process.
    # They should end with the main process, which is why they are made daemon processes.

Producer-consumer model with JoinableQueue

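3.1.3 Pipes (for reference)

# The class that creates a pipe:
Pipe([duplex]): creates a pipe between processes and returns a tuple (conn1, conn2), where conn1 and conn2 are the connection objects at the two ends of the pipe. One point to stress: the pipe must be created before the Process objects are created
# Parameter:
duplex: by default the pipe is full duplex. If duplex is set to False, conn1 can only receive and conn2 can only send.
# Main methods:
conn1.recv(): receives the object sent by conn2.send(obj). If there is nothing to receive, recv blocks. If the other end of the connection has been closed, recv raises EOFError.
conn1.send(obj): sends an object over the connection. obj can be any object that can be serialized (pickled)
# Other methods:
conn1.close(): closes the connection. It is called automatically if conn1 is garbage collected
conn1.fileno(): returns the integer file descriptor used by the connection
conn1.poll([timeout]): returns True if data is available on the connection. timeout specifies the maximum time to wait. If omitted, the method returns immediately. If timeout is None, the call waits indefinitely for data to arrive.
conn1.recv_bytes([maxlength]): receives one complete byte message sent by c.send_bytes(). maxlength specifies the maximum number of bytes to receive. If the incoming message exceeds it, IOError (OSError in Python 3) is raised and no further reads are possible on the connection. If the other end of the connection has been closed and no data remains, EOFError is raised.
conn.send_bytes(buffer [, offset [, size]]): sends a buffer of byte data over the connection. buffer is any object supporting the buffer interface, offset is the byte offset into the buffer, and size is the number of bytes to send. The data is sent as a single message, to be received with c.recv_bytes()
conn1.recv_bytes_into(buffer [, offset]): receives one complete byte message and stores it in buffer, an object supporting the writable buffer interface (a bytearray or similar). offset specifies the byte position in the buffer at which to place the message. The return value is the number of bytes received. If the message is longer than the available buffer space, BufferTooShort is raised.

Introduction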
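As a quick check of the one-way mode and the EOFError behavior described above, this sketch uses both ends of a pipe inside a single process. `read_all` is an illustrative helper name.

```python
from multiprocessing import Pipe

def read_all(conn):
    """Receive objects until the sending side is closed and nothing is left."""
    received = []
    while True:
        try:
            received.append(conn.recv())
        except EOFError:      # no open sender remains and no data is buffered
            return received

if __name__ == '__main__':
    receiver, sender = Pipe(duplex=False)   # one-way: first end receives, second sends
    for obj in ('hello', {'n': 1}, [1, 2, 3]):
        sender.send(obj)                    # any picklable object
    sender.close()                          # the only sending end is now closed
    print(read_all(receiver))
```

Here a single close() is enough because only one copy of the sending end exists. Once the pipe has been handed to a child process, every copy of that end has to be closed before recv() raises EOFError.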

from multiprocessing import Process, Pipe

def f(conn):
    conn.send("Hello The_Third_Wave")
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()

First use of Pipe

Take particular care to manage the pipe's endpoints correctly. If a producer or a consumer does not use one end of the pipe, it should close that end. This is why the producer closes the receiving end of the pipe and the consumer closes the sending end. If these steps are forgotten, the program may hang in the consumer's recv() call. Pipes are reference counted by the operating system, and an end must be closed in every process before EOFError can be raised. Closing the pipe in the producer therefore has no effect unless the consumer has also closed the same end.

from multiprocessing import Process, Pipe

def f(parent_conn, child_conn):
    parent_conn.close()  # without this close, EOFError is never raised
    while True:
        try:
            print(child_conn.recv())
        except EOFError:
            child_conn.close()
            break

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(parent_conn, child_conn,))
    p.start()
    child_conn.close()
    parent_conn.send('hello')
    parent_conn.close()
    p.join()

Raising EOFError

from multiprocessing import Process, Pipe

def consumer(p, name):
    produce, consume = p
    produce.close()
    while True:
        try:
            baozi = consume.recv()
            print('%s received bun: %s' % (name, baozi))
        except EOFError:
            break

def producer(seq, p):
    produce, consume = p
    consume.close()
    for i in seq:
        produce.send(i)

if __name__ == '__main__':
    produce, consume = Pipe()

    c1 = Process(target=consumer, args=((produce, consume), 'c1'))
    c1.start()

    seq = (i for i in range(10))
    producer(seq, (produce, consume))

    produce.close()
    consume.close()

    c1.join()
    print('main process')

Producer-consumer model with Pipe

from multiprocessing import Process, Pipe, Lock

def consumer(p, name, lock):
    produce, consume = p
    produce.close()
    while True:
        lock.acquire()
        baozi = consume.recv()
        lock.release()
        if baozi is not None:  # compare with None, since the first item sent is 0
            print('%s received bun: %s' % (name, baozi))
        else:
            consume.close()
            break

def producer(p, n):
    produce, consume = p
    consume.close()
    for i in range(n):
        produce.send(i)
    produce.send(None)
    produce.send(None)
    produce.close()

if __name__ == '__main__':
    produce, consume = Pipe()
    lock = Lock()
    c1 = Process(target=consumer, args=((produce, consume), 'c1', lock))
    c2 = Process(target=consumer, args=((produce, consume), 'c2', lock))
    p1 = Process(target=producer, args=((produce, consume), 10))
    c1.start()
    c2.start()
    p1.start()

    produce.close()
    consume.close()

    c1.join()
    c2.join()
    p1.join()
    print('main process')

Data-safety problems caused by competition between several consumers

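4. Sharing data between processes

Looking ahead, concurrent programming based on message passing is the clear trend.

Even with threads, the recommended practice is to design the program as a large collection of independent threads that exchange data through message queues.

This greatly reduces the need for locks and other synchronization mechanisms, and it also extends to distributed systems.

Processes should nevertheless avoid communicating where they can. When they do need to communicate, choose process-safe tools to avoid the problems that locking brings.

Later we will try using a database to solve the problem of sharing data between processes.

Data in different processes is independent. Communication can be done through queues or pipes, both of which are based on message passing.
Although process data is independent, data can be shared through Manager, and in fact Manager can do far more than that.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array.

The Manager module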

from multiprocessing import Manager, Process, Lock

def work(d, lock):
    with lock:  # operating on shared data without a lock can corrupt it
        d['count'] -= 1

if __name__ == '__main__':
    lock = Lock()
    with Manager() as m:
        dic = m.dict({'count': 100})
        p_l = []
        for i in range(100):
            p = Process(target=work, args=(dic, lock))
            p_l.append(p)
            p.start()
        for p in p_l:
            p.join()
        print(dic)

Manager example

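4.1 Process pools and the multiprocessing.Pool module

4.1.1 Process pools

Why have a process pool? The concept of a process pool.

In practice a program may have thousands of tasks to run at busy times and only a handful at quiet times. Do we then create thousands of processes when thousands of tasks arrive? First, creating a process takes time, and so does destroying one. Second, even with thousands of processes started, the operating system cannot run them all at once, so efficiency actually suffers. We therefore cannot start and stop processes without limit according to the number of tasks. What should we do instead?

This is where the process pool comes in. Define a pool holding a fixed number of processes. When a request arrives, a process from the pool handles the task. When the task is done, the process is not shut down; it goes back into the pool to wait for the next task. If there are many tasks and not enough processes in the pool, a task has to wait until a process finishes its earlier task and becomes free. In other words, the number of processes in the pool is fixed, so at most that many processes run at the same time. This does not add to the operating system's scheduling burden, it saves the time of starting and stopping processes, and it still achieves a degree of concurrency.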

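4.1.2 The multiprocessing.Pool module

Overview:

Pool([processes [, initializer [, initargs]]]): creates a process pool

processes: the number of worker processes to create; if omitted, the value of cpu_count() is used
initializer: a callable executed when each worker process starts; defaults to None
initargs: the tuple of arguments passed to initializer

Parameters

p.apply(func [, args [, kwargs]]): executes func(*args, **kwargs) in one pool worker process and returns the result.
Note that this does not run func concurrently in all pool workers. To run func concurrently with different arguments, call p.apply() from different threads or use p.apply_async().

p.apply_async(func [, args [, kwargs [, callback]]]): executes func(*args, **kwargs) in one pool worker process without waiting for it.
The return value is an instance of the AsyncResult class. callback is a callable that takes one argument. As soon as the result of func becomes available, it is passed to callback. callback must not perform any blocking operation, otherwise it holds up the results of other asynchronous operations.

p.close(): closes the pool so that no further tasks can be submitted. Tasks that are still pending are completed before the worker processes exit.

p.join(): waits for all worker processes to exit. It may only be called after close() or terminate().

Main methods

The return value of apply_async() and map_async() is an instance obj of AsyncResult, which has the following methods:
obj.get([timeout]): returns the result, waiting for it to arrive if necessary. timeout is optional; if the result has not arrived within the given time, an exception is raised. If the remote operation raised an exception, it is raised again when this method is called.
obj.ready(): returns True if the call has completed
obj.successful(): returns True if the call completed without raising an exception; raises an exception if called before the result is ready
obj.wait([timeout]): waits for the result to become available.
p.terminate(): immediately terminates all worker processes without performing any cleanup or finishing any pending work. It is called automatically if p is garbage collected

Other methods (for reference)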

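Code examples

Synchronous and asynchronous

import os, time
from multiprocessing import Pool

def work(n):
    print('%s run' % os.getpid())
    time.sleep(3)
    return n ** 2

if __name__ == '__main__':
    p = Pool(3)  # three processes are created in the pool; from then on the same three run every task
    res_l = []
    for i in range(10):
        res = p.apply(work, args=(i,))  # synchronous call: waits here until this task has finished and res is returned,
                                        # whether or not work blocks along the way
        res_l.append(res)
    print(res_l)

Synchronous calls on a process pool

import os
import time
import random
from multiprocessing import Pool

def work(n):
    print('%s run' % os.getpid())
    time.sleep(random.random())
    return n ** 2

if __name__ == '__main__':
    p = Pool(3)  # three processes are created in the pool; from then on the same three run every task
    res_l = []
    for i in range(10):
        res = p.apply_async(work, args=(i,))  # asynchronous: at most 3 child processes run at once, one per pool process
                                              # when a task returns, its process is handed back and takes the next task
                                              # the three pool processes do not all start or finish together:
                                              # each one is released as it finishes and picks up a new task
        res_l.append(res)

    # with apply_async, the main process needs close() and join() to wait until the pool has handled every task;
    # the results can then be collected with get.
    # Otherwise the main process may end before the pool has run anything, and the pool ends with it
    p.close()
    p.join()
    for res in res_l:
        print(res.get())  # get fetches the result of apply_async; apply has no get, since it runs synchronously and returns the result directly

Asynchronous calls on a process pool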

Exercise:

# The number of processes in a Pool defaults to the number of CPU cores; assume 4 (check with os.cpu_count())
# Start 6 clients and 2 of them will be left waiting
# Print the pid inside each process and only 4 pids appear: the clients share 4 processes
from socket import *
from multiprocessing import Pool
import os

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8080))
server.listen(5)

def talk(conn):
    print('process pid: %s' % os.getpid())
    while True:
        try:
            msg = conn.recv(1024)
            if not msg:
                break
            conn.send(msg.upper())
        except Exception:
            break

if __name__ == '__main__':
    p = Pool(4)
    while True:
        conn, *_ = server.accept()
        p.apply_async(talk, args=(conn,))
        # p.apply(talk, args=(conn,))  # with a synchronous call only one client can be served at a time

server: concurrent socket chat with a process pool

from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue

    client.send(msg.encode('utf-8'))
    msg = client.recv(1024)
    print(msg.decode('utf-8'))

client

Observation: with many clients started concurrently, the server only ever shows 4 different pids. Another client gets in only after one of the connected clients ends.

Callback functions

When a callback is needed: as soon as any task in the pool finishes, it tells the main process "I am done, you can handle my result now". The main process then calls a function to handle that result. That function is the callback.

We can put time-consuming (blocking) tasks into the pool and specify a callback, which the main process executes. The main process then skips the I/O when running the callback and gets the task's result directly.

from multiprocessing import Pool
import requests
import os

def get_page(url):
    print('<process %s> get %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        return {'url': url, 'text': response.text}

def parse_page(res):
    print('<process %s> parse %s' % (os.getpid(), res['url']))
    parse_res = 'url:<%s> size:[%s]\n' % (res['url'], len(res['text']))
    with open('db.txt', 'a') as f:
        f.write(parse_res)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]

    p = Pool(3)
    res_l = []
    for url in urls:
        res = p.apply_async(get_page, args=(url,), callback=parse_page)
        res_l.append(res)

    p.close()
    p.join()
    print([res.get() for res in res_l])  # these are the results of get_page; there is no real need to fetch them, since they were already passed to the callback

'''
Output:
<process 3388> get https://www.baidu.com
<process 3389> get https://www.python.org
<process 3390> get https://www.openstack.org
<process 3388> get https://help.github.com/
<process 3387> parse https://www.baidu.com
<process 3389> get http://www.sina.com.cn/
<process 3387> parse https://www.python.org
<process 3387> parse https://help.github.com/
<process 3387> parse http://www.sina.com.cn/
<process 3387> parse https://www.openstack.org
[{'url': 'https://www.baidu.com', 'text': '<!DOCTYPE html>\r\n...',...}]
'''

Requesting several URLs with multiple processes to cut the time wasted waiting on the network

import re
from urllib.request import urlopen
from multiprocessing import Pool

def get_page(url, pattern):
    response = urlopen(url).read().decode('utf-8')
    return pattern, response

def parse_page(info):
    pattern, page_content = info
    res = re.findall(pattern, page_content)
    for item in res:
        dic = {
            'index': item[0].strip(),
            'title': item[1].strip(),
            'actor': item[2].strip(),
            'time': item[3].strip(),
        }
        print(dic)

if __name__ == '__main__':
    regex = r'<dd>.*?<.*?class="board-index.*?>(\d+)</i>.*?title="(.*?)".*?class="movie-item-info".*?<p class="star">(.*?)</p>.*?<p class="releasetime">(.*?)</p>'
    pattern1 = re.compile(regex, re.S)

    url_dic = {
        'http://maoyan.com/board/7': pattern1,
    }

    p = Pool()
    res_l = []
    for url, pattern in url_dic.items():
        res = p.apply_async(get_page, args=(url, pattern), callback=parse_page)
        res_l.append(res)

    for i in res_l:
        i.get()

Crawler example

If the main process waits for every task in the pool to finish and then handles all the results together, no callback is needed

from multiprocessing import Pool
import time

def work(n):
    time.sleep(1)
    return n ** 2

if __name__ == '__main__':
    p = Pool()

    res_l = []
    for i in range(10):
        res = p.apply_async(work, args=(i,))
        res_l.append(res)

    p.close()
    p.join()  # wait for every process in the pool to finish

    nums = []
    for res in res_l:
        nums.append(res.get())  # collect all the results
    print(nums)  # the main process now has every result and can process them together

No callback needed

Another implementation of process pools: https://docs.python.org/dev/library/concurrent.futures.html
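For orientation, here is a minimal sketch of the same squaring task written with `concurrent.futures.ProcessPoolExecutor` from the module linked above. The helper name `squares` and the worker count are illustrative choices.

```python
from concurrent.futures import ProcessPoolExecutor

def work(n):
    return n ** 2

def squares(numbers, max_workers=3):
    """Run work() over numbers in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, numbers))   # results come back in input order

if __name__ == '__main__':
    print(squares(range(10)))
```

Leaving the with block shuts the pool down and waits for the workers, which takes the place of the close() and join() pair used with Pool.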

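Thread operations in Python programs

1. Theory

1.1 The global interpreter lock

The execution of Python code is controlled by the Python virtual machine (also called the interpreter main loop). Python was designed from the start so that only one thread executes in the main loop at a time. Several threads can be "running" in the Python interpreter, but at any given moment only one of them is executing in the interpreter.
Access to the Python virtual machine is controlled by the global interpreter lock (GIL). This lock is what guarantees that only one thread runs at a time.

In a multithreaded environment, the Python virtual machine executes as follows:

a. Set the GIL;

b. Switch to a thread to run;

c. Run a specified number of bytecode instructions, or until the thread yields control voluntarily (for example by calling time.sleep(0));

d. Put the thread to sleep;

e. Unlock the GIL;

f. Repeat all of the steps above.
When external code (such as a C/C++ extension function) is called, the GIL stays locked until the function returns, since no Python bytecode runs in the meantime and so no thread switch takes place. Programmers writing extensions can release the GIL explicitly.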
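A note on step c: since Python 3.2, CPython decides when to switch threads by a time interval, not by counting bytecode instructions. The interval can be read and changed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below shows both calls; `with_switch_interval` is an illustrative helper, and the value 0.001 is arbitrary.

```python
import sys

def with_switch_interval(seconds, func):
    """Run func() with a temporary thread switch interval, then restore the old one."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(previous)

if __name__ == '__main__':
    print('current interval:', sys.getswitchinterval())   # in seconds
    print('inside:', with_switch_interval(0.001, sys.getswitchinterval))
    print('restored:', sys.getswitchinterval())
```

Changing the interval only affects how often a running thread is asked to give up the GIL; it does not let two threads execute Python bytecode at once.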

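1.2 Choosing a Python threading module

Python provides several modules for multithreaded programming, including thread, threading and Queue (named _thread and queue in Python 3). The thread and threading modules let programmers create and manage threads. thread provides basic thread and lock support, while threading provides higher-level, more capable thread management. The Queue module lets users create a queue data structure for sharing data between threads.
Avoid the thread module. The higher-level threading module is more advanced and supports threads more completely, and using attributes from thread may conflict with threading. The low-level thread module also has very few synchronization primitives (in fact only one), whereas threading has many. Finally, with thread, when the main thread ends all other threads are forcibly ended with no warning and no proper cleanup; threading at least makes sure that important child threads have exited before the process exits.

The thread module does not support daemon threads: when the main thread exits, every child thread is forced to exit whether or not it is still working. The threading module does support them. A daemon thread is typically a server waiting for client requests; if no client makes a request, it just waits. Marking a thread as a daemon means the thread is not important: when the process exits, it does not wait for this thread to exit.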

2. The threading module

The multiprocessing module closely imitates the interface of the threading module. The two are very similar in use, so they are not described again in detail.

2.1 Creating threads: the threading.Thread class

2.1.1 Creating a thread

from threading import Thread
import time

def synfi(name):
    time.sleep(2)
    print('%s say hello' % name)

if __name__ == '__main__':
    t = Thread(target=synfi, args=('Mr Xiong',))
    t.start()
    print('main thread')

Creating a thread: method one

from threading import Thread
import time

class Sayhi(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        time.sleep(2)
        print('%s say hello' % self.name)

if __name__ == '__main__':
    t = Sayhi('Mr Xiong')
    t.start()
    print('main thread')

Creating a thread: method two

2.1.2 Multithreading versus multiprocessing

from threading import Thread
from multiprocessing import Process
import os

def work():
    print('hello', os.getpid())

if __name__ == '__main__':
    # part 1: start several threads in the main process; every thread has the same pid as the main process
    t1 = Thread(target=work)
    t2 = Thread(target=work)
    t1.start()
    t2.start()
    print('main thread / main process pid', os.getpid())

    # part 2: start several processes; every process has a different pid
    p1 = Process(target=work)
    p2 = Process(target=work)
    p1.start()
    p2.start()
    print('main thread / main process pid', os.getpid())

Comparing pids

from threading import Thread
from multiprocessing import Process

def work():
    print('hello')

if __name__ == '__main__':
    # start a thread in the main process
    t = Thread(target=work)
    t.start()
    print('main thread / main process')
    '''
    Output:
    hello
    main thread / main process
    '''

    # start a child process in the main process
    p = Process(target=work)
    p.start()
    print('main thread / main process')
    '''
    Output:
    main thread / main process
    hello
    '''

Comparing start-up cost

from threading import Thread
from multiprocessing import Process

def work():
    global n
    n = 0

if __name__ == '__main__':
    # n = 100
    # p = Process(target=work)
    # p.start()
    # p.join()
    # print('main', n)  # the child process p did set its own global n to 0, but only its own copy; n in the parent is still 100

    n = 1
    t = Thread(target=work)
    t.start()
    t.join()
    print('main', n)  # prints 0, because threads in the same process share the process's data

Do threads in the same process share that process's data?

Sharing of in-memory data

2.1.3 Exercise: a multithreaded socket server

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import threading
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('127.0.0.1', 8080))
s.listen(5)

def action(conn):
    while True:
        data = conn.recv(1024)
        if not data:  # an empty result means the client closed the connection
            break
        print(data)
        conn.send(data.upper())
    conn.close()

if __name__ == '__main__':
    while True:
        conn, addr = s.accept()
        p = threading.Thread(target=action, args=(conn,))  # one thread per connection
        p.start()

server

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue
    s.send(msg.encode('utf-8'))
    data = s.recv(1024)
    print(data)

client

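2.1.4 Other methods of the Thread class

Methods of a Thread instance:
  # is_alive(): returns whether the thread is alive (the older isAlive() spelling was removed in Python 3.9).
  # getName(): returns the thread name (deprecated since Python 3.10; read the name attribute instead).
  # setName(): sets the thread name (deprecated since Python 3.10; assign to the name attribute instead).
Functions provided by the threading module:
  # threading.current_thread(): returns the current Thread object (currentThread() is the deprecated alias).
  # threading.enumerate(): returns a list of the running threads. Running means started and not yet finished; threads that have not started or have terminated are excluded.
  # threading.active_count(): returns the number of running threads, the same as len(threading.enumerate()) (activeCount() is the deprecated alias).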
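As a hedged illustration of the calls listed above, the sketch below uses the current spellings (`name`, `is_alive()`, `current_thread()`, `enumerate()`, `active_count()`). The function `inspect_threads` and the thread name `worker-1` are made up for this example.

```python
import threading
import time

def inspect_threads():
    """Start one worker and report what the threading helpers see while it runs."""
    worker = threading.Thread(target=time.sleep, args=(0.2,), name='worker-1')
    worker.start()
    info = {
        'current': threading.current_thread().name,    # name of the calling thread
        'alive_before_join': worker.is_alive(),        # True: the worker is still sleeping
        'names': sorted(t.name for t in threading.enumerate()),
        'count': threading.active_count(),
    }
    worker.join()
    info['alive_after_join'] = worker.is_alive()       # False: the worker has finished
    return info

if __name__ == '__main__':
    for key, value in inspect_threads().items():
        print(key, '=', value)
```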

from threading import Thread
import threading
import time

def work():
    time.sleep(3)
    print(threading.current_thread().getName())

if __name__ == '__main__':
    # start a thread in the main process
    t = Thread(target=work)
    t.start()

    print(threading.current_thread().getName())
    print(threading.current_thread())  # the main thread
    print(threading.enumerate())  # two running threads, counting the main thread
    print(threading.active_count())
    print('main thread / main process')

    '''
    Output (thread ids and default thread names vary by run and Python version):
    MainThread
    <_MainThread(MainThread, started 140735268892672)>
    [<_MainThread(MainThread, started 140735268892672)>, <Thread(Thread-1, started 123145307557888)>]
    2
    main thread / main process
    Thread-1
    '''

Code example

from threading import Thread
import time

def sayhi(name):
    time.sleep(2)
    print('%s say hello' % name)

if __name__ == '__main__':
    t = Thread(target=sayhi, args=('egon',))
    t.start()
    t.join()
    print('main thread')
    print(t.is_alive())
    '''
    egon say hello
    main thread
    False
    '''

The join method

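2.1.5 Daemon threads

For processes and threads alike, the rule is: a daemon xx is destroyed once the main xx has finished running. Note that "finished running" is not the same as "terminated".

#1. For the main process, finished running means the main process's own code has finished.
#2. For the main thread, finished running means every non-daemon thread in the main thread's process has finished; only then does the main thread count as finished.

#1 The main process counts as finished as soon as its code ends (daemon processes are reclaimed at that point). It then keeps waiting until all non-daemon child processes have finished and reclaims their resources (otherwise zombie processes would be left behind) before it actually exits.
#2 The main thread counts as finished only after all other non-daemon threads have finished (daemon threads are reclaimed at that point). The end of the main thread means the end of the process, and all of the process's resources are then reclaimed, so the process must make sure every non-daemon thread has finished before it ends.

Detailed explanation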

from threading import Thread
import time

def sayhi(name):
    time.sleep(2)
    print('%s say hello' % name)

if __name__ == '__main__':
    t = Thread(target=sayhi, args=('egon',))
    t.daemon = True  # must be set before t.start()
    t.start()

    print('main thread')
    print(t.is_alive())
    '''
    main thread
    True
    '''

Daemon thread example 1

from threading import Thread
import time

def foo():
    print(123)
    time.sleep(1)
    print("end123")

def bar():
    print(456)
    time.sleep(3)
    print("end456")

t1 = Thread(target=foo)
t2 = Thread(target=bar)

t1.daemon = True
t1.start()
t2.start()
print("main-------")

Daemon thread example 2

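3. Locks

3.1 Locks and the GIL

3.2 Synchronization locks

from threading import Thread
import time

def work():
    global n
    temp = n
    time.sleep(0.1)
    n = temp - 1

if __name__ == '__main__':
    n = 100
    l = []
    for i in range(100):
        p = Thread(target=work)
        l.append(p)
        p.start()
    for p in l:
        p.join()

    print(n)  # the result may be 99: every thread read n before any of them wrote it back

Several threads competing for a resource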

import threading

R = threading.Lock()

R.acquire()
'''
operations on the shared data
'''
R.release()
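The acquire()/release() pair above can also be written with a `with` statement, which releases the lock even if the guarded code raises. This is a minimal sketch; the names `counter`, `counter_lock`, `add_many` and `run` are chosen for illustration only.

```python
from threading import Thread, Lock

counter = 0
counter_lock = Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:   # acquire() on entry, release() on exit, even on an exception
            counter += 1

def run(workers=4, times=10000):
    """Run several incrementing threads and return the final counter value."""
    global counter
    counter = 0
    threads = [Thread(target=add_many, args=(times,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == '__main__':
    print(run())
```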

from threading import Thread, Lock
import time

def work():
    global n
    lock.acquire()
    temp = n
    time.sleep(0.1)
    n = temp - 1
    lock.release()

if __name__ == '__main__':
    lock = Lock()
    n = 100
    l = []
    for i in range(100):
        p = Thread(target=work)
        l.append(p)
        p.start()
    for p in l:
        p.join()

    print(n)  # the result is always 0: concurrent execution became serial, trading speed for data safety

Using a synchronization lock

# Without a lock: concurrent execution, fast, but the data is not safe
from threading import current_thread, Thread, Lock
import time

def task():
    global n
    print('%s is running' % current_thread().getName())
    temp = n
    time.sleep(0.5)
    n = temp - 1

if __name__ == '__main__':
    n = 100
    lock = Lock()
    threads = []
    start_time = time.time()
    for i in range(100):
        t = Thread(target=task)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

    stop_time = time.time()
    print('main:%s n:%s' % (stop_time - start_time, n))

'''
Thread-1 is running
Thread-2 is running
......
Thread-100 is running
main:0.5216062068939209 n:99
'''


# With a lock: the unlocked part runs concurrently, the locked part runs serially; slower, but the data is safe
from threading import current_thread, Thread, Lock
import time

def task():
    # code outside the lock runs concurrently
    time.sleep(3)
    print('%s start to run' % current_thread().getName())
    global n
    # code inside the lock runs serially
    lock.acquire()
    temp = n
    time.sleep(0.5)
    n = temp - 1
    lock.release()

if __name__ == '__main__':
    n = 100
    lock = Lock()
    threads = []
    start_time = time.time()
    for i in range(100):
        t = Thread(target=task)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    stop_time = time.time()
    print('main:%s n:%s' % (stop_time - start_time, n))

'''
Thread-1 start to run
Thread-2 start to run
......
Thread-100 start to run
main:53.294203758239746 n:0
'''

# Some readers may wonder: if locking makes execution serial anyway, why not call join right after start and skip the lock? That is serial too.
# True: calling join right after start makes the 100 tasks run one after another, and n certainly ends up as 0, so it is safe. But
# with start followed immediately by join, all of the code in each task runs serially, whereas with a lock only the locked part, the part that modifies shared data, is serial.
# Both approaches keep the data safe, but locking is clearly more efficient.
from threading import current_thread, Thread, Lock
import time

def task():
    time.sleep(3)
    print('%s start to run' % current_thread().getName())
    global n
    temp = n
    time.sleep(0.5)
    n = temp - 1

if __name__ == '__main__':
    n = 100
    lock = Lock()
    start_time = time.time()
    for i in range(100):
        t = Thread(target=task)
        t.start()
        t.join()
    stop_time = time.time()
    print('main:%s n:%s' % (stop_time - start_time, n))

'''
Thread-1 start to run
Thread-2 start to run
......
Thread-100 start to run
main:350.6937336921692 n:0  # a dreadful amount of time
'''

Mutex versus join

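3.3 Deadlocks and recursive locks

Threads can deadlock too, and recursive locks apply to them as well. This was left out of the section on processes, so both are covered here.

A deadlock is a situation in which two or more processes or threads wait for each other while competing for resources, so that none of them can make progress without outside intervention. The system is then said to be in a deadlocked state, and the processes that wait on each other forever are called deadlocked processes. The following code deadlocks:

from threading import Lock

mutexA = Lock()
mutexA.acquire()
mutexA.acquire()  # blocks forever: the lock is already held and Lock is not reentrant
print(123)
mutexA.release()
mutexA.release()

Deadlock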

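The solution is a recursive lock. To allow the same thread to request the same resource several times, Python provides the reentrant lock RLock.

An RLock internally maintains a Lock and a counter variable. The counter records the number of acquire calls, so the resource can be acquired multiple times by the same thread. Other threads can obtain the resource only after the owning thread has released every one of its acquires. If the example above uses RLock instead of Lock, no deadlock occurs:

from threading import RLock

mutexA = RLock()
mutexA.acquire()
mutexA.acquire()  # the same thread may acquire an RLock again
print(123)
mutexA.release()
mutexA.release()

Recursive lock RLock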

A classic problem: scientists eating noodles

import time
from threading import Thread, Lock

noodle_lock = Lock()
fork_lock = Lock()

def eat1(name):
    noodle_lock.acquire()
    print('%s got the noodles' % name)
    fork_lock.acquire()
    print('%s got the fork' % name)
    print('%s eats noodles' % name)
    fork_lock.release()
    noodle_lock.release()

def eat2(name):
    fork_lock.acquire()
    print('%s got the fork' % name)
    time.sleep(1)
    noodle_lock.acquire()
    print('%s got the noodles' % name)
    print('%s eats noodles' % name)
    noodle_lock.release()
    fork_lock.release()

for name in ['Nezha', 'egon', 'yuan']:
    t1 = Thread(target=eat1, args=(name,))
    t2 = Thread(target=eat2, args=(name,))
    t1.start()
    t2.start()

The deadlock problem

import time
from threading import Thread, RLock

fork_lock = noodle_lock = RLock()  # both names refer to the same reentrant lock

def eat1(name):
    noodle_lock.acquire()
    print('%s got the noodles' % name)
    fork_lock.acquire()
    print('%s got the fork' % name)
    print('%s eats noodles' % name)
    fork_lock.release()
    noodle_lock.release()

def eat2(name):
    fork_lock.acquire()
    print('%s got the fork' % name)
    time.sleep(1)
    noodle_lock.acquire()
    print('%s got the noodles' % name)
    print('%s eats noodles' % name)
    noodle_lock.release()
    fork_lock.release()

for name in ['Nezha', 'egon', 'yuan']:
    t1 = Thread(target=eat1, args=(name,))
    t2 = Thread(target=eat2, args=(name,))
    t1.start()
    t2.start()

Solving the deadlock with a recursive lock
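Another common way to avoid this kind of deadlock, not covered in the original text, is to keep two separate Lock objects but make every thread acquire them in the same order. The sketch below assumes that approach; `eat`, `dinner` and the `finished` list are hypothetical names used only for illustration.

```python
import time
from threading import Thread, Lock

noodle_lock = Lock()
fork_lock = Lock()
finished = []

def eat(name, pause):
    # Every thread takes noodle_lock first and fork_lock second,
    # so no two threads can each hold one lock while waiting for the other.
    with noodle_lock:
        time.sleep(pause)
        with fork_lock:
            finished.append(name)

def dinner(names=('Nezha', 'egon', 'yuan')):
    """Run two eaters per name; return (all_threads_done, names_in_finish_order)."""
    finished.clear()
    threads = []
    for name in names:
        threads.append(Thread(target=eat, args=(name, 0)))
        threads.append(Thread(target=eat, args=(name, 0.01)))
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)   # a deadlock would leave threads alive after the timeout
    return all(not t.is_alive() for t in threads), list(finished)

if __name__ == '__main__':
    print(dinner())
```

Unlike the single shared RLock, this keeps the two resources independently lockable by code that needs only one of them.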

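4. Semaphores

Same as for processes.

A Semaphore manages an internal counter.
Each call to acquire() decrements the counter by 1;
each call to release() increments the counter by 1.
The counter can never go below 0. When it is 0, acquire() blocks the calling thread until another thread calls release().

Example (only 5 threads can hold the semaphore at the same time, which limits the maximum number of connections to 5):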

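from threading import Thread, Semaphore
import threading
import time

# def func():
#     if sm.acquire():
#         print(threading.currentThread().getName() + ' get semaphore')
#         time.sleep(2)
#         sm.release()

def func():
    sm.acquire()
    print('%s get sm' % threading.current_thread().getName())
    time.sleep(3)
    sm.release()

if __name__ == '__main__':
    sm = Semaphore(5)
    for i in range(23):
        t = Thread(target=func)
        t.start()

Example

This is a completely different concept from a process pool. Pool(4) can create at most 4 processes, and those same four processes are used from start to finish with no new ones created. With a semaphore, a whole batch of threads/processes is created, and the semaphore only limits how many of them run the guarded section at once.

Pools versus semaphores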
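To make the difference concrete, the sketch below creates more threads than the semaphore allows and records the highest number of threads inside the guarded section at any moment. The bookkeeping dictionary and the function name `run_limited` are assumptions made for this illustration.

```python
import time
from threading import Thread, Semaphore, Lock

def run_limited(total=12, limit=3):
    """Start `total` threads but let at most `limit` of them work at once."""
    sm = Semaphore(limit)
    state = {'active': 0, 'peak': 0}
    state_lock = Lock()            # protects the bookkeeping dict

    def func():
        with sm:                   # blocks while `limit` threads are already inside
            with state_lock:
                state['active'] += 1
                state['peak'] = max(state['peak'], state['active'])
            time.sleep(0.05)       # simulated work
            with state_lock:
                state['active'] -= 1

    threads = [Thread(target=func) for _ in range(total)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads), state['peak']

if __name__ == '__main__':
    created, peak = run_limited()
    print('threads created: %d, most running at once: %d' % (created, peak))
```

All 12 threads exist, which a pool of size 3 would never allow, yet no more than 3 are ever inside the `with sm:` block.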

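5. Events

Same as for processes.

A key property of threads is that each one runs independently and its state is unpredictable. If other threads in the program need to decide their next step based on the state of some thread, thread synchronization becomes very tricky. To solve this, we use the Event object from the threading library. An Event holds a signal flag that threads can set, and it lets threads wait for something to happen. Initially the flag is false. If a thread waits on an Event whose flag is false, the thread blocks until the flag becomes true. When a thread sets the flag of an Event to true, it wakes up all threads waiting on that Event. A thread that waits on an Event whose flag is already true does not block and simply continues.

event.is_set(): returns the state of the event (isSet() is the older spelling);
event.wait(): blocks the thread if event.is_set() == False;
event.set(): sets the state of the event to True; all blocked threads become ready and wait for the operating system to schedule them;
event.clear(): resets the state of the event to False.

For example, suppose several worker threads try to connect to MySQL, and we want to make sure the MySQL service is up before the workers connect; if a connection attempt fails, they retry. The threading.Event mechanism can coordinate the connection attempts of the worker threads.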

import threading
import time, random
from threading import Thread, Event

def conn_mysql():
    count = 1
    while not event.is_set():
        if count > 3:
            # with three 0.5 s waits the retries run out after about 1.5 s,
            # which is before check_mysql (2 to 4 s) sets the event
            raise TimeoutError('connection timed out')
        print('<%s> connection attempt %s' % (threading.current_thread().getName(), count))
        event.wait(0.5)
        count += 1
    print('<%s> connected' % threading.current_thread().getName())

def check_mysql():
    print('\033[45m[%s] checking mysql\033[0m' % threading.current_thread().getName())
    time.sleep(random.randint(2, 4))
    event.set()

if __name__ == '__main__':
    event = Event()
    conn1 = Thread(target=conn_mysql)
    conn2 = Thread(target=conn_mysql)
    check = Thread(target=check_mysql)

    conn1.start()
    conn2.start()
    check.start()

Example

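6. Conditions

A condition makes threads wait; only when some condition is met are n threads released.

Python's Condition object supports complex thread synchronization problems. A Condition is called a condition variable. Besides acquire and release methods like those of Lock, it also provides wait and notify. A thread first acquires the condition variable and then checks some condition. If the condition is not met, it calls wait. If the condition is met, it does some work that changes the condition and then calls notify to inform other threads; threads in the wait state re-check the condition after being notified. Repeating this process solves complex synchronization problems.

Detailed description

Code: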

import threading

def run(n):
    con.acquire()
    con.wait()
    print("run the thread: %s" % n)
    con.release()

if __name__ == '__main__':
    con = threading.Condition()
    for i in range(10):
        t = threading.Thread(target=run, args=(i,))
        t.daemon = True  # lets the program exit on 'q' even if some threads are still waiting
        t.start()

    while True:
        inp = input('>>>')
        if inp == 'q':
            break
        con.acquire()
        con.notify(int(inp))  # wake up that many waiting threads
        con.release()
        print('****')

Example
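The "check the condition, wait, re-check after notify" cycle described for Condition can be sketched as a small producer/consumer. `wait_for()` performs the re-check loop for us; the function and variable names here are illustrative assumptions.

```python
import threading

def produce_and_consume(n_items=5):
    """Pass n_items integers from a producer thread to a consumer thread."""
    cond = threading.Condition()
    items = []
    consumed = []

    def consumer():
        for _ in range(n_items):
            with cond:                                 # acquire the condition's lock
                cond.wait_for(lambda: len(items) > 0)  # sleeps until the predicate is true
                consumed.append(items.pop(0))

    def producer():
        for i in range(n_items):
            with cond:
                items.append(i)   # change the state the consumer is waiting on
                cond.notify()     # wake one waiting thread so it re-checks

    c = threading.Thread(target=consumer)
    p = threading.Thread(target=producer)
    c.start()
    p.start()
    p.join()
    c.join()
    return consumed

if __name__ == '__main__':
    print(produce_and_consume())
```

Because the consumer tests the predicate before sleeping, a notify() that arrives while nobody is waiting is harmless.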

7. Timers

A timer runs an operation after n seconds.

from threading import Timer

def hello():
    print("hello, world")

t = Timer(1, hello)
t.start()  # after 1 second, "hello, world" will be printed
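A Timer is a Thread subclass, so it can be joined, and it can be cancelled while it is still waiting. The sketch below assumes a hypothetical helper `timer_demo` that starts two timers and cancels one of them.

```python
from threading import Timer

def timer_demo():
    """Start two timers, cancel the second, and return what actually fired."""
    fired = []
    t1 = Timer(0.1, fired.append, args=('t1',))  # calls fired.append('t1') after 0.1 s
    t2 = Timer(0.1, fired.append, args=('t2',))
    t1.start()
    t2.start()
    t2.cancel()   # only has an effect while the timer is still waiting
    t1.join()     # wait for the first timer to run its function
    t2.join()     # the cancelled timer thread ends without calling its function
    return fired

if __name__ == '__main__':
    print(timer_demo())
```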

8. Thread queues

queue: use import queue; the usage is the same as the process Queue.

queue is especially useful in threaded programming when information must be exchanged safely between multiple threads.

class queue.Queue(maxsize=0)  # first in, first out

import queue

q = queue.Queue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''
Result (first in, first out):
first
second
third
'''

First in, first out

class queue.LifoQueue(maxsize=0)  # last in, first out

import queue

q = queue.LifoQueue()
q.put('first')
q.put('second')
q.put('third')

print(q.get())
print(q.get())
print(q.get())
'''
Result (last in, first out):
third
second
first
'''

Last in, first out

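class queue.PriorityQueue(maxsize=0)  # a queue whose items carry a priority

import queue

q = queue.PriorityQueue()
# put a tuple; its first element is the priority (usually a number, though any mutually comparable values work); the smaller the number, the higher the priority
q.put((20, 'a'))
q.put((10, 'b'))
q.put((30, 'c'))

print(q.get())
print(q.get())
print(q.get())
'''
Result (smaller number means higher priority; higher priority leaves the queue first):
(10, 'b')
(20, 'a')
(30, 'c')
'''

Priority queue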

Constructor for a priority queue. maxsize is an integer that sets the upper bound on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If maxsize is less than or equal to zero, the queue size is infinite. The lowest valued entries are retrieved first (the lowest valued entry is the one returned by sorted(list(entries))[0]). A typical pattern for entries is a tuple in the form: (priority_number, data).

exception queue.Empty
Exception raised when non-blocking get() (or get_nowait()) is called on a Queue object which is empty.

exception queue.Full
Exception raised when non-blocking put() (or put_nowait()) is called on a Queue object which is full.

Queue.qsize()
Queue.empty()  # return True if empty
Queue.full()  # return True if full

Queue.put(item, block=True, timeout=None)
Put item into the queue. If optional args block is true and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Full exception if no free slot was available within that time. Otherwise (block is false), put an item on the queue if a free slot is immediately available, else raise the Full exception (timeout is ignored in that case).

Queue.put_nowait(item)
Equivalent to put(item, False).

Queue.get(block=True, timeout=None)
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).

Queue.get_nowait()
Equivalent to get(False).

Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads.

Queue.task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete. If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue). Raises a ValueError if called more times than there were items placed in the queue.

Queue.join()
Blocks until all items in the queue have been gotten and processed.

More method descriptions
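The interplay of get(), task_done() and join() is easier to see in a small worker pool. This is a sketch under stated assumptions: the `None` sentinel used to stop the workers and the function name `process_all` are conventions chosen for this example, not part of the queue API.

```python
import queue
import threading

def process_all(items, n_workers=3):
    """Double every item using worker threads fed from a Queue."""
    q = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = q.get()            # blocks until an item is available
            if item is None:          # sentinel: no more work for this worker
                q.task_done()
                break
            with results_lock:
                results.append(item * 2)
            q.task_done()             # one task_done() per get()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in items:
        q.put(item)
    q.join()                          # returns once every put() item was marked done
    for _ in workers:
        q.put(None)                   # tell each worker to stop
    for w in workers:
        w.join()
    return sorted(results)

if __name__ == '__main__':
    print(process_all([1, 2, 3, 4]))
```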

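9. Python standard module: concurrent.futures

https://docs.python.org/dev/library/concurrent.futures.html

#1 Introduction
The concurrent.futures module provides a high-level interface for asynchronous calls.
ThreadPoolExecutor: a thread pool that provides asynchronous calls
ProcessPoolExecutor: a process pool that provides asynchronous calls
Both implement the same interface, which is defined by the abstract Executor class.

#2 Basic methods
#submit(fn, *args, **kwargs)
Submits a task asynchronously and returns a Future.
#map(func, *iterables, timeout=None, chunksize=1)
Replaces a for loop of submit calls.
#shutdown(wait=True)
Equivalent to pool.close() + pool.join() on a process pool.
wait=True: wait until all tasks in the pool have finished and their resources have been reclaimed before continuing.
wait=False: return immediately without waiting for the tasks in the pool to finish.
Whatever the value of wait, the program as a whole does not exit until all tasks have finished.
submit and map must be called before shutdown.
#result(timeout=None)
Gets the result of a Future.
#add_done_callback(fn)
Attaches a callback function to a Future.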
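The methods listed above can be combined as in the following sketch. It uses the executor as a context manager, which calls shutdown(wait=True) on exit, and `as_completed()` to collect results as tasks finish. `square` and `run_pool` are hypothetical names for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n ** 2

def run_pool(values, max_workers=3):
    """Square each value in a thread pool and return {value: result}."""
    results = {}
    # Leaving the with block calls executor.shutdown(wait=True).
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(square, v): v for v in values}
        for future in as_completed(futures):   # yields each future as it finishes
            results[futures[future]] = future.result()
    return results

if __name__ == '__main__':
    print(run_pool(range(5)))
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same interface, provided the submitted function and its arguments are picklable.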

# Introduction
The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None)
An Executor subclass that executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is lower or equal to 0, then a ValueError will be raised.

# Usage
from concurrent.futures import ProcessPoolExecutor
import os, time, random

def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=3)

    futures = []
    for i in range(11):
        future = executor.submit(task, i)
        futures.append(future)
    executor.shutdown(True)
    print('+++>')
    for future in futures:
        print(future.result())

ProcessPoolExecutor

# Introduction
ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously.

class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='')
An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously.

Changed in version 3.5: If max_workers is None or not given, it will default to the number of processors on the machine, multiplied by 5, assuming that ThreadPoolExecutor is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for ProcessPoolExecutor. (Since Python 3.8 the default is min(32, os.cpu_count() + 4).)

New in version 3.6: The thread_name_prefix argument was added to allow users to control the threading.Thread names for worker threads created by the pool for easier debugging.

# Usage
Same as ProcessPoolExecutor.

ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor
import os, time, random

def task(n):
    print('%s is running' % os.getpid())
    time.sleep(random.randint(1, 3))
    return n ** 2

if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=3)

    # for i in range(11):
    #     future = executor.submit(task, i)

    executor.map(task, range(1, 12))  # map replaces for + submit

Using map
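The example above discards what map returns. As a small sketch, the iterator returned by `executor.map()` can be consumed to get the results, which arrive in the order of the inputs regardless of which task finishes first. The helper name `squares` is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n ** 2

def squares(start, stop):
    """Return task(n) for n in range(start, stop), computed in a thread pool."""
    with ThreadPoolExecutor(max_workers=3) as executor:
        # map() returns a lazy iterator; list() waits for and collects every result
        return list(executor.map(task, range(start, stop)))

if __name__ == '__main__':
    print(squares(1, 12))
```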

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
from multiprocessing import Pool
import requests
import os

def get_page(url):
    print('<process %s> get %s' % (os.getpid(), url))
    response = requests.get(url)
    if response.status_code == 200:
        return {'url': url, 'text': response.text}

def parse_page(res):
    res = res.result()
    if res is None:  # get_page returns None when the status code is not 200
        return
    print('<process %s> parse %s' % (os.getpid(), res['url']))
    parse_res = 'url:<%s> size:[%s]\n' % (res['url'], len(res['text']))
    with open('db.txt', 'a') as f:
        f.write(parse_res)

if __name__ == '__main__':
    urls = [
        'https://www.baidu.com',
        'https://www.python.org',
        'https://www.openstack.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]

    # p = Pool(3)
    # for url in urls:
    #     p.apply_async(get_page, args=(url,), callback=parse_page)
    # p.close()
    # p.join()

    p = ProcessPoolExecutor(3)
    for url in urls:
        p.submit(get_page, url).add_done_callback(parse_page)  # parse_page receives a future object obj; call obj.result() to get the result

Callback functions

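Coroutine operations in Python programs

1. Introduction to coroutines

A coroutine is concurrency within a single thread, also called a micro-thread or fiber. In one sentence: a coroutine is a lightweight user-space thread, meaning that its scheduling is controlled by the user program itself.

Points to emphasize:

#1. Python threads are kernel-level, that is, scheduled by the operating system (for example, when a thread hits I/O or runs for too long it is forced to give up the CPU so that another thread can run).
#2. When coroutines run inside a single thread, switching on I/O is controlled at the application level (not by the operating system), which improves efficiency (!!! switching on non-I/O operations has nothing to do with efficiency).

Compared with the operating system switching threads, here the user controls the switching of coroutines inside a single thread.

Advantages:

#1. Switching between coroutines is cheaper. It is a program-level switch that the operating system is completely unaware of, so it is more lightweight.
#2. Concurrency can be achieved within a single thread, making the most of the CPU.

Disadvantages:

#1. Coroutines are by nature single-threaded and cannot use multiple cores. A program can, however, start several processes, each with several threads, each running coroutines.
#2. Coroutines live in a single thread, so once one coroutine blocks, the whole thread is blocked.

Summary of coroutine characteristics:

Concurrency must be implemented within a single thread.
Modifying shared data requires no lock.
The user program itself keeps the context stacks of the multiple control flows.
In addition: a coroutine that hits an I/O operation switches automatically to another coroutine (yield and greenlet cannot detect I/O, so the gevent module, built on the select mechanism, is used for this).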
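The idea of the user program switching control flows inside one thread can be sketched with plain generators. This is only an illustration of manual switching with `yield`; it does not detect I/O, and `run_round_robin` is a hypothetical scheduler written for this example.

```python
def eat(log):
    log.append('eat 1')
    yield                      # hand control back to the scheduler
    log.append('eat 2')

def play(log):
    log.append('play 1')
    yield
    log.append('play 2')

def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    pending = list(tasks)
    while pending:
        task = pending.pop(0)
        try:
            next(task)             # run the task up to its next yield
            pending.append(task)   # not finished yet: queue it again
        except StopIteration:
            pass                   # the task has finished

if __name__ == '__main__':
    log = []
    run_round_robin([eat(log), play(log)])
    print(log)
```

Each generator keeps its own local state between switches, which is the "context stack kept by the user program" mentioned in the summary.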

2. The greenlet module

Install: pip3 install greenlet

from greenlet import greenlet

def eat(name):
    print('%s eat 1' % name)
    g2.switch('egon')
    print('%s eat 2' % name)
    g2.switch()

def play(name):
    print('%s play 1' % name)
    g1.switch()
    print('%s play 2' % name)

g1 = greenlet(eat)
g2 = greenlet(play)

g1.switch('egon')  # arguments can be passed on the first switch; later switches do not need them

Switching state with greenlet

Switching alone (when there is no I/O and no repeated memory allocation) actually slows the program down.

# Sequential execution
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i

def f2():
    res = 1
    for i in range(100000000):
        res *= i

start = time.time()
f1()
f2()
stop = time.time()
print('run time is %s' % (stop - start))  # 10.985628366470337

# With switching
from greenlet import greenlet
import time

def f1():
    res = 1
    for i in range(100000000):
        res += i
        g2.switch()

def f2():
    res = 1
    for i in range(100000000):
        res *= i
        g1.switch()

start = time.time()
g1 = greenlet(f1)
g2 = greenlet(f2)
g1.switch()
stop = time.time()
print('run time is %s' % (stop - start))  # 52.763017892837524

Efficiency comparison

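greenlet only offers a more convenient way of switching than generators. If the task it switches to hits I/O, it still blocks on the spot, so the problem of switching automatically on I/O to improve efficiency remains unsolved.

The tasks running in a single thread usually mix computation with blocking operations. When task 1 blocks, the blocked time can be used to run task 2, and so on. That is what actually improves efficiency, and it is what the gevent module provides.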

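3. The gevent module

3.1 Install: pip3 install gevent

gevent is a third-party library that makes concurrent synchronous or asynchronous programming easy. The main pattern used in gevent is the Greenlet, a lightweight coroutine that plugs into Python as a C extension module. All greenlets run inside the operating-system process of the main program, but they are scheduled cooperatively.

g1 = gevent.spawn(func, 1, 2, 3, x=4, y=5)  # creates a coroutine object g1; the first argument to spawn is the function, e.g. eat, and any following positional or keyword arguments are passed to that function
g2 = gevent.spawn(func2)

g1.join()  # wait for g1 to finish
g2.join()  # wait for g2 to finish

# or combine the two steps above: gevent.joinall([g1, g2])

g1.value  # the return value of func

Usage overview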

import gevent

def eat(name):
    print('%s eat 1' % name)
    gevent.sleep(2)
    print('%s eat 2' % name)

def play(name):
    print('%s play 1' % name)
    gevent.sleep(1)
    print('%s play 2' % name)

g1 = gevent.spawn(eat, 'egon')
g2 = gevent.spawn(play, name='egon')
g1.join()
g2.join()
# or gevent.joinall([g1, g2])
print('main')

Example: switching on I/O

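In the example above, gevent.sleep(2) simulates an I/O block that gevent can recognize. gevent cannot directly recognize time.sleep(2) or other blocking calls; the following line of code applies a patch so that it can.

from gevent import monkey;monkey.patch_all() must come before whatever is being patched, for example before the time and socket modules are imported.

Or simply remember: to use gevent, put from gevent import monkey;monkey.patch_all() at the top of the file.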

from gevent import monkey;monkey.patch_all()

import gevent
import time

def eat():
    print('eat food 1')
    time.sleep(2)
    print('eat food 2')

def play():
    print('play 1')
    time.sleep(1)
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')

We can call threading.current_thread().getName() inside g1 and g2; the result is DummyThread-n, that is, a dummy thread.

from gevent import monkey;monkey.patch_all()
import threading
import gevent
import time

def eat():
    print(threading.current_thread().getName())
    print('eat food 1')
    time.sleep(2)
    print('eat food 2')

def play():
    print(threading.current_thread().getName())
    print('play 1')
    time.sleep(1)
    print('play 2')

g1 = gevent.spawn(eat)
g2 = gevent.spawn(play)
gevent.joinall([g1, g2])
print('main')

Inspecting threading.current_thread().getName()

3.2 Synchronous and asynchronous execution with gevent

from gevent import spawn, joinall, monkey;monkey.patch_all()

import time

def task(pid):
    """
    Some non-deterministic task
    """
    time.sleep(0.5)
    print('Task %s done' % pid)

def synchronous():  # synchronous
    for i in range(10):
        task(i)

def asynchronous():  # asynchronous
    g_l = [spawn(task, i) for i in range(10)]
    joinall(g_l)
    print('DONE')

if __name__ == '__main__':
    print('Synchronous:')
    synchronous()
    print('Asynchronous:')
    asynchronous()

# The important part of this program is gevent.spawn, which wraps the task function in a Greenlet.
# The list of created greenlets is stored in g_l and passed to gevent.joinall,
# which blocks the current flow and runs all the given greenlets. Execution continues only after every greenlet has finished.

gevent application example 1

from gevent import monkey;monkey.patch_all()
import gevent
import requests
import time

def get_page(url):
    print('GET: %s' % url)
    response = requests.get(url)
    if response.status_code == 200:
        print('%d bytes received from %s' % (len(response.text), url))

start_time = time.time()
gevent.joinall([
    gevent.spawn(get_page, 'https://www.python.org/'),
    gevent.spawn(get_page, 'https://www.yahoo.com/'),
    gevent.spawn(get_page, 'https://github.com/'),
])
stop_time = time.time()
print('run time is %s' % (stop_time - start_time))

Coroutine application: a web crawler

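gevent application example 2

Concurrent socket handling in a single thread with gevent

Note: from gevent import monkey;monkey.patch_all() must come before the socket module is imported, otherwise gevent cannot recognize socket blocking.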

from gevent import monkey;monkey.patch_all()
from socket import *
import gevent

# If you prefer not to patch with monkey.patch_all(), you can use gevent's own socket module
# from gevent import socket
# s = socket.socket()

def server(server_ip, port):
    s = socket(AF_INET, SOCK_STREAM)
    s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    s.bind((server_ip, port))
    s.listen(5)
    while True:
        conn, addr = s.accept()
        gevent.spawn(talk, conn, addr)

def talk(conn, addr):
    try:
        while True:
            res = conn.recv(1024)
            if not res:  # an empty result means the client closed the connection
                break
            print('client %s:%s msg: %s' % (addr[0], addr[1], res))
            conn.send(res.upper())
    except Exception as e:
        print(e)
    finally:
        conn.close()

if __name__ == '__main__':
    server('127.0.0.1', 8080)

server

from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue

    client.send(msg.encode('utf-8'))
    msg = client.recv(1024)
    print(msg.decode('utf-8'))

client

from threading import Thread
from socket import *
import threading

def client(server_ip, port):
    # The socket object must be created inside the function, in its local namespace. Created outside, it would be shared by all threads, everyone would use the same socket, and the client port would always be the same.
    c = socket(AF_INET, SOCK_STREAM)
    c.connect((server_ip, port))

    count = 0
    while True:
        c.send(('%s say hello %s' % (threading.current_thread().getName(), count)).encode('utf-8'))
        msg = c.recv(1024)
        print(msg.decode('utf-8'))
        count += 1

if __name__ == '__main__':
    for i in range(500):
        t = Thread(target=client, args=('127.0.0.1', 8080))
        t.start()