day-3 Summary of Python multithreaded programming

Date: 2020-05-15

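Python is easy to pick up, well suited to application development, concise to write, and backed by a large collection of third-party libraries, all of which attract a wide community of programmers. It also has a well-known performance bottleneck: because the Python interpreter uses a GIL, threads do not run in parallel even on a multi-CPU machine, and a multithreaded program can even be slower than a serial one. This puts a ceiling on the language in some areas. For systems with demanding parallelism requirements, Python may not be the first choice, or may not be considered at all. The picture is not entirely bleak, though: many people are working on this limitation, newer versions have improved on older ones, and some core components can be built with other modules instead.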

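1. Python multithreaded programming

threading is the library most commonly used for multithreaded programming in Python. There are two ways to create threads with it: (1) call the library interface, passing in a target function and its arguments; (2) define a custom thread class that inherits from threading.Thread and overrides the __init__ and run methods.

(1) Calling the library interface with a target function and arguments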

import queue
import threading
import time

'''
Goal: create a FIFO queue holding 10 items and let 3 threads consume it concurrently.
'''

# Initialise the FIFO queue
q = queue.Queue()
for i in range(10):
    q.put(i)
print("%s : Init queue,size:%d" % (time.ctime(), q.qsize()))


# Thread function: take items from the queue until it is empty
def run(q, threadid):
    while True:
        try:
            # get_nowait() avoids the race between empty() and get():
            # another thread could take the last item after empty() returned
            # False, and a blocking get() would then wait forever.
            data = q.get_nowait()
        except queue.Empty:
            break
        print("Thread %d get:%d" % (threadid, data))
        time.sleep(1)


# List of thread handles
thread_handler_lists = []

# Create and start the threads
for i in range(3):
    thread = threading.Thread(target=run, args=(q, i))
    thread.start()
    thread_handler_lists.append(thread)

# Wait for all threads to finish
for thread_handler in thread_handler_lists:
    thread_handler.join()

print("%s : End of progress" % (time.ctime()))

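(2) Defining a custom thread class that inherits from threading.Thread and overrides __init__ and run

As in other languages, the threading library provides a lock to keep data consistent across threads. Three interfaces are involved:

thread_lock = threading.Lock()    create a lock object

thread_lock.acquire()    acquire the lock

thread_lock.release()    release the lock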
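
As a small illustration of these three calls, here is a minimal sketch in which four threads increment one shared counter. The names counter, counter_lock and add_many are made up for this example and do not come from the original post. The with statement is used as shorthand for calling acquire() on entry and release() on exit.

```python
import threading

counter = 0                      # shared state modified by all threads
counter_lock = threading.Lock()  # hypothetical name; protects `counter`


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # acquire() on entry, release() on exit
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("counter =", counter)
```

Because every increment happens while the lock is held, no update can be lost, whichever way the threads interleave.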

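Note: Python's queue module is already thread-safe, so the queue example below does not really need a lock in real code. The locking is included only as a programming demonstration.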

import queue
import threading
import time

'''
Goal: create a FIFO queue holding 10 items and let 3 threads consume it concurrently.
queue.Queue is thread-safe, so thread_lock.acquire() / thread_lock.release()
are not actually required here; they are shown for demonstration only.
'''


# Custom thread class: inherit from threading.Thread and override __init__ and run
class MyThread(threading.Thread):
    def __init__(self, threadid, name, q):
        threading.Thread.__init__(self)
        self.threadid = threadid
        self.name = name
        self.q = q
        print("%s : Init %s success." % (time.ctime(), self.name))

    def run(self):
        while True:
            # Hold the lock only while checking and reading the queue
            thread_lock.acquire()
            try:
                if self.q.empty():
                    break
                data = self.q.get()
            finally:
                thread_lock.release()
            print("Thread %d get:%d" % (self.threadid, data))
            # Sleep outside the lock so the other threads are not blocked
            time.sleep(1)


# Create the lock
thread_lock = threading.Lock()

# Create the FIFO queue
q = queue.Queue()

# Thread names and thread handles
thread_name_list = ["Thread-1", "Thread-2", "Thread-3"]
thread_handler_lists = []

# Fill the queue
thread_lock.acquire()
for i in range(10):
    q.put(i)
thread_lock.release()
print("%s : Init queue,size:%d" % (time.ctime(), q.qsize()))

# Create and start the threads
thread_id = 1
for thread_name in thread_name_list:
    thread = MyThread(thread_id, thread_name, q)
    thread.start()
    thread_handler_lists.append(thread)
    thread_id += 1

# Wait for all threads to finish
for thread_handler in thread_handler_lists:
    thread_handler.join()

print("%s : End of progress" % (time.ctime()))

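Multithreading also involves events and semaphores. They are simple, so no code is pasted here.

threading.Event provides communication between threads: it lets one thread wait for a notification from other threads. We pass the Event into the thread object.
Interfaces involved: Event(), set(), is_set() (isSet() is its older alias), clear(), and wait() on the waiting side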
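
A minimal sketch of the Event interfaces listed above, assuming one worker thread that waits for a notification from the main thread. The names ready, results and worker are invented for the example, and the 5-second timeout is an arbitrary safety margin.

```python
import threading

ready = threading.Event()   # the internal flag starts out False
results = []


def worker():
    # Block until another thread calls ready.set(); give up after 5 seconds.
    if ready.wait(timeout=5):
        results.append("worker saw the event")


t = threading.Thread(target=worker)
t.start()

results.append("main is about to notify")
ready.set()                 # wake up every thread waiting on the event
t.join()

print(results)
print("is_set:", ready.is_set())
ready.clear()               # reset the flag to False so the event can be reused
```

The worker can only append after wait() returns, which happens after the main thread has called set(), so the order of the two entries in results is fixed.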

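If a program of this kind (one that starts many threads) is run while the host is already busy with I/O-intensive tasks, the machine may well become unresponsive or crash.
In that case we can add a counter to the program to limit how many threads run at any one moment.

Interfaces involved: threading.Semaphore(5), acquire(), release()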
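
The counter idea can be sketched as follows, assuming 12 short tasks of which at most 5 may run at once. The names slots, limited_task, running and peak are illustrative, and the sleep stands in for real I/O work. The running and peak bookkeeping is there only to show that the limit holds.

```python
import threading
import time

MAX_CONCURRENT = 5
slots = threading.Semaphore(MAX_CONCURRENT)  # counter of free slots
state_lock = threading.Lock()                # protects the two counters below
running = 0                                  # tasks currently inside the limited section
peak = 0                                     # highest value `running` ever reached


def limited_task(task_id):
    global running, peak
    with slots:                      # acquire(); blocks while 5 tasks are running
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)             # stand-in for real I/O work
        with state_lock:
            running -= 1
    # leaving the `with slots` block calls release() and frees a slot


threads = [threading.Thread(target=limited_task, args=(i,)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrency:", peak)
```

All 12 threads are started immediately, but the semaphore lets only five of them into the limited section at a time; the rest wait in acquire().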

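2. Analysis of Python's multithreading mechanism

Before the discussion, let us go over a few concepts.

Parallelism and concurrency

The key to concurrency is the ability to deal with multiple tasks, not necessarily at the same instant. The key to parallelism is the ability to execute multiple tasks at the same instant. In my view the decisive difference is whether things happen simultaneously; put another way, parallelism is a subset of concurrency.

GIL

GIL: the Global Interpreter Lock, a lock at the level of the Python interpreter. It exists to keep the interpreter itself working correctly. For example, Python's automatic garbage collection does its cleanup work while our program is running.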

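The figure in the original post models how three threads in a process A execute:

    1. Threads t1, t2 and t3 are in the ready state and all try to acquire the GIL from the Python interpreter.

    2. Suppose t1 acquires the GIL. It is scheduled onto any available CPU and enters the running state.

    3. Based on some scheduling rule (for example, a count of executed pcode instructions), Python makes t1 release the GIL, and t1 returns to the ready state.

    4. Step 1 repeats. Suppose t2 acquires the GIL this time. It goes through the same cycle: it is scheduled onto any available CPU and runs, then Python makes it release the GIL according to the same scheduling rule, and t2 returns to the ready state.

    5. It follows that t1, t2 and t3 end up running serially, in the 1, 2, 3, 4 sequence shown in the figure.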
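
The scheduling rule in step 3 can be inspected from code. In CPython 3.2 and later the trigger is a time interval rather than an instruction count: a running thread is asked to release the GIL once the switch interval has elapsed. A minimal sketch, assuming a CPython 3 interpreter:

```python
import sys

# How long a thread may hold the GIL before it is asked to release it
# (in seconds; adjustable with sys.setswitchinterval()).
interval = sys.getswitchinterval()
print("GIL switch interval: %.3f s" % interval)
```

A shorter interval makes threads take turns more often at the cost of more switching overhead. Changing it does not make CPU-bound threads run in parallel.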

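So although t1, t2 and t3 are three threads that could in theory run in parallel, with the GIL in the Python interpreter they do not run in parallel even on a multi-CPU machine, and performance can be even worse than serial execution. Let us run a test.

We write two computation functions and measure the time taken single-threaded and multithreaded. The code is as follows: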

import threading
import time


# Two CPU-heavy functions
def add():
    total = 0
    for i in range(100000000):
        total += i


def mul():
    # total starts at 0, so the product stays 0; the loop only burns CPU time
    total = 0
    for i in range(10000000):
        total *= i


# Single-threaded timing
starttime = time.time()
add()
mul()
endtime = time.time()
period = endtime - starttime
print("The single thread cost:%.2f" % (period))

# Multithreaded timing: the same two functions, one per thread
starttime = time.time()
l = []
t1 = threading.Thread(target=add)
t2 = threading.Thread(target=mul)
l.append(t1)
l.append(t2)
for i in l:
    i.start()
for i in l:
    i.join()
endtime = time.time()
period = endtime - starttime
print("The multiple thread cost:%.2f" % (period))
print("End of program.")

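In the author's test, the multithreaded run actually took longer than the single-threaded one.

This result is hard to accept. Is there a way to improve it? Yes: for example, replace the threads with processes. Processes carry their own overhead, though, so in practice you cannot start too many of them. The multiprocess test code follows: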

'''
Goal: define one CPU-heavy function and measure how efficiently Python runs it in multiple processes.
'''

import multiprocessing
import time


def mul():
    # total starts at 0, so the product stays 0; the loop only burns CPU time
    total = 0
    for i in range(1000000000):
        total *= i


if __name__ == "__main__":
    start_time = time.time()
    # Run the function twice in a single process
    mul()
    mul()
    end_time = time.time()
    print("single process cost : %.2f" % (end_time - start_time))

    start_time = time.time()
    # Create two processes
    l = []
    p1 = multiprocessing.Process(target=mul)
    p1.start()
    l.append(p1)
    p2 = multiprocessing.Process(target=mul)
    p2.start()
    l.append(p2)
    # Wait for the processes to finish
    for p_list in l:
        p_list.join()
    end_time = time.time()
    print("Multiple process cost : %.2f" % (end_time - start_time))
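
Since the number of processes has to stay small, one possible way to cap it is a process pool. The sketch below swaps the hand-managed Process objects for concurrent.futures.ProcessPoolExecutor; this is an alternative technique, not the method used in the original post. The burn function and the job sizes are hypothetical and kept small so the example finishes quickly.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def burn(n):
    """CPU-bound stand-in task (hypothetical): sum the integers below n."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    # Never start more worker processes than there are jobs or CPUs
    workers = min(len(jobs), os.cpu_count() or 1)
    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(burn, jobs))
    elapsed = time.time() - start
    print("%d jobs on %d processes took %.2f s" % (len(jobs), workers, elapsed))
```

The pool reuses its worker processes across jobs, so the process start-up cost is paid at most max_workers times, however many jobs are submitted.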