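A process can be in one of three states: running, ready, or blocked.
Synchronous: I submit one task, wait from the moment it starts running until it finishes (it may involve IO) and hands back its return value, and only then submit the next task.
Asynchronous: I submit several tasks at once, then go straight on to the next line of code; the task results are collected later.
How are the return values collected?
Example: handing out tasks to three teachers: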
from concurrent.futures import ProcessPoolExecutor
import time
import random
import os


def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i


if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    for i in range(6):
        obj = pool.submit(task, i)
        # obj is a Future whose state changes over time: it may be running,
        # pending (waiting to run), or finished ("finished returned int").
        # obj.result() blocks until this task has completed and then returns
        # the task's return value, so the next task is only submitted
        # after this one is done. That makes the loop synchronous.
        print(obj.result())
    pool.shutdown(wait=True)
    print("===main")
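To make the comment about the states of `obj` concrete, here is a minimal sketch that inspects a Future at three points in its life. It uses `ThreadPoolExecutor` instead of a process pool purely to keep the example small, and the helper name `observe_states` and the two `threading.Event` objects are my own assumptions, not part of the original code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def observe_states():
    """Return (label, running, done) snapshots of Futures, plus one result."""
    started = threading.Event()
    release = threading.Event()

    def task():
        started.set()            # the worker has picked this task up
        release.wait(timeout=5)  # stay in the running state until released
        return 42

    states = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        first = pool.submit(task)
        queued = pool.submit(task)  # only one worker, so this one has to wait
        started.wait(timeout=5)
        states.append(("first", first.running(), first.done()))
        states.append(("queued", queued.running(), queued.done()))
        release.set()
        value = first.result()      # blocks until the first task has finished
        states.append(("first after result", first.running(), first.done()))
    return states, value


if __name__ == '__main__':
    for snapshot in observe_states()[0]:
        print(snapshot)
```

The first task is running while the second is still pending, and once `result()` has returned the first one reports `done()`.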
Asynchronous invocation: how do we receive the return values? (Not yet solved.)
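from concurrent.futures import ProcessPoolExecutor
import time
import random
import os


def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i


if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    for i in range(6):
        pool.submit(task, i)  # returns at once; the Future is thrown away here
    pool.shutdown(wait=True)
    # shutdown(wait=True): the main process waits until every task in the
    # pool has finished before running the code below, a bit like join.
    # shutdown: once it has been called, no new tasks may be submitted
    # to this pool.
    # A task is implemented as a function; when the task completes, its
    # return value is simply the function's return value.
    print("===main")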
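The second `shutdown` comment can be checked directly. The sketch below submits one task, shuts the pool down, and then tries to submit another; the helper name `submit_after_shutdown` is made up for illustration, and a thread pool is used only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_after_shutdown():
    """Submit to a pool before and after shutdown and report what happens."""
    pool = ThreadPoolExecutor(max_workers=2)
    future = pool.submit(pow, 2, 10)
    pool.shutdown(wait=True)      # waits for the task above to finish
    try:
        pool.submit(pow, 2, 11)   # the pool no longer accepts new work
    except RuntimeError as exc:
        return future.result(), type(exc).__name__
    return future.result(), None


if __name__ == '__main__':
    print(submit_after_shutdown())
```

The task submitted before `shutdown` still delivers its result; only the late submission is rejected.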
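Method 1: asynchronous invocation, collecting all results together
Drawback: I cannot receive the return value of any finished task right away; I have to wait until every task has finished and then collect them all at once.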
from concurrent.futures import ProcessPoolExecutor
import time
import random
import os


def task(i):
    print(f"{os.getpid()} started")
    time.sleep(random.randint(1, 3))
    print(f"{os.getpid()} finished")
    return i


if __name__ == '__main__':
    pool = ProcessPoolExecutor(4)
    lst = []
    for i in range(6):
        obj = pool.submit(task, i)  # submit everything first, keep the Futures
        lst.append(obj)
    pool.shutdown()  # wait=True by default: block until all tasks are done
    for obj in lst:
        print(obj.result())  # every task has finished, so this returns at once
    print("===main")
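As an aside to the drawback above: the standard library also offers `concurrent.futures.as_completed`, which yields each Future as soon as it finishes. This post goes on to use callbacks instead, so treat the following as a side sketch only; `slow_square`, `collect_as_completed`, and the delays are hypothetical stand-ins for real tasks with IO.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n, delay):
    """Stand-in for a task with IO: sleep for `delay` seconds, return n * n."""
    time.sleep(delay)
    return n * n


def collect_as_completed():
    """Return results in the order the tasks finish, not the order submitted."""
    jobs = [(1, 0.6), (2, 0.3), (3, 0.0)]  # (n, delay): the last job is fastest
    finished = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(slow_square, n, delay) for n, delay in jobs]
        for future in as_completed(futures):
            # The future is already done here, so result() does not block.
            finished.append(future.result())
    return finished


if __name__ == '__main__':
    print(collect_as_completed())
```

With these delays the fastest task's result should arrive first, even though it was submitted last.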
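Method 2: asynchronous invocation + callback function
Use code to simulate a browser and go through the browser's workflow to obtain the page source.
Clean the source to extract the data I want.
Status codes of a page request
The common ones: 200 request succeeded, 303 redirect, 400 bad request, 401 unauthorized, 403 forbidden, 404 not found, 500 server error.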
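The codes listed above can be looked up without any network access through the standard library's `http.HTTPStatus`. A small sketch follows; the helper names `describe` and `status_class` are assumptions of mine, introduced only for illustration.

```python
from http import HTTPStatus


def describe(code):
    """Return 'code phrase' for a status code from the standard library table."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def status_class(code):
    """Classify a status code by its leading digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")


if __name__ == '__main__':
    for code in (200, 303, 400, 401, 403, 404, 500):
        print(describe(code), "->", status_class(code))
```

This is why the examples here only check `status_code == 200`: anything else means the page source was not delivered as requested.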
Code
import requests

ret = requests.get("http://www.baidu.com")
if ret.status_code == 200:  # only use the body if the request succeeded
    print(ret.text)
Main code
from concurrent.futures import ThreadPoolExecutor
import requests


def task(url):
    """Simulate crawling the source of several pages; this always involves IO.

    :param url: address of the page to fetch
    :return: the page source if the status is 200, otherwise None
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text


def parse(content):
    """Simulate analysing the data; this usually involves no IO.

    :param content: page source, or None if the request did not return 200
    :return: length of the source (0 if there is none)
    """
    return len(content) if content is not None else 0


if __name__ == '__main__':
    # Serial: takes too long, not a good option
    # ret1 = task("http://www.baidu.com")
    # print(parse(ret1))
    # ret2 = task("http://www.JD.com")
    # print(parse(ret2))
    # ret3 = task("http://www.taobao.com")
    # print(parse(ret3))
    # ret4 = task("https://www.cnblogs.com/jin-xin/articles/7459977.html")
    # print(parse(ret4))

    # Start a thread pool and run the tasks concurrently
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.luffycity.com/'
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for i in obj_list:
        print(parse(i.result()))  # parsing happens serially in the main thread
    print("===main")
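Summary:
Drawbacks:
10 tasks are issued asynchronously and run concurrently, but the return values of all tasks are collected together (inefficient; results cannot be obtained in real time).
The result-analysis step runs serially, which hurts efficiency:
for res in obj_list:
    print(parse(res.result()))
Improving on drawback 2 of version 1: turn the serial step into a concurrent or parallel one.
Solution
Main code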
from concurrent.futures import ThreadPoolExecutor
import requests


def task(url):
    """Simulate crawling the source of several pages; this always involves IO.

    :param url: address of the page to fetch
    :return: the parsed result if the status is 200, otherwise None
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return parse(ret.text)  # parsing now runs inside the worker thread


def parse(content):
    """Simulate analysing the data; this usually involves no IO.

    :param content: page source
    :return: length of the source
    """
    return len(content)


if __name__ == '__main__':
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.luffycity.com/'
    ]
    pool = ThreadPoolExecutor(4)
    obj_list = []
    for url in url_list:
        obj = pool.submit(task, url)
        obj_list.append(obj)
    pool.shutdown(wait=True)
    for i in obj_list:
        print(i.result())  # already parsed; None if the request failed
    print("===main")
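Summary:
Version 1 vs. version 2
Version 1:
Version 2:
Drawbacks:
The results are still collected for all tasks together; what I want is to collect each result in real time, as soon as it is ready.
Each concurrently executed task should only deal with the blocking IO; it should not take on extra functionality (in version 2, parsing has been bundled into task).
Asynchronous invocation + callback function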
from concurrent.futures import ThreadPoolExecutor
import requests


def task(url):
    """Simulate crawling the source of several pages; this always involves IO.

    :param url: address of the page to fetch
    :return: the page source if the status is 200, otherwise None
    """
    ret = requests.get(url)
    if ret.status_code == 200:
        return ret.text


def parse(obj):
    """Simulate analysing the data; this usually involves no IO.

    :param obj: the finished Future passed in by add_done_callback
    :return: None (the length of the page source is printed)
    """
    content = obj.result()
    if content is not None:  # task returns None when the status is not 200
        print(len(content))


if __name__ == '__main__':
    url_list = [
        "http://www.baidu.com",
        "http://www.JD.com",
        "http://www.taobao.com",
        "https://www.cnblogs.com/jin-xin/articles/7459977.html",
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/7459977.html',
        'https://www.cnblogs.com/jin-xin/articles/9811379.html',
        'https://www.cnblogs.com/jin-xin/articles/11245654.html',
        'https://www.luffycity.com/'
    ]
    pool = ThreadPoolExecutor(4)
    for url in url_list:
        obj = pool.submit(task, url)
        # parse is called with the Future as soon as this task finishes
        obj.add_done_callback(parse)
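The same pattern can be tried without touching the network. In the sketch below, `fake_fetch` is a hypothetical stand-in for `requests.get` that just sleeps and returns a string, and `run_with_callbacks` is a helper name I made up; only `submit` and `add_done_callback` are the real `concurrent.futures` calls used above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(name, delay):
    """Hypothetical stand-in for requests.get: sleep, then return 'page source'."""
    time.sleep(delay)
    return name * 10


def run_with_callbacks():
    """Submit tasks and let a callback handle each result once it is ready."""
    handled = []

    def parse(future):
        # The callback receives the finished Future, not the return value.
        handled.append(len(future.result()))

    with ThreadPoolExecutor(max_workers=3) as pool:
        for name, delay in [("abc", 0.4), ("de", 0.2), ("f", 0.0)]:
            pool.submit(fake_fetch, name, delay).add_done_callback(parse)
    # Leaving the with-block waits for all tasks, so every callback has run.
    return handled


if __name__ == '__main__':
    print(run_with_callbacks())
```

Each result is handled as its task completes, so the task function stays limited to the IO while the parsing lives in the callback. With a thread pool the callback normally runs in the worker thread that finished the task.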
Summary:
Are asynchronous invocation and callbacks the same thing?