I. Introduction
1. When writing a crawler, most of the cost is in I/O: in single-process, single-thread mode every URL request blocks until the response arrives, which slows down the overall run (see the sequential baseline sketch after this list).
2. Processes: spawning a process per task wastes a lot of resources.
3. Threads: you need many threads, and a blocked thread cannot do any other work while it waits.
4. Coroutines: gevent starts only one thread; once a request is sent, gevent does not sit waiting on it. There is always just a single worker thread, and whichever response comes back first is handled first.
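As a point of reference, the single-thread blocking behaviour described in item 1 looks like the sketch below. This is an assumed minimal example (the URLs and the timing code are not from the original); the total time is roughly the sum of all request latencies.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Assumed sequential baseline: each request blocks the only thread.
import time
import requests

url_list = ['http://www.baidu.com', 'http://www.bing.com']

start = time.time()
for url in url_list:
    response = requests.get(url)       # blocks until this response is fully received
    print(url, response.status_code)
print('sequential total:', time.time() - start)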
II. Comparing several ways to implement concurrency
1) Concurrency with a thread pool
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch_request(url):
    result = requests.get(url)
    print(result.content)

pool = ThreadPoolExecutor(10)  # create a thread pool with at most 10 threads

url_list = [
    'http://www.google.com',
    'http://www.baidu.com',
]

for url in url_list:
    # take a thread from the pool;
    # the thread runs fetch_request
    pool.submit(fetch_request, url)

pool.shutdown(True)  # wait for the worker threads to finish their queued tasks
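If the responses are needed back in the submitting code rather than printed inside the worker, the Future objects returned by submit can be collected, for example with as_completed. A minimal sketch, assuming the same kind of URL list; reusing the name fetch_request here is for illustration only.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch (assumed variant): gather results from the futures returned by submit.
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_request(url):
    return requests.get(url)

url_list = ['http://www.baidu.com', 'http://www.bing.com']

with ThreadPoolExecutor(10) as pool:          # the context manager calls shutdown(wait=True)
    futures = {pool.submit(fetch_request, url): url for url in url_list}
    for future in as_completed(futures):      # yields each future as soon as it finishes
        print(futures[future], len(future.result().content))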
2) Concurrency with a process pool
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from concurrent.futures import ProcessPoolExecutor

def fetch_request(url):
    result = requests.get(url)
    print(result.text)

url_list = [
    'http://www.google.com',
    'http://www.bing.com',
]

if __name__ == '__main__':
    pool = ProcessPoolExecutor(10)  # process pool with at most 10 worker processes
    # drawback: starting a process per task is expensive and wastes resources
    for url in url_list:
        # take a process from the pool;
        # the process runs fetch_request
        pool.submit(fetch_request, url)
    pool.shutdown(True)
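ProcessPoolExecutor also provides map, which returns results in the same order as the inputs. A minimal sketch under the same assumptions; the __main__ guard is still required because worker processes may re-import the module.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch (assumed variant): ProcessPoolExecutor.map instead of submit.
import requests
from concurrent.futures import ProcessPoolExecutor

def fetch_status(url):
    # runs in a worker process; the return value is pickled back to the parent
    return url, requests.get(url).status_code

url_list = ['http://www.google.com', 'http://www.bing.com']

if __name__ == '__main__':
    with ProcessPoolExecutor(4) as pool:
        # map keeps the results in the same order as url_list
        for url, status in pool.map(fetch_status, url_list):
            print(url, status)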
3) Multithreading + a callback function
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_async(url):
    response = requests.get(url)
    return response

def callback(future):
    print(future.result().content)

if __name__ == '__main__':
    url_list = ['http://www.github.com', 'http://www.bing.com']
    pool = ThreadPoolExecutor(5)
    for url in url_list:
        v = pool.submit(fetch_async, url)
        v.add_done_callback(callback)
    pool.shutdown(wait=True)
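One detail the callback style has to handle is a failed request: future.result() re-raises the exception inside the callback. The error handling below is an assumption, not part of the original example; it shows one way to check future.exception() first.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch (assumed variant): callback that checks for a failed request.
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch_async(url):
    return requests.get(url, timeout=10)

def callback(future):
    exc = future.exception()        # the future is already done, so this does not block
    if exc is not None:
        print('request failed:', exc)
    else:
        print(len(future.result().content))

if __name__ == '__main__':
    pool = ThreadPoolExecutor(5)
    for url in ['http://www.github.com', 'http://www.bing.com']:
        pool.submit(fetch_async, url).add_done_callback(callback)
    pool.shutdown(wait=True)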
4) Coroutines: asynchronous I/O with micro-threads
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from gevent import monkey
monkey.patch_all()  # patch blocking stdlib calls before requests is imported

import gevent
import requests

# whichever response comes back first is handled first
def fetch_async(method, url, req_kwargs):
    print(method, url, req_kwargs)
    response = requests.request(method=method, url=url, **req_kwargs)
    print(response.url, response.content)

if __name__ == '__main__':
    ##### send the requests #####
    gevent.joinall([
        gevent.spawn(fetch_async, method='get', url='https://www.python.org/', req_kwargs={}),
        gevent.spawn(fetch_async, method='get', url='https://www.yahoo.com/', req_kwargs={}),
        gevent.spawn(fetch_async, method='get', url='https://github.com/', req_kwargs={}),
    ])
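To keep a crawler from spawning an unbounded number of greenlets (and open sockets), gevent also ships gevent.pool.Pool, which caps how many greenlets run at once. A minimal sketch; the pool size of 2 is an arbitrary assumption.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch (assumed variant): cap concurrent greenlets with gevent.pool.Pool.
from gevent import monkey
monkey.patch_all()

import requests
from gevent.pool import Pool

def fetch_async(url):
    response = requests.get(url)
    print(response.url, len(response.content))

if __name__ == '__main__':
    pool = Pool(2)          # at most 2 greenlets run at the same time
    for url in ['https://www.python.org/', 'https://www.yahoo.com/', 'https://github.com/']:
        pool.spawn(fetch_async, url)
    pool.join()             # wait for all greenlets to finish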