How to run multiple tasks at the same time, and do it efficiently
Serial requests: the least efficient and least advisable approach
import requests

urls = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]

for url in urls:
    # Each request blocks until it completes, so the total time is the
    # sum of all the individual request times.
    response = requests.get(url)
    print(response)
Multithreading: better, but thread utilization is still low
import threading

import requests

urls = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]

def task(url):
    response = requests.get(url)
    print(response)

for url in urls:
    # One thread per URL; the threads overlap their network waits,
    # but each thread still costs memory and scheduling overhead.
    t = threading.Thread(target=task, args=(url,))
    t.start()
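One way to keep the thread count (and its per-thread overhead) bounded is a thread pool. Below is a minimal sketch using the standard library's concurrent.futures; time.sleep stands in for the network call so the sketch runs without network access:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(url):
    # time.sleep stands in for a blocking call such as requests.get(url),
    # so this sketch runs without network access.
    time.sleep(0.1)
    return 'fetched ' + url

urls = ['http://www.baidu.com/', 'https://www.cnblogs.com/', 'https://cn.bing.com/']

start = time.time()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map submits every task and returns results in input order;
    # the three sleeps overlap, so the total is roughly one sleep, not three.
    results = list(pool.map(task, urls))
elapsed = time.time() - start

print(results)
```

Unlike raw threading.Thread, the pool reuses a fixed number of worker threads, and the with block joins them for you.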
gevent internally uses greenlet, which implements coroutines.
Coroutines consume far fewer resources than threads.
from gevent import monkey; monkey.patch_all()  # patch blocking stdlib calls so they yield to gevent
import gevent
import requests

def func(url):
    response = requests.get(url)
    print(response)

urls = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]

spawn_list = []
for url in urls:
    spawn_list.append(gevent.spawn(func, url))  # create a coroutine (greenlet) for each URL
gevent.joinall(spawn_list)  # wait for all coroutines to finish
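gevent gets its coroutines by monkey-patching blocking calls; the standard library's asyncio reaches the same effect with explicit async/await. A minimal sketch, where asyncio.sleep stands in for an async HTTP request (in practice that would come from a library such as aiohttp) so it runs without network access:

```python
import asyncio

async def fetch(url):
    # asyncio.sleep stands in for an async HTTP request so the
    # sketch runs without network access.
    await asyncio.sleep(0.1)
    return 'done ' + url

async def main(urls):
    # gather schedules every coroutine at once; the event loop
    # interleaves them while each one awaits.
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = ['http://www.baidu.com/', 'https://www.cnblogs.com/']
results = asyncio.run(main(urls))
print(results)
```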
Twisted: an asynchronous, non-blocking module built on an event loop
from twisted.internet import defer, reactor
from twisted.web.client import getPage  # note: getPage is deprecated in newer Twisted releases in favour of twisted.web.client.Agent

def stop_loop(arg):
    reactor.stop()

def get_response(contents):
    print(contents)

deferred_list = []
url_list = [
    'http://www.baidu.com/',
    'https://www.cnblogs.com/',
    'https://www.cnblogs.com/news/',
    'https://cn.bing.com/',
    'https://stackoverflow.com/',
]

for url in url_list:
    deferred = getPage(bytes(url, encoding='utf8'))  # register the crawl task; nothing runs until the reactor starts
    deferred.addCallback(get_response)  # callback fired when the response arrives
    deferred_list.append(deferred)  # collect every pending task into one list

dlist = defer.DeferredList(deferred_list)  # fires once every task in the list has finished
dlist.addBoth(stop_loop)  # when all tasks are done, stop the event loop
reactor.run()