Asynchronous Crawling with gevent

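This article was first published on Zhihu.
We previously covered an asynchronous crawler built on asyncio, but that code was rather complicated. In this article we use the gevent module to implement an asynchronous crawler instead.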

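This article is divided into the following parts:

  • Implementing an asynchronous crawler with gevent
  • The grequests module

Implementing an asynchronous crawler with gevent

Since gevent is very simple to use, let's go straight to the code.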

from gevent import monkey
monkey.patch_all()  # patch all blocking I/O; always include this line, before importing requests

import gevent
import requests
from bs4 import BeautifulSoup

def get_title(i):
    """Fetch one page of the Douban Top 250 and print its movie titles."""
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    text = requests.get(url).content
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

# Spawn one greenlet per page and wait for all of them to finish.
gevent.joinall([gevent.spawn(get_title, i) for i in range(10)])
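
The code above spawns one greenlet per page with no upper limit. As a minimal sketch of how the number of simultaneous greenlets could be capped, the example below uses `gevent.pool.Pool`. The names `fetch_page` and `crawl` are hypothetical, and `fetch_page` only simulates a download with `gevent.sleep` so that the sketch runs without network access; in a real crawler it would be replaced by something like `get_title`.

```python
import gevent
from gevent.pool import Pool


def fetch_page(i):
    """Hypothetical stand-in for get_title: pretends to download page i."""
    gevent.sleep(0.01)  # yields to other greenlets, as a patched network call would
    return 'page-{}'.format(i)


def crawl(page_count, concurrency):
    """Run fetch_page for every page index, at most `concurrency` at a time."""
    pool = Pool(concurrency)
    # Pool.map blocks until every greenlet has finished and preserves input order.
    return pool.map(fetch_page, range(page_count))


if __name__ == '__main__':
    print(crawl(10, 3))
```

A small pool size is usually kinder to the target site than firing every request at once, at the cost of a longer total run time.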

gevent essentially starts multiple micro-threads (greenlets). Let's verify this with the threading module.

from gevent import monkey
monkey.patch_all()  # patch before importing requests

import threading
import gevent
import requests
from bs4 import BeautifulSoup

def get_title(i):
    """Print the current thread name, then the movie titles on page i."""
    print(threading.current_thread().name)  # print the name of the current thread
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i*25)
    text = requests.get(url).content
    soup = BeautifulSoup(text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

gevent.joinall([gevent.spawn(get_title, i) for i in range(10)])

The output begins with the following lines:

DummyThread-1
DummyThread-2
DummyThread-3
DummyThread-4
DummyThread-5
DummyThread-6
DummyThread-7
DummyThread-8
DummyThread-9
DummyThread-10

This shows that 10 micro-threads are in fact running concurrently.
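
To make the idea of micro-threads more concrete, here is a minimal sketch that deliberately does not monkey-patch anything, so `threading.get_ident()` reports the real OS thread. The helper names `worker` and `run_workers` are hypothetical. Each greenlet records the thread it runs in, yields with `gevent.sleep(0)`, and records it again; under these assumptions every entry should carry the same thread identifier, since greenlets are scheduled cooperatively inside one OS thread.

```python
import threading

import gevent


def worker(name, log):
    """Record which OS thread runs this greenlet, before and after yielding."""
    log.append((name, 'start', threading.get_ident()))
    gevent.sleep(0)  # cooperative yield: gives other greenlets a chance to run
    log.append((name, 'end', threading.get_ident()))


def run_workers(count):
    """Spawn `count` greenlets and return everything they logged."""
    log = []
    gevent.joinall([gevent.spawn(worker, i, log) for i in range(count)])
    return log


if __name__ == '__main__':
    log = run_workers(3)
    print([(name, phase) for name, phase, _ in log])
    # Number of distinct OS threads that executed the greenlets.
    print(len({ident for _, _, ident in log}))
```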

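We can also have everything run under a single thread; only the following change is needed. With thread=False the threading module is left unpatched, so each greenlet reports the real thread it runs in.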

monkey.patch_all()
becomes
monkey.patch_all(thread=False)

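The grequests module

The author of the requests library combined requests and gevent to create the grequests module, which is dedicated to asynchronous network requests. It is used as follows: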

import grequests
from bs4 import BeautifulSoup

def get_title(rep):
    """Print the movie titles found in one response."""
    soup = BeautifulSoup(rep.text, 'html.parser')
    lis = soup.find('ol', class_='grid_view').find_all('li')
    for li in lis:
        title = li.find('span', class_="title").text
        print(title)

# Build the requests lazily; grequests.map sends them concurrently.
reps = (grequests.get('https://movie.douban.com/top250?start={}&filter='.format(i*25)) for i in range(10))
for rep in grequests.map(reps):
    if rep is not None:  # map() yields None for a request that failed
        get_title(rep)
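
Because `grequests.map` returns `None` in place of a response when a request fails, and a page that was blocked may not contain the expected list at all, the parsing step can be made more forgiving. The following is only a sketch: `extract_titles` and `SAMPLE_HTML` are hypothetical names, and the sample markup is a heavily trimmed imitation of the structure the code in this article looks for, not the site's actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup with the structure the parser expects.
SAMPLE_HTML = """
<ol class="grid_view">
  <li><span class="title">Movie A</span><span class="title">Alt A</span></li>
  <li><span class="title">Movie B</span></li>
</ol>
"""


def extract_titles(html):
    """Return the first title of each list item, or [] if the page is unusable."""
    if not html:
        return []
    soup = BeautifulSoup(html, 'html.parser')
    grid = soup.find('ol', class_='grid_view')
    if grid is None:  # e.g. an error page without the movie list
        return []
    titles = []
    for li in grid.find_all('li'):
        span = li.find('span', class_='title')  # first matching span only
        if span is not None:
            titles.append(span.text)
    return titles


if __name__ == '__main__':
    print(extract_titles(SAMPLE_HTML))
```

With a helper like this, the loop over `grequests.map(...)` could call `extract_titles(rep.text if rep is not None else None)` and simply get an empty list for pages that could not be fetched.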

Feel free to follow my Zhihu column.

Column home page: Python Programming

Column index: Contents

Version notes: Software and package versions
