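Today I practiced batch-downloading files from a website with a crawler. My practice target was a site full of photos of pretty girls (the URL is in the code, hahaha), and in the end I got all the full-size images batch-downloaded to my computer. So satisfying, hehehe. Here is what I learned: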
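To download a remote file to the local disk I used urlretrieve (from urllib.request). It takes two main arguments: the file's URL and the local file name to save it under. Watch out for the second one: it has to be a complete file name, not just a folder path. I built the file names with the code below. I don't fully understand it yet, but I'll learn to use it first and worry about the details later: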
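```python
from urllib.request import urlretrieve

# imgurl holds the image URLs collected by get_img() in the full script
x = 0
for item in imgurl:
    # '%s.jpg' % x puts str(x) into the %s placeholder: 0.jpg, 1.jpg, 2.jpg, ...
    # Folder + that name gives the complete file path that urlretrieve needs
    urlretrieve(item, '/Users/zengyichao/Desktop/工作零碎文件/2.21/test4/' + '%s.jpg' % x)
    x += 1
```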
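To make the "full file name, not a folder" rule concrete, here is a minimal sketch that runs offline. It assumes local files reached through file:// URLs can stand in for the remote images (urlretrieve accepts those as well), and the helper names save_numbered and folder_only_fails are hypothetical. It builds each target with the same '%s.jpg' % x pattern, using enumerate in place of the manual counter, and shows that handing urlretrieve a bare folder fails.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def save_numbered(urls, save_dir):
    """Save each URL as save_dir/0.jpg, save_dir/1.jpg, ... and return the paths."""
    saved = []
    for x, url in enumerate(urls):  # enumerate replaces the manual x = 0 / x += 1
        # '%s.jpg' % x puts str(x) into the %s slot: '0.jpg', '1.jpg', ...
        target = Path(save_dir) / ('%s.jpg' % x)
        # Second argument is a full file name, not just the folder
        filename, _headers = urlretrieve(url, str(target))
        saved.append(Path(filename))
    return saved


def folder_only_fails(url, save_dir):
    """Passing only the folder makes urlretrieve try to open a directory for writing."""
    try:
        urlretrieve(url, str(save_dir))
    except OSError:  # IsADirectoryError on macOS/Linux, PermissionError on Windows
        return True
    return False


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical local files reached via file:// URLs stand in for remote images
        sources = []
        for name in ('a.jpg', 'b.jpg'):
            p = Path(tmp) / name
            p.write_bytes(b'image data ' + name.encode())
            sources.append(p.as_uri())
        out_dir = Path(tmp) / 'test4'
        out_dir.mkdir()
        print([p.name for p in save_numbered(sources, out_dir)])  # ['0.jpg', '1.jpg']
        print(folder_only_fails(sources[0], out_dir))             # True
```

Joining with Path(...) / name also removes the need to remember the trailing slash that the plain string concatenation relies on.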
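The complete script:

```python
import os
from urllib.request import urlretrieve

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent so the site serves the normal page to requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'
}
imgurl = []  # image URLs collected from every page


def get_img(url):
    """Fetch one gallery page and append the large image's src to imgurl."""
    res = requests.get(url, headers=headers)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    # <img> that is a direct child of <a>, inside <p>, inside the element with id="big-pic"
    imgs = soup.select('#big-pic > p > a > img')
    for img in imgs:
        href = img.get('src')
        if href:  # skip <img> tags without a src attribute
            imgurl.append(href)


# Test on a single page first:
# url = 'http://www.mmonly.cc/mmtp/xgmn/198663.html'
# get_img(url)

# Gallery pages 100306_2.html through 100306_30.html
urls = ['http://www.mmonly.cc/mmtp/xgmn/100306_{}.html'.format(i) for i in range(2, 31)]
for url in urls:
    get_img(url)

save_dir = '/Users/zengyichao/Desktop/工作零碎文件/2.21/test4/'
os.makedirs(save_dir, exist_ok=True)  # urlretrieve does not create missing folders
x = 0
for item in imgurl:
    urlretrieve(item, save_dir + '%s.jpg' % x)  # 0.jpg, 1.jpg, ...
    x += 1
print(imgurl)
```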
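One weak spot of the final loop is that a single broken link stops the whole run with an exception. Below is a hedged sketch of a more forgiving download loop. The function name download_all, the one-second default delay and the (saved, failed) return value are my own assumptions, not part of the original script, and the demo uses local file:// URLs so it runs without touching the real site.

```python
import os
import tempfile
import time
from pathlib import Path
from urllib.request import urlretrieve


def download_all(img_urls, save_dir, delay=1.0):
    """Save each URL as save_dir/0.jpg, 1.jpg, ...; return (saved, failed) lists."""
    os.makedirs(save_dir, exist_ok=True)  # urlretrieve does not create folders
    saved, failed = [], []
    for x, url in enumerate(img_urls):
        if not url:  # None or empty src values cannot be downloaded
            failed.append(url)
            continue
        target = os.path.join(save_dir, f'{x}.jpg')
        try:
            urlretrieve(url, target)
        except OSError as exc:  # URLError and HTTPError are subclasses of OSError
            print(f'skipped {url}: {exc}')
            failed.append(url)
            continue
        saved.append(target)
        if delay:
            time.sleep(delay)  # pause between requests so the server is not hammered
    return saved, failed


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Offline stand-ins: one real local file, one missing file, one empty src
        real = Path(tmp) / 'real.jpg'
        real.write_bytes(b'image data')
        urls = [real.as_uri(), (Path(tmp) / 'missing.jpg').as_uri(), None]
        saved, failed = download_all(urls, os.path.join(tmp, 'test4'), delay=0)
        print(len(saved), len(failed))  # 1 2
```

In the script above it would take the place of the x counter loop, called as download_all(imgurl, ...) with the target folder as the second argument. Note that urlretrieve does not send the headers dict defined for requests; it uses urllib's own default User-Agent. If the image server rejects that, one option is to fetch each image with requests.get(item, headers=headers) and write res.content to the file yourself.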