This is a sinful crawler.
It scrapes the GIF images from http://www.27gif.net/gifcc and saves each one with its 'mystery code' as the file name.
------------------------------------------------------------------------------------------------------
import requests
from bs4 import BeautifulSoup

page = 1
while True:
    # Request the listing page and collect every GIF post into a list
    star_url = 'http://www.27gif.net/gifcc/page/%s/' % str(page)
    star_html = requests.get(star_url).text
    star_soup = BeautifulSoup(star_html, 'lxml')
    gif_list = star_soup.find_all('div', class_='wow fadeInUp')
    # Iterate over every post in the list
    for gif_html in gif_list:
        # Use the img tag's alt attribute as the file name
        try:
            gif_name = gif_html.find('img')['alt'].split(':')[1]
        except (TypeError, KeyError):
            # No img tag in this post, or the img has no alt attribute
            continue
        except IndexError:
            # No ':' in the alt text, so use the whole string
            gif_name = gif_html.find('img')['alt']
        # Extract the GIF's real URL from the thumbnail's src attribute
        try:
            gif_url = gif_html.find('img')['src'].split('src=')[1].split('&w=')[0]
        except (TypeError, KeyError, IndexError):
            # Skip posts whose src is missing or not in the expected form
            continue
        # Request the GIF's URL and save the response body
        gif_content = requests.get(gif_url).content
        with open(gif_name + '.gif', 'wb') as f:
            f.write(gif_content)
        print(gif_name + ' OK!')
    # Stop after page 13
    if page < 13:
        page += 1
    else:
        break
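The script pulls the GIF address out of the thumbnail's src by splitting on 'src=' and '&w=', which only works while the query string keeps exactly that order. Below is a minimal sketch of the same step using the standard library's urllib.parse. It assumes the thumbnail link carries the GIF address in a query parameter named src; extract_gif_url is a hypothetical helper name and the example.com address is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_gif_url(thumb_src):
    """Return the value of the 'src' query parameter, or None if it is absent.

    Hypothetical helper: assumes a thumbnail link shaped like
    '...?src=<gif url>&w=<width>'.
    """
    query = parse_qs(urlparse(thumb_src).query)
    values = query.get('src')
    return values[0] if values else None


# Made-up example address, not taken from the real site
example = 'http://example.com/thumb.php?src=http://example.com/a.gif&w=200&h=100'
print(extract_gif_url(example))
```

Unlike the plain split, parse_qs also decodes percent-escapes in the value, so the result may differ if the site encodes the inner address.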
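When the run finishes, the GIF images will be saved in the current folder.
Have tissues ready before use, and drink a Nutri-Express promptly afterwards.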
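Because the file name comes straight from the alt text, a name containing characters such as '/' or ':' could make open() fail or write somewhere unexpected. Here is a rough sketch of one way to clean the name first. safe_filename and its default value are hypothetical, and the set of replaced characters is an assumption, not something the original script does.

```python
import re


def safe_filename(name, default='untitled'):
    """Replace path separators, reserved characters and whitespace with '_'.

    Hypothetical helper; falls back to `default` if nothing usable remains.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('_')
    return (cleaned or default) + '.gif'


print(safe_filename('ABC-123'))
print(safe_filename('a/b:c'))
```

In the script this would replace gif_name + '.gif' in the open() call.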