Python Web Scraping Tutorial: Scraping Free and Paid Assets from 包圖網 (Source Code Included)

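You have probably heard of 包圖網 (ibaotu.com). It hosts a huge collection of design assets and is very handy, but unfortunately it is expensive. Today we will use a Python crawler to scrape these assets and save them locally.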

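To scrape a website's content, we need to answer the following questions:

1- How do we find the link to the site's next page?

2- Is the target resource (video, images, etc.) static or loaded dynamically?

3- What data structure and format does the site use?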
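The three questions above can be tried out offline before touching the real site. The following is only a minimal sketch under stated assumptions: `SAMPLE_PAGE` is a made-up, well-formed stand-in for a list page (not the site's real markup), `parse_page` is a hypothetical helper, and the standard library's `xml.etree.ElementTree` is swapped in for `lxml` so the snippet runs without extra packages. Real pages are rarely well-formed XML, so a forgiving HTML parser such as `lxml` is still needed in practice.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, well-formed stand-in for one list page (not real site markup).
SAMPLE_PAGE = """
<html><body>
  <div class="video-play"><video src="//example.com/v/1.mp4"></video></div>
  <span class="video-title">Intro/Clip</span>
  <a class="next" href="//example.com/list-2.html">next</a>
</body></html>
"""


def parse_page(page_url, markup):
    """Return (video URLs, titles, next-page URL or None) for one page."""
    root = ET.fromstring(markup)

    # Question 2: the video address sits directly in the markup (static),
    # so it can be read from the src attribute.
    videos = root.findall(".//div[@class='video-play']/video")
    srcs = [urljoin(page_url, v.get("src")) for v in videos]

    # Question 3: titles live in <span class="video-title"> elements.
    titles = [s.text for s in root.findall(".//span[@class='video-title']")]

    # Question 1: the next-page link; a missing or empty href means the last page.
    link = root.find(".//a[@class='next']")
    href = link.get("href") if link is not None else ""
    next_url = urljoin(page_url, href) if href else None
    return srcs, titles, next_url


srcs, titles, next_url = parse_page("https://example.com/list-1.html", SAMPLE_PAGE)
print(srcs, titles, next_url)
```

`urljoin` resolves protocol-relative links such as `//example.com/...` against the current page's scheme, which avoids prepending `"http:"` by hand.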

The source code is as follows:

import threading

import requests
from lxml import etree


class Spider(object):
    def __init__(self):
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36"}
        self.offset = 1

    def start_work(self, url):
        print("Crawling page %d ......" % self.offset)
        self.offset += 1
        response = requests.get(url=url, headers=self.headers)
        html = etree.HTML(response.content.decode())

        video_src = html.xpath('//div[@class="video-play"]/video/@src')
        video_title = html.xpath('//span[@class="video-title"]/text()')
        # Save this page's videos before deciding whether to continue,
        # so the last page is not skipped.
        self.write_file(video_src, video_title)

        # xpath() returns an empty list when there is no "next" link,
        # so check before indexing.
        next_href = html.xpath('//a[@class="next"]/@href')
        if not next_href or not next_href[0]:
            return  # crawling finished
        self.start_work("http:" + next_href[0])

    def write_file(self, video_src, video_title):
        for src, title in zip(video_src, video_title):
            response = requests.get("http:" + src, headers=self.headers)
            # Drop "/" so the title can be used as a file name.
            file_name = (title + ".mp4").replace("/", "")
            print("Downloading %s" % file_name)
            with open('E://python//demo//mp4//' + file_name, "wb") as f:
                f.write(response.content)


if __name__ == "__main__":
    spider = Spider()
    for i in range(0, 3):
        # spider.start_work(url="https://ibaotu.com/shipin/7-0-0-0-" + str(i) + "-1.html")
        t = threading.Thread(target=spider.start_work, args=("https://ibaotu.com/shipin/7-0-0-0-" + str(i) + "-1.html",))
        t.start()
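Two details of the code above are worth a closer look: the file name is built from the video title, and all three threads share one `Spider` object and therefore one `offset` counter. The sketch below is not part of the original source. `safe_file_name` and `PageCounter` are hypothetical helpers, shown only as one possible way to strip every character Windows rejects in file names (not just `/`) and to hand out page numbers safely across threads.

```python
import re
import threading
from pathlib import Path

# Characters that Windows does not allow in file names.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_file_name(title, suffix=".mp4"):
    """Build a file name from a video title, dropping invalid characters."""
    cleaned = _INVALID_CHARS.sub("", title).strip()
    return (cleaned or "untitled") + suffix


class PageCounter:
    """Page counter that several crawler threads can share."""

    def __init__(self, start=1):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # Read and increment under the lock so no two threads get the same number.
        with self._lock:
            current = self._value
            self._value += 1
            return current


counter = PageCounter()
seen = []


def worker():
    # Stand-in for a crawl loop: each iteration claims one page number.
    for _ in range(100):
        seen.append(counter.next())


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers before reading the results

target = Path("mp4") / safe_file_name("demo/clip: part 1")
print(target.name, len(seen))
```

Calling `join()` on each thread also makes the main program wait until every download loop has finished, which the original script leaves implicit.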

Results
Data structure

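*Notice: This article was compiled from online sources, and copyright belongs to the original author. If the source information is incorrect or your rights are infringed, please contact us for removal or authorization.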

This article was shared from the WeChat official account python教程 (pythonjc).