A Guide to Scrapy, a Popular Open-Source Web Crawling Framework

Date: 2021-03-05

Installing Scrapy

Official site: https://scrapy.org/

Installation

On any operating system, Scrapy can be installed with pip, for example:

$ pip install scrapy

To confirm that Scrapy was installed successfully, first check in Python that the scrapy module can be imported:

>>> import scrapy  
>>> scrapy.version_info
(1, 8, 0)
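
The same import check can be scripted rather than typed into the interpreter. The following is a minimal sketch, not part of the original article: the helper name installed_version is hypothetical, and it relies only on the standard library (importlib.util.find_spec and importlib.metadata, available from Python 3.8).

```python
import importlib.util
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it cannot be imported.

    `installed_version` is a hypothetical helper name used for illustration.
    """
    # find_spec locates a top-level module without actually importing it.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The module is importable but has no distribution metadata under this name.
        return "unknown"


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("Scrapy is not installed")
    else:
        print(f"Scrapy {version}")
```

Because the lookup does not import the package, the script runs cleanly whether or not Scrapy is present.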

Next, check in a shell that the scrapy command runs:

(base) λ scrapy 
Scrapy 1.8.0 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test
  fetch Fetch a URL using the Scrapy downloader 
  genspider Generate new spider using pre-defined templates 
  runspider Run a self-contained spider (without creating a project) 
  settings Get settings values 
  shell Interactive scraping console 
  startproject Create new project 
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command

If both checks pass, Scrapy is installed. As shown above, the installed version is 1.8.0, the latest release at the time of writing.

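Notes:

While installing Scrapy you may hit errors such as a missing VC++ component; in that case, install the offline package for the missing module.
Seeing the output above when running scrapy in CMD does not by itself prove the installation works. To verify it properly, run scrapy bench; if it reports no errors, the installation succeeded.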
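
The two command-line checks described above can be sketched as a small script. This is an illustration under stated assumptions rather than an official procedure: the helper name command_succeeds is hypothetical, and an exit status of 0 is assumed to stand in for "no errors reported" by scrapy version and scrapy bench.

```python
import subprocess
import sys


def command_succeeds(argv, timeout=120):
    """Run `argv` and report whether it exited with status 0.

    `command_succeeds` is a hypothetical helper; a zero exit status is
    assumed here to mean the command reported no errors.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing or non-executable command.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    all_ok = True
    for argv in (["scrapy", "version"], ["scrapy", "bench"]):
        ok = command_succeeds(argv)
        all_ok = all_ok and ok
        print(" ".join(argv), "->", "ok" if ok else "FAILED")
    sys.exit(0 if all_ok else 1)
```

If scrapy is not on the PATH, both checks simply report FAILED instead of raising an exception.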

For the full installation procedure, see http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes, which covers each platform.

Global commands

$ scrapy 
Scrapy 1.7.3 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test 
        ## Runs a quick benchmark to gauge crawling performance on this machine.
  fetch Fetch a URL using the Scrapy downloader 
        ## Downloads the page source and prints it.
  genspider Generate new spider using pre-defined templates 
        ## Creates a new spider file.
  runspider Run a self-contained spider (without creating a project) 
        ## Unlike starting a spider with crawl, this takes a spider file: scrapy runspider <spider file name>
  settings Get settings values 
        ## Shows the current settings values.
  shell Interactive scraping console 
        ## Enters Scrapy's interactive mode.
  startproject Create new project 
        ## Creates a new crawler project.
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 
        ## Downloads the page document and opens it in a browser.

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command