A walkthrough of Scrapy, a widely used open-source web crawling framework

Installing Scrapy

Official site: https://scrapy.org/

How to install

On any operating system, you can install Scrapy with pip, for example:

$ pip install scrapy

 

To confirm that Scrapy was installed successfully, first check in Python that the scrapy module can be imported:

>>> import scrapy  
>>> scrapy.version_info
(1, 8, 0)
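
The same import check can also be scripted. The following is a minimal sketch using only the standard library; the helper name scrapy_version is made up for illustration, and it returns None rather than raising when Scrapy is not installed:

```python
import importlib
import importlib.util


def scrapy_version():
    """Return scrapy.version_info, or None if Scrapy cannot be imported.

    Hypothetical helper for illustration; not part of Scrapy itself.
    """
    # find_spec returns None when no top-level module named "scrapy" exists
    if importlib.util.find_spec("scrapy") is None:
        return None
    scrapy = importlib.import_module("scrapy")
    return scrapy.version_info


if __name__ == "__main__":
    version = scrapy_version()
    if version is None:
        print("Scrapy is not installed in this environment")
    else:
        print("Scrapy version:", ".".join(str(part) for part in version))
```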

 

Then, check in the shell that the scrapy command can be run:

(base) λ scrapy 
Scrapy 1.8.0 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test
  fetch Fetch a URL using the Scrapy downloader 
  genspider Generate new spider using pre-defined templates 
  runspider Run a self-contained spider (without creating a project) 
  settings Get settings values 
  shell Interactive scraping console 
  startproject Create new project 
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command

 

If both checks pass, Scrapy was installed successfully. As shown above, the version installed here is 1.8.0, the latest release at the time of writing.
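
The shell check can be automated in the same spirit. This sketch assumes the scrapy executable is on PATH and uses the version subcommand listed in the output above; the function name scrapy_cli_version is hypothetical:

```python
import shutil
import subprocess


def scrapy_cli_version():
    """Return the output of `scrapy version`, or None if the command is unavailable.

    Hypothetical helper for illustration.
    """
    exe = shutil.which("scrapy")  # look up the executable on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    output = scrapy_cli_version()
    print(output if output is not None else "scrapy command not found or failed")
```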

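Notes:

  • While installing Scrapy you may run into errors such as a missing VC++ component; in that case, install the offline package for the missing module.
  • After installation, seeing the output above when you run scrapy in CMD does not yet prove the installation works. To check properly, run scrapy bench; if it reports no errors, the installation succeeded.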
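
As a rough sketch of the scrapy bench check, the helper below runs a command and reports whether it exited cleanly. Treating a zero exit status as "no errors" is an assumption made here, not something the Scrapy documentation is quoted as guaranteeing, and command_succeeds is a hypothetical name:

```python
import subprocess


def command_succeeds(argv, timeout=300):
    """Run a command and report whether it exited with status 0.

    Returns False if the command cannot be started or exceeds the timeout.
    Assumption: a zero exit status is read as "no errors reported".
    """
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0
```

Calling command_succeeds(["scrapy", "bench"]) would then run the benchmark; it can take a while, so the timeout is deliberately generous.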

For the full installation procedure, including notes for each platform, see: http://doc.scrapy.org/en/latest/intro/install.html#intro-install-platform-notes

Global commands

$ scrapy 
Scrapy 1.7.3 - no active project 
Usage: 
  scrapy <command> [options] [args] 

Available commands: 
  bench Run quick benchmark test 
        ## Benchmarks crawl performance on this machine.
  fetch Fetch a URL using the Scrapy downloader 
        ## Downloads the page source and prints it.
  genspider Generate new spider using pre-defined templates 
        ## Creates a new spider file.
  runspider Run a self-contained spider (without creating a project) 
        ## Unlike starting a spider with crawl, this takes a spider file: scrapy runspider <spider file>
  settings Get settings values 
        ## Shows the current configuration values.
  shell Interactive scraping console 
        ## Enters Scrapy's interactive mode.
  startproject Create new project 
        ## Creates a crawler project.
  version Print Scrapy version 
  view Open URL in browser, as seen by Scrapy 
        ## Downloads the page and opens it in the browser.

  [ more ] More commands available when run from project directory 

Use "scrapy <command> -h" to see more info about a command

 

Project commands

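    • scrapy startproject projectname
      Creates a project.
    • scrapy genspider spidername domain
      Creates a spider. After creating the project, you still need to create a spider.
    • scrapy crawl spidername
      Runs the spider. Pay attention to which directory you run this command from.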
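
To make the point about the working directory concrete, here is a small sketch that pairs each of the three commands with the directory it would be run from. It assumes genspider and crawl are run inside the directory that startproject creates. The function names and the example project, spider, and domain names are all hypothetical:

```python
import subprocess
from pathlib import Path


def project_workflow(base_dir, project, spider, domain):
    """Return (argv, cwd) pairs: create a project, add a spider, run the spider."""
    base = Path(base_dir)
    project_dir = base / project  # directory created by `scrapy startproject`
    return [
        (["scrapy", "startproject", project], base),
        # Assumption: the next two commands run from inside the project directory
        (["scrapy", "genspider", spider, domain], project_dir),
        (["scrapy", "crawl", spider], project_dir),
    ]


def run_workflow(steps):
    """Execute each step in order, stopping at the first failing command."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, run_workflow(project_workflow(".", "demo_project", "demo_spider", "example.com")) would run the three commands in order, provided Scrapy is installed.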