Common Scrapy commands:
Global commands, which can be run without creating a project: startproject settings runspider shell fetch view version
Project commands: crawl check list edit parse genspider deploy bench
Each command is described in turn below:
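1. startproject: create a new crawler project
Syntax:
scrapy startproject <project_name>
2. genspider: create a new spider inside the project
Syntax:
scrapy genspider [-t template] <spider_name> <domain>
There are four templates: basic, crawl, csvfeed, and xmlfeed. Use -d to preview the template that would be generated: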
D:\crawler\lagou\spider>scrapy genspider -d basic
# -*- coding: utf-8 -*-
import scrapy


class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = (
        'http://www.$domain/',
    )

    def parse(self, response):
        pass
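The `$classname`, `$name`, and `$domain` placeholders in the preview follow the syntax of Python's `string.Template`. As a rough sketch of how such a template could be filled in (the values `LagouSpider`, `lagou`, and `lagou.com` are illustrative assumptions, and this is not necessarily how Scrapy renders the template internally):

```python
from string import Template

# Template text as printed by `scrapy genspider -d basic` above.
BASIC_TEMPLATE = Template('''# -*- coding: utf-8 -*-
import scrapy


class $classname(scrapy.Spider):
    name = "$name"
    allowed_domains = ["$domain"]
    start_urls = (
        'http://www.$domain/',
    )

    def parse(self, response):
        pass
''')

# Hypothetical values, chosen only for illustration.
rendered = BASIC_TEMPLATE.substitute(
    classname="LagouSpider",
    name="lagou",
    domain="lagou.com",
)
print(rendered)
```

`substitute` raises `KeyError` if a placeholder is left unfilled, which makes a missing value easy to spot.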
3. crawl: run a spider
Syntax:
scrapy crawl <spider_name>
4. check: run the spider's contract checks
Syntax:
scrapy check [-l] <spider_name>
For example:
D:\crawler\lagou\spider>scrapy check lagou

----------------------------------------------------------------------
Ran 0 contracts in 0.000s

OK
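`Ran 0 contracts` most likely means that the spider's callbacks declare no contracts. Contracts are written as `@`-prefixed lines in a callback's docstring, for example `@url` and `@returns`. The sketch below uses only the standard library to list such lines from a hypothetical `parse` callback. The URL and bounds are made-up values, `parse` is a plain function rather than a spider method to keep the example short, and the helper `contract_lines` is illustrative and not part of Scrapy.

```python
import inspect


def parse(response):
    """Hypothetical spider callback with contracts in its docstring.

    @url http://www.example.com/
    @returns items 0 10
    @returns requests 0 0
    """
    return []


def contract_lines(callback):
    """Return the docstring lines that start with '@' (illustrative helper)."""
    doc = inspect.getdoc(callback) or ""
    return [line.strip() for line in doc.splitlines() if line.strip().startswith("@")]


contracts = contract_lines(parse)
print(contracts)
```

A callback whose docstring has no `@` lines yields an empty list here, which corresponds to the zero-contract run shown above.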
5. fetch: fetch the given URL
Syntax:
scrapy fetch <url>
Downloads the given URL with the Scrapy downloader and writes the retrieved content to standard output.
For example, to view the headers for Baidu:
D:\crawler\lagou\spider>scrapy fetch --nolog --headers http://www.baidu.com
> Accept-Language: en
> Accept-Encoding: gzip,deflate
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> User-Agent: Scrapy/1.0.3 (+http://scrapy.org)
>
< Bdqid: 0x9eec8e8400034b3c
< Bduserid: 0
< Set-Cookie: BAIDUID=20B9DAB5F75E14AFB3447C6857ACFDA3:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
< Set-Cookie: BIDUPSID=20B9DAB5F75E14AFB3447C6857ACFDA3; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
< Set-Cookie: PSTM=1452846490; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
< Set-Cookie: BDSVRTM=0; path=/
< Set-Cookie: BD_HOME=0; path=/
< Set-Cookie: H_PS_PSSID=18439_18720_1431_18878_12825_17565_18965_18768_18971_18778_18780_17000_18782_17072_15098_12356_18018_10634; path=/; domain=.baidu.com
< Expires: Fri, 15 Jan 2016 08:27:26 GMT
< Vary: Accept-Encoding
< X-Powered-By: HPHP
< Server: BWS/1.1
< Cxy_All: baidu+12d68e0b8747863f4dbde8d1321e60b9
< Cache-Control: private
< Date: Fri, 15 Jan 2016 08:28:10 GMT
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Content-Type: text/html; charset=utf-8
< Bdpagetype: 1
< X-Ua-Compatible: IE=Edge,chrome=1
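In this output, lines starting with `>` are the request headers sent by Scrapy and lines starting with `<` are the response headers. A minimal sketch for splitting such a dump into the two groups, assuming each header sits on its own line (the function name `split_headers` is hypothetical, and the sample is a shortened copy of the output above):

```python
def split_headers(dump):
    """Split `scrapy fetch --headers` output into request and response headers.

    Returns two lists of (name, value) tuples; lists are used instead of
    dicts because a header such as Set-Cookie can appear several times.
    """
    request, response = [], []
    for line in dump.splitlines():
        line = line.strip()
        if len(line) < 2 or line[0] not in "<>":
            continue  # skip blank lines, the bare ">" separator, and non-header lines
        marker, rest = line[0], line[1:].strip()
        if ":" not in rest:
            continue  # not a "Name: value" pair
        name, value = rest.split(":", 1)
        target = request if marker == ">" else response
        target.append((name.strip(), value.strip()))
    return request, response


sample = """\
> Accept-Language: en
> User-Agent: Scrapy/1.0.3 (+http://scrapy.org)
>
< Set-Cookie: BDSVRTM=0; path=/
< Set-Cookie: BD_HOME=0; path=/
< Server: BWS/1.1
"""
req, resp = split_headers(sample)
print(req)
print(resp)
```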
6. view: open the URL in a browser as Scrapy sees it
Syntax:
scrapy view <url>
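Because some pages embed JavaScript and other scripts, what Scrapy retrieves can differ from what a user sees in the browser. This command lets you inspect the page the spider actually receives and confirm that it is what you expect.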
7. shell: the Scrapy interactive test console
Syntax:
scrapy shell [url]
Starts the Scrapy shell with the given URL, where you can test XPath expressions and perform similar experiments.
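Inside the shell a `response` object is available, and expressions are typically tried with calls such as `response.xpath('//title/text()').extract()`. To show the kind of experiment involved without requiring Scrapy, the sketch below swaps in the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup. The HTML snippet and its paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for a downloaded page.
page = """
<html>
  <head><title>Jobs</title></head>
  <body>
    <ul>
      <li class="job"><a href="/jobs/1">Python developer</a></li>
      <li class="job"><a href="/jobs/2">Data engineer</a></li>
      <li class="ad"><a href="/ads/9">Sponsored</a></li>
    </ul>
  </body>
</html>
"""

root = ET.fromstring(page.strip())

# Text of the <title> element, addressed relative to the <html> root.
title = root.findtext("./head/title")

# Link text and href for every <a> inside an <li class="job">.
job_links = [(a.text, a.get("href")) for a in root.findall(".//li[@class='job']/a")]

print(title)
print(job_links)
```

Real pages are rarely well-formed XML, which is one reason to try expressions in the Scrapy shell itself rather than with a strict XML parser.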
8. runspider: run a spider without creating a project
Syntax:
scrapy runspider <spider_file>
9. deploy: deploy the project to a Scrapyd service
Syntax:
scrapy deploy [ <target:project> | -l <target> | -L ]
For details, see the official documentation: http://scrapyd.readthedocs.org/en/latest/deploy.html
When reposting, please credit OSChina: http://my.oschina.net/u/2463131/blog/603333