Taken from the shell
Scrapy shell
# Check whether the URL is the one we want
def parse(self, response):
    if ".org" in response.url:
        from scrapy.shell import inspect_response
        # Debugging statement: pauses the crawl and opens the shell on this response
        inspect_response(response, self)

>>> response.url
'http://example.org'
Test the extraction code:
>>> sel.xpath('//h1[@class="fn"]')
[]
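The empty list means the XPath matched nothing on the page. One possible reason is that the `h1` element carries a different class. The sketch below reproduces that situation with the standard library's `xml.etree.ElementTree` rather than Scrapy selectors, so it runs without a crawl. The markup in `PAGE` and the helper `find_h1` are hypothetical and only stand in for the real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the <h1> exists, but its class is "title", not "fn".
PAGE = '<html><body><h1 class="title">Example Domain</h1></body></html>'

root = ET.fromstring(PAGE)


def find_h1(tree, css_class):
    """Return all <h1> elements whose class attribute equals css_class exactly."""
    return tree.findall(f".//h1[@class='{css_class}']")


# The expression from the shell session finds nothing on this page.
print(find_h1(root, "fn"))

# Matching the class that is actually present returns the element.
print([h1.text for h1 in find_h1(root, "title")])
```

When an expression returns an empty list in the shell, inspecting the page source (for example with `view(response)`) shows which attributes are really present.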
Open the response in a browser to check it:
>>> view(response)
True
Finally, press Ctrl-D (Ctrl-Z on Windows) to exit the shell and resume crawling:
>>> ^D
2014-01-23 17:50:03-0400 [myspider] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
Open the response in a browser
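from scrapy.utils.response import open_in_browser

def parse(self, response):
    # response.text is the decoded body; on Python 3, response.body is bytes
    # and cannot be searched with a str
    if "item name" not in response.text:
        # Open the page as Scrapy received it, to see why the marker is missing
        open_in_browser(response)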
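A note on the membership test used in that check: on Python 3 the raw body of a response is a `bytes` object, and testing a `str` against `bytes` raises `TypeError`. The following is a minimal standard-library sketch of the pitfall and one way around it. The sample `body` and the helper `missing_marker` are hypothetical and not part of Scrapy.

```python
# Hypothetical raw body, standing in for what a response would carry.
body = b"<html><body><h1>Example</h1></body></html>"


def missing_marker(raw: bytes, marker: str, encoding: str = "utf-8") -> bool:
    """Return True when marker does not occur in the decoded body."""
    return marker not in raw.decode(encoding, errors="replace")


# Searching bytes with a str fails on Python 3.
try:
    "item name" in body
except TypeError as exc:
    print("str vs bytes:", exc)

# Decoding first makes the comparison valid.
print(missing_marker(body, "item name"))
```

In a spider, the same effect can be had by testing against the decoded text of the response or by using a bytes literal such as `b"item name"`.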