How to debug a single parse function in Scrapy

One painful thing about a crawl job is that, to debug the parse callback for a page three or four follows deep in the link chain, you have to wait for all the preceding links to be crawled before you can tell whether that page's parse function is correct. Scrapy's parse command-line subcommand exists to solve exactly this problem.
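To make the problem concrete, here is a minimal sketch of the kind of spider involved. Everything except the spider name douban_spider and the callback group_parse (both of which appear later in this post) is made up; the point is simply that group_parse sits two requests deep, so normally the whole chain has to be crawled before it ever runs.

from scrapy import Spider, Request

class DoubanGroupSpider(Spider):
    # Illustrative sketch; only "douban_spider" and group_parse come from the post.
    name = "douban_spider"
    start_urls = ["http://www.douban.com"]

    def parse(self, response):
        # Level 1: follow links into the group section (selector is made up).
        for href in response.xpath('//a[contains(@href, "/group/")]/@href').extract():
            yield Request(response.urljoin(href), callback=self.list_parse)

    def list_parse(self, response):
        # Level 2: follow links to individual group pages (selector is made up).
        for href in response.xpath('//a[@class="group-item"]/@href').extract():
            yield Request(response.urljoin(href), callback=self.group_parse)

    def group_parse(self, response):
        # Level 3: the callback we actually want to debug in isolation.
        yield {"title": response.xpath("//h1/text()").extract_first()}

With scrapy parse -c group_parse you can feed a group page URL straight to that third-level callback without crawling the two levels above it.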

The description from the official docs

Syntax: scrapy parse <url> [options]
That is: scrapy parse <URL> [options]

The example given on the official site: $ scrapy parse http://www.example.com/some/page.html
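Two of the options matter for what follows: -c (or --callback), which names the spider method used to parse the downloaded response, and --spider, which bypasses spider autodetection and forces a specific spider. A typical invocation looks like this (parse_item is just an illustrative callback name):

$ scrapy parse http://www.example.com/some/page.html -c parse_item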

My own experience

When I first ran it, no log output was printed at all, so I upgraded Scrapy from 0.25 to 1.0.
Then I entered:

scrapy parse http://www.douban.com -c group_parse

It reported an error like this:

ERROR: Unable to find spider for: http://www.douban.com

It may also look like this:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/parse.py", line 220, in run
    self.set_spidercls(url, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/parse.py", line 147, in set_spidercls
    self.spidercls.start_requests = _start_requests
AttributeError: 'NoneType' object has no attribute 'start_requests'

Fine. Since it cannot be found automatically, we explicitly specify the spider's name, i.e. the value of the name attribute defined in the class that inherits from Spider:

from scrapy import Spider

class douban(Spider):
    name = "douban_spider"
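The post does not show the final command, but with the name defined the spider can be passed explicitly. The scrapy parse command has a --spider option for exactly this (bypass autodetection and force a specific spider), so presumably something like:

scrapy parse --spider=douban_spider -c group_parse http://www.douban.com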

OK, problem solved.
