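1. Scrapy installation problems
At first I installed Scrapy with pip, following the official documentation, and creating the project raised no errors.
Running scrapy crawl dmoz, however, produced errors everywhere /(ㄒoㄒ)/~~ For example:
ImportError: No module named _cffi_backend
Unhandled error in Deferred, and so on. It turned out that many dependency packages had not been installed, so I searched Baidu and installed them one by one.
Plenty of experts have already written all of this up. Much respect! ^_^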
http://blog.csdn.net/niying/article/details/27103081
http://blog.csdn.net/pleasecallmewhy/article/details/19354723
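Before hunting down packages one error at a time, it can help to check which dependencies fail to import. The following is only a rough sketch: the module list is an assumption based on the error above and on packages Scrapy is commonly said to depend on, and find_missing is a hypothetical helper name, not part of Scrapy.

```python
import importlib

# Assumed candidate modules; adjust to match your Scrapy version.
CANDIDATE_MODULES = ["_cffi_backend", "OpenSSL", "twisted", "lxml", "zope.interface"]


def find_missing(modules):
    """Return the names in `modules` that cannot be imported."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing(CANDIDATE_MODULES):
        print("missing: %s" % name)
```

Any name it prints is a candidate for installation. An import can also fail for reasons other than a missing package, so treat the result as a hint rather than a diagnosis.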
2. No data was scraped; the cause turned out to be a spelling mistake.
E:\tutorial>scrapy crawl dmoz
2015-10-30 13:44:02 [scrapy] INFO: Scrapy 1.0.3 started (bot: tutorial)
2015-10-30 13:44:02 [scrapy] INFO: Optional features available: ssl, http11
2015-10-30 13:44:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-10-30 13:44:02 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-10-30 13:44:03 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-10-30 13:44:03 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-10-30 13:44:03 [scrapy] INFO: Enabled item pipelines:
2015-10-30 13:44:03 [scrapy] INFO: Spider opened
2015-10-30 13:44:03 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-10-30 13:44:03 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-10-30 13:44:03 [scrapy] INFO: Closing spider (finished)
2015-10-30 13:44:03 [scrapy] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 10, 30, 5, 44, 3, 292000),
 'log_count/DEBUG': 1,
 'log_count/INFO': 7,
 'start_time': datetime.datetime(2015, 10, 30, 5, 44, 3, 282000)}
2015-10-30 13:44:03 [scrapy] INFO: Spider closed (finished)
In dmoz_spiders.py under the spiders directory, I had written start_urls as start_url. Sigh ╮(╯▽╰)╭ It should read:
start_urls = [
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
]
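The log shows no error because a misspelled attribute such as start_url is still a valid class attribute. As far as I understand, Scrapy's default start_requests only reads start_urls, so the spider opens with nothing to crawl and closes immediately. One way to catch this kind of typo early is to compare a spider's attribute names against the expected ones. This is only a sketch: KNOWN_ATTRS and suspicious_attrs are hypothetical names made up for illustration, and the DmozSpider class here is a plain stand-in that does not import Scrapy.

```python
import difflib

# Attribute names a spider commonly defines; an assumed, non-exhaustive list.
KNOWN_ATTRS = ["name", "allowed_domains", "start_urls"]


def suspicious_attrs(cls, known=KNOWN_ATTRS):
    """Map each likely-misspelled class attribute to the name it resembles."""
    hits = {}
    for attr in vars(cls):
        # Skip private/dunder names and names that are already correct.
        if attr.startswith("_") or attr in known:
            continue
        close = difflib.get_close_matches(attr, known, n=1, cutoff=0.8)
        if close:
            hits[attr] = close[0]
    return hits


class DmozSpider(object):  # stand-in; a real spider subclasses scrapy.Spider
    name = "dmoz"
    start_url = [  # the typo from this post
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    ]


print(suspicious_attrs(DmozSpider))  # {'start_url': 'start_urls'}
```

The 0.8 cutoff is an arbitrary choice: strict enough to ignore unrelated attributes such as parse, loose enough to flag a dropped letter.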