Crawlers - Custom Commands in Scrapy

Running a single spider

from scrapy.cmdline import execute

if __name__ == '__main__':
    # Equivalent to running "scrapy crawl chouti --nolog" from the shell
    execute(["scrapy", "crawl", "chouti", "--nolog"])

Then right-click and run the .py file to start the spider named 'chouti'.

Running multiple spiders at once

The steps are as follows:

- Create a directory (any name, e.g. commands) at the same level as spiders
- Inside it, create a crawlall.py file (the file name becomes the custom command name)
- In settings.py, add the setting COMMANDS_MODULE = '<project name>.<directory name>'
- Run the command from the project directory: scrapy crawlall
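For the third step, the settings entry might look like this (assuming a project named `myproject` and a directory named `commands`; substitute your own names):

```python
# settings.py
# Tells Scrapy where to look for extra command modules.
# "myproject" and "commands" are placeholders for your project
# name and the directory created next to spiders/.
COMMANDS_MODULE = 'myproject.commands'
```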


The code is as follows:
# commands/crawlall.py
from scrapy.commands import ScrapyCommand


class Command(ScrapyCommand):

    requires_project = True

    def syntax(self):
        return '[options]'

    def short_desc(self):
        return 'Runs all of the spiders'

    def run(self, args, opts):
        # Newer Scrapy versions expose the spider list via spider_loader;
        # older versions used self.crawler_process.spiders.list()
        spider_list = self.crawler_process.spider_loader.list()
        for name in spider_list:
            # Schedule every spider, forwarding the parsed options
            self.crawler_process.crawl(name, **opts.__dict__)
        # Start the reactor; blocks until all spiders have finished
        self.crawler_process.start()
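Note how `run()` forwards the parsed options to every spider via `**opts.__dict__`. A minimal standalone sketch of that pattern, with a hypothetical `crawl` stub standing in for `crawler_process.crawl`:

```python
from argparse import Namespace

# Hypothetical stand-in for crawler_process.crawl, just to show
# how an options object is unpacked into keyword arguments.
def crawl(name, **kwargs):
    return name, kwargs

opts = Namespace(nolog=True)               # parsed command-line options
result = crawl("chouti", **opts.__dict__)  # each option becomes a kwarg
# result is ("chouti", {"nolog": True})
```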