A Scrapy spider deployed to a server is usually run either with nohup or with scrapyd. With nohup, the spider can die without you noticing, and you have to log in to the server to check or set up extra e-mail alerts. With scrapyd, deployment is a bit more involved and the feature set is limited, but otherwise it works fine.
SpiderKeeper is a tool for managing spiders, with deployment features similar to Scrapinghub's: it can deploy spiders to multiple servers, run them on a schedule, and show spider logs and run status.
Project home: https://github.com/DormyMo/SpiderKeeper
Install the dependencies:
1. supervisor: pip install supervisor
2. scrapyd: pip3 install scrapyd
3. SpiderKeeper: pip3 install SpiderKeeper
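All three packages install console scripts. Before going further, a quick way to confirm they landed on your PATH (the printed paths will differ per system):

which supervisord
which scrapyd
which spiderkeeper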
Configure scrapyd:
1. Create the scrapyd configuration file (for example at /etc/scrapyd/scrapyd.conf, one of the locations scrapyd reads by default):
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 0.0.0.0
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root

[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
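Before wiring scrapyd into supervisor, you can sanity-check the configuration by running scrapyd once by hand and querying its daemonstatus.json endpoint (a quick interactive test; it assumes the file above was saved somewhere scrapyd reads by default, such as /etc/scrapyd/scrapyd.conf):

scrapyd &
curl http://localhost:6800/daemonstatus.json   # should return JSON containing "status": "ok"
kill %1                                        # stop the test instance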
Configure supervisor:
1. Create the configuration folder and generate a configuration file:

mkdir /etc/supervisor
echo_supervisord_conf > /etc/supervisor/supervisord.conf
2. Edit the configuration file (vim /etc/supervisor/supervisord.conf) and change

;[include]
;files = relative/directory/*.ini

to

[include]
files = conf.d/*.conf
3. Create the conf.d folder: mkdir /etc/supervisor/conf.d
4. Add the scrapyd program configuration: vim /etc/supervisor/conf.d/scrapyd.conf
[program:scrapyd]
command=/usr/local/python3.5/bin/scrapyd
directory=/opt/SpiderKeeper
user=root
stderr_logfile=/var/log/scrapyd.err.log
stdout_logfile=/var/log/scrapyd.out.log
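Note that command= must point at the scrapyd executable actually installed on your machine; /usr/local/python3.5/bin/scrapyd and /opt/SpiderKeeper are this setup's paths, not universal ones. To find yours:

which scrapyd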
5. Add the spiderkeeper program configuration: vim /etc/supervisor/conf.d/spiderkeeper.conf
[program:spiderkeeper]
command=spiderkeeper --server=http://localhost:6800
directory=/opt/SpiderKeeper
user=root
stderr_logfile=/var/log/spiderkeeper.err.log
stdout_logfile=/var/log/spiderkeeper.out.log
6. Start supervisor: supervisord -c /etc/supervisor/supervisord.conf
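Once supervisord is up, supervisorctl can confirm that both programs are running, and restart either one individually:

supervisorctl -c /etc/supervisor/supervisord.conf status
supervisorctl -c /etc/supervisor/supervisord.conf restart scrapyd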
Using SpiderKeeper:
1. Log in at http://localhost:5000 (the default username and password are both admin).
2. Create a new project.
3. Package the spider files (see the scrapy.cfg note after the commands below):
pip3 install scrapyd-client
scrapyd-deploy --build-egg output.egg
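scrapyd-deploy must be run from the root of your Scrapy project, i.e. the directory containing scrapy.cfg. For reference, a minimal scrapy.cfg (the project and module names here are illustrative):

[settings]
default = myproject.settings

[deploy]
url = http://localhost:6800/
project = myproject

With --build-egg, scrapyd-deploy only packages the project into output.egg and exits; the upload happens through SpiderKeeper's web UI in the next step, not through the [deploy] target.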
4. Upload the packaged egg file.
SpiderKeeper can manage scrapyd instances on multiple servers; just pass an additional --server flag for each one.
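For example, to register two scrapyd nodes when starting SpiderKeeper (the second address is illustrative; repeating --server appends another server to the list):

spiderkeeper --server=http://localhost:6800 --server=http://192.168.0.2:6800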