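On Ubuntu, I use Scrapy to run crawl jobs on a schedule. Scrapy itself does not provide scheduled execution, so I use crontab to run the crawl periodically.
First, write the script that cron will run, cron.sh: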
    #! /bin/sh
    export PATH=$PATH:/usr/local/bin
    cd /home/zhangchao/CVS/testCron
    nohup scrapy crawl example >> example.log 2>&1 &
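Because the job is scheduled every minute, a crawl that takes longer than a minute could overlap with the next one. The following is only a minimal sketch of one way to guard against that, using an atomic mkdir as a lock. The names LOCK_DIR and run_exclusive, the lock path, and the exit code 75 are assumptions for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch: skip a run when the previous one is still going.
# LOCK_DIR and run_exclusive are hypothetical names chosen for this example.
LOCK_DIR="${LOCK_DIR:-/tmp/example_crawl.lock}"

run_exclusive() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        "$@"                 # run the wrapped command in the foreground
        status=$?
        rmdir "$LOCK_DIR"    # release the lock once the command finishes
        return "$status"
    fi
    echo "previous run still active, skipping" >&2
    return 75                # arbitrary non-zero code meaning "skipped"
}

# In cron.sh this could replace the nohup line, for example:
#   run_exclusive scrapy crawl example >> example.log 2>&1
```

Note that the wrapped command runs in the foreground here, since the lock has to be held until the crawl ends. If the script is killed before rmdir runs, the lock directory stays behind and has to be removed by hand; on Ubuntu, flock(1) is a common alternative that avoids this.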
Run crontab -e and specify the command to run and how often to run it. Here I need the crawl command scrapy crawl example to run every minute:
    # Edit this file to introduce tasks to be run by cron.
    #
    # Each task to run has to be defined through a single line
    # indicating with different fields when the task will be run
    # and what command to run for the task
    #
    # To define the time you can provide concrete values for
    # minute (m), hour (h), day of month (dom), month (mon),
    # and day of week (dow) or use '*' in these fields (for 'any').
    #
    # Notice that tasks will be started based on the cron's system
    # daemon's notion of time and timezones.
    #
    # Output of the crontab jobs (including errors) is sent through
    # email to the user the crontab file belongs to (unless redirected).
    #
    # For example, you can run a backup of all your user accounts
    # at 5 a.m every week with:
    # 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
    #
    # For more information see the manual pages of crontab(5) and cron(8)
    #
    # m h dom mon dow command
    */1 * * * * sh /home/zhangchao/CVS/testCron/cron.sh
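The same entry can also be added without opening an editor. This is a rough sketch under the assumption that the entry should be appended only when it is not already present; add_cron_line is a hypothetical helper name, not a crontab feature.

```shell
# add_cron_line is a hypothetical helper: it reads a crontab on stdin and
# prints it with the given entry appended, unless that exact line exists.
add_cron_line() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Assumed usage (this rewrites the current user's crontab):
#   ENTRY='*/1 * * * * sh /home/zhangchao/CVS/testCron/cron.sh'
#   crontab -l 2>/dev/null | add_cron_line "$ENTRY" | crontab -
```

Running the pipeline twice leaves a single copy of the entry, which makes it safer to use from a setup script than a plain append.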
After saving the crontab, I found that there was no cron log under /var/log/. The reason is that Ubuntu does not enable cron logging by default. To turn it on, do the following:
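Open the rsyslog configuration with emacs /etc/rsyslog.d/50-default.conf (root privileges are needed to save it) and uncomment the line that begins with cron.*.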
Then restart rsyslog: sudo service rsyslog restart
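The edit and restart above can also be scripted. The sketch below assumes the rule is commented out with a leading "#" directly in front of cron.*, as in the stock Ubuntu file; uncomment_cron_rule is a hypothetical helper name, and a .bak copy of the file is kept.

```shell
# uncomment_cron_rule is a hypothetical helper name.
# It strips the leading '#' from any line that starts with "#cron.*"
# and keeps a backup of the original file with a .bak suffix.
uncomment_cron_rule() {
    conf=$1
    sed -i.bak 's/^#[[:space:]]*\(cron\.\*\)/\1/' "$conf"
}

# Assumed usage on Ubuntu, run as root:
#   uncomment_cron_rule /etc/rsyslog.d/50-default.conf
#   service rsyslog restart
```

Other commented rules in the file are left untouched, since the pattern only matches lines whose first selector is cron.*.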
Finally, use tail -f /var/log/cron.log to follow the cron log. It shows that cron.sh is run once every minute.
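Instead of watching the log scroll by, the number of runs recorded so far can be counted. This is a small sketch that assumes each run leaves one log line mentioning the script name; count_cron_runs is a hypothetical helper name.

```shell
# count_cron_runs is a hypothetical helper: it prints how many lines in a
# log file mention the given script name (fixed-string match).
count_cron_runs() {
    log=$1
    script=$2
    # grep -c prints the count; it exits 1 when the count is 0, so ignore that.
    grep -cF -- "$script" "$log" || true
}

# Assumed usage:
#   count_cron_runs /var/log/cron.log cron.sh
```

Calling it twice a few minutes apart should show the count growing by roughly one per minute if the schedule is working.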
While on the subject, here is a quick review of common crontab schedules:
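Every minute   */1 * * * *   (same as * * * * *)
Every hour     0 * * * *     (at minute 0)
Every day      0 0 * * *     (at 00:00)
Every week     0 0 * * 0     (at 00:00 on Sunday)
Every month    0 0 1 * *     (at 00:00 on the 1st)
Every year     0 0 1 1 *     (at 00:00 on January 1)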
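Most of the schedules above also have shorthand nicknames described in crontab(5), such as @hourly and @daily. The sketch below maps each nickname to its five-field form so the two notations can be compared; cron_nickname_to_fields is a hypothetical helper name written for this example, not a cron command.

```shell
# cron_nickname_to_fields is a hypothetical helper: it prints the
# five-field schedule that a crontab(5) nickname stands for.
cron_nickname_to_fields() {
    case $1 in
        @hourly)            echo '0 * * * *' ;;
        @daily|@midnight)   echo '0 0 * * *' ;;
        @weekly)            echo '0 0 * * 0' ;;
        @monthly)           echo '0 0 1 * *' ;;
        @yearly|@annually)  echo '0 0 1 1 *' ;;
        *) echo "unknown nickname: $1" >&2; return 1 ;;
    esac
}

# Assumed usage:
#   cron_nickname_to_fields @weekly
```

There is no nickname for "every minute", so that schedule still has to be written out with the five fields.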