Use case: get an email notification when the spider closes or when it goes idle.
It is implemented on top of Twisted's non-blocking IO, and the code can live directly in the spider, or in a middleware or an extension, depending on your needs.
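As a quick illustration of the "directly in the spider" option, here is a minimal sketch: the spider name, URL and addresses are placeholders, and the MAIL_* values are assumed to live in settings.py just like in the extension further down. Scrapy's `closed(reason)` shortcut hooks the spider_closed signal, and `MailSender.from_settings()` builds the sender from the project settings.

```python
# Minimal sketch: send a notification mail from inside the spider itself.
# Placeholder names and addresses; MAIL_* values are read from settings.py.
import scrapy
from scrapy.mail import MailSender


class NotifySpider(scrapy.Spider):
    name = 'notify_demo'
    start_urls = ['https://example.com']

    def parse(self, response):
        self.logger.info('parsed %s', response.url)

    def closed(self, reason):
        # closed() is the shortcut for the spider_closed signal
        sender = MailSender.from_settings(self.settings)
        # returning the Deferred lets Scrapy wait for delivery before shutdown
        return sender.send(to=['you@example.com'],
                           subject='[%s] finished: %s' % (self.name, reason),
                           body='spider[%s] is closed' % self.name)
```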
Most of the tutorials I found online are either years old or just copied from the official docs, with no actual code at all, so I tried it out myself. I am still new to scraping, so please go easy on me!
Before reading the example code below, take a look at the official docs first and get familiar with the basic settings.
Official docs on sending e-mail:
<https://docs.scrapy.org/en/latest/topics/email.html?highlight=MailSender>
First, create an extendions (extensions) folder in the same directory as settings.py, and put the following code in a sendmail.py file inside it:
```python
import logging

from scrapy import signals
from scrapy.exceptions import NotConfigured
from scrapy.mail import MailSender

logger = logging.getLogger(__name__)


class SendEmail(object):

    def __init__(self, sender, crawler):
        self.sender = sender
        crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)
        crawler.signals.connect(self.spider_closed, signal=signals.spider_closed)

    @classmethod
    def from_crawler(cls, crawler):
        if not crawler.settings.getbool('MYEXT_ENABLED'):
            raise NotConfigured
        mail_host = crawler.settings.get('MAIL_HOST')  # SMTP server used to send the mail
        mail_port = crawler.settings.get('MAIL_PORT')  # SMTP port
        mail_user = crawler.settings.get('MAIL_USER')  # sender account
        mail_pass = crawler.settings.get('MAIL_PASS')  # the authorization code, NOT the password you registered with -- remember this!
        # The sender address and the SMTP account are the same here, so mail_user is passed for both
        sender = MailSender(mail_host, mail_user, mail_user, mail_pass, mail_port)
        h = cls(sender, crawler)
        return h

    def spider_idle(self, spider):
        logger.info('idle spider %s' % spider.name)

    def spider_closed(self, spider):
        logger.info("closed spider %s", spider.name)
        body = 'spider[%s] is closed' % spider.name
        subject = '[%s] good!!!' % spider.name
        # self.sender.send(to={'zfeijun@foxmail.com'}, subject=subject, body=body)
        return self.sender.send(to={'zfeijun@foxmail.com'}, subject=subject, body=body)
```
Why `return self.sender.send(...)` here? Because calling `sender.send(...)` without the `return` raises `builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'` (the mail is still delivered successfully). I don't fully understand the cause; if anyone knows, please enlighten me. The workaround comes from this issue:

<https://github.com/scrapy/scrapy/issues/3478>

Simply adding `return` in front of `sender.send` makes the error go away.
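For what it is worth, my reading of that issue is that `MailSender.send()` returns a Twisted Deferred and the spider_closed signal allows handlers to return Deferreds, so returning it makes Scrapy wait for delivery before tearing down the connection; without the `return`, shutdown races the send and triggers the `bio_read` error. A small variant of the handler above with an explicit errback (the log message is just illustrative):

```python
# Same spider_closed handler as in the extension, but keeping a reference to
# the Deferred returned by MailSender.send() and logging delivery failures.
def spider_closed(self, spider):
    logger.info("closed spider %s", spider.name)
    d = self.sender.send(to=['zfeijun@foxmail.com'],
                         subject='[%s] good!!!' % spider.name,
                         body='spider[%s] is closed' % spider.name)
    d.addErrback(lambda failure: logger.error('mail failed: %s', failure))
    return d  # let Scrapy wait for the mail before shutting down the reactor
```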
After the extension code is written, it needs to be enabled in settings.py:
```python
EXTENSIONS = {
    # 'scrapy.extensions.telnet.TelnetConsole': 300,
    'bukalapak.extendions.sendmail.SendEmail': 300,
}

MYEXT_ENABLED = True
```
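The extension also expects the MAIL_* values it reads in `from_crawler` to exist in settings.py. A sketch with placeholder values (fill in your own SMTP host, port, account and authorization code):

```python
# Placeholder values -- replace with your own SMTP host, port, account and
# authorization code (not the login password).
MAIL_HOST = 'smtp.example.com'
MAIL_PORT = 25
MAIL_USER = 'you@example.com'
MAIL_PASS = 'your-authorization-code'
```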
Please credit the source when reposting!