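A demo in Python (2.7) that takes webpage screenshots, queries a database, and sends an email. It uses selenium, PhantomJS, mailer, jinja2, MySQLdb and PIL's Image module. These are all fairly typical usages and quite reusable, so I'm writing it up to share.
This demo sends a weekly report email. The report contains records from a database plus screenshots of a specific element on a web page. On Linux you can use crontab to send it automatically every week. If you need to send a similar weekly report, this should make it easy!
Let's go straight to the code. It uses Python 2.7; installing the third-party modules is simple, so I won't go over it here.
All real data (database parameters, mail settings, URLs and so on) has been stripped out, so remember to fill in your own values.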
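One way to fill in those values without editing the script is to read them from environment variables. The sketch below assumes hypothetical variable names (REPORT_DB_HOST and so on) and a hypothetical helper load_settings; neither is part of the original demo. The defaults are the placeholder values used in the script.

```python
import os

# Hypothetical environment variable names; the defaults are the placeholder
# values from the demo script, so nothing changes if none of them is set.
DEFAULTS = {
    'REPORT_DB_HOST': '127.0.0.1',
    'REPORT_DB_PORT': '3306',
    'REPORT_DB_USER': 'test',
    'REPORT_DB_PASSWD': 'test',
    'REPORT_DB_NAME': 'test',
    'REPORT_SMTP_HOST': 'test.smtp.com',
    'REPORT_MAIL_FROM': 'test@123.com',
    'REPORT_MAIL_TO': 'test@123.com',
}


def load_settings(environ=None):
    """Return the settings dict, taking each value from the environment when present."""
    if environ is None:
        environ = os.environ
    settings = {}
    for key, default in DEFAULTS.items():
        settings[key] = environ.get(key, default)
    # MySQLdb.connect() expects the port as an int, but env values are strings
    settings['REPORT_DB_PORT'] = int(settings['REPORT_DB_PORT'])
    return settings
```

The values would then be passed along, e.g. `s = load_settings()` followed by `MySQLdb.connect(host=s['REPORT_DB_HOST'], port=s['REPORT_DB_PORT'], user=s['REPORT_DB_USER'], ...)`. Keep in mind that cron jobs run with a minimal environment, so the variables have to be set where cron can see them.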
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Author: lvs
import datetime
from time import sleep

import MySQLdb
import MySQLdb.cursors
from jinja2 import Environment, PackageLoader
from mailer import Mailer
from mailer import Message
from PIL import Image
from selenium import webdriver


def fetch_results():
    """Return the records whose start_time falls in the last 7 days, as dicts."""
    today = datetime.datetime.today()
    seven_day_ago = today - datetime.timedelta(days=7)
    today_str = today.strftime('%Y-%m-%d')
    seven_day_ago_str = seven_day_ago.strftime('%Y-%m-%d')
    db = MySQLdb.connect(host='127.0.0.1', port=3306, user='test', passwd='test',
                         db='test', charset='utf8',
                         cursorclass=MySQLdb.cursors.DictCursor)
    try:
        cursor = db.cursor()
        # Let the driver bind the values instead of formatting them into the SQL
        sql = "SELECT * FROM test.test WHERE start_time < %s AND start_time >= %s"
        cursor.execute(sql, (today_str, seven_day_ago_str))
        return cursor.fetchall()
    finally:
        db.close()


def screen_shot(event_id):
    """Screenshot an event's detail page and crop it to the element with id="main"."""
    driver = webdriver.PhantomJS(
        executable_path='/usr/local/phantomjs-2.1.1-linux-x86_64/bin/phantomjs')
    img_path = '/home/lvs/image/event_{}.png'.format(event_id)
    try:
        driver.set_page_load_timeout(5)
        driver.set_window_size(1920, 1080)
        url = 'http://test.com/detail?id={}'.format(event_id)
        driver.get(url)
        sleep(3)  # give the page's scripts time to render
        driver.save_screenshot(img_path)
        # Bounding box of the #main element, in screenshot pixels
        element = driver.find_element_by_id('main')
        left = int(element.location['x'])
        top = int(element.location['y'])
        right = int(element.location['x'] + element.size['width'])
        bottom = int(element.location['y'] + element.size['height'])
    finally:
        driver.quit()  # always shut PhantomJS down, even if the page fails to load
    im = Image.open(img_path)
    im = im.crop((left, top, right, bottom))
    im.save(img_path)


def send_mail(results):
    # Load templates/mail.html from the "jinja" package next to this script
    env = Environment(loader=PackageLoader('jinja', 'templates'))
    template = env.get_template('mail.html')
    message = Message(From='test@123.com', To='test@123.com', charset='utf-8')
    message.Subject = 'This is the email subject'
    message.Html = template.render(results=results)
    for r in results:
        img_path = '/home/lvs/image/event_{}.png'.format(r['id'])
        # With cid: the image is embedded in the HTML body (<img src="cid:...">)
        message.attach(img_path, cid=r['id'])
        # Without cid: the image is sent as a regular attachment
        message.attach(img_path)
    sender = Mailer('test.smtp.com')
    sender.send(message)


if __name__ == '__main__':
    data = fetch_results()
    for row in data:
        screen_shot(row['id'])
    send_mail(data)
```
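fetch_results() reads from the database and returns the results: the records whose start_time falls within the past seven days, up to but not including today. Thanks to DictCursor, each row comes back as a dict keyed by column name. Nothing special here.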
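To try the date-window logic without a MySQL server, here is a minimal sketch that swaps in an in-memory sqlite3 database (an assumption for illustration only; the table layout and the helpers week_window and fetch_week are hypothetical). It uses the same half-open window [today - 7 days, today). Assuming start_time is a DATETIME column, comparing it with a bare date means midnight, so records from today itself are excluded, just as in the original query.

```python
import datetime
import sqlite3


def week_window(today):
    """Return (start, end) date strings for the half-open window [today - 7 days, today)."""
    seven_day_ago = today - datetime.timedelta(days=7)
    return seven_day_ago.strftime('%Y-%m-%d'), today.strftime('%Y-%m-%d')


def fetch_week(conn, today):
    """Return the rows whose start_time falls in the window, as dicts keyed by column name."""
    start, end = week_window(today)
    conn.row_factory = sqlite3.Row  # name-addressable rows, similar in spirit to DictCursor
    # sqlite3 uses "?" placeholders where MySQLdb uses "%s"; either way the driver binds the values
    cur = conn.execute(
        "SELECT * FROM test WHERE start_time < ? AND start_time >= ? ORDER BY id",
        (end, start))
    return [dict(row) for row in cur.fetchall()]


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE test (id INTEGER, name TEXT, start_time TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
        (1, 'before window', '2023-12-31 23:59:59'),
        (2, 'first day', '2024-01-01 09:00:00'),
        (3, 'mid week', '2024-01-05 12:00:00'),
        (4, 'today', '2024-01-08 08:00:00'),
    ])
    rows = fetch_week(conn, datetime.datetime(2024, 1, 8))
    print([row['id'] for row in rows])  # [2, 3]
```

Record 1 is older than seven days and record 4 falls on "today", so only 2 and 3 are selected.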
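screen_shot(event_id) takes the web page screenshot; event_id is passed in as a URL parameter. It is implemented with selenium + PhantomJS, both very typical tools for Python web scraping. Note how Image is used to cut out the DOM element whose id is main: the screenshot is cropped to that element's bounding box, computed from element.location and element.size. The cropped image is then saved locally.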
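The crop step boils down to turning Selenium's element.location (a dict with 'x' and 'y') and element.size (a dict with 'width' and 'height') into the (left, upper, right, lower) box that PIL's crop() takes. A minimal sketch as a standalone helper follows; the name element_box and the optional clamping to the screenshot's size are my additions, not part of the original script.

```python
def element_box(location, size, image_size=None):
    """Convert Selenium's element.location / element.size dicts into a PIL crop box.

    Returns (left, top, right, bottom). If image_size=(width, height) is given,
    the box is clamped so it never extends past the edges of the screenshot.
    """
    left = int(location['x'])
    top = int(location['y'])
    right = int(location['x'] + size['width'])
    bottom = int(location['y'] + size['height'])
    if image_size is not None:
        width, height = image_size
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
    return left, top, right, bottom
```

Because the original script quits the driver before opening the image, location and size would need to be read into local variables first, e.g. `loc, sz = element.location, element.size`, and then after `driver.quit()` call `im.crop(element_box(loc, sz, im.size))`.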
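send_mail(results), naturally, sends the email, using mailer and a jinja2 template. The line env = Environment(loader=PackageLoader('jinja', 'templates')) is how jinja2 loads templates: the template is mail.html, in the templates directory of the jinja package that sits in the same directory as this .py script (so that directory must be an importable package, i.e. contain an __init__.py). Also take a look at how images are embedded in the mail body versus sent as attachments.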
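To make the embed-versus-attach distinction concrete, here is a sketch of the same idea built with only Python's standard email package. This is a swapped-in technique, not how mailer works internally, and build_report is a hypothetical name. An inline image is a MIME part with a Content-ID header placed in a multipart/related container next to the HTML; the HTML points at it with a cid: URL.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_report(html, images):
    """Build a multipart/related message whose HTML can show images via cid: URLs.

    images maps a content id (e.g. an event id) to the PNG bytes of that image.
    """
    msg = MIMEMultipart('related')
    msg['Subject'] = 'Weekly report'
    msg['From'] = 'test@123.com'
    msg['To'] = 'test@123.com'
    msg.attach(MIMEText(html, 'html', 'utf-8'))
    for cid, data in sorted(images.items()):
        # _subtype is given explicitly so the image type is not guessed from the bytes
        part = MIMEImage(data, _subtype='png')
        # <img src="cid:42"> in the HTML resolves to the part whose Content-ID is <42>
        part.add_header('Content-ID', '<{}>'.format(cid))
        part.add_header('Content-Disposition', 'inline', filename='event_{}.png'.format(cid))
        msg.attach(part)
    return msg
```

A regular attachment, by contrast, carries no Content-ID and is marked `Content-Disposition: attachment`. The resulting message could be sent with `smtplib.SMTP('test.smtp.com').sendmail(msg['From'], [msg['To']], msg.as_string())`.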
The content of mail.html is as follows:
```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <style>
        .myimg img {
            max-width: 400px;
            max-height: 200px;
        }
    </style>
</head>
<body>
<div>
    <div>
        <div>
            <p>Events from the past week:</p>
        </div>
        <div>
            <table style="margin: 10px auto; border-collapse:collapse;" border="1" bordercolor="#a0c6e5">
                <tr>
                    <th>Event name</th>
                    <th>Event type</th>
                    <th>Start time</th>
                    <th>End time</th>
                    <th>Location</th>
                    <th>Description</th>
                    <th>Details</th>
                </tr>
                {% for row in results %}
                <tr>
                    <td>{{row["name"]}}</td>
                    <td>{{row["type"]}}</td>
                    <td>{{row["start_time"]}}</td>
                    <td>{{row["end_time"]}}</td>
                    <td>{{row["place"]}}</td>
                    <td>{{row["description"]}}</td>
                    <td class="myimg"><img src="cid:{{row['id']}}"></td>
                </tr>
                {% endfor %}
            </table>
        </div>
    </div>
</div>
</body>
</html>
```
The jinja variable row is a dict corresponding to one database record, keyed by the table's column names; remember to replace them with your own.
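Since jinja2's default Undefined renders a missing key as an empty string, a wrong column name produces a silently blank cell rather than an error. A small pre-render check can catch that. This is a sketch: TEMPLATE_KEYS lists the keys mail.html reads, and missing_keys is a hypothetical helper, not part of the original script.

```python
# Keys that mail.html reads from each row; adjust if your table's column names differ.
TEMPLATE_KEYS = ('id', 'name', 'type', 'start_time', 'end_time', 'place', 'description')


def missing_keys(rows, required=TEMPLATE_KEYS):
    """Map each row's index to the template keys it lacks; an empty dict means all rows are fine."""
    problems = {}
    for i, row in enumerate(rows):
        missing = [k for k in required if k not in row]
        if missing:
            problems[i] = missing
    return problems
```

In send_mail this could run before `template.render(...)`, raising an error (or logging) whenever `missing_keys(results)` is non-empty.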
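The last column of each row is the screenshot of the web page. Be sure to reference it via cid in the img tag's src attribute here. Pointing src at the image file the way you would in an ordinary web page will not work, because the recipient's mail client cannot reach files on the sending machine!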
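A related slip is a cid: reference in the rendered HTML that has no matching embedded image, which shows up as a broken image in the mail. The sketch below (the regex and the helper missing_cids are my own illustration, not part of the original) lists such dangling references given the ids that were attached with a cid.

```python
import re

# Matches src="cid:..." or src='cid:...'; group 1 is the content id
CID_SRC = re.compile(r'''src\s*=\s*["']cid:([^"']+)["']''', re.IGNORECASE)


def missing_cids(html, attached_ids):
    """Return, sorted, the content ids referenced in html that have no attached part."""
    referenced = set(CID_SRC.findall(html))
    # ids may be ints (e.g. r['id'] from the database); compare them as strings
    attached = set(str(i) for i in attached_ids)
    return sorted(referenced - attached)
```

For example, `missing_cids(message.Html, [r['id'] for r in results])` should return an empty list before the mail is sent.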
Personal blog: www.hellolvs.cn