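I recently noticed that kennethreitz, the author of requests, has released a new library, requests-html, so I tried it out for practice. The library aims to make parsing HTML (for example, scraping web pages) as simple and intuitive as possible.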
Official documentation: http://html.python-requests.org/
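Let's scrape the NetEase 11-choose-5 (11選5) lottery results. First, open the site, then open the browser's developer tools and find the corresponding HTML.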
from requests_html import HTMLSession

session = HTMLSession()

def getData():
    # Fetch the results page and narrow it down to the main section
    response = session.get('http://caipiao.163.com/award/11xuan5/')
    content = response.html.find('section.main', first=True)
    body = content.find('tbody')
    itemDicts = dict()
    for tr in body:
        # Each td.start cell holds one draw
        cells = tr.find('td.start')
        for td in cells:
            try:
                period = td.attrs['data-period']
                award = td.attrs['data-award']
                print("No.: " + td.text + " Period: " + period + " Winning numbers: " + award)
                itemDicts[period] = award
            except KeyError as e:
                # Draws that have not been announced yet lack these attributes
                print('except: ', e)
            finally:
                print('finally')
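To make the selectors concrete, here is a minimal offline sketch of the kind of markup the code above walks through. The HTML fragment is hypothetical (the real page markup is not reproduced in this post and may differ), and the standard-library `html.parser` stands in for requests-html so the snippet runs without network access or extra packages. Only the class name `start` and the attributes `data-period` and `data-award` come from the code above; `SAMPLE_HTML` and `StartCellParser` are illustrative names. One cell deliberately omits the data attributes, which is the case the `except KeyError` branch guards against.

```python
from html.parser import HTMLParser

# Hypothetical fragment, not copied from the real page.
# The middle cell has no data-* attributes.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td class="start" data-period="18050702" data-award="03 05 07 09 11">02</td>
    <td class="start">03</td>
    <td class="start" data-period="18050701" data-award="01 02 04 06 08">01</td>
  </tr>
</tbody></table>
"""


class StartCellParser(HTMLParser):
    """Collect the attribute dict of every <td class="start"> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "td" and "start" in classes:
            self.cells.append(attributes)


parser = StartCellParser()
parser.feed(SAMPLE_HTML)
cells = parser.cells
for attributes in cells:
    # .get() returns None where a plain [] lookup would raise KeyError
    print(attributes.get("data-period"), attributes.get("data-award"))
```

With requests-html itself, `td.attrs` plays the role of the `attributes` dict here.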
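Some draws have not been announced yet, so their cells have no winning numbers; that is why the attribute lookups are wrapped in try...except. The page is laid out as a table, so we also need to sort the results by period number.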
    # Still inside getData(): sort the period numbers in ascending order
    sortItemDict = sorted(itemDicts.keys(), reverse=False)
    # print(sortItemDict)
    for key in sortItemDict:
        print("Period:", key, " Winning numbers:", itemDicts[key])
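The skip-and-sort logic can also be written without exceptions. The sketch below is one possible variant, not the approach used in this post: it assumes each cell's attributes are available as a plain dict (as `td.attrs` is in requests-html) and that period numbers are digit-only strings. The helper name `collect_awards` and the sample data are made up for illustration.

```python
def collect_awards(cell_attrs):
    """Return (period, award) pairs sorted by period, skipping unannounced draws."""
    awards = {}
    for attrs in cell_attrs:
        period = attrs.get("data-period")
        award = attrs.get("data-award")
        if period is None or award is None:
            continue  # no result published for this cell yet
        awards[period] = award
    # Assumption: periods are digit-only strings. Sorting on int() keeps
    # numeric order even if the strings ever differ in length.
    return sorted(awards.items(), key=lambda item: int(item[0]))


# Hypothetical attribute dicts standing in for td.attrs
sample = [
    {"data-period": "18050702", "data-award": "03 05 07 09 11"},
    {},
    {"data-period": "18050701", "data-award": "01 02 04 06 08"},
]
rows = collect_awards(sample)
for period, award in rows:
    print("Period:", period, " Winning numbers:", award)
```

Sorting the string keys directly, as the code above does, gives the same order as long as all period numbers have the same length.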
Final result:
from requests_html import HTMLSession

session = HTMLSession()

def getData():
    # Fetch the results page and narrow it down to the main section
    response = session.get('http://caipiao.163.com/award/11xuan5/')
    content = response.html.find('section.main', first=True)
    body = content.find('tbody')
    itemDicts = dict()
    for tr in body:
        # Each td.start cell holds one draw
        cells = tr.find('td.start')
        for td in cells:
            try:
                period = td.attrs['data-period']
                award = td.attrs['data-award']
                print("No.: " + td.text + " Period: " + period + " Winning numbers: " + award)
                itemDicts[period] = award
            except KeyError as e:
                # Draws that have not been announced yet lack these attributes
                print('except: ', e)
            finally:
                print('finally')
    # Sort the period numbers in ascending order before printing
    sortItemDict = sorted(itemDicts.keys(), reverse=False)
    # print(sortItemDict)
    for key in sortItemDict:
        print("Period:", key, " Winning numbers:", itemDicts[key])

if __name__ == '__main__':
    getData()