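The code is simple. It uses requests for network access, BeautifulSoup for parsing the page text, and re for extracting text with regular expressions. The first two must be installed with pip install (the packages are requests and beautifulsoup4); the last is a built-in module and needs no installation. The code is below, a mere twenty-seven lines: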
#encoding=utf-8
from bs4 import BeautifulSoup
import requests
import re

user_agent='Mozilla/4.0 (compatible;MEIE 5.5;windows NT)'
headers={'User-Agent':user_agent}

dic={}  # dictionary that stores each year-month and its post count
for i in range(1,90):
    html=requests.get('='+str(i),headers=headers)
    soup=BeautifulSoup(html.text,'html.parser',from_encoding='utf-8')
    for descDiv in soup.find_all(class_="postDesc2"):
        rawInfo=descDiv.text  # text of the div with class="postDesc2"
        yearMonth=re.search(r'\d{4}-\d{2}',rawInfo).group()  # match the year-month with a regex and take its value

        # store the year-month in the dictionary; if it already exists, add one to its count
        if yearMonth in dic:
            dic[yearMonth]=dic[yearMonth]+1
        else:
            dic[yearMonth]=1

list=sorted(dic.items(),key=lambda x:x[1])  # turn the dictionary into a list sorted by count
list.reverse()

for item in list:
    print(item)
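The extract-and-count step can also be written more compactly. Below is a minimal sketch, assuming the standard library's collections.Counter is acceptable in place of the hand-written dictionary update; the sample strings in raw_infos are made up for illustration and merely stand in for the text of each postDesc2 element. It also skips entries with no date, where the re.search(...).group() call above would raise an AttributeError.

```python
import re
from collections import Counter

# Hypothetical sample strings standing in for descDiv.text values;
# real input would come from the crawled pages.
raw_infos = [
    "posted @ 2019-11-03 15:26",
    "posted @ 2019-11-01 09:10",
    "posted @ 2017-09-12 08:00",
    "no date here",
]

pattern = re.compile(r'\d{4}-\d{2}')
counts = Counter()
for raw in raw_infos:
    match = pattern.search(raw)
    if match is None:
        # No year-month found: skip the entry instead of failing on .group()
        continue
    counts[match.group()] += 1

# most_common() returns (year-month, count) pairs, highest count first
ranked = counts.most_common()
for item in ranked:
    print(item)
```

One difference to keep in mind: most_common() keeps tied months in first-seen order, whereas sorting and then reversing, as in the script above, reverses the order of ties.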
The result is as follows:
('2017-09', 80)
('2019-10', 66)
('2018-04', 56)
('2018-05', 45)
('2013-09', 43)
('2019-09', 42)
('2017-08', 38)
('2019-03', 37)
('2013-08', 32)
('2017-11', 32)
('2014-07', 26)
('2014-12', 22)
('2017-06', 21)
('2017-12', 21)
('2017-01', 20)
('2018-03', 19)
('2019-08', 18)
('2016-07', 17)
('2013-11', 15)
('2014-08', 15)
('2016-03', 15)
('2013-10', 14)
('2014-04', 14)
('2014-05', 14)
('2015-01', 14)
('2019-11', 13)
('2014-11', 12)
('2016-08', 12)
('2015-07', 10)
('2016-02', 9)
('2017-07', 9)
('2014-01', 8)
('2014-10', 7)
('2015-08', 7)
('2018-01', 7)
('2015-04', 6)
('2014-02', 5)
('2015-06', 5)
('2017-10', 5)
('2013-12', 4)
('2015-02', 4)
('2015-05', 4)
('2014-03', 3)
('2017-02', 3)
('2014-09', 2)
('2015-12', 2)
('2017-03', 2)
('2018-06', 2)
('2018-07', 2)
('2019-05', 2)
('2014-06', 1)
('2015-11', 1)
('2016-05', 1)
('2016-06', 1)
('2016-10', 1)
('2017-04', 1)
('2017-05', 1)
('2019-04', 1)
('2019-07', 1)
Playing with Python now and then is quite fun; this is a skill I must not let slip.
--END-- November 3, 2019, 15:26:38
This is the result of running it on January 31, 2020:
C:\personal\programs\python>python 1.py
C:\Users\ufo\AppData\Local\Programs\Python\Python38\lib\site-packages\bs4\__init__.py:203: UserWarning: You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.
  warnings.warn("You provided Unicode markup but also provided a value for from_encoding. Your from_encoding will be ignored.")
('2017-09', 79)
('2020-01', 79)
('2019-11', 76)
('2019-12', 66)
('2019-10', 65)
('2018-04', 55)
('2018-05', 45)
('2019-09', 42)
('2019-03', 37)
('2017-11', 32)
('2014-12', 22)
('2017-06', 21)
('2017-12', 21)
('2017-01', 20)
('2018-03', 19)
('2017-08', 18)
('2016-07', 17)
('2019-08', 17)
('2016-03', 15)
('2015-01', 14)
('2014-11', 12)
('2016-08', 12)
('2014-08', 10)
('2015-07', 10)
('2016-02', 9)
('2017-07', 9)
('2014-10', 7)
('2015-08', 7)
('2018-01', 7)
('2015-04', 6)
('2015-06', 5)
('2017-10', 5)
('2015-02', 4)
('2015-05', 4)
('2017-02', 3)
('2014-09', 2)
('2015-12', 2)
('2017-03', 2)
('2018-06', 2)
('2018-07', 2)
('2019-05', 2)
('2015-11', 1)
('2016-05', 1)
('2016-06', 1)
('2016-10', 1)
('2017-04', 1)
('2017-05', 1)
('2019-04', 1)
('2019-07', 1)
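The UserWarning in this run appears because html.text is already a decoded string, so BeautifulSoup has nothing left to decode and ignores from_encoding. Here is a minimal sketch of two ways to avoid it, assuming the page really is UTF-8: either pass raw bytes (html.content in the script) together with from_encoding, or pass the decoded string and drop the argument. The markup in sample_bytes is invented for illustration and is not taken from the crawled site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one crawled page (UTF-8 encoded bytes,
# which is what requests exposes as response.content).
sample_bytes = '<div class="postDesc2">posted @ 2019-11-03 15:26</div>'.encode('utf-8')

# Option 1: give BeautifulSoup bytes, so from_encoding is actually used.
soup_from_bytes = BeautifulSoup(sample_bytes, 'html.parser', from_encoding='utf-8')

# Option 2: give BeautifulSoup an already-decoded string and omit from_encoding.
soup_from_text = BeautifulSoup(sample_bytes.decode('utf-8'), 'html.parser')

texts_from_bytes = [div.text for div in soup_from_bytes.find_all(class_="postDesc2")]
texts_from_text = [div.text for div in soup_from_text.find_all(class_="postDesc2")]
print(texts_from_bytes)
print(texts_from_text)
```

Both options should parse to the same content; the choice only decides whether requests or BeautifulSoup does the decoding.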