To run the program below, you need to install Beautiful Soup and requests. For installation instructions, see: http://www.javashuo.com/article/p-aeoxzaxa-c.html
# Analyze my own blog at https://www.cnblogs.com/xiandedanteng/p/?page=XX to see how many posts were published each month
from bs4 import BeautifulSoup
import requests
import re

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}

dic = {}  # dictionary mapping "YYYY-MM" to the number of posts in that month

# Pre-fill every month from 2013-08 through 2019-11 with a count of zero
for i in range(8, 13):
    dic["2013-" + "{:0>2d}".format(i)] = 0

for year in range(2014, 2019):
    for i in range(1, 13):
        dic[str(year) + "-" + "{:0>2d}".format(i)] = 0

for i in range(1, 12):
    dic["2019-" + "{:0>2d}".format(i)] = 0

# Walk through list pages 1 to 89 of the blog
for i in range(1, 90):
    html = requests.get('http://www.cnblogs.com/xiandedanteng/p/?page=' + str(i), headers=headers)
    # html.text is already a decoded str, so from_encoding is not passed (it would be ignored with a warning)
    soup = BeautifulSoup(html.text, 'html.parser')

    for descDiv in soup.find_all(class_="postDesc2"):
        rawInfo = descDiv.text  # text content of the div with class="postDesc2"
        match = re.search(r'\d{4}-\d{2}', rawInfo)  # use a regular expression to match the year and month
        if match is None:
            continue  # skip entries that contain no date
        yearMonth = match.group()

        # Store the year-month in the dictionary; if it already exists, add one to its count
        if yearMonth in dic:
            dic[yearMonth] = dic[yearMonth] + 1
        else:
            dic[yearMonth] = 1

# Print the dictionary
for item in dic.items():
    print(item)
The result is:
('2013-08', 28)
('2013-09', 43)
('2013-10', 14)
('2013-11', 15)
('2013-12', 4)
('2014-01', 8)
('2014-02', 5)
('2014-03', 3)
('2014-04', 14)
('2014-05', 14)
('2014-06', 1)
('2014-07', 26)
('2014-08', 15)
('2014-09', 2)
('2014-10', 7)
('2014-11', 12)
('2014-12', 22)
('2015-01', 14)
('2015-02', 4)
('2015-03', 0)
('2015-04', 6)
('2015-05', 4)
('2015-06', 5)
('2015-07', 10)
('2015-08', 7)
('2015-09', 0)
('2015-10', 0)
('2015-11', 1)
('2015-12', 2)
('2016-01', 0)
('2016-02', 9)
('2016-03', 15)
('2016-04', 0)
('2016-05', 1)
('2016-06', 1)
('2016-07', 17)
('2016-08', 12)
('2016-09', 0)
('2016-10', 1)
('2016-11', 0)
('2016-12', 0)
('2017-01', 20)
('2017-02', 3)
('2017-03', 2)
('2017-04', 1)
('2017-05', 1)
('2017-06', 21)
('2017-07', 9)
('2017-08', 38)
('2017-09', 80)
('2017-10', 5)
('2017-11', 32)
('2017-12', 21)
('2018-01', 7)
('2018-02', 0)
('2018-03', 19)
('2018-04', 56)
('2018-05', 45)
('2018-06', 2)
('2018-07', 2)
('2018-08', 0)
('2018-09', 0)
('2018-10', 0)
('2018-11', 0)
('2018-12', 0)
('2019-01', 0)
('2019-02', 0)
('2019-03', 37)
('2019-04', 1)
('2019-05', 2)
('2019-06', 0)
('2019-07', 1)
('2019-08', 18)
('2019-09', 42)
('2019-10', 66)
('2019-11', 17)
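The regex-and-count step can be tried offline, without hitting the blog. The following is only a minimal sketch: the footer strings are made up for illustration (the exact text inside the real class="postDesc2" divs is an assumption), and count_by_month is a hypothetical helper name, not part of the original program. It uses collections.Counter in place of the hand-rolled dictionary update.

```python
import re
from collections import Counter

# Hypothetical footer strings standing in for the text of class="postDesc2" divs.
# The wording on the real pages is an assumption; only the YYYY-MM part matters here.
SAMPLE_FOOTERS = [
    "posted @ 2019-11-04 09:06 author Views(12) Comments(0)",
    "posted @ 2019-11-01 21:15 author Views(30) Comments(1)",
    "posted @ 2019-10-28 08:00 author Views(7) Comments(0)",
    "no date here",
]


def count_by_month(footers):
    """Count how many strings fall in each "YYYY-MM" month."""
    counts = Counter()
    for raw in footers:
        match = re.search(r'\d{4}-\d{2}', raw)  # same pattern as the program above
        if match:  # strings without a date are skipped
            counts[match.group()] += 1
    return counts


if __name__ == "__main__":
    for item in sorted(count_by_month(SAMPLE_FOOTERS).items()):
        print(item)
```

A Counter starts every missing key at zero, so the if/else branch is not needed. Months with no posts will not appear unless they are pre-filled, which is why the original program seeds the dictionary first.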
Copy this text into Notepad++, strip out the parentheses, and save it as a CSV file. Then open the file in Excel and generate a chart from it.
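The Notepad++ step could probably be skipped by writing the CSV straight from Python. This is a rough sketch using the standard csv module; the function name write_month_counts, the column headers, and the file name month_counts.csv are my own choices for illustration, and the sample values are taken from the results listed in this post.

```python
import csv
import io


def write_month_counts(counts, out):
    """Write a {"YYYY-MM": count} mapping as CSV rows, sorted by month."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["month", "posts"])  # header row (column names are an assumption)
    for month, n in sorted(counts.items()):
        writer.writerow([month, n])


if __name__ == "__main__":
    # A few values from the results in this post
    sample = {"2019-09": 42, "2019-10": 66, "2019-11": 17}

    buf = io.StringIO()
    write_month_counts(sample, buf)
    print(buf.getvalue(), end="")

    # To write a real file that Excel can open (hypothetical file name):
    # with open("month_counts.csv", "w", newline="", encoding="utf-8") as f:
    #     write_month_counts(dic, f)
```

Because the keys are zero-padded "YYYY-MM" strings, a plain string sort already puts them in chronological order.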
Project download: https://files.cnblogs.com/files/xiandedanteng/6.everyMonthMyblog20191104.rar
--END-- 2019-11-04 09:06:52