[Python] Analyzing my own blog at https://www.cnblogs.com/xiandedanteng/p/?page=XX to see how many posts were published each month

To run the program below, you need to install Beautiful Soup and requests. For installation details, see: http://www.javashuo.com/article/p-aeoxzaxa-c.html

# Analyze my own blog at https://www.cnblogs.com/xiandedanteng/p/?page=XX and count the posts published each month
from bs4 import BeautifulSoup
import requests
import re

user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers={'User-Agent':user_agent}

dic={} # dictionary mapping "YYYY-MM" to the number of posts in that month

# Pre-fill every month from August 2013 through November 2019 with zero
for year in range(2013,2020):
     firstMonth=8 if year==2013 else 1
     lastMonth=11 if year==2019 else 12
     for month in range(firstMonth,lastMonth+1):
          dic["{}-{:02d}".format(year,month)]=0

for i in range(1,90):
    html=requests.get('http://www.cnblogs.com/xiandedanteng/p/?page='+str(i),headers=headers)
    soup=BeautifulSoup(html.text,'html.parser') # html.text is already a decoded str, so from_encoding would be ignored

    for descDiv in soup.find_all(class_="postDesc2"):
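         rawInfo=descDiv.text # text content of the div with class="postDesc2"
         yearMonth=re.search(r'\d{4}-\d{2}',rawInfo).group() # match the year and month with a regular expression and take the value

         # Store the year-month in the dictionary; if it already exists, add one to its count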
         if yearMonth in dic:
               dic[yearMonth]=dic[yearMonth]+1
         else:
               dic[yearMonth]=1

# Print the dictionary (enable when needed)
for item in dic.items():
    print(item)

The result is:

('2013-08', 28)
('2013-09', 43)
('2013-10', 14)
('2013-11', 15)
('2013-12', 4)
('2014-01', 8)
('2014-02', 5)
('2014-03', 3)
('2014-04', 14)
('2014-05', 14)
('2014-06', 1)
('2014-07', 26)
('2014-08', 15)
('2014-09', 2)
('2014-10', 7)
('2014-11', 12)
('2014-12', 22)
('2015-01', 14)
('2015-02', 4)
('2015-03', 0)
('2015-04', 6)
('2015-05', 4)
('2015-06', 5)
('2015-07', 10)
('2015-08', 7)
('2015-09', 0)
('2015-10', 0)
('2015-11', 1)
('2015-12', 2)
('2016-01', 0)
('2016-02', 9)
('2016-03', 15)
('2016-04', 0)
('2016-05', 1)
('2016-06', 1)
('2016-07', 17)
('2016-08', 12)
('2016-09', 0)
('2016-10', 1)
('2016-11', 0)
('2016-12', 0)
('2017-01', 20)
('2017-02', 3)
('2017-03', 2)
('2017-04', 1)
('2017-05', 1)
('2017-06', 21)
('2017-07', 9)
('2017-08', 38)
('2017-09', 80)
('2017-10', 5)
('2017-11', 32)
('2017-12', 21)
('2018-01', 7)
('2018-02', 0)
('2018-03', 19)
('2018-04', 56)
('2018-05', 45)
('2018-06', 2)
('2018-07', 2)
('2018-08', 0)
('2018-09', 0)
('2018-10', 0)
('2018-11', 0)
('2018-12', 0)
('2019-01', 0)
('2019-02', 0)
('2019-03', 37)
('2019-04', 1)
('2019-05', 2)
('2019-06', 0)
('2019-07', 1)
('2019-08', 18)
('2019-09', 42)
('2019-10', 66)
('2019-11', 17)
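
The counting step of the script can be tried offline, without fetching any pages. The following is a minimal sketch, not the original program: the sample strings are hypothetical and only assumed to resemble the text of a "postDesc2" div, and count_by_month is a helper name introduced here for illustration. Unlike the script above, it skips text that contains no date instead of raising an error.

```python
import re
from collections import Counter

# Hypothetical sample strings, assumed to resemble the text of a "postDesc2" div
samples = [
    "posted @ 2019-11-04 09:07 author views(10) comments(0)",
    "posted @ 2019-11-01 21:30 author views(25) comments(1)",
    "posted @ 2019-10-28 08:15 author views(7) comments(0)",
    "no date here",
]

# Same pattern as in the script: four digits, a hyphen, two digits
YEAR_MONTH = re.compile(r"\d{4}-\d{2}")

def count_by_month(texts):
    """Count how many texts fall into each "YYYY-MM" bucket."""
    counts = Counter()
    for text in texts:
        match = YEAR_MONTH.search(text)
        if match is None:
            # No date found: skip this text rather than crash on .group()
            continue
        counts[match.group()] += 1
    return counts

counts = count_by_month(samples)
for year_month, n in sorted(counts.items()):
    print(year_month, n)
```

A Counter starts every key at zero, so the if/else branch in the script is not needed here. Months with no posts simply do not appear unless they are pre-filled as in the script.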

Copy this text into Notepad++, strip the parentheses, and save it as a CSV file. Then open the file in Excel to generate a chart.
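
The manual Notepad++ step could also be done from Python with the standard csv module. The following is a rough sketch under some assumptions: write_counts_csv and the column names year_month and posts are names introduced here, the file name in the trailing comment is hypothetical, and the dictionary holds only the first three values from the results above.

```python
import csv
import io

def write_counts_csv(counts, out):
    """Write a {"YYYY-MM": count} mapping to a file-like object as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["year_month", "posts"])  # header row (column names are assumptions)
    for year_month in sorted(counts):
        writer.writerow([year_month, counts[year_month]])

# First three entries of the results listed above
dic = {"2013-08": 28, "2013-09": 43, "2013-10": 14}

# Write to an in-memory buffer so the example runs without touching the disk
buffer = io.StringIO()
write_counts_csv(dic, buffer)
print(buffer.getvalue())

# To write a real file instead (file name is hypothetical):
# with open("posts_per_month.csv", "w", newline="", encoding="utf-8") as f:
#     write_counts_csv(dic, f)
```

The resulting file can be opened in Excel directly, with no parentheses or quotes to clean up.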

Project download: https://files.cnblogs.com/files/xiandedanteng/6.everyMonthMyblog20191104.rar

--END-- 2019-11-04 09:06:52
