Blog Visitor Statistics

I wrote this code a long time ago and updated it again today.

I found that the matching of numbers with thousands separators (for example 1,234) had some bugs.

In addition, the per-region data is not scraped.
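One way to sidestep the thousands-separator problem is to strip the commas before converting to `int`. A minimal sketch; `parse_count` is a hypothetical helper name and is not part of the original script:

```python
def parse_count(text):
    """Convert a counter string such as '1,234' to an int.

    parse_count is a hypothetical helper, not part of the original script.
    """
    # Removing every comma handles any number of thousands groups.
    return int(str(text).strip().replace(',', ''))


print(parse_count('987'))
print(parse_count('1,234'))
print(parse_count('1,234,567'))
```

Unlike splitting on a single comma, this also works for values of one million and above.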

Data scraped from: http://s04.flagcounter.com/more7/XTPq/

Processing logic:

1. Scrape the data

2. Build the arrays: date, blog visitors, flag counter views

3. Save the data to a file

4. Save a pickle file

5. Generate the visits line chart
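Steps 1 and 2 come down to running three regular expressions over each page and collecting the matches column by column. The sketch below shows this offline. The patterns are the ones used in the script in this post (written as raw strings), but the two table rows in `sample_html` are made up purely to fit those patterns, so the real page markup may differ:

```python
import re

# Patterns as used in the script in this post, written as raw strings.
date_pt = re.compile(r'<font face=arial size=-1>(\w+ \d+, \d+)')
visitors_pt = re.compile(r'<font face=arial size=2>(\w+)</td><td>')
flagViews_pt = re.compile(r'<font face=arial size=2>(\S+)</font></td></tr>')

# Hypothetical rows, invented only to fit the patterns above.
row_1 = ('<tr><td><font face=arial size=-1>May 24, 2018</font></td>'
         '<td><font face=arial size=2>12</td><td>'
         '<font face=arial size=2>34</font></td></tr>')
row_2 = ('<tr><td><font face=arial size=-1>May 25, 2018</font></td>'
         '<td><font face=arial size=2>56</td><td>'
         '<font face=arial size=2>1,234</font></td></tr>')
sample_html = row_1 + '\n' + row_2

# Each findall returns one column; zip lines the columns up into rows.
dates = date_pt.findall(sample_html)
visitors = visitors_pt.findall(sample_html)
flag_views = flagViews_pt.findall(sample_html)
rows = list(zip(dates, visitors, flag_views))

for row in rows:
    print(row)
```

Note that every value is still a string at this point, including `'1,234'`, which is why a separate conversion step is needed afterwards.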

The scraping code is shown below; the full code is open source at: https://github.com/zpfbuaa/blogVisitors

# -*- coding: utf-8 -*-
# @Time    : 2018/5/25 1:15 PM
# @Author  : 伊甸一點
# @FileName: getHtml.py
# @Software: PyCharm
# @Blog    : http://zpfbuaa.github.io

import requests
import re
import time
import os


# Raw strings keep the regex escapes (\w, \d, \S) from being treated as string escapes.
date_pt = re.compile(r'<font face=arial size=-1>(\w+ \d+, \d+)')
visitors_pt = re.compile(r'<font face=arial size=2>(\w+)</td><td>')
flagViews_pt = re.compile(r'<font face=arial size=2>(\S+)</font></td></tr>')

def getTotalBlog(url, pages):

    date = []
    visitors = []
    flagViews = []

    for page in range(1, pages+1):
        newUrl = url + str(page)
        print(newUrl)

        # A timeout keeps the script from hanging forever on a stalled request.
        html = requests.get(newUrl, timeout=10).text
        item_date = date_pt.findall(html)
        item_visitors = visitors_pt.findall(html)
        item_flagViews = flagViews_pt.findall(html)

        date.extend(item_date)
        visitors.extend(item_visitors)
        flagViews.extend(item_flagViews)


    return date, visitors, flagViews

def change_data(date, visitors, flagViews):
    # Convert the scraped strings to ints. Removing all commas handles
    # any number of thousands groups ('1,234' as well as '1,234,567').
    print(len(visitors))
    print(len(flagViews))
    for i in range(0, len(date)):
        visitors[i] = int(str(visitors[i]).replace(',', ''))
        flagViews[i] = int(str(flagViews[i]).replace(',', ''))

    return date, visitors, flagViews

def printData(date, visitors, flagViews):
    print('Date    Visitors    Flag Counter Views')
    for i in range(0, len(date)):
        print(date[i],visitors[i],flagViews[i])

def writeToFile(date, visitors, flagViews, data_root='data/'):
    # Create the output directory if it does not exist yet.
    os.makedirs(data_root, exist_ok=True)

    today = time.strftime('%Y%m%d')
    data_file = os.path.join(data_root, 'blog_' + today)

    # The with-block closes the file even if a write fails.
    with open(data_file, 'w') as f:
        f.write('Date\tVisitors\tFlag Counter Views\n')
        for i in range(0, len(date)):
            f.write(date[i] + '\t' + str(visitors[i]) + '\t' + str(flagViews[i]) + '\n')
    return 1


url = 'http://s04.flagcounter.com/more7/XTPq/'
pages = 23
date, visitors, flagViews = getTotalBlog(url, pages)

# printData(date, visitors, flagViews)

date, visitors, flagViews = change_data(date, visitors, flagViews)

# printData(date, visitors, flagViews)

flag = writeToFile(date, visitors, flagViews)

print('Data Prepare Done!')
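The script above stops after step 3. Step 4 of the processing logic (saving a pickle file) is not shown in this post. A minimal sketch of how it could look, assuming the three lists are bundled into one dict; the names `save_pickle` and `load_pickle`, the dict keys, and the file name are assumptions and are not taken from the repository:

```python
import os
import pickle
import tempfile


def save_pickle(date, visitors, flagViews, path):
    """Store the three series in one pickle file (hypothetical helper)."""
    # Make sure the target directory exists before writing.
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    data = {'date': date, 'visitors': visitors, 'flagViews': flagViews}
    with open(path, 'wb') as f:
        pickle.dump(data, f)


def load_pickle(path):
    """Read the dict written by save_pickle (hypothetical helper)."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Made-up sample values so the sketch runs without network access.
demo_path = os.path.join(tempfile.mkdtemp(), 'data', 'blog_demo.pkl')
save_pickle(['May 25, 2018'], [56], [1234], demo_path)
restored = load_pickle(demo_path)
print(restored)
```

Keeping the parsed integers in a pickle means the charting step can reload them directly instead of re-scraping or re-parsing the text file.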

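Below is the line chart of visits as of January 12, 2019:

[Figure: line chart of visits]

Flag counter statistics chart:

[Figure: flag counter views]

Difference between the two:

[Figure: line chart of the difference in visits]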
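The difference chart shows, for each day, the gap between the two counters. A minimal sketch of that calculation, assuming both lists have already been converted to integers and are aligned by date. `daily_diff` is a hypothetical helper name, the sample numbers are made up, and the direction of subtraction (flag counter views minus visitors) is also an assumption:

```python
def daily_diff(visitors, flagViews):
    """Return flagViews[i] - visitors[i] for every day (hypothetical helper)."""
    if len(visitors) != len(flagViews):
        raise ValueError('both series must have the same length')
    return [f - v for v, f in zip(visitors, flagViews)]


# Made-up sample values for three days.
diffs = daily_diff([12, 56, 40], [34, 1234, 40])
print(diffs)
```

The length check matters here because a regex that misses a row in only one column would silently shift every later difference.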
