Scraping COVID-19 outbreak data and visualizing it on a web page with Django + pyecharts

I'm stuck at home anyway, so I might as well do something to pass the time~

Let's try building a web page that visualizes the outbreak data with Django + pyecharts.

The outbreak data scraped here comes from the real-time outbreak pages of DXY (丁香園), Sogou, and Baidu.

This project received a star on GitHub, which was a huge encouragement. The scraper part had stopped working, so I came back to update the article.

First, a look at the results:

Navigation bar:

Geographic heat map of the outbreak:

Recovered/death line chart:

Public-opinion word cloud:

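I've uploaded the complete project code to GitHub; take a look if you're interested~

Link: https://github.com/dao233/Django

It's all in a single archive, since uploading the files one by one was too slow...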

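The DXY data to scrape, used for the geographic heat map:

DXY real-time outbreak page: https://ncov.dxy.cn/ncovh5/view/pneumonia

The Baidu data to scrape (historical data), used for the recovered/death line charts:

Baidu real-time outbreak page: https://voice.baidu.com/act/newpneumonia/newpneumonia

 

And this one, used to fetch media articles for the word cloud~

Sogou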

 emmm...

Main text:

Scraper:

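Scraping this data is actually simple: everything we need is in the HTML source, so we just fetch the page with requests and pull the data out with re. These sites don't even require a spoofed request header...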
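To make the "fetch the page, then match with re" idea concrete, here is a minimal sketch that runs offline on a made-up HTML snippet. The snippet and its `window.getAreaStat = ...}catch` layout are assumptions modeled on the patterns used in dxy_spider; the real page markup may differ and has changed over time. The helper name extract_json is hypothetical.

```python
import json
import re

# Hypothetical page source: the data is embedded as a JS assignment inside
# a try/catch block, which is the layout the patterns in dxy_spider rely on.
SAMPLE_HTML = (
    '<script id="getAreaStat">try { window.getAreaStat = '
    '[{"provinceName": "湖北省", "provinceShortName": "湖北", "confirmedCount": 100}]'
    '}catch(e){}</script>'
)


def extract_json(html, var_name):
    """Return the JSON value assigned to var_name, or None if it is absent."""
    # Non-greedy capture up to the first '}catch'; re.S lets '.' span newlines
    match = re.search(re.escape(var_name) + r' =(.*?)}catch', html, re.S)
    if match is None:
        return None
    return json.loads(match.group(1))


areas = extract_json(SAMPLE_HTML, 'getAreaStat')
print(areas[0]['provinceShortName'], areas[0]['confirmedCount'])
```

The non-greedy `(.*?)` matters: a greedy `.*` would run on to the last `}catch` on the page and swallow other scripts along with it.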

Scraper code:

import json
import re
import time

import requests
from pymongo import MongoClient


def insert_item(item, type_):
    '''
    Insert data into MongoDB. item is the data to insert; type_ selects the collection.
    '''
    databaseIp = '127.0.0.1'
    databasePort = 27017
    client = MongoClient(databaseIp, databasePort)
    mongodbName = 'dingxiang'
    db = client[mongodbName]
    if type_ == 'dxy_map':
        # Upsert: update the province's document, or insert it if missing
        db.dxy_map.update_one({'id': item['provinceName']}, {'$set': item}, upsert=True)
    elif type_ == 'dxy_count':
        # Plain insert: keep one document per crawl
        db.dxy_count.insert_one(item)
    else:
        # Upsert into the single history document
        db.baidu_line.update_one({}, {'$set': item}, upsert=True)
    print(item, 'inserted')
    client.close()


def dxy_spider():
    '''
    DXY scraper: gets each province's confirmed count for the geographic heat map.
    '''
    url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia'
    r = requests.get(url)
    r.encoding = 'utf-8'
    res = re.findall(r'tryTypeService1 =(.*?)}catch', r.text, re.S)
    if not res:
        return
    # Modification times of the data
    time_result = json.loads(res[0])
    res = re.findall(r'getAreaStat =(.*?)}catch', r.text, re.S)
    if not res:
        return
    # Confirmed counts per province
    all_result = json.loads(res[0])
    count = re.findall(r'getStatisticsService =(.*?)}catch', r.text, re.S)
    if count:
        count_res = json.loads(count[0])
        count_res['crawl_time'] = int(time.time())
        # Prefix positive increments with '+' (a missing key counts as 0)
        for key in ('confirmedIncr', 'seriousIncr', 'curedIncr', 'deadIncr'):
            if (count_res.get(key) or 0) > 0:
                count_res[key] = '+' + str(count_res[key])
        insert_item(count_res, 'dxy_count')
    for times in time_result:
        for item in all_result:
            if times['provinceName'] == item['provinceName']:
                # The per-province data has no timestamps, so merge them in here
                item['createTime'] = times['createTime']
                item['modifyTime'] = times['modifyTime']
                insert_item(item, 'dxy_map')


def baidu_spider():
    '''
    Baidu scraper: gets the historical data used to draw the line charts.
    '''
    url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia'
    r = requests.get(url=url)
    res = re.findall(r'"degree":"3408"}],"trend":(.*?]}]})', r.text, re.S)
    if res:
        data = json.loads(res[0])
        insert_item(data, 'baidu_line')


if __name__ == '__main__':
    dxy_spider()
    baidu_spider()

 

 

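Preparing the word-cloud data is a bit more troublesome, because Chinese word segmentation is a pain...

So I chose pkuseg, which has good accuracy (according to pkuseg's official benchmarks~).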

Code:

import json

import pkuseg
import requests
from lxml import etree

# Scraper part: fetch related articles, used to generate the word cloud
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}
url = 'https://sa.sogou.com/new-weball/api/sgs/epi-protection/list?type='
type_ = ['jujia', 'chunyun', 'waichu', 'kexue']


def down_text(type_):
    r = requests.get(url=url + type_, headers=headers)
    res = json.loads(r.text)
    for i in res['list']:
        print(i['linkUrl'])
        r = requests.get(url=i['linkUrl'], headers=headers)
        html = etree.HTML(r.text)
        # Get all the text of the article, one text node per line
        div = html.xpath('//div[@class="word-box ui-article"]//text()')
        string = ''.join(t + '\n' for t in div)
        # Append the text to note.txt
        with open('note.txt', 'a', encoding='utf-8') as f:
            f.write(string)


def down_all():
    for i in type_:
        down_text(i)


# Segmentation part: segment the downloaded text with pkuseg and count word frequencies
def word_count():
    with open('note.txt', 'r', encoding='utf-8') as f:
        text = f.read()
    # Custom dictionary: these words are kept whole during segmentation
    user_dict = ['冠狀病毒']  # "coronavirus"
    # Load the model with the default configuration
    seg = pkuseg.pkuseg(user_dict=user_dict)
    # Segment the text into a list of words
    text = seg.cut(text)
    # Read the stop-word list (one stop word per line). readlines() keeps the
    # trailing newline, so strip each line; a set makes lookups fast.
    with open('stop_word.txt', 'r', encoding='utf-8') as f:
        s_word = {line.strip() for line in f}
    count = {}
    # Count word frequencies
    for word in text:
        # Only count words that are not stop words and are longer than one character
        if word in s_word or len(word) == 1:
            continue
        count[word] = count.get(word, 0) + 1
    # Convert to the input pyecharts' word cloud expects: (word, frequency) pairs,
    # e.g. words = [("Sam S Club", 10000), ("Macys", 6181)], sorted by frequency
    li = sorted(count.items(), key=lambda x: x[1], reverse=True)
    # Write the list as a str so pyecharts can use it directly later,
    # instead of re-running the slow segmentation every time
    with open('word_count.txt', 'w', encoding='utf-8') as f:
        f.write(str(li))


if __name__ == '__main__':
    down_all()
    word_count()
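The counting loop in word_count can also be written with the standard library's collections.Counter, whose most_common() already returns (word, count) pairs sorted by frequency from highest to lowest, i.e. the list shape the word cloud consumes. A minimal sketch, assuming the text has already been segmented into a list of words; the sample tokens and stop words are made up, and count_words is a hypothetical helper:

```python
from collections import Counter


def count_words(words, stop_words):
    """Count words, skipping stop words and single-character tokens.

    Returns a list of (word, count) tuples, highest count first.
    """
    stop = set(stop_words)  # set membership is O(1), unlike scanning a list
    counts = Counter(w for w in words if w not in stop and len(w) > 1)
    return counts.most_common()


# Hypothetical segmenter output and stop-word list
tokens = ['疫情', '防控', '的', '疫情', '口罩', '我們', '口罩', '疫情']
pairs = count_words(tokens, ['我們'])
print(pairs)  # [('疫情', 3), ('口罩', 2), ('防控', 1)]
```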

Building the web app with Django + pyecharts

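First, following the pyecharts documentation, create a Django project with the front end and back end separated:

https://pyecharts.org/#/zh-cn/web_django

Then modify it step by step. Here is the code for views.py and the HTML: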

views.py

import ast
import json
import time

from django.http import HttpResponse
from django.shortcuts import render
from pymongo import MongoClient
from pyecharts.charts import Line, Map, WordCloud
from pyecharts import options as opts


def get_data(type_):
    '''
    Return a cursor over the collection selected by type_ ('map', 'dxy_count' or 'line').
    '''
    databaseIp = '127.0.0.1'
    databasePort = 27017
    # Connect to MongoDB
    client = MongoClient(databaseIp, databasePort)
    mongodbName = 'dingxiang'
    db = client[mongodbName]
    if type_ == 'map':
        collection = db.dxy_map
    elif type_ == 'dxy_count':
        collection = db.dxy_count
    elif type_ == 'line':
        collection = db.baidu_line
    else:
        raise ValueError('unknown type: %s' % type_)
    return collection.find()


def timestamp_2_date(timestamp):
    '''
    Convert a Unix timestamp in seconds to a "YYYY-mm-dd HH:MM" string.
    '''
    time_array = time.localtime(timestamp)
    return time.strftime("%Y-%m-%d %H:%M", time_array)


def json_response(data, code=200):
    '''
    Return data (here, the chart options) wrapped in a JSON response.
    '''
    data = {
        "code": code,
        "msg": "success",
        "data": data,
    }
    json_str = json.dumps(data)
    response = HttpResponse(json_str, content_type="application/json")
    response["Access-Control-Allow-Origin"] = "*"
    return response


JsonResponse = json_response


def index(request):
    '''
    Render the home page with the latest nationwide totals.
    '''
    # Latest crawl first; next() gives None when the collection is empty
    alls = next(get_data('dxy_count').sort("crawl_time", -1).limit(1), None)
    if alls:
        # modifyTime is stored in milliseconds
        alls['modifyTime'] = timestamp_2_date(alls['modifyTime'] / 1000)
    return render(request, "index.html", alls)


def heat_map(request):
    '''
    Geographic heat map, returned as JSON.
    '''
    # Pair each province's short name with its confirmed count, as pyecharts Map expects
    map_data = [[item['provinceShortName'], item['confirmedCount']] for item in get_data('map')]
    max_ = max((i[1] for i in map_data), default=0)
    map1 = (
        Map()
        # is_map_symbol_show=False hides the default red dots
        .add("Outbreak", map_data, "china", is_map_symbol_show=False)
        .set_global_opts(
            # Hide the legend
            legend_opts=opts.LegendOpts(is_show=False),
            title_opts=opts.TitleOpts(title="Outbreak map"),
            visualmap_opts=opts.VisualMapOpts(
                max_=max_,
                # Color the map in segments
                is_piecewise=True,
                # Custom ranges, each shown in its own color
                pieces=[
                    {"min": 1001, "label": ">1000", "color": "#70161d"},
                    {"max": 1000, "min": 500, "label": "500-1000", "color": "#cb2a2f"},
                    {"max": 499, "min": 100, "label": "100-499", "color": "#e55a4e"},
                    {"max": 99, "min": 10, "label": "10-99", "color": "#f59e83"},
                    {"max": 9, "min": 1, "label": "1-9", "color": "#fdebcf"},
                ],
            ),
        )
        # Global options as JSON (JsCode functions are quoted; used when the
        # front end and back end are separated)
        .dump_options_with_quotes()
    )
    return JsonResponse(json.loads(map1))


def cure_line(request):
    '''
    Recovered/death line chart, returned as JSON.
    '''
    # Query per request (not at import time) so the chart shows the latest crawl
    cure_data = get_data('line')[0]
    line2 = (
        Line()
        .add_xaxis(cure_data['updateDate'])
        .add_yaxis('Recovered', cure_data['list'][2]['data'], color='#5d7092',
                   linestyle_opts=opts.LineStyleOpts(width=2), is_smooth=True,
                   label_opts=opts.LabelOpts(is_show=False))
        .add_yaxis('Deaths', cure_data['list'][3]['data'], color='#29b7a3',
                   linestyle_opts=opts.LineStyleOpts(width=2), is_smooth=True,
                   label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Cumulative recovered/deaths', pos_top='top'),
            # Rotate x-axis labels by 45 degrees
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45)),
            yaxis_opts=opts.AxisOpts(
                type_="value",
                # Show split lines
                splitline_opts=opts.SplitLineOpts(is_show=True),
                # Hide the black y-axis line
                axisline_opts=opts.AxisLineOpts(is_show=False),
            ),
            tooltip_opts=opts.TooltipOpts(
                # Crosshair pointer, shown while the mouse hovers over the chart
                is_show=True, trigger="axis", axis_pointer_type="cross",
            ),
        )
        .dump_options_with_quotes()
    )
    return JsonResponse(json.loads(line2))


def confirm_line(request):
    '''
    Confirmed/suspected line chart, returned as JSON.
    '''
    cure_data = get_data('line')[0]
    line2 = (
        Line()
        .add_xaxis(cure_data['updateDate'])
        .add_yaxis('Confirmed', cure_data['list'][0]['data'], color='#f9b97c',
                   linestyle_opts=opts.LineStyleOpts(width=2), is_smooth=True,
                   label_opts=opts.LabelOpts(is_show=False))
        .add_yaxis('Suspected', cure_data['list'][1]['data'], color='#ae212c',
                   linestyle_opts=opts.LineStyleOpts(width=2), is_smooth=True,
                   label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Cumulative confirmed/suspected', pos_top='top'),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=45)),
            yaxis_opts=opts.AxisOpts(
                type_="value",
                splitline_opts=opts.SplitLineOpts(is_show=True),
                axisline_opts=opts.AxisLineOpts(is_show=False),
            ),
            tooltip_opts=opts.TooltipOpts(
                is_show=True, trigger="axis", axis_pointer_type="cross",
            ),
        )
        .dump_options_with_quotes()
    )
    return JsonResponse(json.loads(line2))


def word_cloud(request):
    '''
    Public-opinion word cloud, returned as JSON.
    '''
    with open('demo/data/word_count.txt', 'r', encoding='utf-8') as f:
        # The file holds str() of a list of (word, count) tuples;
        # ast.literal_eval parses it without executing arbitrary code, unlike eval
        li = ast.literal_eval(f.read())
    c = (
        WordCloud()
        .add("", li[:151], word_size_range=[20, 100], shape="circle")
        .set_global_opts(title_opts=opts.TitleOpts(title="Public-opinion word cloud"))
        .dump_options_with_quotes()
    )
    return JsonResponse(json.loads(c))
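To see how the piecewise visualMap in heat_map assigns colors, the same pieces can be mirrored in plain Python. This is only an illustration of the bucketing idea, not part of the app: ECharts does the matching itself in the browser. The helper name bucket_for is hypothetical, and the sketch assumes min and max are inclusive bounds.

```python
# Same ranges and colors as the pieces passed to VisualMapOpts in heat_map
PIECES = [
    {"min": 1001, "label": ">1000", "color": "#70161d"},
    {"max": 1000, "min": 500, "label": "500-1000", "color": "#cb2a2f"},
    {"max": 499, "min": 100, "label": "100-499", "color": "#e55a4e"},
    {"max": 99, "min": 10, "label": "10-99", "color": "#f59e83"},
    {"max": 9, "min": 1, "label": "1-9", "color": "#fdebcf"},
]


def bucket_for(count):
    """Return the first piece whose [min, max] range contains count, or None.

    A missing 'max' means the range is unbounded above.
    """
    for piece in PIECES:
        if piece["min"] <= count <= piece.get("max", float("inf")):
            return piece
    return None


for n in (0, 5, 250, 1000, 67000):
    piece = bucket_for(n)
    print(n, piece["label"] if piece else "no piece")
```

A province with 0 confirmed cases matches no piece, so it keeps the map's default fill instead of one of the five colors.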

 

index.html

<!DOCTYPE html>
<html lang="zh-CN">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <!-- The above 3 meta tags *must* come first in the head; any other head content *must* come after these tags! -->
    <title>Live updates</title>
    <script type="text/javascript" src="/static/echarts.min.js"></script>

    <script type="text/javascript" src="/static/echarts-wordcloud.min.js"></script>
    <script type="text/javascript" src="/static/maps/china.js"></script>
    <script src="https://cdn.bootcss.com/jquery/3.0.0/jquery.min.js"></script>
    <!-- Bootstrap -->
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/js/bootstrap.min.js"></script>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap.min.css" rel="stylesheet">
    <link href="/static/css/grid.css" rel="stylesheet">
  </head>
  <body>

  <img src="/static/imgs/timg.jpg" alt="" style="width: 100%;height: 450px">
  <span style="color: #666;margin-left: 25rem;">Nationwide statistics as of {{ timestamp }}</span>
    <div class="container-fluid ">
      <div class="row">
        <div class="col-md-2 col-md-offset-2" style="border-left: none;">
            <b>vs. yesterday <em style="color: rgb(247, 76, 49);">+{{ yesterdayIncreased.diagnosed }}</em></b>
            <strong style="color: rgb(247, 76, 49);">{{ diagnosed }}</strong>
            <span>Total confirmed</span>
        </div>
        <div class="col-md-2">
            <b>vs. yesterday <em style="color: rgb(247, 130, 7);">+{{ yesterdayIncreased.suspect }}</em></b>
            <strong style="color: rgb(247, 130, 7);">{{ suspect }}</strong>
            <span>Current suspected</span>
        </div>
        <div class="col-md-2" style="border-right: none;">
            <b>vs. yesterday <em style="color: rgb(40, 183, 163);">+{{ yesterdayIncreased.cured }}</em></b>
            <strong style="color: rgb(40, 183, 163);">{{ cured }}</strong>
            <span>Total recovered</span>
        </div>
        <div class="col-md-2">
            <b>vs. yesterday <em style="color: rgb(93, 112, 146);">+{{ yesterdayIncreased.death }}</em></b>
            <strong style="color: rgb(93, 112, 146);">{{ death }}</strong>
            <span>Total deaths</span>
        </div>

      </div>
    </div>

     <ul>
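        <li>Virus: SARS-CoV-2; the disease it causes is named COVID-19</li>
        <li>Source of infection: COVID-19 patients. Asymptomatic carriers may also be a source of infection</li>
        <li>Transmission: respiratory droplets and contact are the main routes. Aerosol, digestive-tract and other routes remain to be confirmed</li>
        <li>Susceptible population: the population is generally susceptible. Elderly people and those with underlying conditions become more seriously ill after infection; cases also occur in children and infants</li>
        <li>Incubation period: generally 3 to 7 days, no more than 14 days at most; patients may be infectious during incubation, though transmission from asymptomatic cases is very rare</li>
        <li>Host: wild animals, possibly the Chinese rufous horseshoe bat</li>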
    </ul>
    <div id="map" style="width:1000px; height:500px;margin:0 auto;margin-bottom: 2rem;"></div>
    <div id="confirm_line" style="width:1000px; height:500px;margin:0 auto;"></div>
    <div id="cure_line" style="width:1000px; height:500px;margin:0 auto;margin-bottom: 2rem;"></div>
    <div id="word_cloud" style="width:1000px; height:500px;margin:0 auto;margin-bottom: 2rem;"></div>
  <script type="text/javascript" src="/static/chart.js"></script>
  </body>
</html>

JavaScript is also used to render the charts; I won't paste the JS code here.

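First run spider.py to insert the data into MongoDB. The word-cloud script also needs to have produced word_count.txt, which the word_cloud view reads from demo/data/.

Then start Django's development server (python manage.py runserver).

Then visit: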

http://127.0.0.1:8000/demo/index/

END
