Last time I built a nationwide epidemic statistics visualization chart. This time I wanted to see whether I could implement the database update step as well. The first thing that came to mind was a Python crawler: it is easy to work with, and learning Python is something I will need going forward anyway.
After reading around and learning from what I found online, here is the code:
import requests
import json
import pymysql


def create():
    # Connect to MySQL and (re)create the info table
    db = pymysql.connect(host="localhost", user="root", password="0000",
                         database="grabdata_test", charset="utf8")
    cursor = db.cursor()
    cursor.execute("DROP TABLE IF EXISTS info")
    sql = """CREATE TABLE info (
        Id INT PRIMARY KEY AUTO_INCREMENT,
        Date VARCHAR(255),
        Province VARCHAR(255),
        City VARCHAR(255),
        Confirmed_num VARCHAR(255),
        Yisi_num VARCHAR(255),
        Cured_num VARCHAR(255),
        Dead_num VARCHAR(255),
        Code VARCHAR(255))"""
    cursor.execute(sql)
    db.close()


def insert(value):
    # Insert one row of city-level data
    db = pymysql.connect(host="localhost", user="root", password="0000",
                         database="grabdata_test", charset="utf8")
    cursor = db.cursor()
    sql = ("INSERT INTO info(Date,Province,City,Confirmed_num,Yisi_num,Cured_num,Dead_num,Code) "
           "VALUES (%s,%s,%s,%s,%s,%s,%s,%s)")
    try:
        cursor.execute(sql, value)
        db.commit()
        print('Insert succeeded')
    except Exception:
        db.rollback()
        print('Insert failed')
    db.close()


create()  # create the table

url = 'https://raw.githubusercontent.com/BlankerL/DXY-2019-nCoV-Data/master/json/DXYArea.json'
response = requests.get(url)
versionInfo = response.text
# print(versionInfo)  # print the raw data that was fetched

# json.load() reads from a file object, json.loads() reads from a string
jsonData = json.loads(versionInfo)

# Walk the results and insert the city-level records for China
for k in range(len(jsonData['results'])):
    if jsonData['results'][k]['countryName'] == '中国':
        provinceShortName = jsonData['results'][k]['provinceName']
        if provinceShortName == '待明确地区':
            continue
        for i in range(len(jsonData['results'][k]['cities'])):
            confirmnum = jsonData['results'][k]['cities'][i]['confirmedCount']
            yisi_num = jsonData['results'][k]['cities'][i]['suspectedCount']
            cured_num = jsonData['results'][k]['cities'][i]['curedCount']
            dead_num = jsonData['results'][k]['cities'][i]['deadCount']
            code = jsonData['results'][k]['cities'][i]['locationId']
            cityname = jsonData['results'][k]['cities'][i]['cityName']
            date = '2020-3-10'
            insert((date, provinceShortName, cityname, confirmnum,
                    yisi_num, cured_num, dead_num, code))
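One thing worth noting about the script: the date is hard-coded as '2020-3-10', so every run stamps the rows with the same day. If the entries in DXYArea.json carry an updateTime field in milliseconds (an assumption worth verifying against the actual JSON), the date could be derived per record with a small helper along these lines:

import time

def record_date(record, fallback='2020-3-10'):
    # Convert the record's updateTime (assumed: milliseconds since the epoch)
    # into a YYYY-MM-DD string; fall back to a fixed date if it is missing.
    update_ms = record.get('updateTime')
    if update_ms:
        return time.strftime('%Y-%m-%d', time.localtime(update_ms / 1000))
    return fallback

# Inside the loop above, something like:
#     date = record_date(jsonData['results'][k])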
This run crawls the epidemic data from https://raw.githubusercontent.com/BlankerL/DXY-2019-nCoV-Data/master/json/DXYArea.json. Running it in PyCharm gives the following output:
Now let's look at what is in the database:
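If you would rather spot-check the table from a script than from a GUI client, a minimal query sketch works too (it reuses the same connection settings as above; the ordering and LIMIT are just for a quick look and are not part of the original script):

import pymysql

db = pymysql.connect(host="localhost", user="root", password="0000",
                     database="grabdata_test", charset="utf8")
cursor = db.cursor()
# Confirmed_num is stored as VARCHAR, so cast it for a numeric sort
cursor.execute("SELECT Province, City, Confirmed_num, Cured_num, Dead_num FROM info "
               "ORDER BY CAST(Confirmed_num AS UNSIGNED) DESC LIMIT 10")
for row in cursor.fetchall():
    print(row)
db.close()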
We can see that the data has been imported into the database successfully!
Next, let's try to visualize it. Reusing the chart from last time, the result looks like this:
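For reference, below is one possible sketch of that kind of chart using pyecharts 1.x (an assumption on my part; the chart from the previous post may have been built differently). It sums the confirmed counts per province from the info table and renders them on a China map; the short_name helper and the colour-scale cap are illustrative choices rather than anything from the original code:

import pymysql
from pyecharts import options as opts
from pyecharts.charts import Map

# Pull province-level totals out of the table filled by the crawler
db = pymysql.connect(host="localhost", user="root", password="0000",
                     database="grabdata_test", charset="utf8")
cursor = db.cursor()
cursor.execute("SELECT Province, SUM(CAST(Confirmed_num AS UNSIGNED)) "
               "FROM info GROUP BY Province")
rows = cursor.fetchall()
db.close()

def short_name(name):
    # The built-in "china" map expects short province names (e.g. 湖北, not 湖北省)
    for suffix in ("壮族自治区", "回族自治区", "维吾尔自治区", "特别行政区", "自治区", "省", "市"):
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name

data_pair = [(short_name(province), int(total)) for province, total in rows]

chart = (
    Map()
    .add("确诊人数", data_pair, "china")
    .set_global_opts(
        title_opts=opts.TitleOpts(title="全国疫情地图"),
        visualmap_opts=opts.VisualMapOpts(max_=2000, is_piecewise=True),
    )
)
chart.render("epidemic_map.html")  # open the generated HTML file in a browser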
With that, this Python crawler exercise counts as a success! I could cry tears of joy...
I ran into quite a few problems and bugs while working with PyCharm along the way (so many tears). I'll save that tale of woe for the next post~