Pandemic 3D Globe from Scratch - VDEarth - 5 - Pandemic Data Crawler

Following the previous chapters, the front-end 3D globe display is in place. This chapter builds a crawler for the pandemic data.

The data is scraped with a Python crawler.

1 Runtime Environment

1. Python 3.7

2. Scrapy 2.0.1

3. Selenium

4. chromedriver (choose the build matching your Chrome version): http://npm.taobao.org/mirrors/chromedriver/

5. PhantomJS

6. MySQL / MongoDB, accessed via pymysql / pymongo

2 Getting Started

The most up-to-date pandemic data comes from the National Health Commission website, but this example is only a demo, so we scrape the QQ pandemic map page instead. Reference link:

https://news.qq.com/zt2020/page/feiyan.htm#/global?ct=United%20States&nojump=1

Install the environment and packages listed in section 1.
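If they are not installed yet, the Python packages can be pulled in with pip along these lines (exact versions beyond Scrapy 2.0.1 are not critical); chromedriver and PhantomJS are separate binaries that need to be downloaded and placed on the PATH:

pip install scrapy==2.0.1 selenium pymysql pymongo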

2.1 Create the crawler project

Use the scrapy command to create the crawler project; it generates a project folder containing the standard project template. Run:

F:
cd F:\mygithub\VDDataServer
scrapy startproject COVID19

2.2 Create the spider

Create a spider file:

cd COVID19
scrapy genspider Covid19Spider news.qq.com

That completes the skeleton of the crawler project; its structure is shown below.
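For reference, the generated project follows the standard Scrapy layout; the spider file name below is taken from the genspider command above, and minor details may differ between Scrapy versions:

COVID19/
    scrapy.cfg
    COVID19/
        __init__.py
        items.py
        middlewares.py
        pipelines.py
        settings.py
        spiders/
            __init__.py
            Covid19Spider.py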

3 Analyzing the Target Page

Open the pandemic map page https://news.qq.com/zt2020/page/feiyan.htm#/global?ct=United%20States&nojump=1. It shows both domestic (China) and overseas data; here we only scrape the list data.

Overseas data: the list lives under the element with id foreignWraper.

 

 

Domestic (China) data: the list lives under the element with id listWraper.

 

 

The data is rendered into the page HTML, so we scrape it with Scrapy + Selenium + PhantomJS.

4 Defining the Data

The page shows six data fields for each region.

 

 

Open the crawler project, find the items.py file generated by the Scrapy template, and define an item class. Its attributes are, in order: the collection/table name, region name, parent region name, new cases, current cases, total cases, cured, and deaths.

import scrapy


class Covid19Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    collection = table = 'covid19'
    name = scrapy.Field()
    parent = scrapy.Field()
    new = scrapy.Field()
    now = scrapy.Field()
    total = scrapy.Field()
    cure = scrapy.Field()
    death = scrapy.Field()

5 Spider Implementation

Switch to Covid19Spider.py under the spiders folder.
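Its header looks roughly like the sketch below. The Selector and Covid19Item imports are assumptions needed by the parse methods added later, and the spider name is assumed to match the scrapy crawl covid19spider command used in section 10:

import scrapy
from scrapy import Selector

from COVID19.items import Covid19Item


class Covid19Spider(scrapy.Spider):
    # name used by "scrapy crawl"; keep it consistent with the crawl command
    name = 'covid19spider'
    allowed_domains = ['news.qq.com']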

Modify the start_requests method, passing a page flag to each request through meta:

    def start_requests(self):
        # pages to scrape
        urls = ["https://news.qq.com/zt2020/page/feiyan.htm#/?ct=United%20States&nojump=1",
                "https://news.qq.com/zt2020/page/feiyan.htm#/global?ct=United%20States&nojump=1"]
        # send a request for each page
        for i in range(len(urls)):
            if i == 0:
                # domestic (China) page, handled by parse_China
                yield scrapy.Request(urls[i], callback=self.parse_China, meta={'page': i}, dont_filter=True)
            else:
                # overseas page, handled by parse_Outsee
                yield scrapy.Request(urls[i], callback=self.parse_Outsee, meta={'page': i}, dont_filter=True)

Add the domestic (China) parser method:

    # domestic (China) data
    def parse_China(self, response):

        provinces = response.xpath(
            '//*[@id="listWraper"]/table[2]/tbody').extract()

        for prn in provinces:
            item = Covid19Item()
            prnNode = Selector(text=prn)
            item['name'] = prnNode.xpath(
                '//tr[1]/th/p[1]/span//text()').extract_first().replace('', '')
            item['parent'] = ''
            item['new'] = prnNode.xpath(
                '//tr[1]/td[2]/p[2]//text()').extract_first()
            item['now'] = prnNode.xpath(
                '//tr[1]/td[1]/p[1]//text()').extract_first()
            item['total'] = prnNode.xpath(
                '//tr[1]/td[2]/p[1]//text()').extract_first()
            item['cure'] = prnNode.xpath(
                '//tr[1]/td[3]/p[1]//text()').extract_first()
            item['death'] = prnNode.xpath(
                '//tr[1]/td[4]/p[1]//text()').extract_first()

            cityNodes = prnNode.xpath('//*[@class="city"]').extract()
            for city in cityNodes:
                cityItem = Covid19Item()
                cityNode = Selector(text=city)
                cityItem['name'] = cityNode.xpath(
                    '//th/span//text()').extract_first().replace('', '')
                cityItem['parent'] = item['name']
                cityItem['new'] = ''
                cityItem['now'] = cityNode.xpath(
                    '//td[1]//text()').extract_first()
                cityItem['total'] = cityNode.xpath(
                    '//td[2]//text()').extract_first()
                cityItem['cure'] = cityNode.xpath(
                    '//td[3]//text()').extract_first()
                cityItem['death'] = cityNode.xpath(
                    '//td[4]//text()').extract_first()
                yield cityItem

            yield item

Add the overseas parser method:

    # overseas data
    def parse_Outsee(self, response):
        countries = response.xpath(
            '//*[@id="foreignWraper"]/table/tbody').extract()
        for country in countries:
            countryNode = Selector(text=country)
            item = Covid19Item()
            item['name'] = countryNode.xpath(
                '//tr/th/span//text()').extract_first()
            item['parent'] = ''
            item['new'] = countryNode.xpath(
                '//tr/td[1]//text()').extract_first()
            item['now'] = ''
            item['total'] = countryNode.xpath(
                '//tr/td[2]//text()').extract_first()
            item['cure'] = countryNode.xpath(
                '//tr/td[3]//text()').extract_first()
            item['death'] = countryNode.xpath(
                '//tr/td[4]//text()').extract_first()
            yield item

6 Downloader Middleware

A downloader middleware handles the page requests: it loads each page with Selenium + PhantomJS and returns an HtmlResponse for the spider to process.

Modify middlewares.py and add a SeleniumMiddelware class; the meta flag set earlier determines which element to wait for before the page is considered loaded.

from scrapy import signals
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from scrapy.http import HtmlResponse
from logging import getLogger
from time import sleep

class SeleniumMiddelware():
    def __init__(self,timeout=None,service_args=[]):
        self.logger  = getLogger(__name__)
        self.timeout = timeout
        self.browser = webdriver.PhantomJS(service_args=service_args)
        self.browser.set_window_size(1400,700)
        self.browser.set_page_load_timeout(self.timeout)
        self.wait = WebDriverWait(self.browser,self.timeout)
    
    def __del__(self):
        self.browser.close()

    def process_request(self,request,spider):
        self.logger.debug('PhantomJs is Starting')
        page = request.meta.get('page',1)
        try:
            # load the URL
            self.browser.get(request.url)
            
            # wait for the target element to appear
            if page == 0:
                self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,'#listWraper')))
            else:
                self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,'#foreignWraper')))
            # sleep(2)
            return HtmlResponse(url=request.url,body=self.browser.page_source,request=request,encoding='utf-8',status=200)
        except TimeoutException:
            return HtmlResponse(url=request.url,status=500,request=request)
    
    @classmethod
    def from_crawler(cls,crawler):
        return cls(timeout=crawler.settings.get('SELENIUM_TIMEOUT'),
                   service_args=crawler.settings.get('PHANTOMJS_SERVICE_ARGS'))
        

Two settings are referenced here: the Selenium timeout and the PhantomJS service arguments. Add them in settings.py:

SELENIUM_TIMEOUT = 20

PHANTOMJS_SERVICE_ARGS = ['--load-images=false', '--disk-cache=true']

Enable the downloader middleware:

DOWNLOADER_MIDDLEWARES = {
   'COVID19.middlewares.SeleniumMiddelware': 543,
}
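PhantomJS is no longer maintained, so the chromedriver listed in section 1 can be used instead. Below is a minimal sketch of the swap, assuming chromedriver is on the PATH; it reuses the middleware above and only replaces the browser construction:

# middlewares.py (optional alternative): headless Chrome instead of PhantomJS
from logging import getLogger

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait


class SeleniumChromeMiddleware(SeleniumMiddelware):
    def __init__(self, timeout=None, service_args=[]):
        # do not call the parent __init__, which would start PhantomJS
        self.logger = getLogger(__name__)
        self.timeout = timeout
        options = Options()
        options.add_argument('--headless')                            # no browser window
        options.add_argument('--disable-gpu')
        options.add_argument('--blink-settings=imagesEnabled=false')  # roughly equivalent to --load-images=false
        self.browser = webdriver.Chrome(options=options)              # chromedriver must be on the PATH
        self.browser.set_window_size(1400, 700)
        self.browser.set_page_load_timeout(self.timeout)
        self.wait = WebDriverWait(self.browser, self.timeout)
    # __del__, process_request and from_crawler are inherited unchanged

To use it, register COVID19.middlewares.SeleniumChromeMiddleware in DOWNLOADER_MIDDLEWARES in place of the PhantomJS middleware.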

 

7 Pipelines

Pipelines define how the scraped items are processed, which is where the data gets stored. Define two classes: one for MongoDB storage and one for MySQL.

Define the MongoDB pipeline:

import pymongo


class MongoPipeline(object):
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def process_item(self, item, spider):
        self.db[item.collection].insert(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()

Define the MySQL pipeline:

import pymysql


class MySqlPipeLine(object):
    def __init__(self, host, database, user, password, port):
        self.host = host
        self.database = database
        self.user = user
        self.password = password
        self.port = port

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            host=crawler.settings.get('MYSQL_HOST'),
            database=crawler.settings.get('MYSQL_DB'),
            user=crawler.settings.get('MYSQL_USER'),
            password=crawler.settings.get('MYSQL_PASSWORD'),
            port=crawler.settings.get('MYSQL_PORT')
        )

    def open_spider(self, spider):
        self.db = pymysql.connect(
            self.host, self.user, self.password, self.database, charset='utf8', port=self.port)
        self.cursor = self.db.cursor()

    def close_spider(self, spider):
        self.db.close()

    def process_item(self, item, spider):
        data = dict(item)
        keys = ', '.join(data.keys())
        values = ', '.join(['%s'] * len(data))
        sql = 'insert into {table}({keys}) values ({values}) on duplicate key update'.format(
            table=item.table, keys=keys, values=values)
        update = ','.join([" {key}=%s".format(key=key) for key in data])
        sql += update
        try:
            if self.cursor.execute(sql, tuple(data.values())*2):
                print('successful')
                self.db.commit()
        except pymysql.MySQLError as e:
            print(e)
            self.db.rollback()
        return item
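As a concrete illustration, for a Covid19Item the statement assembled above comes out along these lines (built as a single line in the code, wrapped here for readability); it is executed with the item's values tuple repeated twice, once for the insert part and once for the update part:

insert into covid19(name, parent, new, now, total, cure, death)
values (%s, %s, %s, %s, %s, %s, %s)
on duplicate key update name=%s, parent=%s, new=%s, now=%s, total=%s, cure=%s, death=%s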

Both pipelines read their database connection details from the crawler settings.

Add the following to settings.py:

MONGO_URI = 'localhost'
MONGO_DB = 'COVID19'

MYSQL_HOST = 'localhost'
MYSQL_DB = 'covid19'
MYSQL_USER = 'root'
MYSQL_PASSWORD = '123456'
MYSQL_PORT = 3306

Enable the pipelines:

ITEM_PIPELINES = {
    'COVID19.pipelines.MongoPipeline': 300,
    'COVID19.pipelines.MySqlPipeLine': 300
}

8 MySQL Database

Create a database named covid19:

CREATE DATABASE covid19

Create two tables: the pandemic data table covid19 and a region longitude/latitude dictionary table dic_lnglat.

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for covid19
-- ----------------------------
DROP TABLE IF EXISTS `covid19`;
CREATE TABLE `covid19`  (
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `parent` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `new` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `now` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `total` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `cure` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `death` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  PRIMARY KEY (`name`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Table structure for dic_lnglat
-- ----------------------------
DROP TABLE IF EXISTS `dic_lnglat`;
CREATE TABLE `dic_lnglat`  (
  `name` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `lng` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `lat` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `type` int(0) NULL DEFAULT NULL,
  PRIMARY KEY (`name`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

SET FOREIGN_KEY_CHECKS = 1;

9 Region Coordinates

The crawler only obtains the case figures from this page; to plot the data on the VDEarth 3D globe, we also need each region's longitude and latitude.

Two kinds of records are scraped: domestic ones covering provinces and the cities/districts under them, and overseas ones containing only the country name.

For domestic regions we use each city's coordinates directly; for foreign countries we use the capital's coordinates. I already had this data from earlier work; if you don't, you can refer to the links below.

Domestic city coordinates: see http://www.javashuo.com/article/p-tpumegjz-hy.html

I did not find a ready-made list of foreign capital coordinates, so here is the compilation I use:

globe = {
  "阿富汗": [69.11,34.28],
  "阿爾巴尼亞": [19.49,41.18],
  "阿爾及利亞": [3.08,36.42],
  "美屬薩摩亞": [-170.43,-14.16],
  "安道爾": [1.32,42.31],
  "安哥拉": [13.15,-8.50],
  "安提瓜和巴布達": [-61.48,17.20],
  "阿根廷": [-60.00,-36.30],
  "亞美尼亞": [44.31,40.10],
  "阿魯巴": [-70.02,12.32],
  "澳大利亞": [149.08,-35.15],
  "奧地利": [16.22,48.12],
  "阿塞拜疆": [49.56,40.29],
  "巴哈馬": [-77.20,25.05],
  "巴林": [50.30,26.10],
  "孟加拉國": [90.26,23.43],
  "巴巴多斯": [-59.30,13.05],
  "白俄羅斯": [27.30,53.52],
  "比利時": [4.21,50.51],
  "伯利茲": [-88.30,17.18],
  "貝寧": [2.42,6.23],
  "不丹": [89.45,27.31],
  "玻利維亞": [-68.10,-16.20],
  "波斯尼亞和黑塞哥維那": [18.26,43.52],
  "博茨瓦納": [25.57,-24.45],
  "巴西": [-47.55,-15.47],
  "英屬維爾京羣島": [-64.37,18.27],
  "文萊": [115.00,4.52],
  "保加利亞": [23.20,42.45],
  "布基納法索": [-1.30,12.15],
  "布隆迪": [29.18,-3.16],
  "柬埔寨": [104.55,11.33],
  "喀麥隆": [11.35,3.50],
  "加拿大": [-75.42,45.27],
  "佛得角": [-23.34,15.02],
  "開曼羣島": [-81.24,19.20],
  "中非共和國": [18.35,4.23],
  "乍得": [14.59,12.10],
  "智利": [-70.40,-33.24],
  "中國": [116.20,39.55],
  "哥倫比亞": [-74.00,4.34],
  "科摩羅": [43.16,-11.40],
  "剛果": [15.12,-4.09],
  "哥斯達黎加": [-84.02,9.55],
  "科特迪瓦": [-5.17,6.49],
  "克羅地亞": [15.58,45.50],
  "古巴": [-82.22,23.08],
  "塞浦路斯": [33.25,35.10],
  "捷克共和國": [14.22,50.05],
  "朝鮮": [125.30,39.09],
  "剛果(扎伊爾)": [15.15,-4.20],
  "丹麥": [12.34,55.41],
  "吉布提": [42.20,11.08],
  "多米尼加": [-61.24,15.20],
  "多米尼加共和國": [-69.59,18.30],
  "東帝汶": [125.34,-8.29],
  "厄瓜多爾": [-78.35,-0.15],
  "埃及": [31.14,30.01],
  "薩爾瓦多": [-89.10,13.40],
  "赤道幾內亞": [8.50,3.45],
  "厄立特里亞": [38.55,15.19],
  "愛沙尼亞": [24.48,59.22],
  "埃塞俄比亞": [38.42,9.02],
  "福克蘭羣島(馬爾維納斯羣島)": [-59.51,-51.40],
  "法羅羣島": [-6.56,62.05],
  "斐濟": [178.30,-18.06],
  "芬蘭": [25.03,60.15],
  "法國": [2.20,48.50],
  "法屬圭亞那": [-52.18,5.05],
  "法屬波利尼西亞": [-149.34,-17.32],
  "加蓬": [9.26,0.25],
  "岡比亞": [-16.40,13.28],
  "格魯吉亞": [44.50,41.43],
  "德國": [13.25,52.30],
  "加納": [-0.06,5.35],
  "希臘": [23.46,37.58],
  "格陵蘭": [-51.35,64.10],
  "瓜德羅普島": [-61.44,16.00],
  "危地馬拉": [-90.22,14.40],
  "根西島": [-2.33,49.26],
  "幾內亞": [-13.49,9.29],
  "幾內亞比紹": [-15.45,11.45],
  "圭亞那": [-58.12,6.50],
  "海地": [-72.20,18.40],
  "赫德島和麥當勞羣島": [74.00,-53.00],
  "洪都拉斯": [-87.14,14.05],
  "匈牙利": [19.05,47.29],
  "冰島": [-21.57,64.10],
  "印度": [77.13,28.37],
  "印度尼西亞": [106.49,-6.09],
  "伊朗": [51.30,35.44],
  "伊拉克": [44.30,33.20],
  "愛爾蘭": [-6.15,53.21],
  "以色列": [35.12,31.47],
  "意大利": [12.29,41.54],
  "牙買加": [-76.50,18.00],
  "約旦": [35.52,31.57],
  "哈薩克斯坦": [71.30,51.10],
  "肯尼亞": [36.48,-1.17],
  "基里巴斯": [173.00,1.30],
  "科威特": [48.00,29.30],
  "吉爾吉斯斯坦": [74.46,42.54],
  "老撾": [102.36,17.58],
  "拉脫維亞": [24.08,56.53],
  "黎巴嫩": [35.31,33.53],
  "萊索托": [27.30,-29.18],
  "利比里亞": [-10.47,6.18],
  "阿拉伯利比亞民衆國": [13.07,32.49],
  "列支敦士登": [9.31,47.08],
  "立陶宛": [25.19,54.38],
  "盧森堡": [6.09,49.37],
  "馬達加斯加": [47.31,-18.55],
  "馬拉維": [33.48,-14.00],
  "馬來西亞": [101.41,3.09],
  "馬爾代夫": [73.28,4.00],
  "馬裏": [-7.55,12.34],
  "馬耳他": [14.31,35.54],
  "馬提尼克島": [-61.02,14.36],
  "毛里塔尼亞": [57.30,-20.10],
  "馬約特島": [45.14,-12.48],
  "墨西哥": [-99.10,19.20],
  "密克羅尼西亞(聯邦) ": [158.09,6.55],
  "摩爾多瓦共和國": [28.50,47.02],
  "莫桑比克": [32.32,-25.58],
  "緬甸": [96.20,16.45],
  "納米比亞": [17.04,-22.35],
  "尼泊爾": [85.20,27.45],
  "荷蘭": [04.54,52.23],
  "荷屬安的列斯": [-69.00,12.05],
  "新喀里多尼亞": [166.30,-22.17],
  "新西蘭": [174.46,-41.19],
  "尼加拉瓜": [-86.20,12.06],
  "尼日爾": [2.06,13.27],
  "尼日利亞": [7.32,9.05],
  "諾福克島": [168.43,-45.20],
  "北馬裏亞納羣島": [145.45,15.12],
  "挪威": [10.45,59.55],
  "阿曼": [58.36,23.37],
  "巴基斯坦": [73.10,33.40],
  "帕勞": [134.28,7.20],
  "巴拿馬": [-79.25,9.00],
  "巴布亞新幾內亞": [147.08,-9.24],
  "巴拉圭": [-57.30,-25.10],
  "祕魯": [-77.00,-12.00],
  "菲律賓": [121.03,14.40],
  "波蘭": [21.00,52.13],
  "葡萄牙": [-9.10,38.42],
  "波多黎各": [-66.07,18.28],
  "卡塔爾": [51.35,25.15],
  "韓國": [126.58,37.31],
  "羅馬尼亞": [26.10,44.27],
  "俄羅斯": [37.35,55.45],
  "盧旺達": [30.04,-1.59],
  "聖基茨和尼維斯": [-62.43,17.17],
  "聖盧西亞": [-60.58,14.02],
  "聖皮埃爾和密克隆": [-56.12,46.46],
  "聖文森特和格林納丁斯": [-61.10,13.10],
  "薩摩亞": [-171.50,-13.50],
  "聖馬力諾": [12.30,43.55],
  "聖多美和普林西比": [6.39,0.10],
  "沙特阿拉伯": [46.42,24.41],
  "塞內加爾": [-17.29,14.34],
  "塞拉利昂": [-13.17,8.30],
  "斯洛伐克": [17.07,48.10],
  "斯洛文尼亞": [14.33,46.04],
  "所羅門羣島": [159.57,-9.27],
  "索馬里": [45.25,2.02],
  "比勒陀利亞": [28.12,-25.44],
  "西班牙": [-3.45,40.25],
  "蘇丹": [32.35,15.31],
  "蘇里南": [-55.10,5.50],
  "斯威士蘭": [31.06,-26.18],
  "瑞典": [18.03,59.20],
  "瑞士": [7.28,46.57],
  "阿拉伯敘利亞共和國": [36.18,33.30],
  "塔吉克斯坦": [68.48,38.33],
  "泰國": [100.35,13.45],
  "馬其頓": [21.26,42.01],
  "多哥": [1.20,6.09],
  "湯加": [-174.00,-21.10],
  "突尼斯": [10.11,36.50],
  "土耳其": [32.54,39.57],
  "土庫曼斯坦": [57.50,38.00],
  "圖瓦盧": [179.13,-8.31],
  "烏干達": [32.30,0.20],
  "烏克蘭": [30.28,50.30],
  "阿聯酋": [54.22,24.28],
  "英國": [-0.05,51.36],
  "坦桑尼亞": [35.45,-6.08],
  "美國": [-77.02,39.91],
  "美屬維爾京羣島": [-64.56,18.21],
  "烏拉圭": [-56.11,-34.50],
  "烏茲別克斯坦": [69.10,41.20],
  "瓦努阿圖": [168.18,-17.45],
  "委內瑞拉": [-66.55,10.30],
  "越南": [105.55,21.05],
  "南斯拉夫": [20.37,44.50],
  "贊比亞": [28.16,-15.28],
  "津巴布韋": [31.02,-17.43]
}

You can add a small Python routine that syncs this data into the MySQL dictionary table, as sketched below.
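A minimal sketch of such a sync script, assuming the dictionary above is saved as a local globe_data.py module and that type = 1 marks a foreign capital in dic_lnglat (both of these are assumptions for illustration):

# sync_lnglat.py - write the capital coordinates into the dic_lnglat table
import pymysql

from globe_data import globe  # the dictionary listed above, saved as a module

db = pymysql.connect(host='localhost', user='root', password='123456',
                     database='covid19', charset='utf8', port=3306)
cursor = db.cursor()

sql = ('insert into dic_lnglat(name, lng, lat, type) values (%s, %s, %s, %s) '
       'on duplicate key update lng=%s, lat=%s')

try:
    for name, (lng, lat) in globe.items():
        # type=1 is assumed here to mean "foreign capital"
        cursor.execute(sql, (name, str(lng), str(lat), 1, str(lng), str(lat)))
    db.commit()
finally:
    db.close()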

10 Running the Crawler

Start the crawler with the scrapy command:

scrapy crawl covid19spider

You can watch the crawler's console log output as it runs, then open the database to inspect the results.

The pipelines section above enabled both MySQL and Mongo, so the data is written to both; in practice, one of them is enough.

 

The crawler works now, but it has to be started manually each time. Add a running.py file that invokes the spider on a schedule:

# -*- coding: utf-8 -*-
from multiprocessing import Process
from scrapy import cmdline
import time
import logging
import os

# configuration: spider name and run frequency in seconds
confs = [
    {
        "spider_name": "covid19spider",
        "frequency": 10,
    },
]
 
 
def start_spider(spider_name, frequency):
    # run this script from the Scrapy project root so cmdline.execute can find scrapy.cfg
    args = ["scrapy", "crawl", spider_name]
    while True:
        start = time.time()
        p = Process(target=cmdline.execute, args=(args,))
        p.start()
        p.join()
        logging.debug("### use time: %s" % (time.time() - start))
        time.sleep(frequency)
 
 
if __name__ == '__main__':
    for conf in confs:
        process = Process(target=start_spider,args=(conf["spider_name"], conf["frequency"]))
        process.start()
        time.sleep(86400)

With this, the crawler fetches data on a schedule; other scheduling mechanisms would also work, but they are not covered here.

That completes the pandemic data crawler.

 

Related links

Pandemic 3D Globe from Scratch - VDEarth - 1 - Introduction

Pandemic 3D Globe from Scratch - VDEarth - 2 - Front-end Code Setup

Pandemic 3D Globe from Scratch - VDEarth - 3 - 3D Globe Component Implementation (1)

Pandemic 3D Globe from Scratch - VDEarth - 4 - 3D Globe Component Implementation (2)

Pandemic 3D Globe from Scratch - VDEarth - 5 - Pandemic Data Crawler

Pandemic 3D Globe from Scratch - VDEarth - 6 - Data Push
