Scraping Real Estate Data with Python and Displaying It on a Map!

Hi everyone, I'm back. This time we'll write a Python crawler to scrape real estate data for Urumqi and display it on a map. The mapping tool I use is the free personal edition of BDP, an online data analysis and visualization service that can import CSV or Excel data.

  • First, the plan: crawl the site, extract each residential community's name, address, price, and coordinates, and save them to Excel; then upload the Excel file to BDP to generate a map report.

This time I'm using the Scrapy framework. It may be overkill for the task, but I just finished learning it and wanted the practice. Before writing any code, I still recommend analyzing the site and its data first, because a good analysis gets twice the result with half the effort. The 烏魯木齊吉屋網 site (wlmq.jiwu.com, "Urumqi property listings 2017") has fairly complete data: each list page gives the LIST of properties and supports pagination, and clicking through to a detail page gives the full information for one property (name, address, price, longitude and latitude). A pipeline then saves each item to Excel, and finally BDP generates the map report. Enough talk, here's the code:

JiwuspiderSpider.py

# -*- coding: utf-8 -*- 
import re

from scrapy import Spider, Request

from jiwu.items import JiwuItem


class JiwuspiderSpider(Spider):
    name = "jiwuspider"
    allowed_domains = ["wlmq.jiwu.com"]
    start_urls = ['http://wlmq.jiwu.com/loupan']

    def parse(self, response):
        """Parse one page of the property list.

        :param response:
        :return:
        """
        for url in response.xpath('//a[@class="index_scale"]/@href').extract():
            yield Request(url, self.parse_html)  # for each url in the list, call the detail-page parser

        # if a "next page" link still exists, extract its url
        nextpage = response.xpath('//a[@class="tg-rownum-next index-icon"]/@href').extract_first()
        if nextpage:  # guard against an empty result
            yield Request(nextpage, self.parse)  # call back into parse to keep paginating

    def parse_html(self, response):
        """Parse one property's detail page and yield an item.

        :param response:
        :return:
        """
        pattern = re.compile(
            '<script type="text/javascript">.*?lng = \'(.*?)\';.*?lat = \'(.*?)\';.*?bname = \'(.*?)\';.*?'
            'address = \'(.*?)\';.*?price = \'(.*?)\';', re.S)
        item = JiwuItem()
        results = re.findall(pattern, response.text)
        for result in results:
            item['name'] = result[2]
            item['address'] = result[3]
            # keep only the digits of the price; set it to 0 if nothing matches
            pricestr = result[4]
            pattern2 = re.compile('(\d+)')
            s = re.findall(pattern2, pricestr)
            if len(s) == 0:
                item['price'] = 0
            else:
                item['price'] = s[0]
            item['lng'] = result[0]
            item['lat'] = result[1]
            yield item
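The reason parse_html falls back to a regex instead of XPath is that the detail pages embed the coordinates and basic facts as JavaScript variables inside a <script> tag. Here is a minimal standalone sketch of that extraction; the sample HTML below is a hand-written assumption about the page structure, not a real capture:

import re

# hypothetical <script> block mimicking what a detail page embeds (assumed structure)
sample = """<script type="text/javascript">
    var lng = '87.6168';
    var lat = '43.8256';
    var bname = '示例小區';
    var address = '烏魯木齊市示例路1號';
    var price = '8500元/平米';
</script>"""

pattern = re.compile(
    '<script type="text/javascript">.*?lng = \'(.*?)\';.*?lat = \'(.*?)\';.*?bname = \'(.*?)\';.*?'
    'address = \'(.*?)\';.*?price = \'(.*?)\';', re.S)

for lng, lat, name, address, price in re.findall(pattern, sample):
    print(name, address, price, lng, lat)
    # -> 示例小區 烏魯木齊市示例路1號 8500元/平米 87.6168 43.8256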

item.py

# -*- coding: utf-8 -*- 
 
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class JiwuItem(scrapy.Item):
    # define the fields for your item here
    name = scrapy.Field()
    price = scrapy.Field()
    address = scrapy.Field()
    lng = scrapy.Field()
    lat = scrapy.Field()
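Because JiwuItem subclasses scrapy.Item, it supports dict-style access and conversion, which is exactly what the pipeline below relies on when it calls dict(item). A quick illustration with made-up values:

from jiwu.items import JiwuItem

item = JiwuItem(name='示例小區', price=8500)  # hypothetical values
item['address'] = '烏魯木齊市示例路1號'
print(dict(item))
# -> {'name': '示例小區', 'price': 8500, 'address': '烏魯木齊市示例路1號'}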

pipelines.py — note that the MongoDB save call is commented out here; you can pick whichever storage method you prefer.

# -*- coding: utf-8 -*- 
 
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import pymongo
from scrapy.conf import settings
from openpyxl import workbook


class JiwuPipeline(object):
    wb = workbook.Workbook()
    ws = wb.active
    ws.append(['小區名稱', '地址', '價格', '經度', '緯度'])

    def __init__(self):
        # read the database connection info from the settings
        host = settings['MONGODB_URL']
        port = settings['MONGODB_PORT']
        dbname = settings['MONGODB_DBNAME']
        client = pymongo.MongoClient(host=host, port=port)
        # select the database and collection
        db = client[dbname]
        self.table = db[settings['MONGODB_TABLE']]

    def process_item(self, item, spider):
        jiwu = dict(item)
        # self.table.insert(jiwu)  # MongoDB insert, left commented out
        line = [item['name'], item['address'], str(item['price']), item['lng'], item['lat']]
        self.ws.append(line)
        self.wb.save('jiwu.xlsx')
        return item
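One thing the code above assumes but never shows: the pipeline only runs if it is registered under ITEM_PIPELINES, and the MONGODB_* keys it reads are custom settings, so both have to live in the project's settings.py. A minimal sketch with placeholder values (not the author's actual config):

# settings.py (excerpt) -- placeholder values, adjust for your own MongoDB
ITEM_PIPELINES = {
    'jiwu.pipelines.JiwuPipeline': 300,  # lower numbers run earlier in the chain
}

# custom keys read in JiwuPipeline.__init__
MONGODB_URL = 'localhost'
MONGODB_PORT = 27017
MONGODB_DBNAME = 'jiwu'
MONGODB_TABLE = 'house'

With that in place, running scrapy crawl jiwuspider from the project root walks every list page and writes jiwu.xlsx, ready to upload to BDP.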

Finally, the data in the report:

The MongoDB database:

 

Original source: https://www.cnblogs.com/duaimili/p/10255959.html
