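Overview
After yesterday's post went out, some friends told me the table contained too little information, so I have added the details of each transfer (調劑) notice.
Only part of the transfer-school data is listed here. For more, reply「調劑」to the official account to get the file, which is updated continuously. I wish you success in getting admitted. Best wishes to Wuhan, to Hubei, to China, and to the world!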
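Project summary
This is essentially the same as what I wrote before, so I will not repeat it here. For details, see my earlier post 「幾十行代碼批量下載高清壁紙 爬蟲入門實戰」 (batch-downloading HD wallpapers in a few dozen lines of code, a hands-on crawler primer).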
Selected code
Building the URLs
# Build the URL for every listing page
def get_url_list(self):
    url_list = []
    for i in range(1, 17):  # pages 1 through 16
        url = self.base_url.format(i)
        url_list.append(url)
    return url_list
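The method above can be tried on its own with a minimal sketch like the following. The class name `UrlBuilder`, the `example.com` URL template, and the `first_page`/`last_page` parameters are assumptions made for illustration; the original only shows that `self.base_url` is a format string with one placeholder and that pages 1 through 16 are requested.

```python
class UrlBuilder:
    """Hypothetical wrapper class; the original only shows the method body."""

    def __init__(self, base_url, first_page=1, last_page=16):
        # base_url is assumed to contain a single "{}" placeholder for the page number
        self.base_url = base_url
        self.first_page = first_page
        self.last_page = last_page

    def get_url_list(self):
        # range() excludes its upper bound, so add 1 to include last_page
        return [
            self.base_url.format(page)
            for page in range(self.first_page, self.last_page + 1)
        ]


# Placeholder URL template, not the site used in the post
builder = UrlBuilder("https://example.com/list?page={}")
urls = builder.get_url_list()
print(len(urls))
print(urls[0])
print(urls[-1])
```

Keeping the page range as constructor arguments rather than a hard-coded `range(1, 17)` makes it easier to adjust when the site adds or removes pages.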
Data parsing for one of the sites
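# Parse the page and store the data
# `sheet` (the worksheet) and `n` (the next row index) are module-level globals defined elsewhere
from lxml import etree

def parse_data(self, data):
    global n
    tree = etree.HTML(data)
    info_list = tree.xpath("//div[@class='info-item font14']")
    for info in info_list:
        school_name = info.xpath('./span/text()')[0]
        major_name = info.xpath('./span/text()')[1]
        # xpath() returns a list, so join it into a single string for the cell
        info_title = ''.join(info.xpath('./span/a/text()'))
        info_time = info.xpath('./span/text()')[2]
        sheet.write(n, 0, school_name)
        sheet.write(n, 1, major_name)
        sheet.write(n, 2, info_title)
        sheet.write(n, 3, info_time)
        n = n + 1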
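To check the XPath expressions without hitting the site, the parsing step can be run against a small inline snippet. This is only a sketch: the sample HTML is invented to match the structure the XPath expressions imply (three plain `span` elements plus one `span` wrapping a link), and the function name `parse_rows` is hypothetical. It returns rows as tuples instead of writing to a worksheet, so the global counter `n` is not needed here.

```python
from lxml import etree

# Invented sample markup; the real page structure is assumed, not confirmed
SAMPLE_HTML = """
<div class="info-item font14">
  <span>Example University</span>
  <span>Computer Science</span>
  <span><a href="#">Example transfer notice</a></span>
  <span>2020-03-01</span>
</div>
"""


def parse_rows(html):
    """Return one (school, major, title, time) tuple per info-item block."""
    tree = etree.HTML(html)
    rows = []
    for info in tree.xpath("//div[@class='info-item font14']"):
        # Direct text of each span; the span that only wraps a link contributes none
        texts = info.xpath('./span/text()')
        # The notice title is the link text; join in case it has several text nodes
        title = ''.join(info.xpath('./span/a/text()'))
        rows.append((texts[0], texts[1], title, texts[2]))
    return rows


rows = parse_rows(SAMPLE_HTML)
for row_index, row in enumerate(rows):
    print(row_index, row)
```

Each tuple can then be written to the worksheet column by column, with `enumerate` supplying the row index.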
Viewing the fetched data