Fetching dynamic web page data with selenium (JD.com): configuration details

Date: 2019-12-08

Tags: selenium, dynamic web page data, configuration details, driver environment, chrome, firefox. Category: Chrome


Packages to install:

selenium 
For the browser drivers, see:
selenium driver environment configuration for Chrome, Firefox, and IE
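Before running the scraper, it can help to confirm that the package and a browser driver are actually available. The following is a minimal sketch using only the standard library; the function name `check_environment` and the `DRIVER_EXECUTABLES` table are hypothetical, and the executable names (`chromedriver`, `geckodriver`, `IEDriverServer`) are the usual ones but may differ on your machine.

```python
import importlib.util
import shutil

# Usual executable names for each browser's driver (an assumption;
# adjust them if your drivers are named or installed differently).
DRIVER_EXECUTABLES = {
    "chrome": "chromedriver",
    "firefox": "geckodriver",
    "ie": "IEDriverServer",
}


def check_environment():
    """Report whether selenium is importable and which drivers are on PATH."""
    report = {"selenium": importlib.util.find_spec("selenium") is not None}
    for browser, executable in DRIVER_EXECUTABLES.items():
        # shutil.which returns the full path, or None when nothing is found.
        report[browser] = shutil.which(executable)
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(name, "->", status)
```

A `None` next to a browser name only means the driver was not found on `PATH`; it may still be installed somewhere else.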

# encoding:utf-8
# Author:"richie"
# Date:8/16/2017

import json
import re

from selenium import webdriver


def spider(url):
    """Yield one dict per product found on the listing page at url."""
    html = get_file(url)
    if html is None:  # the page could not be loaded, so there is nothing to parse
        return
    # One match per product entry; named groups capture the fields we keep.
    com = re.compile(r'<li class="gl-item">.*?<div class="p-price">.*?<em>(?P<currency>.)</em><i>(?P<price>.*?)</i>'
                     r'.*?<div class="p-name">.*?<em>(?P<name>.*?)</em>'
                     r'.*?<div class="p-commit">.*?<strong>.*?<a.*?>(?P<comment_num>.*?)</a>', re.S)
    for item in com.finditer(html):
        yield {
            "name": item.group("name"),
            "currency": item.group("currency"),
            "price": item.group("price"),
            "comment_num": item.group("comment_num"),
        }


def get_file(url):
    """Return the rendered page source, or None if loading fails."""
    driver = None
    try:
        driver = webdriver.Chrome()
        driver.get(url)
        # Scroll to the bottom of the page to trigger lazily loaded content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        return driver.page_source
    except Exception as e:
        print(e)
        return None
    finally:
        # Always close the browser, even when loading the page failed.
        if driver is not None:
            driver.quit()


if __name__ == '__main__':
    for i in range(1, 2):  # page 1 only; widen the range to fetch more pages
        page_url = "https://list.jd.com/list.html?cat=9987,653,655&ev=exprice_M1800L2500&page="+str(i)+"&sort=sort_rank_asc&trans=1&JL=6_0_0"
        ret = spider(page_url)
        # Append one JSON object per line; the with block closes the file.
        with open("jingdong.txt", "a", encoding='utf-8') as f:
            for obj in ret:
                data = json.dumps(obj, ensure_ascii=False)
                f.write(data + "\n")
        print("ok")
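To see what the regular expression in the script extracts without opening a browser, it can be run against a small piece of markup. The sketch below reuses the same pattern; `SAMPLE_HTML` is made-up, heavily simplified markup (the real listing page is more complex and may have changed since the script was written), and the names `ITEM_PATTERN`, `parse_items`, and `to_json_lines` are hypothetical helpers introduced only for this illustration.

```python
import json
import re

# Same pattern as in the script above.
ITEM_PATTERN = re.compile(
    r'<li class="gl-item">.*?<div class="p-price">.*?<em>(?P<currency>.)</em><i>(?P<price>.*?)</i>'
    r'.*?<div class="p-name">.*?<em>(?P<name>.*?)</em>'
    r'.*?<div class="p-commit">.*?<strong>.*?<a.*?>(?P<comment_num>.*?)</a>', re.S)

# Hypothetical, simplified markup used only to exercise the pattern.
SAMPLE_HTML = """
<li class="gl-item">
  <div class="p-price"><strong><em>¥</em><i>1999.00</i></strong></div>
  <div class="p-name"><a href="#"><em>Example Phone A</em></a></div>
  <div class="p-commit"><strong><a href="#">1200+</a></strong></div>
</li>
<li class="gl-item">
  <div class="p-price"><strong><em>¥</em><i>2399.00</i></strong></div>
  <div class="p-name"><a href="#"><em>Example Phone B</em></a></div>
  <div class="p-commit"><strong><a href="#">300+</a></strong></div>
</li>
"""


def parse_items(html):
    """Return one dict of named groups per product entry found in html."""
    return [match.groupdict() for match in ITEM_PATTERN.finditer(html)]


def to_json_lines(items):
    """Serialize items as one JSON object per line, keeping non-ASCII text."""
    return "".join(json.dumps(item, ensure_ascii=False) + "\n" for item in items)


if __name__ == "__main__":
    print(to_json_lines(parse_items(SAMPLE_HTML)), end="")
```

Because of `re.S`, `.` also matches newlines, which is what lets a single pattern span a whole multi-line `<li>` entry. `ensure_ascii=False` keeps characters such as the currency symbol readable in the output file instead of writing them as `\u` escapes.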