Enough talk, straight to the code:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

def getSource(url):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.4882.400 QQBrowser/9.7.13059.400',
        'referer': 'http://www.taobao.com'
    }
    # Use copy() so the dict defined in the library itself is not modified
    cap = DesiredCapabilities.PHANTOMJS.copy()
    # Each custom header becomes a phantomjs.page.customHeaders.<name> capability
    for key, value in headers.items():
        cap['phantomjs.page.customHeaders.{}'.format(key)] = value
    # Do not load images; pages are crawled noticeably faster without them
    cap["phantomjs.page.settings.loadImages"] = False
    driver = webdriver.PhantomJS(desired_capabilities=cap)
    # encodeUrl is not defined in this snippet (presumably the author's own
    # URL-encoding helper); pass url directly if you do not have one
    driver.get(encodeUrl(url))
Some blog posts set the User-Agent with the following capability instead, which also seems to work:
cap["phantomjs.page.settings.userAgent"] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36'
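Both approaches can be folded into one small helper. The sketch below is only illustrative: `build_caps` is a hypothetical name, not part of Selenium, and it works on a plain dict so it can be tried without PhantomJS installed. In real use the `base` argument would presumably be `DesiredCapabilities.PHANTOMJS.copy()`.

```python
def build_caps(base, headers, load_images=False):
    """Return a copy of `base` with PhantomJS header capabilities added.

    `base` is any capabilities dict; it is copied, never modified.
    """
    caps = dict(base)  # shallow copy so the caller's dict stays untouched
    for key, value in headers.items():
        # Custom request header, same key pattern as in the code above
        caps['phantomjs.page.customHeaders.{}'.format(key)] = value
        # Also mirror the User-Agent into the dedicated setting
        if key.lower() == 'user-agent':
            caps['phantomjs.page.settings.userAgent'] = value
    caps['phantomjs.page.settings.loadImages'] = load_images
    return caps


# Stand-in base dict for illustration only
demo_caps = build_caps(
    {'browserName': 'phantomjs'},
    {'User-Agent': 'Mozilla/5.0', 'referer': 'http://www.taobao.com'},
)
```

Setting the User-Agent in both places is a choice made for this sketch, not something the original posts require; either capability on its own is what the code above and the alternative use.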
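Visiting the following URL echoes back the data of your request, which makes it handy for checking whether the headers you set actually took effect: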
https://httpbin.org/get?show_env=1
For example, using the code above, I visited this address with two test parameters appended:
https://httpbin.org/get?show_env=1&q=nihao&bbb=c
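The page returns a JSON document that echoes the request back, including the query arguments and the headers that were actually sent.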
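The check can also be done in code instead of by eye. This is a minimal sketch under a few assumptions: `extract_json` and `missing_headers` are hypothetical helper names, the browser is assumed to wrap the JSON response in some HTML (so the JSON is taken from the first `{` to the last `}`), and header values are assumed to contain no characters that the browser would HTML-escape.

```python
import json


def extract_json(page_source):
    """Parse the JSON object embedded in a page source string.

    Takes everything from the first '{' to the last '}' so that any
    HTML wrapper around the JSON body is ignored.
    """
    start = page_source.find('{')
    end = page_source.rfind('}')
    if start == -1 or end < start:
        raise ValueError('no JSON object found in page source')
    return json.loads(page_source[start:end + 1])


def missing_headers(echoed, expected):
    """Return the expected headers that were not echoed back unchanged.

    Header names are compared case-insensitively, because the echo
    service may normalize their capitalization.
    """
    seen = {k.lower(): v for k, v in echoed.get('headers', {}).items()}
    return {k: v for k, v in expected.items() if seen.get(k.lower()) != v}
```

With the driver from the code above, something like `missing_headers(extract_json(driver.page_source), headers)` should give an empty dict when every header was sent as configured.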