Notes on learning and using Selenium + PhantomJS

Date: 2019-11-11


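Background:

PhantomJS is a headless (no GUI) browser based on WebKit, so it runs more efficiently than a full browser.

Selenium is a tool for testing web applications. At the time of writing the version is 2.42.1; the difference from version 1 is that 2.0+ has WebDriver integrated into it.

Python versions supported by Selenium 2: 2.7, 3.2, 3.3 and 3.4.

If you need to drive a browser remotely, you also have to install the Selenium server separately.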

Installation:

Install Selenium 2 first. Any installation method works; I usually download the archive and install it with python setup.py install. Selenium 2.42.1 can be downloaded from https://pypi.python.org/pypi/selenium/2.42.1

Then download PhantomJS from https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-windows.zip. After unpacking it you will find a file named phantomjs.exe.

Example 1:

#coding=utf-8
from selenium import webdriver

# Raw string so the backslashes in the Windows path are kept as-is
driver = webdriver.PhantomJS(executable_path=r'C:\Users\Gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
driver.get("http://duckduckgo.com/")
# Type the query into the search box, then click the search button
driver.find_element_by_id('search_form_input_homepage').send_keys("Nirvana")
driver.find_element_by_id("search_button_homepage").click()
print(driver.current_url)
driver.quit()

Here executable_path is the path to the phantomjs.exe file unpacked above. Output:

https://duckduckgo.com/?q=Nirvana
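Since executable_path is just a string holding a Windows path, the backslashes deserve a little care: in Python 3 a sequence like \U inside an ordinary string literal is a syntax error, and sequences such as \t are silently turned into control characters. Below is a minimal sketch of the two safe spellings. The install path is a shortened, hypothetical one, not the path from the example.

```python
import ntpath  # Windows path rules, importable on any OS

# Hypothetical location; substitute wherever phantomjs.exe was unpacked.
raw_path = r'C:\tools\phantomjs-1.9.7-windows\phantomjs.exe'
escaped_path = 'C:\\tools\\phantomjs-1.9.7-windows\\phantomjs.exe'

# Both spellings produce exactly the same string.
same = (raw_path == escaped_path)
print(same)

# Without the r prefix, the '\t' in '\tools' becomes a TAB character.
broken_path = 'C:\tools'
print('\t' in broken_path)

# The file name part of the path, using Windows separators.
print(ntpath.basename(raw_path))
```

Either spelling can be passed as executable_path; the raw string is usually easier to read.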

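Walkthrough of the example:

Worth mentioning:

The get method waits until the page has fully loaded before the program continues.

For AJAX, however, the documentation notes: "It's worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded."

send_keys fills in an input field.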

Example 2:

#coding=utf-8
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver import ActionChains
import time
import sys

# Raw string so the backslashes in the Windows path are kept as-is
driver = webdriver.PhantomJS(executable_path=r'C:\Users\Gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
driver.get("http://www.zhihu.com/#signin")
# Fill in the login form
driver.find_element_by_name('email').send_keys('your email')
driver.find_element_by_xpath('//input[@name="password"]').send_keys('your password')
#driver.find_element_by_xpath('//input[@name="password"]').send_keys(Keys.RETURN)
time.sleep(2)
driver.get_screenshot_as_file('show.png')
#driver.find_element_by_xpath('//button[@class="sign-button"]').click()
driver.find_element_by_xpath('//form[@class="zu-side-login-box"]').submit()

# Wait up to 5 seconds for the user-info link that only appears after login
try:
    dr=WebDriverWait(driver,5)
    dr.until(lambda the_driver:the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo "]').is_displayed())
except Exception:
    print('Login failed')
    sys.exit(0)
driver.get_screenshot_as_file('show.png')
#user=driver.find_element_by_class_name('zu-top-nav-userinfo ')
#webdriver.ActionChains(driver).move_to_element(user).perform() # move the mouse over my user name
# Click the "More" link through an action chain
loadmore=driver.find_element_by_xpath('//a[@id="zh-load-more"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore)
actions.click(loadmore)
actions.perform()
time.sleep(2)
driver.get_screenshot_as_file('show.png')
print(driver.current_url)
print(driver.page_source)
driver.quit()

This program logs in to Zhihu and then automatically clicks the "More" link at the bottom of the page to load more content.

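Walkthrough of the example:

from selenium.webdriver.common.keys import Keys: the Keys class represents the keys on the keyboard; send_keys(Keys.RETURN) in the code presses Enter.

from selenium.webdriver.support.ui import WebDriverWait is needed for the wait operation later on.

from selenium.webdriver import ActionChains imports the class for action chains. It took me a long time to find the right way to write this import.

For find_element I recommend the XPath variant, because it looks sophisticated and, more to the point, it really is extremely convenient.

XPath expression tutorial: http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html

Note: be careful with attributes whose value contains a space, such as class = "country name". Passing such a value to a class-name lookup like find_element_by_class_name raises an error, something about compound class names not being permitted.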
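To try out attribute predicates like the ones above without starting a browser, the standard library's xml.etree.ElementTree understands a small subset of XPath. The sketch below runs on a made-up, well-formed snippet; the markup and class names are assumptions for illustration, not Zhihu's real page. As far as I know the compound-class error comes from class-name lookups, whereas an XPath predicate simply compares the whole attribute string, spaces included.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only.
page = ET.fromstring(
    '<html><body>'
    '<form class="login-box">'
    '<input name="email" />'
    '<input name="password" />'
    '</form>'
    '<a class="country name" href="/cn">China</a>'
    '</body></html>'
)

# Same shape as //input[@name="password"] in the example.
password_input = page.find('.//input[@name="password"]')
print(password_input.get('name'))

# The predicate matches the full attribute string, so a value that
# contains a space is fine here.
link = page.find('.//a[@class="country name"]')
print(link.text)

# A partial value does not match: the comparison is exact.
print(page.find('.//a[@class="country"]'))
```

ElementTree only covers a subset of XPath, so expressions that work in Selenium may not all work here.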

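A simple way to check that the user name and password were entered correctly is to take a screenshot after filling them in.

Taking a screenshot needs just one line:

driver.get_screenshot_as_file('show.png')

Note, however, that the screenshot has no scroll bar: it captures the whole page from top to bottom.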

try:
    dr=WebDriverWait(driver,5)
    dr.until(lambda the_driver:the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo "]').is_displayed())
except Exception:
    print('Login failed')
    sys.exit(0)

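This checks whether login succeeded by waiting for a particular element to load; I think it is fine to treat it as a black box. As for the 5: the page is polled every 500 milliseconds for up to 5 seconds until the specified element is displayed; if it never is, the wait times out and raises an exception.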
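If the black box feels too opaque, the polling idea is small enough to sketch with the standard library alone. This is a simplified stand-in written for these notes, not Selenium's actual WebDriverWait code: wait_until and its parameters are hypothetical names, and a plain RuntimeError stands in for Selenium's TimeoutException.

```python
import time

def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy
    value or `timeout` seconds have passed."""
    deadline = time.time() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            # Simplification: any error counts as "not ready yet".
            # The real WebDriverWait only ignores NoSuchElementException
            # by default.
            pass
        if time.time() >= deadline:
            raise RuntimeError('condition not met within %s s' % timeout)
        time.sleep(poll)

# Demo: the condition becomes true on the third check.
calls = []

def ready():
    calls.append(1)
    return len(calls) >= 3

print(wait_until(ready, timeout=2.0, poll=0.01))
print(len(calls))
```

In the example the condition is the lambda that looks up the user-info link and asks whether it is displayed.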

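To submit a form you can either select the login button and call click(), or select the form and call submit(). The latter also copes with forms that have no login button, so I recommend submit().

A click can be done either with click() or with a chain of actions, as in the example: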

loadmore=driver.find_element_by_xpath('//a[@id="zh-load-more"]')
actions = ActionChains(driver)
actions.move_to_element(loadmore)
actions.click(loadmore)
actions.perform()

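These five lines amount to a single statement, find the element and click it, but actions apply more widely. In this example the thing to click is an a tag, and for some reason I don't understand, calling click() directly had no effect.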
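One thing that is easy to miss in those five lines is that move_to_element and click do nothing by themselves: the actions are only queued, and perform() runs them in order. Here is a toy model of that queue-then-perform pattern. ToyActions is a hypothetical class for illustration; it does not talk to any browser and is not Selenium's ActionChains.

```python
class ToyActions(object):
    """Queues named steps and replays them in order on perform()."""

    def __init__(self):
        self._queue = []
        self.log = []

    def move_to_element(self, element):
        self._queue.append(('move_to', element))
        return self  # returning self allows chained calls

    def click(self, element):
        self._queue.append(('click', element))
        return self

    def perform(self):
        # Only now do the queued steps "happen" (here: get logged).
        for step in self._queue:
            self.log.append(step)
        self._queue = []

actions = ToyActions()
actions.move_to_element('zh-load-more').click('zh-load-more')
print(actions.log)   # still empty: nothing has run yet
actions.perform()
print(actions.log)   # both steps, in the order they were queued
```

The real ActionChains methods also return the chain itself, so the same steps can be written in one line as ActionChains(driver).move_to_element(loadmore).click(loadmore).perform().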

print(driver.current_url)
print(driver.page_source)

These print two properties of the page: its URL and its source.

References:

http://www.realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/#.U5FXUvmSziE

http://selenium-python.readthedocs.org/getting-started.html

http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html

http://www.cnblogs.com/paisen/p/3310067.html


Setting the User-Agent header in PhantomJS
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the default capabilities, then override the user agent on the copy
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 "
)

driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
driver.get("http://dianping.com/")
# Print the capabilities the driver actually ended up with
cap_dict = driver.desired_capabilities
for key in cap_dict:
    print('%s: %s' % (key, cap_dict[key]))
print(driver.current_url)
driver.quit()
To check that it worked (run this before driver.quit()):

agent = driver.execute_script("return navigator.userAgent")
print(agent)
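A small detail in the snippet above is dict(DesiredCapabilities.PHANTOMJS): it makes a copy, so the custom user agent does not leak into the shared defaults that other drivers in the same process would read. Here is a sketch of the difference using a plain dictionary as a stand-in. The contents of PHANTOMJS_DEFAULTS are an approximation for illustration, not copied from Selenium.

```python
# Stand-in for DesiredCapabilities.PHANTOMJS; values are illustrative.
PHANTOMJS_DEFAULTS = {
    'browserName': 'phantomjs',
    'javascriptEnabled': True,
}

UA_KEY = 'phantomjs.page.settings.userAgent'

# dict(...) creates a new top-level dictionary, as in the snippet above.
dcap = dict(PHANTOMJS_DEFAULTS)
dcap[UA_KEY] = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) '
                'Gecko/20100101 Firefox/25.0')

print(UA_KEY in dcap)                # the copy carries the override
print(UA_KEY in PHANTOMJS_DEFAULTS)  # the shared defaults are untouched
```

Assigning the key directly on the shared dictionary would instead change the defaults for every driver created afterwards.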