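Selenium is a web automation testing tool. It can drive many browsers across multiple platforms and perform all kinds of actions: launching the browser, visiting pages, clicking buttons, submitting forms, resizing the browser window, right-clicking and dragging with the mouse, handling dropdowns and dialogs, and so on. We pick it for scraping mainly because Selenium can render a page, run the JavaScript in it, and perform actions such as clicking buttons and submitting forms.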
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("http://www.xxxxxx.com")
data = driver.title  # title of the rendered page
print(data)
So why use PhantomJS?
Introduction
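PhantomJS is a headless, WebKit-based browser with a JavaScript API. Anything you can do in a WebKit-based browser, it can do too. It is more than an invisible browser (a browser with no UI): it supports CSS selectors, web standards, DOM manipulation, JSON, HTML5, Canvas, SVG and so on, and it also provides file I/O so that you can read and write files on the operating system. PhantomJS has a wide range of uses, such as headless front-end automated testing (in combination with Jasmine), network monitoring, and page screenshots.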
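As an illustration of the screenshot use case, here is a minimal sketch driven from Selenium. `capture_page` and its default window size are hypothetical names and values chosen for this example; the function only relies on the `set_window_size`, `get` and `save_screenshot` methods that Selenium drivers provide.

```python
def capture_page(driver, url, path, width=1280, height=800):
    """Load url in the given driver and save a PNG screenshot to path."""
    # A headless browser has no visible window, so set the size explicitly.
    driver.set_window_size(width, height)
    driver.get(url)
    # save_screenshot returns True on success and False on an I/O error.
    return driver.save_screenshot(path)
```

It could be called as `capture_page(webdriver.PhantomJS(), "http://www.xxxxxx.com", "page.png")`, followed by `quit()` on the driver.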
Installing on Windows:
pip install selenium
A simple way to use PhantomJS:
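from selenium import webdriver

# Initialize the browser. On Windows, pass the path to phantomjs.exe;
# on Linux the argument can be left out.
browser = webdriver.PhantomJS(r'D:\phantomjs.exe')
url = 'http://www.xxxxxx.com'  # address to visit
browser.get(url)  # open the page
title = browser.find_elements_by_xpath('xxxxxx')  # locate elements by XPath
for t in title:  # iterate over the matches
    print(t.text)  # print the element's text
    print(t.get_attribute('class'))  # print an attribute value
# Close the browser. If an exception occurs first, remember to end the
# PhantomJS process in Task Manager.
browser.quit()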
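If the script raises before it reaches `quit()`, the PhantomJS process keeps running, which is why it then has to be closed by hand. One way to avoid that is to wrap the driver in a context manager so that `quit()` always runs. This is a minimal sketch, not part of Selenium: `quitting` is a hypothetical helper name, and it assumes only that the object has a `quit()` method, as Selenium drivers do.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        # Runs on normal exit and also when the body raises.
        driver.quit()
```

With this helper the script body would sit inside `with quitting(webdriver.PhantomJS(...)) as browser:`, and no explicit `browser.quit()` would be needed.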
Let's make a simple comparison. First, recall how Selenium WebDriver is normally used:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.xxxxxx.com/")
driver.find_element_by_id('xxxxxxxx').send_keys("nxxxxxx")  # type into the input
driver.find_element_by_id("xxxxxxxx").click()  # click the button
driver.quit()
Using PhantomJS:
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(xxx, xxx)  # browser window size
driver.get("https://www.xxx.com/")
driver.find_element_by_id('xxxx').send_keys("xxxx")  # type into the input
driver.find_element_by_id("xxxxxx").click()  # click the button
print(driver.current_url)
driver.quit()
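Comparing the two examples makes the difference clear: apart from the driver that is created, the PhantomJS version sets the window size explicitly and prints the current URL, while the rest of the script stays the same.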
Now write a simple assertion to check whether the URL PhantomJS ends up on is correct:
import unittest
from selenium import webdriver


class TestOne(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.PhantomJS()
        self.driver.set_window_size(xxx, xxx)

    def test_url(self):
        self.driver.get("https://www.xxx.com")
        self.driver.find_element_by_id('xxxxxx').send_keys("xxxx")
        self.driver.find_element_by_id("xxxxx").click()
        # The expected URL must be contained in the current URL.
        self.assertIn("https://www.xxx.com", self.driver.current_url)

    def tearDown(self):
        self.driver.quit()


if __name__ == "__main__":
    unittest.main()
You will find that the assertion in the unit test above passes.
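Note that on strings `assertIn(a, b)` only checks that `a` is a substring of `b`, so the test still passes if the browser lands on a longer URL after the click. A small sketch of that behaviour; the `example.com` addresses are made up for illustration and no browser is involved:

```python
import unittest


class TestUrlAssertions(unittest.TestCase):

    def test_substring_match(self):
        # Hypothetical URL the browser might land on after a search.
        current_url = "https://www.example.com/search?q=test"
        # Passes: the base URL is contained in the current URL.
        self.assertIn("https://www.example.com", current_url)
        # An exact comparison of the same pair would not hold.
        self.assertNotEqual("https://www.example.com", current_url)


def run():
    """Run the test case above and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestUrlAssertions)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If the test is supposed to pin down the exact page, `assertEqual` is the stricter choice.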
A major advantage of PhantomJS over a regular browser is that tests usually run much faster.
import unittest
from selenium import webdriver
import time


class TestThree(unittest.TestCase):

    def setUp(self):
        self.startTime = time.time()  # start the clock for each test

    def test_url_fire(self):
        time.sleep(2)
        self.driver = webdriver.Firefox()
        self.driver.get("https://www.xxx.com")
        button = self.driver.find_element_by_id("xxx").get_attribute("xxxx")
        self.assertEqual('xxxxx', button)

    def test_url_phantom(self):
        time.sleep(1)
        self.driver = webdriver.PhantomJS()
        self.driver.get("https://www.xxx.com")
        button = self.driver.find_element_by_id("xxxx").get_attribute("xxxx")
        self.assertEqual('xxxxx', button)

    def tearDown(self):
        t = time.time() - self.startTime
        print("%s: %.3f" % (self.id(), t))  # test id and elapsed seconds
        self.driver.quit()


if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(TestThree)
    unittest.TextTestRunner(verbosity=0).run(suite)
Comparing the two timings shows how much faster PhantomJS is.
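The timings printed in `tearDown` cover everything in each test, including the `sleep` calls and browser startup. To time a single step on its own, a small wrapper around `time.time()` is enough. This is a sketch; `timed` is a hypothetical helper, not part of unittest or Selenium:

```python
import time


def timed(func, *args, **kwargs):
    """Call func with the given arguments; return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start
```

For example, `_, seconds = timed(driver.get, "https://www.xxx.com")` would measure only the page load for whichever driver is passed in.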
Further content:
# coding: utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import nose.tools as nose

# Account
email = 'user'
password = 'password'

# PhantomJS
# user agent
user_agent = 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36'
# Path to PhantomJS
pjs_path = 'xx/node_modules/phantomjs/bin/phantomjs'
dcap = {"phantomjs.page.settings.userAgent": user_agent, 'marionette': True}
driver = webdriver.PhantomJS(executable_path=pjs_path, desired_capabilities=dcap)

# Wait up to 5 seconds
wait = WebDriverWait(driver, 5)

# Open the login page
login_page_url = 'http://xxx'
driver.get(login_page_url)
# Wait for the page to load. The condition needs a locator; <body> is used here.
wait.until(ec.presence_of_all_elements_located((By.TAG_NAME, 'body')))
# Check the current URL
nose.eq_('http://xxx', driver.current_url)

# Login: click the button that shows the sign-in form
show_signin = driver.find_element_by_id('xxx')
show_signin.click()

# Email
login_xpath = 'xxx"]'
# Wait for the target element
wait.until(ec.visibility_of_element_located((By.XPATH, login_xpath)))
login_id_form = driver.find_element_by_xpath(login_xpath)
login_id_form.clear()
login_id_form.send_keys(email)

# Password
password_xpath = 'xxxx'
# Wait for the target element
wait.until(ec.visibility_of_element_located((By.XPATH, password_xpath)))
password_form = driver.find_element_by_xpath(password_xpath)
password_form.clear()
password_form.send_keys(password)

# Submit
submit_xpath = 'xxxx'
driver.find_element_by_xpath(submit_xpath).click()

# Result
driver.get('http://xxx')
# Wait for the page to load
wait.until(ec.presence_of_all_elements_located((By.TAG_NAME, 'body')))
# Check the current URL
nose.eq_('http://xxx', driver.current_url)
user_email = driver.find_element_by_xpath('xxx').get_attribute("XXX")
nose.eq_(email, user_email)
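The explicit waits in this script all follow one pattern: poll a condition until it returns something truthy or the timeout expires. The sketch below illustrates that idea in plain Python. It is not Selenium's implementation; `wait_until`, `WaitTimeout` and the default values are hypothetical names chosen for illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy in time."""


def wait_until(condition, timeout=5.0, poll=0.1):
    """Poll condition() until it returns a truthy value, then return it."""
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

In Selenium itself, `WebDriverWait(driver, 5).until(...)` passes the driver to the condition on each poll, and a condition such as `ec.visibility_of_element_located` has to be built with a locator tuple like `(By.XPATH, login_xpath)` before it is handed to `until`.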