Python Web Crawler from Scratch (Part 6): The Selenium Library

Date: 2020-05-22

Tags: python, crawler, getting started, selenium. Category: Python


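What is the Selenium library:

Selenium is an automated testing tool that supports multiple browsers, including IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, and Opera.

In a crawler it is mainly used to handle pages that are rendered by JavaScript. It drives a real browser and sends actions to it.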

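Installing the Selenium library: pip3 install selenium

Using the Selenium library in detail:

Before using it, we need to install a WebDriver driver for the browser. Search online for the installation steps for your platform, and make sure the driver version matches the browser version.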
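
The "versions must match" rule can be checked mechanically. The sketch below is only an illustration and not part of the original tutorial: the helper names `major_version` and `versions_match` are hypothetical, and the version strings are made-up samples in the style that a browser's "About" page and `chromedriver --version` print. It assumes that comparing major versions is enough.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(browser_version: str, driver_version: str) -> bool:
    """Return True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Sample strings (assumed, for illustration only).
print(versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 114.0.5735.90"))
print(versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 112.0.5615.49"))
```

The first call prints `True` and the second prints `False`, because only the first pair shares major version 114.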

Basic usage:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Basic usage
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()
try:
    browser.get("http://www.baidu.com")
    # Type the query into the search box and press Enter
    search_box = browser.find_element(By.ID, 'kw')
    search_box.send_keys('Python')
    search_box.send_keys(Keys.ENTER)
    # Wait up to 10 seconds for the results container to appear
    wait = WebDriverWait(browser, 10)
    wait.until(EC.presence_of_element_located((By.ID, 'content_left')))
    print(browser.current_url)
    print(browser.get_cookies())
    print(browser.page_source)
finally:
    browser.close()

If this code runs, your WebDriver version is correct (Google Chrome must be installed).


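Declaring the browser object:

As mentioned above, Selenium supports multiple browsers. Here is how each one is declared:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Declaring the browser object
from selenium import webdriver

# Each line starts a different browser; keep only the one you have installed
browser = webdriver.Chrome()
browser = webdriver.Safari()
browser = webdriver.Edge()
browser = webdriver.Firefox()
browser = webdriver.PhantomJS()  # Selenium 3 only; PhantomJS support was removed in Selenium 4

I do not have all of these browsers installed, so I will not run this code here. Chrome (Google Chrome) is the recommended choice.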
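
If the browser should be selectable by name (for example from a config file), the declarations above can be wrapped in a small factory. This is a minimal sketch under stated assumptions: `DRIVER_CLASSES`, `driver_class_name`, and `create_browser` are hypothetical names introduced here, and the Selenium import is done lazily so that the name lookup can be used even where Selenium is not installed.

```python
# Maps a user-facing name to the attribute name of the driver class
# on selenium.webdriver (for example webdriver.Chrome).
DRIVER_CLASSES = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "edge": "Edge",
    "safari": "Safari",
}


def driver_class_name(browser_name: str) -> str:
    """Translate a name such as ' Chrome ' into the webdriver class name."""
    key = browser_name.strip().lower()
    if key not in DRIVER_CLASSES:
        raise ValueError(f"unsupported browser: {browser_name!r}")
    return DRIVER_CLASSES[key]


def create_browser(browser_name: str = "chrome"):
    """Start the requested browser; it and its driver must be installed."""
    from selenium import webdriver  # lazy import, only needed when a browser is started
    return getattr(webdriver, driver_class_name(browser_name))()
```

With this in place, `browser = create_browser("firefox")` would behave like `browser = webdriver.Firefox()`.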

Visiting a page:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Visiting a page
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://baidu.com")
print(browser.page_source)
browser.close()


Finding elements:

Single element:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: single element
# Note: the find_element_by_* helpers are the Selenium 3 API; they were removed in Selenium 4.3
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://taobao.com")
input_first = browser.find_element_by_id('q')
input_second = browser.find_element_by_css_selector('#q')
input_three = browser.find_element_by_xpath('//*[@id="q"]')
print(input_first)
print(input_second)
print(input_three)
browser.close()


```
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
```

These are all lookup methods of the same kind.

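You can also use the generic find_element method together with a By locator. From Selenium 4.3 on this is the only available form, because the find_element_by_* helpers were removed:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: single element, generic form
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://taobao.com")
input_first = browser.find_element(By.ID, 'q')
print(input_first)
browser.close()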
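
To make the relationship between the two styles concrete, the sketch below maps each legacy helper name to the locator strategy it implies. The strings are the values of the `By` constants (`By.ID` is `"id"`, `By.CSS_SELECTOR` is `"css selector"`, and so on), so no Selenium import is needed here. `LEGACY_TO_STRATEGY` and `find_with_legacy_name` are hypothetical names used only for illustration.

```python
# Locator strategy strings; these are the values of the constants on
# selenium.webdriver.common.by.By.
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def find_with_legacy_name(browser, legacy_name: str, value: str):
    """Call browser.find_element() with the strategy a legacy helper implied."""
    return browser.find_element(LEGACY_TO_STRATEGY[legacy_name], value)
```

For example, `find_with_legacy_name(browser, "find_element_by_id", "q")` performs the same lookup as `browser.find_element(By.ID, "q")`.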


Multiple elements:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Finding elements: multiple elements
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://taobao.com")
# find_elements returns a list of every matching element
items = browser.find_elements(By.CSS_SELECTOR, '.service-bd li')
for item in items:
    print(item)
browser.close()


Every find_element lookup has a find_elements counterpart that is used in exactly the same way and returns a list.
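
Because the result is an ordinary Python list, it can be processed with a comprehension. The following is a small sketch, not the author's code: `element_texts` is a hypothetical helper, and `"css selector"` is the string value of `By.CSS_SELECTOR`.

```python
def element_texts(browser, css_selector: str) -> list:
    """Return the visible text of every element matching a CSS selector."""
    # "css selector" is the value of By.CSS_SELECTOR.
    elements = browser.find_elements("css selector", css_selector)
    # find_elements returns an empty list when nothing matches,
    # so this function returns [] instead of raising.
    return [element.text.strip() for element in elements]
```

A call such as `element_texts(browser, '.service-bd li')` would collect the text of each list item from the page used in the example above.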

Interacting with elements:

Call interaction methods on the elements you have found:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Interacting with elements
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://baidu.com")
input_first = browser.find_element(By.ID, 'kw')
input_first.send_keys('python從入坑到放棄')
# A compound class name such as 'bg s_btn' is not a valid class-name locator, so use a CSS selector
button = browser.find_element(By.CSS_SELECTOR, '.bg.s_btn')
button.click()

Running this code opens Chrome, types the search text, and then clicks the search button. For more operations, see: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.remote.webelement
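
The type-then-click pattern tends to recur, so it can be factored into a helper. This is a minimal sketch that assumes both elements have already been located; `fill_and_submit` is a hypothetical name, while `clear()`, `send_keys()`, and `click()` are the standard WebElement methods.

```python
def fill_and_submit(input_element, submit_button, text: str) -> None:
    """Replace the content of an input box with text, then click the button."""
    input_element.clear()          # remove any text already in the box
    input_element.send_keys(text)  # type the new text
    submit_button.click()          # trigger the search or form submission
```

Calling `clear()` first matters when the same input box is reused for several searches, since `send_keys()` appends to whatever is already there.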

Action chains:

Actions are appended to an action chain and then executed in sequence:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Action chains
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
# The demo is rendered inside an iframe, so switch into it first
browser.switch_to.frame('iframeResult')
source = browser.find_element(By.ID, 'draggable')
target = browser.find_element(By.ID, 'droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)
actions.perform()

Running this code drags the inner box onto the target. For more details, see: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains

Executing JavaScript: ⭐️⭐️⭐️⭐️⭐️

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Executing JavaScript
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
# Scroll to the bottom of the page, then show an alert box
browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')
browser.execute_script('alert("彈出")')

Running this code scrolls the page to the bottom and then pops up an alert box.
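
A common crawler use of `execute_script` is scrolling a lazily loaded page until no new content appears. The sketch below is one possible approach rather than a definitive recipe: `scroll_to_bottom`, `pause_seconds`, and `max_rounds` are hypothetical names, and it assumes the page grows `document.body.scrollHeight` as content loads.

```python
import time


def scroll_to_bottom(browser, pause_seconds: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll until the page height stops growing; return the final height."""
    last_height = browser.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause_seconds)  # give lazily loaded content time to arrive
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the bottom has been reached
        last_height = new_height
    return last_height
```

Note that a script must use `return` for `execute_script` to hand a value back to Python. The `max_rounds` cap keeps the loop from running forever on pages that never stop loading.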

Getting element information:

Getting attributes:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting element information: attributes
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
logo = browser.find_element(By.ID, 'zh-top-link-logo')
print(logo)
print(logo.get_attribute('class'))


Getting the text value:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting the text value
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
question = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(question.text)


Getting the ID, location, tag name, and size:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Getting the ID, location, tag name, and size
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "http://www.zhihu.com/explore"
browser.get(url)
question = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(question.id)
print(question.location)
print(question.tag_name)
print(question.size)
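
When several of these properties are needed at once, they can be gathered into one dictionary. The sketch below is illustrative only; `describe_element` is a hypothetical helper, and it uses nothing beyond the WebElement properties shown above plus `get_attribute`.

```python
def describe_element(element) -> dict:
    """Collect the commonly used WebElement properties into one dictionary."""
    return {
        "id": element.id,              # internal WebDriver reference, not the HTML id attribute
        "tag_name": element.tag_name,
        "text": element.text,
        "location": element.location,  # dictionary with 'x' and 'y'
        "size": element.size,          # dictionary with 'height' and 'width'
        "class": element.get_attribute("class"),
    }
```

Such a dictionary is convenient to log or to serialize with `json.dumps` while debugging a crawler.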


Frame:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Frame
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')
source = browser.find_element(By.ID, 'draggable')
print(source)
try:
    # The logo belongs to the parent page, so it cannot be found from inside the frame
    logo = browser.find_element(By.CLASS_NAME, 'logo')
except NoSuchElementException:
    print("NO LOGO")
# Switch back to the parent page, where the logo can be found
browser.switch_to.parent_frame()
logo = browser.find_element(By.CLASS_NAME, 'logo')
print(logo)
print(logo.text)
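
Forgetting to switch back out of a frame is an easy mistake, since every later lookup then silently searches the wrong document. One way to guard against it is a context manager. This is a sketch under the assumption that returning to the parent frame is always the desired cleanup; `inside_frame` is a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(browser, frame_reference):
    """Switch into a frame and always switch back to its parent afterwards."""
    browser.switch_to.frame(frame_reference)
    try:
        yield browser
    finally:
        # Runs even if the body raises, e.g. NoSuchElementException
        browser.switch_to.parent_frame()
```

Usage would look like `with inside_frame(browser, 'iframeResult'): ...`, with lookups inside the block targeting the frame and lookups after it targeting the parent page again.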


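Waits:

Implicit wait:

With an implicit wait, if WebDriver does not find an element in the DOM it keeps looking until the configured time is up, and only then raises a no-such-element exception. In other words, when an element is not available immediately, the implicit wait makes WebDriver poll the DOM for a while before giving up. The default wait time is 0.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Implicit wait
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
# Poll the DOM for up to 10 seconds whenever an element is not found immediately
browser.implicitly_wait(10)
url = "http://www.zhihu.com/explore"
browser.get(url)
question = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(question)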


Explicit wait (the more commonly used kind):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Explicit wait
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()
browser.get("http://www.taobao.com")
wait = WebDriverWait(browser, 10)
# until() returns the element once the condition holds
search_input = wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(search_input, button)

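title_is: the title is exactly the given text

title_contains: the title contains the given text

presence_of_element_located: the element is present in the DOM; takes a locator tuple such as (By.ID, 'p')

visibility_of_element_located: the element is visible; takes a locator tuple

visibility_of: the element is visible; takes an element object

presence_of_all_elements_located: at least one matching element is present in the DOM; returns all matches

text_to_be_present_in_element: the element's text contains the given text

text_to_be_present_in_element_value: the element's value contains the given text

frame_to_be_available_and_switch_to_it: the frame is available; switches to it

invisibility_of_element_located: the element is not visible

element_to_be_clickable: the element is clickable

staleness_of: the element is no longer attached to the DOM; useful for detecting that the page has been refreshed

element_to_be_selected: the element is selected; takes an element object

element_located_to_be_selected: the element is selected; takes a locator tuple

element_selection_state_to_be: takes an element object and a state; returns True if they match, otherwise False

element_located_selection_state_to_be: takes a locator tuple and a state; returns True if they match, otherwise False

alert_is_present: an alert is present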

For details, see the official documentation: https://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions
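
Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or a deadline passes. The sketch below is a simplified model of that idea and not Selenium's implementation. `wait_until` is a hypothetical function, it raises the built-in `TimeoutError` rather than Selenium's `TimeoutException`, and its 0.5 second default interval mirrors the default poll frequency of `WebDriverWait`.

```python
import time


def wait_until(condition, timeout: float = 10.0, poll_interval: float = 0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # like WebDriverWait.until(), hand back the condition's value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

This also shows why `until()` is useful on the right-hand side of an assignment: whatever the condition returns (for example the located element) becomes the result of the wait.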

Back and forward:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Back and forward
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.taobao.com")
browser.get("http://www.baidu.com")
browser.get("http://www.zhihu.com")
browser.back()     # back to baidu.com
browser.forward()  # forward to zhihu.com again

Running this code, the browser first opens taobao.com, then baidu.com, and finally zhihu.com, and then performs the back and forward actions.

Cookies:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Cookies
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.zhihu.com")
print(browser.get_cookies())
# Add a cookie for the current domain, then list the cookies again
browser.add_cookie({'name': 'admin', 'domain': 'www.zhihu.com', 'value': 'cxiaocai'})
print(browser.get_cookies())
# Remove every cookie; the final call prints an empty list
browser.delete_all_cookies()
print(browser.get_cookies())
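
In crawlers, cookies are often saved after a login and restored in a later session. The following is a minimal sketch of that idea, assuming the cookies are JSON-serializable (which the dictionaries returned by `get_cookies()` normally are). `save_cookies`, `load_cookies`, and the file path are hypothetical.

```python
import json


def save_cookies(browser, path: str) -> None:
    """Write the browser's current cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(browser.get_cookies(), handle)


def load_cookies(browser, path: str) -> int:
    """Add every saved cookie to the browser; return how many were added."""
    with open(path, encoding="utf-8") as handle:
        cookies = json.load(handle)
    for cookie in cookies:
        browser.add_cookie(cookie)
    return len(cookies)
```

Before calling `load_cookies`, the browser should already have opened a page on the cookies' domain, because `add_cookie` only accepts cookies for the domain of the current page.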


Tab management:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Tab management
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.baidu.com")
# Open a new, empty tab through JavaScript
browser.execute_script('window.open()')
print(browser.window_handles)
# Switch to the new tab and load a page in it
browser.switch_to.window(browser.window_handles[1])
browser.get('http://www.taobao.com')
# Switch back to the first tab
browser.switch_to.window(browser.window_handles[0])
browser.get('http://www.zhihu.com')

You can also open windows with the browser's keyboard shortcuts, but that is not recommended; manage tabs the way the code above does.
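
The open, switch, load sequence can be wrapped in one helper. This sketch finds the new tab by comparing the handle list before and after `window.open()` instead of relying on its index. `open_in_new_tab` is a hypothetical name; the calls it makes (`window_handles`, `execute_script`, `switch_to.window`, `get`) are the ones used in the example above.

```python
def open_in_new_tab(browser, url: str) -> str:
    """Open url in a new tab, leave the browser focused on it, return its handle."""
    known_handles = set(browser.window_handles)
    browser.execute_script("window.open()")
    # The new tab is the only handle that was not present before
    new_handle = next(h for h in browser.window_handles if h not in known_handles)
    browser.switch_to.window(new_handle)
    browser.get(url)
    return new_handle
```

Keeping the returned handle makes it easy to come back to that tab later with `browser.switch_to.window(handle)`.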

Exception handling:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Exception handling
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
try:
    browser.get("http://www.baidu.com")
except TimeoutException:
    print("Request timed out")
try:
    # No element with this id exists, so the lookup raises NoSuchElementException
    browser.find_element(By.ID, 'hello')
except NoSuchElementException:
    print("NoSuchElementException")
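
Catching exceptions is one half of robust code; the other half is making sure the browser is shut down when something goes wrong. A possible sketch, not taken from the original tutorial, is a context manager that always calls `quit()`. `managed_browser` is a hypothetical name, and `factory` is any callable that returns a browser, such as `webdriver.Chrome`.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with factory() and always quit it, even after an error."""
    browser = factory()
    try:
        yield browser
    finally:
        # quit() ends the whole session; close() only closes the current window
        browser.quit()
```

Usage would be `with managed_browser(webdriver.Chrome) as browser: browser.get("http://www.baidu.com")`. Any exception raised inside the block still propagates, but only after the browser has been shut down.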