Python web scraping: using a proxy with Selenium + Chrome

Date: 2020-06-19

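This article covers the following points:

Installing the Python selenium library
Downloading and installing the Chrome WebDriver
Using a proxy with Selenium + Chrome
Further learning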

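Setting up the development environment:

PS: If you already have everything installed, skip ahead to the next step. If not, follow my steps below.

Installing the selenium library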

pip install selenium
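
To confirm that the install worked, here is a minimal sketch using only the standard library (it assumes Python 3.8 or later). The helper name `installed_selenium_version` is hypothetical and not part of selenium.

```python
from importlib import metadata


def installed_selenium_version():
    """Return the installed selenium version string, or None if it is missing."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


version = installed_selenium_version()
if version is None:
    print("selenium is not installed; run: pip install selenium")
else:
    print("selenium version:", version)
```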

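Installing the Chrome WebDriver

Note that you need to configure the system environment: after unzipping the Chrome WebDriver, put it in the Scripts directory under your Python installation path, the same directory that contains pip. Here is a command for finding the Python installation path: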

# Windows: open cmd
where python
# Linux
whereis python
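
The same check can be done from Python itself. The sketch below uses only the standard library: it prints the directory where scripts such as pip are installed (Scripts on Windows, usually bin on Linux) and reports whether a `chromedriver` executable is visible on PATH. The function names are hypothetical helpers for illustration.

```python
import shutil
import sysconfig


def scripts_directory():
    """Return the directory where pip and other scripts are installed."""
    return sysconfig.get_path("scripts")


def find_chromedriver():
    """Return the full path of chromedriver if it is on PATH, else None."""
    return shutil.which("chromedriver")


print("Scripts directory:", scripts_directory())
print("chromedriver found at:", find_chromedriver())
```

If the second line prints `None`, the driver is probably not in a directory that the system searches.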

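Google Chrome

Note that the Chrome version needs to be >= 7.9 (presumably meaning 79), because the Chrome WebDriver downloaded earlier is that version. Install the browser yourself.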
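
ChromeDriver releases are generally tied to a Chrome major version, so one way to check compatibility is to compare the major version numbers. The sketch below parses version text of the kind typically printed by `chromedriver --version`. The sample strings are illustrative only, and `major_version` and `versions_match` are hypothetical helper names.

```python
import re


def major_version(version_text):
    """Extract the major version number from text such as 'Google Chrome 79.0.3945.88'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError("no version number found in %r" % version_text)
    return int(match.group(1))


def versions_match(chrome_text, driver_text):
    """Return True when the browser and driver share the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


# Illustrative strings only; substitute what your own commands print.
print(versions_match("Google Chrome 79.0.3945.88", "ChromeDriver 79.0.3945.36"))
print(versions_match("Google Chrome 83.0.4103.97", "ChromeDriver 79.0.3945.36"))
```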

Code example

OK, now that the environment is set up, let's write a few lines of code to try it out, using a request to Baidu as an example:

from selenium import webdriver
# Open a Chrome browser through webdriver
chrome = webdriver.Chrome()
chrome.get('https://www.baidu.com')
print(chrome.page_source)
chrome.quit()  # quit the browser

Run it: it first opens Chrome, then visits Baidu, prints the page source, and finally closes the browser.
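
If `get()` raises an exception, the `quit()` call in the example above is never reached and the browser stays open. One possible variant, sketched here under the assumption that you want the browser closed in every case, wraps the steps in `try`/`finally`. The function `fetch_page_source` and its `driver_factory` parameter are hypothetical names, not part of selenium.

```python
def fetch_page_source(url, driver_factory):
    """Open a browser, load url, and return the page source.

    driver_factory is any zero-argument callable returning a WebDriver-like
    object, for example webdriver.Chrome.
    """
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs even if get() raises, so no browser process is left behind.
        driver.quit()


# Usage with a real browser (requires selenium and the Chrome WebDriver):
# from selenium import webdriver
# html = fetch_page_source('https://www.baidu.com', webdriver.Chrome)
# print(html)
```

Passing the factory as a parameter also makes it possible to try the function with a stand-in object before a real browser is available.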

Using a proxy

To visit a page through a proxy IP, you need to add one more argument. The code is as follows:

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Proxy IP, provided by Kuaidaili
proxy = '60.17.254.157:21222'
# Set the proxy
chrome_options.add_argument('--proxy-server=%s' % proxy)
# Note: pass the chrome_options defined above as the options parameter
chrome = webdriver.Chrome(options=chrome_options)
# Look up the current IP through Baidu
chrome.get('https://www.baidu.com/s?ie=UTF-8&wd=ip')
print(chrome.page_source)
chrome.quit()  # quit the browser
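
A malformed proxy string is easy to miss, so it can help to validate it before handing it to Chrome. The sketch below builds the `--proxy-server` argument from a `host:port` string. `build_proxy_argument` is a hypothetical helper, and the optional `scheme` prefix (such as `socks5`) is an assumption about your proxy type that the original example does not use.

```python
import re


def build_proxy_argument(proxy, scheme=None):
    """Build a --proxy-server argument from a 'host:port' string.

    scheme is optional (for example 'http' or 'socks5'); when omitted the
    argument has the same form as in the example above.
    """
    if not re.fullmatch(r"[\w.-]+:\d{1,5}", proxy):
        raise ValueError("expected 'host:port', got %r" % proxy)
    port = int(proxy.rsplit(":", 1)[1])
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    if scheme:
        return "--proxy-server=%s://%s" % (scheme, proxy)
    return "--proxy-server=%s" % proxy


print(build_proxy_argument("60.17.254.157:21222"))
print(build_proxy_argument("60.17.254.157:21222", scheme="socks5"))
```

The result would be passed to the options object in the usual way, for example `chrome_options.add_argument(build_proxy_argument(proxy))`.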

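Run it; the Baidu results page should now report the proxy IP.

Extensions

Don't want to use Chrome and would rather use Firefox? No problem, webdriver supports Firefox too. Take a look at webdriver's help documentation: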

from selenium import webdriver
help(webdriver)

As the help output shows, it supports not only Firefox but also Chrome, IE, Opera, and more.
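
Switching browsers then comes down to picking a different class from the `webdriver` module. The sketch below looks the class up by name; `pick_driver_class` and `DRIVER_CLASS_NAMES` are hypothetical names, and which classes actually exist depends on the selenium version installed, so treat the mapping as an assumption to check against your own `help(webdriver)` output.

```python
# Browser names mapped to webdriver class names. The classes actually
# available depend on the installed selenium version (an assumption to check).
DRIVER_CLASS_NAMES = {
    "chrome": "Chrome",
    "firefox": "Firefox",
    "ie": "Ie",
}


def pick_driver_class(webdriver_module, browser):
    """Return the driver class for browser from the given webdriver module."""
    try:
        class_name = DRIVER_CLASS_NAMES[browser.lower()]
    except KeyError:
        raise ValueError("unsupported browser: %r" % browser)
    driver_class = getattr(webdriver_module, class_name, None)
    if driver_class is None:
        raise ValueError("%s is not provided by this selenium install" % class_name)
    return driver_class


# Usage (requires selenium and the matching driver, e.g. geckodriver for Firefox):
# from selenium import webdriver
# browser = pick_driver_class(webdriver, "firefox")()
# browser.get('https://www.baidu.com')
# browser.quit()
```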

Further learning
