Section 26: Proxy settings for the urllib, requests, and selenium request libraries

1. Proxy settings for urllib

from urllib.error import URLError
from urllib.request import ProxyHandler
from urllib.request import build_opener

# ProxyHandler configures the proxy servers. It takes a dict whose keys are
# protocol names and whose values are proxy URLs.
proxy_handler = ProxyHandler({"http": "http://113.120.33.75:9999",
                              "https": "https://120.83.99.72:9999"})

# Build an opener object; unlike plain urlopen, it carries the extra proxy handler
opener = build_opener(proxy_handler)
try:
    # Open the URL through the opener object
    response = opener.open("http://httpbin.org/get")
    print(response.read().decode("utf-8"))
except URLError as e:
    print(e.reason)

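Two kinds of errors may occur:

 [WinError 10061] No connection could be made because the target machine actively refused it.

Fix: this usually means the proxy is unavailable; switch to a different proxy.

 [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Fix: in the browser's proxy settings, open the LAN settings and change the "Use automatic configuration script" option to "Automatically detect settings".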
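Since an unavailable proxy is the most common cause of these errors, one option is to try several candidate proxies in turn and keep the first one that answers. The following is only a minimal sketch of that idea: `fetch_via_proxy` and `first_working_proxy` are hypothetical helper names, not part of urllib, and the sketch assumes every candidate is a plain HTTP proxy given as "host:port".

```python
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener


def fetch_via_proxy(proxy, url="http://httpbin.org/get", timeout=5):
    """Fetch url through an HTTP proxy given as 'host:port'; return the body text."""
    handler = ProxyHandler({"http": "http://" + proxy})
    opener = build_opener(handler)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def first_working_proxy(proxies, fetch=fetch_via_proxy):
    """Return (proxy, body) for the first proxy that responds, or (None, None)."""
    for proxy in proxies:
        try:
            return proxy, fetch(proxy)
        except (URLError, OSError) as e:
            # Covers refused connections (10061) as well as timeouts (10060)
            print("proxy %s failed: %s" % (proxy, e))
    return None, None


# Usage (needs network access and at least one live proxy):
# proxy, body = first_working_proxy(["113.120.33.75:9999", "182.92.113.183:8118"])
```

The `fetch` parameter exists only so that the retry loop can be exercised without a network connection; in normal use the default is enough.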

When the request succeeds, the output looks like this; note that origin has changed to the proxy IP:

{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
  },
  "origin": "113.120.33.75, 113.120.33.75",
  "url": "https://httpbin.org/get"
}
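Rather than checking origin by eye, the response can be parsed and compared against the proxy address. This is a small sketch under stated assumptions: `origin_uses_proxy` is a hypothetical helper, and `sample_body` is a shortened copy of the output above rather than a live response.

```python
import json
from urllib.parse import urlsplit


def origin_uses_proxy(body, proxy_url):
    """Return True if every address in httpbin's "origin" field equals the proxy host."""
    proxy_host = urlsplit(proxy_url).hostname
    origin = json.loads(body)["origin"]
    # httpbin may report several comma-separated addresses
    addresses = [part.strip() for part in origin.split(",")]
    return all(address == proxy_host for address in addresses)


sample_body = '{"origin": "113.120.33.75, 113.120.33.75", "url": "https://httpbin.org/get"}'
print(origin_uses_proxy(sample_body, "http://113.120.33.75:9999"))  # prints True
```

If the check returns False, the request most likely bypassed the proxy and went out from the local address.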

2. Proxy settings for requests

import requests

# Configure the proxies
proxies = {"http": "http://182.92.113.183:8118",
           "https": "https://120.83.99.72:9999"}
try:
    # Request the URL
    response = requests.get("http://httpbin.org/get", proxies=proxies)

    # Print the response body
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print(e.args)

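The output is shown below; origin has changed to the proxy IP. This is clearly much simpler than urllib, and there is no need to create an opener object.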

{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.21.0"
  },
  "origin": "182.92.113.183, 182.92.113.183",
  "url": "https://httpbin.org/get"
}
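When the same proxy should be used for many requests, it can be convenient to build the proxies dict once and attach it to a session. The sketch below is illustrative only: `build_proxies` is a hypothetical helper, and it assumes a single plain HTTP proxy handles both http and https traffic, which may not hold for every proxy.

```python
def build_proxies(proxy, scheme="http"):
    """Build a requests-style proxies dict from a 'host:port' string."""
    proxy_url = "%s://%s" % (scheme, proxy)
    # Route both plain and TLS traffic through the same proxy endpoint
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("182.92.113.183:8118")
print(proxies)

# Usage with requests (needs network access and a live proxy):
# import requests
# session = requests.Session()
# session.proxies.update(proxies)
# response = session.get("http://httpbin.org/get", timeout=5)
# print(response.text)
```

Passing a `timeout` keeps a dead proxy from hanging the script indefinitely.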

3. Proxy settings for selenium

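import time
from selenium import webdriver

# Proxy IP address
proxy = "182.92.113.183:8118"

# URL to request
url = "http://httpbin.org/get"

# Create the Chrome options object so that the proxy can be added
chrome_options = webdriver.ChromeOptions()

# Set the proxy; note that there must be no spaces around the "=" sign
chrome_options.add_argument("--proxy-server=http://" + proxy)

# Drive Chrome, passing the proxy through the chrome_options parameter
# (a raw string keeps the backslash in the Windows path literal;
# newer selenium releases take options= instead of chrome_options=)
browser = webdriver.Chrome(executable_path=r"D:\chromedriver.exe", chrome_options=chrome_options)

# Open the URL
browser.get(url=url)
time.sleep(10)

# Quit the browser and clean up the session
browser.quit()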

The output is shown below; origin has changed to the proxy IP.

{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
  },
  "origin": "182.92.113.183, 182.92.113.183",
  "url": "https://httpbin.org/get"
}

4. Proxy settings for PhantomJS (deprecated in newer versions of selenium; use headless Chrome instead)

from selenium import webdriver

"""
service_args = [
    '--proxy=%s' % ip_html,        # proxy IP:port (e.g. 192.168.0.28:808)
    '--proxy-type=http',           # proxy type: http/https
    '--load-images=no',            # disable image loading (optional)
    '--disk-cache=yes',            # enable the disk cache (optional)
    '--ignore-ssl-errors=true'     # ignore HTTPS errors (optional)
]
"""

# URL to request
url = "http://httpbin.org/get"
service_args = ["--proxy=121.233.206.44:9999",      # proxy IP
                "--proxy-type=http"]                # proxy protocol type: http/https

# Start the PhantomJS headless browser and pass the proxy IP through service_args
browser = webdriver.PhantomJS(executable_path=r"D:\phantomjs-2.1.1-windows\bin\phantomjs.exe", service_args=service_args)
browser.get(url=url)
print(browser.page_source)

Unexpectedly, the latest version of selenium no longer supports PhantomJS and tells us to use headless Chrome or Firefox instead:

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

Since my browser is Chrome, I use headless Chrome with the proxy IP:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Create the Chrome options object
chrome_options = Options()
proxy = "182.92.113.183:8118"       # proxy IP
url = "http://httpbin.org/get"      # URL to request

# Run Chrome in headless mode, i.e. without a visible window
chrome_options.add_argument("--headless")
# Disable GPU acceleration
chrome_options.add_argument("--disable-gpu")
# Set the language
chrome_options.add_argument("-lang=zh-cn")      # Chinese
# chrome_options.add_argument("-lang=en-GB")    # English

# Set the Chrome proxy IP
chrome_options.add_argument("--proxy-server=http://" + proxy)
# Set the browser resolution (Chrome expects width,height)
chrome_options.add_argument("--window-size=1920,3000")
# Drive Chrome, passing the proxy IP through the chrome_options parameter
browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=r"D:\chromedriver.exe")
browser.get(url=url)
print(browser.find_element_by_xpath("/html/body/pre").text)

The output is shown below; the origin IP address has changed to the proxy IP.

{
  "args": {},
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-cn",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/74.0.3729.169 Safari/537.36"
  },
  "origin": "182.92.113.183, 182.92.113.183",
  "url": "https://httpbin.org/get"
}

Finally, here are some commonly used add_argument parameters for the selenium module:

chrome_options.add_argument('--user-agent=""')                  # set the User-Agent request header
chrome_options.add_argument('--window-size=1280,1024')          # set the browser resolution (window size, as width,height)
chrome_options.add_argument('--start-maximized')                # run maximized; without it, locating elements may raise errors
chrome_options.add_argument('--disable-infobars')               # hide the "Chrome is being controlled by automated software" notice
chrome_options.add_argument('--incognito')                      # incognito (private) mode
chrome_options.add_argument('--hide-scrollbars')                # hide scrollbars, for some special pages
chrome_options.add_argument('--disable-javascript')             # disable JavaScript
chrome_options.add_argument('--blink-settings=imagesEnabled=false')  # do not load images, which speeds things up
chrome_options.add_argument('--headless')                       # run without a visible browser window
chrome_options.add_argument('--ignore-certificate-errors')      # ignore certificate errors
chrome_options.add_argument('--disable-gpu')                    # disable GPU acceleration
chrome_options.add_argument('--disable-software-rasterizer')    # disable the software rasterizer
chrome_options.add_argument('--disable-extensions')             # disable extensions
chrome_options.add_argument('--start-maximized')                # start maximized
chrome_options.add_argument("--proxy-server=http://xxxxxxx")    # set the proxy IP
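To avoid repeating add_argument calls across scripts, the switches can be collected by a small function and applied in a loop. This is only a sketch: `build_chrome_arguments` and its parameters are hypothetical names chosen for illustration, and it covers just a few of the switches listed above.

```python
def build_chrome_arguments(proxy=None, headless=True, load_images=True):
    """Return a list of Chrome command-line switches for the given settings."""
    arguments = []
    if headless:
        arguments.append("--headless")
        arguments.append("--disable-gpu")
    if not load_images:
        arguments.append("--blink-settings=imagesEnabled=false")
    if proxy:
        # No spaces around "=" in the switch
        arguments.append("--proxy-server=http://" + proxy)
    return arguments


print(build_chrome_arguments(proxy="182.92.113.183:8118"))

# Usage with selenium (requires selenium and a matching chromedriver):
# from selenium.webdriver.chrome.options import Options
# chrome_options = Options()
# for argument in build_chrome_arguments(proxy="182.92.113.183:8118"):
#     chrome_options.add_argument(argument)
```

Keeping the switch list in plain Python also makes it easy to print or log the exact configuration a browser was started with.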

For other parameters, see https://blog.csdn.net/liaojianqiu0115/article/details/78353267
