Web Crawlers: Using Proxies

Using a Proxy IP

1. Using a Proxy with requests

  With requests, build a dictionary that maps each URL scheme to a proxy address and pass it through the proxies parameter.

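import requests

# Proxy address taken from the example; most proxies also need a port ('host:port').
proxy = '60.186.9.233'
proxies = {
    # Map each target URL scheme to the proxy that should carry it.
    'http': 'http://' + proxy,
    # An ordinary HTTP proxy is reached over http:// even when the target URL is HTTPS.
    'https': 'http://' + proxy
}
try:
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    # Raised when the proxy or the target host cannot be reached.
    print('error', e.args)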

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

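  The origin field in the output is the proxy's IP address, which shows that the proxy was applied. If the proxy requires authentication, prepend the username and password to the proxy address.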

proxy = 'username:password@60.186.9.233'
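
The two steps above (building the proxies dictionary, optionally with credentials, and checking the origin field) can be wrapped in small helpers. This is a minimal sketch, not part of the original article: the names build_proxies and seen_through_proxy are hypothetical, and it assumes the proxy itself is reached over plain HTTP and that credentials containing special characters should be percent-encoded.

```python
import json
from urllib.parse import quote


def build_proxies(address, username=None, password=None):
    """Return a proxies mapping for requests from 'host' or 'host:port'.

    Hypothetical helper for illustration only.
    """
    if username is not None and password is not None:
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot be confused with the URL's own separators.
        user = quote(username, safe='')
        secret = quote(password, safe='')
        address = f"{user}:{secret}@{address}"
    # Assumption: the proxy is reached over plain HTTP for both schemes.
    proxy_url = "http://" + address
    return {"http": proxy_url, "https": proxy_url}


def seen_through_proxy(body, proxy_ip):
    """Check whether the 'origin' field of an httpbin response names proxy_ip."""
    origin = json.loads(body).get("origin", "")
    # The field may hold several comma-separated addresses.
    return proxy_ip in [part.strip() for part in origin.split(",")]
```

The result of build_proxies can be passed as the proxies argument of requests.get, and seen_through_proxy can be applied to res.text from the httpbin request shown earlier.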

2. Using a Proxy with Selenium

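  Selenium can also be configured to use a proxy. There are two cases: a browser with a visible window, with Chrome as the example, and a headless browser, with PhantomJS as the example.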

Configuring Chrome

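  Set the proxy through a ChromeOptions object and pass it when creating the Chrome object: the chrome_options parameter in older Selenium releases, options in Selenium 4. Running the code opens a Chrome window, and after the page loads it shows the following result.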

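# Chrome proxy settings
from selenium import webdriver

proxy = '60.186.9.233'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + proxy)
# Selenium 4 takes options=; older releases used chrome_options= instead.
browser = webdriver.Chrome(options=chrome_options)
# get() returns None, so there is no result to assign; the page shows in the browser.
browser.get('http://httpbin.org/get')

Page shown in the browser:
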
{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}
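
When the proxy address changes often, the Chrome command-line arguments can be assembled by a small function. The following is only a sketch: chrome_proxy_args is a hypothetical name, and the optional --headless switch is an addition that the original example does not use.

```python
def chrome_proxy_args(address, scheme="http", headless=False):
    """Return Chrome command-line arguments that route traffic via a proxy.

    Hypothetical helper; address is 'host' or 'host:port'.
    """
    args = [f"--proxy-server={scheme}://{address}"]
    if headless:
        # Optional extra, not in the original example: run without a window.
        args.append("--headless")
    return args


# Usage with Selenium (requires selenium and a Chrome driver):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   for arg in chrome_proxy_args('60.186.9.233'):
#       options.add_argument(arg)
#   browser = webdriver.Chrome(options=options)
```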

 

Configuring PhantomJS

  Put the command-line options in a list and pass it to PhantomJS through the service_args parameter at initialization.

# PhantomJS proxy settings
# Note: webdriver.PhantomJS was deprecated in Selenium 3 and removed in Selenium 4.
from selenium import webdriver

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http'
]
browser = webdriver.PhantomJS(service_args=service_args)
browser.get('http://httpbin.org/get')
print(browser.page_source)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

If the proxy requires authentication, add the --proxy-auth option to service_args.

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http',
    '--proxy-auth=username:password'
]
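
The two service_args variants above differ only in the optional --proxy-auth entry, so they can be produced by one function. This is a minimal sketch with a hypothetical name, phantomjs_service_args; it uses only the three options shown in this section.

```python
def phantomjs_service_args(address, proxy_type="http", auth=None):
    """Build the service_args list for PhantomJS.

    Hypothetical helper; auth is an optional (username, password) tuple.
    """
    args = [f"--proxy={address}", f"--proxy-type={proxy_type}"]
    if auth is not None:
        username, password = auth
        # Only added when the proxy requires authentication.
        args.append(f"--proxy-auth={username}:{password}")
    return args
```

Calling phantomjs_service_args('60.186.9.233', auth=('username', 'password')) gives the same list as the authenticated example above.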