Python network request modules: urllib and requests

Python has a reputation for making it very convenient to fetch web pages. That productivity comes mainly from two modules: urllib and requests.

Introduction to urllib

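urllib.request provides the urlopen function for fetching pages. It supports different protocols, basic authentication, cookies, proxies, and other features. In Python 2 there were two separate modules, urllib and urllib2: urllib2 could accept a Request object while urllib accepted only a URL, and urllib provided the urlencode function for encoding GET parameters while urllib2 had no equivalent. Python 3 merged them into the single urllib package, where urlencode lives in urllib.parse. urllib.error defines URLError and HTTPError for handling client-side and server-side failures.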
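As a minimal sketch of how these two exceptions might be handled, the helper below wraps urlopen and reports which kind of failure occurred. The name describe_fetch, the 5-second timeout, and the format of the returned string are illustrative assumptions, not part of urllib. HTTPError is a subclass of URLError, so it has to be caught first.

```python
import urllib.error
import urllib.request


def describe_fetch(url, timeout=5):
    """Hypothetical helper: fetch url and summarize the outcome as a string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return 'http error: {}'.format(err.code)
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, missing file...
        return 'url error: {}'.format(err.reason)
    return 'ok: {} bytes'.format(len(body))
```

Calling describe_fetch('http://www.baidu.com') returns one of the three kinds of message, depending on how the request went.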

Introduction to Requests

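Requests is a simple, easy-to-use HTTP library written in Python. It lets us complete an HTTP request with a few simple arguments, instead of specifying everything by hand as urllib requires. It also decodes responses to Unicode automatically and offers rich error handling. Its feature list includes: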

International Domains and URLs
Keep-Alive & Connection Pooling
Sessions with Cookie Persistence
Browser-style SSL Verification
Basic/Digest Authentication
Elegant Key/Value Cookies
Automatic Decompression
Unicode Response Bodies
Multipart File Uploads
Connection Timeouts
.netrc support
Python 2.6—3.4
Thread-safe

Below is some sample code. The environment used in this article is Python 3.6.0.

Requesting a single page without parameters

import urllib.request
# import urllib2  # Python 2 only
import requests

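# Fetch the page with urllib
response = urllib.request.urlopen('http://www.baidu.com')
# read() returns the raw bytes sent by the server; decode() converts them to str (UTF-8 by default)
print(response.read().decode())

# Fetch the page with requests
resp = requests.get('http://www.baidu.com')
print(resp)       # the Response object, e.g. <Response [200]>
print(resp.text)  # the body decoded to str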
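Since decode() assumes UTF-8 unless told otherwise, a page served in another encoding would come out garbled or raise an error. A minimal sketch of one way to handle this with urllib is to read the charset declared in the Content-Type header. The helper name read_text and the UTF-8 fallback are assumptions made for illustration.

```python
import urllib.request


def read_text(url, fallback='utf-8'):
    """Hypothetical helper: fetch url and decode the body with its declared charset."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Charset parameter of the Content-Type header, or None if it is absent
        charset = response.headers.get_content_charset()
    return raw.decode(charset or fallback)
```

The same helper can be tried without a network connection on a data: URL, which urllib also understands, for example read_text('data:text/plain;charset=gbk,%D6%D0%CE%C4').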

HTTP works on a request/response model. urllib.request provides a Request object to represent a request, so the urllib code above can also be written like this:

req = urllib.request.Request('http://www.baidu.com')
with urllib.request.urlopen(req) as response:
    print(response.read())

Header information can be added to a Request object:

req = urllib.request.Request('http://www.baidu.com')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with urllib.request.urlopen(req) as response:
    print(response.read())

Alternatively, pass the headers directly to the Request constructor.
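A minimal sketch of the constructor form, using the headers keyword argument of Request. The shortened User-Agent value is an illustrative placeholder, not the string used earlier in the article.

```python
import urllib.request

# Illustrative placeholder value (an assumption, not taken from the article)
headers = {'User-Agent': 'Mozilla/5.0 (compatible; example-client)'}
req = urllib.request.Request('http://www.baidu.com', headers=headers)

# Request stores header names capitalized, so the key becomes 'User-agent'
print(req.get_header('User-agent'))

# Sending the request works exactly as before:
# with urllib.request.urlopen(req) as response:
#     print(response.read())
```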

GET requests with parameters

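A request with parameters is essentially the same as the examples above: build the URL string first, then send the request. This example uses Tencent's stock API, which accepts different stock codes and dates and returns the price and trading information for that stock at that time.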

# Call an endpoint that takes a parameter
tencent_api = "http://qt.gtimg.cn/q=sh601939"

response = urllib.request.urlopen(tencent_api)
# read() returns the raw bytes sent by the server; call decode() on them to get text
print(response.read())

resp = requests.get(tencent_api)
print(resp)
print(resp.text)
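The example above hard-codes the parameter into the URL. When the parameters go into a regular query string, urllib.parse.urlencode can do the escaping instead of manual string concatenation. The sketch below assumes a placeholder endpoint (example.com) and made-up parameter names; build_url is a hypothetical helper, not part of urllib.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Hypothetical helper: append params to base as an encoded query string."""
    return base + '?' + urlencode(params)


# example.com and the parameter names are placeholders, not a real API
url = build_url('http://example.com/search', {'q': 'python urllib', 'page': 2})
print(url)  # http://example.com/search?q=python+urllib&page=2
```

With requests, the same effect is available by passing the dictionary through the params argument of requests.get.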

Sending a POST request

urllib has no separate functions for GET and POST requests. It decides which one to send based on whether a data argument was passed to the Request object.

import urllib.parse
import urllib.request
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
          'location' : 'Northampton',
          'language' : 'Python' }
data = urllib.parse.urlencode(values)
data = data.encode('ascii')  # data should be bytes
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
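The rule that data decides the method can be checked without sending anything, because Request exposes get_method(). The snippet below is a small sketch reusing the URL from the example above; the form field is trimmed for brevity and no request is actually made.

```python
import urllib.parse
import urllib.request

url = 'http://www.someserver.com/cgi-bin/register.cgi'
data = urllib.parse.urlencode({'name': 'Michael Foord'}).encode('ascii')

get_req = urllib.request.Request(url)         # no data, so a GET
post_req = urllib.request.Request(url, data)  # data present, so a POST

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST

# The method keyword (Python 3.3+) overrides the default choice
put_req = urllib.request.Request(url, data, method='PUT')
print(put_req.get_method())   # PUT
```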
