The requests Module in Python 3

Date: 2019-11-12


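　　The Python standard library offers modules such as urllib for making HTTP requests, but the API is painful to use. It was built for a different era and a different internet, and it takes an enormous amount of work, sometimes even overriding methods, to accomplish the simplest tasks.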

　　Sending a GET request

import urllib.request

f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = f.read().decode('utf-8')

　　Sending a GET request with request headers

import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)

result = r.read().decode('utf-8')

　　For more details, see the official documentation.

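　　Requests is an HTTP library written in Python and released under the Apache2 License. It wraps Python's built-in modules in a high-level API that makes network requests far more pleasant for Python developers. With Requests you can easily perform nearly any operation a browser can.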

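Features of the requests library:

Keep-Alive & connection pooling
International domains and URLs
Sessions with persistent cookies
Browser-style SSL verification
Automatic content decoding
Basic/Digest authentication
Elegant key/value cookies
Automatic decompression
Unicode response bodies
HTTP(S) proxy support
Multipart file uploads
Streaming downloads
Connection timeouts
Chunked requests
.netrc support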
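
Several of the features listed above, such as persistent cookies and connection pooling, are tied to Session objects. The following is a minimal sketch rather than a complete recipe; the cookie name and the X-Demo header are made up for illustration, and prepare_request() is used so that nothing is actually sent.

```python
import requests

# A Session keeps cookies and default headers across requests and reuses
# pooled connections (Keep-Alive) when it actually sends them.
session = requests.Session()
session.cookies.set('sessionid', 'abc123')      # hypothetical cookie
session.headers.update({'X-Demo': 'session'})   # hypothetical header

# prepare_request() merges the session state into a request without sending it,
# so this sketch runs offline.
prepared = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/cookies')
)

cookie_header = prepared.headers.get('Cookie')
demo_header = prepared.headers.get('X-Demo')
print(cookie_header, demo_header)
```

Calling session.get(...) instead would send the same merged cookies and headers over the network.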

1. Installing the module

Install:
	pip install requests
Upgrade:
	pip install --upgrade requests

2. Using the module

　　HTTP request methods include POST, GET, PUT, DELETE, HEAD, and OPTIONS; POST and GET are the most commonly used.

　　GET requests

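import requests
# Example without parameters
r = requests.get('https://httpbin.org/get')
# Example with parameters
d = {'k1': 'v1', 'k2': 'v2', 'k3': None, 'k4': ['v4', 'v5']}
r = requests.get('http://httpbin.org/get', params=d)

Passing URL parameters:
    URLs often carry key/value pairs after a ? character, as in http://httpbin.org/get?key=val. requests builds this query string from the params argument.
    # Keys whose value is None are not added to the URL
    # Multiple key/value pairs are joined with &
    # A value may be a list, e.g. 'k4'
print(r.url)
Output: http://httpbin.org/get?k1=v1&k2=v2&k4=v4&k4=v5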
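
The encoding rules above can be checked without touching the network. This sketch assumes it is acceptable to build the request with requests.Request and call prepare(), which applies the same parameter encoding that requests.get() uses.

```python
import requests

d = {'k1': 'v1', 'k2': 'v2', 'k3': None, 'k4': ['v4', 'v5']}

# Build the request without sending it; prepare() encodes params into the URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=d).prepare()
print(prepared.url)
```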

　　POST requests

# 1. Basic POST example
 
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)
# Output
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "key1": "value1", 
    "key2": "value2"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "23", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "(your IP address is returned here)", 
  "url": "http://httpbin.org/post"
}

 
# 2. Example sending request headers and data
 
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
# Output
{"message":"Not Found","documentation_url":"https://developer.github.com/v3"}
<RequestsCookieJar[]>
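
The second example serializes the payload by hand and sets the content type itself. As a hedged alternative, the json= parameter of requests.post() does both steps for you. The sketch below compares the two forms on prepared requests, so nothing is sent; the endpoint URL is the placeholder from the example above.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Explicit version: serialize by hand and set the header yourself.
manual = requests.Request(
    'POST', url, data=json.dumps(payload),
    headers={'content-type': 'application/json'},
).prepare()

# Shortcut: json= serializes the payload and sets Content-Type for you.
auto = requests.Request('POST', url, json=payload).prepare()

print(auto.headers['Content-Type'])
print(json.loads(auto.body) == json.loads(manual.body))
```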

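　　Response content

　The requests module returns a Response object, from which you can read the information you need. Below, r denotes a Response object.

r.text text response content
r.content binary response content
r.json() JSON response content
r.raw raw response content

# Text response content
    A Response object carries a lot of information, and Requests transparently decodes content from most Unicode charsets.
    After a request is sent, Requests makes an educated guess about the response encoding based on the HTTP headers.
    You can read the encoding from r.encoding, and change it by assigning a new value to r.encoding.

# Binary response content
      For non-text responses, use r.content. Requests automatically decodes gzip and deflate transfer-encoded response content.

# JSON response content
    Note that r.json() raises an exception if JSON decoding fails. However, a successful call to r.json() does not mean the request succeeded: some servers
    include a JSON object in a failed response, and that JSON is decoded and returned. To check whether a request succeeded, use r.raise_for_status()
    or check that r.status_code is what you expect.

# Raw response content
    To get the raw socket response from the server, use r.raw, and make sure stream=True was set on the initial request.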
r = requests.get('https://httpbin.org/get', stream=True)
print(r.raw)
print(r.raw.read(10))
# Output
<urllib3.response.HTTPResponse object at 0x061665F0>
b'{\n  "args"'
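
To see how r.content, r.text, r.json() and r.encoding relate, here is a small offline sketch. The fake_response helper is an illustrative assumption, not part of requests: it builds a Response by hand and uses an in-memory stream in place of a real connection.

```python
import io
import requests

def fake_response(status_code, body, encoding='utf-8'):
    """Build a Response by hand so the sketch runs offline (illustrative helper)."""
    resp = requests.Response()
    resp.status_code = status_code
    resp.encoding = encoding
    resp.raw = io.BytesIO(body)   # stands in for the network stream
    return resp

r = fake_response(200, '{"city": "Zürich"}'.encode('utf-8'))

raw_bytes = r.content   # bytes, exactly as received
text = r.text           # str, decoded using r.encoding
data = r.json()         # parsed JSON

# Changing r.encoding changes how r.text decodes the same bytes.
r.encoding = 'latin-1'
mis_decoded = r.text
```

With a real response you would normally leave r.encoding alone unless the server reports the wrong charset.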


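　　Custom headers

To add HTTP headers, simply pass a dict to the headers parameter. Note: every header value must be a string, bytestring, or unicode. Passing unicode headers is allowed but not recommended.

Note: custom headers are given lower priority than certain more specific sources of information. For example:

If credentials are set in .netrc, authorization set with headers= does not take effect. If the auth= parameter is set, the .netrc settings are in turn ignored.
If you are redirected to a different host, the authorization header is removed.
Proxy authorization headers are overridden by proxy credentials provided in the URL.
When the content length can be determined, the Content-Length header is overwritten.

Furthermore, Requests does not change its behavior based on which custom headers are specified. The headers are simply passed along in the final request.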

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
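
As a rough illustration of how a custom header combines with the defaults, the sketch below prepares the same request through a Session without sending it. It assumes the library defaults include a User-Agent and an Accept-Encoding header, which is the case for the session's default headers.

```python
import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}

session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', url, headers=headers))

# The custom value replaces the session default, matched case-insensitively...
user_agent = prepared.headers['User-Agent']
# ...while defaults that were not overridden are still sent.
has_accept_encoding = 'Accept-Encoding' in prepared.headers
print(user_agent, has_accept_encoding)
```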

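　　Response status codes

The response status code tells you the result of a request; 200 generally means success. Requests also ships with a built-in status code lookup object, requests.codes: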

>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
>>> r.status_code == requests.codes.ok
True

# If a request went wrong (a 4XX client error or a 5XX server error response), we can raise an exception with Response.raise_for_status():

>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404

>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

# However, since the status_code of r in our example is 200, calling raise_for_status() gives:
>>> r.raise_for_status()
None
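
A common pattern is to wrap raise_for_status() in a small helper. The sketch below is one possible shape, not a prescribed one; fake_response and describe are hypothetical names, and the responses are built by hand so the example runs without a network connection.

```python
import requests

def fake_response(status_code):
    """Hand-built Response so the sketch runs offline (illustrative helper)."""
    resp = requests.Response()
    resp.status_code = status_code
    return resp

def describe(resp):
    """Return 'ok' for a successful response, else the HTTPError message."""
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return str(exc)
    return 'ok'

ok_result = describe(fake_response(requests.codes.ok))
not_found_result = describe(fake_response(requests.codes.not_found))
print(ok_result)
print(not_found_result)
```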

　　Response headers

>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'connection': 'close',
    'server': 'nginx/1.0.4',
    'x-runtime': '148ms',
    'etag': '"e1ca502697e5c9317743dc078f67693f"',
    'content-type': 'application/json'
}

# This dictionary is special, though: it is made just for HTTP headers. According to RFC 2616, HTTP header names are case-insensitive.

>>> r.headers['Content-Type']
'application/json'

>>> r.headers.get('content-type')
'application/json'
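
The case-insensitive behavior comes from the CaseInsensitiveDict class in requests.structures, which is the type of r.headers. A short sketch of how it behaves on its own, with a made-up header set:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

# Lookups ignore the case of the key.
same_value = headers['content-type'] == headers['CONTENT-TYPE']
# .get() works like on a regular dict, including the default value.
missing = headers.get('x-not-set', 'n/a')
# The original spelling of the key is preserved when iterating.
keys = list(headers)
print(same_value, missing, keys)
```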

　　Cookie

>>> url = 'http://example.com/some/cookie/setting/url'
>>> r = requests.get(url)

>>> r.cookies['example_cookie_name']
'example_cookie_value'

# To send your own cookies to the server, use the cookies parameter
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

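# Cookies are returned in a RequestsCookieJar, which behaves like a dict but offers a more complete interface, suitable for use across multiple domains and paths. You can also pass a cookie jar to Requests: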
>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
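
The domain and path matching of the jar can be observed offline by preparing requests instead of sending them. In this sketch, cookie_header_for is a hypothetical helper name; it returns the Cookie header that would be sent to a given URL.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

def cookie_header_for(url):
    """Return the Cookie header requests would send to url (None if no cookie matches)."""
    prepared = requests.Request('GET', url, cookies=jar).prepare()
    return prepared.headers.get('Cookie')

at_cookies = cookie_header_for('http://httpbin.org/cookies')
at_elsewhere = cookie_header_for('http://httpbin.org/elsewhere')
at_other_host = cookie_header_for('http://example.com/cookies')
print(at_cookies, at_elsewhere, at_other_host)
```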

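　　Timeouts

You can tell requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter. Without it, your program may hang indefinitely.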

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

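# Notes
    timeout is not a limit on the total time to download the response body. It bounds how long requests waits on the server: an exception is raised
    if the server has not issued a response within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
    If no timeout is specified explicitly, requests do not time out.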
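
The timeout argument also accepts a (connect, read) tuple, which makes the distinction in the notes above visible. The sketch below is an assumption-laden demonstration: it opens a local socket that accepts connections but never answers, so the connection succeeds and the read wait times out. The timeout values are arbitrary.

```python
import socket
import requests

# A local TCP socket that accepts connections (via the listen backlog)
# but never sends a reply.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(('127.0.0.1', 0))
silent.listen(1)
port = silent.getsockname()[1]

session = requests.Session()
session.trust_env = False   # ignore proxy environment variables for this local test

try:
    # 1.0 s to establish the connection, 0.3 s to wait for data from the server
    session.get(f'http://127.0.0.1:{port}/', timeout=(1.0, 0.3))
    outcome = 'responded'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
finally:
    silent.close()

print(outcome)
```

Both ReadTimeout and ConnectTimeout are subclasses of requests.exceptions.Timeout, so catching Timeout covers either case.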

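　　Errors and exceptions

On a network problem (e.g. DNS failure, refused connection), Requests raises a ConnectionError exception.
If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.
All exceptions that Requests raises explicitly inherit from requests.exceptions.RequestException.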
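
Because these exceptions share a base class, a single except clause can cover them all. The following is one possible wrapper, with fetch_status as a hypothetical name and an arbitrary default timeout. The demonstration call uses a URL without a scheme, which Requests rejects before any network traffic happens.

```python
import requests

def fetch_status(url, timeout=3.0):
    """Return ('ok', status_code) or (exception_class_name, message)."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Catches ConnectionError, HTTPError, Timeout, TooManyRedirects, ...
        return type(exc).__name__, str(exc)
    return 'ok', resp.status_code

# A URL without a scheme fails while the request is being prepared.
kind, message = fetch_status('not-a-url')
print(kind)
```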

　　Other request methods

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
 
# All of the methods above are built on this one
requests.request(method, url, **kwargs)

3. HTTP request and XML examples

Example: checking whether a QQ account is online

import urllib.request
import requests
from xml.etree import ElementTree as ET

# Send the HTTP request with the built-in urllib module and fetch the XML content
"""
f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = f.read().decode('utf-8')
"""


# Send the HTTP request with the third-party requests module and fetch the XML content
r = requests.get('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = r.text

# Parse the XML content
node = ET.XML(result)

# Read the result
if node.text == "Y":
    print("online")
else:
    print("offline")

Example: looking up train stop information

import urllib.request
import requests
from xml.etree import ElementTree as ET

# Send the HTTP request with the built-in urllib module and fetch the XML content
"""
f = urllib.request.urlopen('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = f.read().decode('utf-8')
"""

# Send the HTTP request with the third-party requests module and fetch the XML content
r = requests.get('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = r.text

# Parse the XML content
root = ET.XML(result)
for node in root.iter('TrainDetailInfo'):
    print(node.find('TrainStation').text, node.find('StartTime').text, node.tag, node.attrib)
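
The parsing step can be tried without calling the web service. The XML below is a hypothetical sample shaped after the element names used in the train example; the real response may contain namespaces and extra wrapper elements. parse_stops is an illustrative helper that also guards against missing child elements.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample; only the element names come from the example above.
sample = """
<DataSet>
  <TrainDetailInfo id="1">
    <TrainStation>Station A</TrainStation>
    <StartTime>08:00:00</StartTime>
  </TrainDetailInfo>
  <TrainDetailInfo id="2">
    <TrainStation>Station B</TrainStation>
    <StartTime>09:30:00</StartTime>
  </TrainDetailInfo>
</DataSet>
"""

def parse_stops(xml_text):
    """Return (station, start_time) pairs, skipping entries with missing fields."""
    root = ET.XML(xml_text.strip())
    stops = []
    for node in root.iter('TrainDetailInfo'):
        station = node.find('TrainStation')
        start = node.find('StartTime')
        if station is None or start is None:
            continue  # find() returns None when the child element is absent
        stops.append((station.text, start.text))
    return stops

stops = parse_stops(sample)
print(stops)
```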
