Learning Python: requests Module Basics

Date: 2019-11-12

Installed version: 2.18

Importing the module: import requests

• Sending requests

Send a GET request:

Fetch GitHub's public timeline:

r = requests.get(url='https://api.github.com/events')

r is now a Response object, from which you can read whatever information you need.

Send a POST request:

r = requests.post(url='http://httpbin.org/post', data={'key':'value'})

Send a PUT request:

r = requests.put(url='http://httpbin.org/put', data={'key':'value'})

Send a DELETE request:

r = requests.delete(url='http://httpbin.org/delete')

Send a HEAD request:

r = requests.head(url='http://httpbin.org/get')

Send an OPTIONS request:

r = requests.options(url='http://httpbin.org/get')

Those are the basic ways to send a request with requests.

• Passing URL parameters

requests accepts a params keyword argument, supplied as a dictionary of strings. For example, to pass key1=value1 and key2=value2 to httpbin.org/get:

payload = {'key1':'value1','key2':'value2'}

r = requests.get('http://httpbin.org/get', params=payload)

Print the resulting URL: print(r.url)

http://httpbin.org/get?key1=value1&key2=value2

Note: if a value in the dictionary is None, that key is not added to the URL's query string.

A list can also be passed as a value:

payload = {'key1':'value1','key2':['value2','value3']}

r = requests.get('http://httpbin.org/get', params=payload)

Print the resulting URL: print(r.url)

http://httpbin.org/get?key1=value1&key2=value2&key2=value3
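The parameter rules above can be checked without sending anything by preparing the request and looking at its URL. This is a minimal sketch; key3 is a made-up key added only to show that None values are dropped.

```python
import requests

# Build the request without sending it, so no network access is needed.
params = {"key1": "value1", "key2": ["value2", "value3"], "key3": None}
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()

# key3 is None, so it is omitted; the list under key2 becomes two entries.
print(prepared.url)
```

Preparing a request is a convenient way to inspect what requests would send before it goes on the wire.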

• Response content

Read the content of the server's response, again using the GitHub timeline as an example:

import requests

r = requests.get(url='https://api.github.com/events')

print(r.text)

[{"id":"7610277004","type":"IssuesEvent","actor":{"id":1049678,"login":"tkurki","display_login":"tkurki","gravatar_id":"","url":"https://api.github.com/users/tkurki","avatar_url":"https://avatars.githubusercontent.com/u/1049678?"},"repo":{"id":58462216,"name":"vazco/uniforms","url":"https://api.github.com/repos/vazco/uniforms"},"payload":{"action":"opened",…………

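requests automatically decodes content from the server; most Unicode character sets are decoded seamlessly.

After a request is sent, requests makes an educated guess about the response's encoding based on the HTTP headers. When you access r.text, requests decodes the body using that guessed encoding. You can find out which encoding requests is using, and change it, through the r.encoding attribute: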

r = requests.get(url='https://api.github.com/events')

print(r.encoding)

Output (the detected encoding): utf-8

r.encoding='ISO-8859-1'

print(r.encoding)

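Output (the encoding in use after the change): ISO-8859-1

Once the encoding has been changed, requests uses the new value of r.encoding every time r.text is accessed.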
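The effect of changing r.encoding can be seen offline with a hand-built Response. This is only a sketch: setting the private _content attribute is a shortcut to avoid a network call, and real code would get the response from requests.get().

```python
from requests.models import Response

# Hand-built response for illustration only; real code gets one from requests.get().
r = Response()
r._content = "café".encode("utf-8")  # private attribute, set here only to avoid a network call

r.encoding = "utf-8"
text_utf8 = r.text      # bytes decoded as UTF-8

r.encoding = "ISO-8859-1"
text_latin1 = r.text    # the same bytes, now decoded as ISO-8859-1

print(repr(text_utf8), repr(text_latin1))
```

The bytes in r.content never change; only the way r.text decodes them does, which is why a wrong encoding shows up as garbled characters.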

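For example, HTML and XML can declare their encoding in the document body. In that case, inspect r.content to find the encoding, then set r.encoding to match so that r.text is decoded correctly: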

r = requests.get(url='http://www.etongbao.com.cn')

r.content

b'<!DOCTYPE html>\n<html lang="zh-CN">\n <head>\n <meta charset="utf-8">\n …..

This shows the entire page content as bytes.

• Binary response content

r.content

b'<!DOCTYPE html>\n<html lang="zh-CN">\n <head>\n <meta charset="utf-8">\n …..

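requests automatically decodes response data that uses the gzip and deflate transfer encodings.

For example, to create an image from the binary data returned by a request: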

from PIL import Image

from io import BytesIO

i = Image.open(BytesIO(r.content))

• JSON response content

requests has a built-in JSON decoder to help you handle JSON data.

import requests

r = requests.get(url='https://api.github.com/events')

print(type(r.json()))

print(r.json())

Output:

{'message': "API rate limit exceede

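If JSON decoding fails, r.json() raises an exception, for example ValueError: No JSON object could be decoded. Note, however, that a successful r.json() call does not mean the request succeeded: some servers return a JSON object in a failed response, and that JSON is decoded and returned. To check whether the request succeeded, use:

r.raise_for_status(), or check that r.status_code is the value you expect.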
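The three cases described above (valid JSON, undecodable body, and a failed response that still carries JSON) can be sketched offline. fake_response is a hypothetical helper, the bodies are made up, and the private _content attribute is set only so the example runs without a network.

```python
from requests.models import Response

def fake_response(body, status_code):
    """Build a Response by hand (illustration only; no request is sent)."""
    r = Response()
    r._content = body            # private attribute, used here to avoid network access
    r.status_code = status_code
    return r

ok = fake_response(b'{"message": "hello"}', 200)
data = ok.json()                 # parsed into a Python dict

broken = fake_response(b"<html>not json</html>", 200)
try:
    broken.json()
    decode_failed = False
except ValueError:               # JSON decode errors are ValueError subclasses
    decode_failed = True

# A failed response can still carry valid JSON, so json() succeeding proves nothing.
failed = fake_response(b'{"message": "denied"}', 403)
error_body = failed.json()
print(data, decode_failed, failed.status_code, error_body)
```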

• Raw response content

To get the raw socket response from the server, set stream=True in the initial request:

import requests

r = requests.get(url='https://api.github.com/events', stream = True)

print(r.raw)

Returns: <urllib3.response.HTTPResponse object at 0x023C12F0>

print(r.raw.read(10))

Returns: b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03' (the first 10 raw bytes)

In general, save a streamed response to a file like this:

import requests

r = requests.get(url='https://api.github.com/events', stream = True)

with open('test', 'wb') as fb:

    for chunk in r.iter_content(chunk_size=8192):  # chunk size in bytes; 8192 is an arbitrary choice

        fb.write(chunk)

Using r.iter_content takes care of much of what you would otherwise have to handle yourself when working with r.raw directly.
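To see how iter_content splits a body into chunks, here is a sketch that runs offline. It assumes an in-memory BytesIO as a stand-in for r.raw; the 2500-byte body, the 1024-byte chunk size and the file name download.bin are arbitrary choices. Real code would call requests.get(url, stream=True).

```python
import io
import os
import tempfile

from requests.models import Response

# Stand-in for a streamed response: r.raw is any file-like object.
r = Response()
r.status_code = 200
r.raw = io.BytesIO(b"x" * 2500)

path = os.path.join(tempfile.mkdtemp(), "download.bin")
chunk_sizes = []
with open(path, "wb") as fb:
    for chunk in r.iter_content(chunk_size=1024):
        chunk_sizes.append(len(chunk))   # record how large each chunk was
        fb.write(chunk)

print(chunk_sizes, os.path.getsize(path))
```

Writing chunk by chunk keeps memory use bounded by the chunk size rather than by the size of the whole response.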

• Custom headers

To add HTTP headers to a request, simply pass a dict to the headers parameter.

url = 'https://api.github.com/events'

headers = {'user-agent':'my-app/1.0.0'}

r = requests.get(url, headers=headers)

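Note: custom headers have lower precedence than certain more specific sources of information. For example:

• Authorization headers set with headers= are overridden if credentials are specified in .netrc, and those in turn are overridden by the auth= parameter.

• Authorization headers are removed if you are redirected to a different host.

• Proxy-Authorization headers are overridden by proxy credentials provided in the URL.

• Content-Length headers are overridden when the length of the content can be determined.

Furthermore, requests does not change its behavior based on which custom headers are specified; the headers are simply passed along in the final request.

Note: all header values must be a string, bytestring, or unicode. Passing unicode header values is permitted, but it is not recommended.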
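A short sketch of the header rules, again using a prepared request so nothing is sent. The x-retries header is a made-up name used only to show that a non-string value is rejected.

```python
import requests

url = "https://api.github.com/events"
prepared = requests.Request("GET", url, headers={"user-agent": "my-app/1.0.0"}).prepare()

# Header names are case-insensitive on the prepared request.
agent = prepared.headers["User-Agent"]

# A non-string value (here an int) is rejected when the request is prepared.
try:
    requests.Request("GET", url, headers={"x-retries": 3}).prepare()
    rejected = False
except requests.exceptions.InvalidHeader:
    rejected = True

print(agent, rejected)
```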

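• More complicated POST requests

To send form-encoded data, simply pass a dictionary to the data parameter. The dictionary is automatically form-encoded when the request is made: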

import requests

payload = {'key1':'value1','key2':'value2'}

r = requests.post(url='http://httpbin.org/post', data=payload)

print(r.text)

Output:

"form": {

"key1": "value1",

"key2": "value2"

You can also pass a tuple of tuples to the data parameter, which is useful when several form elements share the same key:

import requests

payload = (('key1', 'value1'),('key1','value2'))

r = requests.post(url='http://httpbin.org/post', data=payload)

print(r.text)

Output:

"form": {

"key1": [

"value1",

"value2"

]

If you pass a string instead of a dict, the data is sent as-is.

For example, the GitHub API v3 accepts JSON-encoded POST/PATCH data:

import requests,json

url = 'https://api.github.com/some/endpoint'

payload = {'some':'data'}

r = requests.post(url, data=json.dumps(payload))

Alternatively, pass the dict directly with the json parameter and it is encoded automatically (added in version 2.4.2):

import requests

url = 'https://api.github.com/some/endpoint'

payload = {'some':'data'}

r = requests.post(url, json=payload)

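Here payload is automatically converted to JSON.

data=json.dumps(payload) and json=payload send the same request body; json= additionally sets the Content-Type header to application/json.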
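The four ways of passing a POST body described in this section can be compared side by side by preparing each request and inspecting its body and Content-Type. A minimal sketch; nothing is sent to httpbin.org.

```python
import json
import requests

url = "http://httpbin.org/post"
payload = {"some": "data"}

# dict -> form-encoded; list of tuples -> repeated keys; str -> sent as-is; json= -> JSON body
form = requests.Request("POST", url, data={"key1": "value1", "key2": "value2"}).prepare()
repeated = requests.Request("POST", url, data=[("key1", "value1"), ("key1", "value2")]).prepare()
raw = requests.Request("POST", url, data=json.dumps(payload)).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

print(form.body, form.headers["Content-Type"])
print(repeated.body)
print(raw.body, raw.headers.get("Content-Type"))        # no Content-Type is set for a string body
print(as_json.body, as_json.headers["Content-Type"])
```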

• POSTing a Multipart-Encoded file

requests makes it simple to upload multipart-encoded files:

import requests,json

url = 'http://httpbin.org/post'

files = {'file':open('t1','rb')}

r = requests.post(url, files=files)

print(r.text)

Output:

{ "files": {

"file": "zhaoyong\r\nzhaoyong\r\nzhaoyong"

}

You can set the filename, content type, and headers explicitly:

import requests,json

url = 'http://httpbin.org/post'

files = {'file':('t1', open('t1','rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

r = requests.post(url, files=files)

print(r.text)

You can also send a string to be received as a file:

import requests,json

url = 'http://httpbin.org/post'

files = {'file':('t2','zhaoyong,zhoayong,zhaoyong')}

r = requests.post(url, files=files)

print(r.text)

Output:

"files": {

"file": "zhaoyong,zhoayong,zhaoyong"

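If you are posting a very large file as a multipart/form-data request, you may want to stream it. requests does not support this by default, but the third-party package requests-toolbelt does; see the toolbelt documentation: http://toolbelt.readthedocs.io/en/latest/

For sending multiple files in one request, see: http://docs.python-requests.org/zh_CN/latest/user/advanced.html#advanced

Warning: always open files in binary mode. requests may try to provide the Content-Length header for you, and that value is set to the number of bytes in the file; errors may occur if the file is opened in text mode.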
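What a multipart upload looks like on the wire can be inspected by preparing the request. In this sketch a BytesIO stands in for a file opened in binary mode, and the file name report.csv and its contents are made up.

```python
import io
import requests

# BytesIO stands in for a file opened in binary mode, e.g. open('report.csv', 'rb').
fake_file = io.BytesIO(b"a,b\n1,2\n")
files = {"file": ("report.csv", fake_file, "text/csv", {"Expires": "0"})}

prepared = requests.Request("POST", "http://httpbin.org/post", files=files).prepare()

content_type = prepared.headers["Content-Type"]      # includes the generated boundary
content_length = prepared.headers["Content-Length"]  # byte length of the encoded body
print(content_type)
print(content_length, len(prepared.body))
```

The Content-Length is computed from the encoded bytes, which is the reason for the binary-mode warning above.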

• Response status codes

Check the response status code:

import requests,json

r = requests.get(url='http://httpbin.org/get')

print(r.status_code)

Output: 200

For a bad request, raise_for_status() raises an exception; when the request succeeded, it returns None.

import requests,json

bad_r = requests.get('http://httpbin.org/status/404')

bad_r.status_code

Output: 404

bad_r.raise_for_status()

Output:

Traceback (most recent call last):

File "D:/AutoCobbler/dellIdrac/idrac_api.py", line 35, in <module>

bad_r.raise_for_status()

File "C:\Python36-32\lib\site-packages\requests\models.py", line 935, in raise_for_status

raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://httpbin.org/status/404
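In practice raise_for_status() is usually wrapped in try/except. The sketch below builds responses by hand so that it runs offline; fake_response is a hypothetical helper, not part of requests.

```python
import requests
from requests.models import Response

def fake_response(status_code, reason, url):
    """Hand-built Response for illustration; no request is sent."""
    r = Response()
    r.status_code = status_code
    r.reason = reason
    r.url = url
    return r

good = fake_response(200, "OK", "http://httpbin.org/get")
bad = fake_response(404, "NOT FOUND", "http://httpbin.org/status/404")

result = good.raise_for_status()   # returns None for a successful status
try:
    bad.raise_for_status()
    message = None
except requests.exceptions.HTTPError as exc:
    message = str(exc)

print(result, good.status_code == requests.codes.ok)
print(message)
```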

• Response headers

r.headers shows the server's response headers as a Python dictionary:

    'content-encoding': 'gzip',

    'transfer-encoding': 'chunked',

    'connection': 'close',

    'server': 'nginx/1.0.4',

    'x-runtime': '148ms',

    'etag': '"e1ca502697e5c9317743dc078f67693f"',

    'content-type': 'application/json'

Note: HTTP header names are case-insensitive.

So these response header fields can be accessed using any capitalization:

url = 'http://httpbin.org/post'

files = {'file':('t2','zhaoyong,zhoayong,zhaoyong')}

r = requests.post(url, files=files)

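print(r.headers['content-type'])  # dictionary-style access

print(r.headers.get('content-type'))  # access via get()

One special case: the server may send the same header several times with different values, but requests combines them so that they can be represented in a single mapping.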
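The case-insensitive lookup comes from the CaseInsensitiveDict class that requests uses for r.headers. A small offline sketch with made-up header values; x-missing is a hypothetical header name.

```python
from requests.structures import CaseInsensitiveDict

# r.headers on a real response is an instance of this class.
headers = CaseInsensitiveDict({
    "content-type": "application/json",
    "content-encoding": "gzip",
})

print(headers["Content-Type"])              # any capitalization works
print(headers.get("CONTENT-ENCODING"))
print(headers.get("x-missing", "not set"))  # get() falls back to a default
```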

• Cookies

Reading a cookie:

import requests

url = 'http://example.com/some/cookie/setting/url'

r = requests.get(url)

r.cookies['example_cookie_name']

Sending cookies to the server:

import requests

url = 'http://httpbin.org/cookies'

cookies = dict(cookies_are = 'working')

r = requests.get(url, cookies = cookies)

print(r.text)

Output:

  "cookies": {

    "cookies_are": "working"

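Cookies are returned in a RequestsCookieJar, which acts like a dict but is also suited to use across multiple domains and paths. A cookie jar can also be passed in to requests: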

import requests

jar = requests.cookies.RequestsCookieJar()

jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')

jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

url = 'http://httpbin.org/cookies'

r = requests.get(url, cookies=jar)

print(r.text)

Output:

  "cookies": {

    "tasty_cookie": "yum"
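Why only tasty_cookie is sent can be checked offline: preparing the request computes the Cookie header from the jar using the domain and path of each cookie. A minimal sketch using the same jar as above.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Preparing the request computes the Cookie header without sending anything.
prepared = requests.Request("GET", "http://httpbin.org/cookies", cookies=jar).prepare()
cookie_header = prepared.headers.get("Cookie")

# A cookie can also be looked up by name, narrowed by domain and path.
gross = jar.get("gross_cookie", domain="httpbin.org", path="/elsewhere")
print(cookie_header, gross)
```

gross_cookie stays in the jar but is left out of the header because its path, /elsewhere, does not match the request path.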

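• Redirection and history

requests automatically follows redirects for every method except HEAD, and history can be used to trace them.

Response.history is a list of the Response objects that were created in order to complete the request, sorted from the oldest to the most recent.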

import requests

r = requests.get(url='http://github.com')

print(r.url)

print(r.status_code)

print(r.history)

Output:

https://github.com/

[<Response [301]>]

If you are using GET, POST, OPTIONS, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

import requests

r = requests.get(url='http://github.com', allow_redirects=False)

print(r.url)

print(r.history)

Output:

http://github.com/

[]

If you are using HEAD, you can enable redirection as well:

import requests

r = requests.head(url='http://github.com', allow_redirects=True)

print(r.url)

print(r.history)

Output:

https://github.com/

[<Response [301]>]

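• Timeouts

requests stops waiting for a response after the number of seconds given by the timeout parameter. If no timeout is set, the program may hang indefinitely.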

import requests

r = requests.head(url='http://github.com', timeout=0.001)

Output:

requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='github.com', port=80): Max retries exceeded with url:......................

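Note: timeout is not a time limit on the entire response download. Rather, an exception is raised if the server has not responded within timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).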

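• Errors and exceptions

In the event of a network problem (DNS failure, refused connection, and so on), requests raises a ConnectionError exception.

If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured maximum number of redirections, a TooManyRedirects exception is raised.

All exceptions that requests explicitly raises inherit from requests.exceptions.RequestException.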
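One possible way to put the exception hierarchy to use is a small wrapper that catches the specific exceptions first and RequestException last. safe_get is a hypothetical helper and the 5-second default timeout is an arbitrary choice. The call at the end uses a URL without a scheme, which requests rejects before any network traffic happens.

```python
import requests
from requests import exceptions

def safe_get(url, timeout=5):
    """Return (response, error); exactly one of the two is None."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response, None
    except exceptions.Timeout as exc:            # also covers ConnectTimeout and ReadTimeout
        return None, "timed out: %s" % exc
    except exceptions.TooManyRedirects as exc:
        return None, "too many redirects: %s" % exc
    except exceptions.HTTPError as exc:
        return None, "bad status: %s" % exc
    except exceptions.ConnectionError as exc:
        return None, "network problem: %s" % exc
    except exceptions.RequestException as exc:   # base class, catches everything else
        return None, "request failed: %s" % exc

# A URL without a scheme raises MissingSchema, a RequestException subclass.
response, error = safe_get("not-a-url")
print(response, error)
```

Timeout is caught before ConnectionError because ConnectTimeout inherits from both, and the more specific message is usually the more useful one.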
