Learning the Python requests library

Date: 2019-11-08

Tags: python, requests, learning; Category: Python


Requests

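The official tagline of the Python requests library is "HTTP for Humans", which tells us that requests exists to make all kinds of HTTP operations more convenient for us.

What is learning requests good for?

1) In the web era, we need a solid grasp of how web interactions work

2) Web crawlers

3) Server programming

4) Automated testing

Setting up the environment

First, the environment. We obviously need to install the requests library, which can be done directly with pip (pip install requests). Note: this article uses Python 3.6.

We also need a server to test our operations against. We can use httpbin.org, a site written by the author of requests, for all of our experiments.

A requests example

The simple example below gives a quick first look at requests: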

import requests

# Test server: http://httpbin.org
url_ip='http://httpbin.org/ip'
url_get='http://httpbin.org/get'

# Basic usage
def requests_simple():
    # Use the get method to obtain a response
    response1=requests.get(url_ip)
    # Print the response headers
    print('Response Headers:',response1.headers)
    # Print the body
    print('Response Body:',response1.text)

# Request with query parameters
def requests_params():
    params_test={'param1':'hello','param2':'world'}
    # Send the request
    response2=requests.get(url_get,params=params_test)
    # Handle the response
    # Print the response headers
    print('Response Headers:', response2.headers)
    # Print the status code
    print('Response Status Code',response2.status_code)
    # Print the body: above we used .text; here .json() parses the body as JSON
    print('Response Body:', response2.json())


if __name__=='__main__':
    print('___________requests_simple method____________')
    requests_simple()
    print('___________requests_params method____________')
    requests_params()

Let's look at the output:

___________requests_simple method____________
Response Headers: {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Tue, 13 Aug 2019 08:21:40 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '56', 'Connection': 'keep-alive'}
Response Body: {
  "origin": "115.51.238.17, 115.51.238.17"
}

___________requests_params method____________
Response Headers: {'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Tue, 13 Aug 2019 08:21:41 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '215', 'Connection': 'keep-alive'}
Response Status Code 200
Response Body: {'args': {'param1': 'hello', 'param2': 'world'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '115.51.238.17, 115.51.238.17', 'url': 'https://httpbin.org/get?param1=hello&param2=world'}

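Every step in the program is commented in detail, so I won't repeat the explanation here.

Sending requests with Requests

Request methods:

GET:      view a resource
POST:     add a resource
PUT:      modify (replace) a resource
PATCH:    partially update a resource
DELETE:   delete a resource
HEAD:     view the response headers only
OPTIONS:  view the available request methods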
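Each verb above has a same-named helper in requests (requests.get, requests.post, requests.put, requests.patch, requests.delete, requests.head, requests.options), and all of them are thin wrappers around requests.request(method, url, ...). As a minimal offline sketch, the code below only prepares one request per verb with requests.Request(...).prepare() so we can inspect what would be sent; the function name prepare_all and the httpbin.org/anything URL are illustrative assumptions, not part of the original example.

```python
import requests

# The request methods listed above
METHODS = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE', 'HEAD', 'OPTIONS']


def prepare_all(url):
    """Build (but do not send) one PreparedRequest per HTTP method."""
    prepared = {}
    for method in METHODS:
        # Request(...).prepare() yields the method, URL, headers and body that would be sent
        prepared[method] = requests.Request(method, url).prepare()
    return prepared


if __name__ == '__main__':
    for method, req in prepare_all('http://httpbin.org/anything').items():
        print(method, req.url)
```

To actually send one of them, call the matching helper, for example requests.put('http://httpbin.org/put', data={'key': 'value'}).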

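We'll use the API that GitHub provides to go through these in detail; the documentation is at https://developer.github.com/v3

First, let's look at how GitHub explains the HTTP request methods.

Example: using the GitHub API to view a user's public information

URL: https://developer.github.com/v3/users/

As the last line in the figure shows, we can use GET /users/:username to get the information. Here is the code: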

import requests
import json

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: send a GET request, e.g. to view the public profile of the user Test
def requests_method():
    response=requests.get(build_url('users/Test'))
    print(better_jsprint(response.text))

if __name__=="__main__":
    requests_method()

Result:

{
    "login": "test",
    "id": 383316,
    "node_id": "MDQ6VXNlcjM4MzMxNg==",
    "avatar_url": "https://avatars3.githubusercontent.com/u/383316?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/test",
    "html_url": "https://github.com/test",
    "followers_url": "https://api.github.com/users/test/followers",
    "following_url": "https://api.github.com/users/test/following{/other_user}",
    "gists_url": "https://api.github.com/users/test/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/test/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/test/subscriptions",
    "organizations_url": "https://api.github.com/users/test/orgs",
    "repos_url": "https://api.github.com/users/test/repos",
    "events_url": "https://api.github.com/users/test/events{/privacy}",
    "received_events_url": "https://api.github.com/users/test/received_events",
    "type": "User",
    "site_admin": false,
    "name": null,
    "company": null,
    "blog": "",
    "location": null,
    "email": null,
    "hireable": null,
    "bio": null,
    "public_repos": 5,
    "public_gists": 0,
    "followers": 23,
    "following": 0,
    "created_at": "2010-09-01T10:39:12Z",
    "updated_at": "2019-02-13T02:44:23Z"
}

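Now we have the public information of the user named Test.

Likewise, we can test the other methods in the GitHub API documentation.

Requests with parameters

First, let's look at the three common ways of sending a request with parameters.

params example:

We again use the GitHub API, this time with the request method shown below.

You can see that this method takes a params argument. Here is the code: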

import requests
import json

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: send the request with a params argument
def requests_params():
    response=requests.get(build_url('users'),params={'since':11})
    # Print the returned data
    print(better_jsprint(response.text))
    # Show the actual URL that was requested
    print(response.url)

if __name__=="__main__":
    requests_params()

The returned result is long, so I won't show it here; the URL of this request is https://api.github.com/users?since=11
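The params dict is URL-encoded and appended to the URL as the query string, which is why response.url ends in ?since=11. A small offline sketch of that encoding step follows; it prepares the request without sending it, and build_get_url is a hypothetical helper name. The second call shows that a list value repeats the key and a None value is left out.

```python
import requests


def build_get_url(base, params):
    """Return the URL that requests would use for a GET with these params (nothing is sent)."""
    return requests.Request('GET', base, params=params).prepare().url


if __name__ == '__main__':
    print(build_get_url('https://api.github.com/users', {'since': 11}))
    # -> https://api.github.com/users?since=11
    print(build_get_url('https://api.github.com/users', {'q': ['a', 'b'], 'skip': None}))
    # -> https://api.github.com/users?q=a&q=b
```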

json example

Example 1: we use the API's method for updating the user's profile to test a request with the json argument.

Here is the code:

import requests
import json

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: send the request with a json argument; auth authenticates with your own account name and password
def requests_json():
    response=requests.patch(build_url('user'),auth=('your account name','your password'),json={'name':'the new name you want'})
    # Print the returned data
    print(better_jsprint(response.text))
    # Show the actual URL that was requested
    print(response.url)

if __name__=="__main__":
    requests_json()

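After running the code, open your GitHub home page and check whether your account's name has changed.

Example 2: adding an email address with a POST request that uses the json argument

The code: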

import requests
import json

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: auth holds your GitHub account name and password; json holds the email address to add
def requests_json():
    response=requests.post(build_url('user/emails'),auth=('account','password'),json=['test@qq.com'])
    # Print the returned data
    print(better_jsprint(response.text))
    # Show the actual URL that was requested
    print(response.url)

if __name__=="__main__":
    requests_json()

Returned content:

[
    {
        "email": "1231231230@qq.com",
        "primary": true,
        "verified": true,
        "visibility": "private"
    },
    {
        "email": "32220+H3213@users.noreply.github.com",
        "primary": false,
        "verified": true,
        "visibility": null
    },
    {
        "email": "test@qq.com",
        "primary": false,
        "verified": false,
        "visibility": null
    }
]
https://api.github.com/user/emails

That completes sending requests with the json argument.
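The json argument serializes the Python object into a JSON body and sets the Content-Type: application/json header for us, whereas the data argument with a dict sends an HTML-form-encoded body instead. The sketch below prepares the two requests above without sending them so we can compare the bodies; preview_body is a hypothetical helper, and the httpbin.org/post URL for the data case is only an illustration.

```python
import json

import requests


def preview_body(method, url, **kwargs):
    """Prepare (without sending) a request; return its Content-Type and body."""
    prepared = requests.Request(method, url, **kwargs).prepare()
    return prepared.headers.get('Content-Type'), prepared.body


if __name__ == '__main__':
    # json= : the object is serialized to JSON and Content-Type is set for us
    ctype, body = preview_body('PATCH', 'https://api.github.com/user',
                               json={'name': 'new name'})
    print(ctype, json.loads(body))
    ctype, body = preview_body('POST', 'https://api.github.com/user/emails',
                               json=['test@qq.com'])
    print(ctype, json.loads(body))
    # data= with a dict: the body is form-encoded instead
    ctype, body = preview_body('POST', 'http://httpbin.org/post',
                               data={'name': 'new name'})
    print(ctype, body)
```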

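Handling request exceptions

Errors are common on the internet, such as timeouts or 404s. How should our program handle a request that fails?

Let's use Timeout and HTTPError as examples and write programs to analyze them.

All of these errors live in the exceptions module of the requests package, so we import it: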

from requests import exceptions

There are two ways to set a timeout:

requests.get(url,timeout=(3,7))
requests.get(url,timeout=10)

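What's the difference? Accessing a website means first connecting to the server and then waiting for its response. In the first form, 3 and 7 are the connect timeout and the read timeout respectively: requests waits up to 3 seconds to establish the connection and up to 7 seconds between bytes received from the server.

In the second form, the single value 10 is used for both the connect timeout and the read timeout. Neither form limits the total time of the whole download.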
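To see the two phases separately, the sketch below starts a local socket that accepts TCP connections but never replies (a hypothetical test setup standing in for a slow server, not something from the original article). Connecting succeeds almost at once, so with timeout=(3, 0.5) it is the 0.5-second read timeout that fires, raising exceptions.ReadTimeout; the GitHub example in this section instead failed while connecting, which requests reports as exceptions.ConnectTimeout. Both are subclasses of exceptions.Timeout. silent_server and classify_timeout are illustrative names.

```python
import socket

import requests
from requests import exceptions


def silent_server():
    """Hypothetical test helper: a local socket that accepts connections but never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0 lets the OS pick a free port
    server.listen(1)               # the OS completes the TCP handshake for us
    return server


def classify_timeout(url, timeout):
    """Send a GET and report which kind of timeout, if any, was hit."""
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for this localhost demo
    try:
        session.get(url, timeout=timeout)
    except exceptions.ConnectTimeout:
        return 'connect timeout'
    except exceptions.ReadTimeout:
        return 'read timeout'
    finally:
        session.close()
    return 'no timeout'


if __name__ == '__main__':
    server = silent_server()
    url = 'http://127.0.0.1:%d/' % server.getsockname()[1]
    try:
        # (connect, read): connecting is instant, but no response ever arrives
        print(classify_timeout(url, timeout=(3, 0.5)))
    finally:
        server.close()
```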

timeout example:

import requests
import json
from requests import exceptions

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: limit the request time with timeout and use try/except to catch and print the error
def requests_err():
    try:
        response=requests.get(build_url('user/emails'),timeout=0.1)
    except exceptions.Timeout as err:
        print(err)


if __name__=="__main__":
    requests_err()

Output:

HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /user/emails (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x000002990F1A3F28>, 'Connection to api.github.com timed out. (connect timeout=0.1)'))

HTTPError example:

In the HTTPError example, we explicitly raise an exception for a bad status code with raise_for_status() and then catch it:

import requests
import json
from requests import exceptions

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Pretty-print the JSON returned by the server
# The JSON text we get from the API can be printed more readably with this function
def better_jsprint(json_str):
    return json.dumps(json.loads(json_str),indent=4)

# Main function: no authentication is added below, so the status code will not be 200;
# raise_for_status() turns the bad status into an exception, which we then catch
def requests_err():
    try:
        response=requests.get(build_url('user/emails'))
        response.raise_for_status()
    except exceptions.HTTPError as err:
        print(err)

if __name__=="__main__":
    requests_err()

Output:

401 Client Error: Unauthorized for url: https://api.github.com/user/emails
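All of these exceptions inherit from exceptions.RequestException, so a single try block can catch the specific cases first and fall back to the base class, as safe_get below does. To reproduce the 401 message offline, the sketch fills in a Response object by hand (status_code, reason and url set directly); that fake_response helper is an assumption for demonstration only, since in real code the Response comes from requests.get. safe_get and check_status are likewise hypothetical names.

```python
import requests
from requests import exceptions


def check_status(response):
    """Turn a 4xx/5xx status into a readable message; return 'ok' otherwise."""
    try:
        response.raise_for_status()
    except exceptions.HTTPError as err:
        # err.response is the Response object that carried the bad status
        return 'HTTP %d -> %s' % (err.response.status_code, err)
    return 'ok'


def fake_response(status_code, reason, url):
    """Hypothetical offline stand-in: a Response filled in by hand instead of by a server."""
    response = requests.models.Response()
    response.status_code = status_code
    response.reason = reason
    response.url = url
    return response


def safe_get(url, **kwargs):
    """Catch specific failures first, then fall back to the RequestException base class."""
    kwargs.setdefault('timeout', 5)
    try:
        response = requests.get(url, **kwargs)
        response.raise_for_status()
        return response
    except exceptions.Timeout as err:
        print('Timed out:', err)
    except exceptions.HTTPError as err:
        print('Bad status:', err)
    except exceptions.RequestException as err:  # ConnectionError, TooManyRedirects, ...
        print('Request failed:', err)
    return None


if __name__ == '__main__':
    print(check_status(fake_response(401, 'Unauthorized', 'https://api.github.com/user/emails')))
```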

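Customizing requests

We can build custom request information and send it to the target URL to achieve certain goals, for example when writing a crawler. Below is a simple example that modifies the request headers.

Example: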

import requests

# Main function: add custom header information to the request we send to the target URL
def requests_header():
    response=requests.get('http://httpbin.org/get',headers={'User-Agent':'fake'})
    print(response.text)

if __name__=="__main__":
    requests_header()

Let's look at the response (http://httpbin.org/get echoes back the information in our request):

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "fake"
  }, 
  "origin": "115.51.238.17, 115.51.238.17", 
  "url": "https://httpbin.org/get"
}

Notice that our User-Agent header has been changed.
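requests merges the headers we pass with the session's default headers, so only User-Agent changes while Accept, Accept-Encoding and Connection keep their default values. The offline sketch below prepares a request through a Session to show the merged headers; outgoing_headers is a hypothetical helper name.

```python
import requests


def outgoing_headers(url, headers=None):
    """Return the headers a Session would send for a GET (the request is only prepared)."""
    with requests.Session() as session:
        request = requests.Request('GET', url, headers=headers)
        # prepare_request merges the session's default headers with ours
        return session.prepare_request(request).headers


if __name__ == '__main__':
    print(outgoing_headers('http://httpbin.org/get'))
    print(outgoing_headers('http://httpbin.org/get', headers={'User-Agent': 'fake'}))
```

To send the same header with every request, it can be set once on a session, e.g. session.headers.update({'User-Agent': 'fake'}).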

Handling the response

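# Common response attributes
    response.text                 # response body as text (str)
    response.content              # response body as raw bytes; used for binary data such as images and videos
    response.status_code          # HTTP status code
    response.url                  # URL of the request
    response.cookies              # cookies returned by the server (a RequestsCookieJar)
    response.cookies.get_dict()   # the same cookies as a plain dict
    response.request              # the PreparedRequest that was sent, e.g. response.request.method
    response.headers              # response headers
    response.history              # list of Response objects from any redirects, oldest first

# When the body is JSON
    response.json()               # deserialize the body into Python objects

# Fixing garbled text when crawling
    response.apparent_encoding    # encoding guessed from the body bytes by a character-detection library
    response.encoding             # encoding used to decode .text, taken from the Content-Type header
    eg: response.encoding = response.apparent_encoding   # decode using the guessed encoding
    eg: print(response.text.encode('utf-8'))

Note: response.headers are the headers the server sent to us, while response.request.headers are the headers our client sent to the server (i.e. our own information).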
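To see most of these attributes on a real response without depending on an external site, the sketch below runs a tiny local HTTP server built on the standard library's http.server (EchoHandler is a hypothetical stand-in for httpbin.org) and collects what requests reports. Note how X-Demo appears only in response.headers, while our custom User-Agent appears in response.request.headers. Setting session.trust_env = False just keeps proxy environment variables out of this localhost demo.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for httpbin.org: answers GET with JSON about the request."""

    def do_GET(self):
        payload = {'path': self.path, 'user_agent': self.headers.get('User-Agent')}
        body = json.dumps(payload).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.send_header('X-Demo', 'from-server')  # a header only the server sends
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def inspect_response(url):
    """GET url with a custom User-Agent and collect the attributes discussed above."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for localhost
        response = session.get(url, headers={'User-Agent': 'fake'}, timeout=5)
    return {
        'status_code': response.status_code,
        'url': response.url,
        'encoding': response.encoding,                             # from Content-Type
        'json': response.json(),
        'response_header': response.headers.get('X-Demo'),         # sent by the server
        'request_header': response.request.headers['User-Agent'],  # sent by us
        'method': response.request.method,
    }


if __name__ == '__main__':
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        print(inspect_response('http://127.0.0.1:%d/get?x=1' % server.server_port))
    finally:
        server.shutdown()
        server.server_close()
```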

Downloading files

Suppose we want to download an image from the web, this time using requests.

For example, Baidu's logo at https://www.baidu.com/img/bd_logo1.png

A simple download example:

import requests

url='https://www.baidu.com/img/bd_logo1.png'

def download_img():
    response=requests.get(url)
    # Writing the file: an image is binary data, so open the file in binary write mode 'wb'
    with open('logo.png','wb') as img:
        img.write(response.content)

if __name__=='__main__':
    download_img()

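You will then find the image logo.png in the current directory, which means the download succeeded.

Note that files can be very large; in that case we should read and write them as a stream, otherwise we may run out of memory.

When downloading large files, we can use the stream parameter of requests.get().

By default it is False, which means the whole response body is downloaded immediately and kept in memory; if the file is too large, memory can run out.

When stream is set to True, the body is not downloaded right away; the download starts only when you iterate over the content with iter_content or iter_lines, or access the content attribute. Note that the connection stays open until the body has been fully read or the response is closed.

iter_content: iterates over the content to download chunk by chunk
iter_lines: iterates over the content to download line by line

Downloading large files with these two methods avoids using too much memory, because only a small piece of data is read at a time.

Example: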

import requests

url='https://www.baidu.com/img/bd_logo1.png'

def download_img():
    response=requests.get(url,stream=True)  # remember to set stream=True
    with open('logo.png','wb') as img:  # an image is binary data, so write it in binary mode 'wb'
        for chunk in response.iter_content(1024):  # number of bytes to read and write at a time
            img.write(chunk)
        response.close()  # close the streamed connection

if __name__=='__main__':
    download_img()
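The example above needs the network. Here is a self-contained sketch of both iterators against a local server; FileHandler and PAYLOAD are hypothetical test data standing in for a large download, not part of the original. download writes the body to disk chunk by chunk with iter_content, and first_lines stops after the first lines from iter_lines without reading the rest of the body. The response is used as a context manager so its connection is released even if writing fails.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical test data standing in for a large file
PAYLOAD = b'line one\nline two\n' + b'x' * 5000


class FileHandler(BaseHTTPRequestHandler):
    """Local stand-in for a download server: always returns PAYLOAD."""

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/octet-stream')
        self.send_header('Content-Length', str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the console quiet


def _session():
    session = requests.Session()
    session.trust_env = False  # localhost demo: ignore proxy environment variables
    return session


def download(url, path, chunk_size=1024):
    """Stream url into path chunk by chunk; return how many chunks were written."""
    chunks = 0
    with _session() as session:
        # stream=True: the body is only read as we iterate over it
        with session.get(url, stream=True, timeout=5) as response:
            response.raise_for_status()
            with open(path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    f.write(chunk)
                    chunks += 1
    return chunks


def first_lines(url, n=2):
    """Read only the first n lines of the body with iter_lines, then stop."""
    lines = []
    with _session() as session:
        with session.get(url, stream=True, timeout=5) as response:
            for line in response.iter_lines():
                lines.append(line)
                if len(lines) == n:
                    break
    return lines


if __name__ == '__main__':
    server = HTTPServer(('127.0.0.1', 0), FileHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/big.bin' % server.server_port
    path = os.path.join(tempfile.mkdtemp(), 'big.bin')
    try:
        print(download(url, path), 'chunks written to', path)
        print(first_lines(url))
    finally:
        server.shutdown()
        server.server_close()
```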

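Sometimes the server refuses our requests, possibly because we didn't change the User-Agent; this is a simple anti-crawler measure. So when writing crawlers we usually need to change the User-Agent. These issues will be covered when we discuss crawlers.

In short, the process is:

Emulate a browser (modify headers) --> build the request --> read the data --> write the data

Authentication

auth

As mentioned earlier, we can pass the auth parameter to send our account name and password to the target site for authentication. But is this approach secure?

For example: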

import requests

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Basic authentication
def http_auth():
    response=requests.get(build_url('user'),auth=('test','test123'))
    # Look at the headers of the request we sent
    print(response.request.headers)

http_auth()

Now look at the headers of the request we sent:

{'User-Agent': 'python-requests/2.18.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic dGVzdDp0ZXN0MTIz'}

They contain 'Authorization': 'Basic dGVzdDp0ZXN0MTIz'. Could this be our account name and password? Let's decode it:

import base64

print(base64.b64decode('dGVzdDp0ZXN0MTIz'))

Output:

b'test:test123'

It really is our account name and password, merely encoded rather than encrypted, so this approach is not very secure.
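Basic authentication simply Base64-encodes the string "username:password" (RFC 7617), so anyone who sees the header can decode it; that is why it should only be sent over HTTPS. The offline sketch below reproduces the header from the example by preparing, not sending, a request with auth=('test', 'test123'); basic_auth_header and decode_basic are hypothetical helper names.

```python
import base64

import requests


def basic_auth_header(username, password):
    """Prepare (not send) a request with auth=(user, pw) and return its Authorization header."""
    prepared = requests.Request('GET', 'https://api.github.com/user',
                                auth=(username, password)).prepare()
    return prepared.headers['Authorization']


def decode_basic(header_value):
    """Undo the Base64 step, exactly as an eavesdropper could."""
    encoded = header_value.split(' ', 1)[1]
    return base64.b64decode(encoded).decode('utf-8')


if __name__ == '__main__':
    header = basic_auth_header('test', 'test123')
    print(header)                # Basic dGVzdDp0ZXN0MTIz
    print(decode_basic(header))  # test:test123
```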

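That's why the more secure OAuth authentication is now widely used.

OAuth

OAuth is short for Open Authorization. The OAuth protocol provides a secure, open and simple standard for authorizing access to user resources. A third party can obtain authorization to a user's resources without using the user's username and password, which is why OAuth is considered secure.

The simplest example: many websites and apps (such as Weibo or cloud drives) let you log in quickly with your QQ account. That is OAuth authentication: you are authenticated without entering your account password on that site.

Let's again use the GitHub API for a simple OAuth authentication.

First, open https://github.com/settings/tokens/new to request your own token, and remember to check the permission scopes you need.

We then get a mysterious string of characters, as shown in the figure below.

Next, let's write a simple example: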

import requests

url='https://api.github.com'

# Build the URL: the URL we submit is the API root plus our own path,
# so this small helper joins the two with '/'
def build_url(conend):
    return '/'.join([url,conend])

# Basic OAuth (token) authentication
def http_oauth():
    # Build the headers, adding the token (replace the placeholder with your own token)
    header={'Authorization':'token <your token>'}
    response=requests.get(build_url('user'),headers=header)
    print(response.text)

http_oauth()
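Rather than building the header by hand and hard-coding the token in the source, requests lets us plug in a custom authentication class by subclassing requests.auth.AuthBase and implementing __call__. The sketch below is one way to do that under stated assumptions: TokenAuth and fetch_profile are hypothetical names, and reading the token from a GITHUB_TOKEN environment variable is our own convention, not something GitHub or requests requires.

```python
import os

import requests
from requests.auth import AuthBase


class TokenAuth(AuthBase):
    """Hypothetical helper: adds 'Authorization: token <token>' to each request."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # requests calls this with the PreparedRequest before it is sent
        request.headers['Authorization'] = 'token %s' % self.token
        return request


def fetch_profile():
    """Fetch the authenticated user's profile; the token comes from GITHUB_TOKEN (our convention)."""
    token = os.environ['GITHUB_TOKEN']  # keep secrets out of the source code
    response = requests.get('https://api.github.com/user', auth=TokenAuth(token), timeout=10)
    response.raise_for_status()
    return response.json()


if __name__ == '__main__':
    print(fetch_profile()['login'])
```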