Introduction to the Python Requests Library

Date: 2019-12-20


Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

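Warning: Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death.¹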

Environment setup:

# Create a Python 3.x virtual environment
> mkvirtualenv Py3_requests

# Install the requests library
(Py3_requests) > pip install requests

# Check the environment
(Py3_requests) > python --version
Python 3.7.1

(Py3_requests) > pip list
Package      Version
------------ ----------
certifi    2018.11.29
chardet    3.0.4
idna       2.8
pip        19.0.2
requests   2.21.0
setuptools 40.8.0
urllib3    1.24.1
wheel      0.33.0

(Py3_requests) >

Official Chinese documentation for Requests: http://docs.python-requests.org/zh_CN/latest/index.html

Requests source code on GitHub: https://github.com/kennethreitz/requests

Blog of Kenneth Reitz, the author of Requests: https://www.kennethreitz.org

A brief introduction to HTTP

What is HTTP?

HTTP stands for HyperText Transfer Protocol.

The Hypertext Transfer Protocol (HTTP) is a stateless
application-level protocol for distributed,
collaborative, hypertext information systems.

The curl command

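On Linux, curl is a command-line file transfer tool that works with URL syntax, and it is a very capable HTTP command-line client. It supports both uploading and downloading files, making it a general-purpose transfer tool, although by convention it is usually referred to as a download tool.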

Syntax: $ curl [option] [url]

Common options:
-v shows the full course of an HTTP exchange, including the port connection and the HTTP request headers.

(Py3_requests)  > curl --help
# ... output omitted

(Py3_requests)  > curl -v http://www.baidu.com > tmp.txt
* Rebuilt URL to: http://www.baidu.com/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 14.215.177.39...
* TCP_NODELAY set
* Connected to www.baidu.com (14.215.177.39) port 80 (#0)
> GET / HTTP/1.1
> Host: www.baidu.com
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
< Connection: Keep-Alive
< Content-Length: 2381
< Content-Type: text/html
< Date: Sun, 17 Feb 2019 11:18:49 GMT
< Etag: "588604d8-94d"
< Last-Modified: Mon, 23 Jan 2017 13:27:52 GMT
< Pragma: no-cache
< Server: bfe/1.0.8.18
< Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
<
{ [1040 bytes data]
100  2381  100  2381    0     0   2381      0  0:00:01 --:--:--  0:00:01  5850
* Connection #0 to host www.baidu.com left intact

(Py3_requests)  >
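For readers who want similar visibility from Python, here is a minimal sketch using only the standard library's http.client: after set_debuglevel(1), HTTPConnection prints the request it sends and the status line and headers it receives, similar in spirit to `curl -v` (the output format differs from curl's). The function name verbose_get is hypothetical.

```python
import http.client


def verbose_get(host, path="/", port=80):
    """Send a GET request and print the wire-level exchange, roughly like `curl -v`."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    # Debug level 1 makes http.client print the bytes it sends ("send: ...")
    # and the status line and headers it receives ("reply: ...", "header: ...").
    conn.set_debuglevel(1)
    try:
        conn.request("GET", path, headers={"Accept": "*/*"})
        response = conn.getresponse()
        body = response.read()
        return response.status, response.reason, body
    finally:
        conn.close()
```

For example, verbose_get("www.baidu.com") would print the exchange with the server used above, assuming network access.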

Request

Start Line: method, path, protocol version

Headers: key:value

> GET / HTTP/1.1  # ... Start Line
# ... Headers
> Host: www.baidu.com
> User-Agent: curl/7.55.1
> Accept: */*
>
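To make the structure concrete, the sketch below assembles the request head shown above as raw text. It assumes HTTP/1.1 framing: each line ends with CRLF and an empty line closes the header section (a body, if any, would follow it). build_request_head is a hypothetical helper, not part of any library.

```python
def build_request_head(method, path, headers, version="HTTP/1.1"):
    """Assemble the start line and headers of an HTTP request as raw text."""
    start_line = f"{method} {path} {version}"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # CRLF between lines, then an empty line that ends the header section
    return "\r\n".join([start_line, *header_lines]) + "\r\n\r\n"


head = build_request_head(
    "GET", "/",
    {"Host": "www.baidu.com", "User-Agent": "curl/7.55.1", "Accept": "*/*"},
)
print(head)
```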

Response

Start Line: protocol version, status code, reason phrase

Headers: key:value

< HTTP/1.1 200 OK # ... Start Line
# ... Headers
< Accept-Ranges: bytes
< Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
< Connection: Keep-Alive
< Content-Length: 2381
< Content-Type: text/html
< Date: Sun, 17 Feb 2019 11:18:49 GMT
< Etag: "588604d8-94d"
< Last-Modified: Mon, 23 Jan 2017 13:27:52 GMT
< Pragma: no-cache
< Server: bfe/1.0.8.18
< Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
<
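Going the other direction, a deliberately simplified parser can split a raw response head into its start-line parts and headers. This is an illustrative sketch with a hypothetical name, parse_response_head: it stores headers in a plain dict, so repeated headers such as several Set-Cookie lines would overwrite each other. Real clients rely on http.client or urllib3 for this.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into (version, status code, reason, headers).

    Simplified for illustration: headers go into a plain dict, so repeated
    headers (e.g. several Set-Cookie lines) overwrite one another.
    """
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    start_line, *header_lines = head.split("\r\n")
    # Start line: protocol version, status code, optional reason phrase
    parts = start_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        # Split on the first colon only; values such as dates contain colons too
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, status, reason, headers
```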

Message Body

Opening the tmp.txt file saved above shows the HTML page (formatted here for readability):

<!DOCTYPE html>
<!--STATUS OK-->
<html>
 <head>
  <meta http-equiv="content-type" content="text/html;charset=utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
  <meta content="always" name="referrer" />
  <link rel="stylesheet" type="text/css" href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css" />
  <title>百度一下，你就知道</title>
 </head> 
 <body link="#0000cc"> 
  <div id="wrapper"> 
   <div id="head"> 
    <div class="head_wrapper"> 
     <div class="s_form"> 
      <div class="s_form_wrapper"> 
       <div id="lg"> 
        <img hidefocus="true" src="//www.baidu.com/img/bd_logo1.png" width="270" height="129" /> 
       </div> 
       <form id="form" name="f" action="//www.baidu.com/s" class="fm"> 
        <input type="hidden" name="bdorz_come" value="1" /> 
        <input type="hidden" name="ie" value="utf-8" /> 
        <input type="hidden" name="f" value="8" /> 
        <input type="hidden" name="rsv_bp" value="1" /> 
        <input type="hidden" name="rsv_idx" value="1" /> 
        <input type="hidden" name="tn" value="baidu" />
        <span class="bg s_ipt_wr"><input id="kw" name="wd" class="s_ipt" value="" maxlength="255" autocomplete="off" autofocus="" /></span>
        <span class="bg s_btn_wr"><input type="submit" id="su" value="百度一下" class="bg s_btn" /></span> 
       </form> 
      </div> 
     </div> 
     <div id="u1"> 
      <a href="http://news.baidu.com" name="tj_trnews" class="mnav">新聞</a> 
      <a href="http://www.hao123.com" name="tj_trhao123" class="mnav">hao123</a> 
      <a href="http://map.baidu.com" name="tj_trmap" class="mnav">地圖</a> 
      <a href="http://v.baidu.com" name="tj_trvideo" class="mnav">視頻</a> 
      <a href="http://tieba.baidu.com" name="tj_trtieba" class="mnav">貼吧</a> 
      <noscript> 
       <a href="http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1" name="tj_login" class="lb">登陸</a> 
      </noscript> 
      <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登陸</a>');</script> 
      <a href="//www.baidu.com/more/" name="tj_briicon" class="bri" style="display: block;">更多產品</a> 
     </div> 
    </div> 
   </div> 
   <div id="ftCon"> 
    <div id="ftConw"> 
     <p id="lh"> <a href="http://home.baidu.com">關於百度</a> <a href="http://ir.baidu.com">About Baidu</a> </p> 
     <p id="cp">&copy;2017&nbsp;Baidu&nbsp;<a href="http://www.baidu.com/duty/">使用百度前必讀</a>&nbsp; <a href="http://jianyi.baidu.com/" class="cp-feedback">意見反饋</a>&nbsp;京ICP證030173號&nbsp; <img src="//www.baidu.com/img/gs.gif" /> </p> 
    </div> 
   </div> 
  </div>   
 </body>
</html>
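The message body is simply this HTML. As a small illustration, Python's standard html.parser module can pull the page title out of tmp.txt. The TitleParser class and extract_title function are hypothetical names, and the file is assumed to be UTF-8, as the page's meta tag declares.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def extract_title(html_text):
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

For instance: with open("tmp.txt", encoding="utf-8") as f: print(extract_title(f.read())).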

Local server

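http://httpbin.org/ is a server written by the author of Requests himself. It lets you test many aspects of HTTP requests and responses, such as cookies, IP, headers, and login authentication, and it supports GET, POST, and other methods, which makes it very useful for web development and testing. It is written in Python + Flask and is an open-source project².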

Note:

Will it work with windows? - No.³

gunicorn does not support Windows: although it installs successfully, starting it fails with ModuleNotFoundError: No module named 'fcntl' (fcntl is a Unix-only module).

To start the server locally, open a new console. Deployment on Linux:

$ pip3 install httpbin
$ pip3 install gunicorn

# Start the service
$ gunicorn -b :80 httpbin:app

Once it starts, you should see:

root@xxx:~# gunicorn -b :80 httpbin:app
[2019-02-19 10:29:26 +0800] [5110] [INFO] Starting gunicorn 19.9.0
[2019-02-19 10:29:26 +0800] [5110] [INFO] Listening at: http://0.0.0.0:80 (5110)
[2019-02-19 10:29:26 +0800] [5110] [INFO] Using worker: sync
[2019-02-19 10:29:26 +0800] [5114] [INFO] Booting worker with pid: 5114

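Then simply access it. Here the server's IP has been mapped to the domain name www.onefine.top, which the examples below use.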

Running httpbin locally speeds up access and makes local testing more efficient.
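Since gunicorn will not start on Windows, one workaround for quick experiments is a tiny stand-in built on the standard library's http.server. The sketch below is hypothetical and is not httpbin: it only imitates the shape of a /get response (args, headers, url) and keeps just the first value of each query parameter. EchoGetHandler, make_server and port 8000 are illustrative choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


class EchoGetHandler(BaseHTTPRequestHandler):
    """Hypothetical, minimal imitation of a /get echo endpoint (not httpbin itself)."""

    def do_GET(self):
        parts = urlsplit(self.path)
        if parts.path != "/get":
            self.send_error(404)
            return
        # parse_qs returns a list per key; keep only the first value (a simplification)
        args = {key: values[0] for key, values in parse_qs(parts.query).items()}
        payload = {
            "args": args,
            "headers": dict(self.headers.items()),
            "url": f"http://{self.headers.get('Host', '')}{self.path}",
        }
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8000):
    # Bind to localhost only; port 0 lets the OS pick a free port
    return HTTPServer(("127.0.0.1", port), EchoGetHandler)
```

Call make_server().serve_forever() in a separate console, then request http://127.0.0.1:8000/get?param1=hello&param2=world.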

1. Using the urllib family⁴

Only a brief introduction is given here.

Are urllib, urllib2, and urllib3 successive versions of one another?

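In Python 2.x, urllib and urllib2 are independent modules; Python 3.x merged them into a single package, urllib (urllib.request, urllib.parse, and so on).
urllib3 is a separate third-party library that provides a thread-safe connection pool, file uploads via POST, and more; it has little to do with urllib or urllib2.
requests is built on urllib3, which lets multiple requests reuse the same socket (connection pooling) when they are sent through the same requests.Session.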
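The socket reuse comes from sharing a requests.Session: its adapters hold urllib3 connection pools, whereas module-level calls such as requests.get() create and close a new Session each time. A minimal sketch, assuming requests is installed; make_session and the pool size are illustrative choices, while Session, HTTPAdapter, pool_connections and pool_maxsize are part of requests.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_size=10):
    """Build a Session whose adapter keeps urllib3 connection pools per host."""
    session = requests.Session()
    # pool_connections: how many per-host pools to cache;
    # pool_maxsize: how many connections each pool may keep open
    adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

For example, inside `with make_session() as s:` several calls to s.get(URL_IP) can share one keep-alive connection, provided the server keeps it open. The gunicorn sync worker used in this article answers with Connection: close, so against it each request would still open a new socket.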

Note: this example runs under Python 2.7.

# -*- coding: utf-8 -*-
import urllib
import urllib2

URL_IP = 'http://www.onefine.top/ip'
URL_GET = 'http://www.onefine.top/get'


def use_simple_urllib2():
    response = urllib2.urlopen(URL_IP)
    print '>>>>Response Headers:'
    print response.info()  # read the headers
    print '>>>>Response body:'
    print ''.join([line for line in response.readlines()])  # read the body


def use_params_urllib2():
    # GET request: build the query parameters
    params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
    print '>>>Request params:'
    print params
    # send the request
    response = urllib2.urlopen('?'.join([URL_GET, '%s']) % params)
    # handle the response
    print '>>>>Response Headers:'
    print response.info()
    print '>>>>Status Code:'
    print response.getcode()
    print '>>>>Request body:'
    print ''.join([line for line in response.readlines()])


if __name__ == '__main__':
    print '>>>Use simple urllib2:'
    use_simple_urllib2()
    print ''
    print '>>>Use params urllib2:'
    use_params_urllib2()

Output:

>>>Use simple urllib2:
>>>>Response Headers:
Server: gunicorn/19.9.0
Date: Tue, 19 Feb 2019 05:28:11 GMT
Connection: close
Content-Type: application/json
Content-Length: 26
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

>>>>Response body:
{"origin":"42.243.137.5"}


>>>Use params urllib2:
>>>Request params:
param2=world&param1=hello
>>>>Response Headers:
Server: gunicorn/19.9.0
Date: Tue, 19 Feb 2019 05:28:11 GMT
Connection: close
Content-Type: application/json
Content-Length: 250
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

>>>>Status Code:
200
>>>>Request body:
{"args":{"param1":"hello","param2":"world"},"headers":{"Accept-Encoding":"identity","Connection":"close","Host":"www.onefine.top","User-Agent":"Python-urllib/2.7"},"origin":"42.243.137.5","url":"http://www.onefine.top/get?param2=world&param1=hello"}
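For comparison, here is a sketch of the same two demos ported to Python 3, where urlopen lives in urllib.request and urlencode in urllib.parse. It assumes the www.onefine.top server from this article is reachable; build_get_url, use_simple_urllib and use_params_urllib are hypothetical names. Unlike Python 2, dicts in Python 3.7+ keep insertion order, so the query string comes out as param1=hello&param2=world rather than the order seen above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

URL_IP = "http://www.onefine.top/ip"
URL_GET = "http://www.onefine.top/get"


def build_get_url(base, params):
    """Append URL-encoded query parameters to a base URL."""
    return f"{base}?{urlencode(params)}"


def use_simple_urllib():
    with urlopen(URL_IP, timeout=10) as response:
        print(">>>>Response Headers:")
        print(response.info())  # an http.client.HTTPMessage
        print(">>>>Response body:")
        # urlopen returns bytes in Python 3; decoding is up to the caller
        print(response.read().decode("utf-8"))


def use_params_urllib():
    url = build_get_url(URL_GET, {"param1": "hello", "param2": "world"})
    with urlopen(url, timeout=10) as response:
        print(">>>>Status Code:")
        print(response.status)
        print(">>>>Response body:")
        print(response.read().decode("utf-8"))
```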

2. Using requests

Back to the Python 3.x environment:

# -*- coding: utf-8 -*-

import requests

URL_IP = 'http://www.onefine.top/ip'
URL_GET = 'http://www.onefine.top/get'


def use_simple_requests():
    # get/post/options/put/delete
    response = requests.get(URL_IP)
    print('>>>>Response Headers:')
    print(response.headers)
    print('>>>>Response body:')
    print(response.text)  # already decoded to str, no manual decoding needed


def use_params_requests():
    params = {'param1': 'hello', 'param2': 'world'}
    response = requests.get(URL_GET, params=params)
    print('>>>>Response Headers:')
    print(response.headers)
    print('>>>>Status Code:')
    print(response.status_code)
    print('>>>>Reason:')
    print(response.reason)
    print('>>>>Request body:')
    print(response.text)


if __name__ == '__main__':
    print('>>>Use simple requests:')
    use_simple_requests()
    print('')
    print('>>>Use params requests:')
    use_params_requests()

Output:

>>>Use simple requests:
>>>>Response Headers:
{'Server': 'gunicorn/19.9.0', 'Date': 'Tue, 19 Feb 2019 05:38:00 GMT', 'Connection': 'close', 'Content-Type': 'application/json', 'Content-Length': '26', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
>>>>Response body:
{"origin":"42.243.137.5"}


>>>Use params requests:
>>>>Response Headers:
{'Server': 'gunicorn/19.9.0', 'Date': 'Tue, 19 Feb 2019 05:38:00 GMT', 'Connection': 'close', 'Content-Type': 'application/json', 'Content-Length': '280', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
>>>>Status Code:
200
>>>>Reason:
OK
>>>>Request body:
{"args":{"param1":"hello","param2":"world"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"keep-alive","Host":"www.onefine.top","User-Agent":"python-requests/2.21.0"},"origin":"42.243.137.5","url":"http://www.onefine.top/get?param1=hello&param2=world"}

Let's compare the request headers sent by the urllib and requests demos, as echoed back by the server in the response body:

# urllib
"headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Host": "www.onefine.top", 
    "User-Agent": "Python-urllib/2.7"
}, 

# requests
"headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "keep-alive", 
    "Host": "www.onefine.top", 
    "User-Agent": "python-requests/2.21.0"
},
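The extra Accept header and the keep-alive Connection header come from the defaults of a requests Session. To see exactly what requests would send, or to override a header such as User-Agent, a request can be prepared without being sent. A minimal sketch, assuming requests is installed; prepare_get and the User-Agent value my-client/0.1 are hypothetical, while Request, Session.prepare_request and the returned PreparedRequest are part of requests.

```python
import requests


def prepare_get(url, params=None, headers=None):
    """Build (but do not send) a GET request the way a Session would send it.

    Session.prepare_request merges the session's default headers
    (User-Agent, Accept-Encoding, Accept, Connection) with the ones given here.
    """
    with requests.Session() as session:
        request = requests.Request("GET", url, params=params, headers=headers)
        return session.prepare_request(request)


prepared = prepare_get(
    "http://www.onefine.top/get",
    params={"param1": "hello", "param2": "world"},
    headers={"User-Agent": "my-client/0.1"},  # hypothetical custom value
)
print(prepared.url)
print(dict(prepared.headers))
```

Passing the same headers= argument to requests.get() overrides the defaults in the same way when the request is actually sent.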