Requests
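The only Non-GMO HTTP library for Python, safe for human consumption.
Warning: recreational use of other HTTP libraries may result in dangerous side effects, including security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. [1]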
    # Create a Python 3.x virtual environment
    > mkvirtualenv Py3_requests

    # Install the requests library
    (Py3_requests) > pip install requests

    # Environment
    (Py3_requests) > python --version
    Python 3.7.1
    (Py3_requests) > pip list
    Package      Version
    ------------ ----------
    certifi      2018.11.29
    chardet      3.0.4
    idna         2.8
    pip          19.0.2
    requests     2.21.0
    setuptools   40.8.0
    urllib3      1.24.1
    wheel        0.33.0
    (Py3_requests) >
Official Requests documentation (Chinese): http://docs.python-requests.org/zh_CN/latest/index.html
Requests source code on GitHub: https://github.com/kennethreitz/requests
Blog of Kenneth Reitz, the author of Requests: https://www.kennethreitz.org
What is the HTTP protocol?
- HTTP stands for HyperText Transfer Protocol.
- The Hypertext Transfer Protocol (HTTP) is a stateless application-level protocol for distributed, collaborative, hypertext information systems.
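On Linux, curl is a file transfer tool that works from the command line using URL syntax, and it is a very capable HTTP command-line tool. It supports both uploading and downloading files, so it is a general-purpose transfer tool, although by convention curl is usually referred to as a download tool.
Syntax: $ curl [option] [url]

Common options:
-v displays the whole course of one HTTP exchange, including the port connection and the HTTP request headers.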
    (Py3_requests) > curl --help
    # ... omitted
    (Py3_requests) > curl -v http://www.baidu.com > tmp.txt
    * Rebuilt URL to: http://www.baidu.com/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 14.215.177.39...
    * TCP_NODELAY set
    * Connected to www.baidu.com (14.215.177.39) port 80 (#0)
    > GET / HTTP/1.1
    > Host: www.baidu.com
    > User-Agent: curl/7.55.1
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Accept-Ranges: bytes
    < Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
    < Connection: Keep-Alive
    < Content-Length: 2381
    < Content-Type: text/html
    < Date: Sun, 17 Feb 2019 11:18:49 GMT
    < Etag: "588604d8-94d"
    < Last-Modified: Mon, 23 Jan 2017 13:27:52 GMT
    < Pragma: no-cache
    < Server: bfe/1.0.8.18
    < Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
    <
    { [1040 bytes data]
    100  2381  100  2381    0     0   2381      0  0:00:01 --:--:--  0:00:01  5850
    * Connection #0 to host www.baidu.com left intact
    (Py3_requests) >
In the request (the lines prefixed with ">"):
Start Line: method, path, protocol
Headers: key:value
    > GET / HTTP/1.1            # ... Start Line
    # ... Headers
    > Host: www.baidu.com
    > User-Agent: curl/7.55.1
    > Accept: */*
    >
In the response (the lines prefixed with "<"):
Start Line: protocol, status code, reason phrase
Headers: key:value
    < HTTP/1.1 200 OK           # ... Start Line
    # ... Headers
    < Accept-Ranges: bytes
    < Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
    < Connection: Keep-Alive
    < Content-Length: 2381
    < Content-Type: text/html
    < Date: Sun, 17 Feb 2019 11:18:49 GMT
    < Etag: "588604d8-94d"
    < Last-Modified: Mon, 23 Jan 2017 13:27:52 GMT
    < Pragma: no-cache
    < Server: bfe/1.0.8.18
    < Set-Cookie: BDORZ=27315; max-age=86400; domain=.baidu.com; path=/
    <
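The start-line-plus-headers layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name `parse_http_head` is made up for this example, the sample text is trimmed from the curl output above, and storing headers in a plain dict is a simplification (real HTTP header names are case-insensitive and may repeat).

```python
def parse_http_head(raw):
    """Split the head of an HTTP message into its start line and headers.

    Hypothetical helper for illustration; not part of any library.
    """
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    # The start line always has three space-separated fields:
    #   request:  method, path, protocol
    #   response: protocol, status code, reason phrase
    start_line = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as dates stay intact.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers


# Samples trimmed from the curl -v output shown earlier.
REQUEST = """\
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: curl/7.55.1
Accept: */*
"""

RESPONSE = """\
HTTP/1.1 200 OK
Content-Length: 2381
Content-Type: text/html
Date: Sun, 17 Feb 2019 11:18:49 GMT
"""

if __name__ == "__main__":
    print(parse_http_head(REQUEST))
    print(parse_http_head(RESPONSE))
```

The same function handles both directions because requests and responses share the layout; only the meaning of the three start-line fields differs.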
Open the tmp.txt file saved a moment ago (the HTML is shown formatted here):
    <!DOCTYPE html>
    <!--STATUS OK-->
    <html>
    <head>
      <meta http-equiv="content-type" content="text/html;charset=utf-8" />
      <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
      <meta content="always" name="referrer" />
      <link rel="stylesheet" type="text/css" href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css" />
      <title>百度一下,你就知道</title>
    </head>
    <body link="#0000cc">
      <div id="wrapper">
        <div id="head">
          <div class="head_wrapper">
            <div class="s_form">
              <div class="s_form_wrapper">
                <div id="lg">
                  <img hidefocus="true" src="//www.baidu.com/img/bd_logo1.png" width="270" height="129" />
                </div>
                <form id="form" name="f" action="//www.baidu.com/s" class="fm">
                  <input type="hidden" name="bdorz_come" value="1" />
                  <input type="hidden" name="ie" value="utf-8" />
                  <input type="hidden" name="f" value="8" />
                  <input type="hidden" name="rsv_bp" value="1" />
                  <input type="hidden" name="rsv_idx" value="1" />
                  <input type="hidden" name="tn" value="baidu" />
                  <span class="bg s_ipt_wr"><input id="kw" name="wd" class="s_ipt" value="" maxlength="255" autocomplete="off" autofocus="" /></span>
                  <span class="bg s_btn_wr"><input type="submit" id="su" value="百度一下" class="bg s_btn" /></span>
                </form>
              </div>
            </div>
            <div id="u1">
              <a href="http://news.baidu.com" name="tj_trnews" class="mnav">新聞</a>
              <a href="http://www.hao123.com" name="tj_trhao123" class="mnav">hao123</a>
              <a href="http://map.baidu.com" name="tj_trmap" class="mnav">地圖</a>
              <a href="http://v.baidu.com" name="tj_trvideo" class="mnav">視頻</a>
              <a href="http://tieba.baidu.com" name="tj_trtieba" class="mnav">貼吧</a>
              <noscript>
                <a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1" name="tj_login" class="lb">登陸</a>
              </noscript>
              <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登陸</a>');</script>
              <a href="//www.baidu.com/more/" name="tj_briicon" class="bri" style="display: block;">更多產品</a>
            </div>
          </div>
        </div>
        <div id="ftCon">
          <div id="ftConw">
            <p id="lh">
              <a href="http://home.baidu.com">關於百度</a>
              <a href="http://ir.baidu.com">About Baidu</a>
            </p>
            <p id="cp">©2017 Baidu <a href="http://www.baidu.com/duty/">使用百度前必讀</a> <a href="http://jianyi.baidu.com/" class="cp-feedback">意見反饋</a> 京ICP證030173號 <img src="//www.baidu.com/img/gs.gif" /> </p>
          </div>
        </div>
      </div>
    </body>
    </html>
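http://httpbin.org/ is a server-side service written by the author of Requests. The site lets you test all kinds of HTTP request and response information, such as cookies, IP address, headers, and login authentication, and it supports GET, POST, and other methods, which makes it very helpful for web development and testing. It is written in Python with Flask and is an open-source project [2].
Note:
"Will it work with windows? - No." [3]
Because gunicorn does not support the Windows platform, gunicorn installs successfully there but fails on startup with ModuleNotFoundError: No module named 'fcntl'.
To start the server program locally, remember to open a new console. Deploying on Linux: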
    $ pip3 install httpbin
    $ pip3 install gunicorn
    # Start the service
    $ gunicorn -b :80 httpbin:app
After startup you should see:
    root@xxx:~# gunicorn -b :80 httpbin:app
    [2019-02-19 10:29:26 +0800] [5110] [INFO] Starting gunicorn 19.9.0
    [2019-02-19 10:29:26 +0800] [5110] [INFO] Listening at: http://0.0.0.0:80 (5110)
    [2019-02-19 10:29:26 +0800] [5110] [INFO] Using worker: sync
    [2019-02-19 10:29:26 +0800] [5114] [INFO] Booting worker with pid: 5114
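Then simply visit the service. Here the server's IP address has been mapped to a domain name (the examples below use www.onefine.top).
Running httpbin on your own machine speeds up access and makes local testing more efficient.
Only a brief introduction is given here.
Are urllib, urllib2, and urllib3 successive versions of one another?
Note: this example runs in a Python 2.7 environment.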
    # -*- coding: utf-8 -*-
    import urllib
    import urllib2

    URL_IP = 'http://www.onefine.top/ip'
    URL_GET = 'http://www.onefine.top/get'


    def use_simple_urllib2():
        response = urllib2.urlopen(URL_IP)
        print '>>>>Response Headers:'
        print response.info()  # read the headers
        print '>>>>Response body:'
        print ''.join([line for line in response.readlines()])  # read the body


    def use_params_urllib2():
        # GET request: build the query parameters
        params = urllib.urlencode({'param1': 'hello', 'param2': 'world'})
        print '>>>Request params:'
        print params
        # Send the request
        response = urllib2.urlopen('?'.join([URL_GET, '%s']) % params)
        # Handle the response
        print '>>>>Response Headers:'
        print response.info()
        print '>>>>Status Code:'
        print response.getcode()
        # Label kept as in the original run; what follows is the response body
        print '>>>>Request body:'
        print ''.join([line for line in response.readlines()])


    if __name__ == '__main__':
        print '>>>Use simple urllib2:'
        use_simple_urllib2()
        print ''
        print '>>>Use params urllib2:'
        use_params_urllib2()
Output:
    >>>Use simple urllib2:
    >>>>Response Headers:
    Server: gunicorn/19.9.0
    Date: Tue, 19 Feb 2019 05:28:11 GMT
    Connection: close
    Content-Type: application/json
    Content-Length: 26
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Credentials: true

    >>>>Response body:
    {"origin":"42.243.137.5"}

    >>>Use params urllib2:
    >>>Request params:
    param2=world&param1=hello
    >>>>Response Headers:
    Server: gunicorn/19.9.0
    Date: Tue, 19 Feb 2019 05:28:11 GMT
    Connection: close
    Content-Type: application/json
    Content-Length: 250
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Credentials: true

    >>>>Status Code:
    200
    >>>>Request body:
    {"args":{"param1":"hello","param2":"world"},"headers":{"Accept-Encoding":"identity","Connection":"close","Host":"www.onefine.top","User-Agent":"Python-urllib/2.7"},"origin":"42.243.137.5","url":"http://www.onefine.top/get?param2=world&param1=hello"}
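The urllib2 module exists only in Python 2; in Python 3 its functionality lives in the standard-library modules urllib.request and urllib.parse. As a rough sketch of how the parameterised GET above might look on Python 3, assuming the same www.onefine.top endpoint is reachable (the helper names `build_get_url` and `fetch` are invented for this example):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

URL_GET = 'http://www.onefine.top/get'  # same endpoint as the Python 2 demo


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    return '?'.join([base_url, urlencode(params)])


def fetch(url):
    """Send a GET request and return (status code, headers, decoded body)."""
    with urlopen(url, timeout=10) as response:
        headers = dict(response.info().items())
        body = response.read().decode('utf-8')
        return response.getcode(), headers, body


if __name__ == '__main__':
    url = build_get_url(URL_GET, {'param1': 'hello', 'param2': 'world'})
    print('>>>Request URL:')
    print(url)
    # The network call only runs when the script is executed directly.
    status, headers, body = fetch(url)
    print('>>>>Status Code:')
    print(status)
    print('>>>>Response body:')
    print(body)
```

Unlike Python 2.7, dictionaries in Python 3.7 and later keep insertion order, so the parameters are encoded in the order they were written.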
Now back in the Python 3.x environment:
    # -*- coding: utf-8 -*-
    import requests

    URL_IP = 'http://www.onefine.top/ip'
    URL_GET = 'http://www.onefine.top/get'


    def use_simple_requests():
        # get/post/options/put/delete
        response = requests.get(URL_IP)
        print('>>>>Response Headers:')
        print(response.headers)
        print('>>>>Response body:')
        print(response.text)  # no need to worry about decoding here


    def use_params_requests():
        params = {'param1': 'hello', 'param2': 'world'}
        response = requests.get(URL_GET, params=params)
        print('>>>>Response Headers:')
        print(response.headers)
        print('>>>>Status Code:')
        print(response.status_code)
        print('>>>>Reason:')
        print(response.reason)
        # Label kept as in the original run; what follows is the response body
        print('>>>>Request body:')
        print(response.text)


    if __name__ == '__main__':
        print('>>>Use simple requests:')
        use_simple_requests()
        print('')
        print('>>>Use params requests:')
        use_params_requests()
Output:
    >>>Use simple requests:
    >>>>Response Headers:
    {'Server': 'gunicorn/19.9.0', 'Date': 'Tue, 19 Feb 2019 05:38:00 GMT', 'Connection': 'close', 'Content-Type': 'application/json', 'Content-Length': '26', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    >>>>Response body:
    {"origin":"42.243.137.5"}

    >>>Use params requests:
    >>>>Response Headers:
    {'Server': 'gunicorn/19.9.0', 'Date': 'Tue, 19 Feb 2019 05:38:00 GMT', 'Connection': 'close', 'Content-Type': 'application/json', 'Content-Length': '280', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    >>>>Status Code:
    200
    >>>>Reason:
    OK
    >>>>Request body:
    {"args":{"param1":"hello","param2":"world"},"headers":{"Accept":"*/*","Accept-Encoding":"gzip, deflate","Connection":"keep-alive","Host":"www.onefine.top","User-Agent":"python-requests/2.21.0"},"origin":"42.243.137.5","url":"http://www.onefine.top/get?param1=hello&param2=world"}
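To see what Requests would put on the wire without contacting any server, a request can be prepared but not sent. The sketch below assumes the requests package is installed; `prepare_get` is a name chosen for this example, while `requests.Session`, `requests.Request`, and `Session.prepare_request` are part of the library.

```python
import requests

URL_GET = 'http://www.onefine.top/get'  # same endpoint as the demo above


def prepare_get(url, params):
    """Build, but do not send, a GET request (hypothetical helper).

    Preparing through a Session merges in the session's default headers,
    which is what requests.get() does internally as well.
    """
    session = requests.Session()
    request = requests.Request('GET', url, params=params)
    return session.prepare_request(request)


if __name__ == '__main__':
    prepared = prepare_get(URL_GET, {'param1': 'hello', 'param2': 'world'})
    # The params dict has been encoded into the query string.
    print(prepared.method, prepared.url)
    # Default headers: User-Agent, Accept-Encoding, Accept, Connection.
    for name, value in prepared.headers.items():
        print('{}: {}'.format(name, value))
```

The exact Accept-Encoding and User-Agent values depend on the installed Requests version and its optional dependencies, so they may differ from the 2.21.0 output shown above.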
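Let's compare the request headers sent by the urllib demo and the requests demo, as echoed back in the "headers" field of each response body: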
    # urllib
    "headers": {
        "Accept-Encoding": "identity",
        "Connection": "close",
        "Host": "www.onefine.top",
        "User-Agent": "Python-urllib/2.7"
    },

    # requests
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Host": "www.onefine.top",
        "User-Agent": "python-requests/2.21.0"
    },
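Requests is built on top of urllib3, so it sets the Connection header to keep-alive. Several requests can then share a single connection (in practice, when they go through the same Session), which consumes fewer resources.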
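The effect of keep-alive can be observed with the standard library alone. The sketch below is an illustration under stated assumptions rather than how Requests or urllib3 work internally: it starts a throwaway HTTP/1.1 server on localhost (the `CountingHandler` class and `count_connections` function are invented for this example) and counts how many TCP connections the server accepts when the client either reuses one connection or opens a new one per request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Throwaway handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default
    connections = 0                # one handler instance is set up per connection

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the body ends,
        # so the connection can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def count_connections(n_requests, reuse):
    """Send n_requests GETs and return how many connections the server saw."""
    CountingHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = None
        for _ in range(n_requests):
            if conn is None:
                conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()           # emulate "Connection: close" behaviour
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("reused connection:", count_connections(3, reuse=True))
    print("new connection each time:", count_connections(3, reuse=False))
```

With Requests itself, the comparable pattern is to create one requests.Session and issue all calls through it, since the session keeps a connection pool between requests.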
References:
Differences between urllib, urllib2, urllib3, httplib, httplib2, and request in Python: http://www.cnblogs.com/arxive/p/6194368.html
Python network requests: urllib and urllib3 explained: https://www.jianshu.com/p/f05d33475c78
What is gunicorn: http://www.javashuo.com/article/p-prysvjem-o.html
The Linux curl command explained: http://www.javashuo.com/article/p-usdmlmbu-m.html
Using curl: https://www.jianshu.com/p/f05bbd5007d9
Official Requests documentation
For more about urllib, see: https://blog.csdn.net/jiduochou963/article/details/87564467