requests
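Requests is a very practical Python HTTP client library, often used when writing crawlers and testing server responses. It is fair to say that Requests fully covers the needs of today's web.
This article is drawn entirely from the official documentation: http://docs.python-requests.org/en/master/
It is usually installed with $ pip install requests. For other installation methods, see the official documentation.
HTTP - requests
import requests
GET requests
r = requests.get('http://httpbin.org/get')
Passing parameters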
>>> payload = {'key1': 'value1', 'key2': 'value2', 'key3': None}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2
Note that any dictionary key whose value is None will not be added to the URL's query string.
Parameter values can also be lists:
>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> print(r.url)
http://httpbin.org/get?key1=value1&key2=value2&key2=value3
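To check how params are encoded without contacting a server, one option is to build the request and inspect the prepared URL. The sketch below assumes a hypothetical payload that combines the None case and the list case from the two examples above.

```python
import requests

# Hypothetical payload: a plain value, a list value and a None value.
payload = {'key1': 'value1', 'key2': ['value2', 'value3'], 'key3': None}

# Request(...).prepare() builds the final URL but sends nothing over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()
print(prepared.url)
```

The list value is expanded into repeated key2 entries, and key3 is left out of the query string.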
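r.text returns the body decoded with the encoding inferred from the response headers; you can change the decoding by setting, for example, r.encoding = 'gbk'
r.content returns the body as raw bytes
r.json() returns the body parsed as JSON and may raise an exception if the body is not valid JSON
r.status_code is the HTTP status code
r.raw returns the raw socket response; this requires passing stream=True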
>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
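The difference between r.text, r.content and r.json() can be tried offline with a hand-built Response. This is only a sketch: real code receives the Response from requests.get(), and setting the private _content attribute here is a testing shortcut, not a public API.

```python
import requests

# Hand-built Response for illustration only (assumption: no server involved).
r = requests.Response()
r.status_code = 200
r._content = '中文'.encode('gbk')  # private attribute, used here as a shortcut

r.encoding = 'gbk'        # tell Requests how to decode the body
text_gbk = r.text         # str, decoded with GBK
raw_bytes = r.content     # bytes, never decoded

# The body is not JSON, so r.json() raises; the error is a ValueError subclass.
try:
    r.json()
    parsed_ok = True
except ValueError:
    parsed_ok = False

print(text_gbk, raw_bytes, parsed_ok)
```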
To save the result to a file, use r.iter_content():
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
Passing headers
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get(url, headers=headers)
Passing cookies
>>> url = 'http://httpbin.org/cookies'
>>> r = requests.get(url, cookies=dict(cookies_are='working'))
>>> r.text
'{"cookies": {"cookies_are": "working"}}'
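As a rough offline check of what the headers and cookies arguments produce, the request can be prepared and its outgoing headers inspected. Nothing is sent; the values are the ones used in the examples above.

```python
import requests

prepared = requests.Request(
    'GET',
    'http://httpbin.org/cookies',
    headers={'user-agent': 'my-app/0.0.1'},
    cookies={'cookies_are': 'working'},
).prepare()

# Header lookup on a prepared request is case-insensitive.
user_agent = prepared.headers['User-Agent']
# The cookies dict is serialised into a single Cookie header.
cookie_header = prepared.headers['Cookie']
print(user_agent, cookie_header)
```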
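POST requests
Sending form data
r = requests.post('http://httpbin.org/post', data={'key': 'value'})
Typically you want to send form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data parameter. The dictionary is automatically form-encoded when the request is made: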
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
...
"form": {
"key2": "value2",
"key1": "value1"
},
...
}
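Often the data you want to send is not form-encoded. If you pass a string instead of a dict, the data is posted as-is.
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
Or pass the dict through the json parameter and let Requests encode it: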
>>> r = requests.post(url, json=payload)
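The three ways of passing a body differ in both the encoded body and the Content-Type header. The sketch below prepares each request without sending it; body_text is a hypothetical helper added only to normalise the body to str, since it may be str or bytes depending on the Requests version.

```python
import json
import requests

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}

as_form = requests.Request('POST', url, data=payload).prepare()
as_string = requests.Request('POST', url, data=json.dumps(payload)).prepare()
as_json = requests.Request('POST', url, json=payload).prepare()


def body_text(prepared):
    """Return the prepared body as str (hypothetical helper for this example)."""
    body = prepared.body
    return body.decode('utf-8') if isinstance(body, bytes) else body


for name, prepared in [('form', as_form), ('string', as_string), ('json', as_json)]:
    print(name, prepared.headers.get('Content-Type'), body_text(prepared))
```

Note that a string passed through data is sent without any Content-Type header, while the json parameter sets application/json.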
Uploading files
>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
You can also set the filename, content_type and headers explicitly:
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
A string can be sent in place of a file object:
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
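To see what the files argument turns into, the multipart body can be inspected on a prepared request. This sketch uses an in-memory string instead of a file on disk, and the text/csv content type is an assumption chosen for the example.

```python
import requests

# (filename, content, content_type); the content is a string, not an open file.
files = {'file': ('report.csv', 'some,data,to,send\n', 'text/csv')}
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

content_type = prepared.headers['Content-Type']  # includes a generated boundary
body = prepared.body                             # bytes: the multipart payload
print(content_type)
print(body.decode('utf-8'))
```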
Response
r.status_code
r.headers
r.cookies
Redirection
By default Requests will perform location redirection for all verbs except HEAD.
>>> r = requests.get('http://httpbin.org/cookies/set?k2=v2&k1=v1')
>>> r.url
'http://httpbin.org/cookies'
>>> r.status_code
200
>>> r.history
[<Response [302]>]
If you're using HEAD, you can enable redirection as well:
r = requests.head('http://httpbin.org/cookies/set?k2=v2&k1=v1', allow_redirects=True)
You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter:
requests.get('http://github.com', timeout=0.001)
Advanced features
Source: <http://docs.python-requests.org/en/master/user/advanced/#advanced>
A Session automatically persists cookies and can hold request parameters that are then sent with every subsequent request.
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
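A Session can also provide default data. Request-level (function argument) data is merged with session-level data; where keys collide, the request-level value overrides the session-level one. To drop a session-level value for one request, pass a dict with the same key and a value of None.
s = requests.Session()
s.auth = ('user', 'pass')  # authentication credentials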
s.headers.update({'x-test': 'true'})
# both 'x-test' and 'x-test2' are sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})
Data passed as function arguments is used only once and is not saved to the session.
For example, these cookies apply to this request only:
r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
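The merge rules for session-level and request-level data can be checked offline with Session.prepare_request. In this sketch, merged_headers is a hypothetical helper that returns the headers a request would carry, without sending it.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})


def merged_headers(session, **kwargs):
    """Return the headers the session would send (hypothetical helper)."""
    req = requests.Request('GET', 'http://httpbin.org/headers', **kwargs)
    return session.prepare_request(req).headers


both = merged_headers(s, headers={'x-test2': 'true'})        # merged
overridden = merged_headers(s, headers={'x-test': 'false'})  # request level wins
dropped = merged_headers(s, headers={'x-test': None})        # None removes the key

print(both.get('x-test'), both.get('x-test2'))
print(overridden.get('x-test'), dropped.get('x-test'))
# The session itself is unchanged by per-request arguments.
print('x-test2' in s.headers)
```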
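A Session can also be closed automatically by using it as a context manager:
with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
The response object contains not only all of the response information but also the request that produced it: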
r = requests.get('http://en.wikipedia.org/wiki/Monty_Python')
r.headers
r.request.headers
SSL certificate verification
Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host's SSL certificate, use the verify parameter:
>>> requests.get('https://kennethreitz.com', verify=True)
requests.exceptions.SSLError: hostname 'kennethreitz.com' doesn't match either of '*.herokuapp.com', 'herokuapp.com'
SSL is not set up on that domain, so the request fails. GitHub has it set up, though:
>>> requests.get('https://github.com', verify=True)
<Response [200]>
For a private certificate, you can also pass verify the path to a CA_BUNDLE file, or set the REQUESTS_CA_BUNDLE environment variable.
>>> requests.get('https://github.com', verify='/path/to/certfile')
If you set verify to False, Requests skips SSL certificate verification.
>>> requests.get('https://kennethreitz.com', verify=False)
<Response [200]>
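By default, verify is set to True. The verify option applies only to host certificates.
You can also specify a local certificate to use as the client-side certificate, either as a single file (containing the private key and the certificate) or as a tuple of both file paths: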
>>> requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
<Response [200]>
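Body content workflow
By default, the response body is downloaded immediately when you make a request. You can override this with the stream parameter, deferring the download of the body until you access the Response.content attribute: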
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)
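At this point only the response headers have been downloaded and the connection remains open, which lets us fetch the content conditionally:
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...
If stream is set to True, the connection is not released until all the data has been read or Response.close is called.
You can use contextlib.closing to close the connection automatically: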
import requests
from contextlib import closing
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
file = r'D:\Documents\WorkSpace\Python\Test\Python34Test\test.tar.gz'
with closing(requests.get(tarball_url, stream=True)) as r:
    with open(file, 'wb') as f:
        for data in r.iter_content(1024):
            f.write(data)
Keep-Alive
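Source: <http://docs.python-requests.org/en/master/user/advanced/>
Any requests you make within the same session automatically reuse the appropriate connection.
Note: a connection is released back to the pool only once all of the body data has been read, so make sure to either set stream to False or read the content attribute of the Response object.
Streaming uploads
Requests supports streaming uploads, which let you send large streams or files without first reading them into memory. To stream an upload, simply provide a file-like object as the request body:
Open the file in binary mode so that Requests can generate the correct Content-Length.
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)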
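Chunked transfer encoding
Requests also supports chunked transfer encoding for outgoing and incoming requests. To send a chunk-encoded request, simply provide a generator as the request body.
Note that the generator should yield bytes.
def gen():
    yield b'hi'
    yield b'there'
requests.post('http://some.url/chunked', data=gen())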
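The difference between a streamed upload and a chunked upload shows up in the prepared headers. The following is a minimal sketch that sends nothing: io.BytesIO stands in for an opened file, and the URL is a placeholder.

```python
import io
import requests


def gen():
    yield b'hi'
    yield b'there'


url = 'http://some.url/upload'  # placeholder URL; no request is sent

# A file-like object with a measurable size gets a Content-Length header.
streamed = requests.Request('POST', url, data=io.BytesIO(b'0123456789')).prepare()

# A generator has no known size, so the request is marked as chunked instead.
chunked = requests.Request('POST', url, data=gen()).prepare()

print(streamed.headers.get('Content-Length'), streamed.headers.get('Transfer-Encoding'))
print(chunked.headers.get('Content-Length'), chunked.headers.get('Transfer-Encoding'))
```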
For chunked encoded responses, it's best to iterate over the data using Response.iter_content(). In an ideal situation you'll have set stream=True on the request, in which case you can iterate chunk-by-chunk by calling iter_content with a chunk size parameter of None. If you want to set a maximum size of the chunk, you can set a chunk size parameter to any integer.
POST Multiple Multipart-Encoded Files
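Source: <http://docs.python-requests.org/en/master/user/advanced/>
You can send multiple files in one request, for example to an HTML form with a multiple-file field named images:
<input type="file" name="images" multiple="true" required="true"/>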
To do that, just set files to a list of tuples of (form_field_name, file_info):
>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
...
'files': {'images': ' ....'},
'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
...
}
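A list of tuples with a repeated field name can be checked the same way as a single file. This sketch assumes in-memory placeholder bytes rather than real PNG files and only inspects the prepared multipart body.

```python
import requests

# Placeholder bytes stand in for the image files (assumption for this example).
multiple_files = [
    ('images', ('foo.png', b'fake foo bytes', 'image/png')),
    ('images', ('bar.png', b'fake bar bytes', 'image/png')),
]
prepared = requests.Request('POST', 'http://httpbin.org/post', files=multiple_files).prepare()

body = prepared.body
# Each tuple becomes its own part under the same form field name.
field_count = body.count(b'name="images"')
print(field_count)
```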
Custom Authentication
Requests allows you to specify your own authentication mechanism.
Any callable which is passed as the auth argument to a request method will have the opportunity to modify the request before it is dispatched.
Authentication implementations are subclasses of requests.auth.AuthBase, and are easy to define. Requests provides two common authentication scheme implementations in requests.auth: HTTPBasicAuth and HTTPDigestAuth.
Let's pretend that we have a web service that will only respond if the X-Pizza header is set to a password value. Unlikely, but just go with it.
from requests.auth import AuthBase

class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""
    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r
Then, we can make a request using our Pizza Auth:
>>> requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth'))
<Response [200]>
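An auth callable can be exercised without a server by preparing a request and looking at the headers it adds. In this sketch, TokenHeaderAuth and its X-Token header are hypothetical; HTTPBasicAuth is the built-in scheme mentioned above.

```python
import requests
from requests.auth import AuthBase, HTTPBasicAuth


class TokenHeaderAuth(AuthBase):
    """Hypothetical scheme: send a token in an X-Token header."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        # Modify the outgoing request and return it.
        r.headers['X-Token'] = self.token
        return r


url = 'http://httpbin.org/headers'
custom = requests.Request('GET', url, auth=TokenHeaderAuth('secret')).prepare()
basic = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'pass')).prepare()

print(custom.headers['X-Token'])
print(basic.headers['Authorization'])
```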
Source: <http://docs.python-requests.org/en/master/user/advanced/>
Streaming requests
r = requests.get('http://httpbin.org/stream/20', stream=True)
for line in r.iter_lines():
    if line:  # skip keep-alive blank lines
        print(line.decode('utf-8'))
Proxies
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax:
proxies = {'http': 'http://user:pass@10.10.1.10:3128/'}
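Which proxy entry applies depends on the scheme of the URL being requested. One way to see this without any network traffic is the select_proxy helper in requests.utils, which this sketch assumes is acceptable to call directly for inspection.

```python
from requests.utils import select_proxy

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# The key is matched against the scheme of the target URL.
http_proxy = select_proxy('http://example.org/', proxies)
https_proxy = select_proxy('https://example.org/', proxies)
print(http_proxy, https_proxy)
```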
Timeouts
If you specify a single value for the timeout, like this:
r = requests.get('https://github.com', timeout=5)
The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:
r = requests.get('https://github.com', timeout=(3.05, 27))
If the remote server is very slow, you can tell Requests to wait forever for a response, by passing None as a timeout value and then retrieving a cup of coffee.
r = requests.get('https://github.com', timeout=None)
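The connect and read timeouts raise different exceptions, which can be told apart when handling errors. The sketch below is an assumption-laden local test: it opens a listening socket on 127.0.0.1 that completes the TCP handshake but never sends a response, so only the read timeout can fire.

```python
import socket
import requests

# Local listener that never replies (illustration only, not a real server).
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local test

try:
    # (connect timeout, read timeout) in seconds
    session.get(f'http://127.0.0.1:{port}/', timeout=(1, 0.2))
    outcome = 'responded'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    session.close()
    server.close()

print(outcome)
```

Both exceptions derive from requests.exceptions.Timeout, so catching that single class is enough when the distinction does not matter.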