I had long heard how powerful the requests library is, but had never tried it. Having played with it today, I can see that the urllib/urllib2 approach I used before was really clumsy...
Here are some simple first-use notes, as a record.
1. Installation: http://cn.python-requests.org/en/latest/user/install.html#install
2. Sending a GET request without parameters
>>> import requests
>>> r = requests.get('http://httpbin.org/get')
>>> print r.text
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.3.0 CPython/2.6.6 Windows/7",
    "X-Request-Id": "8a28bbea-55cd-460b-bda3-f3427d66b700"
  },
  "origin": "124.192.129.84"
}
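httpbin returns a JSON body, so instead of eyeballing r.text you can parse it (requests also offers r.json() for exactly this). A stdlib sketch, using a string shaped like the abridged output above:

```python
import json

# a body shaped like the httpbin response above (abridged)
body = '{"args": {}, "headers": {"Host": "httpbin.org"}, "origin": "124.192.129.84"}'
data = json.loads(body)
print(data['headers']['Host'])  # httpbin.org
```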
3. Sending a GET request with parameters: put the keys and values in a dict and pass it via the params argument; its effect is equivalent to urllib.urlencode.
>>> import requests
>>> payload = {'q': '楊彥星'}
>>> r = requests.get('http://httpbin.org/get', params=payload)
>>> r.url
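As noted, the params handling is equivalent to urllib.urlencode; a quick stdlib illustration (Python 3 moved urlencode into urllib.parse):

```python
from urllib.parse import urlencode, urlparse, parse_qs

payload = {'q': 'hello', 'page': '2'}
query = urlencode(payload)  # what requests builds from params internally
full_url = 'http://httpbin.org/get?' + query
# round-trip the query string to confirm it parses back to the same values
parsed = parse_qs(urlparse(full_url).query)
print(parsed)
```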
4. Sending a POST request: pass the form data via the data argument.
>>> payload = {'a': '楊', 'b': 'hello'}
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print r.text
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "a": "\u6768",
    "b": "hello"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "19",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.3.0 CPython/2.6.6 Windows/7",
    "X-Request-Id": "c81cb937-04b8-4a2d-ba32-04b5c0b3ba98"
  },
  "json": null,
  "origin": "124.192.129.84"
}
>>>
As you can see, the POST parameters arrived in form. data accepts not only dicts but also JSON and other formats.
>>> import json
>>> payload = {'a': '楊', 'b': 'hello'}
>>> r = requests.post(url, data=json.dumps(payload))
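The two data= forms produce different request bodies: a dict is form-encoded, while a pre-serialized JSON string is sent verbatim. A stdlib comparison of the two bodies (Python 3 names):

```python
import json
from urllib.parse import urlencode

payload = {'a': 'yang', 'b': 'hello'}
form_body = urlencode(payload)   # body sent for data=payload (a dict)
json_body = json.dumps(payload)  # body sent for data=json.dumps(payload)
print(form_body)
print(json_body)
```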
5. POSTing a file: this corresponds to uploading an image, a document, etc. to a site; use the files argument.
>>> files = {'file': open('touxiang.png', 'rb')}
>>> r = requests.post(url, files=files)
5.1 Custom headers: pass them via the headers argument.
>>> import json
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
6. Response content
6.1 Response status code
r = requests.get('http://httpbin.org/get')
print r.status_code
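status_code is a plain integer, so checks like "was this a success?" reduce to simple comparisons; a small helper sketch (the function name is my own):

```python
def status_class(code):
    """Map an HTTP status code to its broad class by its first digit."""
    classes = {1: 'informational', 2: 'success', 3: 'redirection',
               4: 'client error', 5: 'server error'}
    return classes.get(code // 100, 'unknown')

print(status_class(200))  # success
print(status_class(404))  # client error
```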
6.2 Response headers
>>> print r.headers
{'content-length': '519', 'server': 'gunicorn/18.0', 'connection': 'keep-alive',
 'date': 'Sun, 15 Jun 2014 14:19:52 GMT', 'access-control-allow-origin': '*',
 'content-type': 'application/json'}
You can also read an individual header to make decisions; the key lookup here is case-insensitive:
r.headers['Content-Type']
r.headers.get('Content-Type')
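requests implements this case-insensitive lookup with its own CaseInsensitiveDict type; a minimal sketch of the idea (simplified, not the real implementation):

```python
class CaseInsensitiveHeaders(dict):
    """Dict that lower-cases keys on the way in and on lookup."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)
    def __getitem__(self, key):
        return super().__getitem__(key.lower())
    def get(self, key, default=None):
        return super().get(key.lower(), default)

h = CaseInsensitiveHeaders()
h['Content-Type'] = 'application/json'
print(h['content-type'])    # application/json
print(h.get('CONTENT-TYPE'))  # application/json
```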
6.3 Response body, already used above:
r.text
r.content
7. Getting cookies from the response
>>> r.cookies['BAIDUID']
'D5810267346AEFB0F25CB0D6D0E043E6:FG=1'
You can also define your own cookies for the request:
>>> cookies = {'cookies_are': 'working'}
>>> r = requests.get(url, cookies=cookies)
>>>
>>> print r.text
{
  "cookies": {
    "cookies_are": "working"
  }
}
>>>
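Under the hood, the cookies dict ends up serialized into a single Cookie request header; roughly like this (a simplified sketch, not requests' actual cookie-jar code):

```python
cookies = {'cookies_are': 'working'}
# join the pairs the way a Cookie header is formatted: "k1=v1; k2=v2"
cookie_header = '; '.join('%s=%s' % (k, v) for k, v in sorted(cookies.items()))
print(cookie_header)  # cookies_are=working
```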
There is a lot more to cookies; since I don't know much more myself yet, I'll expand this later.
8. Setting a timeout with the timeout parameter
>>> requests.get('http://github.com', timeout=1)
<Response [200]>
If you set the time to a very small value, e.g. requests.get('http://github.com', timeout=0.001), and the connection is not established within the timeout, a Timeout exception is raised.
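With requests you would catch requests.exceptions.Timeout around the call; the underlying connect-timeout behaviour can be demonstrated with the stdlib alone (the unroutable address below is chosen purely for illustration):

```python
import socket

timed_out = False
try:
    # 10.255.255.1 is non-routable, so the connect attempt cannot
    # complete within the 10 ms budget and raises a timeout error
    socket.create_connection(('10.255.255.1', 80), timeout=0.01)
except OSError:  # socket.timeout is a subclass of OSError
    timed_out = True
print('timed out:', timed_out)
```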
9. Using a session
First initialize a session object: s = requests.Session()
Then use that session object to make requests: r = s.post(url, data=user)
Reference: http://blog.csdn.net/iloveyin/article/details/21444613 (the code below is basically lifted from there).
The following logs in to Renren, reads the recent visitors from the home page, and then requests the "view more" page to read more recent visitors.
The "more visitors" page is fetched by requesting http://www.renren.com/myfoot.do with the same session.
#coding:utf-8
import requests
import re

user = {'email': 'email', 'password': 'pass'}
s = requests.Session()
r = s.post(url, data=user)  # url: the Renren login endpoint
html = r.text
visit = []
first = re.compile(r'</span><span class="time-tip first-tip"><span class="tip-content">(.*?)</span>')
second = re.compile(r'</span><span class="time-tip"><span class="tip-content">(.*?)</span>')
third = re.compile(r'</span><span class="time-tip last-second-tip"><span class="tip-content">(.*?)</span>')
last = re.compile(r'</span><span class="time-tip last-tip"><span class="tip-content">(.*?)</span>')
visit.extend(first.findall(html))
visit.extend(second.findall(html))
visit.extend(third.findall(html))
visit.extend(last.findall(html))
for i in visit:
    print i
print 'Below are more recent visitors'
vm = s.get('http://www.renren.com/myfoot.do')  # same session, so the login cookie is reused
fm = re.compile(r'"name":"(.*?)"')
visitmore = fm.findall(vm.text)
for i in visitmore:
    print i
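The extraction step in the script can be tried offline against an invented HTML fragment shaped like the markup the patterns target (the names and the fragment here are made up):

```python
import re

html = ('</span><span class="time-tip first-tip"><span class="tip-content">Alice</span>'
        '</span><span class="time-tip"><span class="tip-content">Bob</span>')
first = re.compile(r'</span><span class="time-tip first-tip"><span class="tip-content">(.*?)</span>')
second = re.compile(r'</span><span class="time-tip"><span class="tip-content">(.*?)</span>')
visit = []
visit.extend(first.findall(html))   # matches only the first-tip span
visit.extend(second.findall(html))  # matches only the plain time-tip span
print(visit)  # ['Alice', 'Bob']
```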