1. Official documentation: Requests: HTTP for Humans. (It blows urllib, urllib2, httplib, ... out of the water.)
1.1. Main request parameters: requests.request(method, url, **kwargs)
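As a rough illustration of how `method`, `url`, and the keyword arguments fit together, the sketch below builds a request without sending it. It uses `requests.Request(...).prepare()`, which accepts a subset of the same keyword arguments (`params`, `headers`, `data`, and so on). The host `example.com`, the query `q=requests`, and the `demo-client` User-Agent are placeholders chosen for this example, not values from the original post.

```python
import requests

# Build the request offline; nothing is sent over the network.
# The URL, query parameter, and header value are illustrative placeholders.
req = requests.Request(
    method="get",
    url="http://example.com/search",
    params={"q": "requests"},
    headers={"User-Agent": "demo-client"},
)
prepped = req.prepare()

print(prepped.method)                 # method is normalised to upper case
print(prepped.url)                    # params are appended as a query string
print(prepped.headers["User-Agent"])
```

Sending it for real would be a matter of passing `prepped` to `Session.send()`, or of calling `requests.request()` with the same arguments.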
2. Official download page: https://pypi.python.org/pypi/requests
3. Installation:
pip install requests -i https://mirrors.aliyun.com/pypi/simple/
4. A useful feature: cookies are kept persistently. (See here.)
5. Adding cookies: (see this article)
requests.utils.add_dict_to_cookiejar(cj, cookie_dict)
5.1. If sn.headers['Cookie'] has a value, sn.cookies no longer takes effect.
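The interaction between the cookie jar and a manually set Cookie header can be checked offline. This is a minimal sketch, assuming a session named `sn` as in the note above; the cookie names `token` and `manual` and the host `example.com` are made up for the example, and nothing is sent.

```python
import requests

sn = requests.Session()

# Add a cookie to the session's jar (hypothetical name and value).
requests.utils.add_dict_to_cookiejar(sn.cookies, {"token": "abc"})

# With no Cookie header set on the session, the jar supplies the header.
prepped = sn.prepare_request(requests.Request("GET", "http://example.com/"))
header_from_jar = prepped.headers.get("Cookie")

# Once a Cookie header is set on the session, the jar is not merged in.
sn.headers["Cookie"] = "manual=1"
prepped = sn.prepare_request(requests.Request("GET", "http://example.com/"))
header_from_session = prepped.headers.get("Cookie")

print(header_from_jar)
print(header_from_session)
```

If both sources are needed, it is probably simpler to keep everything in `sn.cookies` and leave the Cookie header unset.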
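6. A summary of tips for scraping sites with a Python crawler; note the tools mentioned in its section "3.5 The ultimate trick": selenium, pamie, watir.
7. Advanced usage of the Python package requests; this appears to be a combined translation of these two pages: one, two.
8. On the default timeout value: socket.getdefaulttimeout(). See also the question "What is the maximum timeout value that can be set when using the requests library?"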
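The process-wide socket default mentioned above can be inspected and changed with the standard library alone. The sketch below only demonstrates the `socket` calls; the 5-second value is an arbitrary choice for the example, and the sketch does not claim that this default influences any particular version of requests.

```python
import socket

# Remember the current process-wide default (None means "no timeout").
original_default = socket.getdefaulttimeout()

# Set a 5-second default for sockets created from now on, then read it back.
socket.setdefaulttimeout(5.0)
changed_default = socket.getdefaulttimeout()

# Restore the previous value so the rest of the program is unaffected.
socket.setdefaulttimeout(original_default)

print(original_default, changed_default)
```

For requests itself, the documented approach is to pass `timeout=` explicitly on each call, either as a single number or as a `(connect, read)` tuple.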
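9. When Requests sends a GET, a percent sign (%) in the URL is forcibly encoded as %25. The following workaround gets around this. (See the comments.)
import requests

sn = requests.Session()
url = 'http://xxx.net/xxx.aspx?Param=ASP.brief_result_aspx%23/%u5E74'
req = requests.Request('GET', url)
prepped = sn.prepare_request(req)
# Undo the forced encoding of '%' as '%25' before sending
prepped.url = prepped.url.replace('%25', '%')
r = sn.send(prepped)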
Or split the URL and reassemble it:
import requests
from urllib import parse

# `url` and `sn` are the variables defined in the previous snippet
url = 'http://xxx.net/xxx.aspx?' + parse.urlencode(dict(parse.parse_qsl(parse.urlparse(url).query)))
r = sn.get(url)
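To see what the split-and-reassemble variant produces, the query string can be inspected with `urllib.parse` alone, without sending anything. This sketch reuses the placeholder URL from the post; the variable names `pairs`, `rebuilt_query`, and `rebuilt_url` are introduced here for illustration.

```python
from urllib import parse

url = "http://xxx.net/xxx.aspx?Param=ASP.brief_result_aspx%23/%u5E74"

# Decode the query into (name, value) pairs. %23 becomes '#', while the
# non-standard escape %u5E74 is not valid percent-encoding and is left as-is.
pairs = parse.parse_qsl(parse.urlparse(url).query)

# Re-encode the pairs into a query string and rebuild the URL.
rebuilt_query = parse.urlencode(dict(pairs))
rebuilt_url = "http://xxx.net/xxx.aspx?" + rebuilt_query

print(pairs)
print(rebuilt_url)
```

In this run the literal `%` comes out as `%25` and `/` as `%2F`, so the variant seems to make the encoding explicit and consistent rather than avoid it. Whether the target server accepts that form is an assumption worth checking against the real site.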
10. Handling repeated keys in a POST form:
# `url` is the address the form is posted to
dic = {
    'key1': ['aaa', 'bbb', 'ccc'],
    'key2': 'xxx',
}
r = requests.post(url, data=dic)
This is encoded as: key1=aaa&key1=bbb&key1=ccc&key2=xxx
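The encoding above can be confirmed without a server by preparing the request and reading its body. This is a small sketch; the endpoint `http://example.com/form` is a placeholder, and the list-of-tuples form is shown as an alternative way to express the same repeated keys.

```python
import requests

url = "http://example.com/form"  # placeholder endpoint; nothing is sent

# A list value under one key is expanded into repeated key=value pairs.
dic = {"key1": ["aaa", "bbb", "ccc"], "key2": "xxx"}
prepped = requests.Request("POST", url, data=dic).prepare()
body_from_dict = prepped.body

# A list of (key, value) tuples expresses the same form explicitly.
pairs = [("key1", "aaa"), ("key1", "bbb"), ("key1", "ccc"), ("key2", "xxx")]
body_from_pairs = requests.Request("POST", url, data=pairs).prepare().body

print(body_from_dict)
print(prepped.headers["Content-Type"])
```

The tuple form may be the better choice when the order of the fields matters to the receiving server.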
11. About the request payload. (POST a Multipart-Encoded File)
Code:
import requests
import json

# Test 1: a plain string value is sent as a file part whose filename defaults to the key
r = requests.post('http://www.baidu.com', files={'key': 'val'})
print('***test 1: %s' % r.request.body)
print('------------------------')
print(r.request.body.decode('utf8'))
print('\n')

# Test 2: a (None, value) tuple sends the value as a part without a filename
line = json.dumps({'k1': 'v1', 'k2': 'v2'})
r = requests.post('http://www.baidu.com', files={'json': (None, line)})
print('***test 2: %s' % r.request.body)
print('------------------------')
print(r.request.body.decode('utf8'))
Output:
***test 1: b'--5b57f36c03ca462f93dfbd8dfc97e2d1\r\nContent-Disposition: form-data; name="key"; filename="key"\r\n\r\nval\r\n--5b57f36c03ca462f93dfbd8dfc97e2d1--\r\n'
------------------------
--5b57f36c03ca462f93dfbd8dfc97e2d1
Content-Disposition: form-data; name="key"; filename="key"

val
--5b57f36c03ca462f93dfbd8dfc97e2d1--

***test 2: b'--cf66a355e1f6441fa5a3079e1fafc43a\r\nContent-Disposition: form-data; name="json"\r\n\r\n{"k1": "v1", "k2": "v2"}\r\n--cf66a355e1f6441fa5a3079e1fafc43a--\r\n'
------------------------
--cf66a355e1f6441fa5a3079e1fafc43a
Content-Disposition: form-data; name="json"

{"k1": "v1", "k2": "v2"}
--cf66a355e1f6441fa5a3079e1fafc43a--
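The same multipart body can be inspected without contacting any server by preparing the request instead of posting it. A minimal sketch, assuming the placeholder endpoint `http://example.com/upload`; the boundary string is random on each run, so it is read back from the Content-Type header rather than hard-coded.

```python
import json
import requests

line = json.dumps({"k1": "v1", "k2": "v2"})

# (None, line) means "no filename": the part carries only a name and a value.
req = requests.Request(
    "POST",
    "http://example.com/upload",  # placeholder endpoint; nothing is sent
    files={"json": (None, line)},
)
prepped = req.prepare()

content_type = prepped.headers["Content-Type"]
boundary = content_type.split("boundary=")[1]
body = prepped.body  # bytes

print(content_type)
print(body.decode("utf8"))
```

Reading `prepped.body` this way is handy when debugging what a server will receive, since it avoids a real round trip.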
12. Disabling HTTPS certificate warnings.
import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

# Disable HTTPS certificate warnings
urllib3.disable_warnings(InsecureRequestWarning)
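Disabling the warning globally affects the whole process. As an alternative sketch, the standard `warnings` module can limit the suppression to a single block. The helper `count_insecure_warnings` is hypothetical and simulates the warning with `warnings.warn` rather than making an unverified HTTPS request.

```python
import warnings

from urllib3.exceptions import InsecureRequestWarning


def count_insecure_warnings(suppress):
    """Return how many simulated InsecureRequestWarning instances get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if suppress:
            # Same kind of filter that urllib3.disable_warnings() installs,
            # but it is undone when the `with` block exits.
            warnings.simplefilter("ignore", InsecureRequestWarning)
        # Stand-in for a request made with verify=False.
        warnings.warn("unverified HTTPS request (simulated)", InsecureRequestWarning)
    return len(caught)


print(count_insecure_warnings(suppress=False))
print(count_insecure_warnings(suppress=True))
```

In real code the request made with `verify=False` would sit inside the `with` block in place of the simulated warning.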
13. Using Requests to make the SimSimi "little yellow chick" chat with a "little black chick". (See this article.)
(1) Source code
# coding=utf-8
import requests


class SimSimi:

    def __init__(self):
        self.session = requests.Session()

    def initCookie(self):
        headers = {
            'x-requested-with': 'XMLHttpRequest',
            'Accept-Language': 'zh-cn',
            'Referer': 'http://www.simsimi.com/talk.htm',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Content-Type': 'application/json; charset=utf-8',
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
            'Host': 'www.simsimi.com',
            'Connection': 'Keep-Alive',
            # Whether Cookie is set does not seem to matter
            # 'Cookie': '',
        }
        self.session.headers.update(headers)
        # Load the talk pages first so the session collects the site's cookies
        self.session.get('http://www.simsimi.com/talk.htm')
        self.session.get('http://www.simsimi.com/talk.htm?lc=ch')
        self.session.headers.update({'Referer': 'http://www.simsimi.com/talk.htm?lc=ch'})

    def getAnswer(self, message="hello"):
        url = 'http://www.simsimi.com/func/req?msg=%s&lc=ch'
        r = self.session.get(url=url % message)
        answer = r.json()
        if len(answer) < 1:
            return "hehe..."
        return answer['response']


simi = SimSimi()
simi.initCookie()
msg = u'花果山'  # opening topic: "Flower Fruit Mountain"
print(u'topic: ' + msg)
for i in range(1, 5):
    # Feed each answer back in as the next message
    msg = simi.getAnswer(msg)
    print(i, end=' ')
    if i % 2 == 0:
        print(u"小黃雞:", end=' ')  # "little yellow chick"
    else:
        print(u"小黑雞:", end=' ')  # "little black chick"
    try:
        print(msg)
    except Exception:
        # Fall back to "haha" if the reply cannot be printed
        print('哈哈')
(2) Result
topic: 花果山
1 小黑雞: 我也沒去過 據說那是個傳說 也許在他們心中吧
2 小黃雞: 去過啊。很美的。
3 小黑雞: 別難過哈~開心點~努力!讓本身更優秀!讓愛你的人自豪!
4 小黃雞: 不要緊。開心點。
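The source code interpolates the message straight into the URL and leaves the encoding to the HTTP library. As a hedged sketch of what the encoded query would look like, the standard library can build it explicitly. The endpoint and the `msg`/`lc` parameter names are taken from the code above, and nothing here assumes the service is still reachable.

```python
from urllib import parse

base = "http://www.simsimi.com/func/req"
message = "花果山"

# urlencode percent-encodes the UTF-8 bytes of each non-ASCII character.
query = parse.urlencode({"msg": message, "lc": "ch"})
full_url = base + "?" + query

# Decoding the query again recovers the original message.
decoded = parse.parse_qs(query)

print(full_url)
print(decoded["msg"][0])
```

With requests, passing `params={'msg': message, 'lc': 'ch'}` to `session.get()` should give an equivalent URL without manual string formatting.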
*** walker * Updated 2017-01-13 ***