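With Python 3.5 and requests, fetching http://app.cnmo.com/android/233888/history.html returned garbled text. The response turned out to use chunked encoding. Setting the encoding explicitly did not help, and the automatically detected encoding was None.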
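I asked in a QQ group, and a member who uses Python 2.x ran the same code without any garbling. I switched to Python 2.x to check, and the output was indeed clean.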
#coding:utf-8
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}

# This URL is served with chunked encoding and the page source is GB2312.
# Output is garbled on Python 3.x and fine on Python 2.x.
url = 'http://app.cnmo.com/android/233888/history.html'
resp = requests.get(url=url, headers=headers)
print(resp.text)
Python 3.5.2: the output is garbled.
Python 2.7.13: the output is fine.
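When the detected encoding comes back as None, one thing worth checking is which charset the page itself declares. The following is only a minimal sketch under assumptions: the helper name `sniff_meta_charset` and the `scan_bytes` limit are made up for illustration, and it assumes the page carries a `charset=` declaration in a `<meta>` tag near the top of the HTML.

```python
import re

# Matches charset=gb2312, charset="utf-8", charset='gbk' and similar forms.
_META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def sniff_meta_charset(raw, default=None, scan_bytes=4096):
    """Return the charset declared in the first scan_bytes of raw HTML bytes.

    Hypothetical helper: returns `default` when no declaration is found.
    """
    match = _META_CHARSET.search(raw[:scan_bytes])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

With requests this could be used as `resp.encoding = sniff_meta_charset(resp.content) or resp.apparent_encoding` before reading `resp.text`. In the case described here, setting the encoding by hand did not help, which hints that the received bytes were not plain GB2312 text to begin with.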
I could not make sense of this problem. I searched Baidu, Google, 360, Sogou and Bing, everything I could think of, and still had no fix.
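In the evening I looked at the browser's request headers again and simply added all of them to my request. The garbling went away. I then trimmed the headers down to just "Accept-Encoding" and "User-Agent", and the output was still clean. The value of "Accept-Encoding" can be empty or any encoding, and none of them seem to produce garbled text. I don't know why this works; it may take the developers to explain it.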
#coding:utf-8
import requests

headers = {
    "Accept-Encoding": "",  # with this field added, the output is no longer garbled on Python 3.x
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}

# This URL is served with chunked encoding and the page source is GB2312.
# With Accept-Encoding added to the headers, the output is no longer garbled.
url = 'http://app.cnmo.com/android/233888/history.html'
resp = requests.get(url=url, headers=headers)
print(resp.text)
Python 3.5.2: the output is no longer garbled.
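As for why the header matters, one guess (not confirmed anywhere in this post) is that the body arrives gzip-compressed and is not decompressed before decoding. requests sends "Accept-Encoding: gzip, deflate" by default, so overriding the header changes what the server is asked to send. The sketch below is a way to test that guess on the raw bytes: the helper name `decode_body` is hypothetical, and the default charset of gb2312 is taken from the page source mentioned above.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw, charset="gb2312"):
    """Decode raw body bytes to text, gunzipping first if they look compressed.

    Hypothetical helper: it checks the gzip magic bytes rather than trusting
    the Content-Encoding header, since the header may be missing or the body
    may already have been decompressed by the HTTP library.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # errors="replace" keeps going on stray bytes instead of raising.
    return raw.decode(charset, errors="replace")
```

With requests this could be called as `decode_body(resp.content)`. If the result reads correctly while `resp.text` is garbled, the compressed-body guess is probably right; if both are garbled, the cause lies elsewhere.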