Win10 + Python 3.5: a strange garbled-text problem when scraping a chunked response with requests; Python 2.7 is not affected

Date: 2019-11-24


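With Python 3.5, requests fetched http://app.cnmo.com/android/233888/history.html and the returned text came out garbled. The response uses chunked transfer encoding. Setting the encoding explicitly did not help, and the automatically detected encoding was None.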

I asked in a QQ group. Members running Python 2.x ran the same code and saw no garbled text. I switched to Python 2.x to check, and the output was indeed clean.

# coding: utf-8
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}

# This URL is served with chunked transfer encoding and the page source is GB2312.
# The output is garbled under Python 3.x and correct under Python 2.x.
url = 'http://app.cnmo.com/android/233888/history.html'
resp = requests.get(url=url, headers=headers)
print(resp.text)

Output under Python 3.5.2: [screenshot not preserved]

Output under Python 2.7.13: [screenshot not preserved]
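When a response decodes to garbage, it can help to look at the headers and the first raw bytes before looking at the decoded text. The following is a minimal sketch, not part of the original post: `summarize_response` is a hypothetical helper, and the `fake` object with its values is invented so the example runs offline. With a real response you would pass `resp` from `requests.get` instead.

```python
from types import SimpleNamespace


def summarize_response(resp):
    """Collect the fields worth checking when a body decodes to garbage."""
    headers = resp.headers
    return {
        "transfer_encoding": headers.get("Transfer-Encoding"),
        "content_encoding": headers.get("Content-Encoding"),
        "content_type": headers.get("Content-Type"),
        # Encoding that would be used to build resp.text
        "declared_encoding": resp.encoding,
        # Raw bytes as hex, taken before any text decoding
        "first_bytes": resp.content[:16].hex(),
    }


# Stand-in for a real response; every value here is invented for illustration.
fake = SimpleNamespace(
    headers={"Transfer-Encoding": "chunked", "Content-Type": "text/html"},
    encoding=None,
    content="<html>".encode("gb2312"),
)
print(summarize_response(fake))
```

Comparing this summary between the Python 2.x and Python 3.x runs would show whether the two interpreters actually receive the same headers and the same bytes.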

I could not make sense of this. I searched Baidu, Google, 360, Sogou, and Bing, everything I could search, and still had no fix.

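That evening I looked at the browser's request headers again and simply added all of them, and the garbling went away. I then pared the headers down to just "Accept-Encoding" and "User-Agent", and the output was still clean. The value of "Accept-Encoding" can be empty or any encoding, and it seems to make no difference. I do not know why this works; it may take the developers to explain.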

# coding: utf-8
import requests

headers = {
    "Accept-Encoding": "",  # with this field added, the output is no longer garbled under Python 3.x
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
}

# This URL is served with chunked transfer encoding and the page source is GB2312.
# With Accept-Encoding in the headers, the output is no longer garbled.
url = 'http://app.cnmo.com/android/233888/history.html'
resp = requests.get(url=url, headers=headers)
print(resp.text)
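The post does not establish why "Accept-Encoding" makes a difference. One hypothesis worth testing, and it is only an assumption, is that the body reaches the client still compressed, so decoding it as text cannot work. The sketch below checks the raw bytes for the gzip magic number and decompresses them before decoding. `decode_body` is a hypothetical helper, and the sample bytes are invented and stand in for `resp.content`.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def decode_body(raw, encoding="gb18030"):
    """Decode raw response bytes, gunzipping first if they look compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # gb18030 is a superset of GB2312; errors="replace" avoids raising on stray bytes
    return raw.decode(encoding, errors="replace")


# Offline check with invented bytes in place of resp.content
sample = "历史版本".encode("gb2312")
assert decode_body(sample) == "历史版本"
assert decode_body(gzip.compress(sample)) == "历史版本"
```

If `decode_body(resp.content)` gives readable text where `resp.text` does not, that would point toward compression or charset handling rather than chunking itself.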