Fixing garbled Chinese text in web pages with BeautifulSoup

The following code produces garbled Chinese characters in its output.

from bs4 import BeautifulSoup
import urllib2

request = urllib2.Request('http://www.163.com')
response = urllib2.urlopen(request)
html_doc = response.read()
soup = BeautifulSoup(html_doc)

print soup.find_all('a')

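Chinese pages are typically encoded as gb2312 or gbk. Since gb18030 is a superset of both, passing from_encoding="gb18030" to the BeautifulSoup constructor resolves the garbling.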

Note: in BeautifulSoup 3, the parameter is named fromEncoding rather than from_encoding.

from bs4 import BeautifulSoup
import urllib2

request = urllib2.Request('http://www.163.com')
response = urllib2.urlopen(request)
html_doc = response.read()
# Tell BeautifulSoup the page encoding explicitly instead of letting it guess
soup = BeautifulSoup(html_doc, from_encoding="gb18030")

print soup.find_all('a')
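
Why gb18030 works for both encodings can be checked without any network access. The following is a minimal sketch, not part of the original code: it uses Python 3 and only the standard library (the listings above are Python 2), and the helper name decode_chinese and the sample string are assumptions made for illustration. It encodes the same text as gb2312 and as gbk, then decodes both with the gb18030 codec.

```python
def decode_chinese(raw_bytes):
    """Decode bytes from a GB-family page using the gb18030 superset codec.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    return raw_bytes.decode("gb18030")


# Sample text (an assumption for this sketch), encoded the two ways
# a Chinese page is commonly served.
sample = "网易新闻"
as_gb2312 = sample.encode("gb2312")
as_gbk = sample.encode("gbk")

# Both byte strings decode back to the original text with gb18030.
print(decode_chinese(as_gb2312) == sample)
print(decode_chinese(as_gbk) == sample)

# Decoding the same bytes with an unrelated codec does not fail,
# but it yields garbled text instead of the original characters.
garbled = as_gbk.decode("latin-1")
print(garbled == sample)
```

The same reasoning applies to from_encoding: naming the superset codec means a page decodes the same way whether it was served as gb2312 or gbk.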