Getting the current node's HTML with etree.tostring and displaying Chinese correctly
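Method 1: html.unescape

By default, etree.tostring() serializes to ASCII, so every Chinese character is written as a numeric character reference such as &#20013;. Decoding the bytes and passing the string through html.unescape() turns those references back into characters.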
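```python
import html

from lxml import etree

# Read the saved page
with open('list.html', 'r', encoding='utf-8') as f:
    text = f.read()

tree = etree.HTML(text)
node = tree.xpath('//*[@id="scroll_marquee"]')[0]

# tostring() returns ASCII bytes with &#NNNNN; references; decode, then unescape them
r = html.unescape(etree.tostring(node).decode('utf-8'))
print(r)
print(type(r))  # <class 'str'>
```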
Reference: "Fixing garbled Chinese ("&#number;") when calling tostring() while scraping web pages"
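To see the problem and the fix without a saved file, here is a minimal offline sketch. The inline markup and its Chinese text are hypothetical stand-ins for list.html; only etree.HTML, etree.tostring and html.unescape are taken from the method above.

```python
import html

from lxml import etree

# Hypothetical sample markup standing in for list.html
sample = '<html><body><div id="scroll_marquee">中文測試</div></body></html>'
tree = etree.HTML(sample)
node = tree.xpath('//*[@id="scroll_marquee"]')[0]

# Default (ASCII) serialization: non-ASCII characters become numeric references
raw = etree.tostring(node).decode('ascii')
print(raw)

# html.unescape converts the &#...; references back into the actual characters
fixed = html.unescape(raw)
print(fixed)
```

One caveat with this approach: html.unescape also converts &lt;, &gt; and &amp; inside the text, so if the node's text contains escaped markup characters, the result is no longer valid markup. Method 2 avoids that because it never produces the references in the first place.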
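Method 2: pass an encoding to etree.tostring

Calling tostring() with encoding="utf-8" writes the Chinese characters as UTF-8 bytes instead of character references. The return value is still bytes, so decode it before printing.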
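```python
import requests
from lxml import etree

response = requests.get('https://www.baidu.com/')
# Use the detected encoding in case the server omits a charset header
response.encoding = response.apparent_encoding
tree = etree.HTML(response.text)
body = tree.xpath('//body')[0]

# Default (ASCII) serialization: Chinese does not display correctly (&#NNNNN; references)
garbled = etree.tostring(body)

# UTF-8 serialization keeps the Chinese characters; decode the bytes before printing
strs = etree.tostring(body, encoding='utf-8', pretty_print=True, method='html')
print(strs.decode('utf-8'))
```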
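lxml can also return a str directly: passing encoding='unicode' (or encoding=str) to etree.tostring skips the bytes stage entirely. A minimal sketch, with a hypothetical inline snippet standing in for the downloaded page so it runs offline:

```python
from lxml import etree

# Hypothetical inline snippet standing in for the downloaded page
sample = '<html><body><p>百度一下</p></body></html>'
tree = etree.HTML(sample)
body = tree.xpath('//body')[0]

# encoding='unicode' returns a str, so no decode step is needed
text = etree.tostring(body, encoding='unicode', method='html', pretty_print=True)
print(type(text))
print(text)

# encoding='utf-8' returns bytes; decoding them yields readable Chinese as well
as_bytes = etree.tostring(body, encoding='utf-8', method='html', pretty_print=True)
print(as_bytes.decode('utf-8'))
```

The 'unicode' form is convenient when the result is only printed or processed further as text; the 'utf-8' form fits when the bytes are written straight to a file or socket.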