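1. lxml is a parsing library, and it must be installed before it can be imported. In the author's experience, running pip3 install lxml directly on the command line usually fails with an error. The workaround is to go to https://pypi.org/project/lxml/#files and download the lxml wheel that matches your own Python version and platform. The author uses Python 3.6, so the file is lxml-4.2.4-cp36-cp36m-win32.whl (cp36 means CPython 3.6, win32 means 32-bit Windows). Switch to the directory that contains the downloaded .whl file and run the following there:
pip3 install lxml-4.2.4-cp36-cp36m-win32.whl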
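To confirm that the wheel was installed for the interpreter you actually run, a quick check like the one below can help. This is only a minimal sketch: the helper name lxml_available is made up for illustration, and it assumes nothing beyond the standard library plus lxml's own LXML_VERSION attribute.

```python
import importlib.util


def lxml_available() -> bool:
    """Return True if this interpreter can locate the lxml package."""
    # find_spec returns None when the package is not installed
    return importlib.util.find_spec("lxml") is not None


if lxml_available():
    from lxml import etree

    # LXML_VERSION is a tuple of integers describing the installed release
    print("lxml version:", etree.LXML_VERSION)
else:
    print("lxml is not installed for this interpreter")
```

If the second message is printed even though pip3 reported success, pip3 and python probably point at different Python installations.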
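2. The code is shown below:
    import requests
    from lxml import etree

    url = 'https://www.mafengwo.cn/gonglve/ziyouxing/2033.html'
    response = requests.get(url)   # returns a Response object
    page = response.text           # body of the response as a string
    html = etree.HTML(page)        # parses the string as an HTML document and returns an Element object
    content = html.xpath('//h2')   # list of every <h2> element in the document
    for i in content:
        print(i.text)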
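3. Explanation of the code:
A: Define the URL, then use it to obtain a response object, e.g. url = ''
B: Convert the response object to a string: page = response.text
C: Use the parsing library to turn the string into an HTML document, then write an XPath expression for the content you want to extract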
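Steps B and C can be tried without any network access by feeding etree.HTML a string directly. The sketch below assumes a made-up page (SAMPLE_PAGE) that stands in for response.text, and the helper name extract_h2_texts is hypothetical. It is not the markup of the real site.

```python
from lxml import etree

# Hypothetical HTML standing in for response.text (not the real page)
SAMPLE_PAGE = """
<html><body>
  <h2>First heading</h2>
  <p>Some text</p>
  <h2>Second heading</h2>
</body></html>
"""


def extract_h2_texts(page: str) -> list:
    """Parse an HTML string and return the text of every <h2> element."""
    html = etree.HTML(page)  # string -> Element (root of the parsed document)
    return [node.text for node in html.xpath('//h2')]  # '//h2' matches h2 at any depth


headings = extract_h2_texts(SAMPLE_PAGE)
print(headings)  # ['First heading', 'Second heading']
```

Note that node.text only returns the text that comes before the first child element. If a heading wraps its text in a nested tag, node.text can be None, and ''.join(node.itertext()) is one way to collect the nested text instead.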