Environment: Python 2.7
Install the lxml module:
pip install lxml
Example:
from lxml import etree

text = '''
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
        <li class="item-1"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a>
    </ul>
</div>
'''
html = etree.HTML(text)         # an Element object; printing it shows only its memory address
result = etree.tostring(html)   # serialize back to markup; lxml fills in what is missing, such as the <body> tag in the output
print(result)
Output:
<html>
<body>
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
        <li class="item-1"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
</body>
</html>
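# Read the content from a file
# (etree.parse uses the XML parser by default, so hello.html must be well-formed)
from lxml import etree

html = etree.parse('hello.html')
result = etree.tostring(html, pretty_print=True)
print(result)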
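If the file is not well-formed (for example, it has an unclosed li tag like the string in the first example), the default XML parser raises an error. A minimal sketch of one way around that, assuming you pass an etree.HTMLParser() to etree.parse; the file name hello_broken.html and its content are made up for illustration:

```python
import os
import tempfile

from lxml import etree

# Hypothetical file content: the last <li> is never closed.
content = ('<div><ul>'
           '<li class="item-0"><a href="link1.html">first item</a></li>'
           '<li class="item-1"><a href="link2.html">second item</a>'
           '</ul></div>')

# Write the sample to a temporary file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "hello_broken.html")
with open(path, "w") as f:
    f.write(content)

# With an HTMLParser, etree.parse is as forgiving as etree.HTML.
tree = etree.parse(path, etree.HTMLParser())
result = etree.tostring(tree, pretty_print=True)
print(result)
```

The parsed tree has html and body wrappers added and the missing closing tag filled in, just as with etree.HTML.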
Select the li tags:
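from lxml import etree

html = etree.parse('hello.html')
print type(html)          # the parsed document (an ElementTree)
result = html.xpath('//li')
print result              # a list of li Element objects
print len(result)
print type(result)        # list
print type(result[0])     # a single Element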
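The xpath call above returns Element objects. To get at attribute values or text directly, the XPath expression itself can select them. A small sketch, assuming a shortened copy of the sample markup from the first example (parsed from a string so it runs without hello.html):

```python
from lxml import etree

text = '''
<div>
    <ul>
        <li class="item-0"><a href="link1.html">first item</a></li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-inactive"><a href="link3.html">third item</a></li>
    </ul>
</div>
'''
html = etree.HTML(text)

items = html.xpath('//li')                            # list of Element objects
classes = html.xpath('//li/@class')                   # class attribute of every li
labels = html.xpath('//li/a/text()')                  # text inside each link
hrefs = html.xpath('//li[@class="item-1"]/a/@href')   # href of links under li.item-1

print(len(items))
print(classes)
print(labels)
print(hrefs)
```

In every case xpath returns a list, so a query that matches nothing gives an empty list rather than an error.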
Reference: http://cuiqingcai.com/2621.html
Note: I wrote this post only as part of my own study of the lxml module, so it is not carefully written.