How to Extract All Matching HTML Elements with Python

Date: 2019-12-10


Straight to the code. It doesn't get more direct than that.

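from lxml import etree
#import mechanize
import lxml.html
#import cookielib

# Alternative using mechanize (left commented out):
#br = mechanize.Browser()
#r = br.open('http://yourdomain.com')
#html = br.response().read()
#root = lxml.html.fromstring(html)
#divs = root.xpath("//div[@class='test']")

# Parse as UTF-8 explicitly to avoid Unicode codec problems
hparser = etree.HTMLParser(encoding='utf-8')
# etree.parse accepts a URL as well as a file name or file object
htree = etree.parse('http://yourdomain.com', hparser)
# Save a copy of the parsed document for inspection
htree.write('/tmp/bi.html')
# Select every div whose class attribute is exactly "test"
divs = htree.xpath("//div[@class='test']")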
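
The snippet above leaves divs as a list of lxml elements. Below is a minimal sketch of one way to fetch the page yourself and read out the matches. It assumes the page may be served over HTTPS, which libxml2's built-in URL loader generally does not handle, so the download is done with urllib instead. The helper names fetch_html and find_elements and the sample markup are hypothetical and not part of the original code.

```python
from urllib.request import urlopen

from lxml import etree


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def find_elements(html_bytes, xpath):
    """Parse raw HTML bytes and return the elements matching xpath."""
    parser = etree.HTMLParser(encoding='utf-8')
    root = etree.fromstring(html_bytes, parser)
    return root.xpath(xpath)


# Demo on an in-memory page so the sketch runs without network access.
sample = b"""<html><body>
<div class="test">first</div>
<div class="other">skipped</div>
<div class="test">second</div>
</body></html>"""

divs = find_elements(sample, "//div[@class='test']")
texts = [div.text for div in divs]
print(texts)
```

For a live page, pass fetch_html('http://yourdomain.com') in place of sample.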

To get every div whose class contains test, for example <div class="test website"></div>,

change the XPath argument above to "//div[contains(@class,'test')]".
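
To see how the exact match and the contains() match differ, here is a small sketch run against an in-memory page rather than a live URL. The sample markup and variable names are made up for illustration. Because contains() is a substring test, it also matches a class such as "testing", so the sketch includes a stricter token-based expression as well. That expression is a common XPath 1.0 idiom, not something from the original article.

```python
from lxml import etree

sample = """<html><body>
<div class="test">exact</div>
<div class="test website">multi</div>
<div class="testing">substring only</div>
<div class="other">none</div>
</body></html>"""

root = etree.fromstring(sample, etree.HTMLParser())

# class attribute is exactly "test"
exact = root.xpath("//div[@class='test']")
# class attribute contains the substring "test" anywhere
substring = root.xpath("//div[contains(@class,'test')]")
# class attribute contains "test" as a whole space-separated token
token = root.xpath(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' test ')]"
)

exact_texts = [d.text for d in exact]
substring_texts = [d.text for d in substring]
token_texts = [d.text for d in token]
print(exact_texts)
print(substring_texts)
print(token_texts)
```

The token-based form pads the class value with spaces on both sides, so ' test ' can only match a complete class name.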
