XPath and CSS (parsel)

Date: 2020-01-27

Tags: xpath, css, parsel. Category: CSS

Introduction

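XPath (XML Path Language) is a query language for selecting nodes in an XML document.
CSS (Cascading Style Sheets) is a language for describing the presentation of documents written in HTML or XML; its selector syntax is also used to locate elements.
parsel is the selector library that was split out of Scrapy; it extracts data from XML or HTML using XPath or CSS selectors.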

Examples

Matching text nodes by substring with XPath

>>> from parsel import Selector
>>> htmlText = r'''
<html>
<body>
        <div>
                <em>Cancer Discovery</em><br>
                eISSN: 2159-8290<br>
                ISSN: 2159-8274<br>
        </div>
</body>
</html>'''
>>> sel = Selector(htmlText, type='html')

# Text nodes that contain "eISSN"
>>> sel.xpath('/html/body/div/text()[contains(., "eISSN")]').get()
'\n                eISSN: 2159-8290'
# Text nodes that do not contain "eISSN"
>>> sel.xpath('/html/body/div/text()[not(contains(., "eISSN"))]').getall()
['\n                ', '\n                ISSN: 2159-8274', '\n        ']

XPath and CSS compared

>>> from parsel import Selector
>>> htmlText = r'''
<html>
<body>
    <div class="value test">111</div>
    <div class="value test     ">222</div>
    <div class="first value test last">333</div>
    <div class="test value">444</div>
</body>
</html>'''
>>> sel = Selector(htmlText, type='html')

# Exact match: 111
>>> sel.xpath('/html/body/div[@class="value test"]/text()').get()
'111'
>>> sel.css('div[class="value test"]::text').get()
'111'

# Matches 111, 222, 333
>>> sel.xpath('/html/body/div[contains(@class, "value test")]/text()').getall()
['111', '222', '333']
>>> sel.css('div[class*="value test"]::text').getall()
['111', '222', '333']

# Matches 111, 222, 333, 444
>>> sel.xpath('/html/body/div[contains(@class, "value") and contains(@class, "test")]/text()').getall()
['111', '222', '333', '444']
>>> sel.css('div.value.test::text').getall()
['111', '222', '333', '444']

This article comes from walker snapshot.
