【Definitions】
XPath stands for XML Path Language.
CSS (Cascading Style Sheets) is a language for describing the presentation of documents such as HTML or XML; its selector syntax can also be used to locate elements.
【Examples】
Matching on the class attribute
>>> from parsel import Selector
>>> htmlText = r'''
... <html>
... <body>
...     <div class="value test">111</div>
...     <div class="value test ">222</div>
...     <div class="first value test last">333</div>
...     <div class="test value">444</div>
... </body>
... </html>'''
>>> sel = Selector(htmlText, type='html')
>>> # Exact match on the whole attribute value: 111 only
>>> sel.xpath('/html/body/div[@class="value test"]/text()').get()
'111'
>>> sel.css('div[class="value test"]::text').get()
'111'
>>> # Substring match: 111, 222 and 333
>>> sel.xpath('/html/body/div[contains(@class, "value test")]/text()').getall()
['111', '222', '333']
>>> sel.css('div[class*="value test"]::text').getall()
['111', '222', '333']
>>> # Both class names present, in any order: 111, 222, 333 and 444
>>> sel.xpath('/html/body/div[contains(@class, "value") and contains(@class, "test")]/text()').getall()
['111', '222', '333', '444']
>>> sel.css('div.value.test::text').getall()
['111', '222', '333', '444']
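The three result sets differ because each selector compares the class attribute in a different way: as a whole string, as a substring, or as a set of whitespace-separated tokens. The plain-Python sketch below models those three comparisons on the same four class values. The helper names (exact, substring, has_tokens) are made up for illustration and are not part of parsel.

```python
# Class attribute values of the four <div> elements, keyed by their text.
classes = {
    "111": "value test",
    "222": "value test ",
    "333": "first value test last",
    "444": "test value",
}


def exact(attr, wanted):
    """Whole-string comparison, like [@class="..."] or [class="..."]."""
    return attr == wanted


def substring(attr, wanted):
    """Substring comparison, like contains(@class, "...") or [class*="..."]."""
    return wanted in attr


def has_tokens(attr, *tokens):
    """Token comparison, like the CSS selector div.value.test."""
    present = set(attr.split())
    return all(token in present for token in tokens)


exact_hits = [k for k, v in classes.items() if exact(v, "value test")]
substring_hits = [k for k, v in classes.items() if substring(v, "value test")]
token_hits = [k for k, v in classes.items() if has_tokens(v, "value", "test")]

print(exact_hits)
print(substring_hits)
print(token_hits)
```

Note that the XPath form with two contains() calls is still a substring test, so it would also accept a class such as "values". It agrees with div.value.test here only because no such class appears in the sample.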
Selecting text nodes that contain a given string with XPath
>>> from parsel import Selector
>>> htmlText = r'''
... <html>
... <body>
...     <div>
...         <em>Cancer Discovery</em><br>
...         eISSN: 2159-8290<br>
...         ISSN: 2159-8274<br>
...     </div>
... </body>
... </html>'''
>>> sel = Selector(htmlText, type='html')
>>> sel.xpath('/html/body/div/text()[contains(., "eISSN")]').get()
'\n        eISSN: 2159-8290'
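The matched text node keeps the newline and indentation that surround it in the markup. Below is a minimal sketch of tidying such a string into a label and a value using only the standard library. split_field is a hypothetical helper, and the sample string is only assumed to have the same shape as the query result.

```python
def split_field(text):
    """Split 'label: value' text into a (label, value) pair, trimming whitespace."""
    label, _, value = text.strip().partition(":")
    return label.strip(), value.strip()


# Assumed input: a string shaped like the text node returned by the XPath query.
raw = "\n    eISSN: 2159-8290"
label, value = split_field(raw)
print(label, value)
```

Because contains() is case-sensitive, the "ISSN: 2159-8274" line is not selected by the query, so only one text node needs to be cleaned up.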
【Related reading】
*** walker ***