xpath

時間 2019-11-10

標籤 xpath 简体版

原文原文鏈接

xpath實例：

html

路徑表達式	結果
/bookstore/book[1]	選取屬於 bookstore 子元素的第一個 book 元素。
/bookstore/book[last()]	選取屬於 bookstore 子元素的最後一個 book 元素。
/bookstore/book[last()-1]	選取屬於 bookstore 子元素的倒數第二個 book 元素。
/bookstore/book[position()<3]	選取最前面的兩個屬於 bookstore 元素的子元素的 book 元素。
//title[@lang]	選取全部擁有名爲 lang 的屬性的 title 元素。
//title[@lang='eng']	選取全部 title 元素，且這些元素擁有值爲 eng 的 lang 屬性。
/bookstore/book[price>35.00]	選取 bookstore 元素的全部 book 元素，且其中的 price 元素的值須大於 35.00。
/bookstore/book[price>35.00]/title	選取 bookstore 元素中的 book 元素的全部 title 元素，且其中的 price 元素的值須大於 35.00。

路徑表達式	結果
bookstore	選取 bookstore 元素的全部子節點。
/bookstore	選取根元素 bookstore。node 註釋：假如路徑起始於正斜槓( / )，則此路徑始終表明到某元素的絕對路徑！python
bookstore/book	選取屬於 bookstore 的子元素的全部 book 元素。
//book	選取全部 book 子元素，而無論它們在文檔中的位置。
bookstore//book	選擇屬於 bookstore 元素的後代的全部 book 元素，而無論它們位於 bookstore 之下的什麼位置。
//@lang	選取名爲 lang 的全部屬性。

XPath 通配符可用來選取未知的 HTML元素。scrapy

通配符	描述
*	匹配任何元素節點。
@*	匹配任何屬性節點。
node()	匹配任何類型的節點。

在下面的表格中，咱們列出了一些路徑表達式，以及這些表達式的結果：url

路徑表達式	結果
/bookstore/*	選取 bookstore 元素的全部子元素。
//*	選取文檔中的全部元素。
//title[@*]	選取全部帶有屬性的 title 元素。

xpath 例子：spa

 1 #!/usr/bin/env python
 2 # -*- coding:utf-8 -*-
 3 from scrapy.selector import Selector, HtmlXPathSelector
 4 from scrapy.http import HtmlResponse
 5 html = """<!DOCTYPE html>
 6 <html>
 7     <head lang="en">
 8         <meta charset="UTF-8">
 9         <title></title>
10     </head>
11     <body>
12         <ul>
13             <li class="item-"><a id='i1' href="link.html">first item</a></li>
14             <li class="item-0"><a id='i2' href="llink.html">first item</a></li>
15             <li class="item-1"><a href="llink2.html">second item<span>vv</span></a></li>
16         </ul>
17         <div><a href="llink2.html">second item</a></div>
18     </body>
19 </html>
20 """
21 response = HtmlResponse(url='http://example.com', body=html,encoding='utf-8')
22 # hxs = HtmlXPathSelector(response)
23 # print(hxs)
24 # hxs = Selector(response=response).xpath('//a')
25 # print(hxs)
26 # hxs = Selector(response=response).xpath('//a[2]')
27 # print(hxs)
28 # hxs = Selector(response=response).xpath('//a[@id]')
29 # print(hxs)
30 # hxs = Selector(response=response).xpath('//a[@id="i1"]')
31 # print(hxs)
32 # hxs = Selector(response=response).xpath('//a[@href="link.html"][@id="i1"]')
33 # print(hxs)
34 # hxs = Selector(response=response).xpath('//a[contains(@href, "link")]')
35 # print(hxs)
36 # hxs = Selector(response=response).xpath('//a[starts-with(@href, "link")]')
37 # print(hxs)
38 # hxs = Selector(response=response).xpath('//a[re:test(@id, "i\d+")]')
39 # print(hxs)
40 # hxs = Selector(response=response).xpath('//a[re:test(@id, "i\d+")]/text()').extract()
41 # print(hxs)
42 # hxs = Selector(response=response).xpath('//a[re:test(@id, "i\d+")]/@href').extract()
43 # print(hxs)
44 # hxs = Selector(response=response).xpath('/html/body/ul/li/a/@href').extract()
45 # print(hxs)
46 # hxs = Selector(response=response).xpath('//body/ul/li/a/@href').extract_first()
47 # print(hxs)
48  
49 # ul_list = Selector(response=response).xpath('//body/ul/li')
50 # for item in ul_list:
51 #     v = item.xpath('./a/span')
52 #     # 或
53 #     # v = item.xpath('a/span')
54 #     # 或
55 #     # v = item.xpath('*/a/span')
56 #     print(v)

相關標籤/搜索

c#+xpath+htmlagilitypack

XPath 教程

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。