1. Windows 7 x64 SP1
2. Anaconda3 + Python 3.7.3 (bundled with Anaconda; no separate installation needed)
3. Scrapy 1.6.0
scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html
This starts an interactive shell in which the fetched page is available as response.
result = response.xpath('//a')
The result is as follows:
[<Selector xpath='//a' data='<a href="image1.html">Name: My image 1 <'>, <Selector xpath='//a' data='<a href="image2.html">Name: My image 2 <'>, <Selector xpath='//a' data='<a href="image3.html">Name: My image 3 <'>, <Selector xpath='//a' data='<a href="image4.html">Name: My image 4 <'>, <Selector xpath='//a' data='<a href="image5.html">Name: My image 5 <'>]
result = response.css('a')
The result is as follows:
[<Selector xpath='descendant-or-self::a' data='<a href="image1.html">Name: My image 1 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image2.html">Name: My image 2 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image3.html">Name: My image 3 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image4.html">Name: My image 4 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image5.html">Name: My image 5 <'>]
type(result)
The result is as follows:
scrapy.selector.unified.SelectorList
result.extract()
The result is as follows:
['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>', '<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>', '<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>', '<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>', '<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
response.xpath('//a/text()')
The result is as follows:
[<Selector xpath='//a/text()' data='Name: My image 1 '>, <Selector xpath='//a/text()' data='Name: My image 2 '>, <Selector xpath='//a/text()' data='Name: My image 3 '>, <Selector xpath='//a/text()' data='Name: My image 4 '>, <Selector xpath='//a/text()' data='Name: My image 5 '>]
View the HTML content:
response.xpath('//a/text()').extract()
The result is as follows:
['Name: My image 1 ', 'Name: My image 2 ', 'Name: My image 3 ', 'Name: My image 4 ', 'Name: My image 5 ']
response.css('a::text').extract()
The result is as follows:
['Name: My image 1 ', 'Name: My image 2 ', 'Name: My image 3 ', 'Name: My image 4 ', 'Name: My image 5 ']
response.xpath('//a/@href').extract()
The result is as follows:
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
response.css('a::attr("href")').extract()
The result is as follows:
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
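To make clear what //a/text() and //a/@href select, here is a rough standard-library sketch. It does not use Scrapy's selectors: it swaps in xml.etree.ElementTree, which supports only a subset of XPath, and it assumes a well-formed, XHTML-style copy of two of the sample links (the MARKUP string is hypothetical, not the real page).

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed copy of two sample links. ElementTree is an XML
# parser, so <br> and <img> are written as self-closing tags here.
MARKUP = (
    '<div id="images">'
    '<a href="image1.html">Name: My image 1 <br/><img src="image1_thumb.jpg"/></a>'
    '<a href="image2.html">Name: My image 2 <br/><img src="image2_thumb.jpg"/></a>'
    '</div>'
)

root = ET.fromstring(MARKUP)

# './/a' matches every <a> descendant, similar to '//a' in the shell session.
anchors = root.findall('.//a')

# .text is the text before the first child element, which corresponds to the
# text node that '//a/text()' returns for each link on this page.
texts = [a.text for a in anchors]

# .get('href') reads the attribute that '//a/@href' selects.
hrefs = [a.get('href') for a in anchors]

print(texts)  # ['Name: My image 1 ', 'Name: My image 2 ']
print(hrefs)  # ['image1.html', 'image2.html']
```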
response.xpath('//a/img').extract()
The result is as follows:
['<img src="image1_thumb.jpg">', '<img src="image2_thumb.jpg">', '<img src="image3_thumb.jpg">', '<img src="image4_thumb.jpg">', '<img src="image5_thumb.jpg">']
response.css('a img').extract()
The result is as follows:
['<img src="image1_thumb.jpg">', '<img src="image2_thumb.jpg">', '<img src="image3_thumb.jpg">', '<img src="image4_thumb.jpg">', '<img src="image5_thumb.jpg">']
Next, extract the src attribute values from these elements, in the same way as step 6:
response.xpath('//a/img/@src').extract()
response.css('a img::attr("src")').extract()
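The src values on this page are relative file names (image1_thumb.jpg and so on, as seen in the img tags above), so they usually need to be resolved against the page URL before they can be downloaded. A minimal sketch using urljoin from the standard library, assuming the page URL from the scrapy shell command and a hard-coded list standing in for the extracted values:

```python
from urllib.parse import urljoin

# URL passed to `scrapy shell` at the start of the session.
PAGE_URL = 'http://doc.scrapy.org/en/latest/_static/selectors-sample1.html'

# Stand-in for the list returned by response.xpath('//a/img/@src').extract().
srcs = [
    'image1_thumb.jpg',
    'image2_thumb.jpg',
    'image3_thumb.jpg',
    'image4_thumb.jpg',
    'image5_thumb.jpg',
]

# Resolve each relative file name against the directory of the page URL.
absolute_srcs = [urljoin(PAGE_URL, src) for src in srcs]

for url in absolute_srcs:
    print(url)
# First line printed:
# http://doc.scrapy.org/en/latest/_static/image1_thumb.jpg
```

Inside the shell or a spider, response.urljoin(src) performs the same resolution using the response's own URL as the base.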