Notes - scrapy selector

Scrapy version: 1.5.0

 

1. Overview

 

Scrapy's built-in selectors are built on top of lxml.

2. Usage

Both the xpath and css methods can be used to parse a document; both return a list, for example:

sel = Selector(text=body).xpath('//div[@class="ip_list"]/text()').extract()

The re() method can also be used on a selector for regex-based extraction; its usage is similar to the re library.
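
A minimal sketch of the three extraction styles (the HTML snippet and variable names below are illustrative, not from the original post):

from scrapy.selector import Selector

body = '<html><body><div class="ip_list">127.0.0.1</div></body></html>'  # illustrative HTML
sel = Selector(text=body)

# xpath and css both return a SelectorList
ips_xpath = sel.xpath('//div[@class="ip_list"]/text()').extract()  # ['127.0.0.1']
ips_css = sel.css('div.ip_list::text').extract()                   # ['127.0.0.1']

# re() applies a regular expression and returns a list of strings
ips_re = sel.xpath('//div[@class="ip_list"]/text()').re(r'\d+\.\d+\.\d+\.\d+')  # ['127.0.0.1']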

3. Classes and commonly used attributes

Selector objects

class scrapy.selector.Selector(response=None, text=None, type=None)

response is an HtmlResponse or an XmlResponse object that will be used for selecting and extracting data.

text is a unicode string or utf-8 encoded text for cases when a response isn’t available. Using text and response together is undefined behavior.

type defines the selector type, it can be "html", "xml" or None (default).

If type is None, the selector automatically chooses the best type based on response type (see below), or defaults to "html" in case it is used together with text.

If type is None and a response is passed, the selector type is inferred from the response type as follows:

"html" for HtmlResponse type
"xml" for XmlResponse type
"html" for anything else
Otherwise, if type is set, the selector type will be forced and no detection will occur.
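
A small sketch of how the type argument interacts with text (the XML snippet is illustrative):

from scrapy.selector import Selector

xml_body = '<items><item id="1">a</item></items>'  # illustrative XML

# with only text given, type defaults to "html"
html_sel = Selector(text=xml_body)

# forcing type="xml" skips detection and parses the text as XML
xml_sel = Selector(text=xml_body, type='xml')
print(xml_sel.xpath('//item/@id').extract())  # ['1']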

 

re(regex)

Apply the given regex and return a list of unicode strings with the matches.

regex can be either a compiled regular expression or a string which will be compiled to a regular expression using re.compile(regex)
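
A brief sketch showing that both a pattern string and a pre-compiled pattern are accepted (the HTML is illustrative):

import re
from scrapy.selector import Selector

sel = Selector(text='<p>price: 42 USD</p>')  # illustrative HTML
print(sel.xpath('//p/text()').re(r'\d+'))              # ['42'], string pattern
print(sel.xpath('//p/text()').re(re.compile(r'\d+')))  # ['42'], pre-compiled pattern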

 

extract()

Serialize and return the matched nodes as a list of unicode strings. Percent encoded content is unquoted.
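
A short sketch of how extract() serializes matched nodes (the HTML is illustrative):

from scrapy.selector import Selector

sel = Selector(text='<ul><li>a</li><li>b</li></ul>')  # illustrative HTML
print(sel.xpath('//li').extract())         # ['<li>a</li>', '<li>b</li>'] - full node markup
print(sel.xpath('//li/text()').extract())  # ['a', 'b'] - text nodes only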

 

remove_namespaces()

Remove all namespaces, allowing the document to be traversed using namespace-less xpaths. See the example below.
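
A hedged sketch of remove_namespaces(), modeled on the namespaced-feed pattern from the Scrapy docs (the XML below is illustrative):

from scrapy.selector import Selector

xml_body = '<feed xmlns="http://www.w3.org/2005/Atom"><link href="http://example.com/"/></feed>'  # illustrative XML
sel = Selector(text=xml_body, type='xml')

print(sel.xpath('//link'))  # [] - namespaced nodes do not match plain element names
sel.remove_namespaces()
print(sel.xpath('//link/@href').extract())  # ['http://example.com/']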

 

SelectorList objects

The SelectorList class is a subclass of the built-in list and can be understood as a collection of Selector objects. Calling xpath, css, extract, or re on a SelectorList can be understood as applying that method to each object in the list and then combining the return values into a single list (note: the return value is not inserted as a single nested whole), as in the sketch below.
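
A minimal sketch of how a SelectorList flattens per-element results (the HTML is illustrative):

from scrapy.selector import Selector

sel = Selector(text='<div><p>a</p><p>b</p></div><div><p>c</p></div>')  # illustrative HTML
divs = sel.xpath('//div')  # SelectorList holding two Selector objects
# xpath is applied to each element, and the results are combined into one flat list
print(divs.xpath('./p/text()').extract())  # ['a', 'b', 'c'], not [['a', 'b'], ['c']]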
