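Scrapy version: 1.5.0
Scrapy's built-in Selector is built on lxml.
You can parse a document with the xpath() and css() methods; both return a list (specifically, a SelectorList of Selector objects):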
sel = Selector(text=body).xpath('//div[@class="ip_list"]/text()').extract()
A selector also has an re() method for regex-based extraction; it is used much like the re library.
class scrapy.selector.Selector(response=None, text=None, type=None)
response is an HtmlResponse or an XmlResponse object that will be used for selecting and extracting data.
text is a unicode string or utf-8 encoded text for cases when a response isn't available. Using text and response together is undefined behavior.
type defines the selector type; it can be "html", "xml" or None (default).
If type is None, the selector automatically chooses the best type based on the response type (see below), or defaults to "html" if it is used together with text.
If type is None and a response is passed, the selector type is inferred from the response type as follows:
"html" for HtmlResponse type
"xml" for XmlResponse type
"html" for anything else
Otherwise, if type is set, the selector type will be forced and no detection will occur.
re(regex)
Apply the given regex and return a list of unicode strings with the matches.
regex can be either a compiled regular expression or a string which will be compiled to a regular expression using re.compile(regex).
extract()
Serialize and return the matched nodes as a list of unicode strings. Percent encoded content is unquoted.
remove_namespaces()
Remove all namespaces, allowing the document to be traversed using namespace-less XPaths.
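The SelectorList class is a subclass of the built-in list and can be thought of as a collection of Selector objects. Calling xpath(), css(), extract() or re() on a SelectorList is equivalent to calling that method on each element in the list and then combining the results into a single list (note: each element's result is not inserted as one nested item; the results are flattened).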