This one is quite simple, so here is the spider code directly:
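# -*- coding: utf-8 -*-
import logging
# Python 3 location of urlretrieve; the original used Python 2's urllib.urlretrieve.
from urllib.request import urlretrieve

import scrapy


class TopitComSpider(scrapy.Spider):
    name = "topit.com"
    # Must match the domain in start_urls (the original listed "topit.com").
    allowed_domains = ["topit.me"]
    start_urls = [
        'http://www.topit.me',
    ]

    def parse(self, response):
        # The first 8 thumbnails carry the image URL in @src; the remaining ones
        # keep it in @data-original (presumably because they are lazy-loaded).
        image_urls1 = response.xpath("//div[@class='catalog']/div[@class='e m'][position()<=8]/a/img/@src").extract()
        image_urls2 = response.xpath("//div[@class='catalog']/div[@class='e m'][position()>8]/a/img/@data-original").extract()
        image_urls = image_urls1 + image_urls2
        for counter, url in enumerate(image_urls):
            # Save each image as /root/pic/<index>.jpg; the directory must already exist.
            urlretrieve(url, "/root/pic/" + str(counter) + '.jpg')
            logging.debug(url)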
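Open issue:
When matching with XPath, joining the two expressions with "or" matched nothing, so I had to run them separately and merge the results. I don't know why; if anyone does, please let me know. Thanks.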
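A likely explanation, offered as an assumption since the failing expression is not shown in the post: in XPath 1.0, "or" is a boolean operator, so an expression of the form "expr1 or expr2" evaluates to true or false instead of returning nodes. Node-sets are combined with the union operator "|". Below is a minimal sketch of that idea. The names ITEM, IMAGE_URLS_XPATH and extract_image_urls are hypothetical, and parsel (the selector library Scrapy is built on) is used only so the snippet can run outside a spider.

```python
# Hypothetical sketch, not from the original post: one XPath query that uses
# the union operator "|" instead of two queries whose results are concatenated.

# Common prefix of both expressions from the spider.
ITEM = "//div[@class='catalog']/div[@class='e m']"

# "a | b" returns the nodes matched by either expression.
# "a or b" would only return a boolean.
IMAGE_URLS_XPATH = (
    f"{ITEM}[position()<=8]/a/img/@src"
    f" | {ITEM}[position()>8]/a/img/@data-original"
)


def extract_image_urls(html):
    """Return the image URLs found in an HTML string using the union query."""
    # Imported here so the expression above can be reused without parsel.
    from parsel import Selector

    return Selector(text=html).xpath(IMAGE_URLS_XPATH).getall()
```

Inside the spider, the same expression could replace the two separate queries with a single call such as response.xpath(IMAGE_URLS_XPATH).extract(). Whether this fully matches the site's markup is untested here, since it depends on the page structure at the time.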