Setting Up a Proxy in Scrapy (repost)

I. From: http://www.sharejs.com/codes/Python/8309html

 

1. Create a new file "middlewares.py" in the Scrapy project

# base64 is needed only if the proxy requires authentication
import base64


# Downloader middleware that routes every request through a proxy
class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy. b64encode works on bytes and,
        # unlike encodestring (removed in Python 3.9), adds no newline to the result.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

# This snippet comes from: http://www.sharejs.com/codes/Python/8309
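The authentication step can be tried on its own, outside Scrapy. The following is a minimal sketch assuming Python 3; `basic_proxy_auth` is a hypothetical helper name introduced here for illustration and is not part of Scrapy.

```python
import base64


def basic_proxy_auth(username, password):
    """Return a Proxy-Authorization header value for HTTP Basic auth.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    credentials = "{}:{}".format(username, password).encode("utf-8")
    # b64encode takes and returns bytes; decode so the value can be joined with a str
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


header_value = basic_proxy_auth("USERNAME", "PASSWORD")
print(header_value)
```

Inside process_request, the returned string could be assigned to request.headers['Proxy-Authorization'] in place of the inline encoding.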

2. Add the following to the project settings file (./project_name/settings.py)

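DOWNLOADER_MIDDLEWARES = {
    # Built-in proxy middleware; older Scrapy releases used the path
    # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # Custom middleware from middlewares.py
    'project_name.middlewares.ProxyMiddleware': 100,
}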
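The numbers in this mapping are priorities: Scrapy calls process_request on downloader middlewares in increasing order, so the custom middleware at 100 sets request.meta['proxy'] before the built-in one at 110 runs. The sketch below only illustrates that ordering with a plain sort. It is a simplification, since Scrapy also merges in its built-in defaults, and it uses the module path from current Scrapy releases.

```python
# Same two entries as in settings.py; the values are priorities.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
    "project_name.middlewares.ProxyMiddleware": 100,
}

# process_request is invoked in increasing priority order.
call_order = sorted(DOWNLOADER_MIDDLEWARES, key=DOWNLOADER_MIDDLEWARES.get)

for path in call_order:
    print(DOWNLOADER_MIDDLEWARES[path], path)
```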

With just these two steps, requests now go through the proxy. Let's test it ^_^

import scrapy


class TestSpider(scrapy.Spider):
    name = "test"
    domain_name = "whatismyip.com"
    # The following url is subject to change, you can get the last updated one from here :
    # http://www.whatismyip.com/faq/automation.asp
    start_urls = ["http://xujian.info"]

    def parse(self, response):
        # Save the fetched page so the result can be inspected
        with open('test.html', 'wb') as f:
            f.write(response.body)

# This snippet comes from: http://www.sharejs.com/codes/Python/8309

II. From: http://blog.csdn.net/haipengdai/article/details/50972983

http://stackoverflow.com/questions/4710483/scrapy-and-proxies

Add a middlewares.py file in the same directory as settings.py

import base64


class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy
        # (b64encode takes and returns bytes under Python 3)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode('utf-8')).decode('ascii')
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

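Many answers found online use base64.encodestring to encode proxy_user_pass. That fails when the username is long: encodestring inserts a line break after every 76 characters of output and appends one at the end, which corrupts the header value. For this reason b64encode is the recommended encoding function.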
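The difference can be seen directly. This is a minimal sketch assuming Python 3, where the old encodestring is available under the name base64.encodebytes; the long username is a made-up value chosen only to push the encoded output past 76 characters.

```python
import base64

# Hypothetical credentials, long enough that the encoded form exceeds 76 characters.
proxy_user_pass = ("a_very_long_proxy_username_" * 3 + ":PASSWORD").encode("ascii")

# encodebytes (formerly encodestring) wraps its output and appends a newline.
wrapped = base64.encodebytes(proxy_user_pass)
# b64encode returns the whole value on a single line.
single_line = base64.b64encode(proxy_user_pass)

print(b"\n" in wrapped)
print(b"\n" in single_line)
```

Stripping the newlines from the encodebytes output gives the same bytes as b64encode, so the two differ only in the line breaks that make the value unusable in a header.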

Then enable it in settings.py by adding it to DOWNLOADER_MIDDLEWARES; the entry 'projectname.middlewares.ProxyMiddleware': 1 is all that is needed.
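As a settings.py fragment, that entry might look like the sketch below, where "projectname" is the placeholder package name from the text and should be replaced with the actual project package.

```python
# settings.py (fragment); "projectname" is a placeholder for the real package name
DOWNLOADER_MIDDLEWARES = {
    "projectname.middlewares.ProxyMiddleware": 1,
}
```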
