UA spoofing in the Scrapy framework

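For example, searching for "ip" on Baidu normally shows your own machine's IP address. With a random User-Agent and a proxy set in a downloader middleware, the request appears to come from a different browser and from another machine's IP.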

Spider code:

import scrapy


class UatestSpider(scrapy.Spider):
    name = 'UATest'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.baidu.com/s?wd=ip']

    def parse(self, response):
        # Save the result page so the reported IP can be checked in a browser.
        with open('./ip.html', 'w', encoding='utf-8') as fp:
            fp.write(response.text)
        print('over!!!')

Middleware code:

import random

# scrapy.contrib was removed in later Scrapy releases; the middleware now lives here.
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

user_agent_list = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
    "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
    "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
    "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
    "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
    "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]


class UAPool(UserAgentMiddleware):
    def process_request(self, request, spider):
        # Pick a random User-Agent for every outgoing request.
        ua = random.choice(user_agent_list)
        request.headers['User-Agent'] = ua
        print(request.headers['User-Agent'])


proxy_http = ['125.27.10.150:56292', '114.34.168.157:46160']
proxy_https = ['1.20.101.81:35454', '113.78.254.156:9000']


class UapoolDownloaderMiddleware:
    # request is the intercepted request object
    # spider is the spider instance
    def process_request(self, request, spider):
        # Choose a proxy from the pool that matches the request's scheme.
        if request.url.split(':')[0] == 'https':
            request.meta['proxy'] = 'https://' + random.choice(proxy_https)
        else:
            request.meta['proxy'] = 'http://' + random.choice(proxy_http)
        print(request.meta['proxy'])
        return None
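The scheme check inside the proxy middleware can also be pulled out into a small helper, which makes it easy to try without running a crawl. The following is only a minimal sketch: `pick_proxy` is a hypothetical name that does not appear in the original code, and the proxy addresses are the placeholder values from the middleware above, which may need to be replaced with working ones.

```python
import random
from urllib.parse import urlparse

# Placeholder pools copied from the middleware above; swap in working proxies.
PROXY_HTTP = ['125.27.10.150:56292', '114.34.168.157:46160']
PROXY_HTTPS = ['1.20.101.81:35454', '113.78.254.156:9000']


def pick_proxy(url):
    """Return a random proxy URL whose scheme matches the scheme of `url`.

    Hypothetical helper, not part of Scrapy or of the original post.
    """
    scheme = urlparse(url).scheme
    if scheme == 'https':
        return 'https://' + random.choice(PROXY_HTTPS)
    return 'http://' + random.choice(PROXY_HTTP)
```

Inside `process_request`, the branch would then reduce to `request.meta['proxy'] = pick_proxy(request.url)`.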

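Note: in settings.py, uncomment the downloader middleware setting (DOWNLOADER_MIDDLEWARES) and register the middleware classes you wrote.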
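As a rough sketch of what that entry might look like: the package name `uapool` below is an assumption (use your own project's package name), and the priority numbers are illustrative choices. They are picked so that both classes run before Scrapy's built-in HttpProxyMiddleware (priority 750 in the default base settings), which is what reads `request.meta['proxy']`.

```python
# settings.py (fragment). "uapool" is a hypothetical project package name.
DOWNLOADER_MIDDLEWARES = {
    # Random User-Agent per request.
    'uapool.middlewares.UAPool': 542,
    # Random proxy per request; must run before HttpProxyMiddleware (750).
    'uapool.middlewares.UapoolDownloaderMiddleware': 543,
    # Optional: disable the built-in User-Agent middleware.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```

Setting a middleware's value to `None` disables it; lower numbers run earlier for `process_request`.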
