http://www.jb51.net/article/65513.htm
http://blog.csdn.net/yueguanghaidao/article/details/25246867
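Today a colleague wanted to test the page-statistics feature of a WAF, which means simulating requests from many IPs to many domains. In other words, the source IP address has to be spoofed. Doing this with the socket library is cumbersome because it requires raw sockets. Fortunately there is scapy, which handles it easily.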
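To give a sense of why the raw-socket route is tedious, here is a minimal sketch of packing just the IPv4 header by hand with a spoofed source address. The function names `ipv4_checksum` and `build_ipv4_header` are hypothetical and not part of the original post. A real sender would also have to build the TCP header with its pseudo-header checksum and open a privileged raw socket, which is roughly the work scapy takes over.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words (RFC 791)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Pack a 20-byte IPv4 header (no options) that carries TCP."""
    version_ihl = (4 << 4) | 5  # IPv4, header length 5 * 4 = 20 bytes
    fields = [
        version_ihl, 0, 20 + payload_len,   # version/IHL, TOS, total length
        0, 0,                               # identification, flags/fragment offset
        64, socket.IPPROTO_TCP, 0,          # TTL, protocol, checksum placeholder
        socket.inet_aton(src),              # spoofed source address
        socket.inet_aton(dst),
    ]
    layout = "!BBHHHBBH4s4s"
    # Checksum is computed over the header with the checksum field set to zero.
    fields[7] = ipv4_checksum(struct.pack(layout, *fields))
    return struct.pack(layout, *fields)
```

With scapy the same header is simply `IP(src=..., dst=...)`, and the checksum is filled in when the packet is built.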
DOMAIN is a list of randomly generated domain names, and SOURCE is a list of randomly generated source IP addresses.

    from threading import Thread
    import random
    import string

    from scapy.all import IP, TCP, send


    USER_AGENTS = (
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_7_0; en-US) AppleWebKit/534.21 (KHTML, like Gecko) Chrome/11.0.678.0 Safari/534.21",
        "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20020508 Netscape6/6.1",
        "Mozilla/5.0 (X11;U; Linux i686; en-GB; rv:1.9.1) Gecko/20090624 Ubuntu/9.04 (jaunty) Firefox/3.5",
        "Opera/9.80 (X11; U; Linux i686; en-US; rv:1.9.2.3) Presto/2.2.15 Version/10.10",
    )

    TOP_DOMAIN = ('com', 'org', 'net', 'gov', 'edu', 'mil', 'info', 'name', 'biz')

    # 100 random names: "www." + one or two labels of 2-6 distinct letters + a TLD
    DOMAIN = [
        "www.%s.%s" % (
            '.'.join(
                ''.join(random.sample(string.ascii_lowercase, random.randint(2, 6)))
                for _ in range(random.randint(1, 2))
            ),
            random.choice(TOP_DOMAIN),
        )
        for _ in range(100)
    ]

    # 100 random source addresses, each octet in 1-254
    SOURCE = ['.'.join(str(random.randint(1, 254)) for _ in range(4)) for _ in range(100)]


    class Scan(Thread):
        HTTPSTR = 'GET / HTTP/1.0\r\nHost: %s\r\nUser-Agent: %s\r\n\r\n'

        def run(self):
            for _ in range(100):
                domain = random.choice(DOMAIN)
                http = self.HTTPSTR % (domain, random.choice(USER_AGENTS))
                try:
                    # Spoofed source IP; dst is a hostname, so it must resolve
                    request = IP(src=random.choice(SOURCE), dst=domain) / TCP(dport=80) / http
                    send(request)
                except Exception:
                    # Skip requests that fail (for example, DNS resolution errors)
                    pass


    # Start 10 worker threads and wait for all of them to finish
    task = []
    for x in range(10):
        t = Scan()
        task.append(t)

    for t in task:
        t.start()

    for t in task:
        t.join()

    print('all task done!')

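This creates a problem, though: because the domains are randomly generated, sending a request first triggers a DNS lookup, and that lookup will most likely fail. There are two ways around this:
1. Add all the domains to the local hosts file; the IP can be the server's address.
2. Because the hosts file does not support wildcards, use a DNS proxy, or write a small tool of your own that resolves names however you like. One is available here: http://code.google.com/p/marlon-tools/source/browse/tools/dnsproxy/dnsproxy.py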
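
Both options can be sketched with the standard library alone. This is a rough illustration, not the dnsproxy.py tool linked above: `hosts_lines`, `wildcard_reply`, and `serve` are hypothetical names, and the responder assumes a well-formed query with a single question. It answers everything with one A record regardless of the requested type, and does no error handling.

```python
import socket
import struct


def hosts_lines(domains, ip):
    """Option 1: one hosts-file line per domain, all pointing at the same IP."""
    return ["%s\t%s" % (ip, name) for name in domains]


def wildcard_reply(query: bytes, ip: str) -> bytes:
    """Option 2: answer a single-question DNS query with one A record for ip."""
    txid = query[:2]
    # Walk the length-prefixed labels to find the end of the question name.
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    # Name, its terminating zero byte, then QTYPE (2 bytes) and QCLASS (2 bytes).
    question = query[12:pos + 5]
    # Flags 0x8180: response, recursion desired and available, no error.
    header = txid + struct.pack("!HHHHH", 0x8180, 1, 1, 0, 0)
    # Answer: pointer to the name at offset 12, TYPE A, CLASS IN, TTL 60, 4-byte address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(ip)
    return header + question + answer


def serve(ip: str, bind=("127.0.0.1", 53)):
    """Reply to every UDP query with ip; binding port 53 usually requires root."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind)
    while True:
        query, client = sock.recvfrom(512)
        sock.sendto(wildcard_reply(query, ip), client)
```

For option 1, the output of `hosts_lines(DOMAIN, server_ip)` can be appended to the hosts file. For option 2, the machine's resolver would need to point at the address where `serve` is listening.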