A crawler fetches site data automatically, but in essence it is just a piece of code, not a real browser user. Adding a User-Agent (UA) header merely disguises us as a browser when visiting a site. Still, a single "user" hitting a site frequently is easy to detect, so since we can impersonate a browser, we can just as well rotate the UA string to change identities between requests.
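As a quick sanity check of the idea, here is a minimal sketch (httpbin.org is a public echo service, not part of the original post): without the header, urllib identifies itself as `Python-urllib/3.x`; with it, the server sees the browser UA we chose.

```python
import urllib.request

# A disguised request: the User-Agent header is the only thing that
# distinguishes us from urllib's default 'Python-urllib/3.x' identity.
ua = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
      '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')

request = urllib.request.Request('https://httpbin.org/user-agent',
                                 headers={'User-Agent': ua})

print(request.get_header('User-agent'))  # the UA that will be sent
# response = urllib.request.urlopen(request)  # uncomment to actually send it;
# print(response.read().decode('utf-8'))      # httpbin echoes the UA back as JSON
```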
Opera
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60
Opera/8.0 (Windows NT 5.1; U; en)
Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50
Firefox
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0
Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10
Safari
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2
Chrome
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16
360
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Taobao Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11
Liebao (Cheetah) Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)
QQ Browser
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)
Sogou Browser
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)
Maxthon Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36
UC Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36
------ UA string reference: https://blog.csdn.net/tao_627/article/details/42297443 ------
There are three ways to add a UA: 1. pass it when instantiating the Request class; 2. call the Request instance method add_header() to add it dynamically; 3. build an opener and assign to opener.addheaders.
Method 1
import urllib.request


def load_message():
    url = 'https://www.baidu.com'

    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/69.0.3497.92 Safari/537.36',
        'test': '123'
    }

    request = urllib.request.Request(url, headers=header)  # attach request headers at construction time

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    request_header_get = request.get_header('User-agent')  # gotcha: only the first letter may be uppercase, all the rest lowercase, otherwise None is returned
    print(request_header_get)  # read back a single header value

    # request_header_get = request.get_header('Test')
    # print(request_header_get)
    #
    # request_header_get = request.get_header('test')
    # print(request_header_get)

    return response.headers, request.headers, response_str


response_header, request_header, response_data = load_message()
print(response_header)
print('------------------------------------')
print(request_header)
print('------------------------------------')
print(response_data)
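The "gotcha" in the comment above can be shown in isolation: Request.add_header() normalizes the key with str.capitalize(), but get_header() looks the key up verbatim, so only 'User-agent' matches. A small offline sketch:

```python
import urllib.request

request = urllib.request.Request('https://www.baidu.com')
request.add_header('User-Agent', 'test-agent/1.0')

# add_header() stores the key as 'User-Agent'.capitalize() -> 'User-agent',
# but get_header() does no normalization on lookup:
print(request.get_header('User-agent'))  # 'test-agent/1.0'
print(request.get_header('User-Agent'))  # None - capital 'A' does not match
print(request.get_header('user-agent'))  # None
```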
Method 2
import urllib.request


def load_message():
    url = 'https://www.baidu.com'

    request = urllib.request.Request(url)

    request.add_header('user-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                                     'Chrome/69.0.3497.92 Safari/537.36')  # add the header dynamically

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    request_header_get = request.get_header('User-agent')  # gotcha: only the first letter may be uppercase, all the rest lowercase, otherwise None is returned
    print(request_header_get)  # read back a single header value

    return response.headers, request.headers, response_str


response_header, request_header, response_data = load_message()
print(response_header)
print('------------------------------------')
print(request_header)
print('------------------------------------')
print(response_data)
Method 3
import urllib.request


url = "http://blog.csdn.net/weiwei_pig/article/details/51178226"
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0")

opener = urllib.request.build_opener()
opener.addheaders = [headers]
data = opener.open(url).read()
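Method 3 can go one step further: urllib.request.install_opener() (a standard-library call, though not used in the snippet above) makes the opener the process-wide default, so every later plain urlopen() carries the custom header. A sketch:

```python
import urllib.request

headers = ('User-Agent',
           'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
           '(KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0')

opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)  # make this opener the global default

# From here on a plain urlopen() uses the custom UA:
# data = urllib.request.urlopen(url).read()
print(dict(opener.addheaders)['User-Agent'][:11])  # 'Mozilla/5.0'
```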
Putting the above together, we can pick a random UA from a pool for each request:

#!/usr/bin/env python
# -*- coding=utf-8 -*-
# Author: Snow

import urllib.request
import random


def random_agent():
    url = 'https://www.baidu.com'

    user_agent_list = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
        'Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11'
    ]

    user_agent_value = random.choice(user_agent_list)  # pick a random identity for this request

    request = urllib.request.Request(url)
    request.add_header('User-Agent', user_agent_value)

    request_user_agent = request.get_header('User-agent')

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    return request_user_agent, response_str


cat_user_agent, _ = random_agent()
print(cat_user_agent)