Python Crawlers: User-Agent Information


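A crawler fetches website content automatically. Underneath, it is just a piece of code, not a real browser user. Adding User-Agent (UA) information only disguises the crawler as a browser user visiting the site. However, a single user who visits a site very frequently is easy to spot. Since a UA string lets us pose as a browser, switching between UA strings likewise lets us change our apparent identity.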
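As a quick illustration of why the disguise matters: when a request sets no UA of its own, urllib sends a default one that names the library rather than a browser. This minimal sketch only inspects the default opener's headers and makes no network request; the version number printed depends on the local Python.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its addheaders list holds the headers
# urllib adds when the request itself does not set them.
opener = urllib.request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.11')]
```

A server seeing `Python-urllib/...` in its logs can tell at once that the visitor is a script.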

A selection of UA strings

Opera
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60
Opera/8.0 (Windows NT 5.1; U; en)
Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50

Firefox
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0
Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10

Safari
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2

Chrome
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16

360
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko

Taobao Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11

Liebao (Cheetah) Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER) 
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)

QQ Browser
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)

Sogou Browser
Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)

Maxthon Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36

UC Browser
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36

------ UA list source: https://blog.csdn.net/tao_627/article/details/42297443 ------
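One way to keep a list like this in a crawler is to group the strings by browser, so a script can either pick any UA or stick to one browser family. A minimal sketch under that assumption; `UA_POOL` and `pick_user_agent` are hypothetical names, and only a few of the strings above are included.

```python
import random

# Hypothetical pool: a few of the UA strings listed above, grouped by browser family.
UA_POOL = {
    'firefox': [
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
        'Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 '
        'Ubuntu/10.10 (maverick) Firefox/3.6.10',
    ],
    'chrome': [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/39.0.2171.71 Safari/537.36',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) '
        'Chrome/23.0.1271.64 Safari/537.11',
    ],
    'safari': [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) '
        'Version/5.1.7 Safari/534.57.2',
    ],
}


def pick_user_agent(family=None, rng=random):
    """Return a random UA string, optionally restricted to one browser family."""
    if family is None:
        # Flatten every family into one candidate list.
        candidates = [ua for uas in UA_POOL.values() for ua in uas]
    else:
        candidates = UA_POOL[family]  # raises KeyError for an unknown family
    return rng.choice(candidates)


print(pick_user_agent('firefox'))
print(pick_user_agent())
```

Passing a seeded `random.Random(...)` as `rng` makes the choice reproducible, which is handy when debugging.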

How to Add a User-Agent

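There are three ways to add a UA: 1. pass it in when instantiating the Request class; 2. add it dynamically with the Request instance method add_header(); 3. create an opener and set it by assigning opener.addheaders.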

Method 1

import urllib.request


def load_message():
    url = 'https://www.baidu.com'

    header = {
        'user-agent': ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/69.0.3497.92 Safari/537.36'),
        'test': '123'
    }

    request = urllib.request.Request(url, headers=header)  # attach the request headers

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    # Gotcha: when reading a header back, only the first letter may be uppercase
    # ('User-agent'); any other casing returns None
    request_header_get = request.get_header('User-agent')
    print(request_header_get)  # how to read one specific request header

    # request_header_get = request.get_header('Test')
    # print(request_header_get)  # prints 123
    #
    # request_header_get = request.get_header('test')
    # print(request_header_get)  # prints None

    return response.headers, request.headers, response_str


response_header, request_header, response_data = load_message()
print(response_header)
print('------------------------------------')
print(request_header)
print('------------------------------------')
print(response_data)
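The casing gotcha comes from how Request stores header names: each name is passed through str.capitalize(), so 'user-agent' and 'User-Agent' are both kept as 'User-agent', while get_header() looks the name up exactly as given. This sketch shows that without sending anything; `get_header_any_case` is a hypothetical helper, not part of urllib.

```python
import urllib.request


def get_header_any_case(request, name):
    """Hypothetical helper: look a header up regardless of how the name is cased."""
    return request.get_header(name.capitalize())


# Building a Request sends nothing; it only stores the URL and the headers.
req = urllib.request.Request('https://www.baidu.com',
                             headers={'user-agent': 'demo-agent', 'test': '123'})

print(req.headers)                             # {'User-agent': 'demo-agent', 'Test': '123'}
print(req.get_header('User-Agent'))            # None: get_header() does not normalize the name
print(get_header_any_case(req, 'USER-AGENT'))  # demo-agent
```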

Method 2

import urllib.request


def load_message():
    url = 'https://www.baidu.com'

    request = urllib.request.Request(url)

    # add the header dynamically (the two string pieces are joined with the spaces intact)
    request.add_header('user-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                                     'Chrome/69.0.3497.92 Safari/537.36')

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    # Gotcha: when reading a header back, only the first letter may be uppercase
    # ('User-agent'); any other casing returns None
    request_header_get = request.get_header('User-agent')
    print(request_header_get)  # how to read one specific request header

    return response.headers, request.headers, response_str


response_header, request_header, response_data = load_message()
print(response_header)
print('------------------------------------')
print(request_header)
print('------------------------------------')
print(response_data)
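Because add_header() also stores names in capitalize() form, calling it again with any spelling of the same name replaces the earlier value instead of adding a second header. That is what lets one Request object change its identity before it is sent. A minimal sketch with placeholder UA values ('first-agent' and 'second-agent' are made up for illustration):

```python
import urllib.request

request = urllib.request.Request('https://www.baidu.com')
request.add_header('user-agent', 'first-agent')
# 'User-Agent' and 'user-agent' both become 'User-agent', so this overwrites the first value
request.add_header('User-Agent', 'second-agent')

print(request.header_items())  # [('User-agent', 'second-agent')]
```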

Method 3

import urllib.request


url = "http://blog.csdn.net/weiwei_pig/article/details/51178226"
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0")

opener = urllib.request.build_opener()
# addheaders is a list of (name, value) tuples; assigning it replaces the
# default [('User-agent', 'Python-urllib/...')] entry
opener.addheaders = [headers]
data = opener.open(url).read()
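To confirm that each of the three methods really changes what the server receives, one option is a throwaway local HTTP server that echoes back the User-Agent it saw. This is a minimal sketch using only the standard library; `EchoUAHandler`, `read_text` and `compare_methods` are hypothetical names, and the `ProxyHandler({})` lines are an added assumption so that proxy environment variables cannot divert the requests away from 127.0.0.1.

```python
import http.server
import threading
import urllib.request

UA = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/39.0.2171.71 Safari/537.36')


class EchoUAHandler(http.server.BaseHTTPRequestHandler):
    """Answer every GET with the User-Agent value the server received."""

    def do_GET(self):
        body = (self.headers.get('User-Agent') or '').encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def read_text(response):
    with response:
        return response.read().decode('utf-8')


def compare_methods(url):
    results = {}

    # Method 1: headers passed to the Request constructor
    req1 = urllib.request.Request(url, headers={'User-Agent': UA})
    results['method1'] = read_text(urllib.request.urlopen(req1))

    # Method 2: add_header() on an existing Request
    req2 = urllib.request.Request(url)
    req2.add_header('User-Agent', UA)
    results['method2'] = read_text(urllib.request.urlopen(req2))

    # Method 3: a custom opener with addheaders
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    opener.addheaders = [('User-Agent', UA)]
    results['method3'] = read_text(opener.open(url))

    # No UA set at all: urllib falls back to its own default
    results['default'] = read_text(urllib.request.urlopen(url))
    return results


# Assumption: make urlopen() ignore environment proxies so it reaches 127.0.0.1 directly.
urllib.request.install_opener(urllib.request.build_opener(urllib.request.ProxyHandler({})))

server = http.server.HTTPServer(('127.0.0.1', 0), EchoUAHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    results = compare_methods('http://127.0.0.1:%d/' % server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

for name, received in results.items():
    print(name, '->', received)
```

All three methods should report the same UA string, while the last request shows urllib's own `Python-urllib/<version>` default.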

Example: adding a random UA to the request headers

#!/usr/bin/env python
# -*- coding=utf-8 -*-
# Author: Snow

import urllib.request
import random


def random_agent():
    url = 'https://www.baidu.com'

    user_agent_list = [
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
        'Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 '
        'TaoBrowser/2.0 Safari/536.11',
    ]

    user_agent_value = random.choice(user_agent_list)  # pick one UA at random for this request

    request = urllib.request.Request(url)
    request.add_header('User-Agent', user_agent_value)  # stored internally as 'User-agent'

    request_user_agent = request.get_header('User-agent')  # read back using the capitalize() spelling

    response = urllib.request.urlopen(request)
    response_str = response.read().decode('utf-8')

    return request_user_agent, response_str


cat_user_agent, _ = random_agent()
print(cat_user_agent)
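random_agent() picks one UA per call. When a crawler fetches many pages, the same idea can be applied per request, for instance making sure two consecutive requests never carry the same UA. This is a minimal sketch of that variation; `USER_AGENTS` and `rotating_requests` are hypothetical names, the no-repeat rule is our own choice rather than anything the example above requires, and the Requests are only built, not sent.

```python
import random
import urllib.request

# Hypothetical pool taken from the list above (needs at least two entries).
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/39.0.2171.71 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko',
]


def rotating_requests(urls, rng=random):
    """Yield one Request per URL, never reusing the previous request's UA."""
    previous = None
    for url in urls:
        candidates = [ua for ua in USER_AGENTS if ua != previous]
        ua = rng.choice(candidates)
        previous = ua
        request = urllib.request.Request(url)
        request.add_header('User-Agent', ua)
        yield request


urls = ['https://www.baidu.com/s?wd=python',
        'https://www.baidu.com/s?wd=urllib',
        'https://www.baidu.com/s?wd=user-agent']
for request in rotating_requests(urls):
    # urllib.request.urlopen(request) would send it with the chosen UA
    print(request.full_url, request.get_header('User-agent'))
```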