A quick fix when Baidu Cloud Acceleration (百度雲加速) blocks your scraper

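While scraping a website, I found that it sits behind Baidu Cloud Acceleration: every visit to the home page asks for a CAPTCHA before the page will open.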

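I did not use any of the automatic CAPTCHA-image recognition approaches found online. Chinese New Year is almost here and I don't want to pip-install anything extra; a quick fix means I get home sooner.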

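Analyzing the site showed that once you have a set of currently valid cookies, you can keep scraping with them and the Baidu verification prompt is not triggered.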

The code is below. (Note: the URL and cookies in it are fake; to use the code, substitute your own URL and cookies.)

   

   

import json
from datetime import datetime

import requests
from scrapy.selector import Selector

# Fake values: replace the cookie string (and the URL below) with your own.
headers = {
    'cookie': '__cfduid=134343474e8d3f723cae541fb7d7f6b01f1546501720; _ga=GA1.2.573376275.1546501778; _gid=GA1.2.543022193.1549014020; cf_clearance=b19851c48ae560c62485879ac37a257a3f12df1e-1549086155-1800-250; ',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/536.34 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.34',
}

url = 'https://www.samle.com/news/page/2/'
res = requests.get(url, headers=headers)
hxs = Selector(text=res.text)

# Publication dates and article links on the listing page.
datePub = hxs.xpath('//main[@class="content"]//time/text()').extract()
links = hxs.xpath('//main[@class="content"]//h2/a')

for index, link in enumerate(links):
    # The date text is stored as scraped; parse it with datetime.strptime
    # if you need a normalized format.
    item_pubDateStr = datePub[index].strip() if index < len(datePub) else ''
    item_url = ''.join(link.xpath('./@href').extract())

    # Fetch the article page with the same cookies and user agent.
    item_res = requests.get(item_url, headers=headers)
    item_hxs = Selector(text=item_res.text)
    item_title = item_hxs.xpath("//h2/text()").extract()
    item_content = item_hxs.xpath("//main//div[@class='econtent']/p//text()").extract()

    # json.dumps escapes quotes that a hand-concatenated string would break on.
    # Keep the first text node of the article body, or '' if none was found.
    result = {
        'linkAddress': [item_url],
        'title': [item_title[0] if item_title else ''],
        'datePublish': [item_pubDateStr],
        'content': [item_content[0] if item_content else ''],
    }

    # One timestamp-named file per article.
    filename = datetime.now().strftime('%Y%m%d%H%M%S%f') + '.txt'
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(json.dumps(result, ensure_ascii=False))
    print(item_title)
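The script above assumes the cookies stay valid for the whole run. Here is a minimal sketch of how one might notice that they have stopped working. It assumes (this is a guess about the site, not something the script checks) that a blocked request either returns a non-200 status or a page without the `<main class="content">` element that the XPath expressions rely on. The function name `cookies_look_valid` and the marker string are hypothetical.

```python
def cookies_look_valid(status_code, html, marker='<main class="content"'):
    """Heuristic check: a real page is a 200 response that contains the
    element the scraper expects (the marker is an assumption about the site)."""
    return status_code == 200 and marker in html


# Made-up responses for illustration; in the script you would pass
# res.status_code and res.text after each request.
ok_page = '<html><body><main class="content"><h2>News</h2></main></body></html>'
challenge_page = '<html><body><form>Please enter the code</form></body></html>'

print(cookies_look_valid(200, ok_page))         # True
print(cookies_look_valid(200, challenge_page))  # False
print(cookies_look_valid(403, ok_page))         # False
```

When the check fails, the simplest response in keeping with this post is to stop, grab fresh cookies from the browser, and rerun.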

 

How to get a set of currently valid cookies:

Open Chrome and open Developer Tools (press F12).

Visit the site's URL (and pass the CAPTCHA if it asks for one).

Then go to the Network tab in Developer Tools, select the page request, and copy its cookies from the request headers!
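The value copied from the Network tab is a single `name=value; name=value` string. The script puts it straight into the `cookie` header, which works. As an alternative sketch, the string can be split into a dict and handed to the `cookies=` argument of `requests.get`. `parse_cookie_header` is a hypothetical helper, and the cookie values below are placeholders.

```python
def parse_cookie_header(raw):
    """Split a 'k1=v1; k2=v2' Cookie header string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece left by a trailing ';'
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder string standing in for what you copied from the browser.
raw = "cf_clearance=PLACEHOLDER; _ga=GA1.2.0.0; "
cookies = parse_cookie_header(raw)
print(cookies)

# Usage (not executed here):
#   requests.get(url, headers={'user-agent': user_agent}, cookies=cookies)
```

Keeping the cookies in a dict makes it easier to replace a single expired value without editing one long string.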

 

Enjoy :P
