How to Grab Red Envelopes Scientifically: A New Year-End Way to Get Rich, Write a Program to Grab Red Envelopes

Date: 2019-11-13


0x00 Background

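Today I read the IDF Lab article "How to Grab Red Envelopes Scientifically: A New Year-End Way to Get Rich, Write a Program to Grab Red Envelopes". I have been learning about web crawlers lately and have some familiarity with the Scrapy framework, so I added Scrapy to that code and used it to rewrite the article's section "0x04 Crawling the red envelope list".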

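0x01 The Scrapy framework

Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing.

What makes Scrapy attractive is that it is a framework, so anyone can easily adapt it to their needs. It also provides base classes for several kinds of spiders, such as BaseSpider and sitemap spiders, and the latest version at the time of writing adds support for crawling Web 2.0 sites.

"Scrape" means to grab or scratch off, which is presumably where the name of this Python crawling framework comes from, so let's nickname it "Little Scratchy".

In short: with Scrapy it is easy to write a crawler.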

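0x02 Weibo login, red envelope availability check, and grabbing a specific red envelope

I put these modules in a separate weibo class so that Scrapy can call them conveniently later.

For the Weibo login, see the article at http://www.tuicool.com/articles/ziyQFrb, which documents the whole login process in detail.

The code is copied from the original author with a few simple changes: it uses the requests library to make the page requests. The listing is Python 2.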

#!/usr/bin/env python
#-------------------------------------------------------------------------------
# Name:        weibo
# Purpose:
#
# Author:      adrain
#
# Created:     14/02/2015
# Copyright:   (c) adrain 2015
# Licence:     <your licence>
#-------------------------------------------------------------------------------
import sys
import requests
import re
import base64
import urllib
import rsa
import binascii
import os
import json
class weibo():
    def __init__(self):
        # Python 2 only: make utf-8 the default for implicit str/unicode conversions
        reload(sys)
        sys.setdefaultencoding('utf-8')
    def weibo_login(self,nick,pwd):
        print u'---weibo login----'
        # Pre-login request: returns servertime, nonce, pubkey and rsakv
        pre_login_url='http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=1400822309846' % nick
        pre_logn_data=requests.get(pre_login_url).text
        #print pre_logn_data
        servertime=re.findall('"servertime":(.*?),',pre_logn_data)[0]
        pubkey = re.findall('"pubkey":"(.*?)",' , pre_logn_data)[0]
        rsakv = re.findall('"rsakv":"(.*?)",' , pre_logn_data)[0]
        nonce = re.findall('"nonce":"(.*?)",' , pre_logn_data)[0]
        login_url='http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)'
        # The user name is URL-quoted, then base64-encoded
        su=base64.b64encode(urllib.quote(nick))
        rsaPublickey= int(pubkey,16)
        key = rsa.PublicKey(rsaPublickey,65537)
        # The password is RSA-encrypted together with servertime and nonce
        message = str(servertime) +'\t' + str(nonce) + '\n' + str(pwd)
        sp = binascii.b2a_hex(rsa.encrypt(message,key))
        post_data={
                    'entry': 'weibo',
                    'gateway': '1',
                    'from': '',
                    'savestate': '7',
                    'userticket': '1',
                    'pagereferer':'',
                    'vsnf': '1',
                    'su': su,
                    'servertime': servertime,
                    'nonce': nonce,
                    'pwencode': 'rsa2',
                    'rsakv' : rsakv,
                    'sp': sp,
                    'encoding': 'UTF-8',
                    'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
                    'returntype': 'META',
                    'ssosimplelogin': '1',
                    'vsnval': '',
                    'service': 'miniblog',
                    }
        header = {'User-Agent' : 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
        login_data=requests.post(login_url,data=post_data,headers=header).text.decode("utf-8").encode("gbk",'ignore')
        try:
            # On success the response redirects through location.replace('...')
            suss_url=re.findall(r"location\.replace\('(.*?)'\);" , login_data)[0]
            login_cookie=requests.get(suss_url).cookies
            print '----login success---'
            return login_cookie
        except Exception,e:
            print '----login error----'
            exit(0)
    def log(self,type,text):
            fp = open(type+'.txt','a')
            fp.write(text)
            fp.write('\r\n')
            fp.close()
    def check(self,id):
        infoUrl='http://huodong.weibo.com/hongbao/'+str(id)
        html=requests.get(infoUrl).text
        # A grab button is present; note that "or True" forces this branch in any case
        if 'action-type="lottery"' in  html or True:
                    logUrl="http://huodong.weibo.com/aj_hongbao/detailmore?page=1&type=2&_t=0&__rnd=1423744829265&uid="+str(id)
                    param={}
                    header= {
                            'Cache-Control':'no-cache',
                            'Content-Type':'application/x-www-form-urlencoded',
                            'Pragma':'no-cache',
                            'Referer':'http://huodong.weibo.com/hongbao/detail?uid='+str(id),
                            'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
                            'X-Requested-With':'XMLHttpRequest'
                            }
                    res=requests.post(logUrl,data=param,headers=header)
                    # Regex for the amounts in the list; \xd4\xaa is the GBK encoding of the yuan character
                    pMoney=re.compile(r'<span class="money">(\d+?.+?)\xd4\xaa</span>',re.DOTALL)
                    # The flags belong to re.compile; findall's second argument is a start position
                    luckyLog=pMoney.findall(html)
                    if len(luckyLog)==0:
                            maxMoney=0
                    else:
                            maxMoney=float(luckyLog[0])
                    if maxMoney<10: # the largest recorded amount is below the threshold
                            print u'-----amount too small, not worth drawing-----'
                            return False
        else:
                    print u"---------one step too slow---------"
                    print  "----------......----------"
                    return False
        return True
    def getLucky(self,id):
        print u'-----drawing red envelope: '+str(id)+"-----"
        if self.check(id)==False:
            return
        luck_url='http://huodong.weibo.com/aj_hongbao/getlucky'
        lucky_data={'ouid':id,
                    'share':0,
                    '_t':0}
        header= {
                    'Cache-Control':'no-cache',
                    'Content-Type':'application/x-www-form-urlencoded',
                    'Origin':'http://huodong.weibo.com',
                    'Pragma':'no-cache',
                    'Referer':'http://huodong.weibo.com/hongbao/'+str(id),
                    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
                    'X-Requested-With':'XMLHttpRequest'
                    }
        cookie=self.weibo_login('xxxx','xxxx')
        # Send the headers built above together with the login cookie
        res=requests.post(luck_url,data=lucky_data,headers=header,cookies=cookie).text
        #print res
        res_json=json.loads(res)
        print res_json
        if res_json["code"]=='901114': # today's red envelopes are used up
            print u"---------daily limit reached---------"
            print  "----------......----------"
            self.log('lucky',str(id)+'---'+str(res_json["code"])+'---'+res_json["data"]["title"])
            exit(0)
        elif res_json["code"]=='100000': # success
            print u"---------congratulations---------"
            print  "----------......----------"
            self.log('success',str(id)+'---'+res)
            exit(0)
        if res_json["data"] and res_json["data"]["title"]:
            print res_json["data"]["title"]
            print  "----------......----------"
            self.log('lucky',str(id)+'---'+str(res_json["code"])+'---'+res_json["data"]["title"])
        else:
            print u"---------request error---------"
            print  "----------......----------"
            self.log('lucky',str(id)+'---'+res)

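The decision whether to grab a red envelope here only checks its availability; the red envelope's weight is not taken into account in the check.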
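As a rough illustration of how a weight could enter the check, the sketch below parses the amounts from a winners list and approves a red envelope only when the average past amount reaches a threshold. It is written for Python 3. The sample HTML, the choice of the average as the weight, and the names parse_amounts, envelope_weight, should_grab and min_weight are all assumptions made for illustration; none of them come from the original code or from Weibo.

```python
import re

# Assumed markup, modelled on the <span class="money"> pattern used in check()
MONEY_RE = re.compile(r'<span class="money">(\d+(?:\.\d+)?)元</span>')

def parse_amounts(html):
    """Return every amount found in the winners list as a float."""
    return [float(m) for m in MONEY_RE.findall(html)]

def envelope_weight(amounts):
    """Illustrative weight: the average recorded amount (0.0 if none)."""
    return sum(amounts) / len(amounts) if amounts else 0.0

def should_grab(html, min_weight=10.0):
    """Approve a red envelope only when its weight reaches min_weight."""
    return envelope_weight(parse_amounts(html)) >= min_weight

sample = ('<span class="money">50元</span>'
          '<span class="money">12.5元</span>'
          '<span class="money">0.5元</span>')
print(should_grab(sample))        # average amount is 21.0
print(should_grab(sample, 30.0))  # stricter threshold
```

Any other weighting (the maximum, the number of winners, and so on) could be swapped into envelope_weight without touching the rest.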

Once this is written, you can run a quick test with a single id:

wb=weibo()
wb.getLucky(11111111)
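The response handling in getLucky can also be expressed as a small function that is testable offline. This is a sketch in Python 3: the codes '901114' (daily limit reached) and '100000' (success) are taken from the listing above, while the function name classify_response, the returned labels, and the sample payloads are assumptions for illustration.

```python
import json

def classify_response(raw):
    """Map a getlucky JSON response body to a (label, detail) pair."""
    try:
        res = json.loads(raw)
    except ValueError:
        return ('error', raw)          # body was not valid JSON
    if not isinstance(res, dict):
        return ('error', raw)
    code = str(res.get('code'))
    data = res.get('data')
    title = data.get('title', '') if isinstance(data, dict) else ''
    if code == '901114':
        return ('limit_reached', title)
    if code == '100000':
        return ('success', title)
    if title:
        return ('other', title)        # some other message from the server
    return ('error', raw)

print(classify_response('{"code": "100000", "data": {"title": "ok"}}'))
print(classify_response('not json'))
```

Returning a label instead of calling exit(0) would leave the decision to stop or continue with the caller.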

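0x03 Crawling red envelope links

First, create a new project with Scrapy:

scrapy startproject weibo_spider

This creates a weibo_spider project in the current directory; its files include items.py, pipelines.py, settings.py, and a spiders directory.

Since I have only just started with Scrapy, I only used items.py and pipelines.py.
items.py simply defines what you want to scrape.

Here, for example, we scrape the URL of each red envelope page and the id used for analysis, so items.py looks like this: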

# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy
class WeiboSpiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    url=scrapy.Field()
    hongbao_id=scrapy.Field()
    pass

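Next comes the main crawler program. This core code goes in the spiders folder; at run time Scrapy looks for spiders in the .py files under that directory.

Here is the code first; the explanation follows:
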

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
# Project and item names follow the weibo_spider project and items.py above
from weibo_spider.items import WeiboSpiderItem
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
import re
class weibo_spider(CrawlSpider):
        name='weibo_spider'
        allowed_domains=['weibo.com']
        start_urls=['http://huodong.weibo.com/hongbao/']
        rules=(
                        # Listing pages: followed, but not parsed
                        Rule(SgmlLinkExtractor(allow = (r'http://huodong.weibo.com/hongbao/special_.+?'))),
                        Rule(SgmlLinkExtractor(allow = (r'http://huodong.weibo.com/hongbao/top_.+?'))),
                        # The literal "?" in the URL must be escaped in the regex
                        Rule(SgmlLinkExtractor(allow = (r'http://huodong.weibo.com/hongbao/cate\?type=.+?'))),
                        Rule(SgmlLinkExtractor(allow = (r'http://huodong.weibo.com/hongbao/theme'))),
                        # Red envelope pages: handed to parse_page
                        Rule(SgmlLinkExtractor(allow=(r'http://huodong.weibo.com/hongbao/\d+?')), callback="parse_page",follow=True),
                )
        def parse_page(self,response):
                #ids=[]
                sel=Selector(response)
                item=WeiboSpiderItem()
                try:
                        # The red envelope id is the number in the URL
                        id=re.findall(r'hongbao/(\d+)',response.url)[0]
                        item['hongbao_id']=id
                        item['url']=response.url
                except Exception,e:
                        print 'the id is wrong!!'
                return item

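The imports at the top bring in some of Scrapy's core libraries.
allowed_domains: the domains the spider is allowed to crawl.

start_urls: the URL where the spider starts; here it is set to the home page of the "Let Red Envelopes Fly" (讓紅包飛) campaign.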

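rules defines which URLs get crawled. allow gives the URL patterns the spider may follow; there are also options for denying URLs, described in the official documentation. The first few allow patterns are regexes matching the theme lists and ranking lists that contain red envelopes; in other words, the themeUrl and topUrl from the original article go here. Rule also has a follow parameter, which defaults to True when no callback is given, meaning the spider follows links into those pages and keeps crawling. The last rule is the one to analyze: every red envelope address has the form http://huodong.weibo.com/hongbao/ followed by a numeric id. This rule has a callback, so the response for each such URL is passed to the callback function for processing.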
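To see which URLs each allow pattern accepts, the patterns can be tried with plain re, outside Scrapy. This is a sketch in Python 3 under the assumption that a link extractor tests its allow patterns with a regex search against the absolute URL; the sample URLs, the rule labels, and the helper name first_match are made up for illustration, and the dots in the host name are escaped here although the spider leaves them unescaped. It also shows why a literal ? in a URL has to be written as \? in the pattern.

```python
import re

BASE = r'http://huodong\.weibo\.com/hongbao/'
# (label, pattern) pairs in the same order as the spider's rules
PATTERNS = [
    ('special',  BASE + r'special_.+?'),
    ('top',      BASE + r'top_.+?'),
    ('cate',     BASE + r'cate\?type=.+?'),
    ('theme',    BASE + r'theme'),
    ('envelope', BASE + r'\d+?'),
]

def first_match(url):
    """Return the label of the first pattern that matches url, or None."""
    for label, pattern in PATTERNS:
        if re.search(pattern, url):
            return label
    return None

for url in ['http://huodong.weibo.com/hongbao/top_day',
            'http://huodong.weibo.com/hongbao/cate?type=star',
            'http://huodong.weibo.com/hongbao/1234567']:
    print(url, first_match(url))

# Unescaped, "e?" means "an optional e", so the literal "cate?type=" never matches
print(re.search(r'cate?type=', 'cate?type=star'))
```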

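parse_page receives a response parameter; see the official documentation for its attributes, which are straightforward. Since we only need the id to pass to weibo.getLucky, the function matches a regex against response.url and returns the item directly.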
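The id extraction can be checked on its own, without running a crawl. The sketch below uses the same regex as parse_page; the helper name extract_hongbao_id and the sample URLs are assumptions for illustration. Returning None instead of indexing the findall result avoids an exception when a URL carries no id.

```python
import re

ID_RE = re.compile(r'hongbao/(\d+)')

def extract_hongbao_id(url):
    """Return the numeric id from a red envelope URL as a string, or None."""
    match = ID_RE.search(url)
    return match.group(1) if match else None

print(extract_hongbao_id('http://huodong.weibo.com/hongbao/1234567'))
print(extract_hongbao_id('http://huodong.weibo.com/hongbao/theme'))
```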
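At this point the spider can capture the red envelope URLs and ids, but running it as is produces no visible result; we need the pipeline file pipelines.py to output them.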

Writing pipelines.py:

import json
import codecs
# Project name follows the weibo_spider project created above
from weibo_spider.weibo import weibo
class WeiboSpiderPipeline(object):
        def __init__(self):
                self.file=codecs.open('weibo.json', 'w', encoding='utf-8')
        def process_item(self, item, spider):
                # Try to grab the red envelope, then record the item as one JSON line
                wb=weibo()
                wb.getLucky(item['hongbao_id'])
                line = json.dumps(dict(item), ensure_ascii=False) + "\n"
                self.file.write(line)
                return item
        def close_spider(self,spider):
                # Scrapy calls close_spider on a pipeline when the spider finishes
                self.file.close()

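Each time an item is found, its id is passed to weibo to try to grab the red envelope. Finally, the URLs and ids of all the red envelopes found are written to a file.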
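The file output in the pipeline boils down to writing one JSON object per line. Below is a minimal standalone sketch in Python 3, with plain dicts standing in for Scrapy items; the helper names write_items and read_items, the sample items, and the temporary file location are assumptions for illustration.

```python
import io
import json
import os
import tempfile

def write_items(items, path):
    """Write each item as one JSON object per line, keeping non-ASCII text readable."""
    with io.open(path, 'w', encoding='utf-8') as fp:
        for item in items:
            fp.write(json.dumps(dict(item), ensure_ascii=False) + u'\n')

def read_items(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with io.open(path, encoding='utf-8') as fp:
        return [json.loads(line) for line in fp if line.strip()]

# Sample data standing in for scraped items
items = [
    {'url': 'http://huodong.weibo.com/hongbao/1', 'hongbao_id': '1'},
    {'url': 'http://huodong.weibo.com/hongbao/2', 'hongbao_id': '2'},
]
path = os.path.join(tempfile.mkdtemp(), 'weibo.json')
write_items(items, path)
print(read_items(path) == items)
```

One object per line means the file can be read back item by item even if the crawl stops partway through.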
