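Simulating a Baidu Login and Posting to Tieba

Overview

Use a program to simulate form submission and log in to Baidu.

Significance

From a practical standpoint, this kind of problem matters little and is not really suited to a blog post: Baidu's pages keep changing while this post will not be updated to match, so its correctness cannot be guaranteed.

As a way to learn, though, it makes a good study topic. I have been learning Python over the past few days and it left a strong impression. Python really is more flexible than Java, and its syntax has many elegant features, such as multi-line strings and raw strings (strings that need no escaping). Java had neither at the time of writing, which is painful.

This kind of problem takes patience. Like cracking a password, you have to experiment, understand, and guess; it costs time and energy, and the payoff is low. The time would be better spent learning something else. As Xunzi put it: "I once spent a whole day in thought, but it was not as good as a moment of study." In other words, a day of pondering is worth less than a short while of learning.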

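Method

Use Chrome: Ctrl+U opens the page source and F12 opens the developer tools. Focus on the Network panel with "Preserve log" enabled, and clear old cookies and cache before each experiment to rule out interference. After that, the job is to watch the Network panel while logging in, logging out, posting, commenting and so on, then inspect how the cookies change and what comes back. Use your imagination on the data, guess boldly, and look for patterns.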

How the Baidu login works

Three steps and you are logged in:

1. First, visiting any Baidu page gives you a Baidu ID (BAIDUID), which is a cookie.

2. Next, request https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login to obtain a token.

3. Finally, submit the form to https://passport.baidu.com/v2/api/?login.

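Requesting the token page

The v2 in the URL stands for version 2. What this page returns depends on the GET parameters. When I first captured the traffic, the browser sent these parameters:

- getapi:

- tpl: mn

- apiver: v3

- tt: 1461752974956 (login time, a millisecond timestamp)

- class: login (the action is a login rather than something else)

- gid: 36400D4-3078-460D-ABD8-9DEFBA99604B

- logintype: dialogLogin (login type: through a dialog)

- callback: bd__cbs__gyljq1 (callback name)

The response looked like this: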

bd__cbs__gyljq1({
    "errInfo": {
        "no": "0"
    },
    "data": {
        "rememberedUserName": "1661686074@qq.com",
        "codeString": "",
        "token": "dda98eb93a3011ca4165a01b342a4622",
        "cookie": "1",
        "usernametype": "2",
        "spLogin": "newuser",
        "disable": "",
        "loginrecord": {
            'email': [],
            'phone': []
        }
    }
})

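This is not pure JSON: it is wrapped in a callback, which is redundant. An errInfo.no of 0 means everything went fine; data is a JSON object, and the only useful thing in it is token.

Many of the GET parameters are redundant. Different parameters give different results, which you can test directly in the browser address bar. After removing callback and the other extras, the query shrinks to getapi&apiver=v3 and the returned JSON becomes much tidier: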

{
    "errInfo": {
        "no": "0"
    },
    "data": {
        "rememberedUserName": "",
        "codeString": "",
        "token": "dda98eb93a3011ca4165a01b342a4622",
        "cookie": "1",
        "usernametype": "",
        "spLogin": "newuser",
        "disable": "",
        "loginrecord": {
            'email': [],
            'phone': []
        }
    }
}
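Both responses above still need a little cleanup before a strict JSON parser accepts them: the first is wrapped in a callback, and both quote some keys with single quotes. A minimal sketch of that cleanup follows. The function name `parse_passport_response` and the trimmed sample are made up for illustration, and the blanket quote replacement assumes that no value contains a quote character.

```python
import json
import re


def parse_passport_response(text):
    """Parse a getapi response that may be callback-wrapped and single-quoted."""
    text = text.strip()
    # Drop an optional JSONP wrapper such as bd__cbs__gyljq1( ... )
    match = re.match(r"^[\w$.]+\((.*)\)\s*;?$", text, re.S)
    if match:
        text = match.group(1)
    # json.loads rejects single-quoted keys, so normalise the quotes first.
    # Assumption: no value contains a quote character of its own.
    return json.loads(text.replace("'", '"'))


# Trimmed-down sample modelled on the response shown above
sample = """bd__cbs__gyljq1({
    "errInfo": {"no": "0"},
    "data": {"token": "dda98eb93a3011ca4165a01b342a4622",
             "loginrecord": {'email': [], 'phone': []}}
})"""
parsed = parse_passport_response(sample)
print(parsed["errInfo"]["no"], parsed["data"]["token"])
```

The same function accepts the unwrapped form, because the wrapper pattern simply does not match text that starts with a brace.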

Drop apiver=v3 (API version 3) as well, so that the parameters become getapi&class=login, and the response turns into key-value assignments:

var bdPass=bdPass||{};
bdPass.api=bdPass.api||{};
bdPass.api.params=bdPass.api.params||{};
bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622';
bdPass.api.params.login_tpl='mn';
document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/pass_api_login.js?v=20131115"></script>');

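So version 2 needs the getapi and class=login parameters, and version 3 needs getapi and apiver=v3. A response still comes back without tpl=mn, but login then fails, so tpl=mn is required whenever the token is going to be used for logging in. The version 2 output looks rather like a log4j properties file, doesn't it?

How do you extract the token? You can use a regular expression, parse the version 3 response as JSON, or parse the version 2 response as properties. The regular expression probably works best.

Two principles apply to this kind of problem (they conflict, so look for a balance):

* Delete every parameter you can.

* Don't spend effort testing parameters; send them all, since extra ones never hurt (though the response format may be uglier).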
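The regular-expression route for the version 2 response could look roughly like this. It is only a sketch: the helper name `extract_v2_token` is hypothetical, and the pattern assumes the token keeps appearing as a single-quoted `login_token='...'` assignment, as in the output shown above.

```python
import re

# Assumption: the token is assigned as login_token='...' with single quotes
TOKEN_RE = re.compile(r"login_token\s*=\s*'([^']+)'")


def extract_v2_token(script_text):
    """Return the login token from a version 2 response, or None if absent."""
    match = TOKEN_RE.search(script_text)
    return match.group(1) if match else None


sample = (
    "bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622';\n"
    "bdPass.api.params.login_tpl='mn';"
)
token = extract_v2_token(sample)
print(token)
```

Returning None instead of raising lets the caller decide whether a missing token means a changed page or a failed request.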

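Posting the form to the login page

The browser shows a great many form fields, many of them useless; login succeeds without them. After repeated rounds of deleting and testing, only the following fields turned out to matter: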

{
    "token": token,
    "tpl": "mn",
    "loginmerge": True,
    "username": username,
    "password": password
}

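Remove anything more and it fails; tpl=mn is still indispensable. This step uses the token obtained in step two.

Logging in to Baidu does not require a browser-like User-Agent or a forged Referer or other headers: GET, GET, POST, and you are in. Cookies need no attention either, because the HTTP client library handles them itself and each request automatically carries the cookies received earlier.

Once logged in on the Baidu home page, you can reach the rest of Baidu (Tieba, Zhidao and so on).

How do you verify that login succeeded? There are two ways:

1. Visit www.baidu.com and check whether your own name appears in the page.

2. Check whether key cookies such as PTOKEN and STOKEN are present.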
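The cookie-based check can be written as a small predicate. This is a sketch only: the name `looks_logged_in` is made up, and treating PTOKEN and STOKEN together as the signal is an assumption based on the cookies named above.

```python
# Assumption: both cookies named in the text must be present after login
REQUIRED_COOKIES = ("PTOKEN", "STOKEN")


def looks_logged_in(cookie_names):
    """Return True if every required cookie name is present."""
    names = set(cookie_names)
    return all(name in names for name in REQUIRED_COOKIES)


print(looks_logged_in(["BAIDUID", "PTOKEN", "STOKEN"]))
print(looks_logged_in(["BAIDUID"]))
```

With a requests Session, the names could be supplied as `s.cookies.keys()`.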

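What the fields in Baidu's responses mean: no is the error code (0 means success), err_code is also an error code, error is the error message, and data is the payload: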

"no": 40,
"err_code": 40,
"error": null,
"data"

Logging in to Baidu: Python implementation

import json

from requests import Session

username = 'xxxxx'
password = 'xxxxx'
s = Session()


# Python 2.x and Python 3.x differ a great deal.
# This used to be done with urllib and urllib2; the requests package is used now.

def showCookie(cookies):
    # Debug helper: print every cookie held by the session
    for i in cookies:
        print(i)
        i.domain = '*'
    print('*' * 20)


# Step 1: visit Baidu to receive the BAIDUID cookie
s.get("http://www.baidu.com")
# Step 2: request the passport page to get the token. It returns a JSON string
# whose shape depends on the parameters; after capturing the traffic I removed
# many parameters that made no difference
resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
# The response quotes some keys with single quotes, which json.loads rejects,
# so convert them to double quotes first
t = json.loads(resp.text.replace('\'', '\"'))
token = t['data']['token']
# Step 3: submit the form. Testing showed only these five fields are required
data = {
    "token": token,
    "tpl": "mn",
    "loginmerge": True,
    "username": username,
    "password": password
}
resp = s.post("https://passport.baidu.com/v2/api/?login", data)

Logging in to Baidu: Java implementation

This uses the third-party library fastjson for JSON parsing and Apache HttpClient for the network requests.

static HttpClient client = HttpClients.createDefault();

static void login(String username, String password)
        throws ClientProtocolException, IOException {
    HttpGet homePage = new HttpGet("http://tieba.baidu.com");
    client.execute(homePage);
    HttpGet getToken = new HttpGet(
            "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
    HttpResponse resp = client.execute(getToken);
    String json = EntityUtils.toString(resp.getEntity());
    JSONObject obj = JSON.parseObject(json);
    String token = obj.getJSONObject("data").getString("token");
    HttpPost loginPost = new HttpPost(
            "https://passport.baidu.com/v2/api/?login");
    List<NameValuePair> list = new ArrayList<>();
    list.add(new BasicNameValuePair("token", token));
    list.add(new BasicNameValuePair("username", username));
    list.add(new BasicNameValuePair("password", password));
    list.add(new BasicNameValuePair("tpl", "mn"));
    list.add(new BasicNameValuePair("loginmerge", "true"));
    UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
    loginPost.setEntity(loginData);
    client.execute(loginPost);
}

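Baidu Tieba data types

Login finally works, so the next step is posting. Tieba's data hierarchy matters a great deal; without getting the levels straight, nothing else is easy.

* Forum (a tieba): a forum is one broad topic, the largest unit. Beneath it are many branches that narrow the topic down. For example, the Jin Yong forum contains many threads: "Who is the strongest martial artist?", "Who would win, the Sweeper Monk or Dugu Qiubai?" and so on. Each thread draws many posts, and each post in turn draws comments.

* Thread: starting a thread is like getting ready to put up a building.

* Post: each post adds one floor to the thread.

* Comment: each post can have comments underneath, which makes replies more targeted. A post states your own view like a heavy weapon, a long spear or halberd saying what no one has said before; a comment is like a dagger, short and direct.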
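The four levels above nest inside each other, which a few data classes make concrete. This is purely an illustrative model: the class and field names are my own, and attaching fid to the forum and tid to the thread is an assumption drawn from how those IDs are used later in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    text: str


@dataclass
class Post:
    """One floor of a thread."""
    text: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Thread:
    tid: str  # assumption: tid identifies the thread
    title: str
    posts: List[Post] = field(default_factory=list)


@dataclass
class Forum:
    name: str
    fid: str  # assumption: fid identifies the forum
    threads: List[Thread] = field(default_factory=list)


forum = Forum(name="example", fid="1")
thread = Thread(tid="100", title="Who is the strongest?")
thread.posts.append(Post(text="first floor"))
thread.posts.append(Post(text="second floor", comments=[Comment("a short remark")]))
forum.threads.append(thread)
floors = len(forum.threads[0].posts)
print(floors)
```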

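Common error types:

* Missing tbs: returns 230308, where 308 is the error code and 230 is a prefix.

* 265 is an error code, again with the fixed prefix 230; it probably means posting too frequently, so posting is blocked.

* 40 means a captcha: you are posting too often and must solve a captcha. Being able to crack the captcha would of course be great. Below is the returned JSON; str_reason reads "Please click the captcha to finish posting".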

{
    "no": 40,
    "err_code": 40,
    "error": null,
    "data": {
        "autoMsg": "",
        "fid": 1847502,
        "fname": "\u5927\u5b66\u751f\u52b1\u5fd7",
        "tid": 0,
        "is_login": 1,
        "content": "",
        "vcode": {
            "need_vcode": 1,
            "str_reason": "\u8bf7\u70b9\u51fb\u9a8c\u8bc1\u7801\u5b8c\u6210\u53d1\u8d34",
            "captcha_vcode_str": "captchaservice303662633978724c7655642b44707538683879667741516b6c2f4262726d36777477486b356749525449362b39495a426642746d6d744d5178716236766c4a575650742f6f4b4b57534d576656385534766158757678644979672b56742f56776237523631766d6e33754a567a654d62767a7238646a6632703447653477673568695544454a2b6146695a6651525763705657396b6c45614a334d6a75375664425452684977702b306d3866306a346350365755634f763835614f72426d4c5478596e41587749525773372b38746c66443949764156423478776a644d37476a746a674b4374396348574636644d617a457043714d796d48644641676c466a55716e5841587162646465624e4b6171356a733041502f456c7649636f5879326177514c67473164636638482f76487a55",
            "captcha_code_type": 4,
            "userstatevcode": 0
        },
        "mute_text": null
    }
}
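The error cases listed above can be told apart mechanically once the body is parsed. The sketch below is an assumption-laden illustration: `describe_result` is a hypothetical helper, the samples are shortened, and it assumes the 230-prefixed codes arrive in the same no / err_code fields as the captcha code.

```python
import json


def describe_result(body):
    """Summarise a Tieba post/add response body as a short status string."""
    reply = json.loads(body)
    code = reply.get("err_code", reply.get("no"))
    if code == 0:
        return "ok"
    data = reply.get("data") or {}
    vcode = data.get("vcode") or {}
    if vcode.get("need_vcode"):
        return "captcha required: " + vcode.get("str_reason", "")
    text = str(code)
    # Assumption: six-digit codes are the fixed prefix 230 plus a 3-digit code
    if len(text) == 6 and text.startswith("230"):
        return "error " + text[3:] + " (prefix 230)"
    return "error " + text


captcha = ('{"no": 40, "err_code": 40, "error": null, '
           '"data": {"vcode": {"need_vcode": 1, "str_reason": "click the captcha"}}}')
no_tbs = '{"no": 230308, "err_code": 230308, "error": null, "data": null}'
print(describe_result(captcha))
print(describe_result(no_tbs))
```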

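The all-important tbs

tbs comes from http://tieba.baidu.com/dc/common/tbs: a single GET, then parse tbs out of the JSON string. tbs is effectively a Tieba pass: every comment and every floor you submit must carry it. They can all share the same tbs, namely the one you obtained most recently; presumably the server keeps a hash map from user ID to tbs value.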
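Since one tbs can be reused across submissions, it is natural to fetch it lazily and keep it. The sketch below shows one way to do that; `TbsCache` is a hypothetical helper, and the fetch function is injected so the example runs without a network connection. The sample body is invented and only assumes a top-level "tbs" key, as in the parsing code in this post.

```python
import json


class TbsCache:
    """Fetch tbs on first use and reuse it afterwards."""

    def __init__(self, fetch_body):
        # fetch_body: a callable returning the raw text of the tbs endpoint
        self._fetch_body = fetch_body
        self._tbs = None

    def get(self, refresh=False):
        if self._tbs is None or refresh:
            self._tbs = json.loads(self._fetch_body())["tbs"]
        return self._tbs


calls = []


def fake_fetch():
    # Stand-in for a real GET; records how often it was called
    calls.append(1)
    return '{"tbs": "abc123"}'


cache = TbsCache(fake_fetch)
print(cache.get(), cache.get(), len(calls))
```

With a requests Session, the callable could be `lambda: s.get("http://tieba.baidu.com/dc/common/tbs").text`; passing `refresh=True` forces a new fetch if the server starts rejecting the old value.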

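Posting to Baidu Tieba

Many fields are unnecessary. After one successful post I trimmed things down by deleting and testing, and found that the headers are not needed and many form fields are optional. The form fields are:

* kw: the name of the forum (tieba) being posted to, "大學生勵志" in the code below.

* tid: the thread ID, visible in the address bar.

* fid: every post in a thread seems to carry the same fid; as far as I could observe it never changes no matter how many posts go into a thread. In the error response shown earlier it sits next to fname, which suggests it identifies the forum rather than the thread. Open a thread page, for example http://tieba.baidu.com/p/4195311174, press Ctrl+U to view the source and Ctrl+F to search for fid, and you will see that every fid on the page is identical.

What does Baidu use as a primary key? A long integer. Baidu does not use UUIDs, and neither does cnblogs, as is plain from their pages.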
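Both IDs can be pulled out with short regular expressions instead of by eye. This is a sketch under assumptions: the helper names are made up, and the fid pattern assumes the page embeds it either as `fid=123` in a query string or as `fid:'123'` in a script, which are the two shapes the reference code near the end of this post searches for.

```python
import re


def tid_from_url(url):
    """Extract the thread id from a thread URL such as .../p/4195311174."""
    match = re.search(r"/p/(\d+)", url)
    return match.group(1) if match else None


def fid_from_html(html):
    """Extract the first fid found in page source, or None."""
    # Assumption: fid appears as fid=123 or fid:'123' somewhere in the page
    match = re.search(r"fid(?:=|:\s*')(\d+)", html)
    return match.group(1) if match else None


tid = tid_from_url("http://tieba.baidu.com/p/4195311174")
fid = fid_from_html("kw=x&fid=1847502&tid=4195311174")
print(tid, fid)
```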

static void newPost() throws ClientProtocolException, IOException {
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        HashMap<String, String> paramMap = new HashMap<String, String>();
        paramMap.put("kw", "大學生勵志");
        paramMap.put("fid", "1847502");
        paramMap.put("tid", "4135933166");
        paramMap.put("tbs", tbs);
        paramMap.put("content", "天下大勢爲我所控");
        List<NameValuePair> list = new ArrayList<>();
        for (Map.Entry<String, String> i : paramMap.entrySet()) {
            BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                    i.getValue());
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                .setConnectTimeout(5000).build();
        post.setConfig(config);
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
}

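The Java version is certainly long, so let's look at Python instead: short and punchy, and better suited to describing the problem. Java is not always verbose, though; a concise API is easy enough to design when simplicity is the goal.

Java's complexity comes from three sources:

* Poor library design. The philosophy of "do one thing at a time, one thing per step" is a bit like assembly and gets long-winded, and Java disdains syntactic sugar.

* The problem itself is complex and needs a lot of configuration. That brings flexibility; Python is short, but some things may be impossible because the wrapping is too tight and too few hooks are exposed.

* Poor library design combined with a complex problem. This is a matter of probability: complex cases are rare, yet people are made to spend a lot of time thinking about them. A simple, imperfect interface designed up front would be better. Prefer a simple flaw to a complicated perfection. For example, when several lines are selected and Tab is pressed, should the editor indent or replace? Indent, of course; if I wanted to replace I would not do it that way, and the convenience of indenting is enormous.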

resp = s.get("http://tieba.baidu.com/dc/common/tbs")
tbs = json.loads(resp.text)['tbs']
data = {
    "kw": "大學生勵志",  # forum name
    "fid": "1847502",  # forum id (appears next to fname in Tieba's responses)
    "tid": "4135933166",  # thread id, taken from the thread URL
    "tbs": tbs,  # required on every submission
    "content": "如今下午兩點四十二"
}
resp = s.post("http://tieba.baidu.com/f/commit/post/add", data)
print(resp.text)
print("over")

Commenting on Baidu Tieba

static void newReply() throws ClientProtocolException, IOException {
    String data = "kw=大學生勵志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
            + tbs
            + "&content=五樓的也能夠評論floornum無論用嗎&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
    HttpPost post = new HttpPost(
            "http://tieba.baidu.com/f/commit/post/add");
    List<NameValuePair> list = new ArrayList<>();
    for (String i : data.split("&")) {
        int p = i.indexOf('=');
        BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                i.substring(p + 1));
        System.out.println(pair.getName() + ":" + pair.getValue());
        list.add(pair);
    }
    post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
    HttpResponse resp = client.execute(post);
    System.out.println(EntityUtils.toString(resp.getEntity()));
}

Chained together, the code above looks like this:

String username = "xxxxxx";
String password = "xxxxxx";
login(username, password);
tbs = getTbs();
newPost();
newReply();

The key point is that tbs only needs to be fetched once and can then live in a global variable; there is no need to fetch it repeatedly.

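Getting the list of Tieba threads

tieba.baidu.com/f/index/feedlist?tagid=all&limit=2000000&offset=0 is a very useful link. Some links have to be pasted into the address bar and cannot be reached by following a link, possibly because the server does not allow cross-origin access.

Its parameters: tagid=like | all selects which list to request, with like returning only what I follow and all returning everything; limit is the number of items and offset is the starting offset. There are other parameters as well, such as last_tid, the time of the last item (used for loading more), and &_.

How did I find this link? Open tieba.baidu.com and click "load more", and a request goes out to this feedlist.

Parsing the HTML with jsoup yields many threads and their tids; opening one gives the fid, and with a tid and a fid you can add a floor.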
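Building the feedlist URL from its three documented parameters keeps the paging logic in one place. The sketch below makes a few assumptions: `feedlist_url` is a made-up helper, the http scheme is taken from the other Tieba URLs in this post, and the default limit of 20 is arbitrary.

```python
from urllib.parse import urlencode

# Assumption: http scheme, matching the other Tieba URLs used in this post
FEEDLIST = "http://tieba.baidu.com/f/index/feedlist"


def feedlist_url(tagid="all", limit=20, offset=0):
    """Build a feedlist URL; tagid is 'all' or 'like'."""
    if tagid not in ("all", "like"):
        raise ValueError("tagid must be 'all' or 'like'")
    query = urlencode({"tagid": tagid, "limit": limit, "offset": offset})
    return FEEDLIST + "?" + query


first_page = feedlist_url()
second_page = feedlist_url(limit=20, offset=20)
print(first_page)
print(second_page)
```

Stepping offset by limit walks through the list page by page, which is gentler than asking for two million items at once.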

 

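Writing shorter Java code

Apache HttpComponents has several parts: HttpAsyncClient issues requests with callbacks, and the fluent API is a streamlined facade over HttpClient that is a pleasure to write. See for yourself: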

        Request.Get("http://somehost/")
                .connectTimeout(1000)
                .socketTimeout(1000)
                .execute().returnContent().asString();

        // Execute a POST with the 'expect-continue' handshake, using HTTP/1.1,
        // containing a request body as String and return response content as byte array.
        Request.Post("http://somehost/do-stuff")
                .useExpectContinue()
                .version(HttpVersion.HTTP_1_1)
                .bodyString("Important stuff", ContentType.DEFAULT_TEXT)
                .execute().returnContent().asBytes();

        // Execute a POST with a custom header through the proxy containing a request body
        // as an HTML form and save the result to the file
        Request.Post("http://somehost/some-form")
                .addHeader("X-Custom-header", "stuff")
                .viaProxy(new HttpHost("myproxy", 8080))
                .bodyForm(Form.form().add("username", "vip").add("password", "secret").build())
                .execute().saveContent(new File("result.dump"));

Complete Python code

import json

from requests import Session

username = 'xxxxx'
password = 'xxxxx'
s = Session()


# Python 2.x and Python 3.x differ a great deal.
# This used to be done with urllib and urllib2; the requests package is used now.

def showCookie(cookies):
    # Debug helper: print every cookie held by the session
    for i in cookies:
        print(i)
        i.domain = '*'
    print('*' * 20)


# Step 1: visit Baidu to receive the BAIDUID cookie
s.get("http://www.baidu.com")
# Step 2: request the passport page to get the token. It returns a JSON string
# whose shape depends on the parameters; after capturing the traffic I removed
# many parameters that made no difference
resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
# Single quotes must be converted to double quotes, otherwise json cannot
# parse the text; Python 3.x is stricter about this
t = json.loads(resp.text.replace('\'', '\"'))
token = t['data']['token']
# Step 3: submit the form. Testing showed only these five fields are required
data = {
    "token": token,
    "tpl": "mn",
    "loginmerge": True,
    "username": username,
    "password": password
}
resp = s.post("https://passport.baidu.com/v2/api/?login", data)
resp = s.get("http://tieba.baidu.com/dc/common/tbs")
tbs = json.loads(resp.text)['tbs']
data = {
    "kw": "大學生勵志",  # forum name
    "fid": "1847502",  # forum id (appears next to fname in Tieba's responses)
    "tid": "4135933166",  # thread id, taken from the thread URL
    "tbs": tbs,  # required on every submission
    "content": "如今下午兩點四十二"
}
resp = s.post("http://tieba.baidu.com/f/commit/post/add", data)
print(resp.text)
print("over")

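Complete Java code (depends on the third-party libraries httpclient and fastjson)

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

public class Main {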
    static HttpClient client = HttpClients.createDefault();
    static String tbs;
    static String host = "Host: tieba.baidu.com";
    static String useragent = "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36";
    static void login(String username, String password)
            throws ClientProtocolException, IOException {
        HttpGet homePage = new HttpGet("http://tieba.baidu.com");
        client.execute(homePage);
        HttpGet getToken = new HttpGet(
                "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
        HttpResponse resp = client.execute(getToken);
        String json = EntityUtils.toString(resp.getEntity());
        JSONObject obj = JSON.parseObject(json);
        String token = obj.getJSONObject("data").getString("token");
        HttpPost loginPost = new HttpPost(
                "https://passport.baidu.com/v2/api/?login");
        List<NameValuePair> list = new ArrayList<>();
        list.add(new BasicNameValuePair("token", token));
        list.add(new BasicNameValuePair("username", username));
        list.add(new BasicNameValuePair("password", password));
        list.add(new BasicNameValuePair("tpl", "mn"));
        list.add(new BasicNameValuePair("loginmerge", "true"));
        UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
        loginPost.setEntity(loginData);
        client.execute(loginPost);
    }
    static String getTbs() throws ClientProtocolException, IOException {
        HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
        HttpResponse resp = client.execute(get);
        String s = EntityUtils.toString(resp.getEntity());
        JSONObject json = JSON.parseObject(s);
        return json.getString("tbs");
    }
    static void newReply() throws ClientProtocolException, IOException {
        String data = "kw=大學生勵志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
                + tbs
                + "&content=五樓的也能夠評論floornum無論用嗎&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        List<NameValuePair> list = new ArrayList<>();
        for (String i : data.split("&")) {
            int p = i.indexOf('=');
            BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                    i.substring(p + 1));
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
    static void newPost() throws ClientProtocolException, IOException {
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        HashMap<String, String> paramMap = new HashMap<String, String>();
        paramMap.put("kw", "大學生勵志");
        paramMap.put("fid", "1847502");
        paramMap.put("tid", "4135933166");
        paramMap.put("tbs", tbs);
        paramMap.put("content", "魏印福");
        List<NameValuePair> list = new ArrayList<>();
        for (Map.Entry<String, String> i : paramMap.entrySet()) {
            BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                    i.getValue());
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                .setConnectTimeout(5000).build();
        post.setConfig(config);
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
    public static void main(String[] args)
            throws ClientProtocolException, IOException {
        String username = "xxxxxx";
        String password = "xxxxxx";
        login(username, password);
        tbs = getTbs();
        newPost();
        newReply();
    }
}

The code I referred to

This code really is well written.

class HttpUtils {

    /**
     * Convert a map into a form entity.
     *
     * @param map
     *            the form fields to encode
     * @return the encoded entity
     */
    public static HttpEntity mapToEntity(HashMap<String, String> map)
            throws Exception {
        BasicNameValuePair pair = null;
        List<BasicNameValuePair> params = new ArrayList<BasicNameValuePair>();
        for (Map.Entry<String, String> m : map.entrySet()) {
            pair = new BasicNameValuePair(m.getKey(), m.getValue());
            params.add(pair);
        }
        HttpEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
        return entity;
    }

    /**
     * Extract the substring between two markers.
     *
     * @param string
     *            the source string
     * @param start
     *            the starting marker
     * @param end
     *            the ending marker
     * @return the substring in between on success, null on failure
     */
    public static String mid(String string, String start, String end) {
        int s = string.indexOf(start) + start.length();
        int e = string.indexOf(end, s);
        if (s > 0 && e > s)
            return string.substring(s, e);
        return null;
    }

    /**
     * Collect every match of a regular expression.
     *
     * @param regex
     *            the regular expression
     * @param input
     *            the string to search
     * @return the list of matches (possibly several, depending on the regex)
     */
    public static ArrayList<String> myRegex(String regex, String input) {
        ArrayList<String> list = new ArrayList<String>();
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        while (m.find()) {
            list.add(m.group());
        }
        return list;
    }

}
public class Baidu {
    private CloseableHttpClient httpClient; // the simulated client
    private String postFid; // fid used when posting
    private String postName = ""; // name of the forum being posted to
    private CloseableHttpResponse response; // holds the last response
    private String html; // holds the returned HTML page
    private boolean isQL = false; // whether second-floor grabbing is on

    public boolean isQL() {
        return isQL;
    }
    public void setQL(boolean isQL) {
        this.isQL = isQL;
        if (isQL)
            System.out.println("Second-floor grabbing started.");
        else
            System.out.println("Second-floor grabbing stopped.");
    }

    /**
     * Log in
     **/
    public boolean login(String username, String password) {
        // Flag: did the login succeed?
        boolean isLogin = false;
        httpClient = HttpClients.createDefault();
        try {
            /** 1,BAIDUID **/
            String baiduId = null;
            HttpGet get_main = new HttpGet(
                    "http://tieba.baidu.com/dc/common/tbs/");
            response = httpClient.execute(get_main);
            get_main.abort();
            HeaderIterator it = response.headerIterator("Set-Cookie");
            while (it.hasNext())
                baiduId = it.next().toString();
            baiduId = HttpUtils.mid(baiduId, ":", ";");
            System.out.println("1,BAIDUID:" + baiduId);

            /** 2,token **/
            HttpGet get_token = new HttpGet(
                    "https://passport.baidu.com/v2/api/?getapi&tpl=mn");
            response = httpClient.execute(get_token);
            String token = EntityUtils.toString(response.getEntity(), "utf-8");
            get_token.abort();
            token = HttpUtils.mid(token, "_token='", "'");
            System.out.println("2,TOKEN:" + token);

            /** 3,Login **/
            HashMap<String, String> map = new HashMap<String, String>();
            map.put("username", username);
            map.put("password", password);
            map.put("token", token);
            map.put("isPhone", "false");
            map.put("quick_user", "0");
            map.put("tt", System.currentTimeMillis() + "");
            map.put("loginmerge", "true");
            map.put("logintype", "dialogLogin");
            map.put("splogin", "rate");
            map.put("mem_pass", "on");
            map.put("tpl", "mn");
            map.put("apiver", "v3");
            map.put("u", "http://www.baidu.com/");
            map.put("safeflg", "0");
            map.put("ppui_logintime", "43661");
            map.put("charset", "utf-8");

            // Wrap the fields into a form entity
            HttpEntity entity = HttpUtils.mapToEntity(map);
            HttpPost http_login = new HttpPost(
                    "https://passport.baidu.com/v2/api/?login");
            http_login.setEntity(entity);
            response = httpClient.execute(http_login);
            http_login.abort();

            it = response.headerIterator();
            while (it.hasNext()) {
                // Login is judged successful if a BDUSS cookie was set
                if (it.next().toString().contains("BDUSS")) {
                    isLogin = true;
                    break;
                }
            }
            System.out.println("3,login status: " + isLogin);
            return isLogin;
        } catch (Exception e) {
            throw new RuntimeException("login failed", e);
        }

    }

    /**
     * Create a new thread
     *
     * @throws Exception
     */
    public String writeTiebaItem(String tiebaName, String title, String content)
            throws Exception {
        String tbs = null;
        HashMap<String, String> paramMap = new HashMap<String, String>();
        String nowTime = System.currentTimeMillis() + "";
        // If this is the first post to this forum, fetch its fid; otherwise skip the lookup, since a forum's fid does not change
        if (!postName.equals(tiebaName)) {
            postFid = getFid(tiebaName);
            postName = tiebaName;
        }

        System.out.println("fid:" + postFid);
        if (postFid == null) {
            System.err.println("unknown error");
            return "unknown error";
        }
        /** fetch tbs */
        tbs = getTbs();
        System.out.println("tbs:" + tbs);
        paramMap.put("ie", "utf-8");
        paramMap.put("kw", postName);
        paramMap.put("fid", postFid);
        paramMap.put("tid", "0");
        paramMap.put("vcode_md5", "");
        paramMap.put("floor_num", "0");
        paramMap.put("rich_text", "1");
        paramMap.put("tbs", tbs);
        paramMap.put("content", content);
        paramMap.put("title", title);
        paramMap.put("prefix", "");
        paramMap.put("files", URLEncoder.encode("[]", "utf-8"));
        paramMap.put("sign_id", "24179251");
        paramMap.put("mouse_pwd",
                "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                        + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                        + nowTime + "0");
        paramMap.put("mouse_pwd_t", nowTime);
        paramMap.put("mouse_pwd_isclick", "0");
        paramMap.put("__type__", "thread");
        HttpEntity entity = HttpUtils.mapToEntity(paramMap);
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/thread/add");

        post.setEntity(entity);
        response = httpClient.execute(post);
        html = EntityUtils.toString(response.getEntity());
        if (html.contains("\"no\":0,\"err_code\":0")) {
            return "Posted to forum " + tiebaName;
        } else {
            return "Posting failed, response: " + html;
        }

    }
    private String getFid(String tiebaName) throws Exception {
        HttpResponse response = null;
        String fid = null;
        ArrayList<String> urllist = getTieziUrl(tiebaName);
        if (urllist.size() == 0) {
            return null;
        }
        // Open any thread to read the fid; index 0 is safe because the list is non-empty here
        HttpGet get = new HttpGet(urllist.get(0));
        response = httpClient.execute(get);
        html = EntityUtils.toString(response.getEntity());
        fid = HttpUtils.myRegex("fid(=|:')[0-9].+?(&|',)", html).get(0);
        if (fid.contains("=")) {
            fid = HttpUtils.mid(fid, "=", "&");
        }
        if (fid.contains(":")) {
            fid = HttpUtils.mid(fid, ":'", "',");
        }
        return fid;
    }
    private ArrayList<String> get0Answer(String tiebaName) throws Exception {
        HttpResponse response = null;
        ArrayList<String> urlList = new ArrayList<String>();
        ArrayList<String> topTidList;
        String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                + URLEncoder.encode(tiebaName, "UTF-8");
        HttpGet get = new HttpGet(tiebaUrl);
        response = httpClient.execute(get);
        html = EntityUtils.toString(response.getEntity());
        if (html.contains("抱歉,根據相關法律法規和政策,本吧暫不開放")) {
            return urlList;
        }
        Document doc = Jsoup.parse(html);
        // All threads on the first page except the pinned ones
        Elements els = doc.select("li[class= j_thread_list clearfix]");
        for (Element e : els) {
            String str = e.text().toString();
            // System.out.println(str);
            // A leading 0 means zero replies. Example str: 0 測試 陌生人左右丶 00:36
            if (str.startsWith("0")) {
                Elements els1 = e.getElementsByTag("a");
                for (Element e1 : els1) {
                    String url = e1.attr("href");
                    topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                    for (int i = 0; i < topTidList.size(); i++) {
                        url = topTidList.get(i);
                        urlList.add("http://tieba.baidu.com" + url);
                    }
                }
            }
        }
        return urlList;
    }
    /**
     * @param tiebaName
     *            name of the forum whose URLs are wanted
     * @return the thread URLs on the forum's first page
     * @throws IOException
     */
    private ArrayList<String> getTieziUrl(String tiebaName) throws Exception {
        HttpResponse response = null;
        ArrayList<String> urlList = new ArrayList<String>();
        ArrayList<String> topTidList;
        try {
            String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                    + URLEncoder.encode(tiebaName, "UTF-8");
            HttpGet get = new HttpGet(tiebaUrl);
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            if (html.contains("抱歉,根據相關法律法規和政策,本吧暫不開放")) {
                return urlList;
            }
            Document doc = Jsoup.parse(html);
            Elements els = doc.select("li[class= j_thread_list clearfix]");
            for (Element e : els) {
                Elements els1 = e.getElementsByTag("a");
                for (Element e1 : els1) {
                    // Pull thread links such as "/p/2777392166" from the forum's first page, then build full URLs
                    String url = e1.attr("href");
                    topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                    for (int i = 0; i < topTidList.size(); i++) {
                        url = topTidList.get(i);
                        urlList.add("http://tieba.baidu.com" + url);
                    }
                }
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return urlList;
    }

    /** Fetch tbs (the URL below is an API that returns it) */
    private String getTbs() throws Exception {

        HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
        response = httpClient.execute(get);
        html = EntityUtils.toString(response.getEntity());
        return HttpUtils.mid(html, ":\"", "\",");
    }

    /** Reply to a thread */
    public String replyPost(String tid, String content, String tiebaName)
            throws Exception {
        /** No way found yet to obtain floor_num **/
        String floor_num = "1";
        if (!postName.equals(tiebaName)) {
            postFid = getFid(tiebaName);
            postName = tiebaName;
        }
        System.out.println("fid:" + postFid);
        if (postFid == null) {
            System.err.println("unknown error");
            return "unknown error";
        }
        String tbs = getTbs();
        String nowTime = System.currentTimeMillis() + "";
        // Build the reply form as a map
        HashMap<String, String> paramMap = new HashMap<String, String>();
        paramMap.put("ie", "utf-8");
        paramMap.put("kw", postName);
        paramMap.put("fid", postFid);
        paramMap.put("tid", tid);
        paramMap.put("vcode_md5", "");
        paramMap.put("floor_num", floor_num);
        paramMap.put("rich_text", "1");
        paramMap.put("tbs", tbs);
        paramMap.put("content", content);
        paramMap.put("files", "[]");
        paramMap.put("mouse_pwd",
                "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                        + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                        + nowTime + "0");
        paramMap.put("mouse_pwd_t", nowTime);
        paramMap.put("mouse_pwd_isclick", "0");
        paramMap.put("__type__", "reply");
        HttpEntity entity = HttpUtils.mapToEntity(paramMap);
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
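        // Set socket and connect timeouts. These are timeouts, not a delay between replies, so they do not by themselves keep Baidu from flagging rapid posting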
        RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                .setConnectTimeout(5000).build();
        post.setConfig(config);
        post.setEntity(entity);
        response = httpClient.execute(post);

        html = EntityUtils.toString(response.getEntity());
        System.out.println(html);
        if (html.contains("\"no\":0,\"err_code\":0")) {
            return "Grabbed a second floor in forum " + tiebaName;
        } else {
            return "Reply failed, response: " + html;
        }
    }

    /** Grab the second floor **/
    public void TakeTheSecondFloor(final String tiebaName,
            final String contents[], final int time) {
        final int len = contents.length;
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (isQL) {
                    try {
                        Random random = new Random();
                        int index = random.nextInt(len);
                        String tid;
                        ArrayList<String> linksList = get0Answer(tiebaName);
                        for (int i = 0; i < linksList.size()
                                && linksList.size() != 0; i++) {
                            tid = linksList.get(i).substring(25);
                            String message = replyPost(tid, contents[index],
                                    tiebaName);
                            System.out.println(message);
                        }
                        Thread.sleep(time);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }).start();
    }
}