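Logging in to Baidu by simulating a form submission from a program.
In practical terms this kind of problem is not worth much, and it is not really blog material either: Baidu's pages keep changing while this post will not be updated, so I cannot guarantee that it stays correct. As a learning exercise, though, it is a good topic. I spent the last few days learning Python and came away with a lot of impressions. Python really is more flexible than Java, and its syntax has plenty of nice features, such as multi-line strings and raw strings (strings that need no escaping). Java had neither when this was written, which hurts. Problems like this take patience. As with cracking a password, you have to try things, understand, and guess, which costs time and energy for a poor return; you could learn something else with the same effort. Better to keep studying. As Xunzi put it (the line is often misattributed to Confucius): 終日而思,不如須臾之所學也, a whole day of thinking is worth less than a moment of study.
In Chrome, Ctrl+U opens the page source and F12 opens the developer tools. Concentrate on the Network tab, turn on Preserve log, and clear old cookies and cache before each experiment to rule out interference. After that the job is to watch Network while you log in, log out, post, comment and so on, and then look at how the cookies change and what comes back. Use your imagination on the data, guess boldly, and look for patterns.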
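Three steps to a successful login
1. First, visiting any Baidu page gets you a Baidu ID (BAIDUID), which is a cookie;
2. Next, request https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login to get a token;
3. Finally, submit the login form to https://passport.baidu.com/v2/api/?login
The v2 in the URL means version 2. What the getapi page returns depends on the GET parameters that follow it. When I first captured the traffic, the parameters shown in the browser were: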
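- getapi: (no value)
- tpl: mn
- apiver: v3
- tt: 1461752974956 (login time as a millisecond timestamp)
- class: login (the action is a login rather than something else)
- gid: 36400D4-3078-460D-ABD8-9DEFBA99604B
- logintype: dialogLogin (login type: through the dialog box)
- callback: bd__cbs__gyljq1 (callback name)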
The response looked like this:
bd__cbs__gyljq1({ "errInfo": { "no": "0" }, "data": { "rememberedUserName": "1661686074@qq.com", "codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622", "cookie": "1", "usernametype": "2", "spLogin": "newuser", "disable": "", "loginrecord": { 'email': [], 'phone': [] } } })
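This is not clean JSON; it carries some extra baggage, namely the callback wrapper around it. An errInfo.no of 0 means everything went well. data is a JSON object, and the only useful thing in it is token. Many of the GET parameters are superfluous. Different parameters give different replies, and you can test them directly in the browser's address bar. Dropping callback and the rest until only getapi&apiver=v3 remains makes the returned JSON quite tidy: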
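As a rough illustration of that clean-up, the sketch here strips a callback wrapper if one is present and then parses the body as JSON. The helper name parse_getapi_response is made up for this example, the sample string is a shortened copy of the wrapped reply, and the quote swap is an assumption that only holds while no value contains a quote character.

```python
import json
import re


def parse_getapi_response(text):
    """Strip an optional callback wrapper, then parse the body as JSON."""
    text = text.strip()
    # A wrapped reply looks like  name({...})  ; keep only the part in parentheses.
    match = re.fullmatch(r"[\w$.]+\((.*)\)\s*;?", text, flags=re.DOTALL)
    body = match.group(1) if match else text
    # The sample replies quote the loginrecord keys with single quotes,
    # which json.loads rejects, so swap them for double quotes first.
    return json.loads(body.replace("'", '"'))


# Shortened copy of the wrapped reply (most fields omitted).
sample = """bd__cbs__gyljq1({ "errInfo": { "no": "0" }, "data": { "token": "dda98eb93a3011ca4165a01b342a4622", "loginrecord": { 'email': [], 'phone': [] } } })"""

reply = parse_getapi_response(sample)
print(reply["errInfo"]["no"], reply["data"]["token"])
```

The same function accepts the unwrapped getapi&apiver=v3 reply, because the wrapper pattern simply does not match and the text is parsed as it is.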
{ "errInfo": { "no": "0" }, "data": { "rememberedUserName": "", "codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622", "cookie": "1", "usernametype": "", "spLogin": "newuser", "disable": "", "loginrecord": { 'email': [], 'phone': [] } } }
Dropping apiver=v3 (API version 3) as well, so that the parameters are getapi&class=login, turns the reply into a series of key-value assignments:
var bdPass=bdPass||{}; bdPass.api=bdPass.api||{}; bdPass.api.params=bdPass.api.params||{}; bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622'; bdPass.api.params.login_tpl='mn'; document.write('<script type="text/javascript" charset="UTF-8" src="https://passport.baidu.com/js/pass_api_login.js?v=20131115"></script>');
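So version 2 needs the two parameters getapi and class=login, and version 3 needs getapi and apiver=v3. Without tpl=mn the login fails: the JSON still comes back, but a token that is going to be used for logging in must be requested with tpl=mn. The version 2 reply looks rather like a log4j properties file, doesn't it? How do you get the token out? You can use a regular expression, parse the version 3 reply as JSON, or parse the version 2 reply as key-value properties. A regular expression probably works best.
For this kind of problem there are two principles (they conflict, so look for a balance):
* Delete as many parameters as you can
* Don't waste effort testing parameters; just send them all, since extras never hurt (though the result may look uglier)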
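A minimal sketch of the regular-expression route, assuming the two reply styles look like the samples quoted earlier. The names V2_TOKEN, V3_TOKEN and extract_token are invented for this example, and the patterns are only as reliable as those samples.

```python
import re

# One pattern per reply style (both are assumptions based on the sample replies).
V2_TOKEN = re.compile(r"login_token\s*=\s*'([^']+)'")   # key-value assignments
V3_TOKEN = re.compile(r'"token"\s*:\s*"([^"]+)"')       # JSON reply


def extract_token(text):
    """Return the token from either reply style, or None if nothing matches."""
    for pattern in (V2_TOKEN, V3_TOKEN):
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


v2_reply = "bdPass.api.params.login_token='dda98eb93a3011ca4165a01b342a4622'; bdPass.api.params.login_tpl='mn';"
v3_reply = '{ "errInfo": { "no": "0" }, "data": { "codeString": "", "token": "dda98eb93a3011ca4165a01b342a4622" } }'

print(extract_token(v2_reply))
print(extract_token(v3_reply))
```

A regex avoids the single-quote problem that trips up a strict JSON parser, at the cost of breaking silently if Baidu renames the field.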
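The browser shows a lot of form fields, and many of them are useless: the login succeeds without them. After rounds of deleting and testing, only the following fields turned out to matter: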
{ "token": token, "tpl": "mn", "loginmerge": True, "username": username, "password": password }
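Delete anything more and it breaks; tpl=mn is still indispensable. This step uses the token obtained in step two.
Logging in to Baidu does not require browser-like headers or a faked Referer: GET, GET, POST and you are in. You don't need to manage cookies either, because the HTTP client handles them itself and each request automatically carries the cookies received earlier.
Once logged in to the Baidu home page you can reach the other parts of Baidu (Tieba, Zhidao, and so on).
How do you verify that the login worked? There are two ways:
1. Visit www.baidu.com and see whether your own name appears in the page
2. Check whether the cookies include key ones such as PTOKEN and STOKEN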
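The cookie check can be written as a tiny helper. This is only a sketch: the function name looks_logged_in is invented here, and treating PTOKEN and STOKEN as the required set is an assumption taken from the list of key cookies, not something Baidu documents.

```python
def looks_logged_in(cookie_names, required=("PTOKEN", "STOKEN")):
    """Return True when every required cookie name is present."""
    names = set(cookie_names)
    return all(name in names for name in required)


# Cookie names as they might appear before and after a successful login.
before = ["BAIDUID"]
after = ["BAIDUID", "BDUSS", "PTOKEN", "STOKEN"]

print(looks_logged_in(before))
print(looks_logged_in(after))
```

With a requests session the names can be collected with `[c.name for c in s.cookies]`, since iterating the cookie jar yields cookie objects.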
A note on Baidu's return values: no is the error code (0 means success), err_code is also the error code, error is the error message, and data is the payload:
"no": 40, "err_code": 40, "error": null, "data"
    import json

    from requests import Session  # third-party package: pip install requests

    username = 'xxxxx'
    password = 'xxxxx'
    s = Session()

    # Python 2.x and Python 3.x differ a great deal.
    # The old way was urllib/urllib2; now the requests package is used.

    def showCookie(cookies):
        for i in cookies:
            print(i)
            i.domain = '*'
        print('*' * 20)

    # Step 1: visit Baidu to obtain the BAIDUID cookie
    s.get("http://www.baidu.com")
    # Step 2: request the passport page to obtain the token; it returns a JSON string.
    # The reply depends on the parameters; after capturing the traffic,
    # many useless parameters were removed by trial and error.
    resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
    # The reply contains single-quoted keys, which json.loads rejects,
    # so convert single quotes to double quotes first.
    t = json.loads(resp.text.replace('\'', '\"'))
    token = t['data']['token']
    # Step 3: submit the form. Testing showed that only these five fields are required.
    data = {
        "token": token,
        "tpl": "mn",
        "loginmerge": True,
        "username": username,
        "password": password
    }
    resp = s.post("https://passport.baidu.com/v2/api/?login", data=data)
The Java version uses two third-party libraries: fastjson for JSON parsing and Apache HttpClient for the network requests.
    static HttpClient client = HttpClients.createDefault();

    static void login(String username, String password)
            throws ClientProtocolException, IOException {
        // Step 1: visit a Baidu page so the client receives the BAIDUID cookie
        HttpGet homePage = new HttpGet("http://tieba.baidu.com");
        client.execute(homePage);
        // Step 2: fetch the token
        HttpGet getToken = new HttpGet(
                "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
        HttpResponse resp = client.execute(getToken);
        String json = EntityUtils.toString(resp.getEntity());
        JSONObject obj = JSON.parseObject(json);
        String token = obj.getJSONObject("data").getString("token");
        // Step 3: submit the login form
        HttpPost loginPost = new HttpPost(
                "https://passport.baidu.com/v2/api/?login");
        List<NameValuePair> list = new ArrayList<>();
        list.add(new BasicNameValuePair("token", token));
        list.add(new BasicNameValuePair("username", username));
        list.add(new BasicNameValuePair("password", password));
        list.add(new BasicNameValuePair("tpl", "mn"));
        list.add(new BasicNameValuePair("loginmerge", "true"));
        UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
        loginPost.setEntity(loginData);
        client.execute(loginPost);
    }
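Login finally works; the next step is posting. Tieba's data model matters a great deal here, and things get awkward if the levels are not kept straight.
* Forum (貼吧): a tieba is a forum, one big topic and a fairly broad category. Under it sit many branches that refine the big topic. For example, the 金庸 (Jin Yong) forum contains many threads: who has the strongest martial arts? Who would win, the Sweeper Monk or Dugu Qiubai? ... Every thread draws many posts, and every post in turn draws comments.
* Thread (話題): starting a thread is like getting ready to put up a building.
* Post (帖子): every post adds one floor to the thread.
* Comment (評論): every post can have comments beneath it, which keeps replies targeted. Adding a floor states your own view like a heavy weapon, a long spear or halberd saying what no one has said before; a comment is more direct, like a dagger or short blade.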
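The four levels can be pictured as nested records. This is purely an illustrative model, not Tieba's actual schema: the class and field names are invented, and the fid and tid values are simply the ones that appear in the examples of this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:              # a short reply attached to one post
    author: str
    text: str


@dataclass
class Post:                 # one floor of a thread
    author: str
    text: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Thread:               # one topic inside a forum
    tid: int
    title: str
    posts: List[Post] = field(default_factory=list)


@dataclass
class Forum:                # a tieba
    fid: int
    name: str
    threads: List[Thread] = field(default_factory=list)


forum = Forum(fid=1847502, name="大学生励志")
thread = Thread(tid=4135933166, title="example thread")
forum.threads.append(thread)
thread.posts.append(Post(author="a", text="first floor"))
thread.posts.append(Post(author="b", text="second floor"))
thread.posts[1].comments.append(Comment(author="c", text="a short reply"))

print(len(thread.posts), "floors;", len(thread.posts[1].comments), "comment on floor 2")
```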
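Common error types:
* Missing tbs: the reply is 230308, where 308 is the error code and 230 is a prefix
* 265 is the error code and 230 is the fixed prefix; this one probably means you posted too often and posting is blocked
* 40 means a captcha: you posted too often and a captcha is required. If you can crack the captcha, so much the better. Below is the JSON that comes back; str_reason says "请点击验证码完成发贴" (please click the captcha to finish posting).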
{ "no": 40, "err_code": 40, "error": null, "data": { "autoMsg": "", "fid": 1847502, "fname": "\u5927\u5b66\u751f\u52b1\u5fd7", "tid": 0, "is_login": 1, "content": "", "vcode": { "need_vcode": 1, "str_reason": "\u8bf7\u70b9\u51fb\u9a8c\u8bc1\u7801\u5b8c\u6210\u53d1\u8d34", "captcha_vcode_str": "captchaservice303662633978724c7655642b44707538683879667741516b6c2f4262726d36777477486b356749525449362b39495a426642746d6d744d5178716236766c4a575650742f6f4b4b57534d576656385534766158757678644979672b56742f56776237523631766d6e33754a567a654d62767a7238646a6632703447653477673568695544454a2b6146695a6651525763705657396b6c45614a334d6a75375664425452684977702b306d3866306a346350365755634f763835614f72426d4c5478596e41587749525773372b38746c66443949764156423478776a644d37476a746a674b4374396348574636644d617a457043714d796d48644641676c466a55716e5841587162646465624e4b6171356a733041502f456c7649636f5879326177514c67473164636638482f76487a55", "captcha_code_type": 4, "userstatevcode": 0 }, "mute_text": null } }
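A small sketch of how such a reply might be interpreted in code. The function names, the table of meanings and the rule "six digits starting with 230" are assumptions drawn from the three cases listed above, not an official error catalogue; the sample is a trimmed copy of the captcha reply.

```python
import json

PREFIX = "230"
# Meanings as observed in this post only.
KNOWN_CODES = {
    0: "ok",
    40: "captcha required",
    265: "probably posting too often",
    308: "tbs missing",
}


def split_code(code):
    """Split 230308 into ('230', 308); leave short codes such as 40 alone."""
    text = str(code)
    if len(text) == 6 and text.startswith(PREFIX):
        return PREFIX, int(text[len(PREFIX):])
    return "", int(text)


def describe_reply(reply_text):
    """Summarise the fields of a posting reply that matter for error handling."""
    reply = json.loads(reply_text)
    prefix, code = split_code(reply["no"])
    vcode = (reply.get("data") or {}).get("vcode") or {}
    return {
        "ok": code == 0,
        "prefix": prefix,
        "code": code,
        "meaning": KNOWN_CODES.get(code, "unknown"),
        "need_captcha": vcode.get("need_vcode") == 1,
        "reason": vcode.get("str_reason", ""),
    }


# Trimmed copy of the captcha reply; json.loads decodes the \u escapes.
sample = r'''{"no": 40, "err_code": 40, "error": null, "data": {"fid": 1847502, "vcode": {"need_vcode": 1, "str_reason": "\u8bf7\u70b9\u51fb\u9a8c\u8bc1\u7801\u5b8c\u6210\u53d1\u8d34"}}}'''

print(describe_reply(sample))
print(split_code(230308))
```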
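tbs comes from http://tieba.baidu.com/dc/common/tbs: one GET, then parse tbs out of the JSON string. tbs is something like a Tieba pass. Every comment and every floor you submit has to carry it; they can all share the same value, namely the tbs you fetched most recently. The server presumably keeps a hash map from user id to tbs value.
Many fields are unnecessary. After the first success, pare things down by deleting and testing in turn: the headers turn out to be useless, and many form fields are dispensable too. The form fields are:
* kw: the forum (tieba) name, for example 大学生励志
* tid: the thread id; it is visible in the address bar.
* fid: this appears to be the forum id. As far as I can observe it does not change however many posts are made in a thread, and the captcha reply above pairs fid 1847502 with the forum name in fname. Open a thread page such as http://tieba.baidu.com/p/4195311174, press Ctrl+U to view the source and Ctrl+F to search for fid, and you will see that every fid on the page is identical.
What does Baidu use for primary keys? Long integers! Baidu does not use UUIDs, and neither does cnblogs (博客園); you can tell straight from the pages.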
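Both ids can be pulled out with short regular expressions. This is a sketch under assumptions: the helper names are invented, the page fragment is a made-up stand-in for real page source, and the fid pattern covers only the two spellings (fid=123 and fid:'123') targeted by the regex in the Baidu class at the end of this post.

```python
import re


def tid_from_url(url):
    """Thread id: the digits after /p/ in a thread URL."""
    match = re.search(r"/p/(\d+)", url)
    return match.group(1) if match else None


def fid_from_source(page_source):
    """Forum id: the first fid=123 or fid:'123' found in the page source."""
    match = re.search(r"fid(?:=|:\s*')(\d+)", page_source)
    return match.group(1) if match else None


# Hypothetical fragment standing in for the source of a thread page.
page = "var PageData = {forum: {fid:'1847502'}}; <a href=\"/f/like?fid=1847502&x=1\">"

print(tid_from_url("http://tieba.baidu.com/p/4195311174"))
print(fid_from_source(page))
```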
    static void newPost() throws ClientProtocolException, IOException {
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        HashMap<String, String> paramMap = new HashMap<String, String>();
        paramMap.put("kw", "大学生励志");   // forum name
        paramMap.put("fid", "1847502");     // forum id
        paramMap.put("tid", "4135933166");  // thread id
        paramMap.put("tbs", tbs);
        paramMap.put("content", "天下大勢爲我所控");
        List<NameValuePair> list = new ArrayList<>();
        for (Map.Entry<String, String> i : paramMap.entrySet()) {
            BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                    i.getValue());
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        // Connect and socket timeouts of 5 seconds
        RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                .setConnectTimeout(5000).build();
        post.setConfig(config);
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
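Java really is long-winded, so let's look at Python, which is short, punchy, and better suited to describing the problem. Java is not always verbose, though; when simplicity is the goal it is easy enough to design a concise API.
Java's complexity comes from three sources:
* Poorly designed libraries. The "do one thing per call, one thing per step" philosophy is a little like assembly and gets verbose, and Java disdains syntactic sugar
* The problem itself is complex and needs a lot of configuration. That is more flexible; Python is short, but some things may be impossible because everything is wrapped too tightly and too few hooks are exposed.
* Poorly designed libraries plus an inherently complex problem. There is a question of probability here: complex cases are rare, yet people are made to spend a lot of time thinking about them, when it would be better to offer a simple, imperfect interface up front. Better simple and flawed than complex and complete. One example: after selecting several lines and pressing Tab, should the editor indent or replace? Indent, of course. If I wanted to replace the text I would not do it that way, and indenting is a big convenience.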
    resp = s.get("http://tieba.baidu.com/dc/common/tbs")
    tbs = json.loads(resp.text)['tbs']
    data = {
        "kw": "大学生励志",       # forum name
        "fid": "1847502",         # forum id
        "tid": "4135933166",      # thread id
        "tbs": tbs,               # very important
        "content": "如今下午兩點四十二"
    }
    resp = s.post("http://tieba.baidu.com/f/commit/post/add", data=data)
    print(resp.text)
    print("over")
    static void newReply() throws ClientProtocolException, IOException {
        // Form data copied from the browser as one query string
        String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
                + tbs
                + "&content=五樓的也能夠評論floornum無論用嗎&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
        HttpPost post = new HttpPost(
                "http://tieba.baidu.com/f/commit/post/add");
        List<NameValuePair> list = new ArrayList<>();
        // Split the query string back into name/value pairs
        for (String i : data.split("&")) {
            int p = i.indexOf('=');
            BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                    i.substring(p + 1));
            System.out.println(pair.getName() + ":" + pair.getValue());
            list.add(pair);
        }
        post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
        HttpResponse resp = client.execute(post);
        System.out.println(EntityUtils.toString(resp.getEntity()));
    }
Strung together, the code above looks like this:
    String username = "xxxxxx";
    String password = "xxxxxx";
    login(username, password);
    tbs = getTbs();
    newPost();
    newReply();
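The key point is that tbs only needs to be fetched once and can then live in a global variable; there is no need to fetch it again and again.
tieba.baidu.com/f/index/feedlist?tagid=all&limit=2000000&offset=0
This link is very important. Some links only work when pasted into the address bar and cannot be followed directly, possibly because the server does not allow cross-origin access.
Its parameters: tagid=like | all selects which tag list is requested (like returns only the ones I follow, all returns everything); limit is the number of items and offset is the starting offset. There are other parameters too, such as last_tid, the time of the last item (used for loading more), and &_.
How did I find this link? Visit tieba.baidu.com and load more items, and the page sends a request to this feedlist.
Parsing the HTML with jsoup yields a good many tieba threads and their tids; open one and you can get the fid, and with tid and fid in hand you can add floors.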
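Apache's HttpClient project has several components. HttpAsyncClient, for example, sends requests with callbacks, and the fluent module is a fluent-style HttpClient that is an absolute breeze to write. See for yourself: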
    // Execute a GET with timeout settings and return response content as String.
    Request.Get("http://somehost/")
            .connectTimeout(1000)
            .socketTimeout(1000)
            .execute().returnContent().asString();

    // Execute a POST with the 'expect-continue' handshake, using HTTP/1.1,
    // containing a request body as String and return response content as byte array.
    Request.Post("http://somehost/do-stuff")
            .useExpectContinue()
            .version(HttpVersion.HTTP_1_1)
            .bodyString("Important stuff", ContentType.DEFAULT_TEXT)
            .execute().returnContent().asBytes();

    // Execute a POST with a custom header through the proxy containing a request body
    // as an HTML form and save the result to the file
    Request.Post("http://somehost/some-form")
            .addHeader("X-Custom-header", "stuff")
            .viaProxy(new HttpHost("myproxy", 8080))
            .bodyForm(Form.form().add("username", "vip").add("password", "secret").build())
            .execute().saveContent(new File("result.dump"));
    import json

    from requests import Session  # third-party package: pip install requests

    username = 'xxxxx'
    password = 'xxxxx'
    s = Session()

    # Python 2.x and Python 3.x differ a great deal.
    # The old way was urllib/urllib2; now the requests package is used.

    def showCookie(cookies):
        for i in cookies:
            print(i)
            i.domain = '*'
        print('*' * 20)

    # Step 1: visit Baidu to obtain the BAIDUID cookie
    s.get("http://www.baidu.com")
    # Step 2: request the passport page to obtain the token; it returns a JSON string.
    # The reply depends on the parameters; after capturing the traffic,
    # many useless parameters were removed by trial and error.
    resp = s.get("https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3")
    # Single quotes must be converted to double quotes,
    # otherwise the json module cannot parse the reply.
    t = json.loads(resp.text.replace('\'', '\"'))
    token = t['data']['token']
    # Step 3: submit the form. Testing showed that only these five fields are required.
    data = {
        "token": token,
        "tpl": "mn",
        "loginmerge": True,
        "username": username,
        "password": password
    }
    resp = s.post("https://passport.baidu.com/v2/api/?login", data=data)

    # Fetch tbs, then add a floor to an existing thread
    resp = s.get("http://tieba.baidu.com/dc/common/tbs")
    tbs = json.loads(resp.text)['tbs']
    data = {
        "kw": "大学生励志",       # forum name
        "fid": "1847502",         # forum id
        "tid": "4135933166",      # thread id
        "tbs": tbs,               # very important
        "content": "如今下午兩點四十二"
    }
    resp = s.post("http://tieba.baidu.com/f/commit/post/add", data=data)
    print(resp.text)
    print("over")
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.http.HttpResponse;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONObject;

    public class Main {
        static HttpClient client = HttpClients.createDefault();
        static String tbs;
        // Left over from experiments; the requests below work without these headers
        static String host = "Host: tieba.baidu.com";
        static String useragent = "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36";

        static void login(String username, String password)
                throws ClientProtocolException, IOException {
            // Step 1: visit a Baidu page so the client receives the BAIDUID cookie
            HttpGet homePage = new HttpGet("http://tieba.baidu.com");
            client.execute(homePage);
            // Step 2: fetch the token
            HttpGet getToken = new HttpGet(
                    "https://passport.baidu.com/v2/api/?getapi&tpl=mn&apiver=v3&class=login");
            HttpResponse resp = client.execute(getToken);
            String json = EntityUtils.toString(resp.getEntity());
            JSONObject obj = JSON.parseObject(json);
            String token = obj.getJSONObject("data").getString("token");
            // Step 3: submit the login form
            HttpPost loginPost = new HttpPost(
                    "https://passport.baidu.com/v2/api/?login");
            List<NameValuePair> list = new ArrayList<>();
            list.add(new BasicNameValuePair("token", token));
            list.add(new BasicNameValuePair("username", username));
            list.add(new BasicNameValuePair("password", password));
            list.add(new BasicNameValuePair("tpl", "mn"));
            list.add(new BasicNameValuePair("loginmerge", "true"));
            UrlEncodedFormEntity loginData = new UrlEncodedFormEntity(list);
            loginPost.setEntity(loginData);
            client.execute(loginPost);
        }

        static String getTbs() throws ClientProtocolException, IOException {
            HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
            HttpResponse resp = client.execute(get);
            String s = EntityUtils.toString(resp.getEntity());
            JSONObject json = JSON.parseObject(s);
            return json.getString("tbs");
        }

        static void newReply() throws ClientProtocolException, IOException {
            // Form data copied from the browser as one query string
            String data = "kw=大学生励志&fid=1847502&tid=4135933166&quote_id=78685072053&rich_text=1&tbs="
                    + tbs
                    + "&content=五樓的也能夠評論floornum無論用嗎&lp_type=0&lp_sub_type=0&new_vcode=1&tag=11&repostid=78685072053&anonymous=0";
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            List<NameValuePair> list = new ArrayList<>();
            // Split the query string back into name/value pairs
            for (String i : data.split("&")) {
                int p = i.indexOf('=');
                BasicNameValuePair pair = new BasicNameValuePair(i.substring(0, p),
                        i.substring(p + 1));
                System.out.println(pair.getName() + ":" + pair.getValue());
                list.add(pair);
            }
            post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
            HttpResponse resp = client.execute(post);
            System.out.println(EntityUtils.toString(resp.getEntity()));
        }

        static void newPost() throws ClientProtocolException, IOException {
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            HashMap<String, String> paramMap = new HashMap<String, String>();
            paramMap.put("kw", "大学生励志");   // forum name
            paramMap.put("fid", "1847502");     // forum id
            paramMap.put("tid", "4135933166");  // thread id
            paramMap.put("tbs", tbs);
            paramMap.put("content", "魏印福");
            List<NameValuePair> list = new ArrayList<>();
            for (Map.Entry<String, String> i : paramMap.entrySet()) {
                BasicNameValuePair pair = new BasicNameValuePair(i.getKey(),
                        i.getValue());
                System.out.println(pair.getName() + ":" + pair.getValue());
                list.add(pair);
            }
            // Connect and socket timeouts of 5 seconds
            RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                    .setConnectTimeout(5000).build();
            post.setConfig(config);
            post.setEntity(new UrlEncodedFormEntity(list, "utf-8"));
            HttpResponse resp = client.execute(post);
            System.out.println(EntityUtils.toString(resp.getEntity()));
        }

        public static void main(String[] args) throws ClientProtocolException,
                IOException {
            String username = "xxxxxx";
            String password = "xxxxxx";
            login(username, password);
            tbs = getTbs();
            newPost();
            newReply();
        }
    }
This code really is well written:
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.http.HeaderIterator;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    class HttpUtils {
        /**
         * Convert a map into a form entity.
         *
         * @param map
         *            the parameters to convert
         * @return the encoded form data
         */
        public static HttpEntity mapToEntity(HashMap<String, String> map)
                throws Exception {
            BasicNameValuePair pair = null;
            List<BasicNameValuePair> params = new ArrayList<BasicNameValuePair>();
            for (Map.Entry<String, String> m : map.entrySet()) {
                pair = new BasicNameValuePair(m.getKey(), m.getValue());
                params.add(pair);
            }
            HttpEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
            return entity;
        }

        /**
         * Take the text between two markers.
         *
         * @param string
         *            source string
         * @param start
         *            opening marker
         * @param end
         *            closing marker
         * @return the substring in between, or null on failure
         */
        public static String mid(String string, String start, String end) {
            int i = string.indexOf(start);
            if (i < 0)
                return null; // opening marker not found
            int s = i + start.length();
            int e = string.indexOf(end, s);
            if (e > s)
                return string.substring(s, e);
            return null;
        }

        /**
         * @param regex
         *            regular expression
         * @param input
         *            string to search
         * @return list of all matches (there may be several, depending on the regex)
         */
        public static ArrayList<String> myRegex(String regex, String input) {
            ArrayList<String> list = new ArrayList<String>();
            Pattern p = Pattern.compile(regex);
            Matcher m = p.matcher(input);
            while (m.find()) {
                list.add(m.group());
            }
            return list;
        }
    }

    public class Baidu {
        private CloseableHttpClient httpClient; // the simulated client
        private String postFid; // fid used when posting
        private String postName = ""; // name of the tieba being posted to
        private CloseableHttpResponse response; // response of the last request
        private String html; // body of the last response
        private volatile boolean isQL = false; // whether "grab the second floor" is running (read by a worker thread)

        public boolean isQL() {
            return isQL;
        }

        public void setQL(boolean isQL) {
            this.isQL = isQL;
            if (isQL == true)
                System.out.println("Second-floor grabbing started.");
            else
                System.out.println("Second-floor grabbing stopped.");
        }

        /**
         * Log in
         **/
        public boolean login(String username, String password) {
            // whether the login succeeded
            boolean isLogin = false;
            httpClient = HttpClients.createDefault();
            try {
                /** 1, BAIDUID **/
                String baiduId = null;
                HttpGet get_main = new HttpGet(
                        "http://tieba.baidu.com/dc/common/tbs/");
                response = httpClient.execute(get_main);
                get_main.abort();
                HeaderIterator it = response.headerIterator("Set-Cookie");
                while (it.hasNext())
                    baiduId = it.next().toString();
                baiduId = HttpUtils.mid(baiduId, ":", ";");
                System.out.println("1,BAIDUID:" + baiduId);
                /** 2, token **/
                HttpGet get_token = new HttpGet(
                        "https://passport.baidu.com/v2/api/?getapi&tpl=mn");
                response = httpClient.execute(get_token);
                String token = EntityUtils.toString(response.getEntity(), "utf-8");
                get_token.abort();
                token = HttpUtils.mid(token, "_token='", "'");
                System.out.println("2,TOKEN:" + token);
                /** 3, login **/
                HashMap<String, String> map = new HashMap<String, String>();
                map.put("username", username);
                map.put("password", password);
                map.put("token", token);
                map.put("isPhone", "false");
                map.put("quick_user", "0");
                map.put("tt", System.currentTimeMillis() + "");
                map.put("loginmerge", "true");
                map.put("logintype", "dialogLogin");
                map.put("splogin", "rate");
                map.put("mem_pass", "on");
                map.put("tpl", "mn");
                map.put("apiver", "v3");
                map.put("u", "http://www.baidu.com/");
                map.put("safeflg", "0");
                map.put("ppui_logintime", "43661");
                map.put("charset", "utf-8");
                // wrap as a form entity
                HttpEntity entity = HttpUtils.mapToEntity(map);
                HttpPost http_login = new HttpPost(
                        "https://passport.baidu.com/v2/api/?login");
                http_login.setEntity(entity);
                response = httpClient.execute(http_login);
                http_login.abort();
                it = response.headerIterator();
                while (it.hasNext()) {
                    // login is judged by whether a BDUSS cookie was set
                    if (it.next().toString().contains("BDUSS")) {
                        isLogin = true;
                        break;
                    }
                }
                System.out.println("3,login status: " + isLogin);
                return isLogin;
            } catch (Exception e) {
                // keep the cause so the failure can be diagnosed
                throw new RuntimeException("unknown error", e);
            }
        }

        /**
         * Start a new thread
         *
         * @throws Exception
         */
        public String writeTiebaItem(String tiebaName, String title,
                String content) throws Exception {
            String tbs = null;
            HashMap<String, String> paramMap = new HashMap<String, String>();
            String nowTime = System.currentTimeMillis() + "";
            // Look up the fid only when posting to a different tieba than last time,
            // because the fid of a tieba never changes
            if (!postName.equals(tiebaName)) {
                postFid = getFid(tiebaName);
                postName = tiebaName;
            }
            System.out.println("fid:" + postFid);
            if (postFid == null) {
                System.err.println("unknown error");
                return "unknown error";
            }
            /** get tbs */
            tbs = getTbs();
            System.out.println("tbs:" + tbs);
            paramMap.put("ie", "utf-8");
            paramMap.put("kw", postName);
            paramMap.put("fid", postFid);
            paramMap.put("tid", "0");
            paramMap.put("vcode_md5", "");
            paramMap.put("floor_num", "0");
            paramMap.put("rich_text", "1");
            paramMap.put("tbs", tbs);
            paramMap.put("content", content);
            paramMap.put("title", title);
            paramMap.put("prefix", "");
            paramMap.put("files", URLEncoder.encode("[]", "utf-8"));
            paramMap.put("sign_id", "24179251");
            paramMap.put("mouse_pwd",
                    "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                            + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                            + nowTime + "0");
            paramMap.put("mouse_pwd_t", nowTime);
            paramMap.put("mouse_pwd_isclick", "0");
            paramMap.put("__type__", "thread");
            HttpEntity entity = HttpUtils.mapToEntity(paramMap);
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/thread/add");
            post.setEntity(entity);
            response = httpClient.execute(post);
            html = EntityUtils.toString(response.getEntity());
            if (html.contains("\"no\":0,\"err_code\":0")) {
                return "Posted a thread in " + tiebaName;
            } else {
                return "Posting failed, error reply: " + html;
            }
        }

        private String getFid(String tiebaName) throws Exception {
            HttpResponse response = null;
            String fid = null;
            ArrayList<String> urllist = getTieziUrl(tiebaName);
            if (urllist.size() == 0) {
                return null;
            }
            // Open any thread to read the fid
            // (index 0, so a list with a single link does not go out of range)
            HttpGet get = new HttpGet(urllist.get(0));
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            ArrayList<String> hits = HttpUtils.myRegex(
                    "fid(=|:')[0-9].+?(&|',)", html);
            if (hits.isEmpty()) {
                return null;
            }
            fid = hits.get(0);
            if (fid.contains("=")) {
                fid = HttpUtils.mid(fid, "=", "&");
            }
            if (fid.contains(":")) {
                fid = HttpUtils.mid(fid, ":'", "',");
            }
            return fid;
        }

        /** URLs of the threads on the first page that have no replies yet */
        private ArrayList<String> get0Answer(String tiebaName) throws Exception {
            HttpResponse response = null;
            ArrayList<String> urlList = new ArrayList<String>();
            ArrayList<String> topTidList;
            String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                    + URLEncoder.encode(tiebaName, "UTF-8");
            HttpGet get = new HttpGet(tiebaUrl);
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            // Notice shown when a tieba is closed (matched against the page as served)
            if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
                return urlList;
            }
            Document doc = Jsoup.parse(html);
            // every thread on the first page except the pinned ones
            Elements els = doc.select("li[class= j_thread_list clearfix]");
            for (Element e : els) {
                String str = e.text().toString();
                // A leading 0 means zero replies. Example text: 0 測試 陌生人左右丶 00:36
                if (str.startsWith("0")) {
                    Elements els1 = e.getElementsByTag("a");
                    for (Element e1 : els1) {
                        String url = e1.attr("href");
                        topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                        for (int i = 0; i < topTidList.size(); i++) {
                            url = topTidList.get(i);
                            urlList.add("http://tieba.baidu.com" + url);
                        }
                    }
                }
            }
            return urlList;
        }

        /**
         * @param tiebaName
         *            name of the tieba whose thread URLs are wanted
         * @return the thread URLs found on the first page of that tieba
         * @throws IOException
         */
        private ArrayList<String> getTieziUrl(String tiebaName) throws Exception {
            HttpResponse response = null;
            ArrayList<String> urlList = new ArrayList<String>();
            ArrayList<String> topTidList;
            try {
                String tiebaUrl = "http://tieba.baidu.com/f?ie=utf-8&kw="
                        + URLEncoder.encode(tiebaName, "UTF-8");
                HttpGet get = new HttpGet(tiebaUrl);
                response = httpClient.execute(get);
                html = EntityUtils.toString(response.getEntity());
                // Notice shown when a tieba is closed (matched against the page as served)
                if (html.contains("抱歉,根据相关法律法规和政策,本吧暂不开放")) {
                    return urlList;
                }
                Document doc = Jsoup.parse(html);
                Elements els = doc.select("li[class= j_thread_list clearfix]");
                for (Element e : els) {
                    Elements els1 = e.getElementsByTag("a");
                    for (Element e1 : els1) {
                        // Collect every thread link on the first page, such as
                        // "/p/2777392166", and turn it into a full URL
                        String url = e1.attr("href");
                        topTidList = HttpUtils.myRegex("/p/[0-9]{2,12}", url);
                        for (int i = 0; i < topTidList.size(); i++) {
                            url = topTidList.get(i);
                            urlList.add("http://tieba.baidu.com" + url);
                        }
                    }
                }
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
            return urlList;
        }

        /** Get tbs (the URL below is an API that returns it) */
        private String getTbs() throws Exception {
            HttpGet get = new HttpGet("http://tieba.baidu.com/dc/common/tbs");
            response = httpClient.execute(get);
            html = EntityUtils.toString(response.getEntity());
            return HttpUtils.mid(html, ":\"", "\",");
        }

        /** Reply to a thread */
        public String replyPost(String tid, String content, String tiebaName)
                throws Exception {
            /** no way found yet to obtain floor_num **/
            String floor_num = "1";
            if (!postName.equals(tiebaName)) {
                postFid = getFid(tiebaName);
                postName = tiebaName;
            }
            System.out.println("fid:" + postFid);
            if (postFid == null) {
                System.err.println("unknown error");
                return "unknown error";
            }
            String tbs = getTbs();
            String nowTime = System.currentTimeMillis() + "";
            // build the reply form as a map
            HashMap<String, String> paramMap = new HashMap<String, String>();
            paramMap.put("ie", "utf-8");
            paramMap.put("kw", postName);
            paramMap.put("fid", postFid);
            paramMap.put("tid", tid);
            paramMap.put("vcode_md5", "");
            paramMap.put("floor_num", floor_num);
            paramMap.put("rich_text", "1");
            paramMap.put("tbs", tbs);
            paramMap.put("content", content);
            paramMap.put("files", "[]");
            paramMap.put("mouse_pwd",
                    "45,46,39,51,46,39,45,42,22,46,51,47,51,46,51,47,51,46,51,47,51,46,"
                            + "51,47,51,46,51,47,22,46,41,39,38,47,22,46,44,41,41,51,40,41,39,"
                            + nowTime + "0");
            paramMap.put("mouse_pwd_t", nowTime);
            paramMap.put("mouse_pwd_isclick", "0");
            paramMap.put("__type__", "reply");
            HttpEntity entity = HttpUtils.mapToEntity(paramMap);
            HttpPost post = new HttpPost(
                    "http://tieba.baidu.com/f/commit/post/add");
            // Connect and socket timeouts of 5 seconds. These are timeouts, not a
            // posting delay; the pause between rounds is the sleep in TakeTheSecondFloor.
            RequestConfig config = RequestConfig.custom().setSocketTimeout(5000)
                    .setConnectTimeout(5000).build();
            post.setConfig(config);
            post.setEntity(entity);
            response = httpClient.execute(post);
            html = EntityUtils.toString(response.getEntity());
            System.out.println(html);
            if (html.contains("\"no\":0,\"err_code\":0")) {
                return "Grabbed a second floor in " + tiebaName;
            } else {
                return "Reply failed, error reply: " + html;
            }
        }

        /** Grab the second floor **/
        public void TakeTheSecondFloor(final String tiebaName,
                final String contents[], final int time) {
            final int len = contents.length;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    while (isQL) {
                        try {
                            Random random = new Random();
                            int index = random.nextInt(len);
                            String tid;
                            ArrayList<String> linksList = get0Answer(tiebaName);
                            for (int i = 0; i < linksList.size()
                                    && linksList.size() != 0; i++) {
                                // drop the 25-character "http://tieba.baidu.com/p/" prefix
                                tid = linksList.get(i).substring(25);
                                String message = replyPost(tid, contents[index],
                                        tiebaName);
                                System.out.println(message);
                            }
                            Thread.sleep(time);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            }).start();
        }
    }