Java HttpClient + Jsoup: a forum auto-reply ("water-pouring") tool, so fires hold no fear

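For who knows how long I'd wanted to write a little program that replies to forum posts automatically, but never got around to it. Recently I had some free time, so I looked into it. I'm a beginner who only half understands HTTP, so I had to ask the great Google for help. When I explained my idea, Google said, "Good lad, you're doing your bit for fire prevention!" and bestowed on me the following two tools: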

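1. HttpClient 4.3.1 (GA)

HttpClient's main features are listed below; see the HttpClient home page for more detail.

  • Implements all HTTP methods (GET, POST, PUT, HEAD, etc.)
  • Supports automatic redirects
  • Supports HTTPS
  • Supports proxy servers, and more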

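2. Jsoup

jsoup's main features are:

  • Parse HTML from a URL, a file, or a string
  • Find and extract data with DOM traversal or CSS selectors
  • Manipulate HTML elements, attributes, and text
  • A syntax almost identical to jQuery's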

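Enough chatter, on to the main topic. The HttpClient source package contains an examples folder with basic usage samples, which are plenty to get started. Find ClientFormLogin.java; its comments explain everything clearly. In short, it simulates HTTP requests and keeps the cookies they return.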
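ClientFormLogin works because the client keeps a cookie store: cookies from one response are sent back automatically on later requests. The JDK has the same mechanism in java.net.CookieManager. The minimal sketch below (CookieJarDemo is a hypothetical name, and no network access is involved) feeds a Set-Cookie header to the manager by hand and asks which Cookie header it would send next.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

/** Stdlib analogue of a cookie store (hypothetical demo, no network access). */
final class CookieJarDemo {
    // Stores a Set-Cookie header as if it came from `from`, then returns the Cookie
    // header values the manager would attach to a later request for `to`.
    static List<String> roundTrip(String from, String setCookie, String to) {
        CookieManager manager = new CookieManager(); // default policy: ACCEPT_ORIGINAL_SERVER
        try {
            manager.put(URI.create(from), Map.of("Set-Cookie", List.of(setCookie)));
            return manager.get(URI.create(to), Map.of()).getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, after storing `sid=abc123; Path=/` from a bbs.dakele.com response, a later request to bbs.dakele.com/forum.php carries `sid=abc123`, while a request to passport.dakele.com carries nothing: a cookie without a Domain attribute stays with the host that set it.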

Test site: http://bbs.dakele.com/

This site handles login in a non-standard way, so it may differ somewhat from a stock Discuz (DZ) forum; adapt as needed.

I analyzed the site with Chrome's built-in Inspect Element, which took quite a lot of time.

Login page: http://passport.dakele.com/login.do?product=bbs

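Entering a wrong username and password shows that the actual login endpoint is http://passport.dakele.com/logon.do. Note the difference between i and n (I didn't notice it at first and thought I was seeing ghosts).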

The error message returned:

{"err_msg":"账号或密码错误"} ("wrong account or password")

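Entering the correct credentials returns a redirect link instead; opening that redirect link directly logs you in just like a normal login.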
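The logon.do reply is a small JSON object, and the error case carries an err_msg field. The post parses it with json-lib below; if you only need to tell failure from success, a regex over a flat object can be enough. This is a rough sketch, not a JSON parser: LoginReplyCheck is a hypothetical helper, it only matches "key":"string" pairs, it does not decode escape sequences, and treating "err_msg present" as a failed login is an assumption based on the single error reply shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Very rough reader for flat JSON replies such as {"err_msg":"..."} (hypothetical helper). */
final class LoginReplyCheck {
    // Raw value of a "key":"string" pair, or null if absent. Escapes are left undecoded
    // and nested objects, numbers or booleans are not handled: use a JSON library for those.
    static String stringField(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Assumption: a failed login is recognised by the presence of err_msg.
    static boolean isLoginError(String json) {
        return stringField(json, "err_msg") != null;
    }
}
```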

Getting the redirect link:

private LoginResult getRedirectUrl(){
        LoginResult loginResult = null;
        HttpPost httpost = new HttpPost(LOGINURL);
        httpost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        httpost.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
        httpost.setHeader("Cache-Control", "max-age=0");
        httpost.setHeader("Connection", "keep-alive");
        httpost.setHeader("Host", "passport.dakele.com");
        httpost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("product", "bbs"));
        nvps.add(new BasicNameValuePair("surl", "http://bbs.dakele.com/"));
        nvps.add(new BasicNameValuePair("username", "yourname")); // username
        nvps.add(new BasicNameValuePair("password", "yourpass")); // password
        nvps.add(new BasicNameValuePair("remember", "0"));
        httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));

        // try-with-resources closes the response and the client even if execute() throws;
        // the old finally block called response2.close() and hit a NullPointerException in that case
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpost)) {
            if (response.getStatusLine().getStatusCode() == 200) {
                // read as UTF-8 unless the server declares another charset
                String entityString = EntityUtils.toString(response.getEntity(), Consts.UTF_8);
                // the reply is a single JSON object; wrap it in [] so json-lib maps it to a LoginResult[]
                JSONArray jsonArray = JSONArray.fromObject("[" + entityString + "]");
                JsonConfig jsonConfig = new JsonConfig();
                jsonConfig.setArrayMode(JsonConfig.MODE_OBJECT_ARRAY);
                jsonConfig.setRootClass(LoginResult.class);
                LoginResult[] results = (LoginResult[]) JSONSerializer.toJava(jsonArray, jsonConfig);
                if (results.length == 1) {
                    loginResult = results[0];
                }
            }
        } catch (IOException e) { // also covers ClientProtocolException
            e.printStackTrace();
        }
        return loginResult;
    }
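new UrlEncodedFormEntity(nvps, Consts.UTF_8) turns the name/value pairs into an application/x-www-form-urlencoded request body. To see roughly what goes over the wire, here is a stdlib sketch; FormBody is a hypothetical name, and HttpClient's own encoder may differ from java.net.URLEncoder in edge cases.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Rough stdlib equivalent of new UrlEncodedFormEntity(nvps, Consts.UTF_8) (hypothetical helper). */
final class FormBody {
    // Joins the fields as name=value pairs separated by '&', percent-encoding both
    // sides as UTF-8; URLEncoder writes spaces as '+', as HTML forms do.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

With the fields product=bbs and surl=http://bbs.dakele.com/ (in that order, e.g. from a LinkedHashMap) the body is `product=bbs&surl=http%3A%2F%2Fbbs.dakele.com%2F`. The charset matters as soon as a value is Chinese: under UTF-8 the word 灌水 becomes `%E7%81%8C%E6%B0%B4`.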

Login code:

public boolean login(){
        boolean flag = false;
        LoginResult loginResult = getRedirectUrl();
        // getRedirectUrl() returns null on a network error or a non-200 reply
        if (loginResult != null && "true".equals(loginResult.getResult())) {
            // one client sharing one cookie store: the session cookies set while following
            // the redirect link are sent automatically with every later request
            cookieStore = new BasicCookieStore();
            globalClient = HttpClients.custom().setDefaultCookieStore(cookieStore).build();
            HttpGet httpGet = new HttpGet(loginResult.getRedirect());
            httpGet.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            httpGet.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
            httpGet.setHeader("Connection", "keep-alive");
            httpGet.setHeader("Host", HOST);
            httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
            try (CloseableHttpResponse response = globalClient.execute(httpGet)) {
                // consume the body so the connection goes back to the pool
                EntityUtils.consume(response.getEntity());
            } catch (IOException e) {
                e.printStackTrace();
            }
            List<Cookie> cookies = cookieStore.getCookies();
            if (cookies.isEmpty()) {
                log.error("cookie is empty");
            } else {
                for (Cookie cookie : cookies) {
                    log.debug(cookie.getName() + "=" + cookie.getValue());
                }
                // heuristic: having received cookies is taken as a successful login
                flag = true;
            }
        }
        return flag;
    }
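login() keeps one client, globalClient, with a shared BasicCookieStore for the whole session. For comparison, here is a minimal sketch of the same idea with the JDK 11+ built-in java.net.http client instead of Apache HttpClient (a swapped-in library, not what this post uses; SessionClient and its methods are hypothetical names). One difference worth knowing: the JDK client sets Host and Connection itself and rejects them as restricted headers, so those two are not copied from the original requests.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** JDK 11+ sketch of a session-wide, cookie-aware client (hypothetical helper). */
final class SessionClient {
    static final String UA = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36";

    // Reuse one client (and so one cookie jar) for the whole session.
    static HttpClient newClient(CookieManager cookies) {
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except https -> http
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Host and Connection are managed by the client and rejected as restricted headers,
    // so only Accept, Accept-Language and User-Agent are carried over.
    static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "zh-CN,zh;q=0.8")
                .header("User-Agent", UA)
                .GET()
                .build();
    }
}
```

A request would then be sent with `client.send(SessionClient.get(url), HttpResponse.BodyHandlers.ofString())`, which reads the whole body into a String.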

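At this point the login has succeeded and we can do the things only logged-in accounts can do. What things, you ask? Putting out fires, of course.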

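First, collect the URLs of the threads to reply to. The list pages follow a regular pattern, so instead of writing automatic discovery I just wrote a loop (@1):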

for (int i = 1; i < 200; i++) {
    String basurl = "http://bbs.dakele.com/forum-43-" + i + ".html";
    log.info(basurl);
    List<String> urls = dakele.getThreadURLs(basurl);
    for (String url : urls) {
        //log.info(url);
        ReplayContent content = dakele.preReplay(url);
        if (content != null) {
            log.info(content.getUrl());
            log.info(content.getMessage());
            //dakele.replay(content);
            //Thread.sleep(15300);
        }
    }
}

Collecting the thread URLs from a list page:

String html = EntityUtils.toString(entity);
Document document = Jsoup.parse(html, HOST);
// title link (a.xst) of every normal, non-sticky thread row (tbody id="normalthread_...")
Elements elements = document.select("tbody[id^=normalthread_] > tr > td.new > a.xst");
for (int i = 0; i < elements.size(); i++) {
    Element e = elements.get(i);
    urList.add(e.attr("abs:href"));
}
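Jsoup.parse(html, HOST) plus attr("abs:href") turns relative links into absolute ones by resolving them against the base URI. The sketch below imitates that with java.net.URI; LinkResolver is a hypothetical helper, and Jsoup's own resolution covers extra cases such as a `<base href>` tag in the page. It also shows a pitfall worth checking: if HOST holds a bare host name without http://, there is nothing to resolve a relative link against, and Jsoup's absUrl is documented to return an empty string when it cannot build an absolute URL. Links containing characters that java.net.URI rejects, such as unencoded spaces, would throw here instead.

```java
import java.net.URI;

/** Rough imitation of Jsoup's attr("abs:href") using java.net.URI (hypothetical helper). */
final class LinkResolver {
    // Absolute hrefs are returned unchanged; relative ones are resolved against baseUri.
    // Returns "" when baseUri itself is not absolute (e.g. "bbs.dakele.com" without http://).
    static String absolute(String baseUri, String href) {
        URI ref = URI.create(href);
        if (ref.isAbsolute()) {
            return ref.toString();
        }
        URI base = URI.create(baseUri);
        if (!base.isAbsolute()) {
            return "";
        }
        return base.resolve(ref).toString();
    }
}
```

For instance, resolving `thread-12345-1-1.html` (a made-up Discuz-style thread link) against `http://bbs.dakele.com/forum-43-1.html` gives `http://bbs.dakele.com/thread-12345-1-1.html`, whereas the scheme-less base `bbs.dakele.com` gives an empty string.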

Inside each thread to reply to, get the submit URL of the reply form and build the reply content:

public ReplayContent preReplay(String url){
        ReplayContent content = null;
        HttpGet get = new HttpGet(url);
        get.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        get.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
        get.setHeader("Connection", "keep-alive");
        get.setHeader("Host", HOST);
        get.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
        // try-with-resources closes the response, returning its connection to the pool
        try (CloseableHttpResponse response = globalClient.execute(get)) {
            String html = EntityUtils.toString(response.getEntity(), Consts.UTF_8);
            // use the thread URL as the base URI so "abs:action" can resolve relative links
            Document document = Jsoup.parse(html, url);
            // Discuz's quick-reply form at the bottom of the thread
            Element postForm = document.getElementById("fastpostform");
            // no form (e.g. a closed thread) or "您现在无权发帖" ("you currently have no permission to post")
            if (postForm == null || postForm.toString().contains("您现在无权发帖")) {
                log.warn("No permission to post: " + url);
                return null;
            }
            content = new ReplayContent();
            content.setUrl(url);
            log.debug(postForm.attr("abs:action"));
            content.setAction(postForm.attr("abs:action"));

            // concatenate the text of every post on the page
            StringBuilder message = new StringBuilder();
            for (Element te : document.select("td[id^=postmessage_]")) {
                String temp = te.html().replaceAll("(?is)<.*?>", ""); // strip HTML tags
                if (temp.contains("发表于")) { // "posted at": a quoted reply, keep only the last token
                    String[] me = temp.split("\\s+");
                    temp = me[me.length - 1];
                }
                message.append(temp.replaceAll("\\s+", ""));
            }
            log.debug(message.toString());
            /* Earlier version: use only the last comment in the thread
            Element messageElement = document.select("td[id^=postmessage_]").last();
            String message = messageElement.html().replaceAll("(?is)<.*?>", "");
            */
            // drop &nbsp; and the attachment labels 上传 (upload), 附件 (attachment), 下载 (download)
            content.setMessage(message.toString().replaceAll("&nbsp;", "").replaceAll("上传|附件|下载", ""));
            // copy the hidden fields the reply form must send back
            for (Element input : postForm.getElementsByTag("input")) {
                String name = input.attr("name");
                String value = input.attr("value");
                log.debug(name + ":" + value);
                if (name.equals("posttime")) {
                    content.setPosttime(value);
                } else if (name.equals("formhash")) {
                    content.setFormhash(value);
                } else if (name.equals("usesig")) {
                    content.setUsesig(value);
                } else if (name.equals("subject")) {
                    content.setSubject(value);
                }
            }
        } catch (IOException e) { // also covers ClientProtocolException
            e.printStackTrace();
        }
        return content;
    }
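preReplay strips tags from each post, cuts quoted replies down to their last token, removes whitespace, and finally drops &nbsp; and the attachment labels. Pulled out into pure functions, those steps are easy to check without touching the forum. ReplyText is a hypothetical helper; the Chinese strings are written as Unicode escapes (发表于 "posted at", 上传 "upload", 附件 "attachment", 下载 "download") so the file compiles regardless of source encoding, and they assume the forum uses simplified Chinese.

```java
import java.util.List;

/** preReplay's text clean-up as pure functions (hypothetical helper, not from the original project). */
final class ReplyText {
    // "posted at": marks a quoted reply
    private static final String POSTED_AT = "\u53d1\u8868\u4e8e";
    // attachment labels: "upload", "attachment", "download"
    private static final String LABELS = "\u4e0a\u4f20|\u9644\u4ef6|\u4e0b\u8f7d";

    // Text of one post: tags stripped, quoted replies cut to their last token, whitespace removed.
    static String postText(String postHtml) {
        String text = postHtml.replaceAll("(?is)<.*?>", "");
        if (text.contains(POSTED_AT)) {
            String[] tokens = text.trim().split("\\s+");
            text = tokens[tokens.length - 1];
        }
        return text.replaceAll("\\s+", "");
    }

    // All posts concatenated, then &nbsp; entities and attachment labels dropped.
    static String message(List<String> postsHtml) {
        StringBuilder sb = new StringBuilder();
        for (String html : postsHtml) {
            sb.append(postText(html));
        }
        return sb.toString().replace("&nbsp;", "").replaceAll(LABELS, "");
    }
}
```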

With the URL and the content in hand, time to open the taps:

public void replay(ReplayContent content){
        HttpPost httpost = new HttpPost(content.getAction());
        httpost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        httpost.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
        httpost.setHeader("Cache-Control", "max-age=0");
        httpost.setHeader("Connection", "keep-alive");
        httpost.setHeader("Host", HOST);
        httpost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36");
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("posttime", content.getPosttime()));
        nvps.add(new BasicNameValuePair("formhash", content.getFormhash()));
        nvps.add(new BasicNameValuePair("usesig", content.getUsesig()));
        nvps.add(new BasicNameValuePair("subject", content.getSubject()));
        nvps.add(new BasicNameValuePair("message", content.getMessage()));
        httpost.setEntity(new UrlEncodedFormEntity(nvps, Consts.UTF_8));

        // The response must be consumed or closed: only then does its connection go back
        // to the pool. I missed this at first and the program got stuck here.
        try (CloseableHttpResponse response = globalClient.execute(httpost)) {
            // to debug, print the page instead: System.out.println(EntityUtils.toString(response.getEntity()));
            EntityUtils.consume(response.getEntity());
        } catch (IOException e) { // also covers ClientProtocolException
            e.printStackTrace();
        }
    }
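Why does forgetting EntityUtils.consume() make the program hang? In HttpClient 4.3 a client built with HttpClients.custom()...build() uses a pooling connection manager that by default allows only 2 connections per route, and a connection is returned to the pool only once its response has been consumed or closed. The toy model below is plain java.util.concurrent.Semaphore, not HttpClient code, and PoolModel is a hypothetical name; it just shows the effect: two unconsumed replies use up the route, and a third request cannot get a connection until one is released (the real client waits instead of returning false).

```java
import java.util.concurrent.Semaphore;

/**
 * Toy model, not HttpClient code: a route that allows 2 leased connections at a time,
 * mirroring the default per-route limit of HttpClient 4.3's pooling connection manager.
 */
final class PoolModel {
    private final Semaphore route = new Semaphore(2);

    // Like execute(): lease a connection. Returns false where the real client would block.
    boolean tryExecute() {
        return route.tryAcquire();
    }

    // Like EntityUtils.consume(entity) or response.close(): release the connection.
    void consume() {
        route.release();
    }
}
```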

Of course, this only works on forums without CAPTCHAs; forums that use them are out of reach.

Spamming is harmful. Here is the result after one round of bombardment: [QQ screenshot 20140109224028]

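For the reply content, at first I simply took the last comment in the current thread and replied with it, and got a warning! After that I used the IK Analyzer word segmenter to extract keywords; that code was borrowed from elsewhere, see the link: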

Reference link:

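Shortcomings: no multithreading, and not thoroughly tested.

I'm tidying up the code and will share it as soon as possible.

Future plans: add daily check-in and task features, and turn the @1 loop into automatic discovery.

This is my first post; please point out anything I got wrong.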

------------------------------------------

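Download: http://pan.baidu.com/s/1jGjwA5g

This morning I tidied up the code and am now sharing it with everyone. It is the MyEclipse project packed as-is; extract it and import it directly.

Set your username and password in IKFenci.java and it runs as-is.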
