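This post records a painful day spent scraping information from a website. I had previously simulated the login with node.js, but it did not work well. Since the simulated login is ultimately meant for an Android client, I switched to Java. I ran into quite a few problems along the way, roughly as follows: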
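(1) With httpclient, a GET request to a web page failed.
(2) A GET request to the school's official site succeeded, but the traffic management bureau's site could not be reached (its site uses https, so SSL handling is required).
(3) Once the home page is reachable, use Fiddler to capture the login: look at the URL requested at login and the username and password submitted to it.
(4) The captcha has to be downloaded, typed in on the command line, and then added to the parameter list of the simulated login.
(5) After login succeeds, httpclient manages the session itself, so to fetch any piece of information you just send the corresponding GET or POST request to the corresponding URL.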
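1. First, get to know httpclient: it lets a Java program access a website the way a browser would. Let's start by fetching a site with it, beginning with the home page of the school (NCHU).
First add httpclient-4.3.6.jar and httpcore-4.4.4.jar to the project, then write the program: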
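import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

public class getTest {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        String url = "http://www.nchu.edu.cn/";
        HttpClient httpclient = new DefaultHttpClient();
        HttpGet request = new HttpGet(url);
        HttpResponse response = httpclient.execute(request);

        // Read the response body line by line as UTF-8
        BufferedReader rd = new BufferedReader(
                new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
        StringBuffer result = new StringBuffer();
        String line = "";
        while ((line = rd.readLine()) != null) {
            result.append(line + "\n");
        }
        rd.close();
        System.out.println(result);
    }
}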
I expected this to print the page's HTML, but it threw an error instead:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.http.impl.client.CloseableHttpClient.<init>(CloseableHttpClient.java:60)
    at org.apache.http.impl.client.AbstractHttpClient.<init>(AbstractHttpClient.java:271)
    at org.apache.http.impl.client.DefaultHttpClient.<init>(DefaultHttpClient.java:146)
    at getTest.main(getTest.java:22)
So I searched Baidu and found that one more jar, commons-logging, is needed. After adding it, the page was fetched successfully.
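A NoClassDefFoundError like this one means a jar that HttpClient depends on is missing at run time. A minimal sketch for checking that up front, using only the standard Class.forName call; the class name ClasspathCheck and its helper method are made up for illustration:

```java
final class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isPresent(String className) {
        try {
            // "false" means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the stack trace above.
        String needed = "org.apache.commons.logging.LogFactory";
        System.out.println(needed + " present: " + isPresent(needed));
    }
}
```

If this prints false, the commons-logging jar is not on the classpath yet.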
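2. Good, the first step (fetching the school's site) now works. But trying https://jx.122.gov.cn/ fails with an SSL security certificate error (shown below). So off to Baidu again, where this blog post solved the problem, and the bureau's home page became reachable too: http://blog.csdn.net/rongyongfeikai2/article/details/41659353/ (While writing this post I discovered that http://jx.122.gov.cn/ works as well. Damn, how did I miss that the first time? I spent ages on the https problem, and afterwards simply switched all the URLs to http.)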
Exception in thread "main" javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
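The PKIX error means the JVM's default trust store contains no certificate chain it accepts for this server. The fix from the linked post is not reproduced here. As one possible approach, assuming you have exported the site's certificate to a file, you can build an SSLContext that trusts it using only JDK classes. The file name jx122.cer and the class name TrustStoreSketch are hypothetical:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

final class TrustStoreSketch {

    /** Builds an SSLContext whose trust decisions come only from the given key store. */
    static SSLContext contextFor(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    /** Loads one X.509 certificate from a stream into a fresh in-memory key store. */
    static KeyStore keyStoreWith(InputStream certificateStream) throws Exception {
        Certificate cert = CertificateFactory.getInstance("X.509")
                .generateCertificate(certificateStream);
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        store.load(null, null); // start with an empty store
        store.setCertificateEntry("server", cert);
        return store;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file: export the site's certificate from a browser first.
        try (InputStream in = new FileInputStream("jx122.cer")) {
            SSLContext context = contextFor(keyStoreWith(in));
            System.out.println("Protocol: " + context.getProtocol());
        }
    }
}
```

The resulting context can be handed to the HTTP client in use, for example through HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory()) in the JDK, or through SSLConnectionSocketFactory in HttpClient 4.3. Trusting one known certificate is narrower than disabling certificate checks altogether.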
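3. The home page can now be fetched. In a browser, I would enter the username, password, and captcha, then click the login button, which sends a POST request to log in. So we first use Fiddler to watch the login process and inspect the submitted parameters. The captcha is just an image, so we can download it locally and type it in by hand.
To keep things manageable, we put GET, POST, and captcha fetching into one utility class. When we first request the home page, the server assigns a session, which it uses to recognize later requests from this client as coming from the same user. (I don't fully understand this part myself.)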
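What Fiddler shows for a login like this is typically a form-encoded request body. A rough sketch of how such a body is put together, using only the JDK; the class name FormBodySketch is made up, the field names are the ones used in this post's test class, and the values are placeholders:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBodySketch {

    /** Joins the fields as name=value pairs separated by '&', URL-encoding each part. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("usertype", "1");
        fields.put("systemid", "main");
        fields.put("username", "placeholder-user"); // placeholder value
        fields.put("password", "placeholder-pass"); // placeholder value
        fields.put("captcha", "ab12");              // placeholder value
        System.out.println(encode(fields));
    }
}
```

Comparing this output with the body Fiddler captured is a quick way to spot a missing or misnamed field.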
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class HttpUtil {

    // One shared client, so the session is kept across requests
    public static HttpClient httpclient = new DefaultHttpClient();
    private static String cookies;

    /** GET request: returns the response body as a string. */
    public static String GetPageContent(String url, String cookie)
            throws ClientProtocolException, IOException {
        HttpGet request = new HttpGet(url);
        System.out.println(cookies);
        // Only send a manual Cookie header when one was actually supplied
        if (cookie != null && !cookie.isEmpty()) {
            request.setHeader("Cookie", cookie);
        }
        HttpResponse response = httpclient.execute(request);
        String result = readBody(response);
        rememberCookies(response);
        return result;
    }

    /** Downloads the captcha image to c:\newfile2.png. */
    public static void GetPhotoContent(String url, String cookie)
            throws ClientProtocolException, IOException {
        HttpGet request = new HttpGet(url);
        System.out.println(cookies);
        if (cookie != null && !cookie.isEmpty()) {
            request.setHeader("Cookie", cookie);
        }
        HttpResponse response = httpclient.execute(request);

        // Write the response entity straight to a file
        OutputStream out = new FileOutputStream(new File("c:\\newfile2.png"));
        try {
            response.getEntity().writeTo(out);
            out.flush();
        } finally {
            out.close();
        }
        System.out.println("Check file c:\\newfile2.png");
        rememberCookies(response);
    }

    /** POST request that submits the given parameters as a form. */
    public static String postWithParameters(Map<String, String> map, String postUrl, String cookie)
            throws IOException {
        HttpPost httpost = new HttpPost(postUrl);
        if (cookie != null && !cookie.isEmpty()) {
            httpost.setHeader("Cookie", cookie);
        }

        // Parameter list
        List<NameValuePair> list = new ArrayList<NameValuePair>();
        for (Entry<String, String> elem : map.entrySet()) {
            list.add(new BasicNameValuePair(elem.getKey(), elem.getValue()));
        }
        if (list.size() > 0) {
            httpost.setEntity(new UrlEncodedFormEntity(list, "UTF-8"));
        }

        HttpResponse response = httpclient.execute(httpost);

        // Print the response headers for debugging
        Header[] myheader = response.getAllHeaders();
        for (int i = 0; i < myheader.length; i++) {
            System.out.println(myheader[i]);
        }

        String result = readBody(response);
        System.out.println(result);
        rememberCookies(response);
        return result;
    }

    /** Reads the whole response body as UTF-8 text. */
    private static String readBody(HttpResponse response) throws IOException {
        StringBuffer result = new StringBuffer();
        BufferedReader rd = new BufferedReader(
                new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
        try {
            String line;
            while ((line = rd.readLine()) != null) {
                result.append(line).append("\n");
            }
        } finally {
            rd.close();
        }
        return result.toString();
    }

    /** Stores the value of the first Set-Cookie header, or "" if there is none. */
    private static void rememberCookies(HttpResponse response) {
        Header setCookie = response.getFirstHeader("Set-Cookie");
        setCookies(setCookie == null ? "" : setCookie.getValue());
    }

    public static String getCookies() {
        return cookies;
    }

    public static void setCookies(String cookies) {
        HttpUtil.cookies = cookies;
    }
}
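With the utility class done, note that the captcha URL needs a timestamp appended to it; a random value works too. Then fetch the captcha and save it locally. Also pay attention to the parameters of the POST request.
Now write a test class: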
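The timestamp is there so that every captcha request has a different URL and is not answered from a cache. A small sketch of building such a URL; the class name CaptchaUrlSketch and its helper are made up, while the nocache parameter name and the captcha path are the ones used in this post:

```java
final class CaptchaUrlSketch {

    /** Appends a nocache parameter, using '?' or '&' depending on the base URL. */
    static String withNoCache(String baseUrl, long timestampMillis) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "nocache=" + timestampMillis;
    }

    public static void main(String[] args) {
        // The current time in milliseconds changes on practically every call.
        System.out.println(
                withNoCache("http://jx.122.gov.cn/captcha1", System.currentTimeMillis()));
    }
}
```

A random number in a small range can repeat, so the current time in milliseconds is the safer choice if caching ever becomes a problem.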
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.Scanner;

import org.apache.http.client.ClientProtocolException;

public class getTest {

    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Home page
        String mainurl = "http://jx.122.gov.cn/";
        // Captcha image URL
        String yanzhengmaurl = "http://jx.122.gov.cn/captcha1?nocache=" + new Random().nextInt(1000);
        // URL the username and password are POSTed to
        String postloginurl = "http://jx.122.gov.cn/user/m/login";
        // Page shown after a successful login
        String loginsucess = "http://jx.122.gov.cn/views/member";

        // Visit the home page first so the server assigns a session
        String mainurlStr = HttpUtil.GetPageContent(mainurl, "");
        // System.out.println(mainurlStr);

        // Download the captcha image
        HttpUtil.GetPhotoContent(yanzhengmaurl, "");
        System.out.print("Enter captcha: ");
        Scanner scan = new Scanner(System.in);
        String read = scan.nextLine();
        System.out.println("Input: " + read);

        // POST the login form
        Map<String, String> createMap = new HashMap<String, String>();
        createMap.put("usertype", "1");
        createMap.put("systemid", "main");
        // Username
        createMap.put("username", "");
        // Password
        createMap.put("password", "");
        createMap.put("captcha", read);
        String mypostresult = HttpUtil.postWithParameters(createMap, postloginurl, "");
        // System.out.println(mypostresult);

        // Fetch the page that is only available after login
        String loginsucessstr = HttpUtil.GetPageContent(loginsucess, "");
        System.out.println(loginsucessstr);
    }
}
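Then enter the captcha, and the result is printed:
Look at the captcha image on the C: drive and type it in.
Check the page shown after a successful login: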
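Suppose we now want to fetch our own history records. We just POST to the following URL:
// Traffic violation history URL
String breakLowHistory = "https://jx.122.gov.cn/user/m/uservio/vehssuris";
and that's it. The JSON that comes back matches what Fiddler shows.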
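One last note on cookies: the test class passes an empty cookie string everywhere and leaves session handling to httpclient. If you ever need to send the session cookie by hand, keep in mind that a raw Set-Cookie header carries attributes such as Path that do not belong in a Cookie request header. A sketch of the conversion using the JDK's java.net.HttpCookie; the class name CookieHeaderSketch and the cookie name JSESSIONID are assumptions for illustration, not something observed on this site:

```java
import java.net.HttpCookie;

final class CookieHeaderSketch {

    /** Turns a Set-Cookie header into a value usable in a Cookie request header. */
    static String toCookieHeader(String setCookieHeader) {
        if (setCookieHeader == null || setCookieHeader.trim().isEmpty()) {
            return "";
        }
        StringBuilder value = new StringBuilder();
        // HttpCookie.parse accepts the header with or without the "Set-Cookie:" prefix.
        for (HttpCookie cookie : HttpCookie.parse(setCookieHeader)) {
            if (value.length() > 0) {
                value.append("; ");
            }
            value.append(cookie.getName()).append('=').append(cookie.getValue());
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Hypothetical header, as a servlet container might send it.
        String header = "Set-Cookie: JSESSIONID=abc123; Path=/; HttpOnly";
        System.out.println(toCookieHeader(header));
    }
}
```

The returned string could then be passed as the cookie argument of the HttpUtil methods instead of "".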