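HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. The JDK's java.net package already provides basic HTTP functionality, but for most applications the JDK library on its own is not rich or flexible enough. HttpClient began as a subproject of Apache Jakarta Commons (the 4.x line is now developed under Apache HttpComponents). It provides an efficient, up-to-date, feature-rich client-side programming toolkit for HTTP and supports the latest versions and recommendations of the protocol. HttpClient is already used in many projects; for example, the well-known open-source projects Cactus and HtmlUnit both use it. The latest version at the time of writing is HttpClient 4.3.4 (2014-06-22).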
(Quoted from Baidu Baike)
Put simply, HttpClient is an Apache jar that wraps HTTP.
Below I show how to use GET/POST requests to log in to the China Unicom website and scrape a user's basic account information and bill data.
My environment is JDK 1.7, IntelliJ IDEA 13.0, Ubuntu 12.04, Maven and HttpClient 4.3.4. First, create a Maven project:
As shown in the figure, select the quickstart archetype (maven-archetype-quickstart).
Then click Next through the remaining steps.
Once the project is created, it looks like the figure below:
Double-click pom.xml and add the required dependency:
```xml
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.4</version>
</dependency>
```
Maven automatically downloads the other jars this depends on; the full set of jars actually required is shown below:
China Unicom offers two ways to log in:
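The difference between the two screens above is that one requires a captcha and the other does not. We will handle the login without a captcha first.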
```java
package com.amos;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URLEncoder;

/**
 * @author amosli
 * Log in to China Unicom and scrape account data
 */
public class LoginChinaUnicom {

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        String name = "China Unicom mobile number";
        String pwd = "phone service password";
        // URL-encode the credentials so special characters in the password survive
        String url = "https://uac.10010.com/portal/Service/MallLogin?callback=jQuery17202691898950318097_1403425938090&redirectURL=http%3A%2F%2Fwww.10010.com&userName="
                + URLEncoder.encode(name, "UTF-8") + "&password=" + URLEncoder.encode(pwd, "UTF-8")
                + "&pwdType=01&productType=01&redirectType=01&rememberMe=1";

        // DefaultHttpClient is deprecated in 4.3 but still works; it keeps cookies between requests
        HttpClient httpClient = new DefaultHttpClient();
        HttpGet httpGet = new HttpGet(url);
        HttpResponse loginResponse = httpClient.execute(httpGet);
        if (loginResponse.getStatusLine().getStatusCode() == 200) {
            for (Header head : loginResponse.getAllHeaders()) {
                System.out.println(head);
            }
            HttpEntity loginEntity = loginResponse.getEntity();
            String loginEntityContent = EntityUtils.toString(loginEntity);
            System.out.println("Login status: " + loginEntityContent);
            // Login succeeded
            if (loginEntityContent.contains("resultCode:\"0000\"")) {
                // Months to fetch
                String[] months = new String[]{"201401", "201402", "201403", "201404", "201405"};
                for (String month : months) {
                    String billurl = "http://iservice.10010.com/ehallService/static/historyBiil/execute/YH102010002/QUERY_YH102010002.processData/QueryYH102010002_Data/" + month + "/undefined";
                    HttpPost httpPost = new HttpPost(billurl);
                    HttpResponse billresponse = httpClient.execute(httpPost);
                    if (billresponse.getStatusLine().getStatusCode() == 200) {
                        saveToLocal(billresponse.getEntity(), "chinaunicom.bill." + month + ".2.html");
                    } else {
                        // Release the connection so the next request can reuse it
                        EntityUtils.consume(billresponse.getEntity());
                    }
                }
            }
        }
        httpClient.getConnectionManager().shutdown();
    }

    // saveToLocal(HttpEntity, String) is listed separately below
}
```
First find the login URL and the parameters it expects. I obviously won't be sharing my own phone number and service password here.
Create a DefaultHttpClient and send the login request with GET. If the login succeeds, the response contains resultCode "0000".
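As a small illustration of that login step, the sketch below assembles the MallLogin URL with URL-encoded credentials and applies the same resultCode check. It is a hypothetical helper: LoginUrlSketch, buildLoginUrl and isLoginSuccess are names made up here. It uses only the JDK. For brevity it leaves out the jQuery callback parameter from the original URL, and I have not confirmed whether the server requires that parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/**
 * Hypothetical helper, not part of the original code: builds the MallLogin
 * URL with URL-encoded values and checks the response text for resultCode "0000".
 */
class LoginUrlSketch {

    static final String LOGIN_BASE = "https://uac.10010.com/portal/Service/MallLogin";

    // Encode one query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Same fixed parameters as the article's URL, minus the jQuery callback (assumption).
    static String buildLoginUrl(String name, String pwd) {
        return LOGIN_BASE
                + "?redirectURL=" + enc("http://www.10010.com")
                + "&userName=" + enc(name)
                + "&password=" + enc(pwd)
                + "&pwdType=01&productType=01&redirectType=01&rememberMe=1";
    }

    // Same success test as the article: the body contains resultCode:"0000".
    static boolean isLoginSuccess(String responseBody) {
        return responseBody != null && responseBody.contains("resultCode:\"0000\"");
    }

    public static void main(String[] args) {
        System.out.println(buildLoginUrl("13000000000", "p@ss&word"));
        System.out.println(isLoginSuccess("jQuery1({resultCode:\"0000\"})")); // prints true
    }
}
```

Encoding matters mainly for the password: an unencoded `&` or `@` would split or corrupt the query string.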
Then send an HttpPost request for each month's bill and write each response to a local file.
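The bill loop in LoginChinaUnicom hard-codes the months "201401" to "201405". Here is a minimal sketch that generates consecutive months instead. It assumes the yyyyMM form seen in those values is all the bill URL needs. BillMonths and months() are hypothetical names, and the code avoids java.time to stay Java 7 compatible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: produce consecutive billing months in yyyyMM form.
 */
class BillMonths {

    // Returns `count` consecutive months starting at year/month (month is 1-12).
    static List<String> months(int year, int month, int count) {
        List<String> result = new ArrayList<String>();
        int y = year;
        int m = month;
        for (int i = 0; i < count; i++) {
            // Locale.ROOT keeps the digits ASCII regardless of the default locale
            result.add(String.format(Locale.ROOT, "%04d%02d", y, m));
            m++;
            if (m > 12) { // roll over into January of the next year
                m = 1;
                y++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(months(2014, 1, 5)); // prints [201401, 201402, 201403, 201404, 201405]
    }
}
```

Each returned value can be substituted for `month` when building `billurl`.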
```java
    /**
     * Write an HTTP entity to a local file
     *
     * @param httpEntity the response entity to save
     * @param filename   name of the file inside the output directory
     */
    public static void saveToLocal(HttpEntity httpEntity, String filename) {
        File dir = new File("/home/amosli/workspace/chinaunicom/");
        if (!dir.isDirectory()) {
            dir.mkdirs(); // also creates any missing parent directories
        }
        File file = new File(dir, filename);
        // try-with-resources (Java 7) closes both streams even if an exception is thrown;
        // FileOutputStream creates the file if it does not exist yet
        try (InputStream inputStream = httpEntity.getContent();
             FileOutputStream fileOutputStream = new FileOutputStream(file)) {
            byte[] bytes = new byte[1024];
            int length;
            while ((length = inputStream.read(bytes)) != -1) {
                fileOutputStream.write(bytes, 0, length);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
```
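If you only want to print the content, you can use EntityUtils.toString(HttpEntity entity). It delegates to the overload that also takes a default Charset, whose source is as follows: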
```java
public static String toString(
        final HttpEntity entity, final Charset defaultCharset) throws IOException, ParseException {
    Args.notNull(entity, "Entity");
    final InputStream instream = entity.getContent();
    if (instream == null) {
        return null;
    }
    try {
        Args.check(entity.getContentLength() <= Integer.MAX_VALUE,
                "HTTP entity too large to be buffered in memory");
        int i = (int)entity.getContentLength();
        if (i < 0) {
            i = 4096;
        }
        Charset charset = null;
        try {
            final ContentType contentType = ContentType.get(entity);
            if (contentType != null) {
                charset = contentType.getCharset();
            }
        } catch (final UnsupportedCharsetException ex) {
            throw new UnsupportedEncodingException(ex.getMessage());
        }
        if (charset == null) {
            charset = defaultCharset;
        }
        if (charset == null) {
            charset = HTTP.DEF_CONTENT_CHARSET;
        }
        final Reader reader = new InputStreamReader(instream, charset);
        final CharArrayBuffer buffer = new CharArrayBuffer(i);
        final char[] tmp = new char[1024];
        int l;
        while((l = reader.read(tmp)) != -1) {
            buffer.append(tmp, 0, l);
        }
        return buffer.toString();
    } finally {
        instream.close();
    }
}
```
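The implementation is fairly easy to follow. If the entity's Content-Type header declares a charset, that charset is used. Otherwise the defaultCharset argument is used, and if that is null as well, HTTP.DEF_CONTENT_CHARSET (ISO-8859-1) applies. So you can pass a charset or leave it out.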
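To make that fallback order concrete, here is a simplified, JDK-only sketch; it is not HttpCore's code. It reads the charset parameter out of a raw Content-Type header string, whereas the library's ContentType.get does proper parsing. It then falls back to the caller's default, and finally to ISO-8859-1, the value of HTTP.DEF_CONTENT_CHARSET. CharsetFallback and resolve() are hypothetical names.

```java
import java.nio.charset.Charset;

/**
 * Simplified sketch of EntityUtils.toString's charset choice:
 * Content-Type charset, then the caller's default, then ISO-8859-1.
 */
class CharsetFallback {

    static Charset resolve(String contentType, Charset defaultCharset) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                // Parameter names are case-insensitive ("charset", "Charset", ...)
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    if (!name.isEmpty()) {
                        // Throws (unchecked) for unknown names, roughly like the library,
                        // which turns that case into an UnsupportedEncodingException
                        return Charset.forName(name);
                    }
                }
            }
        }
        if (defaultCharset != null) {
            return defaultCharset;
        }
        return Charset.forName("ISO-8859-1"); // same value as HTTP.DEF_CONTENT_CHARSET
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/html; charset=UTF-8", null).name()); // prints UTF-8
        System.out.println(resolve("application/json", null).name());         // prints ISO-8859-1
    }
}
```

In practice, if a response omits the charset, any Chinese text in it is decoded as ISO-8859-1 and comes out garbled. Passing a default such as `EntityUtils.toString(entity, "UTF-8")` avoids that, assuming the server really sends UTF-8.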
```java
package com.amos;

import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.cookie.Cookie;
import org.apache.http.impl.client.*;
import org.apache.http.util.EntityUtils;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLEncoder;

/**
 * Created by amosli on 14-6-22.
 */
public class LoginWithCaptcha {

    public static void main(String[] args) throws Exception {
        // URL that generates the captcha image
        String createCaptchaUrl = "http://uac.10010.com/portal/Service/CreateImage";
        HttpClient httpClient = new DefaultHttpClient();
        String name = "China Unicom mobile number";
        String pwd = "phone service password";

        // Cookie store for the second client; custom cookies can be added here
        CookieStore cookieStore = new BasicCookieStore();
        CloseableHttpClient httpclient = HttpClients.custom()
                .setDefaultCookieStore(cookieStore)
                .build();

        // Fetch the captcha image
        HttpGet captchaHttpGet = new HttpGet(createCaptchaUrl);
        HttpResponse captchaResponse = httpClient.execute(captchaHttpGet);
        if (captchaResponse.getStatusLine().getStatusCode() == 200) {
            // Save the captcha image locally so it can be read by hand
            LoginChinaUnicom.saveToLocal(captchaResponse.getEntity(), "chinaunicom.captcha."
                    + System.currentTimeMillis());
        } else {
            // Release the connection before httpClient is used again
            EntityUtils.consume(captchaResponse.getEntity());
        }

        // Type the captcha by hand and verify it
        HttpResponse verifyResponse = null;
        String captcha = null;
        String uvc = null;
        // 1) Read keyboard input; create the reader once, outside the loop,
        //    so input it has already buffered is not lost between attempts
        InputStream inputStream = System.in;
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
        // 2) Alternatively: Scanner scanner = new Scanner(System.in); captcha = scanner.next();
        do {
            System.out.println("Please enter the captcha:");
            captcha = bufferedReader.readLine();
            String verifyCaptchaUrl = "http://uac.10010.com/portal/Service/CtaIdyChk?verifyCode="
                    + captcha + "&verifyType=1";
            HttpGet verifyCaptchaGet = new HttpGet(verifyCaptchaUrl);
            verifyResponse = httpClient.execute(verifyCaptchaGet);
            // getCookieStore() is defined on AbstractHttpClient, hence the cast
            AbstractHttpClient abstractHttpClient = (AbstractHttpClient) httpClient;
            for (Cookie cookie : abstractHttpClient.getCookieStore().getCookies()) {
                System.out.println(cookie.getName() + ":" + cookie.getValue());
                if (cookie.getName().equals("uacverifykey")) {
                    uvc = cookie.getValue();
                }
            }
        } while (!EntityUtils.toString(verifyResponse.getEntity()).contains("true"));

        // Log in
        String loginurl = "https://uac.10010.com/portal/Service/MallLogin?userName="
                + URLEncoder.encode(name, "UTF-8") + "&password=" + URLEncoder.encode(pwd, "UTF-8")
                + "&pwdType=01&productType=01&verifyCode=" + captcha
                + "&redirectType=03&uvc=" + uvc;
        HttpGet loginGet = new HttpGet(loginurl);
        CloseableHttpResponse loginResponse = httpclient.execute(loginGet);
        try {
            System.out.print("loginResponse:" + EntityUtils.toString(loginResponse.getEntity()));
        } finally {
            loginResponse.close();
        }

        // Scrape the basic account information
        HttpPost basicHttpPost = new HttpPost("http://iservice.10010.com/ehallService/static/acctBalance/execute/YH102010005/QUERY_AcctBalance.processData/Result");
        LoginChinaUnicom.saveToLocal(httpclient.execute(basicHttpPost).getEntity(), "chinaunicom.basic.html");
        httpclient.close();
    }
}
```
There are two tricky parts here: the captcha and the uvc code.
The captcha is the easier one: the image is saved to a local file and the code is typed in by hand.
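The captcha loop in LoginWithCaptcha could be factored into a small helper. The sketch below is a hypothetical refactoring, not the author's code. CaptchaPrompt, readUntilAccepted and the Verifier interface are names made up here, and the code uses an anonymous class instead of lambdas to stay Java 7 compatible. In the article, the verification step corresponds to the CtaIdyChk request, whose response body must contain "true".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Sketch of "keep asking for the captcha until the server accepts it".
 */
class CaptchaPrompt {

    // Hypothetical callback standing in for the CtaIdyChk verification request.
    interface Verifier {
        boolean verify(String captcha);
    }

    // Reads one line per attempt; returns the accepted captcha,
    // or null if the input ends before any captcha is accepted.
    static String readUntilAccepted(BufferedReader in, Verifier verifier) throws IOException {
        while (true) {
            System.out.println("Please enter the captcha:");
            String line = in.readLine();
            if (line == null) {
                return null; // end of input
            }
            String captcha = line.trim();
            if (!captcha.isEmpty() && verifier.verify(captcha)) {
                return captcha;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // One reader for the whole loop, so buffered keyboard input is not lost
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String accepted = readUntilAccepted(in, new Verifier() {
            public boolean verify(String captcha) {
                return captcha.length() == 4; // placeholder check, assumption only
            }
        });
        System.out.println("accepted: " + accepted);
    }
}
```

Blank lines are skipped rather than sent to the server, and a null return lets the caller stop cleanly when stdin is closed.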
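The uvc code is essential, and it lives in a cookie (uacverifykey). I spent a long time searching online for how to read cookies with HttpClient without finding anything. Only by reading the source did I find the answer: cast the client to AbstractHttpClient and call getCookieStore().getCookies().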
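The article reads uacverifykey from HttpClient's cookie store. Here is an alternative sketch that uses only the JDK: java.net.HttpCookie.parse pulls the same value out of raw Set-Cookie header values. In HttpClient, those raw values would come from response.getHeaders("Set-Cookie"). CookieLookup and findCookie are hypothetical names, and the sample header values are made up.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.List;

/**
 * JDK-only sketch: find a cookie's value by name in Set-Cookie header values.
 */
class CookieLookup {

    // Returns the value of the named cookie, or null if none of the headers set it.
    static String findCookie(List<String> setCookieHeaders, String name) {
        for (String header : setCookieHeaders) {
            // parse() accepts values with or without a leading "Set-Cookie:" token
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                if (cookie.getName().equals(name)) {
                    return cookie.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
                "Set-Cookie: JSESSIONID=abc123; Path=/",
                "Set-Cookie: uacverifykey=xyz789; Path=/");
        System.out.println(findCookie(headers, "uacverifykey")); // prints xyz789
    }
}
```

The cookie-store approach in the article is simpler when the client already manages cookies. Parsing headers yourself is mainly useful when you want to see exactly which response set the value.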
Bill data (this is JSON, so it may not be very easy to read):