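For the past couple of days I have been getting ready to write my own web crawler, so I went looking for material. I found a good tutorial that used HttpClient, downloaded the jar from Apache, and set out to write my own code, but I simply could not find the classes the tutorial referred to. A look at the Apache site explained why: the tutorial used an older Apache open-source project called Commons HttpClient. That project was retired by Apache long ago and no longer gets new releases. It has been replaced by HttpClient and HttpCore from the Apache HttpComponents project, which are said to offer better performance and flexibility, although I have no first-hand experience of that yet. Since the old one is retired, I might as well learn the new one and keep up with the times.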
I downloaded the PDF tutorial from the official site, so let's start with the first example.
First, add three jars to the classpath: httpclient-4.3.3.jar, httpcore-4.3.2.jar, and commons-logging.jar.
Now for the code.
GET request:
    package com.lu.test;

    import java.net.URI;

    import org.apache.http.Header;
    import org.apache.http.HeaderIterator;
    import org.apache.http.StatusLine;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.utils.URIBuilder;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class HttpClientTest {

        public static void main(String[] args) throws Exception {
            // Build http://www.baidu.com/?name=**** piece by piece
            URI uri = new URIBuilder().setScheme("http").setHost("www.baidu.com")
                    .setPath("/").setParameter("name", "****").build();

            // Create the client object (roughly like opening a browser)
            CloseableHttpClient client = HttpClients.createDefault();
            try {
                // Create a GET request
                HttpGet httpGet = new HttpGet(uri);
                // Execute the request; this method returns a response object
                CloseableHttpResponse response = client.execute(httpGet);
                try {
                    // The request method
                    System.out.println("request method : " + httpGet.getMethod());
                    System.out.println("-------------------------------");

                    // The status line of the response. StatusLine is an interface;
                    // getStatusLine() returns an object that implements it
                    StatusLine statusLine = response.getStatusLine();
                    System.out.println(statusLine.getProtocolVersion());
                    System.out.println(statusLine.getStatusCode());
                    System.out.println(statusLine.getReasonPhrase());
                    System.out.println("-------------------------------");

                    // getAllHeaders() would return every response header as an array:
                    // for (Header h : response.getAllHeaders()) {
                    //     System.out.println(h.getName() + " : " + h.getValue());
                    // }
                    // Here the headers are walked with an iterator instead
                    HeaderIterator iter = response.headerIterator();
                    while (iter.hasNext()) {
                        Header header = iter.nextHeader();
                        System.out.println(header.getName() + " : " + header.getValue());
                    }
                } finally {
                    // Always close the response
                    response.close();
                }
            } finally {
                // Always close the client
                client.close();
            }
        }
    }
Output:

    request method : GET
    -------------------------------
    HTTP/1.1
    200
    OK
    -------------------------------
    Date : Sat, 15 Mar 2014 15:26:21 GMT
    Content-Type : text/html
    Transfer-Encoding : chunked
    Connection : Keep-Alive
    Vary : Accept-Encoding
    Set-Cookie : BAIDUID=98D1A9B265CFFD5D549FAF1B3AF80EFA:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
    Set-Cookie : BDSVRTM=11; path=/
    Set-Cookie : H_PS_PSSID=5489_5229_1431_5223_5460_4261_5568_4760_5516; path=/; domain=.baidu.com
    P3P : CP=" OTI DSP COR IVA OUR IND COM "
    Expires : Sat, 15 Mar 2014 15:26:21 GMT
    Cache-Control : private
    Server : BWS/1.1
    BDPAGETYPE : 1
    BDQID : 0xe06a0fbd00182376
    BDUSERID : 0
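The three values read from StatusLine are simply the three space-separated parts of the first line of an HTTP response. To make that concrete, here is a rough plain-Java sketch that splits such a line by hand. It is only an illustration, not how HttpClient parses responses internally, and the class name StatusLineSketch is made up for this example.

```java
/** Hypothetical helper for illustration only; not part of HttpClient. */
class StatusLineSketch {
    final String protocolVersion;
    final int statusCode;
    final String reasonPhrase;

    StatusLineSketch(String protocolVersion, int statusCode, String reasonPhrase) {
        this.protocolVersion = protocolVersion;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
    }

    /** Splits "HTTP/1.1 200 OK" into version, code, and reason phrase. */
    static StatusLineSketch parse(String line) {
        // Limit 3 keeps reason phrases that contain spaces ("Not Found") in one piece
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        // The reason phrase is optional
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLineSketch(parts[0], code, reason);
    }

    public static void main(String[] args) {
        StatusLineSketch status = parse("HTTP/1.1 200 OK");
        System.out.println(status.protocolVersion);
        System.out.println(status.statusCode);
        System.out.println(status.reasonPhrase);
    }
}
```

In real code there is no reason to do this yourself; response.getStatusLine() already hands back the parsed pieces.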
POST request:
    package com.lu.test;

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.http.HttpEntity;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.util.EntityUtils;

    public class HttpClientTest {

        public static void main(String[] args) throws Exception {
            CloseableHttpClient httpClient = HttpClients.createDefault();
            // Holds the form data
            List<NameValuePair> form = new ArrayList<NameValuePair>();
            try {
                // Add the key/value pairs
                form.add(new BasicNameValuePair("username", "****"));
                form.add(new BasicNameValuePair("password", "********"));

                // Turn the form into an entity
                UrlEncodedFormEntity entity = new UrlEncodedFormEntity(form, "UTF-8");

                HttpPost httpPost = new HttpPost(
                        "http://localhost:8080/spiderweb/RirectServlet");
                // Attach the entity to the POST request
                httpPost.setEntity(entity);

                CloseableHttpResponse httpResponse = httpClient.execute(httpPost);
                try {
                    // Read the whole response body as a string
                    HttpEntity responseEntity = httpResponse.getEntity();
                    String content = EntityUtils.toString(responseEntity);
                    System.out.println(content);
                } finally {
                    httpResponse.close();
                }
            } finally {
                httpClient.close();
            }
        }
    }
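To get a feel for what the form entity puts in the request body, here is a small plain-Java sketch that builds an application/x-www-form-urlencoded string with java.net.URLEncoder. It is my own approximation of the wire format, not HttpClient's implementation; the class name FormBodySketch and the sample values are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of HttpClient. */
class FormBodySketch {

    /** Joins the fields as name=value pairs separated by '&', URL-encoding each part. */
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(urlEncode(field.getKey()))
                .append('=')
                .append(urlEncode(field.getValue()));
        }
        return body.toString();
    }

    private static String urlEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every JVM, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> form = new LinkedHashMap<String, String>();
        form.put("username", "alice");      // placeholder value
        form.put("password", "p@ss word");  // placeholder value
        System.out.println(encode(form));   // prints username=alice&password=p%40ss+word
    }
}
```

The body sent by UrlEncodedFormEntity should look much like this, which is why the servlet on the other end can read the fields as ordinary request parameters.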
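Handling the response with a ResponseHandler: a ResponseHandler guarantees that the underlying HTTP connection is released back to the connection manager in every case, which simplifies the code.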
    package com.lu.test;

    import java.io.IOException;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.StatusLine;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.ResponseHandler;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class HttpClientTest {

        public static void main(String[] args) throws Exception {
            CloseableHttpClient httpClient = HttpClients.createDefault();
            try {
                HttpGet httpGet = new HttpGet("http://www.baidu.com");

                // Create a ResponseHandler to process the response
                ResponseHandler<String> handler = new ResponseHandler<String>() {

                    @Override
                    public String handleResponse(HttpResponse response)
                            throws ClientProtocolException, IOException {
                        StatusLine statusLine = response.getStatusLine();
                        System.out.println(statusLine.getStatusCode());

                        HttpEntity entity = response.getEntity();
                        if (null != entity) {
                            return EntityUtils.toString(entity);
                        }
                        return null;
                    }
                };

                // Execute the request and pass in the handler to process the response
                String content = httpClient.execute(httpGet, handler);
                System.out.println(content);
            } finally {
                // The handler takes care of the response; the client itself still needs closing
                httpClient.close();
            }
        }
    }
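The idea behind the handler style is that the library, not the caller, owns the try/finally. The following plain-Java sketch shows that pattern with a fake connection. Everything in it (FakeConnection, HandlerPatternSketch, execute) is hypothetical and only mimics the shape of the idea; it says nothing about how HttpClient is actually implemented.

```java
import java.util.function.Function;

/** Stand-in for a pooled connection; purely hypothetical, not an HttpClient type. */
class FakeConnection implements AutoCloseable {
    boolean released = false;

    String readBody() {
        return "hello";
    }

    @Override
    public void close() {
        // Pretend the connection goes back to the pool
        released = true;
    }
}

class HandlerPatternSketch {

    /** Runs the handler on the body and releases the connection no matter what. */
    static <T> T execute(FakeConnection connection, Function<String, T> handler) {
        try {
            return handler.apply(connection.readBody());
        } finally {
            // Runs on normal return and when the handler throws
            connection.close();
        }
    }

    public static void main(String[] args) {
        FakeConnection ok = new FakeConnection();
        int length = execute(ok, String::length);
        System.out.println(length + " released=" + ok.released);

        FakeConnection failing = new FakeConnection();
        try {
            execute(failing, body -> {
                throw new IllegalStateException("handler failed");
            });
        } catch (IllegalStateException e) {
            System.out.println("released=" + failing.released);
        }
    }
}
```

Compared with the GET and POST examples, the caller no longer has to remember the inner try/finally around the response.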
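About HttpClient: in a code comment I said that creating an HttpClient object is like opening a browser. That is actually not accurate.
Here is what the official tutorial says: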
HttpClient is NOT a browser. It is a client side HTTP transport library. HttpClient's purpose is
to transmit and receive HTTP messages. HttpClient will not attempt to process content, execute
javascript embedded in HTML pages, try to guess content type, if not explicitly set, or reformat
request / redirect location URIs, or other functionality unrelated to the HTTP transport.
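In other words, HttpClient is not a browser; it is a client-side HTTP transport library whose job is to send and receive HTTP messages. It will not try to process content, execute JavaScript embedded in HTML pages, guess the content type, reformat request or redirect URIs, or do anything else unrelated to HTTP transport.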
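One practical consequence for a crawler: when a response only says Content-Type : text/html with no charset, as the Baidu response above does, the caller has to decide how to decode the body. The sketch below is plain Java that pulls the charset out of a Content-Type value and falls back to a caller-chosen default. The class name CharsetSketch and the fallback choice are my own assumptions, not anything from the tutorial.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of HttpClient. */
class CharsetSketch {

    /** Returns the charset named in a Content-Type value, or the fallback if none is usable. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetOf("text/html; charset=utf-8", fallback)); // prints UTF-8
        System.out.println(charsetOf("text/html", fallback));                // prints ISO-8859-1
    }
}
```

With the real library, the same decision is made by passing an explicit default charset when turning the entity into a string, instead of relying on the built-in default.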
The remaining methods, including those on the request and response objects, are very simple, so I won't explain them in detail.