Using the HttpClient Connection Pool

1. Introduction to HttpClient

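HttpClient is an Apache subproject: a Java client-side toolkit that supports the HTTP protocol. In real projects it is usually accessed from multiple threads, so several threads may need to use an HttpClient object at the same time. This is similar to database connections, so we need to pool the connections to improve performance.
Starting with version 4.2, HttpClient dropped the earlier SingleClientConnManager and ThreadSafeClientConnManager in favor of BasicClientConnectionManager and PoolingClientConnectionManager. From 4.3 onward these were in turn replaced by BasicHttpClientConnectionManager and PoolingHttpClientConnectionManager. This article uses HttpClient 4.5.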

2. Coding and Testing

2.1 Creating the PoolingHttpClientConnectionManager

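PoolingHttpClientConnectionManager implements the HttpClientConnectionManager interface and, as its name suggests, manages connections in a pool. First create an instance and initialize it when the class is loaded: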

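import java.io.IOException;
import java.security.KeyManagementException;
import java.security.KeyStoreException;
import java.security.NoSuchAlgorithmException;

import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.TrustSelfSignedStrategy;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.ssl.SSLContextBuilder;
import org.apache.http.util.EntityUtils;

public class HttpClientTest {

    // Pooling connection manager shared by the whole class
    private static PoolingHttpClientConnectionManager poolConnManager = null;

    private static CloseableHttpClient httpClient;
    // Default request configuration (timeouts)
    private static RequestConfig requestConfig;

    static {
        try {
            System.out.println("Initializing HttpClientTest: start");
            // Trust self-signed certificates (convenient for testing)
            SSLContextBuilder builder = new SSLContextBuilder();
            builder.loadTrustMaterial(null, new TrustSelfSignedStrategy());
            SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(
                    builder.build());
            // Support both HTTP and HTTPS
            Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder.<ConnectionSocketFactory> create()
                    .register("http", PlainConnectionSocketFactory.getSocketFactory())
                    .register("https", sslsf)
                    .build();
            // Initialize the connection manager
            poolConnManager = new PoolingHttpClientConnectionManager(
                    socketFactoryRegistry);
            // Raise the maximum total connections to 200; in a real project, read this value from a configuration file
            poolConnManager.setMaxTotal(200);
            // Maximum number of connections per route
            poolConnManager.setDefaultMaxPerRoute(2);
            // Initialize requestConfig with the default timeouts (milliseconds)
            int socketTimeout = 10000;
            int connectTimeout = 10000;
            int connectionRequestTimeout = 10000;
            requestConfig = RequestConfig.custom()
                    .setConnectionRequestTimeout(connectionRequestTimeout)
                    .setSocketTimeout(socketTimeout)
                    .setConnectTimeout(connectTimeout)
                    .build();

            // Initialize httpClient
            httpClient = getConnection();

            System.out.println("Initializing HttpClientTest: done");
        } catch (NoSuchAlgorithmException | KeyStoreException | KeyManagementException e) {
            // Fail fast: without the SSL context the shared client would stay null
            throw new ExceptionInInitializerError(e);
        }
    }

    ......

}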

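The code above keeps a single global HttpClient object, which every thread uses to send HTTP requests in a multithreaded setting. At first glance this looks like a thread-safety problem, but the official documentation, confirmed by experiment, shows that it is fine: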

When equipped with a pooling connection manager such as PoolingClientConnectionManager, HttpClient can be used to execute multiple requests simultaneously using multiple threads of execution.

The PoolingClientConnectionManager will allocate connections based on its configuration. If all connections for a given route have already been leased, a request for a connection will block until a connection is released back to the pool. One can ensure the connection manager does not block indefinitely in the connection request operation by setting 'http.conn-manager.timeout' to a positive value. If the connection request cannot be serviced within the given time period ConnectionPoolTimeoutException will be thrown.

While HttpClient instances are thread safe and can be shared between multiple threads of execution, it is highly recommended that each thread maintains its own dedicated instance of HttpContext.
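
The blocking behavior described in the quoted passage can be pictured with a small standard-library model. This is only a sketch under stated assumptions, not HttpClient's implementation: the class LeaseTimeoutSketch and its methods are hypothetical names, a java.util.concurrent.Semaphore stands in for the pool, and a plain TimeoutException stands in for ConnectionPoolTimeoutException. In the article's own code, the wait for a pooled connection is bounded by the connectionRequestTimeout set on RequestConfig.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical toy model: a pool whose lease() waits for a free slot, but only up to a timeout.
final class LeaseTimeoutSketch {
    private final Semaphore slots;
    private final long timeoutMillis;

    LeaseTimeoutSketch(int maxConnections, long timeoutMillis) {
        this.slots = new Semaphore(maxConnections, true);
        this.timeoutMillis = timeoutMillis;
    }

    // Blocks until a slot is free; gives up with an exception once the timeout elapses.
    void lease() throws InterruptedException, TimeoutException {
        if (!slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Timeout waiting for connection from pool");
        }
    }

    // Returns a slot to the pool so that a waiting caller can proceed.
    void release() {
        slots.release();
    }

    int available() {
        return slots.availablePermits();
    }

    // Leases two slots from a pool of two, then tries a third; true if the third lease timed out.
    static boolean thirdLeaseTimesOut() {
        LeaseTimeoutSketch pool = new LeaseTimeoutSketch(2, 50);
        try {
            pool.lease();
            pool.lease();
            pool.lease(); // nothing has been released, so this waits and then times out
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("third lease timed out: " + thirdLeaseTimesOut());
    }
}
```

If a caller invoked release() before the timeout elapsed, the waiting lease() would succeed instead, which is the behavior the pool relies on when responses are closed promptly.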

The getConnection() method builds a thread-safe HttpClient object from the PoolingHttpClientConnectionManager we created. The code is as follows:

public static CloseableHttpClient getConnection() {
    CloseableHttpClient httpClient = HttpClients.custom()
            // Use the pooling connection manager
            .setConnectionManager(poolConnManager)
            // Use the default request configuration
            .setDefaultRequestConfig(requestConfig)
            // Retry count of 0: do not retry failed requests automatically
            .setRetryHandler(new DefaultHttpRequestRetryHandler(0, false))
            .build();

    if (poolConnManager != null && poolConnManager.getTotalStats() != null)
    {
        System.out.println("now client pool "
                + poolConnManager.getTotalStats().toString());
    }

    return httpClient;
}

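As you can see, we use the builder returned by the static HttpClients.custom() method to configure the ConnectionManager and RequestConfig, and then build an HttpClient object. This is the single global, pooled client. If any thread calls close() on it, no other thread can use it to send HTTP requests afterwards.

With the object created, we provide a method for sending GET requests that other threads can call directly: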

public static void httpGet(String url) {
    HttpGet httpGet = new HttpGet(url);
    CloseableHttpResponse response = null;
    try {
        response = httpClient.execute(httpGet);
        HttpEntity entity = response.getEntity();
        String result = EntityUtils.toString(entity, "utf-8");
        EntityUtils.consume(entity);
        System.out.println(result);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (response != null)
                response.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Next, create a thread class for testing. Its code is similar to the httpGet() method above, with a loop and a print statement added:

static class GetThread extends Thread {
    private CloseableHttpClient httpClient;
    private String url;

    public GetThread(CloseableHttpClient client, String url) {
        httpClient = client;
        this.url = url;
    }

    public void run() {
        for(int i = 0; i < 3; i++) {
            HttpGet httpGet = new HttpGet(url);
            CloseableHttpResponse response = null;
            try {
                response = httpClient.execute(httpGet);
                HttpEntity entity = response.getEntity();
                String result = EntityUtils.toString(entity, "utf-8");
                // EntityUtils.consume(entity);
                System.out.println(Thread.currentThread().getName() + " Finished");
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    if (response != null) {
                        response.close();
                    }
                    if (httpGet != null) {
                        httpGet.releaseConnection();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }

            }

        }

    }
}

Now we only need to add the test code to the main function:

public static void main(String[] args) {
    HttpClientTest.httpGet("https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/233_lian_jie_chi_guan_li_qi.html");
    String[] urisToGet = {
            "https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/24_duo_xian_cheng_zhi_xing_qing_qiu.html",
            "https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/24_duo_xian_cheng_zhi_xing_qing_qiu.html",
            "https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/24_duo_xian_cheng_zhi_xing_qing_qiu.html",
            "https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/24_duo_xian_cheng_zhi_xing_qing_qiu.html",
            "https://kmg343.gitbooks.io/httpcl-ient4-4-no2/content/24_duo_xian_cheng_zhi_xing_qing_qiu.html"
    };

    GetThread[] threads = new GetThread[urisToGet.length];
    for (int i = 0; i < threads.length; i++) {
        threads[i] = new GetThread(httpClient, urisToGet[i]);
    }

    for (Thread tmp : threads) {
        tmp.start();
    }
}

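Here we first send one GET request through the static httpGet() method and then start five threads that send requests. If we change poolConnManager.setMaxTotal(200); in the static initializer block to poolConnManager.setMaxTotal(2); the console shows output like the following: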

15:59:40,628 DEBUG PoolingHttpClientConnectionManager:314 - Connection [id: 0][route: {s}->https://kmg343.gitbooks.io:443] can be kept alive indefinitely
15:59:40,629 DEBUG PoolingHttpClientConnectionManager:320 - Connection released: [id: 0][route: {s}->https://kmg343.gitbooks.io:443][total kept alive: 1; route allocated: 1 of 2; total allocated: 1 of 2]
......
15:59:47,401 DEBUG PoolingHttpClientConnectionManager:314 - Connection [id: 0][route: {s}->https://kmg343.gitbooks.io:443] can be kept alive indefinitely
15:59:47,401 DEBUG PoolingHttpClientConnectionManager:320 - Connection released: [id: 0][route: {s}->https://kmg343.gitbooks.io:443][total kept alive: 1; route allocated: 2 of 2; total allocated: 2 of 2]
Thread-4 Finished

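If we keep the pool at 200 connections instead, total allocated: shows 2 of 200. The pool is in effect, and concurrent calls from multiple threads work without problems. If we add an httpClient.close(); statement inside the for loop of the test thread, the console shows an exception after a few connections have been made, reporting that the connection pool has been shut down. Remember to call EntityUtils.toString() or EntityUtils.consume() after handling each response so that the stream is closed. The official documentation explains: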

The difference between closing the content stream and closing the response is that the former will attempt to keep the underlying connection alive by consuming the entity content while the latter immediately shuts down and discards the connection.
When working with streaming entities, one can use the EntityUtils#consume(HttpEntity) method to ensure that the entity content has been fully consumed and the underlying stream has been closed. There can be situations, however, when only a small portion of the entire response content needs to be retrieved and the performance penalty for consuming the remaining content and making the connection reusable is too high, in which case one can terminate the content stream by closing the response.

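In real applications the URLs we access are usually defined in code ahead of time, so there is no need to close the connection. We only need to close or fully consume the stream and let the connection stay alive (keep-alive), so that the next thread to reuse it can skip the TCP three-way handshake.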

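3. Summary

3.1 Maximum connections per route

Earlier we had this line: poolConnManager.setDefaultMaxPerRoute(2); which sets the maximum number of connections per route. A route can be understood as a path from the machine running the client to a target machine. For example, if we use HttpClient to request resources from both www.baidu.com and www.bing.com, two routes are created. Here is an explanation of this parameter found online:

Why single out the per-route maximum? Because its default value is 2. If you do not set it, at most 2 concurrent connections can be open to the same target machine. This means that when you run, say, a crawl against a single target machine, even with the pool's maximum set to 200, only 2 connections actually do any work; the remaining 198 sit idle, reserved for other target machines.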
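
To make the interaction between the two limits concrete, here is a minimal standard-library sketch. It is a toy model rather than HttpClient's implementation: the class RouteLimitSketch and its methods are hypothetical, routes are plain strings, and a refused lease stands in for the point at which a real pool would make the caller wait.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a pool with a total cap and a per-route cap.
final class RouteLimitSketch {
    private final int maxTotal;
    private final int maxPerRoute;
    private final Map<String, Integer> leasedPerRoute = new HashMap<>();
    private int leasedTotal = 0;

    RouteLimitSketch(int maxTotal, int maxPerRoute) {
        this.maxTotal = maxTotal;
        this.maxPerRoute = maxPerRoute;
    }

    // Grants a connection only if both the per-route cap and the total cap allow it.
    synchronized boolean tryLease(String route) {
        int onRoute = leasedPerRoute.getOrDefault(route, 0);
        if (onRoute >= maxPerRoute || leasedTotal >= maxTotal) {
            return false; // a real pool would block the caller here
        }
        leasedPerRoute.put(route, onRoute + 1);
        leasedTotal++;
        return true;
    }

    // Gives a connection back; ignored if nothing is leased on that route.
    synchronized void release(String route) {
        int onRoute = leasedPerRoute.getOrDefault(route, 0);
        if (onRoute == 0) {
            return;
        }
        leasedPerRoute.put(route, onRoute - 1);
        leasedTotal--;
    }

    // Counts how many of `attempts` simultaneous leases on one route are granted.
    static int grantedOnSingleRoute(int maxTotal, int maxPerRoute, int attempts) {
        RouteLimitSketch pool = new RouteLimitSketch(maxTotal, maxPerRoute);
        int granted = 0;
        for (int i = 0; i < attempts; i++) {
            if (pool.tryLease("https://kmg343.gitbooks.io:443")) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(grantedOnSingleRoute(200, 2, 5));  // prints 2
        System.out.println(grantedOnSingleRoute(200, 50, 5)); // prints 5
    }
}
```

With five threads hitting one host, the effective concurrency is the smaller of the two caps, which is why raising only the total to 200 changes nothing for a single-host workload. In HttpClient itself the corresponding settings are setMaxTotal and setDefaultMaxPerRoute on the pooling connection manager.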

3.2 BasicClientConnectionManager

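BasicClientConnectionManager maintains only one active connection internally. Although the class is thread-safe, it is best used repeatedly from a single thread. If several HTTP requests are executed through the same BasicClientConnectionManager and a later request targets the same route as the previous one, the manager uses the same connection for it; otherwise it closes the previous connection and creates a new one for the later request. In other words, BasicClientConnectionManager tries its best to reuse the previous connection (note that creating and destroying connections are both costly).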
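
The reuse rule described above can be sketched in a few lines of plain Java. This is an illustrative model only, not the library's code: SingleConnectionSketch and its methods are hypothetical names, and a route is reduced to a string.

```java
// Hypothetical toy model of a manager that keeps at most one connection alive.
final class SingleConnectionSketch {
    private String currentRoute;   // route of the single kept connection, or null if none yet
    private int connectionsOpened; // how many connections had to be created so far

    // Returns true if the kept connection was reused, false if a new one had to be opened.
    boolean request(String route) {
        if (route.equals(currentRoute)) {
            return true; // same route as the previous request: reuse the connection
        }
        // Different route (or first request): discard the old connection and open a new one.
        currentRoute = route;
        connectionsOpened++;
        return false;
    }

    int connectionsOpened() {
        return connectionsOpened;
    }

    // Runs the given sequence of routes through a fresh manager and reports connections opened.
    static int openedFor(String... routes) {
        SingleConnectionSketch manager = new SingleConnectionSketch();
        for (String route : routes) {
            manager.request(route);
        }
        return manager.connectionsOpened();
    }

    public static void main(String[] args) {
        System.out.println(openedFor("www.baidu.com", "www.baidu.com", "www.baidu.com")); // prints 1
        System.out.println(openedFor("www.baidu.com", "www.bing.com", "www.baidu.com"));  // prints 3
    }
}
```

Alternating between hosts defeats the reuse entirely, which is one reason a pooling manager is the better fit when several routes are in play.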

