Has JedisPool caused another incident?

1、Incident summary

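Our company accesses Redis through JedisPool. When a Redis instance becomes unreachable, it is put on a blacklist. A background thread periodically scans the blacklist and restores any instance that is reachable again. Each check builds a new JedisPool and tests reachability with jedisPool.getResource().close(). Because the check is periodic, every round constructs yet another JedisPool, and those pools were configured with minIdle set to 1. That is the latent bug. If Redis stays unreachable for a long time, a large number of JedisPool instances pile up. Every JedisPool has a periodic background eviction task (it destroys connections that have been idle too long, then creates new ones so that the pool still holds minIdle idle connections), so once Redis recovers, all of those pools start creating connections.

The server's maximum connection limit is then reached, and normal requests on new connections receive ERR max number of clients reached from the server and throw an exception. Note that although the client gets an error, from the client's point of view the connection was established: the client sent its request, and the error arrived while it was reading the server's reply. On the server side, Redis closes right away any connection that would push it past the maximum connection limit.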

Caused by: redis.clients.jedis.exceptions.JedisDataException: ERR max number of clients reached
        at redis.clients.jedis.Protocol.processError(Protocol.java:130)
        at redis.clients.jedis.Protocol.process(Protocol.java:164)
        at redis.clients.jedis.Protocol.read(Protocol.java:218)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:341)
        at redis.clients.jedis.Connection.getBinaryMultiBulkReply(Connection.java:277)
        at redis.clients.jedis.BinaryJedis.mget(BinaryJedis.java:606)

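One question: why does the log also contain requests that failed on write? Shouldn't the connections that were established normally be able to write data? The connections that hit the max-clients error have already been discarded and cannot be used by the client again. Does the server have some logic that cleans up connections?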

Caused by: java.net.SocketException: Connection reset by peer (Write failed)
        at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
        at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
        at redis.clients.util.RedisOutputStream.flushBuffer(RedisOutputStream.java:52)
        at redis.clients.util.RedisOutputStream.flush(RedisOutputStream.java:216)
        at redis.clients.jedis.Connection.flush(Connection.java:332)
        ... 30 more
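
To put a rough number on the accumulation described in this section, the sketch below estimates how many connections the leaked pools open once Redis comes back. The outage length and probe interval are hypothetical values chosen for illustration (the original does not state them), and LeakEstimate is a made-up name; the only input taken from the text is minIdle = 1.

```java
public class LeakEstimate {
    /**
     * Estimates the connections opened by leaked pools after recovery,
     * assuming one new pool per probe round and that no pool is ever destroyed.
     */
    static long leakedConnections(long outageSeconds, long probeIntervalSeconds, int minIdle) {
        long leakedPools = outageSeconds / probeIntervalSeconds; // one pool per probe round
        return leakedPools * minIdle;                            // each pool keeps minIdle connections
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 1-hour outage probed every 10 seconds, minIdle = 1.
        System.out.println(leakedConnections(3600, 10, 1)); // prints 360
    }
}
```

The figure is per blacklisted instance and per client process, so a fleet of clients multiplies it.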

2、The eviction thread

1. Creating the evictor

/**
 * Create a new <code>GenericObjectPool</code> using a specific
 * configuration.
 *
 * @param factory The object factory to be used to create object instances
 *                used by this pool
 * @param config  The configuration to use for this pool instance. The
 *                configuration is used by value. Subsequent changes to
 *                the configuration object will not be reflected in the
 *                pool.
 */
public GenericObjectPool(PooledObjectFactory<T> factory, GenericObjectPoolConfig config) {
    // Remember the earlier JMX problem?
    super(config, ONAME_BASE, config.getJmxNamePrefix());

    if (factory == null) {
        jmxUnregister(); // tidy up
        throw new IllegalArgumentException("factory may not be null");
    }
    this.factory = factory;

    idleObjects = new LinkedBlockingDeque<PooledObject<T>>(config.getFairness());

    setConfig(config);
    // The evictor is started here
    startEvictor(getTimeBetweenEvictionRunsMillis());
}

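As you can see, the evictor is started in the constructor. In other words, every new JedisPool comes with its own eviction task that runs periodically (the tasks run on a timer shared by all pools, as the Evictor comment further down notes). Recall that this same constructor also registers the pool with JMX, which caused a different problem: new JedisPool can be very slow.

2. How the evictor is started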

/**
 * <p>Starts the evictor with the given delay. If there is an evictor
 * running when this method is called, it is stopped and replaced with a
 * new evictor with the specified delay.</p>
 *
 * <p>This method needs to be final, since it is called from a constructor.
 * See POOL-195.</p>
 *
 * @param delay time in milliseconds before start and between eviction runs
 */
final void startEvictor(long delay) {
    synchronized (evictionLock) {
        if (null != evictor) {
            EvictionTimer.cancel(evictor);
            evictor = null;
            evictionIterator = null;
        }
        if (delay > 0) {
            evictor = new Evictor();
            EvictionTimer.schedule(evictor, delay, delay);
        }
    }
}

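The comment makes two points clear:

  • If an eviction task already exists, it is cancelled.
    • When the goal is only to cancel, the delay argument is usually -1, so no new eviction task is scheduled.
    • Think about it: have you ever cancelled it in your own code? If not, what goes wrong?
  • If delay is positive, a new eviction task is scheduled with that period.
    • With JedisPoolConfig the period defaults to 30s.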
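
The effect of scheduling in the constructor and cancelling only on close can be reproduced without Jedis. The following is a minimal JDK-only sketch: MiniPool, LIVE_TASKS and TIMER are hypothetical stand-ins, not commons-pool classes. Each MiniPool registers a periodic task when it is constructed and cancels it only in close(), which is roughly the role a negative delay plays in startEvictor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EvictorLeakDemo {
    // One shared daemon timer thread for every pool (hypothetical stand-in for a shared eviction timer).
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "demo-evictor");
                t.setDaemon(true);
                return t;
            });

    // Number of periodic tasks that are currently scheduled and not cancelled.
    static final AtomicInteger LIVE_TASKS = new AtomicInteger();

    static final class MiniPool implements AutoCloseable {
        private final ScheduledFuture<?> evictor;
        private boolean closed;

        MiniPool(long periodMillis) {
            // Scheduling happens in the constructor, as in the pool constructor above.
            evictor = TIMER.scheduleWithFixedDelay(() -> { }, periodMillis, periodMillis,
                    TimeUnit.MILLISECONDS);
            LIVE_TASKS.incrementAndGet();
        }

        @Override
        public synchronized void close() {
            if (closed) {
                return; // closing twice is harmless
            }
            closed = true;
            evictor.cancel(false);
            LIVE_TASKS.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        List<MiniPool> leaked = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            leaked.add(new MiniPool(30_000)); // five "probes", none closed
        }
        System.out.println("live tasks without close: " + LIVE_TASKS.get());
        for (MiniPool pool : leaked) {
            pool.close();
        }
        System.out.println("live tasks after close: " + LIVE_TASKS.get());
    }
}
```

A pool that is dropped without close() stays reachable through the timer's task queue, so its task keeps firing for the life of the process.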

3. Eviction period

public static final long DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS = -1L;

private volatile long timeBetweenEvictionRunsMillis =
            BaseObjectPoolConfig.DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS;

/**
 * Returns the number of milliseconds to sleep between runs of the idle
 * object evictor thread. When non-positive, no idle object evictor thread
 * will be run.
 *
 * @return number of milliseconds to sleep between evictor runs
 *
 * @see #setTimeBetweenEvictionRunsMillis
 */
public final long getTimeBetweenEvictionRunsMillis() {
    return timeBetweenEvictionRunsMillis;
}

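The comment is clear here too: if the value is non-positive (negative or 0), no idle-object evictor is started.

As shown above, the default is -1, which means no evictor. JedisPoolConfig, however, supplies its own defaults for JedisPool and schedules the evictor every 30s: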

public class JedisPoolConfig extends GenericObjectPoolConfig {
  public JedisPoolConfig() {
    // defaults to make your life with connection pool easier :)
    setTestWhileIdle(true);
    setMinEvictableIdleTimeMillis(60000);
    setTimeBetweenEvictionRunsMillis(30000);
    setNumTestsPerEvictionRun(-1);
  }
}

The comment above says these defaults make your life with the connection pool easier. Whose life is that, the pool's or the coder's?

4. The eviction task

/**
 * The idle object evictor {@link TimerTask}.
 *
 * @see GenericKeyedObjectPool#setTimeBetweenEvictionRunsMillis
 */
class Evictor extends TimerTask {
    /**
     * Run pool maintenance. Evict objects qualifying for eviction and then
     * ensure that the minimum number of idle instances are available.
     * Since the Timer that invokes Evictors is shared for all Pools but
     * pools may exist in different class loaders, the Evictor ensures that
     * any actions taken are under the class loader of the factory
     * associated with the pool.
     */
    @Override
    public void run() {
        
        try {
            // Class loader handling omitted

            // Evict from the pool
            try {
                evict();
            } catch(Exception e) {
                swallowException(e);
            } catch(OutOfMemoryError oome) {
                // Log problem but give evictor thread a chance to continue
                // in case error is recoverable
                oome.printStackTrace(System.err);
            }
            // Re-create idle instances.
            try {
                ensureMinIdle();
            } catch (Exception e) {
                swallowException(e);
            }
        } finally {
            // Restore the previous CCL
            Thread.currentThread().setContextClassLoader(savedClassLoader);
        }
    }
}

Again the comment is clear:

  • Evict the objects that meet the eviction criteria
  • Make sure the pool holds minIdle idle objects

5. evict

/**
 * <p>Perform <code>numTests</code> idle object eviction tests, evicting
 * examined objects that meet the criteria for eviction. If
 * <code>testWhileIdle</code> is true, examined objects are validated
 * when visited (and removed if invalid); otherwise only objects that
 * have been idle for more than <code>minEvictableIdleTimeMillis</code>
 * are removed.</p>
 *
 * @throws Exception when there is a problem evicting idle objects.
 */
@Override
public void evict() throws Exception {

    PooledObject<T> underTest = null;
    EvictionPolicy<T> evictionPolicy = getEvictionPolicy();

    
    EvictionConfig evictionConfig = new EvictionConfig(
            getMinEvictableIdleTimeMillis(),
            getSoftMinEvictableIdleTimeMillis(),
            getMinIdle());

    boolean testWhileIdle = getTestWhileIdle();

    for (int i = 0, m = getNumTests(); i < m; i++) {
        // Pick the next object to examine from idleObjects (omitted)

        // Decide whether this object may be evicted
        boolean evict;
        try {
            evict = evictionPolicy.evict(evictionConfig, underTest,
                    idleObjects.size());
        } catch (Throwable t) {
            // Error handling omitted
        }

        if (evict) {
            // Evict: destroy the object
            destroy(underTest);
        } else {
            // The object does not yet meet the eviction criteria
            // If validation is enabled, run the validation logic
            if (testWhileIdle) {
                boolean active = false;
                try {
                    factory.activateObject(underTest);
                    active = true;
                } catch (Exception e) {
                    destroy(underTest);
                    destroyedByEvictorCount.incrementAndGet();
                }
                if (active) {
                    // Validate with a ping here: destroy on failure, do nothing on success
                    if (!factory.validateObject(underTest)) {
                        destroy(underTest);
                        destroyedByEvictorCount.incrementAndGet();
                    } else {
                        // Nothing to do
                    }
                }
            }
        }
    }
}

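What the comment says:

  • Only numTests objects are examined (each may be evicted or validated with a ping)
  • The eviction policy and the eviction config decide whether an object should be evicted
    • If the eviction criteria are met, the object is evicted
    • If they are not met
      • and testWhileIdle is enabled, the object is validated with a ping

numTests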

/**
 * Returns the maximum number of objects to examine during each run (if any)
 * of the idle object evictor thread. When positive, the number of tests
 * performed for a run will be the minimum of the configured value and the
 * number of idle instances in the pool. When negative, the number of tests
 * performed will be <code>ceil({@link #getNumIdle}/
 * abs({@link #getNumTestsPerEvictionRun}))</code> which means that when the
 * value is <code>-n</code> roughly one nth of the idle objects will be
 * tested per run.
 */
private int getNumTests() {
    int numTestsPerEvictionRun = getNumTestsPerEvictionRun();
    if (numTestsPerEvictionRun >= 0) {
        return Math.min(numTestsPerEvictionRun, idleObjects.size());
    } else {
        return (int) (Math.ceil(idleObjects.size() /
                Math.abs((double) numTestsPerEvictionRun)));
    }
}

public class JedisPoolConfig extends GenericObjectPoolConfig {
  public JedisPoolConfig() {
    // defaults to make your life with connection pool easier :)
    setTestWhileIdle(true);
    setMinEvictableIdleTimeMillis(60000);
    setTimeBetweenEvictionRunsMillis(30000);
    // -1 is passed here
    setNumTestsPerEvictionRun(-1);
  }
}


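As you can see:

  • If numTestsPerEvictionRun is non-negative, the result is the smaller of that value and the number of idle objects
  • If numTestsPerEvictionRun is negative, say -n, roughly one nth of the idle objects is examined

With its default of -1, JedisPool therefore examines every idle object on each run.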
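
The arithmetic in getNumTests is easy to check in isolation. The sketch below copies the formula into a standalone method; NumTestsDemo and numTests are hypothetical names, and the idle counts are made-up inputs.

```java
public class NumTestsDemo {
    /** Same formula as getNumTests, with the idle count passed in explicitly. */
    static int numTests(int numTestsPerEvictionRun, int idleCount) {
        if (numTestsPerEvictionRun >= 0) {
            return Math.min(numTestsPerEvictionRun, idleCount);
        }
        return (int) Math.ceil(idleCount / Math.abs((double) numTestsPerEvictionRun));
    }

    public static void main(String[] args) {
        System.out.println(numTests(-1, 7)); // prints 7: the JedisPoolConfig default checks everything
        System.out.println(numTests(3, 7));  // prints 3: capped by the configured value
        System.out.println(numTests(10, 7)); // prints 7: capped by the idle count
        System.out.println(numTests(-2, 7)); // prints 4: ceil(7 / 2), about half
    }
}
```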

Eviction policy

/**
 * Provides the default implementation of {@link EvictionPolicy} used by the
 * pools. Objects will be evicted if the following conditions are met:
 * <ul>
 * <li>the object has been idle longer than
 *     {@link GenericObjectPool#getMinEvictableIdleTimeMillis()} /
 *     {@link GenericKeyedObjectPool#getMinEvictableIdleTimeMillis()}</li>
 * <li>there are more than {@link GenericObjectPool#getMinIdle()} /
 *     {@link GenericKeyedObjectPoolConfig#getMinIdlePerKey()} idle objects in
 *     the pool and the object has been idle for longer than
 *     {@link GenericObjectPool#getSoftMinEvictableIdleTimeMillis()} /
 *     {@link GenericKeyedObjectPool#getSoftMinEvictableIdleTimeMillis()}
 * </ul>
 * This class is immutable and thread-safe.
 */
public class DefaultEvictionPolicy<T> implements EvictionPolicy<T> {

    @Override
    public boolean evict(EvictionConfig config, PooledObject<T> underTest, int idleCount) {

        if ((config.getIdleSoftEvictTime() < underTest.getIdleTimeMillis() &&
                config.getMinIdle() < idleCount) ||
                config.getIdleEvictTime() < underTest.getIdleTimeMillis()) {
            return true;
        }
        return false;
    }
}

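Per the comment, an object is evicted when either of the following holds:

  • The object's idle time exceeds the soft eviction time and the current number of idle objects exceeds minIdle
  • The object's idle time exceeds the eviction time

JedisPool defaults:

// Evict after 60s of idleness
setMinEvictableIdleTimeMillis(60000);

// The soft idle time defaults to -1. EvictionConfig treats a non-positive
// value as Long.MAX_VALUE, so the soft condition never fires by default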
private volatile long softMinEvictableIdleTimeMillis =
            BaseObjectPoolConfig.DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME_MILLIS;
public static final long DEFAULT_SOFT_MIN_EVICTABLE_IDLE_TIME_MILLIS = -1;
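
The two conditions and the JedisPool defaults can be exercised with a small JDK-only sketch. EvictionPolicyDemo, normalize and shouldEvict are hypothetical names. Mapping non-positive settings to Long.MAX_VALUE is an assumption based on how commons-pool's EvictionConfig constructor handles them, so verify it against the version in use.

```java
public class EvictionPolicyDemo {
    /** Assumed EvictionConfig behavior: a non-positive setting means "never" (Long.MAX_VALUE). */
    static long normalize(long configuredMillis) {
        return configuredMillis > 0 ? configuredMillis : Long.MAX_VALUE;
    }

    /** Same predicate as DefaultEvictionPolicy.evict, with plain values instead of pool objects. */
    static boolean shouldEvict(long minEvictableIdleMillis, long softMinEvictableIdleMillis,
                               int minIdle, long idleMillis, int idleCount) {
        long idleEvictTime = normalize(minEvictableIdleMillis);
        long idleSoftEvictTime = normalize(softMinEvictableIdleMillis);
        return (idleSoftEvictTime < idleMillis && minIdle < idleCount)
                || idleEvictTime < idleMillis;
    }

    public static void main(String[] args) {
        // JedisPoolConfig defaults (60000, -1) with minIdle = 1 as in the incident.
        System.out.println(shouldEvict(60000, -1, 1, 61000, 1)); // true: idle longer than 60s
        System.out.println(shouldEvict(60000, -1, 1, 30000, 5)); // false: soft condition disabled
        // A hypothetical positive soft limit of 10s enables the soft condition.
        System.out.println(shouldEvict(60000, 10000, 1, 30000, 5)); // true: 5 idle > minIdle
        System.out.println(shouldEvict(60000, 10000, 1, 30000, 1)); // false: only minIdle left
    }
}
```

The first case is the one that matters for the incident: the single idle connection kept for minIdle = 1 is evicted once it has been idle for more than 60s, and ensureMinIdle then opens a replacement, so every leaked pool keeps reconnecting.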

idleTime

How is this time computed?

@Override
public long getIdleTimeMillis() {
    final long elapsed = System.currentTimeMillis() - lastReturnTime;
    // elapsed may be negative if:
    // - another thread updates lastReturnTime during the calculation window
    // - System.currentTimeMillis() is not monotonic (e.g. system time is set back)
    return elapsed >= 0 ? elapsed : 0;
}

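The idle time is the time elapsed since the object was last returned to the pool.

What if the object is borrowed again after it was returned but before it is evicted? Would lastReturnTime be updated? That situation does not arise. The evictor takes its candidates from idleObjects (a LinkedBlockingDeque), and borrowing removes the object from idleObjects. On its face that still looks like a race. In the commons-pool source, the part elided from evict() above guards against it through the pooled object's state: the evictor calls startEvictionTest() before examining an object, and a concurrent borrow's allocate() fails for an object that is no longer idle.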

destroy

private void destroy(PooledObject<T> toDestory) throws Exception {
    toDestory.invalidate();
    idleObjects.remove(toDestory);
    allObjects.remove(new IdentityWrapper<T>(toDestory.getObject()));
    try {
        factory.destroyObject(toDestory);
    } finally {
        destroyedCount.incrementAndGet();
        createCount.decrementAndGet();
    }
}

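Liveness check

If the object does not meet the eviction criteria and testWhileIdle is enabled, it is still checked for liveness by sending a PING command.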

@Override
public boolean validateObject(PooledObject<Jedis> pooledJedis) {
  final BinaryJedis jedis = pooledJedis.getObject();
  try {
    HostAndPort hostAndPort = this.hostAndPort.get();

    String connectionHost = jedis.getClient().getHost();
    int connectionPort = jedis.getClient().getPort();

    return hostAndPort.getHost().equals(connectionHost)
        && hostAndPort.getPort() == connectionPort && jedis.isConnected()
        && jedis.ping().equals("PONG");
  } catch (final Exception e) {
    return false;
  }
}

6. ensureMinIdle

Even after objects have been evicted, the pool must still hold minIdle idle objects. ensureMinIdle() delegates to ensureIdle(getMinIdle(), true):

/**
 * Tries to ensure that {@code idleCount} idle instances exist in the pool.
 * <p>
 * Creates and adds idle instances until either {@link #getNumIdle()} reaches {@code idleCount}
 * or the total number of objects (idle, checked out, or being created) reaches
 * {@link #getMaxTotal()}. If {@code always} is false, no instances are created unless
 * there are threads waiting to check out instances from the pool.
 *
 * @param idleCount the number of idle instances desired
 * @param always true means create instances even if the pool has no threads waiting
 * @throws Exception if the factory's makeObject throws
 */
private void ensureIdle(int idleCount, boolean always) throws Exception {
    if (idleCount < 1 || isClosed() || (!always && !idleObjects.hasTakeWaiters())) {
        return;
    }

    while (idleObjects.size() < idleCount) {
        PooledObject<T> p = create();
        if (p == null) {
            // Can't create objects, no reason to think another call to
            // create will work. Give up.
            break;
        }
        if (getLifo()) {
            idleObjects.addFirst(p);
        } else {
            idleObjects.addLast(p);
        }
    }
    if (isClosed()) {
        // Pool closed while object was being added to idle objects.
        // Make sure the returned object is destroyed rather than left
        // in the idle object pool (which would effectively be a leak)
        clear();
    }
}

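According to the comment, object creation stops when either the number of idle objects reaches the requested idle count or the total number of objects reaches maxTotal.

In other words, even if the pool is short of idle objects, no new object is created once the total has reached maxTotal.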
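
The stop conditions can be sketched with a plain deque. This is a simplified JDK-only model, not the commons-pool implementation: EnsureIdleDemo and its parameters are hypothetical, the capacity check stands in for create() returning null, and treating a negative maxTotal as "no limit" is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EnsureIdleDemo {
    /**
     * Tops up the idle deque to idleCount, stopping early once
     * idle + borrowed objects reach maxTotal. Returns how many were created.
     */
    static int ensureIdle(Deque<String> idle, int borrowed, int idleCount, int maxTotal) {
        int created = 0;
        while (idle.size() < idleCount) {
            boolean atCapacity = maxTotal >= 0 && idle.size() + borrowed >= maxTotal;
            if (atCapacity) {
                break; // stands in for create() returning null
            }
            idle.addFirst("conn-" + (borrowed + idle.size() + 1)); // LIFO insert, like addFirst above
            created++;
        }
        return created;
    }

    public static void main(String[] args) {
        // Empty pool, minIdle = 1, maxTotal = 8: one connection is created.
        System.out.println(ensureIdle(new ArrayDeque<>(), 0, 1, 8)); // prints 1
        // All 8 objects are borrowed: nothing can be created even though idle < minIdle.
        System.out.println(ensureIdle(new ArrayDeque<>(), 8, 1, 8)); // prints 0
    }
}
```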

7. minIdle vs maxIdle

Why are there both minIdle and maxIdle?

Everything above revolves around minIdle: the pool should keep minIdle idle objects in reserve.

Where is maxIdle used?

It comes into play in returnObject: if the pool already holds enough idle objects when one is returned, the returned object is simply destroyed.

/**
 * <p>
 * If {@link #getMaxIdle() maxIdle} is set to a positive value and the
 * number of idle instances has reached this value, the returning instance
 * is destroyed.
 * <p>
 * If {@link #getTestOnReturn() testOnReturn} == true, the returning
 * instance is validated before being returned to the idle instance pool. In
 * this case, if validation fails, the instance is destroyed.
 * <p>
 */
@Override
public void returnObject(T obj) {
    // 1. Mark the object as RETURNING (lookup of p for obj omitted)
    synchronized(p) {
        final PooledObjectState state = p.getState();
        if (state != PooledObjectState.ALLOCATED) {
            throw new IllegalStateException(
                    "Object has already been returned to this pool or is invalid");
        } else {
            p.markReturning(); // Keep from being marked abandoned
        }
    }

    long activeTime = p.getActiveTimeMillis();
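    // 2. Validate the object on return
    if (getTestOnReturn()) {
        if (!factory.validateObject(p)) {
            try {
                // Validation (ping) failed, so destroy the object
                destroy(p);
            } catch (Exception e) {
                swallowException(e);
            }
            try {
                // After destroying, keep one idle object available if threads are waiting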
                ensureIdle(1, false);
            } catch (Exception e) {
                swallowException(e);
            }
            updateStatsReturn(activeTime);
            return;
        }
    }

    // 3. Mark the object as IDLE and update its return time
    if (!p.deallocate()) {
        throw new IllegalStateException(
                "Object has already been returned to this pool or is invalid");
    }

    // 4. Has the idle count reached maxIdle?
    int maxIdleSave = getMaxIdle();
    if (isClosed() || maxIdleSave > -1 && maxIdleSave <= idleObjects.size()) {
        try {
            // Destroy it right away
            destroy(p);
        } catch (Exception e) {
            swallowException(e);
        }
    } else {
        // Put it on the idle list: a normal return to the pool
        if (getLifo()) {
            idleObjects.addFirst(p);
        } else {
            idleObjects.addLast(p);
        }
        if (isClosed()) {
            // Pool closed while object was being added to idle objects.
            // Make sure the returned object is destroyed rather than left
            // in the idle object pool (which would effectively be a leak)
            clear();
        }
    }
    updateStatsReturn(activeTime);
}

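It does two main things:

  • If validation on return is enabled (testOnReturn)
    • an object that fails validation is destroyed immediately
    • after the destroy, ensureIdle(1, false) creates a replacement only if threads are waiting to borrow
  • If the pool has already reached maxIdle idle objects, the returned object is destroyed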
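
The maxIdle check in step 4 boils down to one boolean expression. The sketch below isolates it so the edge cases are easy to see; ReturnDecisionDemo and destroyOnReturn are hypothetical names, and the numbers are made-up inputs.

```java
public class ReturnDecisionDemo {
    /** Mirrors the condition in step 4 of returnObject: destroy instead of pooling? */
    static boolean destroyOnReturn(boolean poolClosed, int maxIdle, int currentIdle) {
        return poolClosed || (maxIdle > -1 && maxIdle <= currentIdle);
    }

    public static void main(String[] args) {
        System.out.println(destroyOnReturn(false, 8, 3));     // false: room left, object is pooled
        System.out.println(destroyOnReturn(false, 8, 8));     // true: maxIdle reached
        System.out.println(destroyOnReturn(false, -1, 1000)); // false: negative maxIdle disables the cap
        System.out.println(destroyOnReturn(true, 8, 0));      // true: a closed pool keeps nothing
    }
}
```

Note that maxIdle = 0 makes every returned object get destroyed, since 0 <= currentIdle always holds.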

3、Reproducing the problem

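import java.io.IOException;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class PoolLeak {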
    public static void main(String[] args) {
        GenericObjectPoolConfig config = new JedisPoolConfig();
        config.setMinIdle(1);
        for (int i = 0; i < 5; i++) {
            JedisPool jedisPool = new JedisPool(config, "localhost");
            try {
                jedisPool.getResource().close();
            } catch (Exception e) {
                e.printStackTrace();
                // jedisPool.destroy();
            }
        }

        System.out.println("over...");
        try {
            System.in.read();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Run the code above while the local Redis server is shut down.

1. Redis not started

[Image: image.png]

As shown, the connection attempt is reset (RST) straight away.

2. Starting Redis

Start the Redis server.

[Image: image.png]

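  • First attempt: fails because the Redis server is not running
  • Second attempt: the 30s eviction cycle fires. Nothing is evicted because the pool has no idle connection, but ensureMinIdle still runs, and the connection fails because the Redis server is not running
  • Third attempt: the Redis server is now running; when the eviction cycle fires, ensureMinIdle creates a connection successfully

3. Eviction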

[Image: image.png]

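As shown above, the client sends the QUIT command.

4. The fix

Destroying the pool is all it takes, that is, uncommenting the line // jedisPool.destroy(); in the code above. Destroying the pool closes the underlying object pool: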

/**
 * Closes the pool. Once the pool is closed, {@link #borrowObject()} will
 * fail with IllegalStateException, but {@link #returnObject(Object)} and
 * {@link #invalidateObject(Object)} will continue to work, with returned
 * objects destroyed on return.
 * <p>
 * Destroys idle instances in the pool by invoking {@link #clear()}.
 */
@Override
public void close() {
    if (isClosed()) {
        return;
    }

    synchronized (closeLock) {
        if (isClosed()) {
            return;
        }

        // Stop the evictor before the pool is closed since evict() calls
        // assertOpen()
        // Cancel the scheduled eviction task
        startEvictor(-1L);

        closed = true;
        // This clear removes any idle objects
        // Remove all idle objects
        clear();

        // Unregister from JMX
        jmxUnregister();

        // Release any threads that were waiting for an object
        idleObjects.interuptTakeWaiters();
    }
}
  • Once the pool is closed, borrowObject fails
  • returnObject and invalidateObject keep working, so any object that is returned is destroyed immediately
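
Another way to keep a health check from leaking pools, which the original code does not use, is to skip the pool for the probe altogether and test the port with a plain socket. The following is a minimal JDK-only sketch; TcpProbe and isReachable are hypothetical names. It only shows that the port accepts TCP connections, not that Redis answers commands, so it is a weaker check than getResource().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpProbe {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        // try-with-resources closes the socket on success and on failure, so nothing lingers.
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, reset, or timed out
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo against a throwaway local listener instead of a real Redis server.
        try (ServerSocket server = new ServerSocket(0)) {
            System.out.println(isReachable("127.0.0.1", server.getLocalPort(), 500));
        }
    }
}
```

If the probe must go through JedisPool, the same try-with-resources shape applies to the pool itself, provided the Jedis version in use makes the pool closeable.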

4、Summary

  • minIdle is best set to 0.
  • When a pool is no longer needed, remember to destroy it explicitly.