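A pitfall with the Redis scan command

Redis is an indispensable cache in today's service architectures and supports a rich set of data structures. It also has its share of pitfalls in practice. The one I ran into this time is probably more likely to bite Java programmers, so here is the full story.

Production service blocked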

String key = keyOf(appid);
int retryCount = 3;
int socketRetryCount = 3;
Exception ex = null;
while(retryCount > 0 && socketRetryCount > 0) {
    try {
        return redisDao.getMap(key);
    }catch (Exception e) {

    }
}

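On December 2 I was told the service was misbehaving. The logs showed that execution reached the getMap call in the code above, and after that nothing more was logged.

Problem analysis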

"pool-13-thread-6" prio=10 tid=0x00007f754800e800 nid=0x71b5 waiting on condition [0x00007f758f0ee000]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x0000000779b75f40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:583)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:442)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at redis.clients.util.Pool.getResource(Pool.java:49)
    at redis.clients.jedis.JedisPool.getResource(JedisPool.java:99)
    at org.reborndb.reborn.RoundRobinJedisPool.getResource(RoundRobinJedisPool.java:300)
    at com.le.smartconnect.adapter.spring.RebornConnectionFactory.getConnection(RebornConnectionFactory.java:43)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:128)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:91)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:78)
    at xxx.run(xxx.java:80)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

    Locked ownable synchronizers:
    - <0x000000074f529b08> (a java.util.concurrent.ThreadPoolExecutor$Worker)

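The thread dump shows that the service is blocked while acquiring a Redis connection.

Analysis:

  • The maximum number of Redis connections is set to 3000 in the code configuration
  • session_max_timeout is set to 0 in the Redis configuration, i.e. connections are never dropped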
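
The thread dump above is parked inside GenericObjectPool.borrowObject, which is what a caller sees when every pooled connection has been borrowed and none has come back. As a rough illustration only, the sketch below imitates that situation with a hypothetical TinyPool built on a java.util.concurrent.Semaphore. It is not the commons-pool2 implementation, and the names TinyPool, tryBorrow and simulate are made up for this example. It uses a bounded wait so that a leak shows up as failed borrows instead of a thread parked forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a connection pool; not the commons-pool2 code.
public class TinyPool {
    private final Semaphore permits;

    public TinyPool(int maxTotal) {
        this.permits = new Semaphore(maxTotal);
    }

    // Borrow with a bounded wait: returns false instead of parking forever.
    public boolean tryBorrow(long waitMillis) {
        try {
            return permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Return one connection to the pool.
    public void release() {
        permits.release();
    }

    // Borrows `calls` times, releasing only when `release` is true,
    // and returns how many borrows succeeded.
    public static int simulate(int maxTotal, int calls, boolean release) {
        TinyPool pool = new TinyPool(maxTotal);
        int succeeded = 0;
        for (int i = 0; i < calls; i++) {
            if (pool.tryBorrow(10)) {
                succeeded++;
                if (release) {
                    pool.release();
                }
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("leaking:   " + simulate(3, 5, false));
        System.out.println("releasing: " + simulate(3, 5, true));
    }
}
```

With a pool of 3, the leaking run succeeds 3 times and then fails, while the releasing run succeeds all 5 times. In commons-pool2 the equivalent of the bounded wait is, as far as I know, the pool's maxWaitMillis setting, whose default of -1 means "block indefinitely", which would match the takeFirst frame in the dump.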

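First fix

From these two points I concluded that the Redis connections had been exhausted. Searching the code showed that the cause was our override of the hscan method in spring-data-redis, shown below: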

// Note: rc is borrowed here but never released on any path below.
RedisConnection rc = redisTemplate.getConnectionFactory().getConnection();
if (rc instanceof JedisConnection) {
    JedisConnection jedisConnection = (JedisConnection) rc;
    return new ConvertingCursor<Map.Entry<byte[], byte[]>, Map.Entry<String, String>>(
            jedisConnection.hScan(rawValue(key), cursor, scanOptions),
            new Converter<Map.Entry<byte[], byte[]>, Map.Entry<String, String>>() {

                @Override
                public Entry<String, String> convert(final Entry<byte[], byte[]> source) {

                    return new Map.Entry<String, String>() {

                        @Override
                        public String getKey() {
                            return hashKeySerializer.deserialize(source.getKey());
                        }

                        @Override
                        public String getValue() {
                            return hashValueSerializer.deserialize(source.getValue());
                        }

                        @Override
                        public String setValue(String value) {
                            throw new UnsupportedOperationException(
                                "Values cannot be set when scanning through entries.");
                        }
                    };
                }
            });
} else {
    return hashOps.scan(key, scanOptions);
}

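The code above returns the ConvertingCursor without ever releasing the connection, so the connection pool filled up.

Second fix

So I changed the code to release the connection properly: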

try {
    ...
} finally {
    RedisConnectionUtils.releaseConnection(rc, factory);
}

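After this was deployed and the job was rerun, the production logs showed a large number of Connection time out errors.

I then wondered whether my override was simply wrong, so I switched to the stock spring-data-redis code, i.e. calling hashOps.scan(key, scanOptions) directly, and deployed again.

After that deployment the logs no longer reported Connection time out. Instead they were full of Unknown reply: errors.

The analysis is as follows:

The code runs in a multithreaded environment, with several hundred threads issuing hscan calls. After one hscan call, the stock spring-data-redis code closes the connection and returns a Cursor iterator. When the caller iterates the Cursor past the current batch of count entries, the Cursor queries again from its cursor position on that same connection, which has already been closed. If the follow-up query ends up on a fresh connection, the iteration works. Once it lands on a connection that another thread is using, it fails with Unknown reply.

Third fix

After some thought I concluded that once the connection is released in the middle of a Redis scan, the scan cannot continue, which is what produced the Connection time out errors.

From the official documentation I concluded that a Redis scan needs a full iteration, so the best approach is for a single connection to run the whole scan task to completion and only then be released.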

redis-scan-doc
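
To make "full iteration" concrete: a scan starts with cursor 0, each reply carries the cursor to send next, and the iteration is complete only when the server returns cursor 0 again. The sketch below walks that loop against a hypothetical in-memory FakeConnection so that it runs without a Redis server. FakeConnection, Page and scanAll are names invented for this illustration, and a real Redis cursor is an opaque value, not a list index as it is here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FullIterationSketch {

    // One batch of results plus the cursor to send with the next call.
    public static final class Page {
        final int nextCursor;
        final List<String> items;

        Page(int nextCursor, List<String> items) {
            this.nextCursor = nextCursor;
            this.items = items;
        }
    }

    // Hypothetical in-memory stand-in for a single Redis connection.
    public static final class FakeConnection {
        private final List<String> fields;

        public FakeConnection(List<String> fields) {
            this.fields = fields;
        }

        // Returns up to `count` fields starting at `cursor`; nextCursor is 0 when done.
        public Page scan(int cursor, int count) {
            int end = Math.min(cursor + count, fields.size());
            int next = end >= fields.size() ? 0 : end;
            return new Page(next, new ArrayList<>(fields.subList(cursor, end)));
        }
    }

    // Runs the whole iteration on one connection; `count` must be positive.
    public static List<String> scanAll(FakeConnection conn, int count) {
        List<String> all = new ArrayList<>();
        int cursor = 0;
        do {
            Page page = conn.scan(cursor, count);
            all.addAll(page.items);
            cursor = page.nextCursor;
        } while (cursor != 0); // cursor 0 marks the end of a full iteration
        return all;
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection(Arrays.asList("f1", "f2", "f3", "f4", "f5"));
        System.out.println(scanAll(conn, 2)); // prints [f1, f2, f3, f4, f5]
    }
}
```

The point of the shape is that the connection is borrowed once before the do/while loop and handed back only after it exits, so no other thread can interleave commands on it between two batches.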

The code was changed as follows:

RedisConnectionFactory factory = redisTemplate.getConnectionFactory();
RedisConnection rc = factory.getConnection();
if (rc instanceof JedisConnection) {
    JedisConnection jedisConnection = (JedisConnection) rc;
    Cursor<Map.Entry<String, String>> cursorResult = new ConvertingCursor<Map.Entry<byte[], byte[]>, Map.Entry<String, String>>(
            jedisConnection.hScan(rawValue(key), cursor, scanOptions),
            new Converter<Map.Entry<byte[], byte[]>, Map.Entry<String, String>>() {
                ...
            });
    // Hand the cursor back together with its connection so the caller can release both.
    return new ScanResult<Map.Entry<String, String>>(cursorResult, factory, rc);
}


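// Closes the cursor first, then returns the connection.
// Called by the business code once it has finished iterating.
public void releaseConnection() throws IOException {
    IOException ex = null;
    if (cursor != null) {
        try {
            cursor.close();
        } catch (IOException e) {
            // Remember the failure, but still release the connection below.
            ex = e;
        }
    }
    try {
        RedisConnectionUtils.releaseConnection(rc, factory);
    } catch (Exception e) {
        // Deliberately ignored: a failed release must not mask the cursor error.
    }
    if (ex != null) {
        throw ex;
    }
}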

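The connection is handed back to the business code together with the cursor, and the business code releases it once it has finished. Problem solved.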
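
One way to make "the caller releases when done" hard to forget is to let the wrapper implement AutoCloseable, so that try-with-resources performs the release. The sketch below is only an illustration of that idea: ScanHandle is a hypothetical class, not the ScanResult used above and not part of spring-data-redis, and the connection is represented by a plain Runnable release callback so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ScanHandleSketch {

    // Hypothetical wrapper: bundles a cursor with the action that returns its connection.
    public static final class ScanHandle<T> implements AutoCloseable, Iterator<T> {
        private final Iterator<T> cursor;
        private final Runnable releaseConnection;
        private boolean closed;

        public ScanHandle(Iterator<T> cursor, Runnable releaseConnection) {
            this.cursor = cursor;
            this.releaseConnection = releaseConnection;
        }

        @Override
        public boolean hasNext() {
            ensureOpen();
            return cursor.hasNext();
        }

        @Override
        public T next() {
            ensureOpen();
            return cursor.next();
        }

        // Releases the connection exactly once, even if close() is called again.
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                releaseConnection.run();
            }
        }

        private void ensureOpen() {
            if (closed) {
                throw new IllegalStateException("connection already released");
            }
        }
    }

    // Iterates to the end, then releases; returns how many connections are still borrowed.
    public static int consume(List<String> entries) {
        AtomicInteger borrowed = new AtomicInteger(1); // pretend one connection was borrowed
        try (ScanHandle<String> handle =
                 new ScanHandle<>(entries.iterator(), borrowed::decrementAndGet)) {
            while (handle.hasNext()) {
                handle.next();
            }
        }
        return borrowed.get();
    }

    // Mirrors the second fix: iterating after the release is rejected.
    public static boolean failsAfterClose() {
        ScanHandle<String> handle = new ScanHandle<>(Arrays.asList("a").iterator(), () -> { });
        handle.close();
        try {
            handle.hasNext();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("still borrowed: " + consume(Arrays.asList("a", "b")));
        System.out.println("fails after close: " + failsAfterClose());
    }
}
```

The guard in ensureOpen turns the failure mode of the second fix (iterating a cursor whose connection is already gone) into an immediate, local error rather than a timeout or a garbled reply somewhere else.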

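Summary

  1. Once a connection is opened it must be released; otherwise it leaks and can leave the service blocked and unavailable.
  2. When overriding library code, read the official documentation carefully and implement the approach it gives.
  3. When using Redis scan operations from multiple threads, iterate each Cursor to completion on a single connection instead of reusing connections; otherwise you get Unknown reply errors.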