Decoding the Most Overlooked Causes of High CPU and Memory Usage in Redis

Date: 2019-11-30


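About the author

Zhang Pengyi is a senior database engineer at Tencent Cloud. He previously worked on the development of Huawei's Taurus distributed database and of Tencent CynosDB for PG, and now develops Tencent Cloud's Redis database service.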

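When using Redis, we regularly run into cases where redis-server uses a lot of CPU or memory. Below, we use several real cases to discuss a few situations that are easy to overlook when using Redis.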

1. Short connections cause high CPU usage

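A user reported that QPS was not high, yet monitoring confirmed that CPU usage was on the high side. With QPS that low, the likely explanations were that redis-server itself was doing some cleanup work, or that the user was running high-complexity commands. Investigation showed that no expired-key deletion was taking place and no high-complexity commands were being executed.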

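We then ran perf against redis-server on the host and found that the function listSearchKey accounted for a large share of CPU time. The call stacks showed listSearchKey being called frequently when connections were released, and the user confirmed that they were using short connections, so we inferred that the frequent connection releases were pushing CPU usage up.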

1.1 A comparison experiment

Below, we use the redis-benchmark tool to compare long connections with short connections. redis-server is the community edition, version 4.0.10.

1) Long-connection test

Send 500,000 PING commands to redis-server over 10,000 long connections:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 1 (-k 1 keeps each connection alive, i.e. long connections; -k 0 reconnects for every request, i.e. short connections)

Resulting QPS:

PING_INLINE: 92902.27 requests per second

PING_BULK: 93580.38 requests per second

Profiling redis-server showed that the function using the most CPU was readQueryFromClient, i.e. the server spent most of its time handling requests from clients.

2) Short-connection test

Send 500,000 PING commands to redis-server over 10,000 short connections:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 0

Resulting QPS:

PING_INLINE: 15187.18 requests per second

PING_BULK: 16471.75 requests per second

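Profiling redis-server showed that the top CPU consumer was indeed listSearchKey, and that readQueryFromClient's share of CPU was much lower than listSearchKey's. In other words, the CPU was slacking off: handling user requests had become its side job, while searching a list had become its main job. So for the same volume of business requests, short connections add to the CPU load.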

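In terms of QPS, the gap between short and long connections is large, for two reasons:

The network overhead of establishing a new connection every time.
When a connection is released, redis-server spends extra CPU cycles on cleanup. (This part can be optimized on the redis-server side.)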
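As a rough check on the size of the gap, the inverse of QPS gives the average time per request. Because redis-benchmark keeps 10,000 clients in flight, this is an amortized throughput figure rather than per-request latency:

$$
\bar{t} = \frac{1}{\mathrm{QPS}}, \qquad
\bar{t}_{\text{long}} = \frac{1}{92902.27\,\mathrm{s}^{-1}} \approx 10.8\,\mu\mathrm{s}, \qquad
\bar{t}_{\text{short}} = \frac{1}{15187.18\,\mathrm{s}^{-1}} \approx 65.8\,\mu\mathrm{s}
$$

$$
\frac{92902.27}{15187.18} \approx 6.1 \;(\texttt{PING\_INLINE}), \qquad
\frac{93580.38}{16471.75} \approx 5.7 \;(\texttt{PING\_BULK})
$$

So each PING costs roughly six times as much on average when every request opens and closes its own connection.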

1.2 How Redis releases a connection

Let's look at the code to see what redis-server does after a client initiates a disconnect. When redis-server receives the client's disconnect, it goes straight into freeClient.

void freeClient(client *c) {
    listNode *ln;

    /* ......... */

    /* Free the query buffer */
    sdsfree(c->querybuf);
    sdsfree(c->pending_querybuf);
    c->querybuf = NULL;

    /* Deallocate structures used to block on blocking ops. */
    if (c->flags & CLIENT_BLOCKED) unblockClient(c);
    dictRelease(c->bpop.keys);

    /* UNWATCH all the keys */
    unwatchAllKeys(c);
    listRelease(c->watched_keys);

    /* Unsubscribe from all the pubsub channels */
    pubsubUnsubscribeAllChannels(c,0);
    pubsubUnsubscribeAllPatterns(c,0);
    dictRelease(c->pubsub_channels);
    listRelease(c->pubsub_patterns);

    /* Free data structures. */
    listRelease(c->reply);
    freeClientArgv(c);

    /* Unlink the client: this will close the socket, remove the I/O
     * handlers, and remove references of the client from different
     * places where active clients may be referenced. */
    /* redis-server maintains a linked list, server.clients. When a client
     * connects, a new client object is created and appended to
     * server.clients; when the connection is released, the client object
     * has to be removed from server.clients. */
    unlinkClient(c);

    /* ......... */
}

void unlinkClient(client *c) {
    listNode *ln;

    /* If this is marked as current client unset it. */
    if (server.current_client == c) server.current_client = NULL;

    /* Certain operations must be done only if the client has an active socket.
     * If the client was already unlinked or if it's a "fake client" the
     * fd is already set to -1. */
    if (c->fd != -1) {
        /* Search the server.clients list, then delete the client's node.
         * The search is O(N) in the number of connected clients. */
        ln = listSearchKey(server.clients,c);
        serverAssert(ln != NULL);
        listDelNode(server.clients,ln);

        /* Unregister async I/O handlers and close the socket. */
        aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
        aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
        close(c->fd);
        c->fd = -1;
    }

    /* ......... */
}

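So every disconnect involves an O(N) operation. For an in-memory database like Redis, O(N) operations should be avoided wherever possible, and with a large number of connections the impact on performance is significant. Users can avoid this simply by not using short connections, but in practice, once a client-side connection pool is exhausted, applications may still open some short connections.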
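To see why the connection count matters, the sketch below models server.clients as a plain doubly linked list and counts how many nodes the search-then-delete path visits. It is a simplified stand-in written for this illustration (the dlist/node names are hypothetical, not Redis's adlist API), assuming, as in Redis, that new clients are appended at the tail and that the search starts from the head.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal doubly linked list, loosely modeled on Redis's adlist
 * (listNode/list). The names here are illustrative, not the adlist API. */
typedef struct node {
    struct node *prev, *next;
    void *value;
} node;

typedef struct {
    node *head, *tail;
    unsigned long len;
} dlist;

/* Append value at the tail (as createClient does with server.clients).
 * Returns the new node, or NULL if allocation fails. */
static node *dlist_push_tail(dlist *l, void *value) {
    node *n = malloc(sizeof(*n));
    if (n == NULL) return NULL;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
    return n;
}

/* Unlink and free a node. Given the node itself, this is O(1). */
static void dlist_del_node(dlist *l, node *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    free(n);
    l->len--;
}

/* Like listSearchKey: walk from the head until the value matches.
 * *visited receives the number of nodes inspected. */
static node *dlist_search(const dlist *l, const void *value, unsigned long *visited) {
    *visited = 0;
    for (node *n = l->head; n != NULL; n = n->next) {
        (*visited)++;
        if (n->value == value) return n;
    }
    return NULL;
}

/* The pre-optimization unlink: search first, then delete.
 * Returns how many nodes had to be visited to find the client. */
static unsigned long unlink_by_search(dlist *l, void *value) {
    unsigned long visited;
    node *n = dlist_search(l, value, &visited);
    assert(n != NULL); /* plays the role of serverAssert(ln != NULL) */
    dlist_del_node(l, n);
    return visited;
}
```

Because a newly connected client sits near the tail while the search starts at the head, removing a fresh short connection walks past nearly every other connection: with 10,000 clients in the list, removing the most recent one inspects all 10,000 nodes, while removing the oldest inspects just one.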

1.3 Optimization

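From the analysis above, every connection release performs an O(N) operation. Can we bring that down to O(1)?

This is quite simple to solve. server.clients is a doubly linked list, so if each client object records the address of its own list node when it is created, there is no need to traverse server.clients when it is released. Here is an attempt at the optimization: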

client *createClient(int fd) {
    client *c = zmalloc(sizeof(client));

    /* ........ */

    listSetFreeMethod(c->pubsub_patterns,decrRefCountVoid);
    listSetMatchMethod(c->pubsub_patterns,listMatchObjects);
    if (fd != -1) {
        /* The client records the address of the listNode that holds it in
         * server.clients. listAddNodeTailEx is assumed to append c to the
         * tail of the list and return the newly created node. */
        c->client_list_node = listAddNodeTailEx(server.clients,c);
    }
    initClientMultiState(c);
    return c;
}

void unlinkClient(client *c) {
    listNode *ln;

    /* If this is marked as current client unset it. */
    if (server.current_client == c) server.current_client = NULL;

    /* Certain operations must be done only if the client has an active socket.
     * If the client was already unlinked or if it's a "fake client" the
     * fd is already set to -1. */
    if (c->fd != -1) {
        /* No need to search the server.clients list any more */
        //ln = listSearchKey(server.clients,c);
        //serverAssert(ln != NULL);
        //listDelNode(server.clients,ln);
        listDelNode(server.clients, c->client_list_node);
        c->client_list_node = NULL; /* don't keep a pointer to the freed node */

        /* Unregister async I/O handlers and close the socket. */
        aeDeleteFileEvent(server.el,c->fd,AE_READABLE);
        aeDeleteFileEvent(server.el,c->fd,AE_WRITABLE);
        close(c->fd);
        c->fd = -1;
    }

    /* ......... */
}
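The same idea in isolation: a minimal sketch with a hand-written doubly linked list (hypothetical names, not Redis's adlist) in which each client stores a pointer to its own node, mirroring c->client_list_node in the code above. Removal then touches only the two neighbouring nodes and no longer depends on how many clients are connected.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    struct node *prev, *next;
    void *value;
} node;

typedef struct {
    node *head, *tail;
    unsigned long len;
} dlist;

/* Hypothetical stand-in for a client: it keeps a pointer to its own
 * list node, playing the role of c->client_list_node. */
typedef struct {
    int fd;
    node *list_node;
} model_client;

/* Append value at the tail and return the new node (NULL on OOM). */
static node *dlist_push_tail(dlist *l, void *value) {
    node *n = malloc(sizeof(*n));
    if (n == NULL) return NULL;
    n->value = value;
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
    return n;
}

/* Unlink and free a node: only its two neighbours are touched, O(1). */
static void dlist_del_node(dlist *l, node *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    free(n);
    l->len--;
}

/* Like the rewritten createClient: link the client and remember its node. */
static int link_client(dlist *clients, model_client *c) {
    c->list_node = dlist_push_tail(clients, c);
    return c->list_node != NULL;
}

/* Like the rewritten unlinkClient: no search, delete the remembered node. */
static void unlink_client(dlist *clients, model_client *c) {
    assert(c->list_node != NULL);
    dlist_del_node(clients, c->list_node);
    c->list_node = NULL; /* avoid keeping a dangling pointer */
}
```

The cost is one extra pointer per client, plus the discipline of clearing that pointer once its node has been freed.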

Short-connection test after the optimization

Send 500,000 PING commands to redis-server over 10,000 short connections:

./redis-benchmark -h host -p port -t ping -c 10000 -n 500000 -k 0

Resulting QPS:

PING_INLINE: 21884.23 requests per second

PING_BULK: 21454.62 requests per second

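Compared with before the optimization, short-connection throughput improves by more than 30%, so performance no longer degrades as badly when some short connections are in use.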
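Working out the improvement from the two sets of short-connection results (same benchmark command, before and after the change):

$$
\frac{21884.23 - 15187.18}{15187.18} \approx 44.1\% \;(\texttt{PING\_INLINE}), \qquad
\frac{21454.62 - 16471.75}{16471.75} \approx 30.3\% \;(\texttt{PING\_BULK})
$$

This is where the 30%+ figure comes from. Throughput is still far below the roughly 93,000 QPS of the long-connection test, since the cost of setting up each new connection remains.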

2. The INFO command causes high CPU usage

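Some users monitor Redis by running the INFO command periodically, which can push CPU usage up to some extent. Profiling with perf while INFO was being run frequently showed that getClientsMaxBuffers, getClientOutputBufferMemoryUsage and getMemoryOverheadData used a large share of CPU.

The INFO command returns status information from redis-server such as the following (not a complete list):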

# Clients
connected_clients:1
client_longest_output_list:0   // length of the longest output-buffer list among clients on redis-server
client_biggest_input_buf:0     // size in bytes of the biggest input buffer among clients on redis-server
blocked_clients:0

# Memory
used_memory:848392
used_memory_human:828.51K
used_memory_rss:3620864
used_memory_rss_human:3.45M
used_memory_peak:619108296
used_memory_peak_human:590.43M
used_memory_peak_perc:0.14%
used_memory_overhead:836182    // memory redis-server uses for its own internal structures, excluding the dataset
used_memory_startup:786552
used_memory_dataset:12210
used_memory_dataset_perc:19.74%

To compute client_longest_output_list and client_biggest_input_buf, redis-server has to traverse all of its clients, as getClientsMaxBuffers shows. Here, too, there is an O(N) operation.

void getClientsMaxBuffers(unsigned long *longest_output_list,
                          unsigned long *biggest_input_buffer) {
    client *c;
    listNode *ln;
    listIter li;
    unsigned long lol = 0, bib = 0;

    /* Traverse all clients: O(N) */
    listRewind(server.clients,&li);
    while ((ln = listNext(&li)) != NULL) {
        c = listNodeValue(ln);
        if (listLength(c->reply) > lol) lol = listLength(c->reply);
        if (sdslen(c->querybuf) > bib) bib = sdslen(c->querybuf);
    }
    *longest_output_list = lol;
    *biggest_input_buffer = bib;
}

To compute used_memory_overhead, it likewise has to traverse all clients to add up the memory used by every client's output buffer, as getMemoryOverheadData shows:

struct redisMemOverhead *getMemoryOverheadData(void) {
    /* ......... */

    mem = 0;
    if (server.repl_backlog)
        mem += zmalloc_size(server.repl_backlog);
    mh->repl_backlog = mem;
    mem_total += mem;

    /* ...............*/

    mem = 0;
    if (listLength(server.clients)) {
        listIter li;
        listNode *ln;

        /* Traverse all clients and add up the memory used by their output
         * buffers (plus query buffers and client structs): O(N) */
        listRewind(server.clients,&li);
        while((ln = listNext(&li))) {
            client *c = listNodeValue(ln);
            if (c->flags & CLIENT_SLAVE)
                continue;
            mem += getClientOutputBufferMemoryUsage(c);
            mem += sdsAllocSize(c->querybuf);
            mem += sizeof(client);
        }
    }
    mh->clients_normal = mem;
    mem_total+=mem;

    mem = 0;
    if (server.aof_state != AOF_OFF) {
        mem += sdslen(server.aof_buf);
        mem += aofRewriteBufferSize();
    }
    mh->aof_buffer = mem;
    mem_total+=mem;

    /* ......... */

    return mh;
}
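A self-contained model of the two scans above makes the linear cost visible. It is a simplification written for illustration: the model_client fields stand in for what the real code reads (noted in the comments), the function names are hypothetical, and the visits counter exists only to count work. Assuming a default INFO call includes both the Clients and the Memory sections, each call inspects every connected client twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-client state holding only what the two scans read. */
typedef struct {
    size_t reply_list_len; /* cf. listLength(c->reply) */
    size_t querybuf_len;   /* cf. sdslen(c->querybuf) */
    size_t querybuf_alloc; /* cf. sdsAllocSize(c->querybuf) */
    size_t output_mem;     /* cf. getClientOutputBufferMemoryUsage(c) */
    int    is_replica;     /* cf. c->flags & CLIENT_SLAVE */
} model_client;

/* Illustration only: total number of client inspections. */
static size_t visits;

/* First pass, modeled on getClientsMaxBuffers. */
static void max_buffers(const model_client *cs, size_t n,
                        size_t *longest_output_list, size_t *biggest_input_buf) {
    assert(longest_output_list != NULL && biggest_input_buf != NULL);
    size_t lol = 0, bib = 0;
    for (size_t i = 0; i < n; i++) {
        visits++;
        if (cs[i].reply_list_len > lol) lol = cs[i].reply_list_len;
        if (cs[i].querybuf_len > bib) bib = cs[i].querybuf_len;
    }
    *longest_output_list = lol;
    *biggest_input_buf = bib;
}

/* Second pass, modeled on the clients_normal part of getMemoryOverheadData. */
static size_t clients_normal_mem(const model_client *cs, size_t n,
                                 size_t client_struct_size) {
    size_t mem = 0;
    for (size_t i = 0; i < n; i++) {
        visits++;
        if (cs[i].is_replica) continue; /* replicas are accounted separately */
        mem += cs[i].output_mem + cs[i].querybuf_alloc + client_struct_size;
    }
    return mem;
}
```

With 10,000 connections, that is about 20,000 client inspections for every INFO call, no matter how idle those connections are.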

Experiments

From the analysis above, when the number of connections is high (a large N in O(N)), running INFO frequently consumes a lot of CPU.

1) A single connection running INFO in a loop

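package main

import (
    "fmt"

    "github.com/garyburd/redigo/redis"
)

// addr is a placeholder; point it at the redis-server under test.
const addr = "127.0.0.1:6379"

func main() {
    c, err := redis.Dial("tcp", addr)
    if err != nil {
        fmt.Println("Connect to redis error:", err)
        return
    }

    // Send INFO over this single connection in a tight loop.
    for {
        if _, err := c.Do("info"); err != nil {
            panic(err)
        }
    }
}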

Result: CPU usage was only about 20%.

2) 9999 idle connections, plus one connection running INFO in a loop

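package main

import (
    "fmt"

    "github.com/garyburd/redigo/redis"
)

// addr is a placeholder; point it at the redis-server under test.
const addr = "127.0.0.1:6379"

func main() {
    // Open 9999 connections and keep references to them so they stay open
    // (and idle), making server.clients long.
    clients := []redis.Conn{}
    for i := 0; i < 9999; i++ {
        c, err := redis.Dial("tcp", addr)
        if err != nil {
            fmt.Println("Connect to redis error:", err)
            return
        }
        clients = append(clients, c)
    }

    // One more connection sends INFO in a tight loop.
    c, err := redis.Dial("tcp", addr)
    if err != nil {
        fmt.Println("Connect to redis error:", err)
        return
    }
    for {
        if _, err := c.Do("info"); err != nil {
            panic(err)
        }
    }
}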

Result: CPU usage reached about 80%. So when the number of connections is high, avoid running INFO frequently.

3. Pipelines cause high memory usage

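Some users found that when they used a pipeline for read-only operations, redis-server's memory usage occasionally rose noticeably. This is caused by improper use of the pipeline. Let's first use a simple example to show how Redis's pipeline logic works.

The following Go code reads key1, key2 and key3 from redis-server using a pipeline.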

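package main

import (
    "fmt"

    "github.com/garyburd/redigo/redis"
)

func main() {
    c, err := redis.Dial("tcp", "127.0.0.1:6379")
    if err != nil {
        panic(err)
    }
    defer c.Close()

    c.Send("get", "key1") // buffered in the client-side output buffer
    c.Send("get", "key2") // buffered in the client-side output buffer
    c.Send("get", "key3") // buffered in the client-side output buffer
    c.Flush()             // send the buffered commands to redis-server, encoded in the Redis protocol

    // Read one reply for each command that was sent.
    for i := 0; i < 3; i++ {
        fmt.Println(redis.String(c.Receive()))
    }
}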

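What redis-server receives at this point is the following (the \r\n separators of the Redis protocol are shown as spaces):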

*2 $3 get $4 key1 *2 $3 get $4 key2 *2 $3 get $4 key3
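In the byte stream above, each space stands for \r\n: every command is sent as a RESP array (*N) whose elements are length-prefixed bulk strings ($len). A minimal encoder sketch, written for this illustration rather than taken from redigo or hiredis, and assuming arguments are NUL-terminated text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode one command as a RESP array of bulk strings into out.
 * Returns the encoded length, or -1 if out is too small. */
static int resp_encode(char *out, size_t cap, int argc, const char **argv) {
    assert(out != NULL && argv != NULL);
    size_t off;
    int n = snprintf(out, cap, "*%d\r\n", argc);          /* array header */
    if (n < 0 || (size_t)n >= cap) return -1;
    off = (size_t)n;
    for (int i = 0; i < argc; i++) {
        /* bulk string: $<length>\r\n<bytes>\r\n */
        n = snprintf(out + off, cap - off, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - off) return -1;
        off += (size_t)n;
    }
    return (int)off;
}
```

Encoding get key1, get key2 and get key3 back to back into one buffer reproduces the stream shown above. Since a pipeline flushes the whole batch in one write, the server can typically pick up all three commands in a single read.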

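Below is informal pseudo-code for how redis-server handles this. It parses the commands out of the received content one by one, executes each command, buffers the result in the reply buffer, and marks the client as having output pending. The contents of the reply buffer are sent to the client over the socket only at the next round of event processing, so redis-server does not return each result as soon as the corresponding command finishes.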

readQueryFromClient(client *c) {
    read(c->querybuf)                   // c->querybuf = "*2 $3 get $4 key1 *2 $3 get $4 key2 *2 $3 get $4 key3 "
    cmdsNum = parseCmdNum(c->querybuf)  // cmdsNum = 3
    while (cmdsNum--) {
        cmd = parseCmd(c->querybuf)     // cmd: get key1, get key2, get key3
        reply = execCmd(cmd)
        appendReplyBuffer(reply)        // the reply is only buffered here, not sent
        markClientPendingWrite(c)       // it is written to the socket later
    }
}

Now consider the following case:

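If the client processes replies slowly and does not call c.Receive() in time to read data out of its TCP receive buffer, or a bug means c.Receive() is never called, then once the receive buffer is full the server's TCP sliding window drops to 0. The server can no longer send the contents of the reply buffer, so the reply buffer is not freed for a long time and holds extra memory. The problem gets worse when a pipeline packs too many commands at once, or contains commands that operate on many elements, such as MGET, HGETALL and LRANGE.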
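A toy simulation of this effect, under loudly simplified assumptions: the byte counts are made up, the kernel's send and receive buffers are lumped into one capacity sock_cap, a "tick" stands for one event-loop pass, and the function name is hypothetical. It is not how Redis or TCP account for memory; it only illustrates that whatever does not fit in the socket buffers stays in the server's reply buffer until the client reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* After one pipelined batch of n_cmds commands has been executed, the
 * server holds n_cmds * reply_size bytes of replies. Each tick it writes
 * as much as fits into the socket buffers (sock_cap bytes in total), then
 * the client reads up to read_per_tick bytes. Returns the bytes still held
 * in the server-side reply buffer after the given number of ticks. */
static size_t reply_bytes_left(size_t n_cmds, size_t reply_size,
                               size_t sock_cap, size_t read_per_tick,
                               int ticks) {
    assert(reply_size == 0 || n_cmds <= SIZE_MAX / reply_size);
    size_t pending = n_cmds * reply_size; /* server-side reply buffer */
    size_t in_sock = 0;                   /* bytes sitting in TCP buffers */

    for (int t = 0; t < ticks; t++) {
        size_t room = sock_cap - in_sock;           /* window left; 0 when full */
        size_t w = pending < room ? pending : room; /* server write */
        pending -= w;
        in_sock += w;
        size_t r = in_sock < read_per_tick ? in_sock : read_per_tick;
        in_sock -= r;                               /* client Receive() */
    }
    return pending;
}
```

With 1,000 replies of 100 bytes and sock_cap = 65536, a client that never reads leaves 34,464 bytes stuck on the server, a client that reads 10,000 bytes per tick drains everything within five ticks, and a pipeline ten times smaller fits entirely in the socket buffers.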

Summary

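The situations above are all very simple problems without complicated logic, and in most scenarios they are not problems at all. But to use Redis well in some extreme scenarios, developers still need to pay attention to these details. Recommendations:

Avoid short connections where possible;
Avoid running INFO frequently when the number of connections is high;
When using pipelines, receive the results promptly, and do not pack too many requests into a single pipeline.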
