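Abstract: Based on a reading of the Redis Sentinel source code, this article explains in detail how Sentinel is implemented.
Redis Sentinel is the high-availability solution provided by Redis. Sentinel automatically monitors one or more Redis master-replica deployments and performs an automatic failover when a master goes down.
Sentinel uses the same event-driven framework as the Redis core, but it has its own initialization steps. This article covers three topics: Sentinel initialization, the main Sentinel time event function, and Sentinel network connections together with Tilt mode.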
A Sentinel instance can be started with either redis-sentinel <path-to-configfile> or redis-server <path-to-configfile> --sentinel; the two are equivalent. The main function in Redis's server.c shows how Redis decides whether the user asked for Sentinel mode:
int main(int argc, char **argv) {
    ..........
    server.sentinel_mode = checkForSentinelMode(argc,argv);
    ..........
}
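The checkForSentinelMode function selects Sentinel mode when either of the following holds:
- The program was started through the redis-sentinel executable.
- The argument list contains the --sentinel flag.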
/* Returns 1 if there is --sentinel among the arguments or if
 * argv[0] contains "redis-sentinel". */
int checkForSentinelMode(int argc, char **argv) {
    int j;

    if (strstr(argv[0],"redis-sentinel") != NULL) return 1;
    for (j = 1; j < argc; j++)
        if (!strcmp(argv[j],"--sentinel")) return 1;
    return 0;
}
After Redis has determined whether to run in Sentinel mode, we see the following code:
int main(int argc, char **argv) {
    struct timeval tv;
    int j;
    ............
    /* We need to init sentinel right now as parsing the configuration file
     * in sentinel mode will have the effect of populating the sentinel
     * data structures with master nodes to monitor. */
    if (server.sentinel_mode) {
        initSentinelConfig();
        initSentinel();
    }
    ............

/* This function overwrites a few normal Redis config default with Sentinel
 * specific defaults. */
void initSentinelConfig(void) {
    server.port = REDIS_SENTINEL_PORT;
    server.protected_mode = 0; /* Sentinel must be exposed. */
}

/* Perform the Sentinel mode initialization. */
void initSentinel(void) {
    unsigned int j;

    /* Remove usual Redis commands from the command table, then just add
     * the SENTINEL command. */
    dictEmpty(server.commands,NULL);
    for (j = 0; j < sizeof(sentinelcmds)/sizeof(sentinelcmds[0]); j++) {
        int retval;
        struct redisCommand *cmd = sentinelcmds+j;

        retval = dictAdd(server.commands, sdsnew(cmd->name), cmd);
        serverAssert(retval == DICT_OK);

        /* Translate the command string flags description into an actual
         * set of flags. */
        if (populateCommandTableParseFlags(cmd,cmd->sflags) == C_ERR)
            serverPanic("Unsupported command flag");
    }

    /* Initialize various data structures. */
    sentinel.current_epoch = 0;
    sentinel.masters = dictCreate(&instancesDictType,NULL);
    sentinel.tilt = 0;
    sentinel.tilt_start_time = 0;
    sentinel.previous_time = mstime();
    .............
}
1. Replace the native Redis server command table with Sentinel's own command table. The commands Sentinel supports are as follows:
struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"role",sentinelRoleCommand,1,"ok-loading",0,NULL,0,0,0,0,0},
    {"client",clientCommand,-2,"read-only no-script",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
    {"auth",authCommand,2,"no-auth no-script ok-loading ok-stale fast",0,NULL,0,0,0,0,0},
    {"hello",helloCommand,-2,"no-auth no-script fast",0,NULL,0,0,0,0,0}
};
2. Initialize the Sentinel main state, whose structure is defined as follows:
/* Main state. */
struct sentinelState {
    char myid[CONFIG_RUN_ID_SIZE+1]; /* This sentinel ID. */
    uint64_t current_epoch;          /* Current epoch. */
    dict *masters;      /* Dictionary of master sentinelRedisInstances.
                           Key is the instance name, value is the
                           sentinelRedisInstance structure pointer. */
    int tilt;           /* Are we in TILT mode? */
    int running_scripts;    /* Number of scripts in execution right now. */
    mstime_t tilt_start_time;   /* When TILT started. */
    mstime_t previous_time;     /* Last time we ran the time handler. */
    list *scripts_queue;        /* Queue of user scripts to execute. */
    char *announce_ip;  /* IP addr that is gossiped to other sentinels if
                           not NULL. */
    int announce_port;  /* Port that is gossiped to other sentinels if
                           non zero. */
    unsigned long simfailure_flags; /* Failures simulation. */
    int deny_scripts_reconfig; /* Allow SENTINEL SET ... to change script
                                  paths at runtime? */
} sentinel;
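After the configuration has been loaded, the Redis server's main function calls sentinelIsRunning, which does the following:
- Checks that a configuration file was specified and that the process has write permission on it, because Sentinel persists its current state to the configuration file every time that state changes.
- If a run ID is specified in the configuration file, Sentinel uses it as its run ID; otherwise Sentinel generates a random ID, uses it as its run ID, and writes it to the configuration file.
- Generates an initial +monitor event for every master that Sentinel monitors.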
/* This function gets called when the server is in Sentinel mode, started,
 * loaded the configuration, and is ready for normal operations. */
void sentinelIsRunning(void) {
    int j;

    if (server.configfile == NULL) {
        serverLog(LL_WARNING,
            "Sentinel started without a config file. Exiting...");
        exit(1);
    } else if (access(server.configfile,W_OK) == -1) {
        serverLog(LL_WARNING,
            "Sentinel config file %s is not writable: %s. Exiting...",
            server.configfile,strerror(errno));
        exit(1);
    }

    /* If this Sentinel has yet no ID set in the configuration file, we
     * pick a random one and persist the config on disk. From now on this
     * will be this Sentinel ID across restarts. */
    for (j = 0; j < CONFIG_RUN_ID_SIZE; j++)
        if (sentinel.myid[j] != 0) break;

    if (j == CONFIG_RUN_ID_SIZE) {
        /* Pick ID and persist the config. */
        getRandomHexChars(sentinel.myid,CONFIG_RUN_ID_SIZE);
        sentinelFlushConfig();
    }

    /* Log its ID to make debugging of issues simpler. */
    serverLog(LL_WARNING,"Sentinel ID is %s", sentinel.myid);

    /* We want to generate a +monitor event for every configured master
     * at startup. */
    sentinelGenerateInitialMonitorEvents();
}
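Sentinel uses the same event handling mechanism as the Redis server, which is split into file events and time events. File events use I/O multiplexing to handle the server's network I/O, such as client connections, reads, and writes. Time events are invoked periodically from the main loop to handle scheduled work, such as server maintenance, periodic updates, and deletions. The Redis server's main time event function is serverCron, defined in server.c; by default serverCron is called every 100 ms. In this function we see the following code: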
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    int j;
    UNUSED(eventLoop);
    UNUSED(id);
    UNUSED(clientData);
    ...........
    /* Run the Sentinel timer if we are in sentinel mode. */
    if (server.sentinel_mode) sentinelTimer();
    ...........
}

void sentinelTimer(void) {
    sentinelCheckTiltCondition();
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    sentinelRunPendingScripts();
    sentinelCollectTerminatedScripts();
    sentinelKillTimedoutScripts();

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other.
     * This non-determinism avoids that Sentinels started at the same time
     * exactly continue to stay synchronized asking to be voted at the
     * same time again and again (resulting in nobody likely winning the
     * election because of split brain voting). */
    server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
}
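The sentinelTimer function performs the following operations:
- Checks whether Sentinel should currently be in Tilt mode (Tilt mode is covered in a later section).
- Checks Sentinel's connections to the masters and replicas it monitors and to other Sentinels, updates their current state, and automatically performs a failover when a master goes down.
- Checks the state of the callback scripts and acts accordingly.
- Updates the server frequency (the frequency at which serverCron is called) with a random factor added. This keeps Sentinels that monitor the same master from colliding in time when they elect a leader, which would prevent any candidate from obtaining a majority of the votes.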
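The effect of the randomized frequency in the last step can be sketched as follows. This is a minimal illustration rather than Redis code: DEFAULT_HZ stands in for CONFIG_DEFAULT_HZ and is assumed to be 10 (consistent with the 100 ms default period mentioned earlier), and the helper names jittered_hz and period_ms are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed stand-in for CONFIG_DEFAULT_HZ: 10 calls/s, one call per 100 ms. */
#define DEFAULT_HZ 10

/* Hypothetical helper: pick a frequency in [DEFAULT_HZ, 2*DEFAULT_HZ - 1],
 * using the same expression that sentinelTimer assigns to server.hz. */
int jittered_hz(void) {
    return DEFAULT_HZ + rand() % DEFAULT_HZ;
}

/* Hypothetical helper: milliseconds between two timer calls at a given hz. */
int period_ms(int hz) {
    assert(hz > 0);
    return 1000 / hz;
}
```

Under these assumptions each Sentinel's timer fires every 52 to 100 ms, and the interval is re-drawn on every call, so two Sentinels started at the same moment are unlikely to stay in lockstep for long.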
/* Perform scheduled operations for all the instances in the dictionary.
 * Recursively call the function against dictionaries of slaves. */
void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    di = dictGetIterator(instances);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        sentinelHandleRedisInstance(ri);
        if (ri->flags & SRI_MASTER) {
            sentinelHandleDictOfRedisInstances(ri->slaves);
            sentinelHandleDictOfRedisInstances(ri->sentinels);
            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                switch_to_promoted = ri;
            }
        }
    }
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}
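sentinelHandleDictOfRedisInstances does the following for each instance in the dictionary:
- Calls sentinelHandleRedisInstance to handle Sentinel's connection to that instance, its state updates, and the failover work.
- If the instance is a master, recursively calls sentinelHandleDictOfRedisInstances on its replicas and on the other Sentinels that monitor the same master.
- After a successful failover, switches the master entry over to the replica that was promoted to master.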
/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    sentinelReconnectInstance(ri);
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}
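1. Checks Sentinel's connections to the other instances (masters, replicas, and other Sentinels). If a connection has not been set up yet or has been dropped, Sentinel tries to establish it again, and it periodically sends the corresponding commands. Note that Sentinel keeps two connections to every master and replica, a command connection and a Pub/Sub connection, but only a command connection to the other Sentinels that monitor the same master. These details are covered separately in the network section.
The sentinelCheckSubjectivelyDown function checks whether a given instance (master, replica, or other Sentinel) is subjectively down. The relevant code is as follows: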
/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->link->act_ping_time)
        elapsed = mstime() - ri->link->act_ping_time;
    else if (ri->link->disconnected)
        elapsed = mstime() - ri->link->last_avail_time;
    .......
    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}
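Sentinel considers an instance subjectively down when either of the following holds:
- No reply to PING has been received within the down_after_milliseconds period configured for the instance.
- Sentinel believes the instance is a master, but the instance reports itself as a replica, and the time since that role was reported exceeds the configured down_after_milliseconds plus twice the INFO command interval.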
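The two conditions above can be read as a single predicate. The following is a minimal sketch under simplifying assumptions, not the Redis implementation: the instance_view struct, its field names, and is_subjectively_down are hypothetical, and INFO_PERIOD_MS is assumed to be 10 s, matching the INFO interval this article describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long long ms_t;

/* Simplified, hypothetical view of what Sentinel knows about one instance. */
typedef struct {
    bool is_master;            /* we believe it is a master */
    bool reports_replica;      /* its last INFO said it is a replica */
    ms_t since_last_reply;     /* ms without a reply to PING */
    ms_t since_role_reported;  /* ms since the role above was reported */
    ms_t down_after;           /* configured down_after_milliseconds */
} instance_view;

#define INFO_PERIOD_MS 10000   /* assumed INFO interval: 10 s */

/* Returns true if either subjective-down condition holds. */
bool is_subjectively_down(const instance_view *v) {
    assert(v != NULL);
    if (v->since_last_reply > v->down_after) return true;
    return v->is_master && v->reports_replica &&
           v->since_role_reported > v->down_after + 2 * INFO_PERIOD_MS;
}
```

With down_after set to 30000, a master that has been reporting itself as a replica is flagged only after more than 50 s, which leaves room for two further INFO replies to correct the role first.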
The sentinelCheckObjectivelyDown function checks whether an instance is objectively down. This check is performed only for masters. The function is defined as follows:
/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    dictIterator *di;
    dictEntry *de;
    unsigned int quorum = 0, odown = 0;

    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */
        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }

    /* Set the flag accordingly to the outcome. */
    if (odown) {
        if ((master->flags & SRI_O_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+odown",master,"%@ #quorum %d/%d",
                quorum, master->quorum);
            master->flags |= SRI_O_DOWN;
            master->o_down_since_time = mstime();
        }
    } else {
        if (master->flags & SRI_O_DOWN) {
            sentinelEvent(LL_WARNING,"-odown",master,"%@");
            master->flags &= ~SRI_O_DOWN;
        }
    }
}
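The function counts votes only if the current Sentinel has itself flagged the master as subjectively down, which counts as the first vote. It then loops over the other Sentinels that monitor this master and checks whether each one's SRI_MASTER_DOWN flag is set; a set flag means that Sentinel considers the master down. If the total number of votes is greater than or equal to the quorum configured for the master, Sentinel sets the master's SRI_O_DOWN flag and considers the master objectively down.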
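The vote counting described above can be isolated into a small function. This is a sketch with hypothetical names (is_objectively_down, peers_down) that replaces the Redis dictionary of Sentinels with a plain array of flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide objective-down from vote flags.
 * self_sdown: does this Sentinel itself consider the master down?
 * peers_down: one flag per other Sentinel (its SRI_MASTER_DOWN state).
 * quorum:     quorum configured for the master. */
bool is_objectively_down(bool self_sdown, const bool *peers_down,
                         size_t n_peers, unsigned int quorum) {
    assert(peers_down != NULL || n_peers == 0);
    if (!self_sdown) return false;   /* our own vote is a precondition */
    unsigned int votes = 1;          /* count the current Sentinel */
    for (size_t i = 0; i < n_peers; i++)
        if (peers_down[i]) votes++;
    return votes >= quorum;
}
```

For example, with two peers of which one reports the master as down, a quorum of 2 is met but a quorum of 3 is not, and nothing is met if the current Sentinel does not itself see the master as down.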
int sentinelStartFailoverIfNeeded(sentinelRedisInstance *master) {
    /* We can't failover if the master is not in O_DOWN state. */
    if (!(master->flags & SRI_O_DOWN)) return 0;

    /* Failover already in progress? */
    if (master->flags & SRI_FAILOVER_IN_PROGRESS) return 0;

    /* Last failover attempt started too little time ago? */
    if (mstime() - master->failover_start_time <
        master->failover_timeout*2)
    {
        if (master->failover_delay_logged != master->failover_start_time) {
            time_t clock = (master->failover_start_time +
                            master->failover_timeout*2) / 1000;
            char ctimebuf[26];

            ctime_r(&clock,ctimebuf);
            ctimebuf[24] = '\0'; /* Remove newline. */
            master->failover_delay_logged = master->failover_start_time;
            serverLog(LL_WARNING,
                "Next failover delay: I will not start a failover before %s",
                ctimebuf);
        }
        return 0;
    }

    sentinelStartFailover(master);
    return 1;
}
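As mentioned above, each Sentinel maintains two connections to every master and replica it monitors: a command connection and a Pub/Sub connection. Note, however, that between Sentinels there is only a command connection. The roles of the command connection and the Pub/Sub connection are described below.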
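Sentinel uses the command connection for the following:
- By default, Sentinel sends PING to the other instances every 1 s to judge subjectively whether they are down.
- Over its command connections to masters and replicas, Sentinel sends INFO every 10 s to obtain up-to-date information about them.
- When a master is down, Sentinel sends SLAVEOF NO ONE over its command connection to the selected replica, which promotes that replica to the new master.
- By default, Sentinel sends is-master-down-by-addr every 1 s to ask the other Sentinels whether the monitored master is down.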
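The PING and INFO intervals listed above amount to a simple "is it due yet" check on each timer tick. The sketch below illustrates that idea only; commands_due and the DUE_* constants are hypothetical, the periods are taken from the list above, and the actual Redis logic has additional conditions that are not modeled here.

```c
#include <assert.h>

typedef long long ms_t;

#define PING_PERIOD_MS  1000   /* assumed: PING every 1 s */
#define INFO_PERIOD_MS 10000   /* assumed: INFO every 10 s */

enum { DUE_PING = 1 << 0, DUE_INFO = 1 << 1 };

/* Hypothetical helper: returns a bitmask of the commands that are due at
 * `now`, given when each was last sent. INFO is only sent to masters and
 * replicas, so other Sentinels (is_sentinel != 0) never get DUE_INFO. */
int commands_due(ms_t now, ms_t last_ping, ms_t last_info, int is_sentinel) {
    assert(now >= last_ping && now >= last_info);
    int due = 0;
    if (now - last_ping >= PING_PERIOD_MS) due |= DUE_PING;
    if (!is_sentinel && now - last_info >= INFO_PERIOD_MS) due |= DUE_INFO;
    return due;
}
```

Because the timer runs far more often than either period, most ticks return 0 and send nothing; the bitmask form lets one tick send both commands when they happen to fall due together.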
/* Commands connection. */
if (link->cc == NULL) {
    link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
    if (!link->cc->err && server.tls_replication &&
            (instanceLinkNegotiateTLS(link->cc) == C_ERR)) {
        sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #Failed to initialize TLS");
        instanceLinkCloseConnection(link,link->cc);
    } else if (link->cc->err) {
        sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
            link->cc->errstr);
        instanceLinkCloseConnection(link,link->cc);
    } else {
        link->pending_commands = 0;
        link->cc_conn_time = mstime();
        link->cc->data = link;
        redisAeAttach(server.el,link->cc);
        redisAsyncSetConnectCallback(link->cc,
                sentinelLinkEstablishedCallback);
        redisAsyncSetDisconnectCallback(link->cc,
                sentinelDisconnectCallback);
        sentinelSendAuthIfNeeded(ri,link->cc);
        sentinelSetClientName(ri,link->cc,"cmd");

        /* Send a PING ASAP when reconnecting. */
        sentinelSendPing(ri);
    }
}
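Over the Pub/Sub connection, Sentinel exchanges hello messages on the __sentinel__:hello channel. A hello message carries the following fields:
__sentinel__:hello <sentinel address> <sentinel port> <sentinel run ID> <sentinel configuration epoch> <master name> <master address> <master port> <master configuration epoch>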
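To make the eight hello fields concrete, here is a small parsing sketch. It is not the Redis parser: the hello_msg struct and parse_hello are hypothetical, and the sketch assumes that the fields arrive as one comma-separated payload, which should be checked against sentinel.c before relying on it.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one hello message; the names are illustrative. */
typedef struct {
    char sentinel_ip[64];
    int sentinel_port;
    char sentinel_runid[41];
    unsigned long long sentinel_epoch;
    char master_name[64];
    char master_ip[64];
    int master_port;
    unsigned long long master_config_epoch;
} hello_msg;

/* Hypothetical parser: assumes the eight fields are comma-separated.
 * Returns 1 on success, 0 if not all eight fields could be read. */
int parse_hello(const char *payload, hello_msg *out) {
    assert(payload != NULL && out != NULL);
    int n = sscanf(payload, "%63[^,],%d,%40[^,],%llu,%63[^,],%63[^,],%d,%llu",
                   out->sentinel_ip, &out->sentinel_port, out->sentinel_runid,
                   &out->sentinel_epoch, out->master_name, out->master_ip,
                   &out->master_port, &out->master_config_epoch);
    return n == 8;
}
```

The first four fields describe the sending Sentinel and the last four describe the master as that Sentinel currently sees it, which is how Sentinels monitoring the same master can discover each other and compare configuration epochs.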
/* Pub / Sub */
if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
    link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
    if (!link->pc->err && server.tls_replication &&
            (instanceLinkNegotiateTLS(link->pc) == C_ERR)) {
        sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #Failed to initialize TLS");
    } else if (link->pc->err) {
        sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
            link->pc->errstr);
        instanceLinkCloseConnection(link,link->pc);
    } else {
        int retval;

        link->pc_conn_time = mstime();
        link->pc->data = link;
        redisAeAttach(server.el,link->pc);
        redisAsyncSetConnectCallback(link->pc,
                sentinelLinkEstablishedCallback);
        redisAsyncSetDisconnectCallback(link->pc,
                sentinelDisconnectCallback);
        sentinelSendAuthIfNeeded(ri,link->pc);
        sentinelSetClientName(ri,link->pc,"pubsub");

        /* Now we subscribe to the Sentinels "Hello" channel. */
        retval = redisAsyncCommand(link->pc,
            sentinelReceiveHelloMessages, ri, "%s %s",
            sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
            SENTINEL_HELLO_CHANNEL);
        if (retval != C_OK) {
            /* If we can't subscribe, the Pub/Sub connection is useless
             * and we can simply disconnect it and try again. */
            instanceLinkCloseConnection(link,link->pc);
            return;
        }
    }
}
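The is-master-down-by-addr command
SENTINEL is-master-down-by-addr <master address> <master port> <current configuration epoch> <run ID>
Outside of a leader election, the <run ID> field is always *. Conversely, when a Sentinel is asking the other Sentinels for their vote, the <run ID> field is its own run ID. The relevant code is as follows: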
if ((master->flags & SRI_S_DOWN) == 0) continue;
if (ri->link->disconnected) continue;
if (!(flags & SENTINEL_ASK_FORCED) &&
    mstime() - ri->last_master_down_reply_time < SENTINEL_ASK_PERIOD)
    continue;

/* Ask */
ll2string(port,sizeof(port),master->addr->port);
retval = redisAsyncCommand(ri->link->cc,
            sentinelReceiveIsMasterDownReply, ri,
            "%s is-master-down-by-addr %s %s %llu %s",
            sentinelInstanceMapCommand(ri,"SENTINEL"),
            master->addr->ip, port,
            sentinel.current_epoch,
            (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
            sentinel.myid : "*");
if (retval == C_OK) ri->link->pending_commands++;
The reply to is-master-down-by-addr contains three fields:
- <master down state>
- <leader Sentinel run ID>
- <leader Sentinel configuration epoch>
/* Ignore every error or unexpected reply.
 * Note that if the command returns an error for any reason we'll
 * end clearing the SRI_MASTER_DOWN flag for timeout anyway. */
if (r->type == REDIS_REPLY_ARRAY && r->elements == 3 &&
    r->element[0]->type == REDIS_REPLY_INTEGER &&
    r->element[1]->type == REDIS_REPLY_STRING &&
    r->element[2]->type == REDIS_REPLY_INTEGER)
{
    ri->last_master_down_reply_time = mstime();
    if (r->element[0]->integer == 1) {
        ri->flags |= SRI_MASTER_DOWN;
    } else {
        ri->flags &= ~SRI_MASTER_DOWN;
    }
    if (strcmp(r->element[1]->str,"*")) {
        /* If the runid in the reply is not "*" the Sentinel actually
         * replied with a vote. */
        sdsfree(ri->leader);
        if ((long long)ri->leader_epoch != r->element[2]->integer) {
            serverLog(LL_WARNING,
                "%s voted for %s %llu", ri->name,
                r->element[1]->str,
                (unsigned long long) r->element[2]->integer);
        }
        ri->leader = sdsnew(r->element[1]->str);
        ri->leader_epoch = r->element[2]->integer;
    }
}
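Sentinel enters Tilt mode in either of the following cases:
- The Sentinel process was blocked for longer than SENTINEL_TILT_TRIGGER (2 s by default), possibly because of heavy process load or too many system I/O requests (memory, network, storage).
- The system clock was set back to an earlier time.
Tilt mode is a protection mechanism. While in Tilt mode, Sentinel keeps sending the necessary PING and INFO commands but takes no other active steps, such as performing a failover or marking instances as subjectively or objectively down. It still takes in outside information through INFO replies and the hello messages on the Pub/Sub connection and updates its own structures accordingly, until SENTINEL_TILT_PERIOD (30 s by default) has elapsed. Tilt mode can be thought of as Sentinel's passive mode.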
void sentinelCheckTiltCondition(void) {
    mstime_t now = mstime();
    mstime_t delta = now - sentinel.previous_time;

    if (delta < 0 || delta > SENTINEL_TILT_TRIGGER) {
        sentinel.tilt = 1;
        sentinel.tilt_start_time = mstime();
        sentinelEvent(LL_WARNING,"+tilt",NULL,"#tilt mode entered");
    }
    sentinel.previous_time = mstime();
}
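Entering and leaving Tilt mode can be traced with a version of the check that takes the current time as a parameter. This is an illustrative sketch, not Redis code: tilt_state, check_tilt, and may_act are hypothetical names, and the two constants are assumed from the defaults stated in the text (2 s and 30 s).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long long ms_t;

#define TILT_TRIGGER_MS 2000   /* assumed default for SENTINEL_TILT_TRIGGER */
#define TILT_PERIOD_MS 30000   /* assumed default for SENTINEL_TILT_PERIOD */

typedef struct {
    bool tilt;         /* are we in Tilt mode? */
    ms_t tilt_start;   /* when Tilt mode was last (re)entered */
    ms_t previous;     /* time of the previous timer call */
} tilt_state;

/* Follows sentinelCheckTiltCondition, with `now` passed in for testing. */
void check_tilt(tilt_state *s, ms_t now) {
    assert(s != NULL);
    ms_t delta = now - s->previous;
    if (delta < 0 || delta > TILT_TRIGGER_MS) {
        s->tilt = true;
        s->tilt_start = now;
    }
    s->previous = now;
}

/* Follows the exit check in sentinelHandleRedisInstance: returns true if
 * the acting half may run at `now`, leaving Tilt mode once it has expired. */
bool may_act(tilt_state *s, ms_t now) {
    assert(s != NULL);
    if (s->tilt) {
        if (now - s->tilt_start < TILT_PERIOD_MS) return false;
        s->tilt = false;
    }
    return true;
}
```

Note that a second anomaly during Tilt mode resets tilt_start, so the 30 s period is counted from the most recent clock jump or stall rather than from the first one.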
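This article is shared from the Huawei Cloud community post 《Redis Sentinel 源碼分析》 (Redis Sentinel Source Code Analysis), original author: 中間件小哥.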