[Repost] Redis cluster failover

Today I tested the Redis Cluster failover feature. The switchover itself happens very quickly, but when triggering a failover there is a distinction between the FORCE and TAKEOVER options.

[RHZYTEST_10:REDIS:6237:M ~]$r
127.0.0.1:6237> cluster nodes
13b6094babb9c16066a315a828434b3c6525f08b 192.168.1.91:6236 master - 0 1498035354251 3 connected 10923-16383
1f23e15fc418a157af3a8bb02616221f7c89be9c 192.168.1.91:6234 master - 0 1498035351241 5 connected 0-5460
6defcadf4122643b49d2fbf963e6d1fe75bc077f 192.168.1.91:6240 handshake - 1498035354852 0 0 disconnected
5b0b0b1b77426a27024c334aa52c964a61d5aee8 192.168.1.91:6238 slave 5caa45a805a9fefe2c8d41b549b15fd8568133ee 0 1498035356258 2 connected
9c21cb36a91f937185f51b91642360ff843db33b 192.168.1.91:6239 slave 13b6094babb9c16066a315a828434b3c6525f08b 0 1498035353248 3 connected
5caa45a805a9fefe2c8d41b549b15fd8568133ee 192.168.1.91:6235 master - 0 1498035355253 2 connected 5461-10922
edc4ba4257425393dc8d21680baab991b5e91241 192.168.1.91:6237 myself,slave 1f23e15fc418a157af3a8bb02616221f7c89be9c 0 0 4 connected
127.0.0.1:6237> cluster failover force
OK
127.0.0.1:6237>
127.0.0.1:6237> cluster nodes
3559d4c6e79ba7fffa130831d8abbad1ee7c4beb 192.168.1.91:6240 handshake - 1498035440166 0 0 disconnected
13b6094babb9c16066a315a828434b3c6525f08b 192.168.1.91:6236 master - 0 1498035448608 3 connected 10923-16383
1f23e15fc418a157af3a8bb02616221f7c89be9c 192.168.1.91:6234 slave edc4ba4257425393dc8d21680baab991b5e91241 0 1498035451616 6 connected
5b0b0b1b77426a27024c334aa52c964a61d5aee8 192.168.1.91:6238 slave 5caa45a805a9fefe2c8d41b549b15fd8568133ee 0 1498035449611 2 connected
9c21cb36a91f937185f51b91642360ff843db33b 192.168.1.91:6239 slave 13b6094babb9c16066a315a828434b3c6525f08b 0 1498035447606 3 connected
5caa45a805a9fefe2c8d41b549b15fd8568133ee 192.168.1.91:6235 master - 0 1498035450615 2 connected 5461-10922
edc4ba4257425393dc8d21680baab991b5e91241 192.168.1.91:6237 myself,master - 0 0 6 connected 0-5460
127.0.0.1:6237>

Manual failover works as follows: a plain CLUSTER FAILOVER (no option) sent to a slave goes through these steps (a minimal client-side example follows the list):

  1. The slave tells the master to stop processing queries from clients.
  2. The master replies to the slave with the current replication offset.
  3. The slave waits for the replication offset to match on its side, to make sure it processed all the data from the master before it continues.
  4. The slave starts a failover, obtains a new configuration epoch from the majority of the masters, and broadcasts the new configuration.
  5. The old master receives the configuration update: unblocks its clients and starts replying with redirection messages so that they'll continue the chat with the new master.
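
As an illustration of this flow from the client side (not part of the original post), the following minimal hiredis program asks a replica to perform a plain CLUSTER FAILOVER and then polls ROLE until the node reports itself as a master. The address 127.0.0.1:6237 is borrowed from the session above; the rest is a hypothetical sketch, not production code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <hiredis/hiredis.h>

int main(void) {
    /* Connect to the replica we want to promote (port from the session above). */
    redisContext *c = redisConnect("127.0.0.1", 6237);
    if (c == NULL || c->err) {
        fprintf(stderr, "connect error: %s\n", c ? c->errstr : "out of memory");
        return 1;
    }

    /* Ask the replica for a plain manual failover (no FORCE/TAKEOVER),
     * i.e. the coordinated 5-step flow described above. */
    redisReply *reply = redisCommand(c, "CLUSTER FAILOVER");
    if (reply == NULL || reply->type == REDIS_REPLY_ERROR) {
        fprintf(stderr, "CLUSTER FAILOVER failed: %s\n",
                reply ? reply->str : c->errstr);
        if (reply) freeReplyObject(reply);
        redisFree(c);
        return 1;
    }
    freeReplyObject(reply);

    /* Poll ROLE until this node reports itself as a master. */
    for (int i = 0; i < 50; i++) {
        reply = redisCommand(c, "ROLE");
        int promoted = reply && reply->type == REDIS_REPLY_ARRAY &&
                       reply->elements > 0 &&
                       strcmp(reply->element[0]->str, "master") == 0;
        if (reply) freeReplyObject(reply);
        if (promoted) {
            printf("failover completed, node is now a master\n");
            break;
        }
        usleep(200 * 1000); /* check again in 200 ms */
    }

    redisFree(c);
    return 0;
}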

So what is the difference between the FORCE and TAKEOVER options?

If the FORCE option is given, the slave does not perform any handshake with the master (which may be unreachable), but instead just starts a failover ASAP, beginning from point 4 above. This is useful when we want to start a manual failover while the master is no longer reachable.

However, even using FORCE we still need the majority of masters to be available in order to authorize the failover and generate a new configuration epoch for the slave that is going to become master.

There are situations where this is not enough, and we want a slave to fail over without any agreement with the rest of the cluster. A real-world use case for this is to mass-promote slaves in a different data center to masters in order to perform a data center switch, while all the masters are down or partitioned away.

The TAKEOVER option implies everything FORCE implies, but additionally does not use any cluster authorization in order to fail over. A slave receiving CLUSTER FAILOVER TAKEOVER will instead:

  1. Generate a new configEpoch unilaterally, just taking the current greatest epoch available and incrementing it if its local configuration epoch is not already the greatest.
  2. Assign itself all the hash slots of its master, and propagate the new configuration to every node which is reachable ASAP, and eventually to every other node.

Note that TAKEOVER violates the last-failover-wins principle of Redis Cluster, since the configuration epoch generated by the slave violates the normal generation of configuration epochs in several ways:

  1. There is no guarantee that it is actually the highest configuration epoch, since, for example, the TAKEOVER option can be used within a minority partition, and no message exchange is performed to generate the new configuration epoch.
  2. If we generate a configuration epoch which happens to collide with another instance, eventually our configuration epoch, or the one of another instance with our same epoch, will be moved away using the configuration epoch collision resolution algorithm.

Because of this the TAKEOVER option should be used with care.

Below is an analysis of the source code behind the TAKEOVER and FORCE options of CLUSTER FAILOVER.

 

http://blog.csdn.net/gqtcgq/article/details/51830483

1: Manual failover

Redis Cluster supports manual failover: a slave that receives the CLUSTER FAILOVER command starts the failover procedure even though its master is not down, promotes itself to the new master, and the original master is demoted to a slave.

To avoid losing data, after the slave receives the CLUSTER FAILOVER command the flow is as follows:

a: The slave sends a CLUSTERMSG_TYPE_MFSTART packet to its master;

b: On receiving that packet, the master pauses all of its clients, i.e. for the next 10 seconds it stops processing client commands; it also sets the CLUSTERMSG_FLAG0_PAUSED flag in the heartbeat packets it sends;

c: When the slave receives a heartbeat packet from its master carrying the CLUSTERMSG_FLAG0_PAUSED flag, it reads the master's current replication offset from it. Only after its own replication offset has caught up to that value does the slave start the failover procedure: initiating an election, collecting votes, winning the election, promoting itself to master and updating the configuration;

 

The CLUSTER FAILOVER command supports two options, FORCE and TAKEOVER, which modify the flow described above.

With the FORCE option, the slave does not interact with the master and the master does not pause its clients; the slave immediately starts the failover procedure: initiating an election, collecting votes, winning the election, promoting itself to master and updating the configuration.

With the TAKEOVER option things are even more blunt: the slave does not run an election at all. It simply promotes itself to master, takes over its former master's slots, bumps its own configEpoch and updates the configuration.

 

Therefore, with FORCE or TAKEOVER the master may already be down, whereas a plain CLUSTER FAILOVER with no option requires the master to be online.

 

In the clusterCommand function, the code that handles CLUSTER FAILOVER is as follows:

else if (!strcasecmp(c->argv[1]->ptr,"failover") &&
               (c->argc == 2 || c->argc == 3))
    {
        /* CLUSTER FAILOVER [FORCE|TAKEOVER] */
        int force = 0, takeover = 0;

        if (c->argc == 3) {
            if (!strcasecmp(c->argv[2]->ptr,"force")) {
                force = 1;
            } else if (!strcasecmp(c->argv[2]->ptr,"takeover")) {
                takeover = 1;
                force = 1; /* Takeover also implies force. */
            } else {
                addReply(c,shared.syntaxerr);
                return;
            }
        }

        /* Check preconditions. */
        if (nodeIsMaster(myself)) {
            addReplyError(c,"You should send CLUSTER FAILOVER to a slave");
            return;
        } else if (myself->slaveof == NULL) {
            addReplyError(c,"I'm a slave but my master is unknown to me");
            return;
        } else if (!force &&
                   (nodeFailed(myself->slaveof) ||
                    myself->slaveof->link == NULL))
        {
            addReplyError(c,"Master is down or failed, "
                            "please use CLUSTER FAILOVER FORCE");
            return;
        }
        resetManualFailover();
        server.cluster->mf_end = mstime() + REDIS_CLUSTER_MF_TIMEOUT;

        if (takeover) {
            /* A takeover does not perform any initial check. It just
             * generates a new configuration epoch for this node without
             * consensus, claims the master's slots, and broadcast the new
             * configuration. */
            redisLog(REDIS_WARNING,"Taking over the master (user request).");
            clusterBumpConfigEpochWithoutConsensus();
            clusterFailoverReplaceYourMaster();
        } else if (force) {
            /* If this is a forced failover, we don't need to talk with our
             * master to agree about the offset. We just failover taking over
             * it without coordination. */
            redisLog(REDIS_WARNING,"Forced failover user request accepted.");
            server.cluster->mf_can_start = 1;
        } else {
            redisLog(REDIS_WARNING,"Manual failover user request accepted.");
            clusterSendMFStart(myself->slaveof);
        }
        addReply(c,shared.ok);
    }

 

It first checks whether the last argument of the command is FORCE or TAKEOVER;

If the current node is a master, or it is a slave whose master is unknown, or its master is failed or disconnected and the command carries neither FORCE nor TAKEOVER, it replies to the client with an error and returns;

Then it calls resetManualFailover to reset any previous manual failover state (a sketch of this helper appears below);

mf_end is set to the current time plus 5 seconds (REDIS_CLUSTER_MF_TIMEOUT); this field holds the deadline of the manual failover procedure and also indicates whether a manual failover is currently in progress;

If the last argument is TAKEOVER, the slave skips the election entirely, takes over its master's slots directly and becomes the new master. It therefore first calls clusterBumpConfigEpochWithoutConsensus to generate a new configEpoch for the later configuration update, and then calls clusterFailoverReplaceYourMaster to turn itself into the new master and broadcast this change to all nodes in the cluster (both helpers are sketched just below);
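
clusterBumpConfigEpochWithoutConsensus and clusterFailoverReplaceYourMaster are not excerpted in this post. For reference, a simplified sketch of both, modeled on the Redis 3.0 cluster.c source (details may differ between versions), looks like this:

/* Bump the configEpoch without consensus: take the greatest epoch known to
 * this node and, if ours is not already the greatest, claim the next one. */
int clusterBumpConfigEpochWithoutConsensus(void) {
    uint64_t maxEpoch = clusterGetMaxEpoch();

    if (myself->configEpoch == 0 || myself->configEpoch != maxEpoch) {
        server.cluster->currentEpoch++;
        myself->configEpoch = server.cluster->currentEpoch;
        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG|
                             CLUSTER_TODO_FSYNC_CONFIG);
        redisLog(REDIS_WARNING,
            "New configEpoch set to %llu",
            (unsigned long long) myself->configEpoch);
        return REDIS_OK;
    }
    return REDIS_ERR;
}

/* Replace our master: become a master ourselves, claim its slots,
 * save the new configuration and broadcast the change. */
void clusterFailoverReplaceYourMaster(void) {
    int j;
    clusterNode *oldmaster = myself->slaveof;

    if (nodeIsMaster(myself) || oldmaster == NULL) return;

    /* 1) Turn this node into a master. */
    clusterSetNodeAsMaster(myself);
    replicationUnsetMaster();

    /* 2) Claim all the slots assigned to our master. */
    for (j = 0; j < REDIS_CLUSTER_SLOTS; j++) {
        if (clusterNodeGetSlotBit(oldmaster,j)) {
            clusterDelSlot(j);
            clusterAddSlot(myself,j);
        }
    }

    /* 3) Update the cluster state and save the config. */
    clusterUpdateState();
    clusterSaveConfigOrDie(1);

    /* 4) Pong all the other nodes so that they notice the role switch. */
    clusterBroadcastPong(CLUSTER_BROADCAST_ALL);

    /* 5) If there was a manual failover in progress, clear the state. */
    resetManualFailover();
}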

If the last argument is FORCE, the slave may start the election immediately, without first waiting for its replication offset to reach the master's. mf_can_start is therefore set to 1, so that in clusterHandleSlaveFailover the failover can start even though the master is not marked as failed and the slave's replication data may be stale;

If the last argument is neither FORCE nor TAKEOVER, the slave first has to send a CLUSTERMSG_TYPE_MFSTART packet to its master, so it calls clusterSendMFStart to send that packet (sketched below, together with resetManualFailover);
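
resetManualFailover and clusterSendMFStart are likewise not excerpted above. A sketch of both, again modeled on the Redis 3.0 cluster.c source and meant only as a reference, is:

/* Reset the manual failover state: unpause clients if needed and clear all
 * the mf_* fields, either at startup or to abort a manual failover. */
void resetManualFailover(void) {
    if (server.cluster->mf_end && clientsArePaused()) {
        server.clients_pause_end_time = 0;
        clientsArePaused(); /* Just use the side effect of the function. */
    }
    server.cluster->mf_end = 0; /* No manual failover in progress. */
    server.cluster->mf_can_start = 0;
    server.cluster->mf_slave = NULL;
    server.cluster->mf_master_offset = 0;
}

/* Send a CLUSTERMSG_TYPE_MFSTART packet to the specified node (our master). */
void clusterSendMFStart(clusterNode *node) {
    unsigned char buf[sizeof(clusterMsg)];
    clusterMsg *hdr = (clusterMsg*) buf;

    if (!node->link) return;
    clusterBuildMessageHdr(hdr,CLUSTERMSG_TYPE_MFSTART);
    clusterSendMessage(node->link,buf,ntohl(hdr->totlen));
}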

 

When the master receives the CLUSTERMSG_TYPE_MFSTART packet, it handles it in clusterProcessPacket like this:

 

else if (type == CLUSTERMSG_TYPE_MFSTART) {
        /* This message is acceptable only if I'm a master and the sender
         * is one of my slaves. */
        if (!sender || sender->slaveof != myself) return 1;
        /* Manual failover requested from slaves. Initialize the state
         * accordingly. */
        resetManualFailover();
        server.cluster->mf_end = mstime() + REDIS_CLUSTER_MF_TIMEOUT;
        server.cluster->mf_slave = sender;
        pauseClients(mstime()+(REDIS_CLUSTER_MF_TIMEOUT*2));
        redisLog(REDIS_WARNING,"Manual failover requested by slave %.40s.",
            sender->name);
    }

 

If the sending node cannot be found in the node dictionary, or the sender's master is not the current node, it simply returns;

resetManualFailover is called to reset any previous manual failover state;

Then mf_end is set to the current time plus 5 seconds; as before, this field holds the deadline of the manual failover and marks that one is in progress;

Then mf_slave is set to sender; this field records which slave is performing the manual failover;

Then pauseClients is called so that all clients are paused for the next 10 seconds, i.e. twice REDIS_CLUSTER_MF_TIMEOUT (a sketch of pauseClients follows);
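
pauseClients lives in networking.c rather than cluster.c; a sketch of it, modeled on the Redis 3.0 source, is:

/* Pause clients up to the specified unix time (in ms). While clients are
 * paused no command is processed from them, so the data set cannot change
 * while the manual failover is being negotiated. */
void pauseClients(mstime_t end) {
    if (!server.clients_paused || end > server.clients_pause_end_time)
        server.clients_pause_end_time = end;
    server.clients_paused = 1;
}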

 

When the master builds the header of its heartbeat packets, if it finds that a manual failover is currently in progress, it adds the CLUSTERMSG_FLAG0_PAUSED flag to the header:

 

void clusterBuildMessageHdr(clusterMsg *hdr, int type) {
    ...
    /* Set the message flags. */
    if (nodeIsMaster(myself) && server.cluster->mf_end)
        hdr->mflags[0] |= CLUSTERMSG_FLAG0_PAUSED;
    ...
}

 

 

The slave processes incoming packets in clusterProcessPacket; as soon as it sees a packet from its master carrying the CLUSTERMSG_FLAG0_PAUSED flag, it records the master's replication offset in server.cluster->mf_master_offset:

int clusterProcessPacket(clusterLink *link) {
    ...
    /* Check if the sender is a known node. */
    sender = clusterLookupNode(hdr->sender);
    if (sender && !nodeInHandshake(sender)) {
        ...
        /* Update the replication offset info for this node. */
        sender->repl_offset = ntohu64(hdr->offset);
        sender->repl_offset_time = mstime();
        /* If we are a slave performing a manual failover and our master
         * sent its offset while already paused, populate the MF state. */
        if (server.cluster->mf_end &&
            nodeIsSlave(myself) &&
            myself->slaveof == sender &&
            hdr->mflags[0] & CLUSTERMSG_FLAG0_PAUSED &&
            server.cluster->mf_master_offset == 0)
        {
            server.cluster->mf_master_offset = sender->repl_offset;
            redisLog(REDIS_WARNING,
                "Received replication offset for paused "
                "master manual failover: %lld",
                server.cluster->mf_master_offset);
        }
    }
}

 

 

In the cluster timer function clusterCron, the slave calls clusterHandleManualFailover: once the slave's replication offset has reached server.cluster->mf_master_offset, server.cluster->mf_can_start is set to 1, so that clusterHandleSlaveFailover, called right afterwards, immediately starts the failover procedure.

The code of clusterHandleManualFailover is as follows:

 

void clusterHandleManualFailover(void) {
    /* Return ASAP if no manual failover is in progress. */
    if (server.cluster->mf_end == 0) return;

    /* If mf_can_start is non-zero, the failover was already triggered so the
     * next steps are performed by clusterHandleSlaveFailover(). */
    if (server.cluster->mf_can_start) return;

    if (server.cluster->mf_master_offset == 0) return; /* Wait for offset... */

    if (server.cluster->mf_master_offset == replicationGetSlaveOffset()) {
        /* Our replication offset matches the master replication offset
         * announced after clients were paused. We can start the failover. */
        server.cluster->mf_can_start = 1;
        redisLog(REDIS_WARNING,
            "All master replication stream processed, "
            "manual failover can start.");
    }
}

 

Both slaves and masters call manualFailoverCheckTimeout from the cluster timer function clusterCron; once the manual failover deadline has passed, the manual failover state is reset, aborting the procedure. The code of manualFailoverCheckTimeout is as follows:

 

/* If a manual failover timed out, abort it. */
void manualFailoverCheckTimeout(void) {
    if (server.cluster->mf_end && server.cluster->mf_end < mstime()) {
        redisLog(REDIS_WARNING,"Manual failover timed out.");
        resetManualFailover();
    }
}

 

 

2: Slave migration

In a Redis cluster, to improve availability, each master is normally given several slaves. However, if these master-slave relationships never change, then after some time an orphaned master may appear, i.e. a master left without any slave usable for failover. Once such a master goes down, the whole cluster becomes unavailable.

Redis Cluster therefore adds a slave (replica) migration feature. In short, once an orphaned master is detected in the cluster, some slave A automatically becomes a slave of that orphaned master. Slave A is chosen so that its master has the largest number of attached slaves, and among those slaves A has the smallest node ID ("The acting slave is the slave among the masters with the maximum number of attached slaves, that is not in FAIL state and has the smallest node ID").

 

This feature is implemented in the cluster timer function clusterCron. The relevant code is:

 

void clusterCron(void) {
    ...
    orphaned_masters = 0;
    max_slaves = 0;
    this_slaves = 0;
    di = dictGetSafeIterator(server.cluster->nodes);
    while((de = dictNext(di)) != NULL) {
        clusterNode *node = dictGetVal(de);
        now = mstime(); /* Use an updated time at every iteration. */
        mstime_t delay;

        if (node->flags &
            (REDIS_NODE_MYSELF|REDIS_NODE_NOADDR|REDIS_NODE_HANDSHAKE))
                continue;

        /* Orphaned master check, useful only if the current instance
         * is a slave that may migrate to another master. */
        if (nodeIsSlave(myself) && nodeIsMaster(node) && !nodeFailed(node)) {
            int okslaves = clusterCountNonFailingSlaves(node);

            /* A master is orphaned if it is serving a non-zero number of
             * slots, have no working slaves, but used to have at least one
             * slave. */
            if (okslaves == 0 && node->numslots > 0 && node->numslaves)
                orphaned_masters++;
            if (okslaves > max_slaves) max_slaves = okslaves;
            if (nodeIsSlave(myself) && myself->slaveof == node)
                this_slaves = okslaves;
        }
        ...
    }
    ...
    if (nodeIsSlave(myself)) {
        ...
        /* If there are orphaned slaves, and we are a slave among the masters
         * with the max number of non-failing slaves, consider migrating to
         * the orphaned masters. Note that it does not make sense to try
         * a migration if there is no master with at least *two* working
         * slaves. */
        if (orphaned_masters && max_slaves >= 2 && this_slaves == max_slaves)
            clusterHandleSlaveMigration(max_slaves);
    }
    ...
}

 

It iterates over the dictionary server.cluster->nodes; every node that is not the current node, is not flagged REDIS_NODE_NOADDR and is not in handshake state is processed as follows:

If the current node is a slave, and node is a master that is not marked as failed, it first calls clusterCountNonFailingSlaves (sketched after this walkthrough) to count node's non-failing slaves, okslaves. If okslaves is 0 while node serves a non-zero number of slots, node is an orphaned master, so orphaned_masters is incremented. If okslaves is greater than max_slaves, max_slaves is updated to okslaves; max_slaves therefore ends up holding the number of non-failing slaves of the master that has the most of them. If the current node happens to be one of node's slaves, okslaves is recorded in this_slaves. All of this is preparation for the slave migration step that follows;

 

After iterating over all nodes: if there is at least one orphaned master, max_slaves is at least 2, and the current node is one of the slaves of the master that has the most non-failing slaves, then clusterHandleSlaveMigration is called and, if its own checks pass, performs the slave migration, i.e. makes the current slave a slave of one of the orphaned masters.
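
clusterCountNonFailingSlaves, used in the loop above, is a small helper; a sketch of it, modeled on the Redis 3.0 cluster.c source, is:

/* Return the number of slaves of master n that are not in FAIL state. */
int clusterCountNonFailingSlaves(clusterNode *n) {
    int j, okslaves = 0;

    for (j = 0; j < n->numslaves; j++)
        if (!nodeFailed(n->slaves[j])) okslaves++;
    return okslaves;
}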

 

The code of clusterHandleSlaveMigration is as follows:

 

void clusterHandleSlaveMigration(int max_slaves) {
    int j, okslaves = 0;
    clusterNode *mymaster = myself->slaveof, *target = NULL, *candidate = NULL;
    dictIterator *di;
    dictEntry *de;

    /* Step 1: Don't migrate if the cluster state is not ok. */
    if (server.cluster->state != REDIS_CLUSTER_OK) return;

    /* Step 2: Don't migrate if my master will not be left with at least
     *         'migration-barrier' slaves after my migration. */
    if (mymaster == NULL) return;
    for (j = 0; j < mymaster->numslaves; j++)
        if (!nodeFailed(mymaster->slaves[j]) &&
            !nodeTimedOut(mymaster->slaves[j])) okslaves++;
    if (okslaves <= server.cluster_migration_barrier) return;

    /* Step 3: Idenitfy a candidate for migration, and check if among the
     * masters with the greatest number of ok slaves, I'm the one with the
     * smaller node ID.
     *
     * Note that this means that eventually a replica migration will occurr
     * since slaves that are reachable again always have their FAIL flag
     * cleared. At the same time this does not mean that there are no
     * race conditions possible (two slaves migrating at the same time), but
     * this is extremely unlikely to happen, and harmless. */
    candidate = myself;
    di = dictGetSafeIterator(server.cluster->nodes);
    while((de = dictNext(di)) != NULL) {
        clusterNode *node = dictGetVal(de);
        int okslaves;

        /* Only iterate over working masters. */
        if (nodeIsSlave(node) || nodeFailed(node)) continue;
        /* If this master never had slaves so far, don't migrate. We want
         * to migrate to a master that remained orphaned, not masters that
         * were never configured to have slaves. */
        if (node->numslaves == 0) continue;
        okslaves = clusterCountNonFailingSlaves(node);

        if (okslaves == 0 && target == NULL && node->numslots > 0)
            target = node;

        if (okslaves == max_slaves) {
            for (j = 0; j < node->numslaves; j++) {
                if (memcmp(node->slaves[j]->name,
                           candidate->name,
                           REDIS_CLUSTER_NAMELEN) < 0)
                {
                    candidate = node->slaves[j];
                }
            }
        }
    }
    dictReleaseIterator(di);

    /* Step 4: perform the migration if there is a target, and if I'm the
     * candidate. */
    if (target && candidate == myself) {
        redisLog(REDIS_WARNING,"Migrating to orphaned master %.40s",
            target->name);
        clusterSetMaster(target);
    }
}

 

If the cluster state is not REDIS_CLUSTER_OK, it returns immediately; if the current slave has no master, it returns immediately;

Next it counts okslaves, the number of non-failing slaves of the current slave's master; if okslaves is less than or equal to the migration barrier server.cluster_migration_barrier (the cluster-migration-barrier configuration directive, which defaults to 1), it returns immediately;

        

It then iterates over the dictionary server.cluster->nodes and, for each node node:

If node is a slave or is in FAIL state, it moves on to the next node; if node has never had any slaves configured, it also moves on to the next node;

It calls clusterCountNonFailingSlaves to count node's non-failing slaves, okslaves; if okslaves is 0 and node's numslots is greater than 0, this master used to have slaves but they are all down, so an orphaned master has been found and recorded in target;

If okslaves equals max_slaves, node is one of the masters with the most non-failing slaves, so the current candidate's node ID is compared with the IDs of all of node's slaves; whenever one of them has a smaller ID, candidate is updated to that slave. (In fact, as soon as candidate is no longer myself the function could return early.)

        

After the loop, if an orphaned master was found and the current node has the smallest node ID among the eligible slaves, clusterSetMaster is called to make target the current node's master and start replicating from it.

 

3: The configEpoch collision problem

In a cluster it is actually fine for masters responsible for different slots to have the same configEpoch. The problem is that, due to manual intervention or bugs, masters with the same configEpoch may end up claiming the same slots, which is fatal in a distributed system. Redis therefore requires that all nodes in the cluster have distinct configEpochs.

When a slave is promoted to a new master, it obtains a new configEpoch greater than that of every other node, so promotion by election cannot produce duplicate configEpochs (two slaves cannot win the same election). However, at the end of a resharding operation performed by an administrator, the node importing slots bumps its own configEpoch without the agreement of other nodes, and a manual failover likewise lets a slave bump its configEpoch unilaterally. Both situations can lead to several masters having the same configEpoch.

An algorithm is therefore needed to guarantee that all configEpochs in the cluster are distinct. It works like this: when a master receives a heartbeat packet from another master whose configEpoch equals its own, it calls clusterHandleConfigEpochCollision to resolve the collision.

The code of clusterHandleConfigEpochCollision is as follows:

 

void clusterHandleConfigEpochCollision(clusterNode *sender) {
    /* Prerequisites: nodes have the same configEpoch and are both masters. */
    if (sender->configEpoch != myself->configEpoch ||
        !nodeIsMaster(sender) || !nodeIsMaster(myself)) return;
    /* Don't act if the colliding node has a smaller Node ID. */
    if (memcmp(sender->name,myself->name,REDIS_CLUSTER_NAMELEN) <= 0) return;
    /* Get the next ID available at the best of this node knowledge. */
    server.cluster->currentEpoch++;
    myself->configEpoch = server.cluster->currentEpoch;
    clusterSaveConfigOrDie(1);
    redisLog(REDIS_VERBOSE,
        "WARNING: configEpoch collision with node %.40s."
        " configEpoch set to %llu",
        sender->name,
        (unsigned long long) myself->configEpoch);
}

 

If the sender's configEpoch differs from the current node's configEpoch, or the sender is not a master, or the current node is not a master, it returns immediately;

If the sender's node ID is smaller than the current node's, it returns immediately;

Thus the node with the smaller node ID is the one that ends up with the larger configEpoch: it first increments its own currentEpoch and then assigns currentEpoch to its configEpoch.

In this way, even if several nodes end up with the same configEpoch, eventually only the node with the largest node ID keeps its configEpoch unchanged; all the other nodes bump their configEpochs, each by a different amount, and the node with the smallest node ID ends up with the largest configEpoch. For example, if two masters both hold configEpoch 10 (and currentEpoch is 10), the one with the smaller ID bumps currentEpoch to 11 and takes configEpoch 11, while the other keeps 10.

 

Reference: http://redis.io/topics/cluster-spec
