Queue Selection and Fault-Tolerance Strategy in RocketMQ Message Sending

 

A topic has multiple queues, spread across different brokers. When a producer sends a message, it has to pick one of those queues.

[Figure: overall sequence diagram of a producer sending a message]

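Conclusions on queue selection and fault tolerance:

  • With fault tolerance disabled, the producer round-robins over the queues; if a send fails, the retry skips the queues on the broker that just failed
  • With fault tolerance enabled, RocketMQ uses a prediction mechanism to estimate whether a broker is available
  • If the broker that failed last time is still considered available, one of its queues is chosen again
  • If no queue qualifies, a broker is picked from the fault records (shuffled, then sorted) and the message is sent there
  • Every send records how long the call took and whether it raised an error; the time at which the broker becomes available again is predicted from that latency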

 

 

String lastBrokerName = null == mq ? null : mq.getBrokerName();
MessageQueue tmpmq = this.selectOneMessageQueue(lastBrokerName);
if (tmpmq != null) {
    mq = tmpmq;
    //....

As shown above, when a send fails, lastBrokerName is non-null on the retry, and execution enters the selectOneMessageQueue method.

public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
        if (this.sendLatencyFaultEnable) {
            try {
                int index = tpInfo.getSendWhichQueue().getAndIncrement();
                for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                    int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                    if (pos < 0)
                        pos = 0;
                    MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                    if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                        if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                            return mq;
                    }
                }

                final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
                int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
                if (writeQueueNums > 0) {
                    final MessageQueue mq = tpInfo.selectOneMessageQueue();
                    if (notBestBroker != null) {
                        mq.setBrokerName(notBestBroker);
                        mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                    }
                    return mq;
                } else {
                    latencyFaultTolerance.remove(notBestBroker);
                }
            } catch (Exception e) {
            }

            return tpInfo.selectOneMessageQueue();
        }

        return tpInfo.selectOneMessageQueue(lastBrokerName);
    }

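The method first checks whether sendLatencyFaultEnable is true and branches accordingly. The default is false, in which case the following TopicPublishInfo methods are used: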

public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
        // lastBrokerName is null on the first send, i.e. this is not a retry after a failure,
        // so simply round-robin over the queues
        if (lastBrokerName == null) {
            return selectOneMessageQueue();
        } else {
            // Same as selectOneMessageQueue(), but skips queues on lastBrokerName
            int index = this.sendWhichQueue.getAndIncrement();
            for (int i = 0; i < this.messageQueueList.size(); i++) {
                int pos = Math.abs(index++) % this.messageQueueList.size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = this.messageQueueList.get(pos);
                if (!mq.getBrokerName().equals(lastBrokerName)) {
                    return mq;
                }
            }
            return selectOneMessageQueue();
        }
    }
    public MessageQueue selectOneMessageQueue() {
        int index = this.sendWhichQueue.getAndIncrement();
        int pos = Math.abs(index) % this.messageQueueList.size();
        if (pos < 0)
            pos = 0;
        return this.messageQueueList.get(pos);
    }

Both are round-robin overall; the only difference is that one skips queues on the failed lastBrokerName and the other does not.
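The round-robin logic above can be tried in isolation. The following is a minimal sketch, not RocketMQ code: the class `RoundRobinSketch` and its methods are hypothetical names, and each queue is reduced to the name of the broker that owns it. It also shows why the `pos < 0` guard exists: `Math.abs(Integer.MIN_VALUE)` is still negative, so the remainder can be negative once the counter overflows.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for TopicPublishInfo: each list entry is one queue,
// represented only by the name of the broker that owns it.
class RoundRobinSketch {
    private final List<String> queueBrokers;
    private final AtomicInteger sendWhichQueue = new AtomicInteger(0);

    RoundRobinSketch(List<String> queueBrokers) {
        this.queueBrokers = queueBrokers;
    }

    // Maps a counter value (possibly negative after overflow) to a valid position.
    static int position(int index, int size) {
        int pos = Math.abs(index) % size;
        // Math.abs(Integer.MIN_VALUE) is negative, so pos can be negative here.
        return pos < 0 ? 0 : pos;
    }

    // Plain round-robin: first send, no broker to avoid.
    int next() {
        return position(sendWhichQueue.getAndIncrement(), queueBrokers.size());
    }

    // Round-robin that skips queues on the broker that failed last time.
    int nextAvoiding(String lastBrokerName) {
        int index = sendWhichQueue.getAndIncrement();
        for (int i = 0; i < queueBrokers.size(); i++) {
            int pos = position(index++, queueBrokers.size());
            if (!queueBrokers.get(pos).equals(lastBrokerName)) {
                return pos;
            }
        }
        // Every queue is on the failed broker: fall back to plain round-robin.
        return next();
    }
}
```

With queues on brokers `a, a, b, b`, a first call to `next()` returns position 0, and a following `nextAvoiding("a")` skips position 1 and returns position 2.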

With sendLatencyFaultEnable turned on:

  • Part 1
int index = tpInfo.getSendWhichQueue().getAndIncrement();
for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
    int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
    if (pos < 0)
        pos = 0;
    MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
    // Check whether the broker is available; if none is, fall through to part 2
    if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
        // Not a retry: return this queue directly
        // Retry after a failure: return the queue only if it is on the same broker as last time
        if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
            return mq;
    }
}
  • Part 2
// Pick a broker from the fault-tolerance records
final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
if (writeQueueNums > 0) {// the broker has writable queues
    // Take the next queue in round-robin order
    final MessageQueue mq = tpInfo.selectOneMessageQueue();
    if (notBestBroker != null) {
        // Point the queue at the broker that was picked
        mq.setBrokerName(notBestBroker);
        // Reset the queue id to one that exists on that broker
        mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
    }
    return mq;
} else {
    latencyFaultTolerance.remove(notBestBroker);
}

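Part 1 mainly selects a queue whose broker is available and whose brokerName equals lastBrokerName. This raises a question: lastBrokerName is only non-null after a failure, so why would the code still choose an available queue on that same broker? My guess at the reasoning: the previous queue on this broker failed, but the next queue may succeed, and since the latency fault-tolerance mechanism still rates the broker as available, a different queue on it is chosen.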

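On a retry, no matching queue is found in only one case:

  • the latency fault-tolerance mechanism considers the broker lastBrokerName unavailable

Execution then reaches part 2: pickOneAtLeast is called to get a broker, then selectOneMessageQueue is called to get a queue. If pickOneAtLeast returned a non-null broker, the queue's broker name and queue id are overwritten with that broker's.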
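To make the part 1 condition concrete, here is a small sketch under simplifying assumptions: queues are again reduced to broker names, availability is passed in as a `Predicate`, and `selectPartOne` is a hypothetical helper that returns -1 where the real method would fall through to part 2.

```java
import java.util.List;
import java.util.function.Predicate;

class LatencyAwareSelectSketch {
    // Walks the queues in round-robin order starting at startIndex and returns the
    // position of the first queue whose broker is available and, on a retry,
    // equals lastBrokerName. Returns -1 when no queue qualifies.
    static int selectPartOne(List<String> queueBrokers, int startIndex,
                             String lastBrokerName, Predicate<String> isAvailable) {
        int index = startIndex;
        for (int i = 0; i < queueBrokers.size(); i++) {
            int pos = Math.abs(index++) % queueBrokers.size();
            if (pos < 0) {
                pos = 0;
            }
            String broker = queueBrokers.get(pos);
            if (isAvailable.test(broker)
                    && (lastBrokerName == null || broker.equals(lastBrokerName))) {
                return pos;
            }
        }
        return -1;
    }
}
```

With queues on brokers `a, a, b, b`, a retry with lastBrokerName `b` returns position 2 while `b` is rated available, and -1 as soon as `b` is rated unavailable, even if `a` is healthy.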

Fault-tolerance strategy

How to tell whether a broker is available

public boolean isAvailable(final String name) {
        final FaultItem faultItem = this.faultItemTable.get(name);
        if (faultItem != null) {
            return faultItem.isAvailable();
        }
        return true;
    }

This breaks down into two parts:

  • when entries are put into faultItemTable
  • how FaultItem implements isAvailable

isAvailable implementation

public boolean isAvailable() {
            return (System.currentTimeMillis() - startTimestamp) >= 0;
        }

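It checks whether the current time has reached startTimestamp (greater than or equal). Why is comparing a single timestamp enough to know whether the broker is available?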

faultItemTable

Searching for usages of faultItemTable leads to the updateFaultItem method.

 

public void updateFaultItem(final String name/*brokerName*/, final long currentLatency, final long notAvailableDuration) {
        FaultItem old = this.faultItemTable.get(name);
        if (null == old) {
            final FaultItem faultItem = new FaultItem(name);
            faultItem.setCurrentLatency(currentLatency);
            faultItem.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);

            old = this.faultItemTable.putIfAbsent(name, faultItem);
            if (old != null) {
                old.setCurrentLatency(currentLatency);
                old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
            }
        } else {
            old.setCurrentLatency(currentLatency);
            old.setStartTimestamp(System.currentTimeMillis() + notAvailableDuration);
        }
    }

The FaultItem is looked up by brokerName, and startTimestamp = current time + notAvailableDuration. To see what notAvailableDuration is, look at the callers of updateFaultItem, which leads to MQFaultStrategy.updateFaultItem(String, long, boolean).
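The interaction between updateFaultItem and isAvailable can be sketched in a few lines. This is an illustration only: `FaultTableSketch` is a hypothetical class, it stores just the available-from timestamp per broker, and it takes an injectable clock (an assumption, so the behaviour can be checked without waiting) instead of calling System.currentTimeMillis() directly.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class FaultTableSketch {
    // brokerName -> timestamp (ms) from which the broker counts as available again
    private final ConcurrentHashMap<String, Long> startTimestamps = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    FaultTableSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Mirrors the idea of updateFaultItem: startTimestamp = now + notAvailableDuration.
    void update(String brokerName, long notAvailableDuration) {
        startTimestamps.put(brokerName, clock.getAsLong() + notAvailableDuration);
    }

    // Mirrors the idea of isAvailable: unknown brokers are available,
    // known ones only once the clock has reached their startTimestamp.
    boolean isAvailable(String brokerName) {
        Long start = startTimestamps.get(brokerName);
        return start == null || clock.getAsLong() - start >= 0;
    }
}
```

A broker updated with a duration of 0 is available immediately; one updated with 600000 stays unavailable until the clock has advanced by exactly that many milliseconds.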

 

public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {
        if (this.sendLatencyFaultEnable) {// latency fault tolerance is enabled
            long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);
            this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);
        }
    }
    private long computeNotAvailableDuration(final long currentLatency) {
        for (int i = latencyMax.length - 1; i >= 0; i--) {
            if (currentLatency >= latencyMax[i]) return this.notAvailableDuration[i];
        }
        return 0;
    }

Selected fields of MQFaultStrategy.java

public class MQFaultStrategy {
    private final static Logger log = ClientLogger.getLog();
    /**
     * Latency fault tolerance: tracks the send latency of each broker
     * key: brokerName
     */
    private final LatencyFaultTolerance<String> latencyFaultTolerance = new LatencyFaultToleranceImpl();
    /**
     * Switch for latency fault tolerance when sending messages
     */
    private boolean sendLatencyFaultEnable = false;
    /**
     * Latency thresholds
     */
    private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    /**
     * Unavailable durations
     */
    private long[] notAvailableDuration = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // .....
}

 

 

The notAvailableDuration argument is the value at some position of the notAvailableDuration array. The values of the latencyMax and notAvailableDuration arrays are as follows:

latencyMax (ms)    notAvailableDuration (ms)
50                 0
100                0
550                30000
1000               60000
2000               120000
3000               180000
15000              600000

  • If currentLatency is at least 50 and less than 100, notAvailableDuration is 0
  • If currentLatency is at least 100 and less than 550, notAvailableDuration is 0
  • If currentLatency is at least 550 and less than 1000, notAvailableDuration is 30000
  • …and so on

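If isolation is true, computeNotAvailableDuration receives 30000, which is above the last threshold (15000), so notAvailableDuration is 600000.
Combined with isAvailable, the flow is roughly this: RocketMQ predicts an available-from time for each broker (current time + notAvailableDuration), and the broker counts as available only once the current time reaches it. notAvailableDuration has 7 levels that correspond one-to-one with the latencyMax thresholds, so the currentLatency passed in determines when the broker is predicted to be available again.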
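The lookup can be checked with a few sample latencies. The sketch below copies the two arrays and the loop from the code shown earlier into a standalone class; the class name `NotAvailableDurationSketch` and the helper `durationFor` are hypothetical, added only to fold the isolation flag into one call.

```java
class NotAvailableDurationSketch {
    static final long[] LATENCY_MAX = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
    static final long[] NOT_AVAILABLE_DURATION = {0L, 0L, 30000L, 60000L, 120000L, 180000L, 600000L};

    // Scans the thresholds from the largest down and returns the duration
    // of the first threshold that currentLatency reaches.
    static long computeNotAvailableDuration(long currentLatency) {
        for (int i = LATENCY_MAX.length - 1; i >= 0; i--) {
            if (currentLatency >= LATENCY_MAX[i]) {
                return NOT_AVAILABLE_DURATION[i];
            }
        }
        return 0;
    }

    // With isolation the measured latency is ignored and 30000 is used instead.
    static long durationFor(long currentLatency, boolean isolation) {
        return computeNotAvailableDuration(isolation ? 30000 : currentLatency);
    }

    public static void main(String[] args) {
        long[] samples = {20L, 549L, 550L, 2500L};
        for (long latency : samples) {
            System.out.println(latency + " ms -> " + durationFor(latency, false) + " ms");
        }
        System.out.println("isolation -> " + durationFor(20L, true) + " ms");
    }
}
```

Latencies of 20 and 549 ms map to 0, 550 ms maps to 30000, 2500 ms maps to 120000, and any call with isolation set maps to 600000 regardless of the measured latency.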

Now look at where updateFaultItem is called to see what is passed as currentLatency.

  // 1.
try {
    beginTimestampPrev = System.currentTimeMillis();
    sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout);
    endTimestamp = System.currentTimeMillis();
    this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);

  // 2.
} catch (xxException e) {
    endTimestamp = System.currentTimeMillis();
    this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);
}

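currentLatency is the elapsed time of the send, and the interval it falls into determines the penalty. Below 550 ms notAvailableDuration is 0, so the broker stays available; from 550 ms upward the unavailable period grows. When the send throws, isolation is true, so the broker only becomes available again after 600000 ms.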

pickOneAtLeast

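When a broker really is marked as unavailable for 600000 ms, part 1 of selectOneMessageQueue cannot return a queue on it, so execution falls through to part 2, which first calls pickOneAtLeast to get a broker.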

public String pickOneAtLeast() {
        final Enumeration<FaultItem> elements = this.faultItemTable.elements();
        List<FaultItem> tmpList = new LinkedList<FaultItem>();
        // Copy every element of faultItemTable into a list
        while (elements.hasMoreElements()) {
            final FaultItem faultItem = elements.nextElement();
            tmpList.add(faultItem);
        }

        if (!tmpList.isEmpty()) {
            // Shuffle first, then sort
            Collections.shuffle(tmpList);
            Collections.sort(tmpList);
        
            final int half = tmpList.size() / 2;
            if (half <= 0) {// only one element
                return tmpList.get(0).getName();
            } else {// take the counter modulo half
                final int i = this.whichItemWorst.getAndIncrement() % half;
                return tmpList.get(i).getName();
            }
        }
        return null;
    }
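
The effect of `% half` is that only the better half of the sorted list is ever returned, in rotation. The sketch below illustrates this under one loud assumption: the article does not show how FaultItem is ordered, so items here are sorted by latency only, smallest first. `PickOneAtLeastSketch` and `Item` are hypothetical names, and the shuffle step is left out so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PickOneAtLeastSketch {
    // Hypothetical stand-in for FaultItem: a broker name and its last latency.
    static final class Item {
        final String name;
        final long latency;

        Item(String name, long latency) {
            this.name = name;
            this.latency = latency;
        }
    }

    private final AtomicInteger whichItemWorst = new AtomicInteger(0);

    String pick(List<Item> items) {
        if (items.isEmpty()) {
            return null;
        }
        List<Item> sorted = new ArrayList<>(items);
        // Assumption: order by latency only, lowest (best) first.
        sorted.sort(Comparator.comparingLong((Item item) -> item.latency));
        int half = sorted.size() / 2;
        if (half <= 0) {
            // Only one element in the list.
            return sorted.get(0).name;
        }
        // Rotate over the better half of the sorted list.
        int pos = whichItemWorst.getAndIncrement() % half;
        return sorted.get(pos).name;
    }
}
```

With four brokers at latencies 500, 100, 3000 and 50 ms, successive calls return the 50 ms broker, then the 100 ms broker, then the 50 ms broker again; the two slowest brokers are never picked.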