聊聊apache gossip的FailureDetector

本文主要研究一下apache gossip的FailureDetectorjava

FailureDetector

incubator-retired-gossip/gossip-base/src/main/java/org/apache/gossip/accrual/FailureDetector.javagit

public class FailureDetector {

  public static final Logger LOGGER = Logger.getLogger(FailureDetector.class);
  private final DescriptiveStatistics descriptiveStatistics;
  private final long minimumSamples;
  private volatile long latestHeartbeatMs = -1;
  private final String distribution;

  public FailureDetector(long minimumSamples, int windowSize, String distribution) {
    descriptiveStatistics = new DescriptiveStatistics(windowSize);
    this.minimumSamples = minimumSamples;
    this.distribution = distribution;
  }

  /**
   * Updates the statistics based on the delta between the last
   * heartbeat and supplied time
   *
   * @param now the time of the heartbeat in milliseconds
   */
  public synchronized void recordHeartbeat(long now) {
    if (now <= latestHeartbeatMs) {
      return;
    }
    if (latestHeartbeatMs != -1) {
      descriptiveStatistics.addValue(now - latestHeartbeatMs);
    }
    latestHeartbeatMs = now;
  }

  public synchronized Double computePhiMeasure(long now) {
    if (latestHeartbeatMs == -1 || descriptiveStatistics.getN() < minimumSamples) {
      return null;
    }
    long delta = now - latestHeartbeatMs;
    try {
      double probability;
      if (distribution.equals("normal")) {
        double standardDeviation = descriptiveStatistics.getStandardDeviation();
        standardDeviation = standardDeviation < 0.1 ? 0.1 : standardDeviation;
        probability = new NormalDistributionImpl(descriptiveStatistics.getMean(), standardDeviation).cumulativeProbability(delta);
      } else {
        probability = new ExponentialDistributionImpl(descriptiveStatistics.getMean()).cumulativeProbability(delta);
      }
      final double eps = 1e-12;
      if (1 - probability < eps) {
        probability = 1.0;
      }
      return -1.0d * Math.log10(1.0d - probability);
    } catch (MathException | IllegalArgumentException e) {
      LOGGER.debug(e);
      return null;
    }
  }
}
  • FailureDetector的構造器接收三個參數,分別是minimumSamples, windowSize, distribution
  • 其中minimumSamples表示最少須要多少統計值的時候才真正計算phi值,windowSize表示統計窗口的大小,distribution表示使用哪一種分佈,normal表示NormalDistribution,其餘表示ExponentialDistribution
  • FailureDetector使用了apache commons math的DescriptiveStatistics來做爲Heartbeat Interval的時間窗口統計;使用了NormalDistribution、ExponentialDistribution來完成正態分佈、指數分佈的累積分佈probability,最後使用公式-1.0d * Math.log10(1.0d - probability)來計算phi值

小結

  • The Phi Accrual Failure Detector by Hayashibara et al論文提出了基於phi值的Accrual Failure Detector方法
  • 業界關於Failure Detector的實現大體有兩種,一種是以akka爲表明的按照論文基於NormalDistribution來計算;一種是以cassandra爲表明的基於ExponentialDistribution來計算
  • apache gossip的FailureDetector則集大成地同時支持了NormalDistribution及ExponentialDistribution兩種實現方式

doc

相關文章
相關標籤/搜索