A Look at Flink's Evictors

This article takes a look at Flink's Evictors.

Evictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/Evictor.java

@PublicEvolving
public interface Evictor<T, W extends Window> extends Serializable {

    void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

    void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

    interface EvictorContext {

        long getCurrentProcessingTime();

        MetricGroup getMetricGroup();

        long getCurrentWatermark();
    }
}
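  • Evictor takes two type parameters: the element type and the window type. It defines two methods, evictBefore (called before the windowing function) and evictAfter (called after the windowing function), and both receive an EvictorContext. EvictorContext defines getCurrentProcessingTime, getMetricGroup, and getCurrentWatermark.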
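To make the before/after ordering concrete, here is a minimal stand-alone sketch. It does not use Flink: SimpleEvictor, fire, dropNegativesThenClear, and sum are hypothetical names invented for illustration, the window and EvictorContext parameters are left out, and the way the buffer is handled after firing is a simplifying assumption rather than Flink's actual operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class EvictorOrderSketch {

    /** Hypothetical, simplified stand-in for Flink's Evictor: no window, no EvictorContext. */
    interface SimpleEvictor<T> {
        void evictBefore(Iterable<T> elements, int size);
        void evictAfter(Iterable<T> elements, int size);
    }

    /** Fires one window: evictBefore, then the window function, then evictAfter. */
    static <T, R> R fire(List<T> buffer, SimpleEvictor<T> evictor, Function<List<T>, R> windowFunction) {
        evictor.evictBefore(buffer, buffer.size());
        R result = windowFunction.apply(buffer);
        evictor.evictAfter(buffer, buffer.size());
        return result; // assumption: what is left in buffer is what the window keeps
    }

    /** Example evictor: drops negative values before the function, everything after it. */
    static SimpleEvictor<Integer> dropNegativesThenClear() {
        return new SimpleEvictor<Integer>() {
            @Override
            public void evictBefore(Iterable<Integer> elements, int size) {
                for (Iterator<Integer> it = elements.iterator(); it.hasNext();) {
                    if (it.next() < 0) {
                        it.remove();
                    }
                }
            }

            @Override
            public void evictAfter(Iterable<Integer> elements, int size) {
                for (Iterator<Integer> it = elements.iterator(); it.hasNext();) {
                    it.next();
                    it.remove();
                }
            }
        };
    }

    /** Stand-in for a windowing function: sums the elements it is given. */
    static Integer sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> buffer = new ArrayList<>(Arrays.asList(-1, 2, 3));
        Integer result = fire(buffer, dropNegativesThenClear(), EvictorOrderSketch::sum);
        System.out.println(result + " " + buffer); // prints "5 []"
    }
}
```

The windowing function only sees what evictBefore left behind (2 and 3), while evictAfter affects only what remains in the buffer afterwards.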

CountEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/CountEvictor.java

@PublicEvolving
public class CountEvictor<W extends Window> implements Evictor<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long maxCount;
    private final boolean doEvictAfter;

    private CountEvictor(long count, boolean doEvictAfter) {
        this.maxCount = count;
        this.doEvictAfter = doEvictAfter;
    }

    private CountEvictor(long count) {
        this.maxCount = count;
        this.doEvictAfter = false;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
        if (size <= maxCount) {
            return;
        } else {
            int evictedCount = 0;
            for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
                iterator.next();
                evictedCount++;
                if (evictedCount > size - maxCount) {
                    break;
                } else {
                    iterator.remove();
                }
            }
        }
    }

    public static <W extends Window> CountEvictor<W> of(long maxCount) {
        return new CountEvictor<>(maxCount);
    }

    public static <W extends Window> CountEvictor<W> of(long maxCount, boolean doEvictAfter) {
        return new CountEvictor<>(maxCount, doEvictAfter);
    }
}
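  • CountEvictor implements the Evictor interface with Object as the element type. It has two fields, doEvictAfter and maxCount. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. maxCount is the maximum number of elements to keep in the window: when size exceeds maxCount, the first size - maxCount elements in iteration order are removed.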
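The counting logic can be tried outside Flink. The following is a small sketch that mirrors the evict method on a plain java.util.List; the class name CountEvictionSketch is made up for illustration, and the loop is restructured slightly but removes the same elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CountEvictionSketch {

    /** Keeps at most maxCount elements by removing from the front of the iteration order. */
    static <T> void evict(List<T> elements, long maxCount) {
        int size = elements.size();
        if (size <= maxCount) {
            return; // nothing to evict
        }
        long toEvict = size - maxCount;
        long evicted = 0;
        for (Iterator<T> it = elements.iterator(); it.hasNext() && evicted < toEvict;) {
            it.next();
            it.remove();
            evicted++;
        }
    }

    public static void main(String[] args) {
        List<Integer> window = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        evict(window, 3);
        System.out.println(window); // prints "[3, 4, 5]"
    }
}
```

With five elements and maxCount = 3, the two elements at the front of the iteration order are dropped and the last three remain.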

DeltaEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/DeltaEvictor.java

@PublicEvolving
public class DeltaEvictor<T, W extends Window> implements Evictor<T, W> {
    private static final long serialVersionUID = 1L;

    DeltaFunction<T> deltaFunction;
    private double threshold;
    private final boolean doEvictAfter;

    private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction) {
        this.deltaFunction = deltaFunction;
        this.threshold = threshold;
        this.doEvictAfter = false;
    }

    private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
        this.deltaFunction = deltaFunction;
        this.threshold = threshold;
        this.doEvictAfter = doEvictAfter;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<T>> elements, int size, EvictorContext ctx) {
        TimestampedValue<T> lastElement = Iterables.getLast(elements);
        for (Iterator<TimestampedValue<T>> iterator = elements.iterator(); iterator.hasNext();){
            TimestampedValue<T> element = iterator.next();
            if (deltaFunction.getDelta(element.getValue(), lastElement.getValue()) >= this.threshold) {
                iterator.remove();
            }
        }
    }

    @Override
    public String toString() {
        return "DeltaEvictor(" +  deltaFunction + ", " + threshold + ")";
    }

    public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction) {
        return new DeltaEvictor<>(threshold, deltaFunction);
    }

    public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
        return new DeltaEvictor<>(threshold, deltaFunction, doEvictAfter);
    }
}
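  • DeltaEvictor implements the Evictor interface. It has three fields: doEvictAfter, threshold, and deltaFunction. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. threshold is the cutoff: deltaFunction.getDelta is computed between each element and lastElement, and every element whose delta is greater than or equal to threshold is removed.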
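As a rough illustration of the delta rule, here is a stand-alone sketch on a plain list. It does not use Flink: DeltaEvictionSketch is a made-up name, java.util.function.ToDoubleBiFunction stands in for Flink's DeltaFunction, and the early return on an empty list is my own addition (the original calls Iterables.getLast directly).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class DeltaEvictionSketch {

    /** Removes every element whose delta to the last element is >= threshold. */
    static <T> void evict(List<T> elements, double threshold, ToDoubleBiFunction<T, T> delta) {
        if (elements.isEmpty()) {
            return; // assumption: guard added for the sketch only
        }
        T last = elements.get(elements.size() - 1);
        for (Iterator<T> it = elements.iterator(); it.hasNext();) {
            T element = it.next();
            if (delta.applyAsDouble(element, last) >= threshold) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> readings = new ArrayList<>(Arrays.asList(1, 5, 9, 10));
        // Delta here is the absolute difference to the last reading (10).
        evict(readings, 3.0, (a, b) -> Math.abs(a - b));
        System.out.println(readings); // prints "[9, 10]"
    }
}
```

The deltas to the last element are 9, 5, 1, and 0, so with a threshold of 3 only the readings 9 and 10 survive.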

TimeEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/TimeEvictor.java

@PublicEvolving
public class TimeEvictor<W extends Window> implements Evictor<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long windowSize;
    private final boolean doEvictAfter;

    public TimeEvictor(long windowSize) {
        this.windowSize = windowSize;
        this.doEvictAfter = false;
    }

    public TimeEvictor(long windowSize, boolean doEvictAfter) {
        this.windowSize = windowSize;
        this.doEvictAfter = doEvictAfter;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
        if (!hasTimestamp(elements)) {
            return;
        }

        long currentTime = getMaxTimestamp(elements);
        long evictCutoff = currentTime - windowSize;

        for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext(); ) {
            TimestampedValue<Object> record = iterator.next();
            if (record.getTimestamp() <= evictCutoff) {
                iterator.remove();
            }
        }
    }

    private boolean hasTimestamp(Iterable<TimestampedValue<Object>> elements) {
        Iterator<TimestampedValue<Object>> it = elements.iterator();
        if (it.hasNext()) {
            return it.next().hasTimestamp();
        }
        return false;
    }

    private long getMaxTimestamp(Iterable<TimestampedValue<Object>> elements) {
        long currentTime = Long.MIN_VALUE;
        for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
            TimestampedValue<Object> record = iterator.next();
            currentTime = Math.max(currentTime, record.getTimestamp());
        }
        return currentTime;
    }

    @Override
    public String toString() {
        return "TimeEvictor(" + windowSize + ")";
    }

    @VisibleForTesting
    public long getWindowSize() {
        return windowSize;
    }

    public static <W extends Window> TimeEvictor<W> of(Time windowSize) {
        return new TimeEvictor<>(windowSize.toMilliseconds());
    }

    public static <W extends Window> TimeEvictor<W> of(Time windowSize, boolean doEvictAfter) {
        return new TimeEvictor<>(windowSize.toMilliseconds(), doEvictAfter);
    }
}
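  • TimeEvictor implements the Evictor interface with Object as the element type. It has two fields, doEvictAfter and windowSize. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. windowSize is the time span to keep, in milliseconds: evictCutoff is the maximum timestamp among the window's elements minus windowSize, and every element whose timestamp is less than or equal to evictCutoff is removed. If the first element carries no timestamp, nothing is evicted.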
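The cutoff arithmetic can be checked with a small stand-alone sketch. It works on a plain list of timestamps rather than on Flink's TimestampedValue, so the hasTimestamp check is omitted; TimeEvictionSketch is a made-up name for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TimeEvictionSketch {

    /** Removes every timestamp that is <= (max timestamp - windowSize). */
    static void evict(List<Long> timestamps, long windowSize) {
        if (timestamps.isEmpty()) {
            return;
        }
        long maxTimestamp = Long.MIN_VALUE;
        for (long ts : timestamps) {
            maxTimestamp = Math.max(maxTimestamp, ts);
        }
        long evictCutoff = maxTimestamp - windowSize;
        for (Iterator<Long> it = timestamps.iterator(); it.hasNext();) {
            if (it.next() <= evictCutoff) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        // Deliberately out of order: the cutoff depends on the max timestamp, not on position.
        List<Long> timestamps = new ArrayList<>(Arrays.asList(4000L, 10000L, 1000L, 9000L, 7000L));
        evict(timestamps, 3000L);
        System.out.println(timestamps); // prints "[10000, 9000]"
    }
}
```

Here the maximum timestamp is 10000 and windowSize is 3000, so evictCutoff is 7000. The element at exactly 7000 is evicted as well, because the comparison is less than or equal.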

Summary

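  • Evictor takes two type parameters: the element type and the window type. It defines two methods, evictBefore (called before the windowing function) and evictAfter (called after the windowing function), and both receive an EvictorContext. EvictorContext defines getCurrentProcessingTime, getMetricGroup, and getCurrentWatermark.
  • Evictor has several built-in implementations: CountEvictor, DeltaEvictor, and TimeEvictor. CountEvictor evicts by the number of elements in the window, TimeEvictor evicts by time span relative to the largest timestamp in the window, and DeltaEvictor evicts by comparing each element's delta to lastElement against the given threshold.
  • Specifying an evictor (evictBefore) prevents any pre-aggregation, because all window elements have to be passed to the evictor before the windowing function is applied. In addition, Flink does not guarantee the order of elements within a window, so when an evictor removes elements from the beginning or the end of the window, those are not necessarily the elements that arrived first or last.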
