Java8-11-Stream Collector Source Code Analysis and Custom Collectors

Date: 2019-11-10

Tags: java8, java, stream, collectors, source code analysis, custom collector


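In the previous article we took a systematic look at grouping and partitioning with Stream. In this article we turn to the collectors used by Stream.
So what is a collector? In earlier installments we learned that a Stream lets us apply operations such as mapping, filtering, grouping, and partitioning to the elements of a collection. For example, the following code uses the map operation to convert every element to upper case: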

List<String> list = Arrays.asList("hello", "world", "helloworld");
List<String> collect = list.stream().map(String::toUpperCase).collect(Collectors.toList());

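This program should now be easy to follow. So far, though, our articles have only covered the intermediate operations (map and friends) in detail, along with lambda expressions, method references, and the various functional interfaces. Now we focus on the collect method. collect takes a parameter of type Collector, and Collector is the collector abstraction introduced in Java 8.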

<R, A> R collect(Collector<? super T, A, R> collector);

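In other words, collect takes a collector, and the collector describes how the stream's elements are gathered into a result container. Most of the time we do not need to write collectors ourselves: the Collectors class provides factory methods for the common ones, such as toList(), toSet(), and toCollection(Supplier collectionFactory). Even so, a solid understanding of how collectors are implemented goes a long way toward writing correct programs.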
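As a quick illustration of the three factory methods just mentioned, here is a minimal sketch. The class name FactoryCollectorsDemo and its helper methods are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical demo class; only the Collectors factory methods come from the JDK.
class FactoryCollectorsDemo {

    static final List<String> WORDS = Arrays.asList("hello", "world", "hello");

    // toList(): gathers the elements into a List, keeping encounter order.
    static List<String> asList() {
        return WORDS.stream().collect(Collectors.toList());
    }

    // toSet(): gathers the elements into a Set, so duplicates collapse.
    static Set<String> asSet() {
        return WORDS.stream().collect(Collectors.toSet());
    }

    // toCollection(...): the caller chooses the concrete collection type.
    static TreeSet<String> asTreeSet() {
        return WORDS.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(asList());        // [hello, world, hello]
        System.out.println(asSet().size());  // 2
        System.out.println(asTreeSet());     // [hello, world]
    }
}
```

toCollection is the one to reach for when the concrete type matters, for example a sorted TreeSet as above.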

Here is the implementation of toList:

public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}

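The source of toList shows that the returned collector is an instance of CollectorImpl. CollectorImpl is an implementation of the Collector interface, defined inside the Collectors helper class, and it is used to create the common collector instances that Collectors hands out.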

/**
 * Simple implementation class for {@code Collector}.
 *
 * @param <T> the type of elements to be collected
 * @param <R> the type of the result
 */
static class CollectorImpl<T, A, R> implements Collector<T, A, R> {
    private final Supplier<A> supplier;
    private final BiConsumer<A, T> accumulator;
    private final BinaryOperator<A> combiner;
    private final Function<A, R> finisher;
    private final Set<Characteristics> characteristics;

    CollectorImpl(Supplier<A> supplier,
                  BiConsumer<A, T> accumulator,
                  BinaryOperator<A> combiner,
                  Function<A,R> finisher,
                  Set<Characteristics> characteristics) {
        this.supplier = supplier;
        this.accumulator = accumulator;
        this.combiner = combiner;
        this.finisher = finisher;
        this.characteristics = characteristics;
    }

    CollectorImpl(Supplier<A> supplier,
                  BiConsumer<A, T> accumulator,
                  BinaryOperator<A> combiner,
                  Set<Characteristics> characteristics) {
        this(supplier, accumulator, combiner, castingIdentity(), characteristics);
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        return accumulator;
    }

    @Override
    public Supplier<A> supplier() {
        return supplier;
    }

    @Override
    public BinaryOperator<A> combiner() {
        return combiner;
    }

    @Override
    public Function<A, R> finisher() {
        return finisher;
    }

    @Override
    public Set<Characteristics> characteristics() {
        return characteristics;
    }
}
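CollectorImpl is declared without the public modifier, so code outside its package cannot instantiate it. The public static factory Collector.of accepts the same kind of functions, so a collector equivalent to toList() can be sketched as follows. The class CollectorOfDemo and the method listCollector are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class showing Collector.of with the same three functions as toList().
class CollectorOfDemo {

    static <T> Collector<T, List<T>, List<T>> listCollector() {
        return Collector.of(
                ArrayList::new,                                        // supplier
                List::add,                                             // accumulator
                (left, right) -> { left.addAll(right); return left; }  // combiner
        );
    }

    static List<String> collectWords() {
        return Arrays.asList("hello", "world").stream().collect(listCollector());
    }

    public static void main(String[] args) {
        System.out.println(collectWords()); // [hello, world]
        // The overload of Collector.of without a finisher adds IDENTITY_FINISH itself.
        System.out.println(listCollector().characteristics()
                .contains(Collector.Characteristics.IDENTITY_FINISH)); // true
    }
}
```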

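The CollectorImpl constructor simply stores the functions passed to it, and those functions become the implementations of the Collector interface methods, as in toList above.
So to implement a custom collector we have to implement each method of the Collector interface ourselves. Let's go through the interface method by method.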

/*
 * @param <T> the type of input elements to the reduction operation
 * @param <A> the mutable accumulation type of the reduction operation (often
 *            hidden as an implementation detail)
 * @param <R> the result type of the reduction operation
 * @since 1.8
 */
public interface Collector<T, A, R> {

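Before looking at the methods, note the three type parameters of Collector:
T is the type of the elements being collected.
A is the type of the mutable intermediate result container.
R is the type of the final result.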
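To make the three type parameters concrete, here is a small sketch of a collector in which T, A, and R are all different types: it sums the lengths of strings, using a one-element int array as the mutable container. The names TypeParametersDemo, totalLength, and totalLengthOf are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical demo class: T = String, A = int[], R = Integer.
class TypeParametersDemo {

    static Collector<String, int[], Integer> totalLength() {
        return Collector.of(
                () -> new int[1],                       // A: a mutable one-element accumulator
                (acc, s) -> acc[0] += s.length(),       // fold one T (String) into A
                (a, b) -> { a[0] += b[0]; return a; },  // merge two A values
                acc -> acc[0]);                         // convert A (int[]) to R (Integer)
    }

    static int totalLengthOf(List<String> words) {
        return words.stream().collect(totalLength());
    }

    public static void main(String[] args) {
        System.out.println(totalLengthOf(Arrays.asList("hello", "world", "helloworld"))); // 20
    }
}
```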

These type parameters will come up again below. Now let's look at the methods of the Collector interface.

/**
     * A function that creates and returns a new mutable result container.
     *
     * @return a function which returns a new, mutable result container
     */
    Supplier<A> supplier();

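supplier() returns a function that creates and returns a new mutable result container. When a collector runs, the elements being collected (of type T) end up in a container created by this function.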

/**
     * A function that folds a value into a mutable result container.
     *
     * @return a function which folds a value into a mutable result container
     */
    BiConsumer<A, T> accumulator();

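accumulator() returns a function that folds elements (of type T) one at a time into a mutable result container (of type A). That container is the one created by the supplier() function above.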
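The interplay of supplier() and accumulator() can be tried out by calling the two functions by hand. The sketch below does that for the collector returned by Collectors.toList(); the helper fill and the class SupplierAccumulatorDemo are hypothetical names, and the generic method exists only to give the hidden container type A a name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: drives supplier() and accumulator() manually.
class SupplierAccumulatorDemo {

    static <T, A> A fill(Collector<T, A, ?> collector, List<T> items) {
        A container = collector.supplier().get();               // create a new, empty container
        BiConsumer<A, T> accumulator = collector.accumulator();
        for (T item : items) {
            accumulator.accept(container, item);                // fold each element into it
        }
        return container;
    }

    static Object demo() {
        Collector<String, ?, List<String>> toList = Collectors.toList();
        return fill(toList, Arrays.asList("hello", "world"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hello, world]
    }
}
```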

/**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     *
     * @return a function which combines two partial results into a combined
     * result
     */
    BinaryOperator<A> combiner();

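combiner() returns a function that takes two partial result containers (of type A) and merges them. It may fold one container into the other and return it, or it may merge both into a new container and return that.
A partial result here means a container created by supplier() together with everything accumulated into it. Why would there be two of them? This is where parallel streams come in. A sequential stream produces a single result container, so the combiner function is not needed.
A parallel stream, however, may produce several containers, and the combiner function merges them pairwise until a single container of type A remains. The finisher then turns that container into the final result of type R.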

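Strictly speaking, that description of parallel streams is imprecise. Whether a parallel stream produces several intermediate containers also depends on the CONCURRENT value in Characteristics. When we analyze how collectors are executed in a later article, we will use examples to show the difference.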
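The role of combiner() can be simulated without any threads: split the input in two, accumulate each half into its own container, and merge the two containers. This is only a hand-written sketch of the idea, not how the JDK schedules parallel work; collectInTwoHalves and CombinerDemo are invented names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: imitates a two-way split followed by a merge.
class CombinerDemo {

    static <T, A> A collectInTwoHalves(Collector<T, A, ?> collector, List<T> items) {
        int mid = items.size() / 2;
        A left = collector.supplier().get();     // container for the first half
        A right = collector.supplier().get();    // container for the second half
        for (T item : items.subList(0, mid)) {
            collector.accumulator().accept(left, item);
        }
        for (T item : items.subList(mid, items.size())) {
            collector.accumulator().accept(right, item);
        }
        return collector.combiner().apply(left, right);  // merge the two partial results
    }

    static Object demo() {
        Collector<String, ?, List<String>> toList = Collectors.toList();
        return collectInTwoHalves(toList, Arrays.asList("a", "b", "c", "d"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c, d]
    }
}
```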

/**
     * Perform the final transformation from the intermediate accumulation type
     * {@code A} to the final result type {@code R}.
     *
     * <p>If the characteristic {@code IDENTITY_TRANSFORM} is
     * set, this function may be presumed to be an identity transform with an
     * unchecked cast from {@code A} to {@code R}.
     *
     * @return a function which transforms the intermediate result to the final
     * result
     */
    Function<A, R> finisher();

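finisher() returns a function that performs the final transformation. If the accumulated result needs a type conversion or some other post-processing, this is where it happens. If the collector declares the Characteristics.IDENTITY_FINISH characteristic (the Javadoc quoted above calls it IDENTITY_TRANSFORM, but the enum constant is IDENTITY_FINISH), no transformation is needed, and the stream implementation may skip the finisher function entirely.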
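Whether the finisher function runs can be observed with a collector that counts the calls. The sketch below is an illustration only: CountingCollector, FinisherDemo, and finisherCalls are hypothetical names, and the counts in the comments reflect the behavior of the OpenJDK stream implementation, which the Collector contract permits but does not strictly require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical demo class: counts how often the finisher function is applied.
class FinisherDemo {

    static final class CountingCollector<T> implements Collector<T, List<T>, List<T>> {
        final AtomicInteger finisherCalls = new AtomicInteger();
        private final Set<Characteristics> characteristics;

        CountingCollector(Set<Characteristics> characteristics) {
            this.characteristics = characteristics;
        }

        @Override public Supplier<List<T>> supplier() { return ArrayList::new; }
        @Override public BiConsumer<List<T>, T> accumulator() { return List::add; }
        @Override public BinaryOperator<List<T>> combiner() {
            return (a, b) -> { a.addAll(b); return a; };
        }
        @Override public Function<List<T>, List<T>> finisher() {
            // Behaves like identity, but records that it was applied.
            return list -> { finisherCalls.incrementAndGet(); return list; };
        }
        @Override public Set<Characteristics> characteristics() { return characteristics; }
    }

    static int finisherCalls(Set<Collector.Characteristics> characteristics) {
        CountingCollector<String> collector = new CountingCollector<>(characteristics);
        Arrays.asList("a", "b").stream().collect(collector);
        return collector.finisherCalls.get();
    }

    public static void main(String[] args) {
        // With IDENTITY_FINISH the container is cast to the result type: 0 calls on OpenJDK.
        System.out.println(finisherCalls(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH)));
        // Without it the finisher is applied once to the final container: 1 call.
        System.out.println(finisherCalls(EnumSet.noneOf(Collector.Characteristics.class)));
    }
}
```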

/**
     * Returns a {@code Set} of {@code Collector.Characteristics} indicating
     * the characteristics of this Collector.  This set should be immutable.
     *
     * @return an immutable set of collector characteristics
     */
    Set<Characteristics> characteristics();

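Finally, characteristics(). We have mentioned collector characteristics several times already, and characteristics() is the method that returns them. We choose these values ourselves when we create a collector. Characteristics is an enum nested in the Collector interface with three constants: CONCURRENT, UNORDERED, and IDENTITY_FINISH.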

/**
     * Characteristics indicating properties of a {@code Collector}, which can
     * be used to optimize reduction implementations.
     */
    enum Characteristics {
        /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
         * unordered data source.
         */
        CONCURRENT,

        /**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,

        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH
    }

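If a collector declares CONCURRENT, it supports concurrent accumulation: multiple threads may call the accumulator function at the same time to add elements to the same intermediate result container.
Note that this is one shared container, not several. When the concurrent path is taken, even a parallel stream creates a single intermediate container, so that container must be safe for concurrent use. As the Javadoc above points out, a CONCURRENT collector that is not also UNORDERED should only be evaluated this way on an unordered source.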
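One way to see the single shared container is to count how many times the supplier function is invoked during a parallel collect. The following is a rough sketch under some assumptions: the names ConcurrentCharacteristicDemo and containersCreated are invented, the container is a thread-safe key set from ConcurrentHashMap.newKeySet(), and the counts describe what the OpenJDK implementation does rather than a guarantee of the API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical demo class: counts the containers created by one parallel collect.
class ConcurrentCharacteristicDemo {

    static int containersCreated(Collector.Characteristics... characteristics) {
        AtomicInteger created = new AtomicInteger();
        Collector<Integer, Set<Integer>, Set<Integer>> collector = Collector.of(
                () -> {
                    created.incrementAndGet();                      // one more container requested
                    Set<Integer> container = ConcurrentHashMap.newKeySet();
                    return container;                               // safe for concurrent add()
                },
                Set::add,
                (a, b) -> { a.addAll(b); return a; },
                characteristics);
        // The source is ordered, so CONCURRENT alone is not enough for the shared-container path.
        IntStream.range(0, 10_000).boxed().parallel().collect(collector);
        return created.get();
    }

    public static void main(String[] args) {
        // CONCURRENT and UNORDERED: a single shared container, so this prints 1.
        System.out.println(containersCreated(
                Collector.Characteristics.CONCURRENT, Collector.Characteristics.UNORDERED));
        // No CONCURRENT: one container per chunk of work, typically more than 1.
        System.out.println(containersCreated());
    }
}
```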

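UNORDERED is easier to grasp: the collector does not commit to preserving the encounter order of the input elements, which is typically the case when the result container has no intrinsic order, such as a Set.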

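IDENTITY_FINISH means that the intermediate container is already the final result we want, so the finisher can be skipped (the unchecked cast from A to R must be guaranteed to succeed).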
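The characteristics of the built-in collectors can be inspected directly by calling characteristics() on them. The sketch below looks at four of them; the class BuiltInCharacteristicsDemo and its methods are invented names, and the comments describe what OpenJDK reports.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: reads the characteristics of some built-in collectors.
class BuiltInCharacteristicsDemo {

    static Set<Collector.Characteristics> toListTraits() {
        return Collectors.toList().characteristics();
    }

    static Set<Collector.Characteristics> toSetTraits() {
        return Collectors.toSet().characteristics();
    }

    static Set<Collector.Characteristics> joiningTraits() {
        return Collectors.joining().characteristics();
    }

    static Set<Collector.Characteristics> groupingByConcurrentTraits() {
        return Collectors.<String, Integer>groupingByConcurrent(String::length).characteristics();
    }

    public static void main(String[] args) {
        System.out.println(toListTraits());               // contains IDENTITY_FINISH
        System.out.println(toSetTraits());                // contains UNORDERED and IDENTITY_FINISH
        System.out.println(joiningTraits());              // empty: its finisher builds the String
        System.out.println(groupingByConcurrentTraits()); // contains CONCURRENT and UNORDERED
    }
}
```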

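To summarize:
1. supplier() creates and returns a mutable result container.
2. accumulator() folds an element into a mutable result container, namely one returned by supplier().
3. combiner() merges two partial result containers (each created by supplier()), either by folding one into the other or by merging both into a new, empty container.
4. finisher() performs the final transformation from the intermediate result type to the final result type.
5. characteristics() returns the collector's set of characteristics; different characteristics lead to different execution strategies.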
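The five points above can be tied together in a simplified sequential version of collect. This is a sketch for understanding only, not the JDK implementation; the class SequentialCollectSketch and its collect helper are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class: a single-threaded imitation of Stream.collect.
class SequentialCollectSketch {

    @SuppressWarnings("unchecked")
    static <T, A, R> R collect(Iterable<T> items, Collector<? super T, A, R> collector) {
        A container = collector.supplier().get();               // 1. create the container
        for (T item : items) {
            collector.accumulator().accept(container, item);    // 2. fold each element in
        }
        // 3. combiner() is not needed here: one thread filled one container.
        if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) {
            return (R) container;                                // 5. the characteristic lets us skip step 4
        }
        return collector.finisher().apply(container);           // 4. final transformation
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world");
        System.out.println(collect(words, Collectors.toList()));      // [hello, world]
        System.out.println(collect(words, Collectors.joining("-")));  // hello-world
    }
}
```

Collectors.joining has no IDENTITY_FINISH characteristic, so the second call goes through the finisher, which turns the internal StringBuilder into a String.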

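Now that we know the methods of the Collector interface, let's implement a collector of our own for a simple requirement.
The requirement is to remove duplicate elements from a collection. It has little practical value; its main purpose is to show how a custom collector is written.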

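import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

public class MySetCollector<T> implements Collector<T,Set<T>,Set<T>>{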
    @Override
    public Supplier<Set<T>> supplier() {
        return HashSet<T>::new;
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        return Set<T>::add;
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        return (Set<T> s1, Set<T> s2) -> {
            s1.addAll(s2);
            return s1;
        };
    }

    @Override
    public Function<Set<T>, Set<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        // If IDENTITY_FINISH is removed from this set, the finisher function will be invoked.
        EnumSet<Characteristics> characteristicsEnumSet = EnumSet.of(Characteristics.UNORDERED,
                Characteristics.IDENTITY_FINISH);
        return Collections.unmodifiableSet(characteristicsEnumSet);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello","world","welcome","hello");
        Set<String> collect = list.stream().collect(new MySetCollector<String>());
        System.out.println(collect);
    }
}

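MySetCollector implements Collector and fixes its three type parameters: the element type is T, the intermediate container type is Set<T>, and because the intermediate container needs no conversion, the final result type is also Set<T>.
supplier() returns a HashSet as the intermediate container, and accumulator() calls Set's add method to put the elements into the set one by one. Both are written as method references.
combiner() merges intermediate containers pairwise, and finisher() simply returns Function.identity(), so the merged intermediate container is returned as the final result.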

/**
     * Returns a function that always returns its input argument.
     *
     * @param <T> the type of the input and output objects to the function
     * @return a function that always returns its input argument
     */
    static <T> Function<T, T> identity() {
        return t -> t;
    }

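characteristics() declares the collector's characteristics, UNORDERED and IDENTITY_FINISH: the collector does not preserve encounter order, and no final transformation is needed.
Running the program prints [world, hello, welcome]. (HashSet does not guarantee an iteration order, so the order may differ in other environments.)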

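In this article we analyzed the collector source code and implemented our own collector, MySetCollector, for a simple deduplication requirement. In the next article we will keep using this example to analyze how collectors are executed.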