Reduction and Grouping - Notes on "Java 8 in Action"

Date: 2019-12-05


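Distinguishing Collection, Collector, and collect

The classes and methods used in the code are outlined in red; they can be viewed in the Git repository.

Collectors as advanced reductions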

// Group the transactions by currency
Map<Currency, List<Transaction>> currencyListMap = getTransactions().stream()
    .collect(groupingBy(Transaction::getCurrency));
for (Map.Entry<Currency, List<Transaction>> entry : currencyListMap.entrySet()) {
    System.out.println(entry.getKey() + "\t" + entry.getValue().size());
}

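What the predefined collectors provide

Reducing and summarizing stream elements into a single value
Grouping elements
Partitioning elements: a special case of grouping that uses a predicate (a function returning a boolean) as the grouping function

Overview of the static factory methods of the Collectors class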

// import static java.util.stream.Collectors.*;
// import static java.util.Comparator.comparingInt;
// A stream can be consumed only once, so every example starts from a fresh getMenu().stream()

// Static factory methods of the Collectors class
List<Dish> dishes1 =
    getMenu().stream().collect(toList());
Set<Dish> dishes2 =
    getMenu().stream().collect(toSet());
Collection<Dish> dishes3 =
    getMenu().stream().collect(toCollection(ArrayList::new));
long howManyDishes =
    getMenu().stream().collect(counting());
int totalCalories =
    getMenu().stream().collect(summingInt(Dish::getCalories));
double avgCalories =
    getMenu().stream().collect(averagingInt(Dish::getCalories));
IntSummaryStatistics menuStatistics =
    getMenu().stream().collect(summarizingInt(Dish::getCalories));
String shortMenu =
    getMenu().stream().map(Dish::getName).collect(joining(", "));
Optional<Dish> fattest =
    getMenu().stream().collect(maxBy(comparingInt(Dish::getCalories)));
Optional<Dish> lightest =
    getMenu().stream().collect(minBy(comparingInt(Dish::getCalories)));
int totalCalories2 =
    getMenu().stream().collect(reducing(0, Dish::getCalories, Integer::sum));
int howManyDishes2 =
    getMenu().stream().collect(collectingAndThen(toList(), List::size));
Map<Dish.Type,List<Dish>> dishesByType =
    getMenu().stream().collect(groupingBy(Dish::getType));
Map<Boolean,List<Dish>> vegetarianDishes =
    getMenu().stream().collect(partitioningBy(Dish::isVegetarian));

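Reducing and summarizing

Summarizing is a special case of reduction.

Summarizing

How many dishes are on the menu

// How many dishes are on the menu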
long howManyDishes = getMenu().stream().collect(Collectors.counting());
System.out.println(howManyDishes); // 8
long howManyDishes2 = getMenu().stream().count();
System.out.println(howManyDishes2); // 8
System.out.println(getMenu().size()); // 8; for a plain collection, size() is simpler still

Maximum, minimum, and average

// The highest-calorie dish on the menu
Optional<Dish> mostCalaorieDish =
    getMenu().stream().collect(maxBy(comparingInt(Dish::getCalories)));
System.out.println(mostCalaorieDish.orElse(null)); // pork
// The lowest-calorie dish on the menu
Optional<Dish> leastCalaorieDish =
    getMenu().stream().collect(minBy(comparingInt(Dish::getCalories)));
System.out.println(leastCalaorieDish.orElse(null)); // season
// Total calories on the menu
int totalCalories =
    getMenu().stream().collect(summingInt(Dish::getCalories));
System.out.println(totalCalories); // 3850
// Average calories on the menu
OptionalDouble averageCalories =
    getMenu().stream().mapToDouble(Dish::getCalories).average();
System.out.println(averageCalories.orElse(0d)); // 481.25

One combined method: count, sum, min, average, and max

// All of the summaries above can be computed with a single collector
IntSummaryStatistics menuStatistics = getMenu().stream().collect(summarizingInt(Dish::getCalories));
System.out.println(menuStatistics);
// IntSummaryStatistics{count=8, sum=3850, min=120, average=481.250000, max=800}
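
The individual values can also be read from the statistics object through its getters rather than by printing it. The following is a minimal sketch, assuming a hypothetical list of calorie values in place of the menu from the notes; the class name and the data are illustrative only.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummaryStatsSketch {
    // Hypothetical calorie values, chosen only for illustration.
    static final List<Integer> CALORIES = Arrays.asList(800, 400, 530, 350, 120);

    static IntSummaryStatistics summarize(List<Integer> values) {
        // summarizingInt computes count, sum, min, max and average in one pass.
        return values.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(CALORIES);
        System.out.println(stats.getCount());   // 5
        System.out.println(stats.getSum());     // 2200
        System.out.println(stats.getMin());     // 120
        System.out.println(stats.getMax());     // 800
        System.out.println(stats.getAverage()); // 440.0
    }
}
```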

Joining strings with joining

// Join strings
String shortMenu = getMenu().stream()
    .map(Dish::getName) // required: joining() only accepts CharSequence elements, so map each Dish to its name
    .collect(joining());
System.out.println(shortMenu); 
// porkchickenfrench friesriceseasonpizzaprawnssalmon

// Comma-separated
String shortMenu2 = getMenu().stream()
    .map(Dish::getName)
    .collect(joining(", "));
System.out.println(shortMenu2); 
// pork, chicken, french fries, rice, season, pizza, prawns, salmon

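Generalized summarization: reduction

All of the collectors above are just special cases of a reduction process that can be defined with the reducing factory method. Collectors.reducing is the generalization of all of them.

// Collectors.reducing() generalizes the cases above
// Total calories on the menu
int totalCalories2 = getMenu().stream()
    .collect(reducing(0,                 // first argument: the initial value
                      Dish::getCalories, // second argument: the transformation function, extracting the value to operate on
                      (i, j) -> i + j)); // third argument: the accumulation function, here a sum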
System.out.println(totalCalories2); // 3850

// The highest-calorie dish on the menu
Optional<Dish> mostCaloriesDish = getMenu().stream()
    .collect(reducing((d1, d2) -> d1.getCalories() > d2.getCalories() ? d1 : d2));
System.out.println(mostCaloriesDish.orElse(null)); // pork

// collect versus reduce
int totalCalories3 = getMenu().stream()
    .map(Dish::getCalories)
    .reduce(Integer::sum)
    .get();
System.out.println(totalCalories3);
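
To make the "special case" claim concrete, the sketch below rewrites three predefined collectors (counting, summingInt, and maxBy) in terms of Collectors.reducing. It is only an illustration: the word list and the class name are assumptions, and strings stand in for the Dish objects used in the notes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ReducingSketch {
    // Hypothetical data standing in for the menu.
    static final List<String> WORDS = Arrays.asList("pork", "rice", "pizza", "salmon");

    // counting() expressed with reducing: map every element to 1L, then add.
    static long countWithReducing(List<String> items) {
        return items.stream().collect(Collectors.reducing(0L, e -> 1L, Long::sum));
    }

    // summingInt(String::length) expressed with reducing.
    static int totalLengthWithReducing(List<String> items) {
        return items.stream().collect(Collectors.reducing(0, String::length, Integer::sum));
    }

    // maxBy(comparingInt(String::length)) expressed with the one-argument reducing,
    // which has no initial value and therefore returns an Optional.
    static Optional<String> longestWithReducing(List<String> items) {
        return items.stream()
                .collect(Collectors.reducing((a, b) -> a.length() >= b.length() ? a : b));
    }

    public static void main(String[] args) {
        System.out.println(countWithReducing(WORDS));          // 4
        System.out.println(totalLengthWithReducing(WORDS));    // 19
        System.out.println(longestWithReducing(WORDS).get());  // salmon
    }
}
```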

Grouping and partitioning

Grouping dishes by type

// Group by type
Map<Dish.Type, List<Dish>> typeMap = getMenu().stream()
    .collect(groupingBy(Dish::getType));
System.out.println(typeMap);
// {OTHER=[rice, season, pizza], FISH=[prawns, salmon], MEAT=[pork, chicken, french fries]}

// Group by caloric level
Map<CaloricLevel, List<Dish>> dishesByCaloricLevel = getMenu().stream()
    .collect(groupingBy(Dish::getCaloricLevel));
System.out.println(dishesByCaloricLevel);
// {DIET=[french fries, season, prawns], FAT=[pork], NORMAL=[chicken, rice, pizza, salmon]}

Multilevel grouping

Group by type first, then by caloric level

// Group by type first, then by caloric level
Map<Dish.Type, Map<CaloricLevel, List<Dish>>> dishesByTypeCaloriclevel = 
    getMenu().stream()
    	.collect(groupingBy(Dish::getType, groupingBy(Dish::getCaloricLevel)));
System.out.println(dishesByTypeCaloriclevel);
// {OTHER={DIET=[season], NORMAL=[rice, pizza]},
// FISH={DIET=[prawns], NORMAL=[salmon]},
// MEAT={DIET=[french fries], FAT=[pork], NORMAL=[chicken]}}

Collecting data in subgroups

// How many dishes there are of each type
Map<Dish.Type, Long> typesCount = getMenu().stream()
    .collect(groupingBy(Dish::getType, counting()));
System.out.println(typesCount);
// {OTHER=3, FISH=2, MEAT=3}
// Note: groupingBy(f) is equivalent to groupingBy(f, toList())
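
The equivalence in the note above can be checked directly. This is a small sketch under assumed data: dish names are grouped by their first letter (a hypothetical classifier, not one used in the notes), once with the one-argument groupingBy and once with an explicit toList() downstream collector, and a third method swaps the downstream collector for counting().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Hypothetical names standing in for the menu.
    static final List<String> NAMES =
            Arrays.asList("pork", "pizza", "prawns", "rice", "salmon", "season fruit");

    // One-argument form: the downstream collector defaults to toList().
    static Map<Character, List<String>> byInitial(List<String> names) {
        return names.stream().collect(Collectors.groupingBy(s -> s.charAt(0)));
    }

    // Two-argument form with the downstream collector spelled out.
    static Map<Character, List<String>> byInitialExplicit(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(s -> s.charAt(0), Collectors.toList()));
    }

    // Any other collector can take the place of toList(), for example counting().
    static Map<Character, Long> countByInitial(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(s -> s.charAt(0), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(byInitial(NAMES).equals(byInitialExplicit(NAMES))); // true
        System.out.println(countByInitial(NAMES).get('p'));                    // 3
    }
}
```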

Adapting the collector result to a different type

// The highest-calorie dish of each type
Map<Dish.Type, Optional<Dish>> mostCaloricByType = getMenu().stream()
    .collect(groupingBy(Dish::getType, maxBy(comparingInt(Dish::getCalories))));
System.out.println(mostCaloricByType);
// {OTHER=Optional[pizza], FISH=Optional[salmon], MEAT=Optional[pork]}

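// Adapting the collector result to a different type
// The highest-calorie dish of each type
Map<Dish.Type, Dish> mostCaloricByType2 = getMenu().stream()
    .collect(groupingBy(Dish::getType, // classification function
                        collectingAndThen( // this is itself a collector
                            maxBy(comparingInt(Dish::getCalories)), // the collector being wrapped
                            Optional::get))); // transformation function; safe because groupingBy never creates an empty group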

Other examples of collectors used together with groupingBy

// Other examples of collectors used together with groupingBy
// Total calories of each type
Map<Dish.Type, Integer> totalCaloriesByType = getMenu().stream()
    .collect(groupingBy(Dish::getType,
                        summingInt(Dish::getCalories)));
System.out.println(totalCaloriesByType);

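// Which caloric levels occur within each type
// Using toSet()
Map<Dish.Type, Set<CaloricLevel>> caloricLevelsByType = getMenu().stream()
    .collect(groupingBy(Dish::getType,
                        // mapping applies a function to each input element before accumulation,
                        // so a collector that accepts one element type can be used with objects of another type
                        mapping(
                            Dish::getCaloricLevel, // transforms each element of the stream
                            toSet()))); // collects the transformed results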
System.out.println(caloricLevelsByType);
// {FISH=[NORMAL, DIET], MEAT=[FAT, NORMAL, DIET], OTHER=[NORMAL, DIET]}

// Using toCollection(HashSet::new)
Map<Dish.Type, Set<CaloricLevel>> caloricLevelsByType2 = getMenu().stream()
    .collect(groupingBy(Dish::getType,
                        mapping(Dish::getCaloricLevel, 
                                toCollection(HashSet::new))));
System.out.println(caloricLevelsByType2);
// {FISH=[NORMAL, DIET], MEAT=[FAT, NORMAL, DIET], OTHER=[NORMAL, DIET]}

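A special case: partitioning

Partitioning is a special case of grouping: a predicate (a function returning a boolean) serves as the classification function, and it is called the partitioning function.

// Separate vegetarian from non-vegetarian dishes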
Map<Boolean, List<Dish>> partitionedMenu = getMenu().stream()
    .collect(partitioningBy(Dish::isVegetarian));
System.out.println(partitionedMenu);
// {false=[pork, chicken, french fries, prawns, salmon], true=[rice, season, pizza]}

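// Separate vegetarian from non-vegetarian dishes, then group by type
Map<Boolean, Map<Dish.Type, List<Dish>>> vegetarianDishesByType = getMenu().stream()
    .collect(partitioningBy(Dish::isVegetarian, // partitioning function
                            groupingBy(Dish::getType))); // downstream collector

// The highest-calorie dish among vegetarian and among non-vegetarian dishes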
Map<Boolean, Dish> mostCaloricPartitionedByVegetarian = getMenu().stream()
    .collect(partitioningBy(Dish::isVegetarian,
                            collectingAndThen(
                                maxBy(comparing(Dish::getCalories)),
                                Optional::get)));
System.out.println(mostCaloricPartitionedByVegetarian);
// {false=pork, true=pizza}
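
One practical difference between the two approaches is that partitioningBy always yields entries for both false and true, even when one side is empty, whereas groupingBy with the same predicate only creates keys that actually occur. The sketch below illustrates this with a hypothetical list of odd numbers; the class name and the data are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionSketch {
    // Hypothetical input containing no even number.
    static final List<Integer> ODDS = Arrays.asList(1, 3, 5);

    static Map<Boolean, List<Integer>> partitionEven(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    static Map<Boolean, List<Integer>> groupEven(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.groupingBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(partitionEven(ODDS)); // {false=[1, 3, 5], true=[]}
        System.out.println(groupEven(ODDS));     // {false=[1, 3, 5]}
    }
}
```

With partitioningBy, get(true) can therefore be called without a null check.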

Partitioning numbers into primes and non-primes

Testing for primality

// Primality test
public boolean isPrime(int candidate) {
    return IntStream.range(2, candidate)
        .noneMatch(i -> candidate % i == 0);
}

// Optimization: only test divisors less than or equal to the square root of the candidate
public boolean isPrime2(int candidate) {
    int candidateRoot = (int) Math.sqrt((double) candidate);
    return IntStream.rangeClosed(2, candidateRoot)
        .noneMatch(i -> candidate % i == 0);
}
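
The square-root bound works because any composite number has a divisor no larger than its square root. The sketch below checks that the two variants agree; note that it adds a `candidate > 1` guard that the methods above do not have (without it, 0 and 1 would be reported as prime), and that the class and method names are hypothetical.

```java
import java.util.stream.IntStream;

class PrimeCheckSketch {
    // Naive test: try every divisor from 2 up to candidate - 1.
    static boolean isPrimeNaive(int candidate) {
        return candidate > 1
                && IntStream.range(2, candidate).noneMatch(i -> candidate % i == 0);
    }

    // Optimized test: try divisors only up to the integer square root.
    static boolean isPrimeSqrt(int candidate) {
        int root = (int) Math.sqrt(candidate);
        return candidate > 1
                && IntStream.rangeClosed(2, root).noneMatch(i -> candidate % i == 0);
    }

    // True when both tests give the same answer for every number in [0, n].
    static boolean agreeUpTo(int n) {
        return IntStream.rangeClosed(0, n).allMatch(i -> isPrimeNaive(i) == isPrimeSqrt(i));
    }

    public static void main(String[] args) {
        System.out.println(isPrimeSqrt(97));  // true
        System.out.println(isPrimeSqrt(91));  // false (7 * 13)
        System.out.println(agreeUpTo(1_000)); // true
    }
}
```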

Partitioning numbers into primes and non-primes

// Partition the numbers into primes and non-primes
public Map<Boolean, List<Integer>> partitionPrimes(int n) {
    return IntStream.rangeClosed(2, n).boxed()
        .collect(partitioningBy(candidate -> isPrime2(candidate)));
}

Custom collectors

Collecting the elements of a Stream into a List

/**
 * Collects all the elements of a Stream<T> into a List<T>
 * Author:   admin
 * Date:     2018/8/15 15:03
 */
public class ToListCollector<T> implements Collector<T, List<T>, List<T>> {
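    // T is the generic type of the items in the stream to be collected
    // A is the type of the accumulator, the object on which partial results are accumulated during collection
    // R is the type of the object resulting from the collect operation (usually, but not necessarily, a collection)

    // Creates a new result container
    @Override
    public Supplier<List<T>> supplier() {
        // Must return a Supplier of an empty result, that is, a no-argument function
        // that creates an empty accumulator instance for the collection process when invoked
        // return () -> new ArrayList<T>();
        return ArrayList::new; // the starting point of the collection operation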
    }

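    // Adds an element to the result container
    @Override
    public BiConsumer<List<T>, T> accumulator() {
        // Returns the function that performs the reduction step
        // return (list, item) -> list.add(item);
        return List::add; // adds the traversed item, mutating the accumulator in place
    }

    // Applies the final transformation to the result container
    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity(); // identity function
    }

    // Merges two result containers
    @Override
    public BinaryOperator<List<T>> combiner() {
        return (list1, list2) -> { // merge the second accumulator into the first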
            list1.addAll(list2);
            return list1;
        };
    }

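    // Returns an immutable set of Characteristics
    @Override
    public Set<Characteristics> characteristics() {
        // IDENTITY_FINISH: the accumulator A can safely be cast to the result R without any check
        // CONCURRENT (the accumulator may be called from several threads on the same container)
        // is not declared here, because ArrayList is not thread-safe
        return Collections.unmodifiableSet( // flags describing the collector
                EnumSet.of(Characteristics.IDENTITY_FINISH));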
    }
}
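
When a dedicated class feels heavy, the same four functions can be passed to the Collector.of factory method instead. The following is a minimal sketch of that alternative, not the approach taken in the notes; the class name and the helper collectRange are hypothetical. Running it on a parallel stream exercises the combiner, and because the stream is ordered the result matches the sequential one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class CollectorOfSketch {
    // Same supplier, accumulator and combiner as ToListCollector, built with Collector.of.
    static <T> Collector<T, List<T>, List<T>> toListCollector() {
        return Collector.of(
                ArrayList::new,                                          // supplier
                List::add,                                               // accumulator
                (left, right) -> { left.addAll(right); return left; },   // combiner
                Collector.Characteristics.IDENTITY_FINISH);              // no finisher needed
    }

    // Collects the numbers 0..n-1, optionally on a parallel stream.
    static List<Integer> collectRange(int n, boolean parallel) {
        Stream<Integer> numbers = IntStream.range(0, n).boxed();
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.collect(toListCollector());
    }

    public static void main(String[] args) {
        System.out.println(collectRange(5, false)); // [0, 1, 2, 3, 4]
        System.out.println(collectRange(1_000, true).equals(collectRange(1_000, false))); // true
    }
}
```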

Usage

// A stream can be consumed only once, so each example below opens a fresh stream

// Using the built-in collector
List<Dish> dishes2 = FakeDb.getMenu().stream().collect(Collectors.toList());

// Using the custom collector
List<Dish> dishes = FakeDb.getMenu().stream().collect(new ToListCollector<Dish>());

// Custom collection without implementing Collector
List<Dish> dishes3 = FakeDb.getMenu().stream().collect(
    ArrayList::new, // supplier
    List::add, // accumulator
    List::addAll // combiner
);
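
A Stream can be traversed only once, so every collect call needs its own stream. The sketch below shows what happens when a second terminal operation is attempted on the same Stream instance; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseSketch {
    // Returns true when the second terminal operation on the same stream is rejected.
    static boolean secondCollectFails(List<String> items) {
        Stream<String> stream = items.stream();
        stream.collect(Collectors.toList()); // first terminal operation consumes the stream
        try {
            stream.collect(Collectors.toList()); // second use of the same stream
            return false;
        } catch (IllegalStateException e) {
            // The stream has already been operated upon or closed.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCollectFails(Arrays.asList("pork", "rice"))); // true
    }
}
```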

Partitioning numbers into primes and non-primes

/**
 * Partitions the first n natural numbers into primes and non-primes
 * Author:   admin
 * Date:     2018/8/15 15:28
 */
public class PrimeNumbersCollector implements Collector<Integer,
        Map<Boolean, List<Integer>>,
        Map<Boolean, List<Integer>>> {

    @Override
    public Supplier<Map<Boolean, List<Integer>>> supplier() {
        // Start the collection process with a Map containing two empty Lists
        return () -> new HashMap<Boolean, List<Integer>>() {{
           put(true, new ArrayList<Integer>());
           put(false, new ArrayList<Integer>());
        }};
    }

    @Override
    public BiConsumer<Map<Boolean, List<Integer>>, Integer> accumulator() {
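        // Pass the list of primes found so far to the isPrime method
        return (Map<Boolean, List<Integer>> acc, Integer candidate) -> {
            // Depending on the result of isPrime, get the prime or non-prime list from the Map and add the candidate to it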
            acc.get(isPrime(acc.get(true), candidate)).add(candidate);
        };
    }

    @Override
    public BinaryOperator<Map<Boolean, List<Integer>>> combiner() {
        // Merge the second Map into the first
        return (Map<Boolean, List<Integer>> map1, Map<Boolean, List<Integer>> map2) -> {
            map1.get(true).addAll(map2.get(true));
            map1.get(false).addAll(map2.get(false));
            return map1;
        };
    }

    @Override
    public Function<Map<Boolean, List<Integer>>, Map<Boolean, List<Integer>>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        // Primes are discovered in order, so the collector is neither UNORDERED nor CONCURRENT
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    // Further optimization: test the candidate only against the primes found before it
    public static boolean isPrime(List<Integer> primes, int candidate) {
        // return primes.stream().noneMatch(i -> candidate % i == 0);
        int candidateRoot = (int) Math.sqrt((double) candidate);
        return takeWhile(primes, i -> i <= candidateRoot)
                .stream()
                .noneMatch(p -> candidate % p == 0);
    }

    public static <A> List<A> takeWhile(List<A> list, Predicate<A> p) {
        int i = 0;
        for (A item : list) {
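            if (!p.test(item)) { // check whether the current item satisfies the predicate
                return list.subList(0, i); // if not, return the prefix of the list before this item
            }
            i++;
        }
        return list; // every item satisfies the predicate, so return the whole list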
    }
}

Usage

// Partition the numbers into primes and non-primes with the custom prime collector
public Map<Boolean, List<Integer>> partitionPrimesWithCustomCollector(int n) {
    return IntStream.rangeClosed(2, n).boxed()
        .collect(new PrimeNumbersCollector());
}

Comparing the performance of the collectors

@Test
public void test08() {
    long fastest = Long.MAX_VALUE;
    for (int i=0; i<10; i++) {
        long start = System.nanoTime();
        partitionPrimes(1_000_000);
        long duration = (System.nanoTime() - start) / 1_000_000;
        if (duration < fastest) fastest = duration;
    }
    System.out.println("Fastest execution done in " + fastest + " msecs");
    // Fastest execution done in 371 msecs
}

@Test
public void test09() {
    long fastest = Long.MAX_VALUE;
    for (int i=0; i<10; i++) {
        long start = System.nanoTime();
        partitionPrimesWithCustomCollector(1_000_000);
        long duration = (System.nanoTime() - start) / 1_000_000;
        if (duration < fastest) fastest = duration;
    }
    System.out.println("Fastest execution done in " + fastest + " msecs");
    // Fastest execution done in 294 msecs
}
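
The two tests above repeat the same timing loop. One way to factor it out is a small helper that takes the workload as a Runnable; this is a sketch with hypothetical names, and the workload in main is a placeholder rather than the prime partitioning itself. Timings obtained this way are rough indications only, since JIT warm-up and machine load affect them.

```java
import java.util.stream.LongStream;

class FastestRunSketch {
    // Runs the task `runs` times and returns the shortest duration in milliseconds.
    // With runs <= 0 nothing is executed and Long.MAX_VALUE is returned.
    static long fastestMillis(Runnable task, int runs) {
        long fastest = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long duration = (System.nanoTime() - start) / 1_000_000;
            if (duration < fastest) {
                fastest = duration;
            }
        }
        return fastest;
    }

    public static void main(String[] args) {
        // Placeholder workload: sum the first million integers.
        long fastest = fastestMillis(() -> LongStream.rangeClosed(1, 1_000_000).sum(), 10);
        System.out.println("Fastest execution done in " + fastest + " msecs");
    }
}
```

In the tests above, the lambda would be `() -> partitionPrimes(1_000_000)` or `() -> partitionPrimesWithCustomCollector(1_000_000)`.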

Environment: