Hadoop: the MapReduce Combiner

Date: 2019-11-11

Tags: hadoop, mapreduce, combiner. Category: Hadoop


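Many MapReduce jobs are limited by the bandwidth available on the cluster, so it pays to minimize the data transferred between map and reduce tasks. Hadoop allows the user to specify a combiner function to be run on the map output; the combiner function's output then forms the input to the reduce function. Because the combiner function is an optimization, Hadoop does not guarantee how many times it will call it for a particular map output record, if at all. In other words, calling the combiner function zero, one, or many times should produce the same output from the reducer.
The contract for the combiner function constrains the type of function that may be used. This is best illustrated with an example.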

Suppose the first map produced the following output:

(1950, 0)
(1950, 20)
(1950, 10)

The second map produced:

(1950, 25)
(1950, 15)

When the reduce function is called, its input is:

(1950, [0, 20, 10, 25, 15])

The result is:

(1950, 25)

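If a combiner function is used that, like the reduce function, finds the maximum temperature in each map's output, then the input to the reduce function becomes: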

(1950, [20, 25])

The reduce output is the same as before. The function calls on the temperature values can be expressed as follows:

max(0, 20, 10, 25, 15) = max(max(0, 20, 10), max(25, 15)) = max(20, 25) = 25
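
The identity above, together with the "zero, one, or many times" contract, can be checked with a small plain-Java sketch. It does not use the Hadoop API: the class `CombinerContractDemo` and its methods are hypothetical names introduced only for illustration, and the per-map lists are the temperature values from the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; simulates the combiner contract without Hadoop.
class CombinerContractDemo {
    // Stand-in for both the combiner and the reducer: keep the maximum value.
    static int max(List<Integer> values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    // Apply the combiner `times` times to each map's output, then reduce.
    static int reduceAfterCombining(List<List<Integer>> mapOutputs, int times) {
        List<Integer> reducerInput = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            List<Integer> current = output;
            for (int i = 0; i < times; i++) {
                current = List.of(max(current)); // one combiner pass
            }
            reducerInput.addAll(current);
        }
        return max(reducerInput);
    }

    public static void main(String[] args) {
        // Temperature values for 1950 from the two map outputs in the example.
        List<List<Integer>> mapOutputs = List.of(List.of(0, 20, 10), List.of(25, 15));
        for (int times = 0; times <= 3; times++) {
            System.out.println(times + " combiner pass(es): "
                    + reduceAfterCombining(mapOutputs, times));
        }
    }
}
```

Every line printed reports 25, whether the combiner runs zero, one, or several times per map output.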

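Not all functions have this property. For example, if we were calculating mean temperatures, we could not use the mean as our combiner function, since: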

mean(0, 20, 10, 25, 15) = 14

but:

mean(mean(0, 20, 10), mean(25, 15)) = mean(10, 20) = 15
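
The mismatch can be reproduced in plain Java. The sketch below also shows one common workaround that the original text does not cover and is included here only as an assumption: have the combiner emit partial (sum, count) pairs and divide only once at the end. All names (`MeanCombinerDemo`, `partial`, `merge`, `finish`) are hypothetical.

```java
// Hypothetical demo class; plain Java, no Hadoop dependency.
class MeanCombinerDemo {
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Partial aggregate a combiner could emit instead of a mean: {sum, count}.
    static long[] partial(int... values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new long[] {sum, values.length};
    }

    // Merging partial aggregates is associative and commutative, unlike mean.
    static long[] merge(long[] a, long[] b) {
        return new long[] {a[0] + b[0], a[1] + b[1]};
    }

    // Final step, done once by the reducer: divide the total sum by the total count.
    static double finish(long[] p) {
        return (double) p[0] / p[1];
    }

    public static void main(String[] args) {
        System.out.println(mean(0, 20, 10, 25, 15));                   // 14.0
        System.out.println(mean(mean(0, 20, 10), mean(25, 15)));       // 15.0
        System.out.println(finish(merge(partial(0, 20, 10), partial(25, 15)))); // 14.0
    }
}
```

The mean of means weights each map's output equally regardless of how many records it holds, which is why it drifts from the true mean; carrying the count alongside the sum preserves the weighting.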

The combiner function does not replace the reducer. It can, however, significantly cut down the amount of data transferred between mappers and reducers.
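
To get a feel for the saving, the following plain-Java sketch counts how many records would leave the map side with and without a max-temperature combiner, using the two map outputs from the example. It only simulates the idea and is not how Hadoop measures shuffle volume; `ShuffleVolumeDemo` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; counts records only, no real shuffle involved.
class ShuffleVolumeDemo {
    // Per-map-task combiner: keep a single (year, max temperature) entry per year.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> maxByYear = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : mapOutput) {
            maxByYear.merge(entry.getKey(), entry.getValue(), Integer::max);
        }
        return maxByYear;
    }

    // Number of records that would be sent from all map tasks to the reducers.
    static int recordsShuffled(List<List<Map.Entry<String, Integer>>> mapOutputs,
                               boolean useCombiner) {
        int total = 0;
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            total += useCombiner ? combine(output).size() : output.size();
        }
        return total;
    }

    // The two map outputs from the example in the text.
    static List<List<Map.Entry<String, Integer>>> exampleMapOutputs() {
        return List.of(
                List.of(Map.entry("1950", 0), Map.entry("1950", 20), Map.entry("1950", 10)),
                List.of(Map.entry("1950", 25), Map.entry("1950", 15)));
    }

    public static void main(String[] args) {
        System.out.println("without combiner: " + recordsShuffled(exampleMapOutputs(), false));
        System.out.println("with combiner:    " + recordsShuffled(exampleMapOutputs(), true));
    }
}
```

For the example data this drops from 5 records to 2; on real inputs the saving depends on how many values each map task sees per key.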

Specifying a combiner

Job job = Job.getInstance();
job.setJarByClass(MaxTemperatureJob.class);
job.setJobName("max temperature");
// Why are these two method names inconsistent (add... vs. set...)? Were they written by different people?
FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));

job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);
// Set the combiner; the reducer class is reused because max can be applied to partial results
job.setCombinerClass(MaxTemperatureReducer.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// job.setInputFormatClass();

// Run the job and print 0 on success, 1 on failure
System.out.println(job.waitForCompletion(true) ? 0 : 1);
