Mapreduce實例--求平均值

求平均數是MapReduce比較常見的算法，求平均數的算法也比較簡單，一種思路是Map端讀取數據，在數據輸入到Reduce以前先通過shuffle，將map函數輸出的key值相同的全部的value值造成一個集合value-list，而後將輸入到Reduce端，Reduce端彙總而且統計記錄數，而後做商便可。具體原理以下圖所示：java

操做環境：算法

Centos 7apache

jdk 1.8app

hadoop-3.2.0maven

IDEA2019函數

實現內容：oop

將自定義的電商關於商品點擊狀況的數據文件，包含兩個字段(商品分類，商品點擊次數)，用"\t"分割，相似以下：spa

商品分類 商品點擊次數 52127    5
52120    93
52092    93
52132    38
52006    462
52109    28
52109    43
52132    0
52132    34
52132    9
52132    30
52132    45
52132    24
52009    2615
52132    25
52090    13
52132    6
52136    0
52090    10
52024    347

使用mapreduce統計出每類商品的平均點擊次數3d

商品分類 商品平均點擊次數 52006    462
52009    2615
52024    347
52090    11
52092    93
52109    35
52120    93
52127    5
52132    23
52136    0

1、啓動Hadoop集羣，上傳數據集文件到hdfscode

hadoop fs -mkdir -p /mymapreduce4/in hadoop fs -put /data/mapreduce4/goods_click /mymapreduce4/in

2、在IDEA中建立java項目，導入Jar包，若是清楚本身使用的集羣的jar包，可以使用maven導入指定的jar包。

　　爲了不Jar包衝突，使用hadoop/share目錄下的全部jar包。

3、編寫java代碼程序

package mapreduce; import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.NullWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapreduce.Job; import org.apache.hadoop.mapreduce.Mapper; import org.apache.hadoop.mapreduce.Reducer; import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat; public class MyAverage{ public static class Map extends Mapper<Object , Text , Text , IntWritable>{ private static Text newKey=new Text(); public void map(Object key,Text value,Context context) throws IOException, InterruptedException{ //將輸入的純文本文件數據轉化成string
            String line=value.toString(); System.out.println(line); //將值經過split()方法截取出來
            String arr[]=line.split("\t"); newKey.set(arr[0]); int click=Integer.parseInt(arr[1]); //將數據和值輸入到reduce處理
            context.write(newKey, new IntWritable(click)); } } public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{ public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{ int num=0; int count=0; for(IntWritable val:values){ //每一個元素求和num
                num+=val.get(); //統計元素的次數count
                count++; } //統計次數
            int avg=num/count; context.write(key,new IntWritable(avg)); } } public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException{ Configuration conf=new Configuration(); System.out.println("start"); Job job =new Job(conf,"MyAverage"); job.setJarByClass(MyAverage.class); job.setMapperClass(Map.class); job.setReducerClass(Reduce.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); Path in=new Path("hdfs://172.18.74.137:9000/mymapreduce4/in/goods_click"); Path out=new Path("hdfs://172.18.74.137:9000/mymapreduce4/out"); FileInputFormat.addInputPath(job,in); FileOutputFormat.setOutputPath(job,out); System.exit(job.waitForCompletion(true) ? 0 : 1); } }

4、運行查看結果

hadoop fs -ls /mymapreduce4/out hadoop fs -cat /mymapreduce4/out/part-r-00000