[Repost] MapReduce Programming: Configuring a MapReduce Development Environment in IntelliJ IDEA

 

This post describes how to set up a MapReduce development environment in IntelliJ IDEA by creating a Maven project.

1. Software Environment

The software versions I used are as follows:

  1. IntelliJ IDEA 2017.1
  2. Maven 3.3.9
  3. Hadoop in pseudo-distributed mode (an installation tutorial is available here)

2. Creating the Maven Project

Open IDEA and go to File -> New -> Project, then select Maven in the left panel. (If you only want to run MapReduce, a plain Java project is also enough; there is no need to check Create from archetype. Check it if you want a web project or want to build from an archetype.)
Set the GroupId and ArtifactId, then click Next.
Set the project location, then click Next.
After clicking Finish, you have an empty Maven project with the standard src/main/java and src/main/resources layout.

3. Adding Maven Dependencies

Add the dependencies to pom.xml. For Hadoop 2.7.3, the required jars are the following:

  • hadoop-common
  • hadoop-hdfs
  • hadoop-mapreduce-client-core
  • hadoop-mapreduce-client-jobclient
  • log4j (for logging)

The dependencies section of pom.xml looks like this:

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>

4. Configuring log4j

Create a log4j configuration file, log4j.properties, under src/main/resources with the following content:

log4j.rootLogger = debug,stdout

### Send log output to the console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%-5p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%n%m%n
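To check that this configuration is picked up, here is a minimal, illustrative smoke test (the class name LogCheck is hypothetical, not part of the original project):

package com.mrtest.hadoop;

import org.apache.log4j.Logger;

// Hypothetical helper: logs two messages using the log4j.properties above.
public class LogCheck {

    private static final Logger LOG = Logger.getLogger(LogCheck.class);

    public static void main(String[] args) {
        // With rootLogger at debug, both lines should appear on the console.
        LOG.debug("debug message");
        LOG.info("info message");
    }
}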

5. Starting Hadoop

Start Hadoop by running:

cd hadoop-2.7.3/
./sbin/start-all.sh

 

Visit http://localhost:50070/ to check whether Hadoop started correctly.

6. Running WordCount (Reading from the Local Filesystem)

Create an input folder under the project root, add a file dream.txt inside it, and write in a few words:

I have a dream a dream

 

Create a package under src/main/java and add FileUtil.java, with a helper function that deletes the output directory so you no longer have to remove it by hand. Contents:

package com.mrtest.hadoop;

import java.io.File;

/**
 * Created by bee on 3/25/17.
 */
public class FileUtil {

    /** Recursively deletes the file or directory at the given path. */
    public static boolean deleteDir(String path) {
        File dir = new File(path);
        if (dir.exists()) {
            for (File f : dir.listFiles()) {
                if (f.isDirectory()) {
                    // Recurse with the full path, not just the file name.
                    deleteDir(f.getPath());
                } else {
                    f.delete();
                }
            }
            dir.delete();
            return true;
        } else {
            System.out.println("File (or directory) does not exist!");
            return false;
        }
    }
}

 

Write the WordCount MapReduce program, WordCount.java:

package com.mrtest.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * Created by bee on 3/25/17.
 */
public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        public static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the line.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class IntSumReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all counts emitted for the same word.
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        // Remove the previous output directory so the job can run again.
        FileUtil.deleteDir("output");
        Configuration conf = new Configuration();

        String[] otherArgs = new String[]{"input/dream.txt", "output"};
        if (otherArgs.length != 2) {
            System.err.println("Usage: WordCount <in> <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

 

After the job finishes, an output folder appears under the project root. Open output/part-r-00000; its contents are:

I	1
a	2
dream	2
have	1

 

Here the paths are hard-coded in a String array inside main. If you would rather take the input and output paths from main's args array at run time, that works too: before running WordCount, edit the Run Configuration and specify Program arguments, as in the sketch below.
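A minimal sketch of that variant, assuming Hadoop's standard org.apache.hadoop.util.GenericOptionsParser; only the argument-handling lines in main change:

import org.apache.hadoop.util.GenericOptionsParser;

// Inside main: take input/output paths from the command line instead of hard-coding them.
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}

With this change, Program arguments such as "input/dream.txt output" supply the two paths at run time.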


7. Running WordCount (Reading from HDFS)

Create a directory on HDFS:

hadoop fs -mkdir /worddir

If the NameNode is in safe mode, the directory cannot be created and you will see:

mkdir: Cannot create directory /worddir. Name node is in safe mode.

Run the following command to leave safe mode:

hadoop dfsadmin -safemode leave

Upload the local file:

hadoop fs -put dream.txt /worddir

Modify the otherArgs parameter so that the input points at the file's path on HDFS:

String[] otherArgs = new String[]{"hdfs://localhost:9000/worddir/dream.txt", "output"};

Verification: run WordCount again with the new otherArgs. The job now reads dream.txt from HDFS, and output/part-r-00000 should contain the same counts as before.
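To double-check the upload before running the job, here is a small sketch using Hadoop's FileSystem API (the class name HdfsCheck is illustrative; it assumes the NameNode listens at hdfs://localhost:9000):

package com.mrtest.hadoop;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.net.URI;

// Hypothetical helper: lists /worddir to confirm dream.txt landed on HDFS.
public class HdfsCheck {

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"),
                new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/worddir"))) {
            System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
        }
        fs.close();
    }
}

Alternatively, hadoop fs -ls /worddir from the shell shows the same information.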

 
