Submitting Hadoop Jobs to a Remote Cluster from IDEA

  • Create a new Maven project and add the following dependencies:
<dependencies>
     <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-common</artifactId>
         <version>2.7.1</version>
     </dependency>

     <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-mapreduce-client-core</artifactId>
         <version>2.7.1</version>
     </dependency>

     <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-hdfs</artifactId>
         <version>2.7.1</version>
     </dependency>

     <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
         <version>2.7.1</version>
     </dependency>
 </dependencies>
  • Write the Map class
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
      @Override
      protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
          String line = value.toString();
          System.out.println("line: " + line);
          // TextInputFormat already delivers one line per map() call, so the "\n" tokenizer is purely defensive
          StringTokenizer tokenizer = new StringTokenizer(line, "\n");
          while (tokenizer.hasMoreTokens()) {
              // each record has the form "<name> <score>", separated by whitespace
              StringTokenizer tokenizerLine = new StringTokenizer(tokenizer.nextToken());
              String strName = tokenizerLine.nextToken();
              String strScore = tokenizerLine.nextToken();
              Text name = new Text(strName);
              int score = Integer.parseInt(strScore);
              context.write(name, new IntWritable(score));
          }
      }
  }
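As a quick sanity check of the parsing logic above, the following standalone snippet (not part of the job; the class name is illustrative and the sample line is taken from the input shown further below) shows the (key, value) pair a single map() call emits:

import java.util.StringTokenizer;

public class ParseCheck {
    public static void main(String[] args) {
        String line = "陳洲立 67";                                   // one line of the sample input
        StringTokenizer tokenizerLine = new StringTokenizer(line);
        String strName = tokenizerLine.nextToken();                  // "陳洲立"
        int score = Integer.parseInt(tokenizerLine.nextToken());     // 67
        System.out.println(strName + " -> " + score);                // the pair written to the context
    }
}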
  • Write the Reduce class
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
          int sum = 0;
          int count = 0;
          Iterator<IntWritable> iterator = values.iterator();
          while (iterator.hasNext()) {
              sum += iterator.next().get();
              count++;
          }
          // integer division: the average is truncated toward zero
          int average = sum / count;
          context.write(key, new IntWritable(average));
      }
  }
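One caveat before the driver: this averaging Reducer should not also be registered as a combiner, because averages are not associative. A small standalone illustration (the way the scores are split across map tasks is hypothetical):

public class CombinerPitfall {
    public static void main(String[] args) {
        // one student's scores split across two map tasks: {67, 68} and {98}
        int combiner1 = (67 + 68) / 2;                   // combiner on mapper 1 -> 67
        int combiner2 = 98;                              // combiner on mapper 2 -> 98
        int withCombiner = (combiner1 + combiner2) / 2;  // the reducer then sees 67 and 98 -> 82
        int withoutCombiner = (67 + 68 + 98) / 3;        // true average -> 77
        System.out.println(withCombiner + " vs " + withoutCombiner);
    }
}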
  • The main function
System.setProperty("HADOOP_USER_NAME", "wujinlei");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");
// required when submitting from Windows to a Linux cluster
conf.set("mapreduce.app-submission.cross-platform", "true");
// path to the packaged job jar on the local machine
conf.set("mapred.jar", "E:\\JackManWu\\hadoo-ptest\\target\\hadoop-test-1.0-SNAPSHOT.jar");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
Job job = Job.getInstance(conf, "student_score");
job.setJarByClass(StudentScore.class); // the driver class packaged in the jar to run

job.setMapperClass(Map.class);
// Reusing Reduce as a combiner is unsafe here because averaging is not
// associative (see the note after the Reduce class), so it is left out.
// job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

FileInputFormat.addInputPath(job, new Path("hdfs://master:9000/home/wujinlei/work/student/input"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/home/wujinlei/work/student/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
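For completeness, the main() body above is assumed to live in a driver class along these lines; the class layout is a sketch, but the imports match the Hadoop 2.7.1 classes used in the snippets:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StudentScore {
    // the Map and Reduce classes shown above go here

    public static void main(String[] args) throws Exception {
        // the driver code shown above goes here
    }
}

Because mapred.jar points at the packaged jar on the local disk, rebuild the project (for example with mvn clean package) so the jar exists before each submission.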
  • Prepare the input files under home/wujinlei/work/student/input on the cluster in advance, following the input-file setup described in the first Hadoop program (WordCount); a sketch for staging the input from code follows the sample data below. (Note: do not create home/wujinlei/work/student/output in advance; the framework generates the output directory itself, and the job fails if it already exists.)
    • Sample input file:
    陳洲立 67
    陳東偉 98
    李寧 87
    楊森 86
    劉東奇 78
    譚果 94
    蓋蓋 83
    陳洲立 68
    陳東偉 96
    李寧 82
    楊森 85
    劉東奇 72
    譚果 97
    蓋蓋 82
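If you prefer to stage the input from code rather than with hdfs dfs -put, a minimal sketch using org.apache.hadoop.fs.FileSystem could look like this (the local file path is an assumption):

// Copy a local sample file into the job's HDFS input directory.
System.setProperty("HADOOP_USER_NAME", "wujinlei");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("E:\\JackManWu\\student_scores.txt"),   // hypothetical local file
        new Path("/home/wujinlei/work/student/input/student_scores.txt"));
fs.close();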
  • Run the main function, follow the Hadoop logs and the job tracking page to monitor execution, and verify the generated result.
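To check the result without shelling into the cluster, something along these lines should print the final name/average pairs (a single reducer is assumed, hence part-r-00000; hdfs dfs -cat works just as well). It uses java.io.BufferedReader/InputStreamReader and the same FileSystem API as above:

// Read the reducer output: one "name<TAB>average" pair per line.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:9000");
FileSystem fs = FileSystem.get(conf);
Path result = new Path("/home/wujinlei/work/student/output/part-r-00000");
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(result), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}
fs.close();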