MapReduce Project (IDEA)

1. Maven Project

1.1 Creating the Maven Project

  1. Choose to create a new project.

 

(Screenshot: creating the project)

 

  2. Select a Maven project; do not choose an archetype template.

 

(Screenshot: Maven options)

 

  3. Fill in the coordinates, choose the project location, and create the project.

 

(Screenshot: coordinates)

 

1.2 Modifying the Configuration Files

  1. Edit pom.xml, setting mainClass to your own entry class, as shown below:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>per.hao</groupId>
    <artifactId>MapReduceTest</artifactId>
    <version>1.0</version>


    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>1.8</java.version>
        <hadoop.version>2.7.2</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <!-- Build and packaging plugins; set mainClass to your own entry class -->
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- Specify the entry class -->
                            <mainClass>per.hao.mapreduce.MRMainClass</mainClass>
                            <!-- Whether to add dependency jar paths to the manifest Class-Path -->
                            <addClasspath>false</addClasspath>
                            <!-- Directory for dependency jars, in the same directory as the generated jar -->
                            <!--<classpathPrefix>lib/</classpathPrefix>-->
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>per.hao.mapreduce.MRMainClass</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
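
With the jar plugin configured as above, the MANIFEST.MF of the plain jar should contain the entry below; since addClasspath is false no Class-Path entry is written, and the runnable, self-contained jar comes from the assembly plugin instead:

Main-Class: per.hao.mapreduce.MRMainClass
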
  2. Under the project's src/main/resources directory, create a new file named log4j.properties and fill it with the following:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
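
Note that the logfile appender above is defined but never attached, so only console output is active; if file logging were wanted as well, the root logger would have to reference it, e.g.:

log4j.rootLogger=INFO, stdout, logfile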

1.3 Mapper Class

package per.hao.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Input:
 *   key   - byte offset of the line read: LongWritable
 *   value - content of the line: Text
 * Output:
 *   key   - a word: Text
 *   value - the word count: IntWritable
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text k = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get one line of input
        String line = value.toString();

        // Split into words; "\s+" treats consecutive whitespace as a single delimiter
        String[] words = line.split("\\s+");

        // Emit (word, 1) for each word
        for (String word : words) {
            k.set(word);
            context.write(k, ONE);
        }
    }
}
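
As a quick illustration (not part of the original code), this is what the mapper emits for the first line of the word.txt test file from section 1.7:

// input:  (0, "export\tHADOOP_CLUSTER_NAME\tmyhadoop")
// output: ("export", 1)
//         ("HADOOP_CLUSTER_NAME", 1)
//         ("myhadoop", 1)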

1.4 Reducer Class

package per.hao.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    private int sum;
    private IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts for this key
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }

        // Emit the total
        v.set(sum);
        context.write(key, v);
    }
}
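
Again for illustration: after the shuffle phase groups the map output by key, each reduce call receives one word with all of its counts. For the word.txt data in section 1.7, "export" appears four times:

// input:  ("export", [1, 1, 1, 1])
// output: ("export", 4)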

1.5 Driver Class

package per.hao.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Get the configuration and create a job instance from it
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // Set the jar by class so Hadoop can locate and ship it
        job.setJarByClass(WordCountDriver.class);

        // Set the Mapper and Reducer classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);

        // Set the Mapper output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set the input and output paths
        if (args.length < 2) {
            System.out.println("Input and output paths must be specified");
            System.exit(1);
        } else {
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
        }

        // Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);

    }
}
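
One optional addition, sketched here but not in the original driver: since WordCount's reduce operation is associative and commutative, the same class can be registered as a combiner to pre-aggregate counts on the map side and cut shuffle traffic:

// Optional map-side pre-aggregation (add in the driver before submitting the job)
job.setCombinerClass(WordCountReduce.class);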

1.6 Entry Class

package per.hao.mapreduce;

import org.apache.hadoop.util.ProgramDriver;
import per.hao.mapreduce.wordcount.WordCountDriver;

public class MRMainClass {
    public static void main(String[] args) {
        int exitCode = -1;
        ProgramDriver pd = new ProgramDriver();

        try {
            pd.addClass("wordcount", WordCountDriver.class, "My MapReduce test program - WordCount");

            exitCode = pd.run(args);
        } catch (Throwable throwable) {
            throwable.printStackTrace();
        }

        System.exit(exitCode);
    }
}
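
Run with no arguments (or an unknown program name), ProgramDriver prints the list of registered programs together with the descriptions given above. Additional jobs can be registered the same way; the class below is hypothetical:

// pd.addClass("flowcount", FlowCountDriver.class, "My MapReduce test program - FlowCount");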

1.7 Testing

  1. Build the jar:
mvn clean test package
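
With the coordinates above, two artifacts should appear under target/: the plain jar from the jar plugin and the self-contained jar from the assembly plugin:

target/MapReduceTest-1.0.jar
target/MapReduceTest-1.0-jar-with-dependencies.jar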

 

(Screenshot: the packaged jars)

 

  2. Upload the jar to the server.

  3. Create a file named word.txt with the following content:

export	HADOOP_CLUSTER_NAME	myhadoop
export	HADOOP_TMP_DIR	hdata	hadoop
hdata	export
HADOOP_TMP_DIR	myhadoop	export
  4. Put the file at the target HDFS path:
# Create the directory
/opt/cluster/hadoop/bin/hadoop fs -mkdir -p /mapreduce/test/input/20180702;
# Upload the file
/opt/cluster/hadoop/bin/hadoop fs -put ./word.txt /mapreduce/test/input/20180702;
  5. Run the wordcount test:
/opt/cluster/hadoop/bin/hadoop jar ./MapReduceTest-1.0.jar wordcount /mapreduce/test/input/20180702 /mapreduce/test/output/20180702;
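The output can then be inspected with hadoop fs -cat; by default the reducer writes its result to part-r-00000 inside the output directory:
/opt/cluster/hadoop/bin/hadoop fs -cat /mapreduce/test/output/20180702/part-r-00000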
  6. Results

 

(Screenshot: wordcount output)

 

2. Plain (Non-Maven) Project

Note: whereas a Maven project configures dependencies and packaging through pom.xml, a plain project requires adding dependencies and packaging by hand.

2.1 Adding Dependencies

  1. Click File -> Project Structure.
  2. Click Modules -> select the project -> Dependencies -> JARs or dir… (typical jar locations are listed below).
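
For Hadoop 2.7.2, the jars to add by hand typically sit under these directories of the unpacked distribution (the usual tarball layout; adjust to your installation):

share/hadoop/common (and share/hadoop/common/lib)
share/hadoop/hdfs
share/hadoop/mapreduce
share/hadoop/yarn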

 

(Screenshot: the dependency dialog)

 

2.2 Packaging

  1. Click File -> Project Structure.
  2. Click the parts highlighted in blue in the screenshot, in order.

 

(Screenshot: adding the packaging artifact)

 

  3. Select the mainClass and the dependency packaging option, then click OK.

 

(Screenshot: packaging options)

 

 

(Screenshot: configuration complete)

 

  4. Trigger the packaging; in the popup window, choose build or rebuild…

 

(Screenshot: building the artifact)

 

  5. Find the generated jar in the output directory (a typical default path is sketched below).
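
With IDEA's default artifact settings, the jar is usually written under out/artifacts/ in the project root; the exact path below is illustrative and depends on the artifact name:

out/artifacts/MapReduceTest_jar/MapReduceTest.jar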

 

(Screenshot: the output directory)

 

 

(Screenshot: the output jar)