2018-07-07: Configuring Hadoop's Local Run Mode

Source: 05 - Configuring Hadoop's local run mode

The detailed steps for running Hadoop in local mode in a Windows development environment are as follows:

1. Install the JDK and Hadoop 2.4.1 locally and configure the environment variables JAVA_HOME, HADOOP_HOME, and Path (it is best to restart the machine after setting them).

2. Replace the bin directory of the local hadoop-2.4.1 installation with the bin directory from hadoop-common-2.2.0-bin-master, because the Hadoop 2.x distribution does not ship the two files hadoop.dll and winutils.exe.

If hadoop.dll and winutils.exe are missing, the program throws exceptions such as:

java.io.IOException: Could not locate executable D:\hadoop-2.4.1\bin\winutils.exe in the Hadoop binaries.

java.lang.Exception: java.lang.NullPointerException

So replacing the bin directory of the local hadoop-2.4.1 installation with the bin directory of hadoop-common-2.2.0-bin-master is a necessary step.

Note: simply copying the two files hadoop.dll and winutils.exe from hadoop-common-2.2.0-bin-master's bin directory into hadoop-2.4.1's bin directory also works, but replacing the whole bin directory is preferable. (A programmatic alternative is sketched below.)
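As a complement to the environment-variable setup, a commonly used workaround (a sketch, not part of the original steps) is to point Hadoop at a directory whose bin folder contains winutils.exe by setting the hadoop.home.dir system property before any Hadoop class is used; D:\hadoop-2.4.1 here is an assumed install location.

// Sketch: add at the very beginning of main() in the driver class, before touching any Hadoop API.
// The path is an assumption; point it at your own Hadoop installation containing bin\winutils.exe.
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1");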

Once these two steps are done we can run the program and get Hadoop's local run mode working:

First, both the input and output paths point to the Windows file system.

The code is as follows:

package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///C:\\word.txt"; // input: local Windows file system
    public static String path2 = "file:///D:\\dir";      // output: local Windows file system

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(conf);

        // Delete the output directory if it already exists
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));
        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        // Emit <word, 1> for every tab-separated word in the line
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        // Sum the counts for each word
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}
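For reference, a C:\word.txt that is consistent with the output shown later would contain tab-separated words, for example (an assumed sample; the original article does not show the input file, and the separator on each line is a tab character):

hello	you
hello	me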



Check the running Java processes from a Windows command prompt (for example with the JDK's jps tool); here, 28568 is the Eclipse process started on Windows.

Next, let's look at the output.


 

The contents of part-r-00000 are as follows:

hello   2
me      1
you     1

Next, keep the input path on the local Windows file system but change the output path to the HDFS file system. The code is as follows:

package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///C:\\word.txt";     // input: local Windows file system
    public static String path2 = "hdfs://hadoop20:9000/dir"; // output: HDFS

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // Note: fs.defaultFS is not set here, so FileSystem.get(conf) returns the local
        // file system -- this is what causes the exception described below.
        FileSystem fileSystem = FileSystem.get(conf);

        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));
        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}

The program throws an exception, because FileSystem.get(conf) still returns the local file system while path2 points to HDFS.


The fix is to set fs.defaultFS explicitly before obtaining the FileSystem:

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop20:9000/");
FileSystem fileSystem = FileSystem.get(conf); // now returns the FileSystem instance for HDFS
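An alternative, sketched here rather than taken from the original write-up, is to leave fs.defaultFS untouched and obtain the FileSystem from the output path's own URI; because both path1 and path2 are fully qualified, the rest of the job configuration can stay unchanged.

// Sketch: pick the FileSystem that matches path2's scheme (hdfs://) instead of the default one.
// Requires: import java.net.URI;
FileSystem fileSystem = FileSystem.get(URI.create(path2), conf);
if (fileSystem.exists(new Path(path2)))
{
    fileSystem.delete(new Path(path2), true);
}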

Check the output:

[root@hadoop20 dir4]# hadoop fs -cat /dir/part-r-00000
hello   2
me      1
you     1

That concludes the discussion of Hadoop's local run mode. Note the following points:

1. file:// denotes the local file system, while hdfs:// denotes the HDFS distributed file system.

2. Local run mode under Linux is straightforward; under Windows it requires the extra file configuration described above.

3. It does not matter where the files used by MapReduce live (the local Windows file system, the local Linux file system, or HDFS); in the end they are accessed through a FileSystem instance (see the sketch below).
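A minimal, self-contained sketch of that last point (hadoop20:9000 is the same NameNode address assumed earlier in this article): the scheme of the URI decides which FileSystem implementation is returned.

package MapReduce;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsSchemeDemo
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();

        // file:// resolves to the local file system, hdfs:// to the distributed file system.
        FileSystem localFs = FileSystem.get(URI.create("file:///C:/word.txt"), conf);
        FileSystem hdfs    = FileSystem.get(URI.create("hdfs://hadoop20:9000/"), conf);

        System.out.println(localFs.getUri()); // prints file:///
        System.out.println(hdfs.getUri());    // prints hdfs://hadoop20:9000
    }
}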

If you spot any problems, feel free to leave a comment!

Note: if you are using Hadoop 1.0 and running local mode on Windows, you only need to set HADOOP_HOME and PATH; no other configuration is required.

Another error you may hit: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

NativeIO$Windows.access0 is the Windows-only native method Hadoop uses to check whether the current process has the requested access rights to a given path. The workaround is to bypass this check in the source so that access is always granted: download the matching Hadoop source package (hadoop-2.7.3-src.tar.gz), extract it, and copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding Eclipse project.

That is, modify the access check in the copied NativeIO.java so that it returns true, and the problem is solved. A sketch of the change follows.
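A minimal sketch of the change, assuming the Hadoop 2.7.3 source layout in which the check lives in the access() method of the Windows inner class of NativeIO:

// In the copied org.apache.hadoop.io.nativeio.NativeIO, inside the Windows inner class:
public static boolean access(String path, AccessRight desiredAccess)
    throws IOException
{
    // The original body delegates to the native check:
    //     return access0(path, desiredAccess.accessRight());
    // Returning true skips the native access0() call so local runs on Windows are not blocked.
    return true;
}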

How to fix it, step by step:

Step 1: download hadoop.dll and winutils.exe for Hadoop 2.7.3 and copy them over the files in the local Hadoop bin directory, and also copy them to C:\Windows\System32 (overwriting what is there).

Step 2: create a package named org.apache.hadoop.io.nativeio in your project and a class NativeIO inside it (with the modification shown above); then run the Hadoop program from Eclipse on Windows again, and it works.
