Source: 05 - Configuring Hadoop Local Run Mode
To run Hadoop in local mode from a Windows development environment, follow the steps below:
1. Install the JDK and Hadoop 2.4.1 locally and configure the environment variables JAVA_HOME, HADOOP_HOME, and Path (it is best to restart the computer after setting them).
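To confirm the variables are actually visible to the JVM, a minimal check like the following can be run (a sketch; the class name EnvCheck is only illustrative):
public class EnvCheck
{
    public static void main(String[] args)
    {
        // Print the environment variables the Hadoop client depends on;
        // both should point at the local installation directories.
        System.out.println("JAVA_HOME   = " + System.getenv("JAVA_HOME"));
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
    }
}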
2. Replace the bin directory of the local Hadoop 2.4.1 installation with the bin directory from hadoop-common-2.2.0-bin-master, because the Hadoop 2.x release does not ship the two files hadoop.dll and winutils.exe.
If hadoop.dll and winutils.exe are missing, the program throws exceptions such as:
java.io.IOException: Could not locate executable D:\hadoop-2.4.1\bin\winutils.exe in the Hadoop binaries.
java.lang.Exception: java.lang.NullPointerException
Replacing the local Hadoop 2.4.1 bin directory with the bin directory from hadoop-common-2.2.0-bin-master is therefore a necessary step.
Note: it also works to copy only hadoop.dll and winutils.exe from the hadoop-common-2.2.0-bin-master bin directory into the Hadoop 2.4.1 bin directory, but replacing the whole bin directory is preferable.
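If changing the system environment variables (or restarting) is inconvenient, a commonly used alternative, not part of the original steps, is to point Hadoop at the installation directory from inside the program: Hadoop's Shell utility falls back to the hadoop.home.dir system property when HADOOP_HOME is not set. A minimal sketch, assuming the replaced bin directory lives under D:\hadoop-2.4.1:
// Sketch only: set hadoop.home.dir programmatically before any Hadoop class
// that needs winutils.exe is loaded (e.g. at the very top of main()).
public class HadoopHomeSetup
{
    public static void main(String[] args) throws Exception
    {
        // Assumption: D:\hadoop-2.4.1\bin already contains winutils.exe and hadoop.dll.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1");
        // ... then build the Configuration and Job exactly as in the WordCount example below.
    }
}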
Once these two steps are done, we can run the program and Hadoop will execute in local mode.
First, both the input path and the output path point at the Windows file system.
The code is as follows:
package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///C:\\word.txt"; // read input from the local Windows file system
    public static String path2 = "file:///D:\\dir";      // write output to the local Windows file system

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(conf);
        // delete the output directory if it already exists, otherwise the job fails
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            // split each line on tabs and emit <word, 1>
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            // sum the counts for each word
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}
Check the running Java processes from the DOS prompt (for example with the jps command); here 28568 is the Eclipse process started on Windows.
Next, check the job output.
The content of part-r-00000 is as follows:
hello	2
me	1
you	1
Next, keep the input path on the local Windows file system but switch the output path to HDFS. The code is as follows:
package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///C:\\word.txt";     // read input from the local Windows file system
    public static String path2 = "hdfs://hadoop20:9000/dir"; // write output to HDFS

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // note: fs.defaultFS has not been set here, which leads to the exception described below
        FileSystem fileSystem = FileSystem.get(conf);
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            // split each line on tabs and emit <word, 1>
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            // sum the counts for each word
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}
Running this program throws an exception, because FileSystem.get(conf) still returns the local file system while the output path path2 lives on HDFS. The fix is to point the configuration at HDFS before obtaining the FileSystem instance:
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop20:9000/"); // make HDFS the default file system
FileSystem fileSystem = FileSystem.get(conf);      // now returns the FileSystem instance for HDFS
Check the result:
[root@hadoop20 dir4]# hadoop fs -cat /dir/part-r-00000
hello	2
me	1
you	1
That wraps up Hadoop's local run mode. A few points to note:
1. file:// denotes the local file system, and hdfs:// denotes the HDFS distributed file system.
2. Hadoop's local run mode is straightforward on Linux, but on Windows it requires the extra configuration described above.
3. It does not matter where the files used by MapReduce live (the local Windows file system, the local Linux file system, or HDFS); in the end they are always accessed through a FileSystem instance, as the sketch after this list illustrates.
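A minimal sketch of that idea (the paths below simply reuse the names from this article; the scheme in the URI decides which concrete file system is used):
package MapReduce;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromAnyFileSystem
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // file:// resolves to the local file system, hdfs:// to the HDFS cluster.
        String uri = "file:///C:/word.txt"; // or "hdfs://hadoop20:9000/dir/part-r-00000"
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path(uri)))))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                System.out.println(line);
            }
        }
    }
}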
If you run into problems, feel free to leave a comment!
Note: if you are using Hadoop 1.0 and running local mode on Windows, only HADOOP_HOME and PATH need to be set; nothing else is required.
-- Exception: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z error
access0 is the Windows-only native method that checks whether the current process has the requested access rights to a given path. To get past the check, modify the Hadoop source so that it simply allows access. Download the matching Hadoop source, hadoop-2.7.3-src.tar.gz, extract it, and copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the Eclipse project.
That is, modify the highlighted part of the copied source.
Change it to return true.
Problem solved.
To summarize the procedure:
Step 1: download hadoop.dll and winutils.exe.zip for Hadoop 2.7.3, copy them over the local Hadoop bin directory, and also copy them into C:\Windows\System32 (overwriting any existing files).
Step 2: in the project, create a package named org.apache.hadoop.io.nativeio and a class named NativeIO containing the modified source (the project's class takes precedence over the one in the Hadoop jar on the classpath). Then run the Hadoop program from Eclipse on Windows again, and it works.