Hadoop Map-Reduce Program Development (4)

Date: 2019-12-05

Map-Reduce data analysis

1. Hadoop API development steps

Define the goal -> develop the program (using Eclipse or a similar tool) -> test the results

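2. Installing the Map-Reduce plugin in Eclipse

(1) Find the Eclipse installation directory.

On Linux, locate it with:

whereis eclipse

which reports a path such as:

/usr/lib/eclipse

On Windows it is even simpler.

Copy hadoop-0.20.2-eclipse-plugin.jar from the hadoop/contrib/eclipse-plugin directory into the eclipse/plugins folder.

Restart Eclipse.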

(2) Open Window -> Preferences -> Hadoop Map/Reduce and first set the Hadoop installation directory.

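Configure the Hadoop path

In Window -> Preferences, select "Hadoop Map/Reduce", click "Browse..." and choose the path of the Hadoop folder (on Windows).
This setting has nothing to do with the runtime environment; it only lets the plugin automatically import all jars from the Hadoop root directory and its lib directory when you create a new project.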

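Create a project

File -> New -> Project, choose "Map/Reduce Project", enter a project name and create the project. The plugin automatically imports all jars from the Hadoop root directory and its lib directory.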

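Create a Mapper or Reducer

File -> New -> Mapper creates a Mapper class that automatically extends MapReduceBase from the mapred package and implements the Mapper interface.
Note: the plugin generates the old-API classes and interfaces from the org.apache.hadoop.mapred package; a Mapper for the new API (org.apache.hadoop.mapreduce) has to be written by hand.

The same applies to Reducers.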

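(3) Open Perspective -> Other -> Map/Reduce to open the Hadoop (Map/Reduce) perspective.

(4) Use Show View to display the Hadoop views.

(5) A "Map/Reduce Locations" tab appears.

Right-click -> New Hadoop location to create a new location: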

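Location name: any name

Map/Reduce Master: Host localhost, Port 9001

DFS Master: Host localhost, Port 9000

These must match mapred.job.tracker (mapred-site.xml) and fs.default.name (core-site.xml) of the Hadoop installation.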

Once configured, a "DFS Locations" entry appears on the right.

DisConnect: the directory tree of the HDFS file system appears, and files can be uploaded to it.

3. Creating a new Hadoop Map-Reduce job

File -> New -> Project -> Map/Reduce Project

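4. Examples

(1) Data filtering program

Task requirements:

- We have a batch of router logs. Extract the MAC address and the time from each line and discard everything else.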

Apr 23 11:49:54 hostapd: wlan0 STA 14:7d:c5:9e:fb:84

Apr 23 11:49:52 hostapd: wlan0 STA 74:e5:0b:9e:fb:84

Apr 23 11:49:50 hostapd: wlan0 STA cc:dy:c5:9e:fb:84

Apr 23 11:49:44 hostapd: wlan0 STA cc:7d:c5:ee:fb:84

Apr 23 11:49:43 hostapd: wlan0 STA 74:7d:c5:9e:gb:84

Apr 23 11:49:42 hostapd: wlan0 STA 14:7d:c5:9e:fb:84

Algorithm idea:
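The filtering idea can be tried out locally before writing the Hadoop job. The plain-Java sketch below (no Hadoop classes; RouterLogFilter, filterLine and filter are hypothetical names) splits each log line on single spaces, keeps field 0 (month), field 1 (day), field 2 (time of day) and field 6 (MAC address), and counts lines with fewer than seven fields instead of emitting them, which is what the LINESKIP counter does in the Hadoop version. It assumes every line has exactly the layout of the sample above, single spaces included.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the router-log filtering step (no Hadoop needed).
// RouterLogFilter, filterLine() and filter() are hypothetical names for illustration.
final class RouterLogFilter {

    // Result of filtering a batch: the kept lines plus the number of skipped lines.
    static final class Result {
        final List<String> kept = new ArrayList<>();
        int skipped = 0; // plays the role of the LINESKIP counter
    }

    // Returns "month day time mac" for a well-formed line, or null if it has too few fields.
    static String filterLine(String line) {
        String[] fields = line.split(" ");
        if (fields.length < 7) {
            return null;
        }
        return fields[0] + " " + fields[1] + " " + fields[2] + " " + fields[6];
    }

    // Applies filterLine to every line, collecting output and counting skipped lines.
    static Result filter(List<String> lines) {
        Result result = new Result();
        for (String line : lines) {
            String out = filterLine(line);
            if (out == null) {
                result.skipped++;
            } else {
                result.kept.add(out);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Apr 23 11:49:54 hostapd: wlan0 STA 14:7d:c5:9e:fb:84",
            "Apr 23 11:49:52 hostapd: wlan0 STA 74:e5:0b:9e:fb:84",
            "truncated line");
        Result result = filter(sample);
        for (String line : result.kept) {
            System.out.println(line);
        }
        System.out.println("skipped: " + result.skipped);
    }
}
```

Using a length check instead of catching ArrayIndexOutOfBoundsException gives the same outcome for short lines while keeping exceptions out of the normal control flow.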

Hadoop template program:

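import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test_1 extends Configured implements Tool {

    enum Counter {
        LINESKIP // lines that could not be parsed
    }

    public static class Map extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString(); // read one line of source data
            try {
                // e.g. "Apr 23 11:49:54 hostapd: wlan0 STA 14:7d:c5:9e:fb:84"
                String[] lineSplit = line.split(" ");
                String month = lineSplit[0];
                String day = lineSplit[1];
                String time = lineSplit[2];
                String mac = lineSplit[6];
                Text out = new Text(month + " " + day + " " + time + " " + mac);
                // NullWritable key: TextOutputFormat writes only the value
                context.write(NullWritable.get(), out);
            } catch (ArrayIndexOutOfBoundsException e) {
                context.getCounter(Counter.LINESKIP).increment(1); // bad line: counter + 1
            }
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "Test_1"); // job name (Hadoop 0.20 API; Job.getInstance in 2.x)
        job.setJarByClass(Test_1.class); // locate the jar through this class
        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
        job.setMapperClass(Map.class); // use the Map class above as the map task
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(NullWritable.class); // output key type
        job.setOutputValueClass(Text.class); // output value type
        return job.waitForCompletion(true) ? 0 : 1; // block until the job finishes
    }

    public static void main(String[] args) throws Exception {
        // run the job through ToolRunner so generic options (-D, -conf, ...) are parsed
        int res = ToolRunner.run(new Configuration(), new Test_1(), args);
        System.exit(res);
    }
}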
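In this job the map output key is NullWritable. As far as I know, Hadoop's TextOutputFormat writes each record as key, separator (a tab by default), value, and omits the key and the separator when the key is null or a NullWritable, so every output line of this job contains only the filtered text. The plain-Java sketch below imitates that rule for illustration only; formatRecord is a hypothetical helper, not a Hadoop API, and Java null stands in for NullWritable.

```java
// Hypothetical helper imitating how TextOutputFormat turns one record into a line.
// Java null stands in for NullWritable; Hadoop's default separator is a tab.
final class TextLineSketch {

    // Returns the text written for one record (without the trailing newline),
    // or null when both key and value are absent and nothing is written.
    static String formatRecord(Object key, Object value, String separator) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return null;
        }
        StringBuilder line = new StringBuilder();
        if (!nullKey) {
            line.append(key);
        }
        if (!nullKey && !nullValue) {
            line.append(separator); // separator only between two present parts
        }
        if (!nullValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // NullWritable key: only the value appears
        System.out.println(formatRecord(null, "filtered text", "\t"));
        // Text key: "key<TAB>value"
        System.out.println(formatRecord("10086", "13599999999|", "\t"));
    }
}
```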

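Running the program

Click the Run button -> Run Configurations

Arguments:

Program arguments (the run parameters):

hdfs://localhost:9000/user/james/input hdfs://localhost:9000/user/james/output

(the input path, then the output path; the output directory must not exist yet, otherwise the job fails)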
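Because both arguments are full hdfs:// URIs, their host and port have to agree with the DFS Master of the location configured earlier (localhost:9000 in this setup). One way to see what the job will receive is to parse them with java.net.URI. The sketch below is a hypothetical pre-flight check (ArgsCheck and checkArgs are made-up names), not something the plugin or Hadoop does for you.

```java
import java.net.URI;

// Hypothetical pre-flight check for the two program arguments (input, output).
final class ArgsCheck {

    // Returns null if both arguments are hdfs:// URIs on the expected host:port
    // and differ from each other; otherwise a short description of the first problem.
    static String checkArgs(String[] args, String host, int port) {
        if (args.length != 2) {
            return "expected 2 arguments: <input> <output>";
        }
        for (String arg : args) {
            URI uri = URI.create(arg); // throws IllegalArgumentException on malformed input
            if (!"hdfs".equals(uri.getScheme())) {
                return "not an hdfs:// URI: " + arg;
            }
            if (!host.equals(uri.getHost()) || uri.getPort() != port) {
                return "does not point at " + host + ":" + port + ": " + arg;
            }
        }
        if (args[0].equals(args[1])) {
            return "input and output must differ";
        }
        return null;
    }

    public static void main(String[] args) {
        String[] example = {
            "hdfs://localhost:9000/user/james/input",
            "hdfs://localhost:9000/user/james/output"
        };
        String problem = checkArgs(example, "localhost", 9000);
        System.out.println(problem == null ? "arguments look fine" : problem);
    }
}
```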

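(2) Inverted index

Task requirements:

- We have a batch of phone call records; each line records that user A called user B.

- Build an inverted index that lists, for each user B, all users A who called B.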

13599999999 10086

13899999999 120

13944444444 13800138000

13722222222 13800138000

18800000000 120

13722222222 10086

18944444444 10086

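The job output must look like the following: each callee, then all of its callers, each caller followed by '|'. In the actual output the callee and the caller list are separated by a tab, and the order of the callers within a line is not guaranteed:

10086 13599999999|13722222222|18944444444|

120 13899999999|18800000000|

13800138000 13944444444|13722222222|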

Algorithm idea:
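The algorithm can be traced without a cluster: map turns each record into (callee, caller), the shuffle groups the pairs by callee and sorts the keys, and reduce joins each callee's callers with '|'. The plain-Java sketch below (hypothetical names, no Hadoop classes) imitates the shuffle with a TreeMap, which for these ASCII digit strings sorts keys in the same order as Hadoop sorts Text keys. Within one key it keeps the callers in input order, whereas a real job gives no such ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the inverted-index job: map, shuffle (group + sort by key), reduce.
// InvertedIndexSketch and run() are hypothetical names; no Hadoop classes are used.
final class InvertedIndexSketch {

    static List<String> run(List<String> lines) {
        // "Map": each "caller callee" line becomes the pair (callee, caller).
        // "Shuffle": the TreeMap groups callers by callee and keeps the keys sorted.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(" ");
            if (fields.length < 2) {
                continue; // the Hadoop job counts such lines as LINESKIP
            }
            grouped.computeIfAbsent(fields[1], k -> new ArrayList<>()).add(fields[0]);
        }
        // "Reduce": each callee, a tab, then every caller followed by '|'.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            StringBuilder callers = new StringBuilder();
            for (String caller : entry.getValue()) {
                callers.append(caller).append('|');
            }
            output.add(entry.getKey() + "\t" + callers);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "13599999999 10086", "13899999999 120", "13944444444 13800138000",
            "13722222222 13800138000", "18800000000 120", "13722222222 10086",
            "18944444444 10086");
        for (String line : run(lines)) {
            System.out.println(line);
        }
    }
}
```

For the sample records, 120 comes out with both of its callers, 13899999999 and 18800000000.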

Program:

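import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test_2 extends Configured implements Tool {

    enum Counter {
        LINESKIP // lines that could not be parsed
    }

    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString(); // read one line of source data
            try {
                // e.g. "13599999999 10086": caller, then callee
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0]; // caller (user A)
                String bnum = lineSplit[1]; // callee (user B)
                context.write(new Text(bnum), new Text(anum)); // inverted: callee -> caller
            } catch (ArrayIndexOutOfBoundsException e) {
                context.getCounter(Counter.LINESKIP).increment(1); // bad line: counter + 1
            }
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder out = new StringBuilder();
            for (Text value : values) {
                out.append(value.toString()).append('|'); // each caller followed by '|'
            }
            context.write(key, new Text(out.toString()));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "Test_2"); // job name (Hadoop 0.20 API; Job.getInstance in 2.x)
        job.setJarByClass(Test_2.class); // locate the jar through this class
        FileInputFormat.addInputPath(job, new Path(args[0])); // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
        job.setMapperClass(Map.class); // use the Map class above as the map task
        job.setReducerClass(Reduce.class); // use the Reduce class above as the reduce task
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class); // output key type
        job.setOutputValueClass(Text.class); // output value type
        return job.waitForCompletion(true) ? 0 : 1; // block until the job finishes
    }

    public static void main(String[] args) throws Exception {
        // run the job through ToolRunner so generic options (-D, -conf, ...) are parsed
        int res = ToolRunner.run(new Configuration(), new Test_2(), args);
        System.exit(res);
    }
}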

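5. Packaging the program

Right-click the project -> Export -> JAR file -> choose the destination, e.g. test_2.jar (tick .classpath and .project) -> Next -> Main class (select the main class) -> Finish

Run test_2.jar the same way as the WordCount example: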

hadoop jar /home/james/hadoop/source/test_2.jar /home/james/Test_2 /home/james/output
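Selecting a main class in the Export wizard stores it as the Main-Class attribute in the jar's META-INF/MANIFEST.MF; hadoop jar reads that attribute, which is why no class name appears between the jar path and the input and output paths in the command above. The plain-Java sketch below writes a throwaway jar with such a manifest and reads the attribute back with java.util.jar. ManifestSketch, writeJar and readMainClass are hypothetical names, and the temporary jar is only a stand-in for the exported test_2.jar.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch: write a jar whose manifest names a main class, then read that attribute back.
final class ManifestSketch {

    // Creates a jar at 'jar' containing only a manifest that declares 'mainClass'.
    static void writeJar(File jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be set, otherwise the main attributes are not written out.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (FileOutputStream file = new FileOutputStream(jar);
             JarOutputStream out = new JarOutputStream(file, manifest)) {
            out.flush(); // no entries besides the manifest are needed here
        }
    }

    // Returns the Main-Class attribute, or null if the jar has no manifest or no such attribute.
    static String readMainClass(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null
                ? null
                : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("test_2", ".jar");
        jar.deleteOnExit();
        writeJar(jar, "Test_2");
        System.out.println("Main-Class: " + readMainClass(jar));
    }
}
```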
