KeyValueTextInputFormat Usage Example
1. Requirement
Count how many lines in the input file share the same first word.
(1) Input data
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
(2) Expected output data
hadoop 2
xiaoming 2
2. Requirement analysis
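One way to read the requirement: KeyValueTextInputFormat hands the mapper each line already split into a key (the text before the first separator) and a value (the rest). The mapper then only has to emit (key, 1), and the reducer sums the ones for each key. The following is a plain-Java sketch of that data flow with no Hadoop dependency. The class name FirstWordCountSketch and the method countByFirstWord are hypothetical names used only for illustration, and a space separator is assumed to match the sample input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class: simulates the job's data flow in memory.
class FirstWordCountSketch {

    // Counts how many lines share the same first word.
    static Map<String, Long> countByFirstWord(List<String> lines, char separator) {
        // TreeMap keeps the keys sorted, similar to the sorted keys a reducer receives.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            int pos = line.indexOf(separator);
            // "Map" step: the key is the text before the first separator.
            String key = (pos == -1) ? line : line.substring(0, pos);
            // "Reduce" step: add 1 for every line that has this key.
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hadoop ni hao",
                "xiaoming hive helloworld",
                "hadoop ni hao",
                "xiaoming hive helloworld");
        countByFirstWord(lines, ' ')
                .forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

For the sample input, both hadoop and xiaoming end up with a count of 2, which matches the expected output data.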
3. Writing the code
(1) Write the Mapper class
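import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KVTextMapper extends Mapper<Text, Text, Text, LongWritable> {

    // 1. Set the output value: every line counts once
    LongWritable v = new LongWritable(1);

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {

        // 2. Write out (first word, 1)
        context.write(key, v);
    }
}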
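The mapper's input types are Text, Text rather than the LongWritable, Text pair used with the default TextInputFormat. With the default format the key is the byte offset at which the line starts, which is of no use for this requirement. For contrast, here is a rough plain-Java sketch of the (offset, line) records the default format would present for the same kind of input. The class OffsetRecordSketch and the method offsetRecords are hypothetical, and the sketch assumes UTF-8 text with a one-byte "\n" line ending.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class: mimics the record shape of the default
// TextInputFormat (key = starting byte offset, value = whole line).
class OffsetRecordSketch {

    // Returns one "offset<TAB>line" string per input line.
    static List<String> offsetRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        long offset = 0L;
        for (String line : lines) {
            records.add(offset + "\t" + line);
            // Advance by the line's byte length plus one byte for the assumed "\n".
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hadoop ni hao",
                "xiaoming hive helloworld",
                "hadoop ni hao");
        offsetRecords(lines).forEach(System.out::println);
    }
}
```

With KeyValueTextInputFormat the key is instead the first field of the line, so the mapper can pass it straight through as its output key.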
(2) Write the Reducer class
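import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class KVTextReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // Reused output value that holds the total for the current key
    LongWritable v = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        long sum = 0L;

        // 1. Sum the counts for this key
        for (LongWritable value : values) {
            sum += value.get();
        }
        v.set(sum);

        // 2. Write out (first word, total)
        context.write(key, v);
    }
}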
(3) Write the Driver class
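import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueLineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KVTextDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        // Set the key/value separator (the constant name is spelled SEPERATOR in Hadoop)
        conf.set(KeyValueLineRecordReader.KEY_VALUE_SEPERATOR, " ");

        // 1. Get the job object
        Job job = Job.getInstance(conf);

        // 2. Set the jar location and associate the mapper and reducer
        job.setJarByClass(KVTextDriver.class);
        job.setMapperClass(KVTextMapper.class);
        job.setReducerClass(KVTextReducer.class);

        // 3. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 4. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 5. Set the input data path
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // Set the input format
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // 6. Set the output data path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7. Submit the job and exit with a status that reflects success or failure
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}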
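The separator set in the driver controls how each line is cut into key and value. To my understanding, KeyValueLineRecordReader splits at the first occurrence of the separator only, falls back to a tab when no separator is configured, and treats a line that contains no separator as a key with an empty value. A plain-Java sketch of that rule follows. The class SeparatorRuleSketch and the method split are hypothetical and only mimic the behaviour; they are not Hadoop's own code.

```java
// Hypothetical illustration class: mimics how a line is cut into key and value.
class SeparatorRuleSketch {

    // Returns {key, value}, split at the first occurrence of the separator.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            // No separator found: the whole line is the key, the value is empty.
            return new String[] { line, "" };
        }
        // Later separators stay inside the value untouched.
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] bySpace = split("hadoop ni hao", ' ');
        System.out.println("[" + bySpace[0] + "] [" + bySpace[1] + "]");

        // The sample line contains no tab, so with a tab separator nothing is split.
        String[] byTab = split("hadoop ni hao", '\t');
        System.out.println("[" + byTab[0] + "] [" + byTab[1] + "]");
    }
}
```

This is why the driver sets the separator to a space. With a tab separator, the space-separated sample lines would not be split at all, so each whole line would become the key and the job would count identical lines rather than lines sharing a first word.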
This article was shared from the WeChat official account 跟我一塊兒學大數據 (java_big_data).