Java Big Data Development (III) Hadoop (22) - NLineInputFormat Case Study

Overview: In the previous section we saw that FileInputFormat has many implementation classes; this section walks through a hands-on case for the NLineInputFormat implementation.


NLineInputFormat Usage Example


1. Requirement


Count the occurrences of each word, with the number of input splits determined by the number of lines in the input file rather than by block size. In this case, every three lines go into one split.


(1) Input data


hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld


(2) Expected output


Number of splits:4


2. Requirement Analysis

The input file has 11 lines. With NLineInputFormat configured to put 3 lines into each split, the job produces ceil(11 / 3) = 4 input splits.



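The ceiling division above is easy to verify with a minimal plain-Java sketch (the class and helper name `numSplits` are mine for illustration, not part of Hadoop's API):

```java
public class SplitCountSketch {

    // Ceiling division: how many splits result from totalLines
    // when each split holds up to linesPerSplit lines
    static int numSplits(int totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        // 11 input lines, 3 lines per split -> 4 splits
        System.out.println(numSplits(11, 3)); // prints 4
    }
}
```

The last split simply holds whatever lines remain (here, the final 2 lines).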
3. Writing the Code


(1) The Mapper class


import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private Text k = new Text();
    private LongWritable v = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Read one line
        String line = value.toString();
        // 2. Split it on spaces
        String[] splited = line.split(" ");
        // 3. Emit (word, 1) for each word
        for (int i = 0; i < splited.length; i++) {
            k.set(splited[i]);
            context.write(k, v);
        }
    }
}


(2) The Reducer class


import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NLineReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private LongWritable v = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0L;
        // 1. Sum the counts for this word
        for (LongWritable value : values) {
            sum += value.get();
        }
        v.set(sum);
        // 2. Emit (word, total)
        context.write(key, v);
    }
}


(3) The Driver class


import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        // Adjust the input/output paths to match your own machine
        args = new String[] { "d:/input/inputword", "d:/output1" };

        // 1. Get the Job object
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 7. Put three records into each InputSplit
        NLineInputFormat.setNumLinesPerSplit(job, 3);

        // 8. Use NLineInputFormat to read the records
        job.setInputFormatClass(NLineInputFormat.class);

        // 2. Set the jar location and wire up the Mapper and Reducer
        job.setJarByClass(NLineDriver.class);
        job.setMapperClass(NLineMapper.class);
        job.setReducerClass(NLineReducer.class);

        // 3. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 4. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 5. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 6. Submit the job
        job.waitForCompletion(true);
    }
}


4. Testing


(1) Input data


hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld


(2) Split count reported in the job output


Number of splits:4
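Beyond the split count, the word totals themselves can be sanity-checked without running Hadoop at all, by replaying the same split-on-space and sum-per-word logic as the Mapper and Reducer over the 11-line input above. `LocalWordCountCheck` is a hypothetical helper for this walkthrough, not part of the job:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCountCheck {
    public static void main(String[] args) {
        // Rebuild the 11-line input file from section 4 (1)
        String[] lines = new String[11];
        for (int i = 0; i < 5; i++) {
            lines[2 * i] = "hadoop ni hao";
            lines[2 * i + 1] = "xiaoming hive helloworld";
        }
        lines[10] = "xiaoming hive helloworld";

        // Same logic as the Mapper (split on spaces) and Reducer (sum per word)
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        counts.forEach((word, total) -> System.out.println(word + "\t" + total));
    }
}
```

Assuming the reconstructed input, this yields hadoop/hao/ni at 5 each and xiaoming/hive/helloworld at 6 each, which is what the MapReduce job should write to the output directory.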


This article was originally shared via the WeChat official account 跟我一塊兒學大數據 (java_big_data).