Big Data Cleaning: Stage 1

Description of the fields in the Result file:

Ip: 106.39.41.166, (city)

Date: 10/Nov/2016:00:01:02 +0800, (date)

Day: 10, (day)

Traffic: 54, (traffic)

Type: video, (type: video or article)

Id: 8701 (id of the video or article)

Test requirements:

1. Data cleaning: clean the data as specified below, then load the cleaned data into the Hive database.

Two-stage data cleaning:

1) Stage 1: extract the required information from the raw log (see the sketch after this field list):

ip:    199.30.25.88

time:  10/Nov/2016:00:01:03 +0800

traffic:  62

article: article/11325

video: video/3235
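
A minimal sketch of this stage-1 cleaning as a map-only MapReduce job is below. It assumes each line of result.txt is comma-separated in the order ip, date, day, traffic, type, id (the order suggested by the Result file description above); the class name LogClean and the input/output paths are placeholders I chose, not part of the assignment.

package QingXi;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Map-only cleaning job: keep only the six required fields and drop malformed lines.
public class LogClean {
    public static class CleanMapper extends Mapper<Object, Text, Text, NullWritable> {
        private final Text outLine = new Text();
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed input layout: ip,date,day,traffic,type,id
            String[] fields = value.toString().split(",");
            if (fields.length < 6) {
                return; // skip records that do not have all six fields
            }
            String ip = fields[0].trim();
            String time = fields[1].trim();
            String day = fields[2].trim();
            String traffic = fields[3].trim();
            String type = fields[4].trim();  // video or article
            String id = fields[5].trim();
            outLine.set(ip + "," + time + "," + day + "," + traffic + "," + type + "," + id);
            context.write(outLine, NullWritable.get());
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("LogClean");
        job.setJarByClass(LogClean.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);                    // map-only: mapper output goes straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // Placeholder paths, following the same layout as the WordCount job further below.
        FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hadoop/name/result.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/user/hadoop/name/clean1"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}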

2) Stage 2: refine the extracted information (a date-conversion sketch follows this list):

ip ---> city (cityIP)

date--> time:2016-11-10 00:01:03

day: 10

traffic:62

type:article/video

id:11325
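
For the date ---> time conversion in stage 2, a small helper like the one below should work: parse the Apache-style timestamp with an English locale and re-format it. The class name TimeFormat is my own; only the two format patterns matter. The ip ---> city step would additionally need an IP-geolocation library or database, which is not shown here.

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;
// Convert "10/Nov/2016:00:01:03 +0800" into "2016-11-10 00:01:03".
public class TimeFormat {
    private static final SimpleDateFormat IN =
            new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    private static final SimpleDateFormat OUT =
            new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    static {
        // Keep the +0800 wall-clock time from the log instead of the JVM's local zone.
        OUT.setTimeZone(TimeZone.getTimeZone("GMT+8"));
    }
    public static String convert(String raw) throws ParseException {
        return OUT.format(IN.parse(raw));
    }
    public static void main(String[] args) throws ParseException {
        System.out.println(convert("10/Nov/2016:00:01:03 +0800")); // 2016-11-10 00:01:03
    }
}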

3) Hive table structure:

create table data(ip string, time string, day string, traffic bigint, type string, id string)
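
Loading the cleaned files into this table can be done through the Hive JDBC driver once HiveServer2 is running. The sketch below is one possible way, with the server address, user name and HDFS path all placeholders for the local setup; note that for comma-separated files the table definition would also need ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
// Load the cleaned HDFS output into the Hive table "data" via HiveServer2 JDBC.
public class HiveLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumed HiveServer2 address; user and password depend on the local setup.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hadoop", "");
        Statement stmt = conn.createStatement();
        // Placeholder path of the stage-2 cleaning output on HDFS.
        stmt.execute("LOAD DATA INPATH '/user/hadoop/name/clean2' INTO TABLE data");
        stmt.close();
        conn.close();
    }
}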

2. Data processing (a query sketch follows this list):

· Top 10 most popular videos/articles by access count (video/article)

· Top 10 most popular courses by city (ip)

· Top 10 most popular courses by traffic (traffic)
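
One way to produce these Top 10 lists is to run HiveQL aggregations over the data table after it is loaded. The sketch below only shows the overall video/article ranking, using the same placeholder connection settings as the load sketch above; the per-city ranking would first need the ip field resolved to a city, and the traffic ranking would use SUM(traffic) instead of COUNT(*).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
// Top 10 most visited videos/articles, counted by number of rows per (type, id).
public class Top10 {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hadoop", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery(
                "SELECT type, id, COUNT(*) AS visits FROM data "
              + "GROUP BY type, id ORDER BY visits DESC LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2) + "\t" + rs.getLong(3));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}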

3. Data visualization: export the statistics into a MySQL database and present them graphically.
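
For the MySQL step, the aggregated rows can be written over plain JDBC and charted from there; the database name, the table top10, the credentials, and the sample values in the sketch below are all placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
// Insert one aggregated row into a MySQL table for later charting.
public class MysqlExport {
    public static void main(String[] args) throws Exception {
        // Placeholder database, user and password; adjust to the real environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/hive_result?useSSL=false", "root", "root");
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO top10(type, id, visits) VALUES (?, ?, ?)");
        ps.setString(1, "video");
        ps.setString(2, "8701");
        ps.setLong(3, 123);   // example values only
        ps.executeUpdate();
        ps.close();
        conn.close();
    }
}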

 

Today I finished the basics of MapReduce. I only implemented the data cleaning for stage 1; because Hive kept throwing errors, I did not manage to load the data into Hive.

Below is the WordCount code, which counts occurrences in the data. This is as far as I have gotten.

 

Reasons I could not finish on time today: 1. I had not studied MapReduce in advance; I have now learned part of it. Tomorrow I plan to work through and understand the previous 11 experiments, and to complete stage 2 as well as the sorting.

2. I had not tested my own Hive installation in advance, and only discovered in class that my Hive configuration was broken.

package QingXi;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount{
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
      // Configure the word-count job: mapper, reducer, and output key/value types.
      Job job = Job.getInstance();
      job.setJobName("WordCount");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(doMapper.class);
      job.setReducerClass(doReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      // Input: result.txt on HDFS; output: a directory that must not exist yet.
      Path in = new Path("hdfs://localhost:9000/user/hadoop/name/result.txt");
      Path out = new Path("hdfs://localhost:9000/user/hadoop/name/out2");
      FileInputFormat.addInputPath(job, in);
      FileOutputFormat.setOutputPath(job, out);
      // Submit the job and wait; exit 0 on success, 1 on failure.
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
    // Mapper: split each input line into whitespace-separated tokens and emit (token, 1).
    public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
      public static final IntWritable one = new IntWritable(1);
      public static Text word = new Text();
      @Override
      protected void map(Object key, Text value, Context context)
              throws IOException, InterruptedException {
        // Use the default whitespace delimiters; an empty delimiter string would
        // treat the whole line as a single token.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        // Emit every token on the line, not just the first one.
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          context.write(word, one);
        }
      }
    }
    // Reducer: sum the per-word counts produced by the mapper and emit (word, total).
    public static class doReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
          sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
      }
    }
}