Big Data (Log Analysis) Project

Requirement: count a website's page views (PV) and unique visitors (UV)

Environment: Hadoop (CDH) in pseudo-distributed mode, MySQL, Hive, Sqoop, Spring Boot

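Approach: upload the raw logs to HDFS, clean and partition them with Hive, compute PV and UV per day and hour, export the results to MySQL with Sqoop, and display them as a chart with Spring Boot and ECharts.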

 

Implementation:

1. Upload the crawled raw data to HDFS

bin/hdfs dfs -mkdir /project
bin/hdfs dfs -put /usr/local/2015082818 /project
bin/hdfs dfs -put /usr/local/2015082819 /project

2. Create the source table in Hive over the uploaded files (location '/project')

create table yhd_source(
id               string ,
url              string ,
referer          string ,
keyword          string ,
type             string ,
guid             string ,
pageId           string ,
moduleId         string ,
linkId           string ,
attachedInfo     string ,
sessionId        string ,
trackerU         string ,
trackerType      string ,
ip               string ,
trackerSrc       string ,
cookie           string ,
orderCode        string ,
trackTime        string ,
endUserId        string ,
firstLink        string ,
sessionViewNo    string ,
productId        string ,
curMerchantId    string ,
provinceId       string ,
cityId           string ,
fee              string ,
edmActivity      string ,
edmEmail         string ,
edmJobId         string ,
ieVersion        string ,
platform         string ,
internalKeyword  string ,
resultSum        string ,
currentPage      string ,
linkPosition     string ,
buttonPosition   string 
)row format delimited fields terminated by '\t' location '/project';
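To make `row format delimited fields terminated by '\t'` concrete, here is a minimal Java sketch. It is not Hive's actual SerDe code. It splits one raw log line on tabs and assigns the fields to the yhd_source columns by position. The class name SourceLineParser is hypothetical. Like Hive's default text SerDe, it maps fields missing from the end of a line to null (NULL) and ignores any extra fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch (hypothetical class): positional mapping of a tab-separated line onto yhd_source.
class SourceLineParser {
    // Column order copied from the CREATE TABLE yhd_source statement (36 columns).
    static final List<String> COLUMNS = Arrays.asList(
        "id", "url", "referer", "keyword", "type", "guid", "pageId", "moduleId",
        "linkId", "attachedInfo", "sessionId", "trackerU", "trackerType", "ip",
        "trackerSrc", "cookie", "orderCode", "trackTime", "endUserId", "firstLink",
        "sessionViewNo", "productId", "curMerchantId", "provinceId", "cityId", "fee",
        "edmActivity", "edmEmail", "edmJobId", "ieVersion", "platform",
        "internalKeyword", "resultSum", "currentPage", "linkPosition", "buttonPosition");

    static Map<String, String> parse(String line) {
        // limit -1 keeps trailing empty fields instead of silently dropping them
        String[] fields = line.split("\t", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.size(); i++) {
            // Fields missing at the end become null; extra fields are never read
            row.put(COLUMNS.get(i), i < fields.length ? fields[i] : null);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = parse("id1\thttp://example.com/\t\t");
        System.out.println(row.get("url") + " " + row.get("trackTime")); // http://example.com/ null
    }
}
```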

3. Create a cleaned table that keeps only id, url and guid, plus the day and hour extracted from the timestamp field

create table yhd_qingxi(
id string,
url string,
guid string,
date string,
hour string
)
row format delimited fields terminated by '\t';

4. Fill it by extracting the day and the hour from trackTime (Hive's substring positions start at 1)

insert into table yhd_qingxi select id,url,guid,substring(trackTime,9,2) date,substring(trackTime,12,2) hour from yhd_source;
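Hive's substring(str, pos, len) counts positions from 1. The Java sketch below emulates it for pos >= 1 and checks the two offsets used above. It assumes trackTime looks like 2015-08-28 18:10:00. That format is an assumption consistent with offsets 9 and 12, and the sample value is made up. The class name TrackTimeSplitter is hypothetical.

```java
// Sketch (hypothetical class): Hive-style 1-based substring used for the day/hour extraction.
class TrackTimeSplitter {
    /** Emulates Hive's substring(str, pos, len) for pos >= 1 only. */
    static String hiveSubstring(String s, int pos, int len) {
        if (pos < 1) throw new IllegalArgumentException("this sketch only handles pos >= 1");
        if (s == null) return null;                      // NULL in, NULL out
        int begin = pos - 1;                             // convert to Java's 0-based index
        if (len <= 0 || begin >= s.length()) return "";  // nothing to take
        return s.substring(begin, Math.min(begin + len, s.length()));
    }

    public static void main(String[] args) {
        // Assumed trackTime format: yyyy-MM-dd HH:mm:ss (illustrative value)
        String trackTime = "2015-08-28 18:10:00";
        System.out.println(hiveSubstring(trackTime, 9, 2));  // 28 (day)
        System.out.println(hiveSubstring(trackTime, 12, 2)); // 18 (hour)
    }
}
```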

5. Partitioning approach: Hive static partitions

create table yhd_part1(
id string,
url string,
guid string
)
partitioned by (date string,hour string)
row format delimited fields terminated by '\t';

6. Load the data from the cleaned table yhd_qingxi, one static partition at a time

insert into table yhd_part1 partition (date='20150828',hour='18') select id,url,guid from yhd_qingxi where date='28' and hour='18';
insert into table yhd_part1 partition (date='20150828',hour='19') select id,url,guid from yhd_qingxi where date='28' and hour='19';
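With a static partition, the PARTITION clause fixes the partition values and the WHERE clause picks the rows. The minimal Java sketch below shows what one such insert amounts to. The selected (id, url, guid) rows land in one key=value subdirectory per partition column, and the partition values themselves are not written into the data files. The table path is an assumption: it supposes the tables live in the baizhi125 database seen in the Sqoop step under the default warehouse directory. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch (hypothetical class): what a single static-partition insert into yhd_part1 selects and where it goes.
class StaticPartitionSketch {
    // Assumed location: default warehouse dir + baizhi125 database (see the Sqoop export path)
    static final String TABLE_DIR = "/user/hive/warehouse/baizhi125.db/yhd_part1";

    /** One key=value directory level per partition column. */
    static String partitionDir(String date, String hour) {
        return TABLE_DIR + "/date=" + date + "/hour=" + hour;
    }

    /**
     * Equivalent of: select id,url,guid from yhd_qingxi where date=? and hour=?
     * Input rows are (id, url, guid, date, hour); output rows are (id, url, guid),
     * because the partition values live in the directory name, not in the data file.
     */
    static List<String[]> selectForPartition(List<String[]> cleaned, String day, String hour) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : cleaned) {
            if (day.equals(row[3]) && hour.equals(row[4])) {
                out.add(Arrays.copyOf(row, 3));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> cleaned = Arrays.asList(
            new String[] {"1", "url1", "g1", "28", "18"},
            new String[] {"2", "url2", "g2", "28", "19"});
        // Mirrors: insert into table yhd_part1 partition (date='20150828',hour='18') ...
        System.out.println(partitionDir("20150828", "18") + " <- "
            + selectForPartition(cleaned, "28", "18").size() + " row(s)");
    }
}
```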

7. PV:

select date,hour,count(url) PV from yhd_part1 group by date,hour;
-> Grouped by day and hour, i.e. by the two partition columns
-> Result:
+-----------+-------+--------+--+
|   date    | hour  |   pv   |
+-----------+-------+--------+--+
| 20150828  | 18    | 64972  |
| 20150828  | 19    | 61162  |
+-----------+-------+--------+--+

8. UV:

select date,hour,count(distinct guid) UV from yhd_part1 group by date,hour; 

-> Result:
+-----------+-------+--------+--+
|   date    | hour  |   uv   |
+-----------+-------+--------+--+
| 20150828  | 18    | 23938  |
| 20150828  | 19    | 22330  |
+-----------+-------+--------+--+
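PV counts every record, while UV counts each guid only once. This in-memory Java sketch mirrors count(url) and count(distinct guid) per (date, hour) group, and like Hive it skips NULL values in both aggregates. The class PvUvSketch and the (date, hour, url, guid) row layout are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch (hypothetical class): PV = count(url), UV = count(distinct guid), grouped by date and hour.
class PvUvSketch {
    static final class Stats {
        long pv;                                   // rows with a non-null url
        final Set<String> guids = new HashSet<>(); // distinct non-null guids
        long uv() { return guids.size(); }
    }

    /** rows are (date, hour, url, guid); TreeMap only makes the output order deterministic. */
    static Map<String, Stats> compute(List<String[]> rows) {
        Map<String, Stats> groups = new TreeMap<>();
        for (String[] r : rows) {
            Stats s = groups.computeIfAbsent(r[0] + " " + r[1], k -> new Stats());
            if (r[2] != null) s.pv++;            // count(url) ignores NULL urls
            if (r[3] != null) s.guids.add(r[3]); // count(distinct guid) ignores NULLs
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"20150828", "18", "url1", "g1"},
            new String[] {"20150828", "18", "url2", "g1"},
            new String[] {"20150828", "19", "url1", "g2"});
        for (Map.Entry<String, Stats> e : compute(rows).entrySet()) {
            System.out.println(e.getKey() + " pv=" + e.getValue().pv + " uv=" + e.getValue().uv());
        }
        // 20150828 18 pv=2 uv=1
        // 20150828 19 pv=1 uv=1
    }
}
```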

9. Combine PV and UV into one result table

create table if not exists result as select date,hour,count(url) PV ,count(distinct guid) UV from yhd_part1 group by date,hour; 

-> Result:
+--------------+--------------+------------+------------+--+
| result.date  | result.hour  | result.pv  | result.uv  |
+--------------+--------------+------------+------------+--+
| 20150828     | 18           | 64972      | 23938      |
| 20150828     | 19           | 61162      | 22330      |
+--------------+--------------+------------+------------+--+

10. Export the results to MySQL

First create a table in MySQL to hold the result set:
create table if not exists save(
date varchar(30) not null,
hour varchar(30) not null,
pv varchar(30) not null,
uv varchar(30) not null
);

Then export to MySQL with Sqoop:

bin/sqoop export \
--connect \
jdbc:mysql://hadoop5.baizhiedu.com:3306/sqoop \
--username root \
--password 1234456 \
--table save \
--export-dir /user/hive/warehouse/baizhi125.db/result \
--num-mappers 1 \
--input-fields-terminated-by '\001'

-> Result in the MySQL save table:
+----------+------+-------+-------+
| date     | hour | pv    | uv    |
+----------+------+-------+-------+
| 20150828 | 18   | 64972 | 23938 |
| 20150828 | 19   | 61162 | 22330 |
+----------+------+-------+-------+


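Note: Hive's default field delimiter is \001 (Ctrl-A). The result table was created with CREATE TABLE ... AS SELECT and no ROW FORMAT clause, so its data files use \001. That is why the export specifies --input-fields-terminated-by '\001'.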
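To make the \001 delimiter concrete, here is a small Java sketch that splits one line of the result table's data file into its four columns. The class name ExportLineParser is hypothetical, and the sample row reuses the 18:00 result shown above.

```java
import java.util.regex.Pattern;

// Sketch (hypothetical class): reading a line of the CTAS result table the way the Sqoop export must.
class ExportLineParser {
    // Hive's default field delimiter: Ctrl-A, written '\001' in Hive and Sqoop options
    static final char DELIMITER = '\u0001';

    /** Splits one line into {date, hour, pv, uv}; rejects lines with a different field count. */
    static String[] parse(String line) {
        // limit -1 keeps empty trailing fields, so a malformed line is detected rather than shortened
        String[] fields = line.split(Pattern.quote(String.valueOf(DELIMITER)), -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = String.join(String.valueOf(DELIMITER), "20150828", "18", "64972", "23938");
        System.out.println(String.join(",", parse(line))); // 20150828,18,64972,23938
    }
}
```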

11. Partitioning approach: dynamic partitions

Relevant properties in hive-site.xml:

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
  <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
</property>
-> Defaults to true, meaning dynamic partitions are allowed.

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>strict</value>
  <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
</property>


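-> The default mode is strict, which requires at least one static partition column so that a user cannot accidentally overwrite every partition. The insert below uses only dynamic partition columns, so switch the session to non-strict mode first:

set hive.exec.dynamic.partition.mode=nonstrict;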

Create the table:

create table yhd_part2(
id string,
url string,
guid string
)
partitioned by (date string,hour string)
row format delimited fields terminated by '\t';

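Run the dynamic-partition insert. Hive fills the partition columns from the last two columns of the SELECT, so here the partitions take yhd_qingxi's day ('28') and hour values: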
insert into table yhd_part2 partition (date,hour) select * from yhd_qingxi;
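In a dynamic-partition insert, Hive takes the partition values from the trailing columns of the SELECT, in the order they are listed in the PARTITION clause. The Java sketch below routes `select *` rows from yhd_qingxi into partition directories the same way. Because yhd_qingxi.date holds only the day of the month, the result is date=28/hour=18 and so on, not date=20150828 as in yhd_part1. Null or empty values go to __HIVE_DEFAULT_PARTITION__, Hive's default name for that case. Escaping of special characters in values is left out, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch (hypothetical class): how trailing SELECT columns become dynamic partition directories.
class DynamicPartitionSketch {
    // Default value of hive.exec.default.partition.name, used for NULL/empty partition values
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    /**
     * The last partitionNames.size() columns of each row supply the partition values
     * (in PARTITION-clause order); the leading columns are what gets stored.
     */
    static Map<String, List<String[]>> route(List<String[]> rows, List<String> partitionNames) {
        Map<String, List<String[]>> byDir = new TreeMap<>();
        int p = partitionNames.size();
        for (String[] row : rows) {
            int dataCols = row.length - p;
            StringBuilder dir = new StringBuilder();
            for (int i = 0; i < p; i++) {
                String value = row[dataCols + i];
                if (value == null || value.isEmpty()) value = DEFAULT_PARTITION;
                if (i > 0) dir.append('/');
                dir.append(partitionNames.get(i)).append('=').append(value);
            }
            byDir.computeIfAbsent(dir.toString(), k -> new ArrayList<>())
                 .add(Arrays.copyOf(row, dataCols));
        }
        return byDir;
    }

    public static void main(String[] args) {
        // Rows shaped like yhd_qingxi: (id, url, guid, date, hour)
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "url1", "g1", "28", "18"},
            new String[] {"2", "url2", "g2", "28", "19"});
        System.out.println(route(rows, Arrays.asList("date", "hour")).keySet());
        // [date=28/hour=18, date=28/hour=19]
    }
}
```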

 

Building the chart report with Spring Boot and ECharts

ECharts official example (stacked area chart): https://echarts.baidu.com/examples/editor.html?c=area-stack

In a conventional project setup:

1. Include jQuery and echarts.min.js in the JSP page

<%@ page contentType="text/html;charset=UTF-8"  pageEncoding="UTF-8" %>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta content="always" name="referrer">
<title>file</title>
<script type="text/javascript" src="${pageContext.request.contextPath}/jquery.min.js"></script>
<script src="${pageContext.request.contextPath}/echarts.min.js"></script>
</head>

<body>

<!-- Prepare a DOM element with an explicit size (width and height) for ECharts -->
<div id="ec" style="width: 600px;height:400px;"></div>
<script type="text/javascript">
    var myChart = echarts.init(document.getElementById('ec'));

    var option = {
        title: {
            text: 'Yihaodian PV/UV statistics'
        },
        tooltip : {
            trigger: 'axis',
            axisPointer: {
                type: 'cross',
                label: {
                    backgroundColor: '#6a7985'
                }
            }
        },
        legend: {
            data:['PV','UV']
        },
        toolbox: {
            feature: {
                saveAsImage: {}
            }
        },
        grid: {
            left: '3%',
            right: '4%',
            bottom: '3%',
            containLabel: true
        },
        xAxis : [
            {
                type : 'category',
                boundaryGap : false,
                data : []          // filled in by the Ajax callback below
            }
        ],
        yAxis : [
            {
                type : 'value'
            }
        ],
        series : []
    };
    myChart.setOption(option);
    // Fetch {h: [...], pv: [...], uv: [...]} from ResultController and merge it into the chart
    $.ajax({
        type: "get",
        url: "${pageContext.request.contextPath}/re/query",
        dataType: "json",
        success: function (data) {
            myChart.setOption({
                xAxis : [
                    {
                        type : 'category',
                        boundaryGap : false,
                        data : data.h      // hours, used as x-axis labels
                    }
                ],
                // No "stack" option: stacking would draw the UV line at PV + UV
                series: [
                    {
                        name: 'PV',
                        type: 'line',
                        areaStyle: {},
                        data: data.pv
                    },
                    {
                        name: 'UV',
                        type: 'line',
                        areaStyle: {},
                        data: data.uv
                    }
                ]
            });
        }
    });
</script>
</body>
</html>

2. The controller packages the data into the structure ECharts needs

 

package com.baizhi.controller;

import com.baizhi.entity.Result;
import com.baizhi.service.ResultService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Controller
@RequestMapping("re")
public class ResultController {
    @Autowired
    private ResultService resultService;

    /**
     * GET /re/query: returns {"h": [hours], "pv": [...], "uv": [...]},
     * one element per Result returned by resultService.queryResult().
     */
    @RequestMapping("query")
    @ResponseBody
    public Map<String, List<String>> query() {
        List<Result> results = resultService.queryResult();
        List<String> pv = new ArrayList<>();
        List<String> uv = new ArrayList<>();
        List<String> hours = new ArrayList<>();   // x-axis labels
        for (Result result : results) {
            pv.add(result.getPv());
            uv.add(result.getUv());
            hours.add(result.getHour());
        }
        Map<String, List<String>> map = new HashMap<>();
        map.put("pv", pv);
        map.put("uv", uv);
        map.put("h", hours);   // read as data.h in the JSP
        return map;
    }
}
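The line chart assumes the x-axis categories arrive in time order. Unless queryResult() uses an ORDER BY, MySQL does not guarantee row order. Below is a hedged plain-Java variant of the controller's loop that sorts by date and then by numeric hour before building the h/pv/uv lists. HourlyResult is a hypothetical stand-in for com.baizhi.entity.Result, with String fields to match the varchar columns of the save table, and ChartSeriesBuilder is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch (hypothetical classes): build the {h, pv, uv} map with rows in time order.
class ChartSeriesBuilder {
    /** Hypothetical stand-in for com.baizhi.entity.Result (one row of the save table). */
    static final class HourlyResult {
        final String date, hour, pv, uv;
        HourlyResult(String date, String hour, String pv, String uv) {
            this.date = date; this.hour = hour; this.pv = pv; this.uv = uv;
        }
    }

    /** Sorts by date (yyyyMMdd sorts correctly as text), then numeric hour, and fills the lists. */
    static Map<String, List<String>> build(List<HourlyResult> rows) {
        List<HourlyResult> sorted = new ArrayList<>(rows);   // leave the caller's list untouched
        sorted.sort(Comparator.comparing((HourlyResult r) -> r.date)
                .thenComparingInt(r -> Integer.parseInt(r.hour)));
        List<String> h = new ArrayList<>();
        List<String> pv = new ArrayList<>();
        List<String> uv = new ArrayList<>();
        for (HourlyResult r : sorted) {
            h.add(r.hour);
            pv.add(r.pv);
            uv.add(r.uv);
        }
        Map<String, List<String>> map = new HashMap<>();
        map.put("h", h);
        map.put("pv", pv);
        map.put("uv", uv);
        return map;
    }

    public static void main(String[] args) {
        List<HourlyResult> rows = Arrays.asList(
            new HourlyResult("20150828", "19", "61162", "22330"),
            new HourlyResult("20150828", "18", "64972", "23938"));
        System.out.println(build(rows).get("h")); // [18, 19]
    }
}
```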

Result: the chart report is complete.
