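Requirement: compute the page views (PV) and unique visitors (UV) of a website, per day and hour.
Environment: pseudo-distributed Hadoop (CDH), MySQL, Hive, Sqoop, Spring Boot
Approach: load the raw logs into HDFS, clean and partition them in Hive, compute PV and UV with HiveQL, export the results to MySQL with Sqoop, and chart them with Spring Boot and ECharts.
Implementation: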
1. Upload the crawled raw log files (the two hourly files 2015082818 and 2015082819) to HDFS
```shell
bin/hdfs dfs -mkdir /project
bin/hdfs dfs -put /usr/local/2015082818 /project
bin/hdfs dfs -put /usr/local/2015082819 /project
```
2. Create the source table in Hive on top of the uploaded files
```sql
-- location points the table at the files uploaded to /project.
-- Note: this is a managed table, so dropping it also deletes /project;
-- "create external table" would keep the files.
create table yhd_source(
  id string, url string, referer string, keyword string,
  type string, guid string, pageId string, moduleId string,
  linkId string, attachedInfo string, sessionId string, trackerU string,
  trackerType string, ip string, trackerSrc string, cookie string,
  orderCode string, trackTime string, endUserId string, firstLink string,
  sessionViewNo string, productId string, curMerchantId string, provinceId string,
  cityId string, fee string, edmActivity string, edmEmail string,
  edmJobId string, ieVersion string, platform string, internalKeyword string,
  resultSum string, currentPage string, linkPosition string, buttonPosition string
)
row format delimited fields terminated by '\t'
location '/project';
```
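A text table declared with `fields terminated by '\t'` maps each tab-separated field to a column purely by position. The plain-Java sketch below roughly mirrors that mapping for yhd_source, so you can check which field of a raw log line ends up in `guid` or `trackTime`. It is an illustration, not Hive's SerDe code; the sample line is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: line up one tab-delimited log line with yhd_source's columns by position.
class YhdSourceRow {
    // Column order copied from the CREATE TABLE yhd_source statement (36 columns).
    static final List<String> COLUMNS = Arrays.asList(
            "id", "url", "referer", "keyword", "type", "guid",
            "pageId", "moduleId", "linkId", "attachedInfo", "sessionId", "trackerU",
            "trackerType", "ip", "trackerSrc", "cookie", "orderCode", "trackTime",
            "endUserId", "firstLink", "sessionViewNo", "productId", "curMerchantId", "provinceId",
            "cityId", "fee", "edmActivity", "edmEmail", "edmJobId", "ieVersion",
            "platform", "internalKeyword", "resultSum", "currentPage", "linkPosition", "buttonPosition");

    static Map<String, String> parse(String line) {
        // limit -1 keeps empty fields, including trailing ones
        String[] fields = line.split("\t", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.size(); i++) {
            // Missing trailing fields become null; any extra fields are ignored.
            row.put(COLUMNS.get(i), i < fields.length ? fields[i] : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample line: only the first six fields are present.
        Map<String, String> row = parse("id1\thttp://www.yhd.com/\t\t\t\tguid-1");
        System.out.println(row.get("guid") + " " + row.get("trackTime")); // prints "guid-1 null"
    }
}
```

Because the mapping is positional, a log line with a shifted or missing field silently puts values into the wrong columns, which is worth spot-checking before the cleaning step.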
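3. Create a cleaned table that keeps only the needed columns plus the day and hour extracted from the time field
```sql
-- `date` is backquoted because it is a reserved word in Hive 1.2 and later
create table yhd_qingxi(
  id string,
  url string,
  guid string,
  `date` string,  -- day of month, e.g. '28'
  hour string     -- hour of day, e.g. '18'
)
row format delimited fields terminated by '\t';
```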
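4. Extract the day and hour substrings from trackTime
```sql
-- Assuming trackTime looks like 'yyyy-MM-dd HH:mm:ss': Hive's substring is 1-based,
-- so characters 9-10 are the day and 12-13 are the hour.
insert into table yhd_qingxi
select id, url, guid,
       substring(trackTime, 9, 2)  as `date`,
       substring(trackTime, 12, 2) as hour
from yhd_source;
```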
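The offsets 9 and 12 only make sense once you remember that Hive's `substring(str, pos, len)` counts from 1, while Java's `String.substring` counts from 0. A minimal sketch of the same extraction in Java, assuming trackTime strings such as `2015-08-28 18:10:00` (the format is inferred from the substring positions; the post does not show a raw value):

```java
// Sketch: reproduce Hive's 1-based substring(str, pos, len) to check the offsets.
class TrackTimeParts {
    // Mirrors Hive substring for pos >= 1; returns "" past the end, null for null input.
    static String hiveSubstring(String s, int pos, int len) {
        if (s == null) {
            return null;
        }
        int start = pos - 1;                 // convert 1-based to 0-based
        if (start >= s.length()) {
            return "";
        }
        int end = Math.min(start + len, s.length());
        return s.substring(start, end);
    }

    static String day(String trackTime) {
        return hiveSubstring(trackTime, 9, 2);   // "yyyy-MM-dd ..." -> "dd"
    }

    static String hour(String trackTime) {
        return hiveSubstring(trackTime, 12, 2);  // "... HH:mm:ss" -> "HH"
    }

    public static void main(String[] args) {
        String t = "2015-08-28 18:10:00";        // assumed trackTime format
        System.out.println(day(t) + " " + hour(t)); // prints "28 18"
    }
}
```

Hive's built-in `day()` and `hour()` functions are an alternative when the format is always parseable, but they return integers, so hour 9 would come back as `9` rather than the zero-padded `'09'` that substring keeps.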
5. Partitioning method: Hive static partitions
```sql
create table yhd_part1(
  id string,
  url string,
  guid string
)
partitioned by (`date` string, hour string)
row format delimited fields terminated by '\t';
```
6. Load the data into each partition from the cleaned table yhd_qingxi
```sql
insert into table yhd_part1 partition (`date`='20150828', hour='18')
select id, url, guid from yhd_qingxi where `date`='28' and hour='18';

insert into table yhd_part1 partition (`date`='20150828', hour='19')
select id, url, guid from yhd_qingxi where `date`='28' and hour='19';
```
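Each static partition is stored as a nested subdirectory named `column=value` under the table's directory. The sketch below builds that path for a partition spec, which is handy when checking the result on HDFS. The warehouse location `/user/hive/warehouse/baizhi125.db/` is an assumption borrowed from the export directory in step 10, and Hive's escaping of special characters in partition values is not reproduced (the values here are plain digits).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: the HDFS directory a Hive partition lives in, <tableDir>/col1=v1/col2=v2.
class PartitionPath {
    // spec must keep the partition columns in declaration order (date, then hour).
    static String partitionDir(String tableDir, Map<String, String> spec) {
        StringJoiner path = new StringJoiner("/");
        path.add(tableDir);
        for (Map.Entry<String, String> e : spec.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("date", "20150828");
        spec.put("hour", "18");
        // Assumed table location; adjust to your own database directory.
        System.out.println(partitionDir("/user/hive/warehouse/baizhi125.db/yhd_part1", spec));
    }
}
```

To compare against the real layout, `show partitions yhd_part1;` in Hive or `bin/hdfs dfs -ls -R` on the table directory lists what was created.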
7. Computing PV:
```sql
select `date`, hour, count(url) PV from yhd_part1 group by `date`, hour;
```
The rows are grouped by day and hour; `count(url)` counts the rows whose url is not NULL. Result:
```text
+-----------+-------+--------+
| date      | hour  | pv     |
+-----------+-------+--------+
| 20150828  | 18    | 64972  |
| 20150828  | 19    | 61162  |
+-----------+-------+--------+
```
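In memory, the PV query is a count per (date, hour) key. The sketch below spells that out in plain Java for a hypothetical list of rows; it mirrors the query's semantics (NULL urls are not counted, but their group still appears with 0) and is not how Hive executes it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of: select date, hour, count(url) PV from yhd_part1 group by date, hour
class PvCounter {
    // Only the yhd_part1 columns this query touches.
    static final class Row {
        final String date, hour, url;

        Row(String date, String hour, String url) {
            this.date = date;
            this.hour = hour;
            this.url = url;
        }
    }

    static Map<String, Long> pv(List<Row> rows) {
        Map<String, Long> counts = new TreeMap<>(); // key "date/hour", sorted
        for (Row r : rows) {
            // count(url) skips NULL urls, but every group still shows up in the output
            counts.merge(r.date + "/" + r.hour, r.url == null ? 0L : 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("20150828", "18", "http://www.yhd.com/a"),
                new Row("20150828", "18", "http://www.yhd.com/a"),
                new Row("20150828", "19", null));
        System.out.println(pv(rows)); // prints {20150828/18=2, 20150828/19=0}
    }
}
```

Note that the same url requested twice counts twice: PV is total requests, not distinct pages.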
8. Computing UV:
```sql
select `date`, hour, count(distinct guid) UV from yhd_part1 group by `date`, hour;
```
Result:
```text
+-----------+-------+--------+
| date      | hour  | uv     |
+-----------+-------+--------+
| 20150828  | 18    | 23938  |
| 20150828  | 19    | 22330  |
+-----------+-------+--------+
```
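UV differs from PV only in that each guid is counted once per group. A plain-Java sketch of `count(distinct guid)` per (date, hour), using a set of guids per key; the rows are hypothetical, and NULL guids are ignored just as `count(distinct ...)` ignores NULLs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch of: select date, hour, count(distinct guid) UV from yhd_part1 group by date, hour
class UvCounter {
    static final class Row {
        final String date, hour, guid;

        Row(String date, String hour, String guid) {
            this.date = date;
            this.hour = hour;
            this.guid = guid;
        }
    }

    static Map<String, Integer> uv(List<Row> rows) {
        Map<String, Set<String>> visitors = new TreeMap<>(); // key "date/hour", sorted
        for (Row r : rows) {
            Set<String> guids = visitors.computeIfAbsent(r.date + "/" + r.hour, k -> new HashSet<>());
            if (r.guid != null) {
                guids.add(r.guid); // a repeated guid is stored only once
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        visitors.forEach((key, set) -> result.put(key, set.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("20150828", "18", "g1"),
                new Row("20150828", "18", "g1"),
                new Row("20150828", "18", "g2"));
        System.out.println(uv(rows)); // prints {20150828/18=2}
    }
}
```

The exact distinct count has to remember every guid in a group, which is why UV is more expensive than PV on large logs.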
9. Combine PV and UV into one result table
```sql
create table if not exists result as
select `date`, hour, count(url) PV, count(distinct guid) UV
from yhd_part1
group by `date`, hour;
```
Result:
```text
+--------------+--------------+------------+------------+
| result.date  | result.hour  | result.pv  | result.uv  |
+--------------+--------------+------------+------------+
| 20150828     | 18           | 64972      | 23938      |
| 20150828     | 19           | 61162      | 22330      |
+--------------+--------------+------------+------------+
```
10. Export the result to a MySQL table

First create a table in MySQL to hold the result set:
```sql
create table if not exists save(
  date varchar(30) not null,
  hour varchar(30) not null,
  pv   varchar(30) not null,
  uv   varchar(30) not null
);
```
Then export the Hive result table to MySQL with Sqoop:
```shell
bin/sqoop export \
  --connect jdbc:mysql://hadoop5.baizhiedu.com:3306/sqoop \
  --username root \
  --password 1234456 \
  --table save \
  --export-dir /user/hive/warehouse/baizhi125.db/result \
  --num-mappers 1 \
  --input-fields-terminated-by '\001'
```
The result table was created without a row format clause, so its files use Hive's default field delimiter \001 (Ctrl-A); that is why `--input-fields-terminated-by '\001'` is needed. Result in MySQL:
```text
+----------+------+-------+-------+
| date     | hour | pv    | uv    |
+----------+------+-------+-------+
| 20150828 | 18   | 64972 | 23938 |
| 20150828 | 19   | 61162 | 22330 |
+----------+------+-------+-------+
```
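Each line of the result table's files holds one row, with fields separated by the invisible \001 character, and the export has to split on that character and map the fields to `save`'s columns by position. The sketch below only illustrates that splitting step in plain Java; it is not Sqoop's implementation, and the sample line simply reuses the values from the table above.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: split a Hive default-format text line on \001 (Ctrl-A).
class HiveDefaultDelimiter {
    // Hive's default field delimiter for text tables.
    static final String FIELD_DELIM = "\u0001";

    static List<String> split(String line) {
        // limit -1 keeps empty fields, so column positions never shift
        return Arrays.asList(line.split(FIELD_DELIM, -1));
    }

    public static void main(String[] args) {
        String line = "20150828\u000118\u000164972\u000123938";
        System.out.println(split(line)); // prints [20150828, 18, 64972, 23938]
    }
}
```

If any exported column could be NULL, keep in mind that Hive writes NULL as `\N` in text files; Sqoop's `--input-null-string` and `--input-null-non-string` options tell the export how to recognize it. Here every column is NOT NULL, so they are not needed.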
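11. Partitioning method: Hive dynamic partitions

The relevant properties in hive-site.xml:
```xml
<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
  <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>strict</value>
  <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
</property>
```
hive.exec.dynamic.partition defaults to true, so dynamic partitions are allowed. hive.exec.dynamic.partition.mode defaults to strict, which requires at least one static partition column; because both date and hour are dynamic here, switch the session to non-strict mode:
```sql
set hive.exec.dynamic.partition.mode=nonstrict;
```
Create the table:
```sql
create table yhd_part2(
  id string,
  url string,
  guid string
)
partitioned by (`date` string, hour string)
row format delimited fields terminated by '\t';
```
Run the dynamic-partition insert. Hive takes the partition values from the last columns of the SELECT, in order, so `select *` over yhd_qingxi (id, url, guid, date, hour) supplies date and hour:
```sql
insert into table yhd_part2 partition (`date`, hour) select * from yhd_qingxi;
```
Note that yhd_qingxi.date holds only the day of month ('28'), so this creates the partitions date=28/hour=18 and date=28/hour=19, not date=20150828 as in step 6.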
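With dynamic partitions, the trailing columns of each SELECT row pick the target partition, and only the remaining columns are written into the data files. A minimal plain-Java sketch of that routing, using hypothetical rows shaped like yhd_qingxi (id, url, guid, date, hour); it illustrates the positional rule and is not Hive's execution code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: route SELECT rows to partitions by their last partCols.length columns.
class DynamicPartitionRouter {
    static Map<String, List<List<String>>> route(List<List<String>> rows, String... partCols) {
        int n = partCols.length;
        Map<String, List<List<String>>> partitions = new TreeMap<>();
        for (List<String> row : rows) {
            int dataCols = row.size() - n;
            if (dataCols < 0) {
                throw new IllegalArgumentException("row has fewer columns than partition keys");
            }
            // Partition values are matched by position, not by column name.
            StringBuilder dir = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) {
                    dir.append('/');
                }
                dir.append(partCols[i]).append('=').append(row.get(dataCols + i));
            }
            // Only the non-partition columns end up in the partition's files.
            partitions.computeIfAbsent(dir.toString(), k -> new ArrayList<>())
                      .add(row.subList(0, dataCols));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "http://a", "g1", "28", "18"),
                Arrays.asList("2", "http://b", "g2", "28", "19"));
        // prints "date=28/hour=18 [[1, http://a, g1]]" then "date=28/hour=19 [[2, http://b, g2]]"
        route(rows, "date", "hour").forEach((dir, data) -> System.out.println(dir + " " + data));
    }
}
```

Because matching is positional, swapping the order of date and hour in the SELECT would silently swap the partition values.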
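Integrating ECharts with Spring Boot to build the chart report

ECharts official example (stacked area chart): https://echarts.baidu.com/examples/editor.html?c=area-stack

In a regular (JSP-based) web project:

1. Include the jQuery and echarts.min.js libraries in the JSP page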
```jsp
<%@ page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8" %>
<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta content="always" name="referrer">
    <title>file</title>
    <script type="text/javascript" src="${pageContext.request.contextPath}/jquery.min.js"></script>
    <script src="${pageContext.request.contextPath}/echarts.min.js"></script>
</head>
<body>
<!-- Prepare a DOM element with an explicit size (width and height) for ECharts -->
<div id="ec" style="width: 600px;height:400px;"></div>
<script type="text/javascript">
    var myChart = echarts.init(document.getElementById('ec'));
    // Initial option: an empty chart; axes data and series are filled in after the AJAX call
    var option = {
        title: { text: 'Yihaodian PV/UV statistics' },
        tooltip: {
            trigger: 'axis',
            axisPointer: { type: 'cross', label: { backgroundColor: '#6a7985' } }
        },
        legend: { data: ['PV', 'UV'] },
        toolbox: { feature: { saveAsImage: {} } },
        grid: { left: '3%', right: '4%', bottom: '3%', containLabel: true },
        xAxis: [ {} ],
        yAxis: [ { type: 'value' } ],
        series: []
    };
    myChart.setOption(option);

    // Fetch {h: [...], pv: [...], uv: [...]} from the controller and merge it into the chart
    $.ajax({
        type: "get",
        url: "${pageContext.request.contextPath}/re/query",
        dataType: "json",
        success: function (data) {
            myChart.setOption({
                xAxis: [ { type: 'category', boundaryGap: false, data: data.h } ],
                // No "stack" option: stacking would draw the UV line at PV + UV
                series: [
                    { name: 'PV', type: 'line', areaStyle: {}, data: data.pv },
                    { name: 'UV', type: 'line', areaStyle: {}, data: data.uv }
                ]
            });
        }
    });
</script>
</body>
</html>
```
2. Build the data structure ECharts needs in the controller
```java
package com.baizhi.controller;

import com.baizhi.entity.Result;
import com.baizhi.service.ResultService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Controller
@RequestMapping("re")
public class ResultController {

    @Autowired
    ResultService resultService;

    // GET /re/query -> {"h": [hours], "pv": [pv values], "uv": [uv values]}
    @RequestMapping("query")
    @ResponseBody
    public Map<String, List<String>> query() {
        List<Result> results = resultService.queryResult();
        List<String> pv = new ArrayList<>();
        List<String> uv = new ArrayList<>();
        List<String> hours = new ArrayList<>();   // x-axis labels
        for (Result result : results) {
            pv.add(result.getPv());
            uv.add(result.getUv());
            hours.add(result.getHour());
        }
        Map<String, List<String>> map = new HashMap<>();
        map.put("pv", pv);
        map.put("uv", uv);
        map.put("h", hours);
        return map;
    }
}
```
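The controller depends on `com.baizhi.entity.Result` and `ResultService`, which the post does not show. Also, MySQL returns rows in no guaranteed order unless the query has an ORDER BY, so the x-axis hours could come out scrambled. The sketch below assumes a hypothetical `Result` whose fields mirror the `save` table, plus a comparator that orders rows by date and hour before they are turned into arrays. Adding `order by date, hour` to the service's SQL (also not shown) would achieve the same thing more simply.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical entity: one row of the MySQL table save(date, hour, pv, uv).
// The real com.baizhi.entity.Result is not shown in the post; field names are assumed.
class Result {
    private final String date;
    private final String hour;
    private final String pv;
    private final String uv;

    Result(String date, String hour, String pv, String uv) {
        this.date = date;
        this.hour = hour;
        this.pv = pv;
        this.uv = uv;
    }

    public String getDate() { return date; }
    public String getHour() { return hour; }
    public String getPv() { return pv; }
    public String getUv() { return uv; }
}

class ResultOrdering {
    // date and hour are varchar, but zero-padded, so string order equals time order.
    static final Comparator<Result> BY_TIME =
            Comparator.comparing(Result::getDate).thenComparing(Result::getHour);

    // Returns a sorted copy, leaving the caller's list untouched.
    static List<Result> sorted(List<Result> rows) {
        List<Result> copy = new ArrayList<>(rows);
        copy.sort(BY_TIME);
        return copy;
    }
}
```

In the controller, `resultService.queryResult()` would be wrapped as `ResultOrdering.sorted(resultService.queryResult())` before the loop.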
Result: the finished chart report.