Data Warehouse 1.3 | Behavioral Data | Business Data Requirements

Note: any table loaded with a bare `insert into` (no partition clause) is unpartitioned.

Requirement 1: User Activity Subject

DWS layer (user-behavior wide-table layer)

Goal: list the detail of every device active on the current day, week, and month.

1 Daily active device detail: dwd_start_log ---> dws_uv_detail_day

-- collect_set the values of each field into an array, grouped by mid_id (this makes the later counts easy)

collect_set de-duplicates a column's values and aggregates them into an array-typed field; e.g. concat_ws('|', collect_set(user_id)) user_id then joins that array into a single '|'-separated string.
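A minimal sketch of what this pair of functions does (the values are illustrative, not from the project's data):

```sql
-- Suppose one mid_id has three start-log rows with user_id values 101, 101, 102:
--   collect_set(user_id)                 -> de-duplicated array ["101","102"]
--   concat_ws('|', collect_set(user_id)) -> the single string '101|102'
select mid_id, concat_ws('|', collect_set(user_id)) user_id
from dwd_start_log
where dt = '2019-02-10'
group by mid_id;
```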

Create the partitioned table dws_uv_detail_day, partitioned by (`dt` string)

drop table if exists dws_uv_detail_day; create table dws_uv_detail_day( `mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度' ) COMMENT '活躍用戶按天明細' PARTITIONED BY ( `dt` string) stored as parquet location '/warehouse/gmall/dws/dws_uv_detail_day/' ;

Data import

Partition by day; filter to one day's data; group by device id; then count(*) yields the final number.

partition(dt='2019-02-10')   from dwd_start_log  where dt='2019-02-10'  group by mid_id  (mid_id is the unique device id)

Aggregate with the user's single-day visits as the key: if a device reports two operating systems, two system versions, several areas, or logs into different accounts within one day, keep only one of each.

hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict; insert overwrite table dws_uv_detail_day  partition(dt='2019-02-10') select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang))lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat from dwd_start_log where dt='2019-02-10'  
group by mid_id;

Query the import result:

hive (gmall)> select * from dws_uv_detail_day limit 1;

### The final count(*) is the number of daily active devices:
hive (gmall)> select count(*) from dws_uv_detail_day;

2 Weekly active device detail (dws_uv_detail_wk), partition(wk_dt)

Monday through Sunday: concat(date_add(next_day('2019-02-10', 'MO'), -7), '_', date_add(next_day('2019-02-10', 'MO'), -1)) gives 2019-02-04_2019-02-10

Create the partitioned table: partitioned by(`wk_dt` string)

hive (gmall)>
drop table if exists dws_uv_detail_wk; create table dws_uv_detail_wk( `mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度', `monday_date` string COMMENT '週一日期', `sunday_date` string COMMENT '週日日期' ) COMMENT '活躍用戶按周明細' PARTITIONED BY (`wk_dt` string) stored as parquet location '/warehouse/gmall/dws/dws_uv_detail_wk/' ;

Import data: partition by week; filter to one week's data; group by device id.

Monday: date_add(next_day('2019-05-16','MO'),-7);

Sunday: date_add(next_day('2019-05-16','MO'),-1);

Monday_Sunday: concat(date_add(next_day('2019-05-16', 'MO'), -7), '_', date_add(next_day('2019-05-16', 'MO'), -1));
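As a sanity check (this assumes Hive's next_day semantics: it returns the first matching weekday strictly after the given date), the expressions evaluate like this for a date that happens to fall on a Sunday:

```sql
-- 2019-02-10 is a Sunday, so next_day('2019-02-10','MO') = '2019-02-11'
select
  date_add(next_day('2019-02-10','MO'), -7),   -- 2019-02-04, the Monday of that week
  date_add(next_day('2019-02-10','MO'), -1),   -- 2019-02-10, the Sunday of that week
  concat(date_add(next_day('2019-02-10','MO'), -7), '_',
         date_add(next_day('2019-02-10','MO'), -1));  -- '2019-02-04_2019-02-10'
```

This matches the wk_dt partition value 2019-02-04_2019-02-10 used throughout this section.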

insert overwrite table dws_uv_detail_wk partition(wk_dt) select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang)) lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat, date_add(next_day('2019-02-10', 'MO'), -7), date_add(next_day('2019-02-10', 'MO'), -1), concat(date_add(next_day('2019-02-10', 'MO'), -7), '_', date_add(next_day('2019-02-10', 'MO'), -1)) from dws_uv_detail_day where dt >= date_add(next_day('2019-02-10', 'MO'), -7) and dt <= date_add(next_day('2019-02-10', 'MO'), -1) group by mid_id; 

 

Query the import result

hive (gmall)> select * from dws_uv_detail_wk limit 1;
hive (gmall)> select count(*) from dws_uv_detail_wk;

3 Monthly active device detail dws_uv_detail_mn, partition(mn): each day's data is inserted into it

Create the DWS-layer partitioned table, partitioned by(`mn` string)

hive (gmall)>
drop table if exists dws_uv_detail_mn; create  external table dws_uv_detail_mn( `mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度' ) COMMENT '活躍用戶按月明細' PARTITIONED BY (`mn` string) stored as parquet location '/warehouse/gmall/dws/dws_uv_detail_mn/' ;

Data import: partition by month; filter to one month's data; group by device id.

date_format('2019-03-10', 'yyyy-MM')  ---> 2019-03

where date_format(dt, 'yyyy-MM') = date_format('2019-02-10', 'yyyy-MM')  group by mid_id;   (dt is a column; quoting it as 'dt' would compare the literal string)

hive (gmall)>
set hive.exec.dynamic.partition.mode=nonstrict; insert  overwrite table dws_uv_detail_mn partition(mn) select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang)) lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat, date_format('2019-02-10','yyyy-MM') from dws_uv_detail_day where date_format(dt,'yyyy-MM') = date_format('2019-02-10','yyyy-MM') group by mid_id;

Query the import result

hive (gmall)> select * from dws_uv_detail_mn limit 1;
hive (gmall)> select count(*) from dws_uv_detail_mn;

DWS-layer data-load script

Create the script under /home/kris/bin on hadoop101

[kris@hadoop101 bin]$ vim dws.sh

#!/bin/bash
# Define variables for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive
# Use the date passed as an argument; if none is given, default to yesterday
if [ -n "$1" ]; then
  do_date=$1
else
  do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dws_uv_detail_day partition(dt='$do_date')
select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang)) lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat
from "$APP".dwd_start_log
where dt='$do_date'
group by mid_id;

insert overwrite table "$APP".dws_uv_detail_wk partition(wk_dt)
select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang)) lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat,
  date_add(next_day('$do_date','MO'),-7),
  date_add(next_day('$do_date','MO'),-1),
  concat(date_add(next_day('$do_date','MO'),-7), '_', date_add(next_day('$do_date','MO'),-1))
from "$APP".dws_uv_detail_day
where dt>=date_add(next_day('$do_date','MO'),-7) and dt<=date_add(next_day('$do_date','MO'),-1)
group by mid_id;

insert overwrite table "$APP".dws_uv_detail_mn partition(mn)
select mid_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_code)) version_code, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(lang)) lang, concat_ws('|', collect_set(source)) source, concat_ws('|', collect_set(os)) os, concat_ws('|', collect_set(area)) area, concat_ws('|', collect_set(model)) model, concat_ws('|', collect_set(brand)) brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(gmail)) gmail, concat_ws('|', collect_set(height_width)) height_width, concat_ws('|', collect_set(app_time)) app_time, concat_ws('|', collect_set(network)) network, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat,
  date_format('$do_date','yyyy-MM')
from "$APP".dws_uv_detail_day
where date_format(dt,'yyyy-MM') = date_format('$do_date','yyyy-MM')
group by mid_id;
"
$hive -e "$sql"

Make the script executable: chmod 777 dws.sh

Usage: [kris@hadoop101 module]$ dws.sh 2019-02-11

Query the results

hive (gmall)> select count(*) from dws_uv_detail_day;
hive (gmall)> select count(*) from dws_uv_detail_wk;
hive (gmall)> select count(*) from dws_uv_detail_mn;

Script schedule: in production this is usually run daily, between about 00:30 and 01:00.

ADS layer. Goal: the number of devices active that day, that week, and that month. Join a day_count subquery with wk_count and mn_count subqueries, linking the three into one table.

Create the ads_uv_count table:

Fields: day_count, wk_count, mn_count
is_weekend: if(date_add(next_day('2019-02-10', 'MO'), -1) = '2019-02-10', 'Y', 'N')
is_monthend: if(last_day('2019-02-10') = '2019-02-10', 'Y', 'N')
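A quick sketch of how the two flags behave (assuming Hive's next_day/last_day built-ins): 2019-02-10 is the Sunday ending its week, but not the last day of February, so:

```sql
select
  if(date_add(next_day('2019-02-10','MO'), -1) = '2019-02-10', 'Y', 'N'),  -- 'Y': the week ending 2019-02-10
  if(last_day('2019-02-10') = '2019-02-10', 'Y', 'N');                     -- 'N': last_day gives 2019-02-28
```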

drop table if exists ads_uv_count; create external table ads_uv_count( `dt` string comment '統計日期', `day_count` bigint comment '當日用戶量', `wk_count` bigint comment '當週用戶量', `mn_count` bigint comment '當月用戶量', `is_weekend` string comment 'Y,N是不是週末,用於獲得本週最終結果', `is_monthend` string comment 'Y,N是不是月末,用於獲得本月最終結果' ) comment '每日活躍用戶數量' stored as parquet location '/warehouse/gmall/ads/ads_uv_count/';

Import data:

hive (gmall)>
insert  overwrite table ads_uv_count select  
  '2019-02-10' dt, daycount.ct, wkcount.ct, mncount.ct, if(date_add(next_day('2019-02-10','MO'),-1)='2019-02-10','Y','N') , if(last_day('2019-02-10')='2019-02-10','Y','N') from ( select  
      '2019-02-10' dt, count(*) ct from dws_uv_detail_day where dt='2019-02-10' )daycount join ( select  
     '2019-02-10' dt, count (*) ct from dws_uv_detail_wk where wk_dt=concat(date_add(next_day('2019-02-10','MO'),-7),'_' ,date_add(next_day('2019-02-10','MO'),-1) ) ) wkcount on daycount.dt=wkcount.dt join ( select  
     '2019-02-10' dt, count (*) ct from dws_uv_detail_mn where mn=date_format('2019-02-10','yyyy-MM') )mncount on daycount.dt=mncount.dt ;

Query the import result

  hive (gmall)> select * from ads_uv_count ;

ADS-layer data-load script

1) Create the script under /home/kris/bin on hadoop101

[kris@hadoop101 bin]$ vim ads.sh

#!/bin/bash
# Define variables for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive
# Use the date passed as an argument; if none is given, default to yesterday
if [ -n "$1" ]; then
  do_date=$1
else
  do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table "$APP".ads_uv_count
select '$do_date' dt, daycount.ct, wkcount.ct, mncount.ct,
       if(date_add(next_day('$do_date','MO'),-1)='$do_date','Y','N'),
       if(last_day('$do_date')='$do_date','Y','N')
from
  ( select '$do_date' dt, count(*) ct from "$APP".dws_uv_detail_day where dt='$do_date' ) daycount
join
  ( select '$do_date' dt, count(*) ct from "$APP".dws_uv_detail_wk
    where wk_dt=concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1)) ) wkcount
  on daycount.dt=wkcount.dt
join
  ( select '$do_date' dt, count(*) ct from "$APP".dws_uv_detail_mn
    where mn=date_format('$do_date','yyyy-MM') ) mncount
  on daycount.dt=mncount.dt;
"
$hive -e "$sql"

Make the script executable: chmod 777 ads.sh

Usage: ads.sh 2019-02-11

Query the import result: hive (gmall)> select * from ads_uv_count;

Requirement 2: New User Subject

A new user is one who goes online with the app for the first time. If a user opens the app for the first time, that user is defined as new; a device that uninstalls and reinstalls is not counted as new again. New users are tracked daily, weekly, and monthly.

Daily new devices (existing users don't count: the device never logged in before and today is its first login) -- no partition.
--> A device absent from the historical new-device table that is active today is a newly added user.

1 DWS層(每日新增設備明細表)

建立每日新增設備明細表:dws_new_mid_day 

hive (gmall)>
drop table if exists dws_new_mid_day; create  table dws_new_mid_day ( `mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度', `create_date` string comment '建立時間' ) COMMENT '每日新增設備信息' stored as parquet location '/warehouse/gmall/dws/dws_new_mid_day/';

             

dws_uv_detail_day (daily active device detail) left join dws_new_mid_day nm (the historical new-device table, with an added create_date field, e.g. 2019-02-10), keeping rows where nm.mid_id is null;

Import data

Daily active table left join daily new-device table, joined on equal mid_id. If a device is new today, its match in the new-device table is null.

  from dws_uv_detail_day ud left join dws_new_mid_day nm on ud.mid_id=nm.mid_id

  where ud.dt='2019-02-10' and nm.mid_id is null;

hive (gmall)>
insert into table dws_new_mid_day select ud.mid_id, ud.user_id , ud.version_code , ud.version_name , ud.lang , ud.source, ud.os, ud.area, ud.model, ud.brand, ud.sdk_version, ud.gmail, ud.height_width, ud.app_time, ud.network, ud.lng, ud.lat, '2019-02-10'
from dws_uv_detail_day ud left join dws_new_mid_day nm on ud.mid_id=nm.mid_id where ud.dt='2019-02-10' and nm.mid_id is null;

Query the imported data

hive (gmall)> select count(*) from dws_new_mid_day ;

2 ADS layer (daily new device count)

Create the daily new device count table ads_new_mid_count

hive (gmall)>
drop table if exists `ads_new_mid_count`; create  table `ads_new_mid_count` ( `create_date` string comment '建立時間' , `new_mid_count` BIGINT comment '新增設備數量' ) COMMENT '每日新增設備信息數量' row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_new_mid_count/';

Import data: a count(*) over dws_new_mid_day suffices

Once create_date appears in the select list you must group by create_date, otherwise Hive errors with: not in GROUP BY key 'create_date'

hive (gmall)>
insert into table ads_new_mid_count select create_date , count(*)  from dws_new_mid_day where create_date='2019-02-10'
group by create_date ;

Query the imported data

hive (gmall)> select * from ads_new_mid_count;

 

Extension: monthly new devices:

-- Monthly new devices
drop table if exists dws_new_mid_mn;
create table dws_new_mid_mn(
`mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度'
) comment '每個月新增明細'
partitioned by(mn string)
stored as parquet
location '/warehouse/gmall/dws/dws_new_mid_mn';

insert overwrite table dws_new_mid_mn partition(mn)
select um.mid_id, um.user_id, um.version_code, um.version_name, um.lang, um.source, um.os, um.area, um.model, um.brand, um.sdk_version, um.gmail, um.height_width, um.app_time, um.network, um.lng, um.lat,
  date_format('2019-02-10', 'yyyy-MM')
from dws_uv_detail_mn um left join dws_new_mid_mn nm on um.mid_id = nm.mid_id
where um.mn = date_format('2019-02-10', 'yyyy-MM') and nm.mid_id is null;
-- The original wrote `nm.mid_id = null`, which returns nothing: comparing to NULL with `=` yields NULL, never true, so the WHERE clause filters out every row. Use `is null`.
-- Also note: do not write date_format(um.mn, 'yyyy-MM') = date_format('2019-02-10', 'yyyy-MM') here; um.mn is already a 'yyyy-MM' string.
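The empty-result puzzle above comes down to SQL's NULL comparison semantics: `= null` never evaluates to true, so an anti-join must use `is null`. A quick check in Hive:

```sql
select 1 = null;       -- NULL, not true: a WHERE clause drops such rows
select null = null;    -- also NULL
select null is null;   -- true: this is the correct anti-join condition
```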

 

Requirement 3: User Retention Subject

                   

Ignoring the new users of 2019-02-11 and 2019-02-12: 100 users were added on 2019-02-10; one day later its retention rate is 30%, two days later (the 12th) it is 25%, three days later 32%;

Viewed from 2019-02-12, the retention of 02-11: 200 new users, with a retention rate of 20% on the 12th;

Viewed from 2019-02-13, the retention of 02-12: 100 new users, with a one-day retention rate of 25% on the 13th;

Retention-rate analysis: (yesterday's new users who are active today) / (yesterday's new-user count)

                  

Say today is the 11th and we want the retention rate for the 10th ----> (devices new on the 10th that are active on the 11th) / (devices new on the 10th)
  Denominator: devices new on the 10th (daily active table left join historical new-device table (nm), where nm.mid_id is null)
  Numerator: daily active table (ud) join daily new table (nm) where ud.dt='today' and nm.create_date='yesterday'

① DWS layer (daily retained-user detail table dws_user_retention_day)

One-day retention analysis: ===>>

Retained users = (previous day's new users) join (today's active users)

Retention rate = retained users / previous day's new users

Create the table: dws_user_retention_day

hive (gmall)>
drop table if exists `dws_user_retention_day`; create  table `dws_user_retention_day` ( `mid_id` string COMMENT '設備惟一標識', `user_id` string COMMENT '用戶標識', `version_code` string COMMENT '程序版本號', `version_name` string COMMENT '程序版本名', `lang` string COMMENT '系統語言', `source` string COMMENT '渠道號', `os` string COMMENT '安卓系統版本', `area` string COMMENT '區域', `model` string COMMENT '手機型號', `brand` string COMMENT '手機品牌', `sdk_version` string COMMENT 'sdkVersion', `gmail` string COMMENT 'gmail', `height_width` string COMMENT '屏幕寬高', `app_time` string COMMENT '客戶端日誌產生時的時間', `network` string COMMENT '網絡模式', `lng` string COMMENT '經度', `lat` string COMMENT '緯度', `create_date` string comment '設備新增時間', `retention_day` int comment '截止當前日期留存天數' ) COMMENT '每日用戶留存狀況' PARTITIONED BY ( `dt` string) stored as parquet location '/warehouse/gmall/dws/dws_user_retention_day/' ;

Import data (each day, compute the retention detail for users who were new one day earlier)

  from dws_uv_detail_day (daily active devices) ud join dws_new_mid_day (daily new devices) nm on ud.mid_id = nm.mid_id

    where ud.dt='2019-02-11' and nm.create_date=date_add('2019-02-11',-1);

hive (gmall)>
insert  overwrite table dws_user_retention_day  partition(dt="2019-02-11") select nm.mid_id, nm.user_id , nm.version_code , nm.version_name , nm.lang , nm.source, nm.os, nm.area, nm.model, nm.brand, nm.sdk_version, nm.gmail, nm.height_width, nm.app_time, nm.network, nm.lng, nm.lat, nm.create_date, 1 retention_day from  dws_uv_detail_day ud join dws_new_mid_day nm   on ud.mid_id =nm.mid_id where ud.dt='2019-02-11' and nm.create_date=date_add('2019-02-11',-1);

Query the imported data (each day, compute the retention detail for users who were new one day earlier)

hive (gmall)> select count(*) from dws_user_retention_day;

② DWS layer (retained-user detail for 1, 2, 3, …, n days): insert directly into dws_user_retention_day, chaining the selects with union all so everything lands in one table;

1) Import the data directly (each day, compute the retention detail for users who were new 1, 2, 3, …, n days earlier)

Just change the offset, e.g. date_add('2019-02-11',-3): -1 gives one-day retention, -2 two-day, -3 three-day.

hive (gmall)>
insert  overwrite table dws_user_retention_day  partition(dt="2019-02-11") select nm.mid_id, nm.user_id , nm.version_code , nm.version_name , nm.lang , nm.source, nm.os, nm.area, nm.model, nm.brand, nm.sdk_version, nm.gmail, nm.height_width, nm.app_time, nm.network, nm.lng, nm.lat, nm.create_date, 1 retention_day from dws_uv_detail_day ud join dws_new_mid_day nm  on ud.mid_id =nm.mid_id where ud.dt='2019-02-11' and nm.create_date=date_add('2019-02-11',-1) union all
select nm.mid_id, nm.user_id , nm.version_code , nm.version_name , nm.lang , nm.source, nm.os, nm.area, nm.model, nm.brand, nm.sdk_version, nm.gmail, nm.height_width, nm.app_time, nm.network, nm.lng, nm.lat, nm.create_date, 2 retention_day from  dws_uv_detail_day ud join dws_new_mid_day nm   on ud.mid_id =nm.mid_id where ud.dt='2019-02-11' and nm.create_date=date_add('2019-02-11',-2) union all
select nm.mid_id, nm.user_id , nm.version_code , nm.version_name , nm.lang , nm.source, nm.os, nm.area, nm.model, nm.brand, nm.sdk_version, nm.gmail, nm.height_width, nm.app_time, nm.network, nm.lng, nm.lat, nm.create_date, 3 retention_day from  dws_uv_detail_day ud join dws_new_mid_day nm   on ud.mid_id =nm.mid_id where ud.dt='2019-02-11' and nm.create_date=date_add('2019-02-11',-3);

2) Query the imported data (retention detail for users who were new 1, 2, or 3 days earlier)

hive (gmall)> select retention_day , count(*) from dws_user_retention_day group by retention_day;

③ ADS layer: retained-user counts, ads_user_retention_day_count: a plain count(*) suffices

1) Create the ads_user_retention_day_count table:

hive (gmall)>
drop table if exists `ads_user_retention_day_count`; create  table `ads_user_retention_day_count` ( `create_date` string comment '設備新增日期', `retention_day` int comment '截止當前日期留存天數', `retention_count` bigint comment  '留存數量' ) COMMENT '每日用戶留存狀況' stored as parquet location '/warehouse/gmall/ads/ads_user_retention_day_count/';

Import data, grouping by create_date and retention_day;

hive (gmall)>
insert into table ads_user_retention_day_count select create_date, retention_day, count(*) retention_count from dws_user_retention_day where dt='2019-02-11' 
group by create_date,retention_day;

Query the imported data

    hive (gmall)> select * from ads_user_retention_day_count;

    --->  2019-02-10      1       112

④ Retention ratio: retention_count / new_mid_count, i.e. retained count / new count

Create the table ads_user_retention_day_rate

hive (gmall)>
drop table if exists `ads_user_retention_day_rate`; create  table `ads_user_retention_day_rate` ( `stat_date` string comment '統計日期', `create_date` string comment '設備新增日期', `retention_day` int comment '截止當前日期留存天數', `retention_count` bigint comment  '留存數量', `new_mid_count` string comment '當日設備新增數量', `retention_ratio` decimal(10,2) comment '留存率' ) COMMENT '每日用戶留存狀況' stored as parquet location '/warehouse/gmall/ads/ads_user_retention_day_rate/';

Import data

join ads_new_mid_count ---> the daily new-device count table

hive (gmall)>
insert into table ads_user_retention_day_rate select 
    '2019-02-11' , ur.create_date, ur.retention_day, ur.retention_count , nc.new_mid_count, ur.retention_count/nc.new_mid_count*100
from ( select create_date, retention_day, count(*) retention_count from `dws_user_retention_day` where dt='2019-02-11' 
    group by create_date,retention_day ) ur join ads_new_mid_count nc on nc.create_date=ur.create_date;

Query the imported data

    hive (gmall)>select * from ads_user_retention_day_rate;

     2019-02-11      2019-02-10      1       112     442     25.34

 

Requirement 4: Silent Users

Silent user: a device that started the app only on the day it was installed, and that start was more than a week ago

Use the daily-active detail table dws_uv_detail_day as the DWS-layer source

                    

Create-table statement

hive (gmall)>
drop table if exists ads_slient_count; create external table ads_slient_count( `dt` string COMMENT '統計日期', `slient_count` bigint COMMENT '沉默設備數' ) row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_slient_count';

Import data

hive (gmall)>
insert into table ads_slient_count select 
    '2019-02-20' dt, count(*) slient_count from ( select mid_id from dws_uv_detail_day where dt<='2019-02-20'
    group by mid_id having count(*)=1 and min(dt)<date_add('2019-02-20',-7) ) t1;

Requirement 5: Returning Users This Week

Returning this week = active this week - new this week - active last week

Use the daily-active detail table dws_uv_detail_day as the DWS-layer source

Returning this week (active before last week, silent last week, active again this week) = active this week - new this week - active last week
Returning this week = (active this week) left join (new this week) left join (active last week), keeping rows where this week's new id is null and last week's active id is null;

Create the table:

hive (gmall)>
drop table if exists ads_back_count; create external table ads_back_count( `dt` string COMMENT '統計日期', `wk_dt` string COMMENT '統計日期所在周', `wastage_count` bigint COMMENT '迴流設備數' ) row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_back_count';

Import data

hive (gmall)> 
insert into table ads_back_count select 
   '2019-02-20' dt, concat(date_add(next_day('2019-02-20','MO'),-7),'_',date_add(next_day('2019-02-20','MO'),-1)) wk_dt, count(*) from ( select t1.mid_id from ( select mid_id from dws_uv_detail_wk where wk_dt=concat(date_add(next_day('2019-02-20','MO'),-7),'_',date_add(next_day('2019-02-20','MO'),-1)) )t1 left join ( select mid_id from dws_new_mid_day where create_date<=date_add(next_day('2019-02-20','MO'),-1) and create_date>=date_add(next_day('2019-02-20','MO'),-7) )t2 on t1.mid_id=t2.mid_id left join ( select mid_id from dws_uv_detail_wk where wk_dt=concat(date_add(next_day('2019-02-20','MO'),-7*2),'_',date_add(next_day('2019-02-20','MO'),-7-1)) )t3 on t1.mid_id=t3.mid_id where t2.mid_id is null and t3.mid_id is null )t4;

Requirement 6: Churned Users

Churned user: a device that has not logged in within the last 7 days

Use the daily-active detail table dws_uv_detail_day as the DWS-layer source

Create-table statement

hive (gmall)>
drop table if exists ads_wastage_count; create external table ads_wastage_count( `dt` string COMMENT '統計日期', `wastage_count` bigint COMMENT '流失設備數' ) row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_wastage_count';

Import data

hive (gmall)>
insert into table ads_wastage_count select
     '2019-02-20', count(*) from ( select mid_id from dws_uv_detail_day group by mid_id having max(dt)<=date_add('2019-02-20',-7) )t1;

Requirement 7: Users Active for Three Consecutive Weeks

Users continuously active over the last 3 weeks: usually computed on Monday over the previous 3 weeks; this figure is calculated once a week.

Use the weekly-active detail table dws_uv_detail_wk as the DWS-layer source

Create-table statement

hive (gmall)>
drop table if exists ads_continuity_wk_count; create external table ads_continuity_wk_count( `dt` string COMMENT '統計日期,通常用結束週週日日期,若是天天計算一次,可用當天日期', `wk_dt` string COMMENT '持續時間', `continuity_count` bigint ) row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_continuity_wk_count';

Import data

hive (gmall)>
insert into table ads_continuity_wk_count select 
     '2019-02-20', concat(date_add(next_day('2019-02-20','MO'),-7*3),'_',date_add(next_day('2019-02-20','MO'),-1)), count(*) from ( select mid_id from dws_uv_detail_wk where wk_dt>=concat(date_add(next_day('2019-02-20','MO'),-7*3),'_',date_add(next_day('2019-02-20','MO'),-7*2-1)) and wk_dt<=concat(date_add(next_day('2019-02-20','MO'),-7),'_',date_add(next_day('2019-02-20','MO'),-1)) group by mid_id having count(*)=3 )t1;

Requirement 8: Users Active on Three Consecutive Days Within the Last Seven Days

Description: the number of users active on 3 consecutive days within the last 7 days

Use the daily-active detail table dws_uv_detail_day as the DWS-layer source
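The load query for this requirement relies on a classic trick: rank each device's active dates, then subtract the rank from the date; dates in a consecutive run all collapse to the same difference value. A minimal illustration (the dates are made up):

```sql
-- For one device active on 02-06, 02-07, 02-08 and 02-10:
--   dt           rank   date_sub(dt, rank)
--   2019-02-06    1     2019-02-05  \
--   2019-02-07    2     2019-02-05   > same value => consecutive run of 3
--   2019-02-08    3     2019-02-05  /
--   2019-02-10    4     2019-02-06
select mid_id, dt,
       date_sub(dt, rank() over (partition by mid_id order by dt)) date_dif
from dws_uv_detail_day
where dt >= date_add('2019-02-12', -6) and dt <= '2019-02-12';
-- Grouping by (mid_id, date_dif) and keeping groups with count(*) >= 3
-- then finds every run of at least three consecutive active days.
```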

            

Create the table

hive (gmall)>
drop table if exists ads_continuity_uv_count; create external table ads_continuity_uv_count( `dt` string COMMENT '統計日期', `wk_dt` string COMMENT '最近7天日期', `continuity_count` bigint ) COMMENT '連續活躍設備數' row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_continuity_uv_count';

Import data

hive (gmall)>
insert into table ads_continuity_uv_count select
    '2019-02-12', concat(date_add('2019-02-12',-6),'_','2019-02-12'), count(*) from ( select mid_id from ( select mid_id from ( select mid_id, date_sub(dt,rank) date_dif from ( select mid_id, dt, rank() over(partition by mid_id order by dt) rank from dws_uv_detail_day where dt>=date_add('2019-02-12',-6) and dt<='2019-02-12' )t1 )t2 group by mid_id,date_dif having count(*)>=3 )t3 group by mid_id )t4;

 

================================================== Business Data Analysis ==================================================

 

The ODS layer must mirror the raw source fields exactly;

DWD layer
  dwd_order_info: orders
  dwd_order_detail: order detail (order x product)
  dwd_user_info: users
  dwd_payment_info: payment flow
  dwd_sku_info: products (with category columns added)

Daily user-behavior wide table dws_user_action

Fields: user_id, order_count, order_amount, payment_count, payment_amount, comment_count

drop table if exists dws_user_action; create external table dws_user_action( user_id string comment '用戶id', order_count bigint comment '用戶下單數', order_amount decimal(16, 2) comment '下單金額', payment_count bigint comment '支付次數', payment_amount decimal(16, 2) comment '支付金額', comment_count bigint comment '評論次數' )comment '每日用戶行爲寬表' partitioned by(`dt` string) stored as parquet location '/warehouse/gmall/dws/dws_user_action/' tblproperties("parquet.compression"="snappy");

Import data

Zeroes act as column placeholders in each union branch; the first field must carry an alias.

with tmp_order as( select user_id, count(*) order_count, sum(oi.total_amount) order_amount from dwd_order_info oi where date_format(oi.create_time, 'yyyy-MM-dd')='2019-02-10' group by user_id ), tmp_payment as( select user_id, count(*) payment_count, sum(pi.total_amount) payment_amount from dwd_payment_info pi
where date_format(pi.payment_time, 'yyyy-MM-dd')='2019-02-10' group by user_id ), tmp_comment as( select user_id, count(*) comment_count from dwd_comment_log c where date_format(c.dt, 'yyyy-MM-dd')='2019-02-10' group by user_id ) insert overwrite table dws_user_action partition(dt='2019-02-10') select user_actions.user_id, sum(user_actions.order_count), sum(user_actions.order_amount), sum(user_actions.payment_count), sum(user_actions.payment_amount), sum(user_actions.comment_count) from( select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order union all select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment union all select user_id, 0, 0, 0, 0, comment_count from tmp_comment ) user_actions group by user_id;

Requirement 4 (business). GMV (Gross Merchandise Volume): the total transaction value over a period of time

GMV counts the value of orders placed, including both paid and unpaid orders;

Create-table statement for ads_gmv_sum_day:

drop table if exists ads_gmv_sum_day; create table ads_gmv_sum_day( `dt` string comment '統計日期', `gmv_count` bigint comment '當日GMV訂單個數', `gmv_amount` decimal(16, 2) comment '當日GMV訂單總額', `gmv_payment` decimal(16, 2) comment '當日支付金額' ) comment 'GMV' row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_gmv_sum_day';

Import data: from the user-behavior wide table dws_user_action

sum(order_count) gmv_count, sum(order_amount) gmv_amount, sum(payment_amount) gmv_payment; filter on the date and group by dt;

insert into table ads_gmv_sum_day select '2019-02-10' dt, sum(order_count) gmv_count, sum(order_amount) gmv_amount, sum(payment_amount) gmv_payment from dws_user_action where dt='2019-02-10' group by dt;

Write the script:

#!/bin/bash
APP=gmall
hive=/opt/module/hive/bin/hive
if [ -n "$1" ]; then
  do_date=$1
else
  do_date=`date -d "-1 day" +%F`
fi

sql="
insert into table "$APP".ads_gmv_sum_day
select '$do_date' dt, sum(order_count) gmv_count, sum(order_amount) gmv_amount, sum(payment_amount) gmv_payment
from "$APP".dws_user_action
where dt='$do_date'
group by dt;
"
$hive -e "$sql"

Requirement 5 (business). Conversion rate = new users / daily active users

           

 

ads_user_convert_day fields: dt; uv_m_count (devices active that day); new_m_count (devices new that day); new_m_ratio (new devices as a share of the day's actives).
Sources (both in the behavioral warehouse): ads_uv_count, the active-device table (day_count, dt); ads_new_mid_count, the new-device table (new_mid_count, create_date).

Create table ads_user_convert_day

drop table if exists ads_user_convert_day; create table ads_user_convert_day( `dt` string comment '統計日期', `uv_m_count` bigint comment '當日活躍設備', `new_m_count` bigint comment '當日新增設備', `new_m_radio` decimal(10, 2) comment '當日新增佔日活比率' )comment '轉化率' row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_user_convert_day/';

Data import

cast(sum(uc.nmc)/sum(uc.dc)*100 as decimal(10,2)) new_m_ratio; stitch the two sources together with union all

insert into table ads_user_convert_day select '2019-02-10', sum(uc.dc) sum_dc, sum(uc.nmc) sum_nmc, cast(sum(uc.nmc)/sum(uc.dc) * 100 as decimal(10, 2)) new_m_radio from(select day_count dc, 0 nmc from ads_uv_count where dt='2019-02-10'
union all select 0 dc, new_mid_count from ads_new_mid_count where create_date='2019-02-10' )uc;

User-behavior funnel analysis

Visit-to-order conversion rate | order-to-payment conversion rate

ads_user_action_convert_day fields: dt; total_visitor_m_count (total visitors); order_u_count (users who placed orders); visitor2order_convert_ratio (visit-to-order conversion); payment_u_count (users who paid); order2payment_convert_ratio (order-to-payment conversion).
Sources: dws_user_action (the wide table): user_id, order_count, order_amount, payment_count, payment_amount, comment_count; ads_uv_count (behavioral warehouse): dt, day_count, wk_count, mn_count, is_weekend, is_monthend.

Create the table

drop table if exists ads_user_action_convert_day; create table ads_user_action_convert_day( `dt` string comment '統計日期', `total_visitor_m_count` bigint comment '總訪問人數', `order_u_count` bigint comment '下單人數', `visitor2order_convert_radio` decimal(10, 2) comment '訪問到下單轉化率', `payment_u_count` bigint comment '支付人數', `order2payment_convert_radio` decimal(10, 2) comment '下單到支付的轉化率' )COMMENT '用戶行爲漏斗分析' row format delimited fields terminated by '\t' location '/warehouse/gmall/ads/ads_user_convert_day/' ;

Insert the data

insert into table ads_user_action_convert_day
select
    '2019-02-10',
    uv.day_count,
    ua.order_count,
    cast(ua.order_count/uv.day_count * 100 as decimal(10, 2)) visitor2order_convert_ratio,
    ua.payment_count,
    cast(ua.payment_count/ua.order_count * 100 as decimal(10, 2)) order2payment_convert_ratio
from (
    select
        sum(if(order_count>0, 1, 0)) order_count,
        sum(if(payment_count>0, 1, 0)) payment_count
    from dws_user_action
    where dt='2019-02-10'
) ua, ads_uv_count uv
where uv.dt='2019-02-10';
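The two funnel ratios are plain divisions over aggregated counts. A quick Python sketch with hypothetical numbers (1000 daily actives, 250 order-placing users, 200 paying users):

```python
# Hypothetical funnel counts; the formulas mirror the Hive query above.
day_count = 1000      # uv.day_count: daily active devices
order_count = 250     # users with order_count > 0
payment_count = 200   # users with payment_count > 0

# visit -> order, order -> payment, each as a percentage rounded to 2 places
visitor2order = round(order_count / day_count * 100, 2)
order2payment = round(payment_count / order_count * 100, 2)

print(visitor2order, order2payment)  # 25.0 80.0
```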

Requirement 6. Brand repurchase rate

  Requirement: per month, count users who purchased a product two or more times

User purchase detail table dws_sale_detail_daycount (wide table)

Create the table dws_sale_detail_daycount

drop table if exists dws_sale_detail_daycount;
create external table dws_sale_detail_daycount(
    user_id string comment 'user id',
    sku_id string comment 'sku id',
    user_gender string comment 'user gender',
    user_age string comment 'user age',
    user_level string comment 'user level',
    order_price decimal(10,2) comment 'sku price',
    sku_name string comment 'sku name',
    sku_tm_id string comment 'brand id',
    sku_category3_id string comment 'level-3 category id',
    sku_category2_id string comment 'level-2 category id',
    sku_category1_id string comment 'level-1 category id',
    sku_category3_name string comment 'level-3 category name',
    sku_category2_name string comment 'level-2 category name',
    sku_category1_name string comment 'level-1 category name',
    spu_id string comment 'spu id',
    sku_num int comment 'quantity purchased',
    order_count string comment 'orders placed that day',
    order_amount string comment 'order amount that day'
) comment 'user purchase detail table'
partitioned by(`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_sale_detail_daycount'
tblproperties("parquet.compression"="snappy");

Load the data

Sources: ods_order_detail (order details), dwd_user_info (users), dwd_sku_info (products)

with tmp_detail as (
    select
        user_id,
        sku_id,
        sum(sku_num) sku_num,
        count(*) order_count,
        sum(od.order_price*sku_num) order_amount
    from ods_order_detail od
    where od.dt='2019-02-10' and user_id is not null
    group by user_id, sku_id
)
insert overwrite table dws_sale_detail_daycount partition(dt='2019-02-10')
select
    tmp_detail.user_id,
    tmp_detail.sku_id,
    u.gender,
    months_between('2019-02-10', u.birthday)/12 age,
    u.user_level,
    price,
    sku_name,
    tm_id,
    category3_id,
    category2_id,
    category1_id,
    category3_name,
    category2_name,
    category1_name,
    spu_id,
    tmp_detail.sku_num,
    tmp_detail.order_count,
    tmp_detail.order_amount
from tmp_detail
left join dwd_user_info u on u.id=tmp_detail.user_id and u.dt='2019-02-10'
left join dwd_sku_info s on s.id=tmp_detail.sku_id and s.dt='2019-02-10';
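The age column comes from Hive's months_between divided by 12. A rough Python stand-in (an approximation of Hive's month arithmetic, not its exact day-fraction rule) shows the intent with a hypothetical birthday:

```python
from datetime import date

def months_between(d1: date, d2: date) -> float:
    # Approximation of Hive's months_between(d1, d2): whole months plus
    # a day fraction; Hive uses a 31-day month for the fractional part.
    return (d1.year - d2.year) * 12 + (d1.month - d2.month) + (d1.day - d2.day) / 31.0

# A user born 1995-02-10 is 24 on the 2019-02-10 partition date.
age = months_between(date(2019, 2, 10), date(1995, 2, 10)) / 12
print(round(age))  # 24
```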

ADS layer: brand repurchase rate report analysis

Create the table ads_sale_tm_category1_stat_mn

Fields: buycount (number of buyers), buy_twice_last (buyers with two or more purchases),
buy_twice_last_ratio (single-repurchase rate), buy_3times_last (buyers with three or more purchases),
buy_3times_last_ratio (multi-repurchase rate)

drop table if exists ads_sale_tm_category1_stat_mn;
create table ads_sale_tm_category1_stat_mn(
    tm_id string comment 'brand id',
    category1_id string comment 'level-1 category id',
    category1_name string comment 'level-1 category name',
    buycount bigint comment 'number of buyers',
    buy_twice_last bigint comment 'buyers with two or more purchases',
    buy_twice_last_ratio decimal(10,2) comment 'single-repurchase rate',
    buy_3times_last bigint comment 'buyers with three or more purchases',
    buy_3times_last_ratio decimal(10,2) comment 'multi-repurchase rate',
    stat_mn string comment 'statistics month',
    stat_date string comment 'statistics date'
) comment 'repurchase rate statistics'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_sale_tm_category1_stat_mn/';

Insert the data

Key expressions:

    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
    date_format('2019-02-10', 'yyyy-MM') stat_mn,

insert into table ads_sale_tm_category1_stat_mn
select
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count >= 1, 1, 0)) buycount,
    sum(if(mn.order_count >= 2, 1, 0)) buyTwiceLast,
    sum(if(mn.order_count >= 2, 1, 0)) / sum(if(mn.order_count >= 1, 1, 0)) buyTwiceLastRatio,
    sum(if(mn.order_count >= 3, 1, 0)) buy3timeLast,
    sum(if(mn.order_count >= 3, 1, 0)) / sum(if(mn.order_count >= 1, 1, 0)) buy3timeLastRatio,
    date_format('2019-02-10', 'yyyy-MM') stat_mn,
    '2019-02-10' stat_date
from (
    select
        sd.sku_tm_id,
        sd.sku_category1_id,
        sd.sku_category1_name,
        user_id,
        sum(order_count) order_count
    from dws_sale_detail_daycount sd
    where date_format(dt, 'yyyy-MM') <= date_format('2019-02-10', 'yyyy-MM')
    group by sd.sku_tm_id, sd.sku_category1_id, user_id, sd.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
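The repurchase-rate logic (per-user monthly order totals in the inner query, threshold counts per brand in the outer one) can be sketched in Python with hypothetical data:

```python
from collections import defaultdict

# Hypothetical (brand, user) -> total monthly order count, mirroring subquery mn.
order_counts = {
    ("tm1", "u1"): 3, ("tm1", "u2"): 2, ("tm1", "u3"): 1, ("tm1", "u4"): 1,
}

# Per brand: [buyers, buyers with >=2 orders, buyers with >=3 orders],
# matching the sum(if(order_count >= n, 1, 0)) expressions.
stats = defaultdict(lambda: [0, 0, 0])
for (tm_id, _user), cnt in order_counts.items():
    s = stats[tm_id]
    s[0] += 1 if cnt >= 1 else 0
    s[1] += 1 if cnt >= 2 else 0
    s[2] += 1 if cnt >= 3 else 0

buycount, twice, thrice = stats["tm1"]
print(buycount, twice / buycount, thrice / buycount)  # 4 0.5 0.25
```

With 4 buyers, 2 of whom ordered twice or more and 1 three or more times, the single- and multi-repurchase rates come out to 50% and 25%.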

Data load script

1) Create the script ads_sale.sh in /home/kris/bin

[kris@hadoop101 bin]$ vim ads_sale.sh

#!/bin/bash
# Variables, for easy modification
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the supplied date if given; otherwise default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table "$APP".ads_sale_tm_category1_stat_mn
select
    mn.sku_tm_id,
    mn.sku_category1_id,
    mn.sku_category1_name,
    sum(if(mn.order_count>=1,1,0)) buycount,
    sum(if(mn.order_count>=2,1,0)) buyTwiceLast,
    sum(if(mn.order_count>=2,1,0))/sum(if(mn.order_count>=1,1,0)) buyTwiceLastRatio,
    sum(if(mn.order_count>=3,1,0)) buy3timeLast,
    sum(if(mn.order_count>=3,1,0))/sum(if(mn.order_count>=1,1,0)) buy3timeLastRatio,
    date_format('$do_date', 'yyyy-MM') stat_mn,
    '$do_date' stat_date
from (
    select
        od.sku_tm_id,
        od.sku_category1_id,
        od.sku_category1_name,
        user_id,
        sum(order_count) order_count
    from "$APP".dws_sale_detail_daycount od
    where date_format(dt,'yyyy-MM')<=date_format('$do_date','yyyy-MM')
    group by od.sku_tm_id, od.sku_category1_id, user_id, od.sku_category1_name
) mn
group by mn.sku_tm_id, mn.sku_category1_id, mn.sku_category1_name;
"
$hive -e "$sql"

Make the script executable:
[kris@hadoop101 bin]$ chmod 777 ads_sale.sh
Run the script to load the data:
[kris@hadoop101 bin]$ ads_sale.sh 2019-02-11
Check the loaded data:
hive (gmall)> select * from ads_sale_tm_category1_stat_mn limit 2;

Export the brand repurchase rate results to MySQL

  1) Create the ads_sale_tm_category1_stat_mn table in MySQL

create table ads_sale_tm_category1_stat_mn(
    tm_id varchar(200) comment 'brand id',
    category1_id varchar(200) comment 'level-1 category id',
    category1_name varchar(200) comment 'level-1 category name',
    buycount varchar(200) comment 'number of buyers',
    buy_twice_last varchar(200) comment 'buyers with two or more purchases',
    buy_twice_last_ratio varchar(200) comment 'single-repurchase rate',
    buy_3times_last varchar(200) comment 'buyers with three or more purchases',
    buy_3times_last_ratio varchar(200) comment 'multi-repurchase rate',
    stat_mn varchar(200) comment 'statistics month',
    stat_date varchar(200) comment 'statistics date'
)

  2) Write the Sqoop export script

  Create the script sqoop_export.sh in /home/kris/bin

  [kris@hadoop101 bin]$ vim sqoop_export.sh

#!/bin/bash
db_name=gmall

export_data() {
/opt/module/sqoop/bin/sqoop export \
--connect "jdbc:mysql://hadoop101:3306/${db_name}?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 123456 \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/$db_name/ads/$1 \
--input-fields-terminated-by "\t" \
--update-key "tm_id,category1_id,stat_mn,stat_date" \
--update-mode allowinsert \
--input-null-string '\\N' \
--input-null-non-string '\\N'
}

case $1 in
  "ads_sale_tm_category1_stat_mn")
    export_data "ads_sale_tm_category1_stat_mn"
    ;;
  "all")
    export_data "ads_sale_tm_category1_stat_mn"
    ;;
esac

3) Run the Sqoop export script

  [kris@hadoop101 bin]$ chmod 777 sqoop_export.sh

  [kris@hadoop101 bin]$ sqoop_export.sh all

4) Check the result in MySQL

  SELECT * FROM ads_sale_tm_category1_stat_mn;

 

Top 10 products by repurchase rate for each user level

1) For each level and each product, count users who bought once and users who bought twice, then derive the repurchase rate

2) Use a window function to take the top 10 within each level

3) Turn it into a script

User purchase detail wide table dws_sale_detail_daycount

① t1: count order totals grouped by user_level, sku_id, user_id

select
    user_level,
    sku_id,
    user_id,
    sum(order_count) order_count_sum
from dws_sale_detail_daycount
where date_format(dt, 'yyyy-MM') = date_format('2019-02-13', 'yyyy-MM')
group by user_level, sku_id, user_id
limit 10;

② t2: for each level and each product, count one-time and two-time buyers to derive the repurchase rate

select
    t1.user_level,
    t1.sku_id,
    sum(if(t1.order_count_sum > 0, 1, 0)) buyOneCount,
    sum(if(t1.order_count_sum > 1, 1, 0)) buyTwiceCount,
    sum(if(t1.order_count_sum > 1, 1, 0)) / sum(if(t1.order_count_sum > 0, 1, 0)) * 100 buyTwiceCountRatio,
    '2019-02-13' stat_date
from (
    select
        user_level,
        sku_id,
        user_id,
        sum(order_count) order_count_sum
    from dws_sale_detail_daycount
    where date_format(dt, 'yyyy-MM') = date_format('2019-02-13', 'yyyy-MM')
    group by user_level, sku_id, user_id
) t1
group by t1.user_level, t1.sku_id;

③ t3: wrap t2 as a subquery, ready for partitioning by user level and ordering by repurchase rate

select
    t2.user_level,
    t2.sku_id,
    t2.buyOneCount,
    t2.buyTwiceCount,
    t2.buyTwiceCountRatio,
    t2.stat_date
from (
    select
        t1.user_level,
        t1.sku_id,
        sum(if(t1.order_count_sum > 0, 1, 0)) buyOneCount,
        sum(if(t1.order_count_sum > 1, 1, 0)) buyTwiceCount,
        sum(if(t1.order_count_sum > 1, 1, 0)) / sum(if(t1.order_count_sum > 0, 1, 0)) * 100 buyTwiceCountRatio,
        '2019-02-13' stat_date
    from (
        select
            user_level,
            sku_id,
            user_id,
            sum(order_count) order_count_sum
        from dws_sale_detail_daycount
        where date_format(dt, 'yyyy-MM') = date_format('2019-02-13', 'yyyy-MM')
        group by user_level, sku_id, user_id
    ) t1
    group by t1.user_level, t1.sku_id
) t2

④ Partitioned ranking with rank()

select
    t2.user_level,
    t2.sku_id,
    t2.buyOneCount,
    t2.buyTwiceCount,
    t2.buyTwiceCountRatio,
    -- partition by user level, rank by repurchase rate (highest first),
    -- matching the "top 10 per level" requirement
    rank() over(partition by t2.user_level order by t2.buyTwiceCountRatio desc) rankNo
from (
    select
        t1.user_level,
        t1.sku_id,
        sum(if(t1.order_count_sum > 0, 1, 0)) buyOneCount,
        sum(if(t1.order_count_sum > 1, 1, 0)) buyTwiceCount,
        sum(if(t1.order_count_sum > 1, 1, 0)) / sum(if(t1.order_count_sum > 0, 1, 0)) * 100 buyTwiceCountRatio,
        '2019-02-13' stat_date
    from (
        select
            user_level,
            sku_id,
            user_id,
            sum(order_count) order_count_sum
        from dws_sale_detail_daycount
        where date_format(dt, 'yyyy-MM') = date_format('2019-02-13', 'yyyy-MM')
        group by user_level, sku_id, user_id
    ) t1
    group by t1.user_level, t1.sku_id
) t2

⑤ Use it as a subquery and take the top 10

select
    t3.user_level,
    t3.sku_id,
    t3.buyOneCount,
    t3.buyTwiceCount,
    t3.buyTwiceCountRatio,
    t3.rankNo
from (
    select
        t2.user_level,
        t2.sku_id,
        t2.buyOneCount,
        t2.buyTwiceCount,
        t2.buyTwiceCountRatio,
        -- partition by user level, rank by repurchase rate (highest first)
        rank() over(partition by t2.user_level order by t2.buyTwiceCountRatio desc) rankNo
    from (
        select
            t1.user_level,
            t1.sku_id,
            sum(if(t1.order_count_sum > 0, 1, 0)) buyOneCount,
            sum(if(t1.order_count_sum > 1, 1, 0)) buyTwiceCount,
            sum(if(t1.order_count_sum > 1, 1, 0)) / sum(if(t1.order_count_sum > 0, 1, 0)) * 100 buyTwiceCountRatio,
            '2019-02-13' stat_date
        from (
            select
                user_level,
                sku_id,
                user_id,
                sum(order_count) order_count_sum
            from dws_sale_detail_daycount
            where date_format(dt, 'yyyy-MM') = date_format('2019-02-13', 'yyyy-MM')
            group by user_level, sku_id, user_id
        ) t1
        group by t1.user_level, t1.sku_id
    ) t2
) t3
where rankNo <= 10;
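The rank-then-filter pattern in steps ④ and ⑤ amounts to sorting within each level and keeping the first ten rows. A Python sketch with hypothetical ratios (row_number semantics, i.e. ties are not shared as rank() would share them):

```python
from itertools import groupby

# Hypothetical rows: (user_level, sku_id, buy_twice_ratio)
rows = [
    ("1", "skuA", 80.0), ("1", "skuB", 60.0), ("1", "skuC", 90.0),
    ("2", "skuA", 70.0), ("2", "skuB", 75.0),
]

top10 = []
# Sort by level, then ratio descending; number rows within each level
# and keep only the first ten, like "where rankNo <= 10".
for level, grp in groupby(sorted(rows, key=lambda r: (r[0], -r[2])),
                          key=lambda r: r[0]):
    for rank_no, row in enumerate(grp, start=1):
        if rank_no <= 10:
            top10.append((*row, rank_no))

print(top10[0])  # ('1', 'skuC', 90.0, 1)
```

With fewer than ten products per level here, every row survives the filter; the point is the per-partition numbering.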