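Compute user retention, covering multiple modules and multiple dates in a single query.
Previously, as in "SQL: Monthly User Retention", this was done with an inequality condition in a left join.
However, Hive does not support inequality conditions in a left join.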
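For context, the join shape that runs into this limitation looks roughly like the sketch below. The rows are made up, and the `t2.ds > t1.ds` condition is only an illustration of an inequality in `ON`, not the query from the earlier post. As far as I know, Hive releases before 2.2.0 reject a condition like this, while later releases accept more complex `ON` expressions.

```sql
-- Made-up rows standing in for a deduplicated (ds, user_id) activity set.
with da_user as (
    select '2019-05-23' as ds, 'u1' as user_id
    union all select '2019-05-24', 'u1'
    union all select '2019-05-23', 'u2'
)
select
    t1.ds
    ,count(distinct t1.user_id) as total_cnt
    ,count(distinct t2.user_id) as retained_cnt   -- users seen again on any later day
from da_user t1
left join da_user t2
    on t1.user_id = t2.user_id
    and t2.ds > t1.ds          -- inequality in ON: the part older Hive rejects
group by t1.ds
```

Keeping only the equality on `user_id` in `ON` and moving the date test into the aggregate avoids the restriction.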
So user retention is implemented another way.
The code is as follows:
-- Deduplicate to one row per (day, user, project) inside the date window.
with da_user as (
    select
        -- convert yyyyMMdd to yyyy-MM-dd so that datediff() can be applied later
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd') as ds
        ,user_id
        ,regexp_extract(args,'project_id=(\\d+)',1) as project_id
    from ods_view_ypp.ods_all_mobile_log log
    where ds between '20190523' and '20190621'
      and app_id = 100
      -- regexp_extract returns a string, so compare against a string literal
      and regexp_extract(args,'project_id=(\\d+)',1) = '1034'
    group by
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd')
        ,user_id
        ,regexp_extract(args,'project_id=(\\d+)',1)
)
-- Each retention column is formatted as "<percent>% | <user count>".
select ds
    ,total_cnt
    ,concat_ws('% | ', cast(round(diff_1cnt*100/total_cnt, 2) as string), cast(diff_1cnt as string)) a  -- day 1
    ,concat_ws('% | ', cast(round(diff_2cnt*100/total_cnt, 2) as string), cast(diff_2cnt as string)) b  -- day 2
    ,concat_ws('% | ', cast(round(diff_3cnt*100/total_cnt, 2) as string), cast(diff_3cnt as string)) c  -- day 3
    ,concat_ws('% | ', cast(round(diff_4cnt*100/total_cnt, 2) as string), cast(diff_4cnt as string)) d  -- day 4
from (
    select
        t1.ds
        ,count(distinct t1.user_id) as total_cnt
        -- the date condition lives inside the aggregate, so the join stays an equi-join
        ,count(distinct if(datediff(t2.ds,t1.ds)=1,t1.user_id,null)) as diff_1cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=2,t1.user_id,null)) as diff_2cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=3,t1.user_id,null)) as diff_3cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=4,t1.user_id,null)) as diff_4cnt
    from da_user t1
    left join da_user t2
        on (t1.user_id = t2.user_id)
    group by t1.ds
) t
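The opening line mentions several modules, while the query filters on a single `project_id`. The sketch below shows one way the same pattern might extend to more than one module: join on `project_id` as well and add it to the `group by`. The rows are made up for illustration, and `2001` is a hypothetical second project id that does not come from the original data.

```sql
-- Made-up rows standing in for the deduplicated (ds, user_id, project_id) set.
with da_user as (
    select '2019-05-23' as ds, 'u1' as user_id, '1034' as project_id
    union all select '2019-05-24', 'u1', '1034'
    union all select '2019-05-26', 'u1', '1034'
    union all select '2019-05-23', 'u2', '1034'
    union all select '2019-05-25', 'u2', '1034'
    union all select '2019-05-23', 'u1', '2001'   -- hypothetical second module
    union all select '2019-05-25', 'u1', '2001'
)
select
    t1.project_id
    ,t1.ds
    ,count(distinct t1.user_id) as total_cnt
    ,count(distinct if(datediff(t2.ds, t1.ds) = 1, t1.user_id, null)) as diff_1cnt
    ,count(distinct if(datediff(t2.ds, t1.ds) = 2, t1.user_id, null)) as diff_2cnt
from da_user t1
left join da_user t2
    on t1.user_id = t2.user_id
    and t1.project_id = t2.project_id   -- keep retention within the same module
group by t1.project_id, t1.ds
order by t1.project_id, t1.ds
-- Expected rows (project_id, ds, total_cnt, diff_1cnt, diff_2cnt):
--   1034  2019-05-23  2  1  1
--   1034  2019-05-24  1  0  1
--   1034  2019-05-25  1  0  0
--   1034  2019-05-26  1  0  0
--   2001  2019-05-23  1  0  1
--   2001  2019-05-25  1  0  0
```

Note that the self-join produces one row per pair of active days for each user before aggregation, so on a long date window the intermediate result can grow quickly.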
The result of the full retention query is shown in the figure below:
References:
[Hive: calculating user retention - chenpe32cp's blog - CSDN Blog](https://blog.csdn.net/chenpe32cp/article/details/85068184)
[Hive: calculating user retention - zzhangyuhang - cnblogs](https://www.cnblogs.com/zzhangyuhang/p/9884967.html)
This article is shared from the WeChat official account SQL數據分析 (dianwu_dw).