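Background
Data developers are used to writing SQL by hand, and one question that asks for hand-written SQL comes up again and again in interviews: the classic consecutive-login problem. Companies large and small like to ask it. It is neither trivial nor especially hard; what matters is having a clear approach.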
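The question
Using HQL, find the users who logged in on three or more consecutive days.
The same pattern extends to many similar problems: memberships renewed in consecutive months, consecutive days with a product sold, consecutive days of ride-hailing, consecutive overdue payments.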
Provided data
User ID, login date
user01,2018-02-28
user01,2018-03-01
user01,2018-03-02
user01,2018-03-04
user01,2018-03-05
user01,2018-03-06
user01,2018-03-07
user02,2018-03-01
user02,2018-03-02
user02,2018-03-03
user02,2018-03-06
Output fields
+---------+--------+-------------+-------------+--+
| uid     | times  | start_date  | end_date    |
+---------+--------+-------------+-------------+--+
Group discussion
After this question was posted in the group, it set off a lively discussion.
Solutions
The discussion produced many different solutions.
Here is one of the more common ones:
Create the table
create table wedw_dw.t_login_info(
  user_id string COMMENT 'user ID',
  login_date date COMMENT 'login date'
)
row format delimited
fields terminated by ',';
Load the data
hdfs dfs -put /test/login.txt /data/hive/test/wedw/dw/t_login_info/
Verify the data
select * from wedw_dw.t_login_info;
+----------+-------------+--+
| user_id  | login_date  |
+----------+-------------+--+
| user01   | 2018-02-28  |
| user01   | 2018-03-01  |
| user01   | 2018-03-02  |
| user01   | 2018-03-04  |
| user01   | 2018-03-05  |
| user01   | 2018-03-06  |
| user01   | 2018-03-07  |
| user02   | 2018-03-01  |
| user02   | 2018-03-02  |
| user02   | 2018-03-03  |
| user02   | 2018-03-06  |
+----------+-------------+--+
Solution
select t2.user_id as user_id
      ,count(1) as times
      ,min(t2.login_date) as start_date
      ,max(t2.login_date) as end_date
from
(
  select t1.user_id
        ,t1.login_date
        ,date_sub(t1.login_date,rn) as date_diff
  from
  (
    select user_id
          ,login_date
          ,row_number() over(partition by user_id order by login_date asc) as rn
    from wedw_dw.t_login_info
  ) t1
) t2
group by t2.user_id,t2.date_diff
having times >= 3;
Result
+----------+--------+-------------+-------------+--+
| user_id  | times  | start_date  | end_date    |
+----------+--------+-------------+-------------+--+
| user01   | 3      | 2018-02-28  | 2018-03-02  |
| user01   | 4      | 2018-03-04  | 2018-03-07  |
| user02   | 3      | 2018-03-01  | 2018-03-03  |
+----------+--------+-------------+-------------+--+
Approach
1. Partition the data by user ID and order each partition by login date.
select user_id
      ,login_date
      ,row_number() over(partition by user_id order by login_date asc) as rn
from wedw_dw.t_login_info
+----------+-------------+-----+--+
| user_id  | login_date  | rn  |
+----------+-------------+-----+--+
| user01   | 2018-02-28  | 1   |
| user01   | 2018-03-01  | 2   |
| user01   | 2018-03-02  | 3   |
| user01   | 2018-03-04  | 4   |
| user01   | 2018-03-05  | 5   |
| user01   | 2018-03-06  | 6   |
| user01   | 2018-03-07  | 7   |
| user02   | 2018-03-01  | 1   |
| user02   | 2018-03-02  | 2   |
| user02   | 2018-03-03  | 3   |
| user02   | 2018-03-06  | 4   |
+----------+-------------+-----+--+
2. Subtract the row number rn from the login date. Within a run of consecutive days both the date and rn advance by one per row, so the difference stays the same; rows of one user that share the same difference therefore belong to the same run of consecutive logins.
select t1.user_id
      ,t1.login_date
      ,date_sub(t1.login_date,rn) as date_diff
from
(
  select user_id
        ,login_date
        ,row_number() over(partition by user_id order by login_date asc) as rn
  from wedw_dw.t_login_info
) t1
;
+----------+-------------+-------------+--+
| user_id  | login_date  | date_diff   |
+----------+-------------+-------------+--+
| user01   | 2018-02-28  | 2018-02-27  |
| user01   | 2018-03-01  | 2018-02-27  |
| user01   | 2018-03-02  | 2018-02-27  |
| user01   | 2018-03-04  | 2018-02-28  |
| user01   | 2018-03-05  | 2018-02-28  |
| user01   | 2018-03-06  | 2018-02-28  |
| user01   | 2018-03-07  | 2018-02-28  |
| user02   | 2018-03-01  | 2018-02-28  |
| user02   | 2018-03-02  | 2018-02-28  |
| user02   | 2018-03-03  | 2018-02-28  |
| user02   | 2018-03-06  | 2018-03-02  |
+----------+-------------+-------------+--+
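To check the date-minus-row-number idea without a Hive cluster, the following is a minimal Python sketch that mimics row_number() and date_sub() on the sample rows. The function name with_group_key and the in-memory rows list are illustrative assumptions, not part of the original post.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the article: (user_id, login_date)
rows = [
    ("user01", date(2018, 2, 28)), ("user01", date(2018, 3, 1)),
    ("user01", date(2018, 3, 2)), ("user01", date(2018, 3, 4)),
    ("user01", date(2018, 3, 5)), ("user01", date(2018, 3, 6)),
    ("user01", date(2018, 3, 7)), ("user02", date(2018, 3, 1)),
    ("user02", date(2018, 3, 2)), ("user02", date(2018, 3, 3)),
    ("user02", date(2018, 3, 6)),
]

def with_group_key(rows):
    """Hypothetical helper: number each user's logins in date order
    (like row_number()) and subtract that number of days from the date
    (like date_sub(login_date, rn))."""
    out = []
    ordered = sorted(rows)  # sorts by user_id, then by login_date
    for user_id, user_rows in groupby(ordered, key=lambda r: r[0]):
        for rn, (_, login_date) in enumerate(user_rows, start=1):
            out.append((user_id, login_date, login_date - timedelta(days=rn)))
    return out

keyed = with_group_key(rows)
for user_id, login_date, date_diff in keyed:
    print(user_id, login_date, date_diff)
```

The printed third column should line up with the date_diff column produced by the Hive query for the same rows.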
3. Group by user_id and the date difference date_diff. Within each group the earliest login date is the start of that streak (start_date), the latest is its end (end_date), and count(1) is the number of login days. The having clause keeps only streaks of three days or more.
select t2.user_id as user_id
      ,count(1) as times
      ,min(t2.login_date) as start_date
      ,max(t2.login_date) as end_date
from
(
  select t1.user_id
        ,t1.login_date
        ,date_sub(t1.login_date,rn) as date_diff
  from
  (
    select user_id
          ,login_date
          ,row_number() over(partition by user_id order by login_date asc) as rn
    from wedw_dw.t_login_info
  ) t1
) t2
group by t2.user_id,t2.date_diff
having times >= 3;
+----------+--------+-------------+-------------+--+
| user_id  | times  | start_date  | end_date    |
+----------+--------+-------------+-------------+--+
| user01   | 3      | 2018-02-28  | 2018-03-02  |
| user01   | 4      | 2018-03-04  | 2018-03-07  |
| user02   | 3      | 2018-03-01  | 2018-03-03  |
+----------+--------+-------------+-------------+--+
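The whole approach can also be sketched end to end in plain Python as a cross-check of the grouping logic. This is only an illustration under stated assumptions: the names LOGINS and login_streaks and the min_days parameter are made up for this sketch, and the set() deduplication of same-day logins is an addition. The original query assumes one row per user per day; a duplicate date would advance the row number without advancing the date and so split a streak.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the article, keyed by user ID
LOGINS = {
    "user01": ["2018-02-28", "2018-03-01", "2018-03-02", "2018-03-04",
               "2018-03-05", "2018-03-06", "2018-03-07"],
    "user02": ["2018-03-01", "2018-03-02", "2018-03-03", "2018-03-06"],
}

def login_streaks(logins, min_days=3):
    """Hypothetical helper: return (user_id, times, start_date, end_date)
    for every run of at least min_days consecutive login days."""
    result = []
    for user_id in sorted(logins):
        # Assumption beyond the original query: drop duplicate same-day logins.
        days = sorted({date.fromisoformat(d) for d in logins[user_id]})
        groups = defaultdict(list)
        for rn, day in enumerate(days, start=1):
            # Same key <=> same run of consecutive days (date minus row number).
            groups[day - timedelta(days=rn)].append(day)
        for key in sorted(groups):
            run = groups[key]
            if len(run) >= min_days:  # plays the role of "having times >= 3"
                result.append((user_id, len(run),
                               run[0].isoformat(), run[-1].isoformat()))
    return result

streaks = login_streaks(LOGINS)
for row in streaks:
    print(row)
```

Changing min_days adapts the sketch to other thresholds, in the same way the constant in the having clause can be changed in the query.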
Closing remarks
This post covers only one solution. If you have other approaches, you are welcome to join the group and share them.
This article was shared from the WeChat official account 大數據私房菜 (datagogogo).