Reference: lxw大數據田地: http://lxw1234.com/archives/2015/04/193.htm
Data preparation:
CREATE EXTERNAL TABLE test_data (
  month STRING,
  day STRING,
  cookieid STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/jc_rc_ftp/test_data';

SELECT * FROM test_data l;

+----------+-------------+-------------+--+
| l.month  | l.day       | l.cookieid  |
+----------+-------------+-------------+--+
| 2015-03  | 2015-03-10  | cookie1     |
| 2015-03  | 2015-03-10  | cookie5     |
| 2015-03  | 2015-03-12  | cookie7     |
| 2015-04  | 2015-04-12  | cookie3     |
| 2015-04  | 2015-04-13  | cookie2     |
| 2015-04  | 2015-04-13  | cookie4     |
| 2015-04  | 2015-04-16  | cookie4     |
| 2015-03  | 2015-03-10  | cookie2     |
| 2015-03  | 2015-03-10  | cookie3     |
| 2015-04  | 2015-04-12  | cookie5     |
| 2015-04  | 2015-04-13  | cookie6     |
| 2015-04  | 2015-04-15  | cookie3     |
| 2015-04  | 2015-04-15  | cookie2     |
| 2015-04  | 2015-04-16  | cookie1     |
+----------+-------------+-------------+--+
14 rows selected (0.249 seconds)
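GROUPING SETS aggregates by several dimension combinations within a single GROUP BY query. It is equivalent to running one GROUP BY per combination and combining the result sets with UNION ALL.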
SELECT
  month,
  day,
  COUNT(DISTINCT cookieid) AS uv,
  GROUPING__ID
FROM test_data
GROUP BY month, day
GROUPING SETS (month, day)
ORDER BY GROUPING__ID;

is equivalent to

SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM test_data GROUP BY day

+----------+-------------+-----+---------------+--+
| month    | day         | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
+----------+-------------+-----+---------------+--+
8 rows selected (177.299 seconds)

SELECT
  month,
  day,
  COUNT(DISTINCT cookieid) AS uv,
  GROUPING__ID
FROM test_data
GROUP BY month, day
GROUPING SETS (month, day, (month, day))
ORDER BY GROUPING__ID;

is equivalent to

SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, 3 AS GROUPING__ID FROM test_data GROUP BY month, day

+----------+-------------+-----+---------------+--+
| month    | day         | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-03-10  | 4   | 2             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
+----------+-------------+-----+---------------+--+
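To make the "one GROUP BY per grouping set, then UNION ALL" equivalence concrete, here is a minimal Python sketch that emulates it on the sample rows. It is not how Hive executes the query; the function name `grouping_sets_uv` and the in-memory `ROWS` list are assumptions made for illustration only.

```python
from collections import defaultdict

# Sample rows copied from test_data: (month, day, cookieid)
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]

def grouping_sets_uv(rows, sets):
    """Hypothetical helper: COUNT(DISTINCT cookieid) per grouping set."""
    result = []
    for gset in sets:  # one GROUP BY per grouping set
        groups = defaultdict(set)
        for month, day, cookie in rows:
            # Columns outside the grouping set are reported as NULL (None)
            key = (month if "month" in gset else None,
                   day if "day" in gset else None)
            groups[key].add(cookie)
        # UNION ALL: append this grouping set's rows to the combined result
        result.extend((m, d, len(c)) for (m, d), c in groups.items())
    return result

result = grouping_sets_uv(ROWS, [("month",), ("day",), ("month", "day")])
for row in result:
    print(row)
```

The printed rows carry the same (month, day, uv) values as the second result table above, although the row order may differ.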
Note: GROUPING__ID indicates which grouping set each result row belongs to.
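The IDs in the results of this post (1 for month, 2 for day, 3 for both, 0 for the grand total) are consistent with a bitmask in which bit i is set when the i-th GROUP BY column takes part in the grouping set. The sketch below encodes that reading. It is inferred from the output shown here, not taken from Hive's implementation, and other Hive versions may number grouping sets differently; `grouping_id` is a hypothetical helper name.

```python
def grouping_id(dims, gset):
    """Bitmask with bit i set when dims[i] is part of the grouping set.

    Assumption: this mirrors the numbering seen in this post's output only.
    """
    return sum(1 << i for i, d in enumerate(dims) if d in gset)

dims = ("month", "day")  # order of the columns in GROUP BY
for gset in [(), ("month",), ("day",), ("month", "day")]:
    print(gset, grouping_id(dims, gset))
# () 0
# ('month',) 1
# ('day',) 2
# ('month', 'day') 3
```

Under this reading the ID depends on the column order in GROUP BY, so swapping the columns to `day, month` would give the day-only grouping set an ID of 1.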
CUBE aggregates by every combination of the GROUP BY dimensions.
SELECT
  month,
  day,
  COUNT(DISTINCT cookieid) AS uv,
  GROUPING__ID
FROM test_data
GROUP BY month, day
WITH CUBE
ORDER BY GROUPING__ID;

is equivalent to

SELECT NULL, NULL, COUNT(DISTINCT cookieid) AS uv, 0 AS GROUPING__ID FROM test_data
UNION ALL
SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, 3 AS GROUPING__ID FROM test_data GROUP BY month, day

+----------+-------------+-----+---------------+--+
| month    | day         | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | NULL        | 6   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
+----------+-------------+-----+---------------+--+
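"Every combination" means all 2^n subsets of the n GROUP BY columns, including the empty set for the grand total. A minimal sketch that enumerates them follows; `cube_sets` is a hypothetical helper name, not a Hive function.

```python
from itertools import combinations

def cube_sets(dims):
    """All 2**n subsets of dims, from the empty set (grand total) upward."""
    return [c for r in range(len(dims) + 1) for c in combinations(dims, r)]

print(cube_sets(("month", "day")))
# [(), ('month',), ('day',), ('month', 'day')]
```

These four sets correspond to the four branches of the UNION ALL query above.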
ROLLUP is a subset of CUBE: it aggregates hierarchically, starting from the leftmost dimension.
For example, hierarchical aggregation led by the month dimension:

SELECT
  month,
  day,
  COUNT(DISTINCT cookieid) AS uv,
  GROUPING__ID
FROM test_data
GROUP BY month, day
WITH ROLLUP
ORDER BY GROUPING__ID;

This gives the following drill-up path: UV per month and day -> UV per month -> total UV

+----------+-------------+-----+---------------+--+
| month    | day         | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
+----------+-------------+-----+---------------+--+

-- Swap the order of month and day to aggregate hierarchically by the day dimension:
SELECT
  day,
  month,
  COUNT(DISTINCT cookieid) AS uv,
  GROUPING__ID
FROM test_data
GROUP BY day, month
WITH ROLLUP
ORDER BY GROUPING__ID;

+-------------+----------+-----+---------------+--+
| day         | month    | uv  | grouping__id  |
+-------------+----------+-----+---------------+--+
| NULL        | NULL     | 7   | 0             |
| 2015-04-12  | NULL     | 2   | 1             |
| 2015-04-15  | NULL     | 2   | 1             |
| 2015-03-12  | NULL     | 1   | 1             |
| 2015-04-16  | NULL     | 2   | 1             |
| 2015-03-10  | NULL     | 4   | 1             |
| 2015-04-13  | NULL     | 3   | 1             |
| 2015-04-16  | 2015-04  | 2   | 3             |
| 2015-04-15  | 2015-04  | 2   | 3             |
| 2015-04-13  | 2015-04  | 3   | 3             |
| 2015-03-12  | 2015-03  | 1   | 3             |
| 2015-03-10  | 2015-03  | 4   | 3             |
| 2015-04-12  | 2015-04  | 2   | 3             |
+-------------+----------+-----+---------------+--+
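This gives the following drill-up path:
UV per day and month -> UV per day -> total UV
(Here, aggregating by day and month gives the same result as aggregating by day alone, because day and month have a parent-child relationship: each day belongs to exactly one month. With other dimension combinations the two results would differ.)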
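ROLLUP can be read as the prefixes of the GROUP BY column list, and the note about day and month can be checked directly on the sample rows. The sketch below does both; `rollup_sets` and `uv_by` are hypothetical helper names, and the in-memory `ROWS` list stands in for the test_data table.

```python
from collections import defaultdict

def rollup_sets(dims):
    """Prefixes of dims: (), (d1,), (d1, d2), ... as ROLLUP groups them."""
    return [dims[:k] for k in range(len(dims) + 1)]

print(rollup_sets(("month", "day")))
# [(), ('month',), ('month', 'day')]

# Sample rows copied from test_data: (month, day, cookieid)
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]

def uv_by(rows, key_fn):
    """Distinct cookie count per group, where key_fn builds the group key."""
    groups = defaultdict(set)
    for month, day, cookie in rows:
        groups[key_fn(month, day)].add(cookie)
    return {key: len(cookies) for key, cookies in groups.items()}

by_day = uv_by(ROWS, lambda month, day: day)
by_day_month = uv_by(ROWS, lambda month, day: (day, month))

# Each day maps to exactly one month, so both levels hold the same UV values
same = (len(by_day) == len(by_day_month)
        and all(by_day[d] == uv for (d, m), uv in by_day_month.items()))
print(same)  # True
```

With two unrelated dimensions, the (d1, d2) level would generally contain more groups than the d1 level, and the check would fail.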