GROUPING SETS, GROUPING__ID, CUBE, ROLLUP
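These analytic functions are typically used in OLAP scenarios where a metric cannot simply be summed across levels and has to be computed separately while drilling up and down through different dimensions, for example UV counts by hour, day, and month.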
Sample data (cookie5.txt), one row per line in the form month,day,cookieid:
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
use cookie;
drop table if exists cookie5;
create table cookie5(month string, day string, cookieid string)
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie5.txt" into table cookie5;
select * from cookie5;
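To make the "cannot simply be summed" point concrete, the sketch below compares the sum of the daily UVs with a direct distinct count per month, assuming the cookie5 table has been loaded as above. The aliases daily, summed_daily_uv, and monthly_uv are illustrative names, not part of the original.

```sql
-- Sum of per-day UVs within each month
-- (a cookie is counted once for every day on which it appears)
SELECT month, SUM(uv) AS summed_daily_uv
FROM (
  SELECT month, day, COUNT(DISTINCT cookieid) AS uv
  FROM cookie5
  GROUP BY month, day
) daily
GROUP BY month;

-- Monthly UV computed directly
-- (a cookie is counted once per month, however many days it appears on)
SELECT month, COUNT(DISTINCT cookieid) AS monthly_uv
FROM cookie5
GROUP BY month;
```

With the sample rows, 2015-04 gives 9 for the summed daily figure but only 6 distinct cookies, because cookie2, cookie3, and cookie4 each appear on two different days in April. For 2015-03 the two figures happen to agree at 5.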
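Within a single GROUP BY query, GROUPING SETS aggregates by several dimension combinations at once. It is equivalent to running a separate GROUP BY for each combination and combining the result sets with UNION ALL.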
GROUPING__ID indicates which grouping set a result row belongs to.
select month, day, count(distinct cookieid) as uv, GROUPING__ID
from cookie.cookie5
group by month, day grouping sets (month, day)
order by GROUPING__ID;
This is equivalent to:
SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM cookie5 GROUP BY day
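The first column is the month grouping key.
The second column is the day grouping key.
The third column is the number of distinct cookieid values in the group, whether the row is grouped by month or by day.
The fourth column, GROUPING__ID, indicates which grouping set the row belongs to. Following the order of the grouping columns month, day in GROUPING SETS, 1 stands for month and 2 stands for day.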
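Because the raw GROUPING__ID is only a number, a report may want a readable label instead. One possible sketch wraps the first GROUPING SETS query in a subquery; the aliases gid, t, and grouping_set are illustrative, and the mapping 1 = month, 2 = day is the one described in the text.

```sql
SELECT
  gid,
  -- Translate the numeric grouping id into a readable label
  CASE gid
    WHEN 1 THEN 'by month'
    WHEN 2 THEN 'by day'
    ELSE 'other'
  END AS grouping_set,
  month,
  day,
  uv
FROM (
  SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID AS gid
  FROM cookie5
  GROUP BY month, day GROUPING SETS (month, day)
) t
ORDER BY gid;
```

Rows labelled 'by month' have NULL in the day column, and rows labelled 'by day' have NULL in the month column.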
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY month, day GROUPING SETS (month, day, (month, day))
ORDER BY GROUPING__ID;
This is equivalent to:
SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, 3 AS GROUPING__ID FROM cookie5 GROUP BY month, day
CUBE aggregates over every combination of the GROUP BY dimensions.
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY month, day WITH CUBE
ORDER BY GROUPING__ID;
This is equivalent to:
SELECT NULL, NULL, COUNT(DISTINCT cookieid) AS uv, 0 AS GROUPING__ID FROM cookie5
UNION ALL
SELECT month, NULL, COUNT(DISTINCT cookieid) AS uv, 1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid) AS uv, 2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, 3 AS GROUPING__ID FROM cookie5 GROUP BY month, day
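WITH CUBE on two columns can also be spelled out as an explicit GROUPING SETS list, which makes the four combinations visible. The following is a sketch only, assuming a Hive version that accepts the empty grouping set () for the grand total.

```sql
-- Explicit form of GROUP BY month, day WITH CUBE:
-- (month, day), (month), (day), and () for the grand total
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY month, day
GROUPING SETS ((month, day), month, day, ())
ORDER BY GROUPING__ID;
```

One caveat worth checking on your own cluster: the GROUPING__ID values 0 and 3 shown for the grand total and for (month, day) match older Hive releases. Hive 2.3.0 and later are documented as computing GROUPING__ID differently, so that the (month, day) rows report 0 and the grand-total row reports 3, while the month-only and day-only rows stay at 1 and 2.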
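ROLLUP is a subset of CUBE: it takes the leftmost dimension as the primary one and aggregates hierarchically starting from that dimension.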
-- For example, hierarchical aggregation led by the month dimension
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY month, day WITH ROLLUP
ORDER BY GROUPING__ID;
This implements the following drill-up path:
UV per month and day -> UV per month -> total UV
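The three levels of this drill-up path correspond to three grouping sets, so the ROLLUP query can be sketched as an explicit GROUPING SETS list. This assumes a Hive version that accepts the empty grouping set (); note that day on its own is absent, which is what makes ROLLUP a subset of CUBE.

```sql
-- Explicit form of GROUP BY month, day WITH ROLLUP:
-- (month, day) -> (month) -> () for the grand total
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY month, day
GROUPING SETS ((month, day), month, ())
ORDER BY GROUPING__ID;
```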
-- Swapping the order of month and day makes the hierarchical aggregation start from the day dimension:
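The swapped query itself is not shown in the text. Presumably it is the same ROLLUP statement with the two columns exchanged in the GROUP BY clause, along these lines:

```sql
-- ROLLUP led by day: (day, month) -> (day) -> grand total
SELECT day, month, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID
FROM cookie5
GROUP BY day, month WITH ROLLUP
ORDER BY GROUPING__ID;
```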
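This implements the following drill-up path:
UV per day and month -> UV per day -> total UV
(Here, aggregating by day and month gives the same result as aggregating by day alone, because each day belongs to exactly one month, a parent-child relationship. With other dimension combinations the two results would differ.)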