Differences between ROLLUP, CUBE, and GROUPING SETS after GROUP BY

1. Analysis

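As long as the aggregation ratio and the data volume stay below a certain level, aggregations that do not involve ROLLUP, CUBE, or GROUPING SETS rarely run into GC problems. For ROLLUP, CUBE, and GROUPING SETS, the optimization described below can be used to avoid GC pressure.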

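1. With ROLLUP, CUBE, or GROUPING SETS, memory usage or GC can become a performance problem in some scenarios, particularly when many dimension columns are involved. These three operations multiply the number of records, because each input row contributes to every grouping set, so always watch GC behaviour when tuning. If GC performance is poor, consider tuning by rewriting the query by hand, usually by expressing the operation as an equivalent UNION of several subqueries. Such a rewrite is simply a translation of what these operations compute into SQL with the same semantics.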

1.1 Rewriting ROLLUP

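A ROLLUP can be split into an equivalent UNION of plain GROUP BY queries, and both forms return the same result. Section 3 works through this on a concrete table.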
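As a minimal sketch of that split, assume a hypothetical table sales(region, city, product, amount) that is not part of the original article. ROLLUP over three columns aggregates over each prefix of the column list, so the expansion needs four blocks. The ROLLUP(...) spelling follows the examples in section 3. Some Hive versions write it as GROUP BY region, city, product WITH ROLLUP instead.

```sql
-- Hypothetical table, for illustration only:
--   sales(region STRING, city STRING, product STRING, amount INT)

-- ROLLUP form: grouping sets (region, city, product), (region, city), (region), ().
SELECT region, city, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP(region, city, product);

-- Expanded form: one plain GROUP BY per prefix of the column list.
-- Rolled-up columns are returned as NULL, matching what ROLLUP emits.
SELECT region, city, product, SUM(amount) AS total
FROM sales
GROUP BY region, city, product
UNION ALL
SELECT region, city, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY region, city
UNION ALL
SELECT region, CAST(NULL AS STRING) AS city, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY region
UNION ALL
SELECT CAST(NULL AS STRING) AS region, CAST(NULL AS STRING) AS city, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales;
```

The two statements return the same rows, though not necessarily in the same order. UNION ALL is used because each block contributes its own rows and no de-duplication is wanted.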

1.2 Rewriting CUBE

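A CUBE splits the same way, with one GROUP BY block per combination of the grouped columns. For two columns, the first three UNION blocks of that expansion are exactly a ROLLUP, so the CUBE can also be rewritten as a ROLLUP plus the one remaining GROUP BY block (see the CUBE example in section 3).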
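The same idea carries over to more columns. The sketch below again assumes a hypothetical table sales(region, city, product, amount). A three-column CUBE has 2^3 = 8 grouping sets. ROLLUP over the same columns covers four of them, so four extra blocks remain.

```sql
-- Hypothetical table, for illustration only:
--   sales(region STRING, city STRING, product STRING, amount INT)

-- CUBE form: all 8 combinations of (region, city, product).
SELECT region, city, product, SUM(amount) AS total
FROM sales
GROUP BY CUBE(region, city, product);

-- ROLLUP supplies (region, city, product), (region, city), (region), ().
-- The remaining four grouping sets are added as plain GROUP BY blocks.
SELECT region, city, product, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP(region, city, product)
UNION ALL
-- (region, product)
SELECT region, CAST(NULL AS STRING) AS city, product, SUM(amount) AS total
FROM sales
GROUP BY region, product
UNION ALL
-- (city, product)
SELECT CAST(NULL AS STRING) AS region, city, product, SUM(amount) AS total
FROM sales
GROUP BY city, product
UNION ALL
-- (city)
SELECT CAST(NULL AS STRING) AS region, city, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY city
UNION ALL
-- (product)
SELECT CAST(NULL AS STRING) AS region, CAST(NULL AS STRING) AS city, product, SUM(amount) AS total
FROM sales
GROUP BY product;
```

Whether the hybrid form actually helps depends on the engine and the data, so it is worth comparing GC behaviour before and after the rewrite.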

1.3 Rewriting GROUPING SETS

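A GROUPING SETS query splits the same way: one plain GROUP BY query per listed grouping set, combined with UNION. The split form returns the same result as the original statement.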
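A small sketch of this rewrite, again on a hypothetical table sales(region, city, amount) that does not appear in the original article. The GROUPING SETS clause is written in the Hive style. Other engines typically write GROUP BY GROUPING SETS ((region, city), (city)).

```sql
-- Hypothetical table, for illustration only:
--   sales(region STRING, city STRING, amount INT)

-- GROUPING SETS form: only the two listed grouping sets are computed.
SELECT region, city, SUM(amount) AS total
FROM sales
GROUP BY region, city
GROUPING SETS ((region, city), (city));

-- Expanded form: one plain GROUP BY per grouping set.
SELECT region, city, SUM(amount) AS total
FROM sales
GROUP BY region, city
UNION ALL
-- region is not part of this grouping set, so it is returned as NULL.
SELECT CAST(NULL AS STRING) AS region, city, SUM(amount) AS total
FROM sales
GROUP BY city;
```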

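Summary: statements can be optimized by applying the rewrites shown above for the three operations, which reduces performance problems caused by memory pressure and GC as far as possible. In general, though, if the GC problem is not particularly severe, do not rewrite the query, because the rewrite will make performance worse.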

2. Comparing GROUP BY, CUBE, and ROLLUP

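The result set produced by the ROLLUP operator is similar to the one produced by the CUBE operator.
The specific differences between CUBE and ROLLUP:
    1. CUBE produces a result set that shows aggregates for every combination of values in the selected columns.
    2. ROLLUP produces a result set that shows aggregates for one hierarchy of values in the selected columns.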
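One way to see the difference is to spell both operators out as explicit grouping sets. The sketch below uses the employee_part table from the example in section 3 and assumes the Hive-style GROUPING SETS syntax, including the empty grouping set () for the grand total.

```sql
-- ROLLUP(department, name): one hierarchy, department -> name.
-- Grouping sets: (department, name), (department), ().
SELECT department, name, SUM(salary) AS total
FROM employee_part
GROUP BY department, name
GROUPING SETS ((department, name), (department), ());

-- CUBE(department, name): every combination of the two columns.
-- Grouping sets: (department, name), (department), (name), ().
SELECT department, name, SUM(salary) AS total
FROM employee_part
GROUP BY department, name
GROUPING SETS ((department, name), (department), (name), ());
```

The only difference is the extra (name) grouping set, which ROLLUP skips because name sits below department in the hierarchy.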
 
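Advantages of ROLLUP:
    1. ROLLUP returns a single result set, whereas SQL Server's COMPUTE BY returns multiple result sets, and multiple result sets make application code more complex.
    2. ROLLUP can be used in server cursors, whereas COMPUTE BY cannot.
    3. The query optimizer generates more efficient execution plans for ROLLUP than for COMPUTE BY.
3. Examples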
-- 1. Create the table
CREATE TABLE employee_part (department STRING, name STRING, salary INT)
CLUSTERED BY (department) INTO 7 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- 2. Insert data
INSERT INTO employee_part VALUES ('A', 'ZHANG', 100);
INSERT INTO employee_part VALUES ('A', 'LI', 200);
INSERT INTO employee_part VALUES ('A', 'WANG', 300);
INSERT INTO employee_part VALUES ('A', 'DUAN', 500);
INSERT INTO employee_part VALUES ('B', 'DUAN', 600);
INSERT INTO employee_part VALUES ('B', 'DUAN', 700);
INSERT INTO employee_part VALUES ('A', 'ZHAO', 400);

-- 3. GROUP BY
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY department, name;

-- 4. ROLLUP
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY ROLLUP(department, name);

-- is equivalent to the following.
-- Rolled-up columns are a real NULL (not the string 'NULL'), as in ROLLUP output;
-- UNION ALL is used because no de-duplication between blocks is wanted.
SELECT department, name, SUM(salary) AS sum
FROM employee_part GROUP BY department, name
UNION ALL
SELECT department, CAST(NULL AS STRING) AS name, SUM(salary) AS sum
FROM employee_part GROUP BY department
UNION ALL
SELECT CAST(NULL AS STRING) AS department, CAST(NULL AS STRING) AS name, SUM(salary) AS sum
FROM employee_part;

-- 5. CUBE
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY CUBE(department, name);

-- is equivalent to
SELECT department, name, SUM(salary) AS sum
FROM employee_part GROUP BY department, name
UNION ALL
SELECT department, CAST(NULL AS STRING) AS name, SUM(salary) AS sum
FROM employee_part GROUP BY department
UNION ALL
SELECT CAST(NULL AS STRING) AS department, CAST(NULL AS STRING) AS name, SUM(salary) AS sum
FROM employee_part
UNION ALL
SELECT CAST(NULL AS STRING) AS department, name, SUM(salary) AS sum
FROM employee_part GROUP BY name;

-- which is in turn equivalent to
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY ROLLUP(department, name)
UNION ALL
SELECT CAST(NULL AS STRING) AS department, name, SUM(salary) AS sum
FROM employee_part GROUP BY name;

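The CUBE result set has 5 more rows than the ROLLUP result set. Those 5 rows are what you get by taking the ROLLUP result and unioning onto it a GROUP BY on the employee name, one row per distinct name.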
