1. Related Analysis Server
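Generally, as long as the aggregation ratio and the data volume stay below a certain level, aggregations that do not involve Rollup, Cube, or Grouping_Sets rarely run into GC problems. For Rollup, Cube, and Grouping_Sets, the following optimization can be used to avoid GC problems.
1. With Rollup / Cube / Grouping_Sets, in some scenarios where many dimension fields are involved, memory or GC can cause performance problems. In particular, these three operations multiply the number of records while they run, so be sure to watch GC behavior when tuning. If GC performance is poor, manual rewriting is recommended: typically, the operation is expressed equivalently as a UNION of several subquery SQL statements. Rewriting the SQL amounts to a same-semantics translation of what these operations compute.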
1.1 Rewriting Rollup
The equivalent split rewrite is as follows; the two statements return the same result:
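A minimal sketch of this rewrite, assuming a hypothetical table `sales(region STRING, product STRING, amount INT)` that does not appear in the original. The `GROUP BY Rollup(...)` form follows the syntax used in section 2 of this post. UNION ALL is used rather than UNION so that no rows are deduplicated, matching what Rollup itself emits.

```sql
-- Hypothetical table (an assumption for illustration):
--   sales(region STRING, product STRING, amount INT)

-- Original form: a single statement that produces three grouping levels.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY Rollup(region, product);

-- Split rewrite: one subquery per grouping level, combined with UNION ALL.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product                          -- level (region, product)
UNION ALL
SELECT region, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY region                                   -- level (region)
UNION ALL
SELECT CAST(NULL AS STRING) AS region, CAST(NULL AS STRING) AS product,
       SUM(amount) AS total
FROM sales;                                       -- grand total
```

A Rollup over n columns has n + 1 grouping levels, so the rewrite needs n + 1 subqueries.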
1.2 Rewriting Cube
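A Cube over two columns can be split into four UNION blocks, one per combination of grouping columns. The sketch below assumes a hypothetical table `sales(region STRING, product STRING, amount INT)` that does not appear in the original; the `GROUP BY Cube(...)` form follows the syntax used in section 2 of this post.

```sql
-- Hypothetical table (an assumption for illustration):
--   sales(region STRING, product STRING, amount INT)

-- Original form.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY Cube(region, product);

-- Split rewrite: four blocks, one per subset of {region, product}.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product                          -- block 1: (region, product)
UNION ALL
SELECT region, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY region                                   -- block 2: (region)
UNION ALL
SELECT CAST(NULL AS STRING) AS region, CAST(NULL AS STRING) AS product,
       SUM(amount) AS total
FROM sales                                        -- block 3: grand total
UNION ALL
SELECT CAST(NULL AS STRING) AS region, product, SUM(amount) AS total
FROM sales
GROUP BY product;                                 -- block 4: (product)
```

A Cube over n columns expands to 2^n blocks, which is why the record count grows so quickly as dimensions are added.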
As can be seen, the result of the first three UNION blocks is equivalent to a Rollup, so the query can also be rewritten as:
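A sketch of that shorter form, again assuming the hypothetical table `sales(region STRING, product STRING, amount INT)`, which is not part of the original:

```sql
-- Hypothetical table (an assumption for illustration):
--   sales(region STRING, product STRING, amount INT)

-- Cube(region, product) expressed as a Rollup plus the one grouping
-- level that Rollup does not cover: (product) alone.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY Rollup(region, product)
UNION ALL
SELECT CAST(NULL AS STRING) AS region, product, SUM(amount) AS total
FROM sales
GROUP BY product;
```

This form still contains a Rollup, so it reduces the expansion only partially compared with the fully split version.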
1.3 Rewriting Grouping Sets
The equivalent split rewrite is as follows; the two statements return the same result:
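A minimal sketch, assuming a hypothetical table `sales(region STRING, product STRING, amount INT)` that does not appear in the original. The `GROUPING SETS` clause is written in the Hive style (`GROUP BY cols GROUPING SETS (...)`), which is an assumption; the exact syntax may differ on other engines.

```sql
-- Hypothetical table (an assumption for illustration):
--   sales(region STRING, product STRING, amount INT)

-- Original form: two explicitly listed grouping sets.
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY region, product GROUPING SETS ((region), (product));

-- Split rewrite: one subquery per grouping set.
SELECT region, CAST(NULL AS STRING) AS product, SUM(amount) AS total
FROM sales
GROUP BY region                                   -- set (region)
UNION ALL
SELECT CAST(NULL AS STRING) AS region, product, SUM(amount) AS total
FROM sales
GROUP BY product;                                 -- set (product)
```

Each grouping set becomes exactly one subquery, with the columns outside that set replaced by NULL.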
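Summary: statements can be optimized by rewriting the three operations in the forms shown above, which reduces the performance problems caused by memory pressure and GC as far as possible. In general, however, if the GC problem is not particularly severe, there is no need to rewrite; doing so would make performance worse.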
2. Comparing Group by, Cube, and Rollup
-- 1. Create the table
CREATE TABLE employee_part (department STRING, name STRING, salary INT)
CLUSTERED BY (department) INTO 7 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- 2. Insert data
INSERT INTO employee_part VALUES ('A', 'ZHANG', 100);
INSERT INTO employee_part VALUES ('A', 'LI', 200);
INSERT INTO employee_part VALUES ('A', 'WANG', 300);
INSERT INTO employee_part VALUES ('A', 'DUAN', 500);
INSERT INTO employee_part VALUES ('B', 'DUAN', 600);
INSERT INTO employee_part VALUES ('B', 'DUAN', 700);
INSERT INTO employee_part VALUES ('A', 'ZHAO', 400);

-- 3. Group by
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY department, name;

-- 4. Rollup
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY Rollup(department, name);

-- which is equivalent to
-- (real NULLs, not the string 'NULL', so the rows match what Rollup returns)
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY department, name
UNION ALL
SELECT department, CAST(NULL AS STRING), SUM(salary) AS sum
FROM employee_part
GROUP BY department
UNION ALL
SELECT CAST(NULL AS STRING), CAST(NULL AS STRING), SUM(salary) AS sum
FROM employee_part;

-- 5. Cube
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY Cube(department, name);

-- which is equivalent to
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY department, name
UNION ALL
SELECT department, CAST(NULL AS STRING), SUM(salary) AS sum
FROM employee_part
GROUP BY department
UNION ALL
SELECT CAST(NULL AS STRING), CAST(NULL AS STRING), SUM(salary) AS sum
FROM employee_part
UNION ALL
SELECT CAST(NULL AS STRING), name, SUM(salary) AS sum
FROM employee_part
GROUP BY name;

-- which is in turn equivalent to
SELECT department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY Rollup(department, name)
UNION ALL
SELECT CAST(NULL AS STRING), name, SUM(salary) AS sum
FROM employee_part
GROUP BY name;
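
As the results show, the Cube result set has 5 more rows than the Rollup result set. These 5 rows are the result of grouping by employee name alone, unioned onto the Rollup result set.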
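The row counts can be checked by hand against the seven sample rows: Group by returns 6 rows (one per department and name pair), Rollup returns 9 (those 6, plus 2 department subtotals, plus 1 grand total), and Cube returns 14. The query below is a small sketch that isolates the 5-row difference; the expected rows in the comments were worked out from the sample data above.

```sql
-- The rows Cube adds on top of Rollup for employee_part:
-- one subtotal per employee name, with department left as NULL.
SELECT CAST(NULL AS STRING) AS department, name, SUM(salary) AS sum
FROM employee_part
GROUP BY name;

-- For the seven sample rows this returns five rows (in no guaranteed order):
--   NULL  ZHANG   100
--   NULL  LI      200
--   NULL  WANG    300
--   NULL  DUAN   1800   (500 + 600 + 700, across departments A and B)
--   NULL  ZHAO    400
```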