Hive ERROR: Out of memory due to hash maps used in map-side aggregation

When Hive runs an aggregation query over a large amount of data, it often fails with an OOM error. The exact message is:

 

Possible error: Out of memory due to hash maps used in map-side aggregation. Solution: Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'

The failure message of the task is:

 

Error:GC overhead limit exceeded

 

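This error generally has one of two causes: (1) the Hive SQL is written poorly, so the hash map grows too large during execution; (2) the Hive SQL has no room left for optimization (the query has to be written this way to get the required data).

For (1), rewrite the SQL statement to reduce the size of the hash map. For (2), adjust the parameters.

Cases (1) and (2) are described below:

(1) Rewrite the SQL statement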

 

select count(distinct v) from tbl; can be rewritten as select count(1) from (select v from tbl group by v) t;

 

Explanation: this reduces the number of keys in the hash map.

 

select collect_set(messageDate)[0],count(*) from incidents_hive group by substr(messageDate,8,2); can be rewritten as select hourNum, count(1) from (select substr(messageDate,8,2) as hourNum from incidents_hive ) t group by hourNum;

 

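Explanation: this does not reduce the number of keys in the hash map, but it does reduce the size of each value, because collect_set no longer has to accumulate the messageDate values of every group.

(2) Adjust the parameters

The following SQL statement cannot be optimized to reduce the hash map size (keywords has a very low repetition rate, so the in-memory Map object maintained during the map phase becomes extremely large):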

 

 

INSERT OVERWRITE TABLE hbase_table_poi_keywords_count SELECT concat(substr(key,0,8), svccode, keywords), substr(key,0,8), svccode, keywords, count(*) where substr(key,0,8)=\"$yesterday\" AND length(keywords)>0 AND svccode is not null GROUP BY substr(key,0,8),svccode,keywords;

 

The tuning parameters related to map-side aggregation are:

hive.map.aggr

hive.groupby.mapaggr.checkinterval

hive.map.aggr.hash.min.reduction

hive.map.aggr.hash.percentmemory

hive.groupby.skewindata

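These parameters can be adjusted by following their descriptions in the configuration file and the documentation. If tuning them still does not get the query through, set hive.map.aggr=false is the final fallback. It will certainly get the query to complete; it just runs somewhat slower than with map-side aggregation, although when you run it on real data you may well find that it is not slow at all.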
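
As a minimal sketch, the settings above might be tried in a Hive session in the following order before rerunning the failing query. The value 0.25 is the one suggested by the error message itself; the order of the steps is an assumption for illustration, not a prescribed procedure.

```sql
-- Sketch only: apply one change at a time, then rerun the failing query.

-- Show the current value before changing anything.
set hive.map.aggr.hash.percentmemory;

-- Step 1: let the map-side hash table use a smaller share of the heap
-- (0.25 is the value suggested by the error message).
set hive.map.aggr.hash.percentmemory=0.25;

-- Step 2 (last resort): turn map-side aggregation off entirely, so the
-- mappers build no aggregation hash map and the reducers do all the work.
set hive.map.aggr=false;
```

Settings issued with set apply only to the current session, so they can be tried on a single query without touching the cluster-wide configuration.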

 

References:

http://blog.csdn.net/macyang/article/details/9260777
http://www.myexception.cn/open-source/1487747.html
http://blog.csdn.net/lixucpf/article/details/20458617

 

 

 


Reposted from http://blog.csdn.net/xyls12345/article/details/25418671
