Tez: can an MRR plan be optimized to MR?

Date: 2020-04-18

Tags: tez, mrr, optimize

Original link: https://issues.apache.org/jira/browse/HIVE-2340

The query select userid,count(*) from u_data group by userid order by userid produces an MRR (map-reduce-reduce) plan.

I think that when the result of userid, count(*) is small (one reducer can process it), this query plan could be optimized to MR. Is that possible?

To prevent bad reducer merging, the reducer merging only kicks in when the optimizer thinks it gets a perf boost.

Going from MRR to MR is not a big win when it comes to Tez, due to container reuse. Going wide on the large cardinality will be safer in case map-side aggregation is missing.

If hive.map.aggr=true and the userid set fits within memory, then smushing the reducers would be nicer.

To reset the wide-narrow checks, run:

set hive.optimize.reducededuplication.min.reducer=1;

But be aware that it will fail (I've seen full disks) as you scale upwards to the 10+ TB cases.

Cheers,

Gopal

hive.optimize.reducededuplication.min.reducer

Default Value: 4
Added In: Hive 0.11.0 with HIVE-2340

Reduce deduplication merges two RSs (reduce sink operators) by moving the key/parts/reducer-num of the child RS to the parent RS. This means that if the reducer-num of the child RS is fixed (order by or forced bucketing) and small, the merge can produce a single, very slow MR job. The optimization is disabled if the number of reducers is less than the specified value.
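
One way to see the effect of this threshold is to compare the EXPLAIN output of the query from the question under both settings. The following is only a sketch: it assumes the u_data table from the question exists and that Tez is the execution engine, and the plan you actually get depends on your Hive version and table statistics.

```sql
-- Assumption: u_data(userid, ...) exists and Tez is installed.
set hive.execution.engine=tez;

-- Map-side aggregation, as mentioned in the reply above.
set hive.map.aggr=true;

-- Reduce deduplication itself must be enabled (it is by default).
set hive.optimize.reducededuplication=true;

-- Default threshold: the ORDER BY reduce sink uses a single reducer,
-- which is below 4, so the two reduce stages are expected to stay separate (MRR).
set hive.optimize.reducededuplication.min.reducer=4;
explain
select userid, count(*) from u_data group by userid order by userid;

-- Threshold lowered to 1: the check no longer blocks the merge,
-- so the plan may collapse into a single reduce stage (MR).
set hive.optimize.reducededuplication.min.reducer=1;
explain
select userid, count(*) from u_data group by userid order by userid;
```

Comparing the number of reducer vertices in the two plans shows whether the merge took place. As the reply warns, forcing the merge sends all groups through one reducer, so it is only reasonable when the grouped result is small.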
