Some Customizations in MapReduce: A Summary

Date: 2020-01-24

Tags: mapreduce, customization summary. Category: Hadoop


1. Counters: they give developers a global view of how a program is running and of its various metrics.

Get a counter (context is the Mapper or Reducer Context): Counter myCounter = context.getCounter("group name", "counter name");

Set an initial value for the counter: myCounter.setValue(initialValue);

Increment: myCounter.increment(1);
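The idea of addressing a counter by group name and counter name can be tried out without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the class `CounterSketch`, the group `"Records"` and the counter names `VALID` and `MALFORMED` are all hypothetical names chosen for illustration. It counts well-formed and malformed input lines while scanning them, which is a typical use of counters.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for job counters: each value is addressed by (group, name).
class CounterSketch {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Adds `amount` to the counter, creating it on first use.
    void increment(String group, String name, long amount) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(name, amount, Long::sum);
    }

    // Returns the current value, or 0 if the counter was never touched.
    long get(String group, String name) {
        return groups.getOrDefault(group, Collections.<String, Long>emptyMap())
                     .getOrDefault(name, 0L);
    }

    // A map-like pass that counts valid (two tab-separated fields) and malformed lines.
    static CounterSketch scan(String[] lines) {
        CounterSketch counters = new CounterSketch();
        for (String line : lines) {
            String name = line.split("\t").length == 2 ? "VALID" : "MALFORMED";
            counters.increment("Records", name, 1);
        }
        return counters;
    }

    public static void main(String[] args) {
        CounterSketch c = scan(new String[] {"a\t1", "broken", "b\t2"});
        System.out.println("VALID=" + c.get("Records", "VALID"));
        System.out.println("MALFORMED=" + c.get("Records", "MALFORMED"));
    }
}
```

In a real job the framework aggregates such values across all tasks, which is what gives the global view described above.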

2. Combiners (local reduction)

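Each map task can produce a large amount of output. A combiner merges that output once on the map side, which reduces the amount of data sent to the reducers and therefore the network traffic.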

A combiner only merges the output of its own local map task; it cannot merge across map tasks, so the reduce phase is still needed.

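A combiner is optional, because for some logic the result computed with a combiner differs from the result computed without one. Averaging is the usual example: an average of per-map averages is generally not the overall average.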

job.setCombinerClass(MyReduce.class);
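Why a combiner is safe for some reduce logic and not for other logic can be checked with plain arithmetic. The sketch below is plain Java rather than Hadoop code, and the class `CombinerSketch` and its sample values are assumptions made for illustration: one map task emits 1, 2, 3 for a key and another map task emits 10 for the same key.

```java
import java.util.stream.DoubleStream;
import java.util.stream.LongStream;

class CombinerSketch {
    static long sum(long... xs) {
        return LongStream.of(xs).sum();
    }

    static double mean(double... xs) {
        return DoubleStream.of(xs).average().orElse(0.0);
    }

    // Reducer sees every raw value from both map tasks.
    static long sumWithoutCombiner() {
        return sum(1, 2, 3, 10);
    }

    // Each map task pre-sums its own values; the reducer sums the partial sums.
    static long sumWithCombiner() {
        return sum(sum(1, 2, 3), sum(10));
    }

    static double meanWithoutCombiner() {
        return mean(1, 2, 3, 10);
    }

    // Each map task pre-averages its own values; the reducer averages the averages.
    static double meanWithCombiner() {
        return mean(mean(1, 2, 3), mean(10));
    }

    public static void main(String[] args) {
        System.out.println(sumWithoutCombiner() + " vs " + sumWithCombiner());
        System.out.println(meanWithoutCombiner() + " vs " + meanWithCombiner());
    }
}
```

The two sums agree, while the two means do not (4.0 against 6.0), so reusing the reducer as the combiner is only appropriate when the operation can be applied to partial results without changing the final answer.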

3. Partitioner (partitioning)

1. The default partitioner in MapReduce is HashPartitioner.

2. Custom partitioner

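import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

class KpiPartitioner extends Partitioner<Text, KpiWritable> {
    // Keys of length 11 go to partition 0; all other keys go to partition 1.
    @Override
    public int getPartition(Text key, KpiWritable value, int numPartitions) {
        return (key.toString().length() == 11) ? 0 : 1;
    }
}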

Then add the following to the main method:

job.setPartitionerClass(KpiPartitioner.class);
job.setNumReduceTasks(2);
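Both partitioning rules in this section can be tried on plain Java objects. The sketch below is not Hadoop code: the class `PartitionSketch` and the sample keys are hypothetical. `hashPartition` uses the same formula as HashPartitioner (clear the sign bit of the hash code, then take it modulo the number of partitions), applied here to ordinary Java hash codes, which differ from those of `Text`. `lengthPartition` repeats the rule of `KpiPartitioner`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionSketch {
    // Default-style rule: clear the sign bit of the hash, then take it modulo the partition count.
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Custom rule from the text: 11-character keys go to partition 0, all others to partition 1.
    static int lengthPartition(String key) {
        return key.length() == 11 ? 0 : 1;
    }

    // Routes each key into the list of the partition chosen by lengthPartition.
    static List<List<String>> route(List<String> keys) {
        List<List<String>> partitions = new ArrayList<>();
        partitions.add(new ArrayList<>());
        partitions.add(new ArrayList<>());
        for (String key : keys) {
            partitions.get(lengthPartition(key)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(hashPartition("13800138000", 2));
        System.out.println(route(Arrays.asList("13800138000", "84138413", "13912345678")));
    }
}
```

Because the custom rule can return 1, the job needs two reduce tasks, which is why the number of reduce tasks is set to 2 alongside the partitioner class.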

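4. Sorting and grouping

1. Sorting in the map and reduce phases compares k2 only; v2 does not take part in the comparison. To make v2 take part in sorting, combine k2 and v2 into a new class and use that class as k2.

2. Grouping is also done by k2.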

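import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.WritableComparator;

class NewGroup implements RawComparator<NewKey> {
    /**
     * Compares the given byte sequences inside two byte arrays.
     * b1: the first array taking part in the comparison
     * b2: the second array taking part in the comparison
     * s1: start position of the key in the first array
     * s2: start position of the key in the second array
     * l1, l2: lengths of the serialized keys
     * Only the first 8 bytes of each key are compared here.
     */
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return WritableComparator.compareBytes(b1, s1, 8, b2, s2, 8);
    }

    @Override
    public int compare(NewKey o1, NewKey o2) {
        // TODO: object-level comparison is not implemented
        return 0;
    }
}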

Then, in main:

job.setGroupingComparatorClass(NewGroup.class);
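The combined effect of a composite key and a grouping comparator can be sketched in plain Java. This is not Hadoop code, and it rests on assumptions: the class `SecondarySortSketch` is hypothetical, and its `NewKey` is assumed to hold two long fields serialized as two big-endian longs, so that the first 8 bytes correspond to the original k2. Keys are sorted by both fields, then grouped by comparing only the first 8 bytes, in the spirit of `NewGroup`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SecondarySortSketch {
    // Hypothetical composite key: `first` is the original k2, `second` the original v2.
    static final class NewKey implements Comparable<NewKey> {
        final long first;
        final long second;

        NewKey(long first, long second) {
            this.first = first;
            this.second = second;
        }

        // Sort order: by first, then by second, so v2 takes part in sorting.
        @Override
        public int compareTo(NewKey other) {
            int c = Long.compare(first, other.first);
            return c != 0 ? c : Long.compare(second, other.second);
        }

        // Assumed serialized form: two big-endian longs, 16 bytes in total.
        byte[] toBytes() {
            return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
        }

        @Override
        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }

    // Grouping rule: two keys belong together when their first 8 bytes are equal.
    static boolean sameGroup(NewKey a, NewKey b) {
        byte[] x = a.toBytes();
        byte[] y = b.toBytes();
        for (int i = 0; i < 8; i++) {
            if (x[i] != y[i]) {
                return false;
            }
        }
        return true;
    }

    // Sorts the keys, then splits the sorted run into groups of adjacent keys with equal prefixes.
    static List<List<NewKey>> sortAndGroup(List<NewKey> keys) {
        List<NewKey> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        List<List<NewKey>> groups = new ArrayList<>();
        for (NewKey key : sorted) {
            if (groups.isEmpty() || !sameGroup(groups.get(groups.size() - 1).get(0), key)) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<NewKey> keys = new ArrayList<>();
        keys.add(new NewKey(3, 3));
        keys.add(new NewKey(1, 2));
        keys.add(new NewKey(3, 1));
        keys.add(new NewKey(1, 1));
        System.out.println(sortAndGroup(keys));
    }
}
```

Each group then holds all keys that share the same first field, already ordered by the second field, which is the behaviour a reducer would see with a composite key plus a grouping comparator on the first 8 bytes.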
