In a MapReduce Program, Iterating over reduce's Iterable Parameter Yields the Same Object Every Time

Date: 2019-12-06

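Today, while iterating over the Iterable parameter of reduce, I ran into a problem: the Iterator's next() method returns the same object on every call. next() only overwrites the value held by the existing Writable object; it does not return a new Writable instance.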

This can be verified with wordcount:

My code is as follows:

protected void reduce(Text key, Iterable<IntWritable> values,
        Reducer<Text, IntWritable, Text, IntWritable>.Context context)
        throws IOException, InterruptedException {
    int sum = 0;

    // Save every IntWritable reference in a list
    List<IntWritable> intWritables = new ArrayList<IntWritable>();

    for (IntWritable val : values) {
        intWritables.add(val);
        sum += val.get();
    }

    if(intWritables.size() > 1) {
        // When the list holds more than one element, check whether the first and second elements are the same object
        System.out.println("objects is same -> "
                + (intWritables.get(0) == intWritables.get(1)));
    }

    result.set(sum);
    context.write(key, result);
}

Log output:

objects is same -> true

The Iterable implementation here is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterable.

The Iterator implementation is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterator.

Its next() implementation calls the deserialize(Writable w) method of org.apache.hadoop.io.serializer.WritableSerialization:

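public Writable deserialize(Writable w) throws IOException {
  Writable writable;
  if (w == null) {
    // No instance supplied: create one reflectively
    writable
      = (Writable) ReflectionUtils.newInstance(writableClass, getConf());
  } else {
    // Instance supplied: reuse it
    writable = w;
  }
  // Overwrite the instance's fields from the input stream
  writable.readFields(dataIn);
  return writable;
}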

This method simply calls readFields on the argument w and does not create a new object unless w is null.
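
The effect of this reuse, and one way to work around it, can be sketched without a Hadoop dependency. The example below is only an illustration under stated assumptions: MutableInt, readInto, reusingValues, storeReferences and storeCopies are hypothetical names standing in for IntWritable, the reuse-unless-null deserialize pattern, and the reduce-side value iterator; none of them are Hadoop classes. In a real reducer the analogous fix would presumably be to store a copy such as new IntWritable(val.get()) rather than val itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {
    /** Hypothetical stand-in for a mutable Writable such as IntWritable. */
    public static final class MutableInt {
        private int value;
        public MutableInt() {}
        public MutableInt(int value) { this.value = value; }
        public int get() { return value; }
        public void set(int value) { this.value = value; }
    }

    /** Mimics the reuse-unless-null pattern: fill the given instance, create one only if it is null. */
    public static MutableInt readInto(MutableInt target, int rawValue) {
        MutableInt box = (target == null) ? new MutableInt() : target;
        box.set(rawValue);
        return box;
    }

    /** Iterable whose iterator hands back one shared, overwritten instance on every next(). */
    public static Iterable<MutableInt> reusingValues(int... raw) {
        return () -> new Iterator<MutableInt>() {
            private MutableInt current; // null until the first next()
            private int pos = 0;

            @Override
            public boolean hasNext() { return pos < raw.length; }

            @Override
            public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                current = readInto(current, raw[pos++]);
                return current;
            }
        };
    }

    /** Stores the references as-is: every list entry aliases the same object. */
    public static List<Integer> storeReferences(Iterable<MutableInt> values) {
        List<MutableInt> kept = new ArrayList<>();
        for (MutableInt v : values) kept.add(v);
        return unwrap(kept);
    }

    /** Stores a fresh copy per element, so each entry keeps its own value. */
    public static List<Integer> storeCopies(Iterable<MutableInt> values) {
        List<MutableInt> kept = new ArrayList<>();
        for (MutableInt v : values) kept.add(new MutableInt(v.get()));
        return unwrap(kept);
    }

    private static List<Integer> unwrap(List<MutableInt> boxes) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt b : boxes) out.add(b.get());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(storeReferences(reusingValues(1, 2, 3))); // prints [3, 3, 3]
        System.out.println(storeCopies(reusingValues(1, 2, 3)));     // prints [1, 2, 3]
    }
}
```

Note that the sum in the wordcount code above is still correct, because val.get() is read during iteration; only references kept for use after the loop are affected.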