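While iterating over the Iterable parameter of reduce today, I ran into a problem: the Iterator's next() method returns the same object on every call. next() only changes the value held by the Writable object; it does not return a new Writable object.
I verified this with WordCount.
My code is as follows: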
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Keep a reference to every IntWritable in a list
        List<IntWritable> intWritables = new ArrayList<IntWritable>();
        for (IntWritable val : values) {
            intWritables.add(val);
            sum += val.get();
        }
        if (intWritables.size() > 1) {
            // With more than one element, check whether the first and second are the same object
            System.out.println("objects is same -> " + (intWritables.get(0) == intWritables.get(1)));
        }
        // result is the reducer's IntWritable field, as in the standard WordCount example
        result.set(sum);
        context.write(key, result);
    }
Log output:
    objects is same -> true
The Iterable implementation here is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterable.
The Iterator implementation is org.apache.hadoop.mapreduce.task.ReduceContextImpl.ValueIterator.
Its next() implementation calls the deserialize(Writable w) method in org.apache.hadoop.io.serializer.WritableSerialization:
    public Writable deserialize(Writable w) throws IOException {
        Writable writable;
        if (w == null) {
            writable = (Writable) ReflectionUtils.newInstance(writableClass, getConf());
        } else {
            writable = w;
        }
        writable.readFields(dataIn);
        return writable;
    }
This method only calls readFields on the argument w. It does not create a new object unless w is null.
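To make the consequence concrete, here is a minimal sketch that reproduces the reuse pattern without any Hadoop dependency. Everything in it is hypothetical and for illustration only: MutableInt stands in for a Writable such as IntWritable, and ReusingIterable imitates an iterator that refills one shared instance on each next(), the way deserialize does when w is not null. It compares storing the returned references with storing a copy of each value.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    /** Hypothetical stand-in for a Writable such as IntWritable. */
    static final class MutableInt {
        private int value;
        MutableInt() { }
        MutableInt(int value) { this.value = value; }
        void set(int value) { this.value = value; }
        int get() { return value; }
    }

    /** Iterable whose iterator overwrites one shared instance on every next(). */
    static final class ReusingIterable implements Iterable<MutableInt> {
        private final int[] data;
        private final MutableInt shared = new MutableInt();

        ReusingIterable(int[] data) { this.data = data; }

        @Override
        public Iterator<MutableInt> iterator() {
            return new Iterator<MutableInt>() {
                private int pos = 0;

                @Override
                public boolean hasNext() { return pos < data.length; }

                @Override
                public MutableInt next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    shared.set(data[pos++]); // mutate in place, no new object
                    return shared;
                }
            };
        }
    }

    /** Stores the references handed out by the iterator (the pattern from the post). */
    static List<MutableInt> collectReferences(Iterable<MutableInt> values) {
        List<MutableInt> out = new ArrayList<>();
        for (MutableInt v : values) {
            out.add(v);
        }
        return out;
    }

    /** Stores a fresh copy of each value instead of the shared instance. */
    static List<MutableInt> collectCopies(Iterable<MutableInt> values) {
        List<MutableInt> out = new ArrayList<>();
        for (MutableInt v : values) {
            out.add(new MutableInt(v.get()));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        List<MutableInt> refs = collectReferences(new ReusingIterable(data));
        List<MutableInt> copies = collectCopies(new ReusingIterable(data));
        System.out.println("references same object -> " + (refs.get(0) == refs.get(1))); // true
        System.out.println("first reference value -> " + refs.get(0).get());             // 3
        System.out.println("copies same object -> " + (copies.get(0) == copies.get(1))); // false
        System.out.println("first copy value -> " + copies.get(0).get());                // 1
    }
}
```

In the reducer above, the equivalent change would be to add new IntWritable(val.get()) to the list rather than val itself. For other Writable types, WritableUtils.clone(orig, conf) may be an option for producing an independent copy.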