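MapOutputBuffer has a field named mapOutputFile. In sortAndSpill (which is called by flush), this field is used to obtain the file path and write out the intermediate results. That method calls writer.append(key, value), discussed below, to write the data. There appears to be no encryption step.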
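When shuffle.run() executes, it fetches the map outputs and merges them, which ends with a call to merger.close(). That call resolves to the close method of MergeManagerImpl. The code is as follows: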
    @Override
    public RawKeyValueIterator close() throws Throwable {
      // Wait for on-going merges to complete
      if (memToMemMerger != null) {
        memToMemMerger.close();
      }
      inMemoryMerger.close();
      onDiskMerger.close();

      List<InMemoryMapOutput<K, V>> memory =
        new ArrayList<InMemoryMapOutput<K, V>>(inMemoryMergedMapOutputs);
      inMemoryMergedMapOutputs.clear();
      memory.addAll(inMemoryMapOutputs);
      inMemoryMapOutputs.clear();
      List<CompressAwarePath> disk = new ArrayList<CompressAwarePath>(onDiskMapOutputs);
      onDiskMapOutputs.clear();
      return finalMerge(jobConf, rfs, memory, disk);
    }
Here we see three different mergers: memToMemMerger, inMemoryMerger, and onDiskMerger. They are declared as follows:
    private IntermediateMemoryToMemoryMerger memToMemMerger;
    private final MergeThread<InMemoryMapOutput<K,V>, K,V> inMemoryMerger;
    private final OnDiskMerger onDiskMerger;
IntermediateMemoryToMemoryMerger extends MergeThread<InMemoryMapOutput<K, V>, K, V>. The close and run methods of MergeThread are as follows:
    public synchronized void close() throws InterruptedException {
      closed = true;
      waitForMerge();
      interrupt();
    }

    public void run() {
      while (true) {
        List<T> inputs = null;
        try {
          // Wait for notification to start the merge...
          synchronized (pendingToBeMerged) {
            while(pendingToBeMerged.size() <= 0) {
              pendingToBeMerged.wait();
            }
            // Pickup the inputs to merge.
            inputs = pendingToBeMerged.removeFirst();
          }

          // Merge
          merge(inputs);
        } catch (InterruptedException ie) {
          numPending.set(0);
          return;
        } catch(Throwable t) {
          numPending.set(0);
          reporter.reportException(t);
          return;
        } finally {
          synchronized (this) {
            numPending.decrementAndGet();
            notifyAll();
          }
        }
      }
    }
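The hand-off between the queue of pending inputs and close() can be tried out in isolation. The following is a minimal sketch, not the Hadoop class: MergeThreadSketch, Worker, startMerge, and runDemo are hypothetical names, the "merge" just appends strings to a list, and the error-reporting branch is left out. It only aims to show why close() waits for pending work before interrupting the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the MergeThread pattern; not Hadoop code.
class MergeThreadSketch {

    static class Worker extends Thread {
        private final LinkedList<List<String>> pendingToBeMerged = new LinkedList<>();
        private final AtomicInteger numPending = new AtomicInteger(0);
        final List<String> merged = new ArrayList<>(); // stands in for the merge output

        // Queue one batch of inputs and wake the worker up.
        void startMerge(List<String> inputs) {
            numPending.incrementAndGet();
            synchronized (pendingToBeMerged) {
                pendingToBeMerged.addLast(inputs);
                pendingToBeMerged.notifyAll();
            }
        }

        // Block until every queued batch has been processed.
        synchronized void waitForMerge() throws InterruptedException {
            while (numPending.get() > 0) {
                wait();
            }
        }

        // Same order as in the text: wait for pending merges, then interrupt.
        synchronized void close() throws InterruptedException {
            waitForMerge();
            interrupt();
        }

        @Override
        public void run() {
            while (true) {
                List<String> inputs;
                try {
                    synchronized (pendingToBeMerged) {
                        while (pendingToBeMerged.isEmpty()) {
                            pendingToBeMerged.wait();
                        }
                        inputs = pendingToBeMerged.removeFirst();
                    }
                } catch (InterruptedException ie) {
                    return; // close() interrupts the thread once the queue is drained
                }
                merged.addAll(inputs); // the "merge" step of this sketch
                synchronized (this) {
                    numPending.decrementAndGet();
                    notifyAll(); // wakes a caller blocked in waitForMerge()
                }
            }
        }
    }

    // Runs two batches through the worker and returns what it merged.
    static List<String> runDemo() {
        Worker worker = new Worker();
        worker.start();
        try {
            worker.startMerge(Arrays.asList("a", "b"));
            worker.startMerge(Arrays.asList("c"));
            worker.close();
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return worker.merged;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because a single worker drains a FIFO queue, the batches are merged in submission order. If close() interrupted the thread without waiting first, batches still in the queue would be dropped.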
inMemoryMerger, on the other hand, is created by the createInMemoryMerger function and is in fact an instance of InMemoryMerger.
Each of the three creates a Writer in its merge method and calls Merger.writeFile(iter, writer, reporter, jobConf), then calls writer.close() to finish. The close function is implemented as follows:
    public void close() throws IOException {
      // When IFile writer is created by BackupStore, we do not have
      // Key and Value classes set. So, check before closing the
      // serializers
      if (keyClass != null) {
        keySerializer.close();
        valueSerializer.close();
      }

      // Write EOF_MARKER for key/value length
      WritableUtils.writeVInt(out, EOF_MARKER);
      WritableUtils.writeVInt(out, EOF_MARKER);
      decompressedBytesWritten += 2 * WritableUtils.getVIntSize(EOF_MARKER);

      //Flush the stream
      out.flush();

      if (compressOutput) {
        // Flush
        compressedOut.finish();
        compressedOut.resetState();
      }

      // Close the underlying stream iff we own it...
      if (ownOutputStream) {
        out.close();
      } else {
        // Write the checksum
        checksumOut.finish();
      }

      compressedBytesWritten = rawOut.getPos() - start;

      if (compressOutput) {
        // Return back the compressor
        CodecPool.returnCompressor(compressor);
        compressor = null;
      }

      out = null;
      if(writtenRecordsCounter != null) {
        writtenRecordsCounter.increment(numRecordsWritten);
      }
    }
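The two EOF_MARKER writes in close() terminate the record stream: a reader that sees the marker in both the key-length and value-length positions knows there are no more records. The sketch below illustrates that framing idea only. It assumes fixed 4-byte lengths written with DataOutputStream.writeInt instead of Hadoop's variable-length integers, assumes a marker value of -1, and uses hypothetical names (EofMarkerSketch, write, read, roundTrip); it is not the IFile format itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of length-prefixed records closed by an end-of-file marker.
class EofMarkerSketch {
    static final int EOF_MARKER = -1; // assumed value; never a valid length

    // Writes each {key, value} pair as: keyLength, valueLength, key bytes, value bytes.
    static byte[] write(String[][] records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String[] kv : records) {
                byte[] key = kv[0].getBytes(StandardCharsets.UTF_8);
                byte[] value = kv[1].getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.writeInt(value.length);
                out.write(key);
                out.write(value);
            }
            // As in close(): one marker in the key-length slot, one in the value-length slot.
            out.writeInt(EOF_MARKER);
            out.writeInt(EOF_MARKER);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads records back as "key=value" strings until both lengths equal the marker.
    static List<String> read(byte[] data) {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int keyLength = in.readInt();
                int valueLength = in.readInt();
                if (keyLength == EOF_MARKER && valueLength == EOF_MARKER) {
                    break;
                }
                byte[] key = new byte[keyLength];
                byte[] value = new byte[valueLength];
                in.readFully(key);
                in.readFully(value);
                result.add(new String(key, StandardCharsets.UTF_8) + "="
                        + new String(value, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    static List<String> roundTrip(String[][] records) {
        return read(write(records));
    }

    public static void main(String[] args) {
        String[][] records = { {"k1", "v1"}, {"k2", "v2"} };
        System.out.println(roundTrip(records));
    }
}
```

A negative marker works here because no real record can have a negative length, so the reader needs no separate record count.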
The key piece here is out. It is created as follows:
    if (codec != null) {
      this.compressor = CodecPool.getCompressor(codec);
      if (this.compressor != null) {
        this.compressor.reset();
        this.compressedOut = codec.createOutputStream(checksumOut, compressor);
        this.out = new FSDataOutputStream(this.compressedOut, null);
        this.compressOutput = true;
      } else {
        LOG.warn("Could not obtain compressor from CodecPool");
        this.out = new FSDataOutputStream(checksumOut,null);
      }
    } else {
      this.out = new FSDataOutputStream(checksumOut,null);
    }
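This part explains how the intermediate results are compressed when a compression codec is passed in: out wraps the codec's compressed stream, which in turn writes to checksumOut. Without a codec, or when no compressor can be obtained, out writes to checksumOut directly.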
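The same layering can be reproduced with JDK classes alone. The sketch below is an analogy under stated assumptions, not Hadoop code: GZIPOutputStream stands in for the codec's stream, CheckedOutputStream with CRC32 stands in for checksumOut, and LayeredStreamSketch, encode, and decode are hypothetical names. The running checksum is computed but not stored, since the point is only the order in which the streams are stacked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: an optional compression layer stacked on a checksum layer.
class LayeredStreamSketch {

    static byte[] encode(byte[] payload, boolean compress) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try {
            // Bottom layer: sees exactly the bytes that reach storage.
            OutputStream checksumOut = new CheckedOutputStream(raw, new CRC32());
            // Top layer: compress only when asked to, mirroring the codec != null branch.
            OutputStream out = compress ? new GZIPOutputStream(checksumOut) : checksumOut;
            out.write(payload);
            out.close(); // finishes the compressed stream and flushes the layers below
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    static byte[] decode(byte[] stored, boolean compressed) {
        try {
            InputStream in = new ByteArrayInputStream(stored);
            if (compressed) {
                in = new GZIPInputStream(in);
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
            return result.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[10000];
        Arrays.fill(payload, (byte) 'a'); // highly repetitive, so it compresses well
        byte[] plain = encode(payload, false);
        byte[] packed = encode(payload, true);
        System.out.println("plain bytes:  " + plain.length);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(payload, decode(packed, true)));
    }
}
```

Because the compression layer sits above the checksum layer, the checksum covers the compressed bytes, which is the order the constructor code in the text sets up.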
A few conclusions: