In MapReduce, because of the sort step, the relationship between MapTask and ReduceTask is a workflow architecture rather than a dataflow architecture. Before a MapTask has finished and its output has been sorted and merged, the ReduceTask has no input data, so even if the ReduceTask has already been created it can only sleep and wait for the MapTasks to complete; only then can it fetch data from the MapTask nodes. The final output of a MapTask is a single merged spill file that can be fetched over a web address, so a ReduceTask is usually launched only when the MapTasks are close to finishing; starting it earlier wastes container resources.
The ReduceTask runs as a thread inside the YarnChild JVM. Let's walk through the Reduce phase starting from ReduceTask.run.
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, InterruptedException, ClassNotFoundException {
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
if (isMapOrReduce()) {
/* register the phases the reduce task will go through, so the current state can be reported back to the TaskTracker/MRAppMaster */
copyPhase = getProgress().addPhase("copy");
sortPhase = getProgress().addPhase("sort");
reducePhase = getProgress().addPhase("reduce");
}
// start thread that will handle communication with parent
TaskReporter reporter = startReporter(umbilical);
// set up and start the reporter thread so the task can communicate with the TaskTracker/parent
boolean useNewApi = job.getUseNewReducer();
//when the job is initialized on the job client, the new API is used by default; see Job.setUseNewAPI()
initialize(job, getJobID(), reporter, useNewApi);
/*initialize the task; mainly output-related setup such as creating the committer and setting the working directory*/
// check if it is a cleanupJobTask
/*the following if statements each perform the action appropriate to the task type; these methods all belong to the Task class, so they apply regardless of whether the task is a MapTask or a ReduceTask*/
if (jobCleanup) {
runJobCleanupTask(umbilical, reporter);
return;//this attempt exists only for job cleanup; once that is done it stops
}
if (jobSetup) {
runJobSetupTask(umbilical, reporter);
return;
//mainly creates the FileSystem object for the job's working directory
}
if (taskCleanup) {
runTaskCleanupTask(umbilical, reporter);
return;
//mark the task as being in its final phase and delete the working directory
}
Only from here on does the task actually run as a reducer.
// Initialize the codec
codec = initCodec();
RawKeyValueIterator rIter = null;
ShuffleConsumerPlugin shuffleConsumerPlugin = null;
Class combinerClass = conf.getCombinerClass();
CombineOutputCollector combineCollector =
(null != combinerClass) ?
new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;
//create the combineCollector only if a combiner is configured
Class<? extends ShuffleConsumerPlugin> clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);
//looked up from mapreduce.job.reduce.shuffle.consumer.plugin.class in the configuration; the default is Shuffle.class
shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job);
//instantiate the shuffle plugin object
LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);
ShuffleConsumerPlugin.Context shuffleContext =
new ShuffleConsumerPlugin.Context(getTaskID(), job, FileSystem.getLocal(job), umbilical,
super.lDirAlloc, reporter, codec,
combinerClass, combineCollector,
spilledRecordsCounter, reduceCombineInputCounter,
shuffledMapsCounter,
reduceShuffleBytes, failedShuffleCounter,
mergedMapOutputsCounter,
taskStatus, copyPhase, sortPhase, this,
mapOutputFile, localMapFiles);
//build the context object, a ShuffleConsumerPlugin.Context
shuffleConsumerPlugin.init(shuffleContext);
//what is actually called here is Shuffle's init function; its key parts are excerpted below
this.localMapFiles = context.getLocalMapFiles();
scheduler = new ShuffleSchedulerImpl(jobConf, taskStatus, reduceId,
this, copyPhase, context.getShuffledMapsCounter(),
context.getReduceShuffleBytes(), context.getFailedShuffleCounter());
//create the scheduler that shuffle needs
merger = createMergeManager(context);
//create the merge manager used inside shuffle; the source of createMergeManager is:
return new MergeManagerImpl(reduceId, jobConf, context.getLocalFS(),
context.getLocalDirAllocator(), reporter, context.getCodec(),
context.getCombinerClass(), context.getCombineCollector(),
context.getSpilledRecordsCounter(),
context.getReduceCombineInputCounter(),
context.getMergedMapOutputsCounter(), this, context.getMergePhase(),
context.getMapOutputFile());
//creates the MergeManagerImpl object and the merge threads
rIter = shuffleConsumerPlugin.run();
//copy the output files from every Mapper, merge-sort them, and block until this has completed
// free up the data structures
mapOutputFilesOnDisk.clear();
sortPhase.complete();
//the sort phase is complete
setPhase(TaskStatus.Phase.REDUCE);
//enter the reduce phase
statusUpdate(umbilical);
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
RawComparator comparator = job.getOutputValueGroupingComparator();
//the last stage of the Reduce task: it prepares the map output keyClass ("mapred.mapoutput.key.class" or "mapred.output.key.class"), valueClass ("mapred.mapoutput.value.class" or "mapred.output.value.class") and the grouping Comparator ("mapred.output.value.groupfn.class" or "mapred.output.key.comparator.class")
if (useNewApi) {
//depending on useNewApi, either runNewReducer or runOldReducer is executed; here we follow runNewReducer
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
//writes some information to the progress reporter; obtains a TaskAttemptContext and uses it to create the reducer, the output, the RecordWriter that tracks output statistics, and finally the Context used to collect the reduce results; then reducer.run(reducerContext) starts executing the reduce
} else { //old API
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
shuffleConsumerPlugin.close();
done(umbilical, reporter);
}
(1) The reduce task has three phases (copy: remotely fetching the map output data; sort: sorting all of that data; reduce: the aggregation done by the reducer we wrote ourselves). A Progress is registered for each of the three phases, used to report status back to the TaskTracker.
(2) Lines 15-40 of the code above are essentially the same as the corresponding part of the source-level analysis of the MapTask run method, which can be consulted for details;
(3) codec = initCodec() checks whether the map output is compressed; if so it returns a compression codec instance, otherwise null. Here we discuss the uncompressed case;
(4) We discuss fully distributed Hadoop, i.e. isLocal == false: a ReduceCopier object reduceCopier is constructed and reduceCopier.fetchOutputs() is called to copy every Mapper's output to the local node;
(5) The copy phase then completes, the next phase is set to sort, and the status information is updated;
(6) The KV iterator is chosen according to isLocal; in the fully distributed case reduceCopier.createKVIterator(job, rfs, reporter) is used as the KV iterator;
(7) The sort phase completes, the next phase is set to reduce, and the status information is updated;
(8) Some configuration is then read and, depending on whether the new API is in use, a different path is taken; here the new API applies, and calling runNewReducer(job, umbilical, reporter, rIter, comparator, keyClass, valueClass) runs the reducer;
(9) done(umbilical, reporter) performs the cleanup at the end of the task: it updates the counters via updateCounters(); if the task needs to commit, it sets the task state to COMMIT_PENDING, reports completion through TaskUmbilicalProtocol, waits for permission and then calls commit to commit the task; it sets the task-done flag; stops the Reporter communication thread; sends one last status report (sendLastUpdate); and reports the final state through TaskUmbilicalProtocol (sendDone).
Some people divide the Reduce Task into five stages: 1. the shuffle stage, also called the copy stage, which remotely copies a slice of data from each MapTask, writing it to disk if it exceeds a certain threshold and keeping it in memory otherwise; 2. the merge stage, where, while data is still being copied, the Reduce Task runs two background threads that merge the in-memory and on-disk files, to keep memory usage and the number of disk files under control; 3. the sort stage: the input to the user's reduce method must be grouped by key, so the copied data has to be sorted; a merge sort is used here, which works because each Map Task's output is already sorted; 4. the reduce stage, which hands each group of data in turn to the user's reduce method; 5. the write stage, which writes the results to HDFS.
Those five stages are a fairly fine-grained breakdown; the code itself distinguishes only three phases: copy, sort and reduce. When we run an MR program from Eclipse, the reduce percentage shown in the console is split across these three phases, each accounting for 33.3%.
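As a side note, that 33.3% split falls straight out of the Progress tree built with addPhase at the top of run(). Below is a minimal standalone sketch using org.apache.hadoop.util.Progress on its own (an illustration only, not taken from the task code; the real task advances phases through its own status reporting):

import org.apache.hadoop.util.Progress;

public class PhaseProgressDemo {
  public static void main(String[] args) {
    Progress root = new Progress();
    Progress copy = root.addPhase("copy");
    Progress sort = root.addPhase("sort");
    Progress reduce = root.addPhase("reduce");

    copy.complete();                  // copy phase finished
    root.startNextPhase();            // move the phase pointer to "sort"
    System.out.println(root.get());   // ~0.33

    sort.complete();                  // sort phase finished
    root.startNextPhase();            // move the phase pointer to "reduce"
    System.out.println(root.get());   // ~0.67

    reduce.complete();                // reduce phase finished
    System.out.println(root.get());   // 1.0
  }
}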
The shuffleConsumerPlugin here is an instance of some class that implements ShuffleConsumerPlugin. It can be chosen through the configuration option mapreduce.job.reduce.shuffle.consumer.plugin.class; by default Shuffle is used. As we analysed in the code, shuffleConsumerPlugin.run, normally Shuffle.run, has to complete first, because only through this step do the merged spill files produced by the Mappers travel over HTTP to the Reducer side; only once the data is there can runNewReducer or runOldReducer proceed. You can think of the shuffle object as the porter of the MapTask output. And it does not hand data to the Reducer while it is still carrying: it first hauls over all of the MapTask data, merges and sorts it, and only then makes it available to the corresponding Reducer.
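Because this is a plugin point, a job could in principle swap in its own implementation. A hedged sketch of reading and setting that property (Shuffle.class is the real default; any replacement would have to implement ShuffleConsumerPlugin):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.ShuffleConsumerPlugin;
import org.apache.hadoop.mapreduce.task.reduce.Shuffle;

public class ShufflePluginConfigDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Shuffle.class is the default; a custom class implementing ShuffleConsumerPlugin could be set instead
    conf.setClass("mapreduce.job.reduce.shuffle.consumer.plugin.class",
                  Shuffle.class, ShuffleConsumerPlugin.class);
    Class<? extends ShuffleConsumerPlugin> plugin =
        conf.getClass("mapreduce.job.reduce.shuffle.consumer.plugin.class",
                      Shuffle.class, ShuffleConsumerPlugin.class);
    System.out.println("Shuffle plugin in use: " + plugin.getName());
  }
}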
In general, MapTasks and ReduceTasks form a many-to-many relationship. Suppose there are M Mappers and N Reducers. The N Reducers correspond to N partitions, so each Mapper's output is divided into N partitions, with each Reducer responsible for one of them. Every Reducer therefore fetches its own share of data from each of the M different Mappers, so each Reducer ends up with M pieces of data from different Mappers; merging those M pieces yields one complete partition, which is sorted if necessary, and only then does it become the Reducer's actual input. This process of moving and reassembling data is called the shuffle. The shuffle is expensive: it consumes a lot of network bandwidth because large volumes of data are transferred, and it also introduces latency, because the M Mappers finish at different speeds while the shuffle can only start once all of them are done, and the Reduce in turn must wait for the shuffle to finish. Strictly speaking this latency is not caused by the shuffle itself: if the Reducer did not need the whole partition present and sorted, it would not have to synchronize with the slowest Mapper. That is the price paid for sorting.
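To make the M-to-N mapping concrete: which of the N partitions a map output record lands in is decided by the job's Partitioner. A minimal sketch with the same logic as the default HashPartitioner (Text/IntWritable are just example types):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class DemoPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    // mask off the sign bit so the result is always a valid index in [0, numReduceTasks)
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}

Every record a Mapper emits goes through this method, so each Mapper's output is split into numReduceTasks partitions, and Reducer i later fetches partition i from every Mapper.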
So the shuffle plays a very important role in the MapReduce framework. Let's first look at an outline of the Shuffle class:
public class Shuffle<K, V> implements ShuffleConsumerPlugin<K, V>, ExceptionReporter
private ShuffleConsumerPlugin.Context context;
private TaskAttemptID reduceId;
private JobConf jobConf;
private TaskUmbilicalProtocol umbilical;
private ShuffleSchedulerImpl<K, V> scheduler;
private MergeManager<K, V> merger;
private Task reduceTask; //Used for status updates
private Map<TaskAttemptID, MapOutputFile> localMapFiles;
public void init(ShuffleConsumerPlugin.Context context)
public RawKeyValueIterator run() throws IOException, InterruptedException
In ReduceTask.run we saw the call to shuffle.init; inside it the ShuffleSchedulerImpl and MergeManagerImpl objects are created. What they are used for is explained later.
After that comes the call to shuffle.run. Although Shuffle has a run method, it is not a thread; it merely uses that name.
Let's follow ReduceTask.run -> Shuffle.run:
public RawKeyValueIterator run() throws IOException, InterruptedException {
int eventsPerReducer = Math.max(MIN_EVENTS_TO_FETCH,
MAX_RPC_OUTSTANDING_EVENTS / jobConf.getNumReduceTasks());
int maxEventsToFetch = Math.min(MAX_EVENTS_TO_FETCH, eventsPerReducer);
// Start the map-completion events fetcher thread
final EventFetcher eventFetcher =
new EventFetcher(reduceId, umbilical, scheduler, this,
maxEventsToFetch);
eventFetcher.start();
//looking at EventFetcher we can see that it extends Thread, so it is a thread
// Start the map-output fetcher threads
boolean isLocal = localMapFiles != null;
final int numFetchers = isLocal ? 1 :
jobConf.getInt(MRJobConfig.SHUFFLE_PARALLEL_COPIES, 5);
Fetcher[] fetchers = new Fetcher[numFetchers];
//create an array of fetcher threads (effectively a small thread pool)
if (isLocal) {
//if the Mapper and the Reducer are on the same machine, fetch locally
fetchers[0] = new LocalFetcher(jobConf, reduceId, scheduler,
merger, reporter, metrics, this, reduceTask.getShuffleSecret(),
localMapFiles);
//LocalFetcher extends Fetcher and is also a thread
fetchers[0].start();//there is only one local Fetcher
} else {
//the Mappers and the Reducer are not on the same machine, so fetching has to go across several nodes
for (int i=0; i < numFetchers; ++i) {
//start all the Fetchers
fetchers[i] = new Fetcher(jobConf, reduceId, scheduler, merger,
reporter, metrics, this,
reduceTask.getShuffleSecret());
//create a Fetcher thread
fetchers[i].start();
//several cross-node Fetchers are needed, and all of them are started
}
}
// Wait for shuffle to complete successfully
while (!scheduler.waitUntilDone(PROGRESS_FREQUENCY)) {
reporter.progress();
//wait until all the Fetchers are done, reporting progress while waiting
synchronized (this) {
if (throwable != null) {
throw new ShuffleError("error in shuffle in " + throwingThreadName,
throwable);
}
}
}
// Stop the event-fetcher thread
eventFetcher.shutDown();
//shut down the eventFetcher, meaning the shuffle copy is complete and all MapTask data has been copied over
// Stop the map-output fetcher threads
for (Fetcher fetcher : fetchers) {
fetcher.shutDown();//shut down every fetcher
}
// stop the scheduler
scheduler.close();
//the shuffle scheduler is no longer needed either, so close it
copyPhase.complete(); // copy is already complete
//the file-copy phase is finished
What follows is the merge-sort part of the Reduce side.
taskStatus.setPhase(TaskStatus.Phase.SORT);
//switch the reported phase to SORT
reduceTask.statusUpdate(umbilical);
//report to the MRAppMaster through the umbilical and update the status
// Finish the on-going merges...
RawKeyValueIterator kvIter = null;
try {
kvIter = merger.close();
//finish the merging and sorting; on completion an iterator kvIter is returned
} catch (Throwable e) {
throw new ShuffleError("Error while doing final merge " , e);
}
// Sanity check
synchronized (this) {
if (throwable != null) {
throw new ShuffleError("error in shuffle in " + throwingThreadName,
throwable);
}
}
return kvIter;
}
There are only two ways data could move from the MapTask to the ReduceTask: the MapTask pushes it, or the ReduceTask pulls it. Hadoop uses the second, i.e. file copying. Before Shuffle.run is entered, ReduceTask.run has already called its init function, shuffleConsumerPlugin.init(shuffleContext), which creates the scheduler and the merge machinery used for merge-sorting; once inside run, the EventFetcher thread and a number of Fetcher threads are created. A Fetcher's job is to pull, i.e. to fetch data from the MapTask nodes. Be clear, though, that although EventFetcher is also a "fetcher", it fetches events, not the data itself; you can think of it as the event-driven control of the fetch process.
The number of Fetcher threads is not fixed either. In Uber mode the MapTask and the ReduceTask are on the same node and there is only one MapTask, so a single Fetcher is enough, and that Fetcher is a LocalFetcher. Outside Uber mode there may be many MapTasks, generally not on the same node as the ReduceTask; in that case the number of Fetchers is configurable, with a default of 5. The fetchers array effectively acts as the Fetcher thread pool.
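That default of 5 is MRJobConfig.SHUFFLE_PARALLEL_COPIES, i.e. the property mapreduce.reduce.shuffle.parallelcopies. A small example of raising it for a job with many map tasks (the value 10 is arbitrary):

import org.apache.hadoop.conf.Configuration;

public class ShuffleParallelCopiesDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // number of Fetcher threads each ReduceTask uses to copy map output in parallel (default 5)
    conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
    System.out.println(conf.getInt("mapreduce.reduce.shuffle.parallelcopies", 5));
  }
}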
After the EventFetcher and the Fetcher pool are created, the code enters the while loop, but the loop itself does nothing except wait; all the real work is done in the threads, i.e. by the EventFetcher and the Fetchers. The EventFetcher plays the crucial role of the hub.
Let's look at an excerpt of the EventFetcher source, keeping only the key parts:
class EventFetcher extends Thread {
private final TaskAttemptID reduce;
private final TaskUmbilicalProtocol umbilical;
private final ShuffleScheduler scheduler;
private final int maxEventsToFetch;
public void run() {
int failures = 0;
LOG.info(reduce + " Thread started: " + getName());
try {
while (!stopped && !Thread.currentThread().isInterrupted()) {//as long as the thread has not been stopped or interrupted
try {
int numNewMaps = getMapCompletionEvents();
//fetch the map-completion events; next let's look at the getMapCompletionEvents source:
protected int getMapCompletionEvents()
throws IOException, InterruptedException {
int numNewMaps = 0;
TaskCompletionEvent events[] = null;
do {
MapTaskCompletionEventsUpdate update =
umbilical.getMapCompletionEvents(
(org.apache.hadoop.mapred.JobID)reduce.getJobID(),
fromEventIdx,
maxEventsToFetch,
(org.apache.hadoop.mapred.TaskAttemptID)reduce);
//through the umbilical, fetch the map-completion event reports from the MRAppMaster
events = update.getMapTaskCompletionEvents();
//get the details of which specific MapTasks have finished running
LOG.debug("Got " + events.length + " map completion events from " +
fromEventIdx);
assert !update.shouldReset() : "Unexpected legacy state";
//an assertion
// Update the last seen event ID
fromEventIdx += events.length;
// Process the TaskCompletionEvents:
// 1. Save the SUCCEEDED maps in knownOutputs to fetch the outputs.
// 2. Save the OBSOLETE/FAILED/KILLED maps in obsoleteOutputs to stop
// fetching from those maps.
// 3. Remove TIPFAILED maps from neededOutputs since we don't need their
// outputs at all.
for (TaskCompletionEvent event : events) {
//for each event report obtained
scheduler.resolve(event);
//this calls ShuffleSchedulerImpl.resolve; its source is as follows:
public void resolve(TaskCompletionEvent event) {
switch (event.getTaskStatus()) {
case SUCCEEDED://if the map attempt succeeded
URI u = getBaseURI(reduceId, event.getTaskTrackerHttp());//build its URI
addKnownMapOutput(u.getHost() + ":" + u.getPort(),
u.toString(),
event.getTaskAttemptId());
//record the host of this MapTask's node for the Fetchers to use; the source of getBaseURI:
static URI getBaseURI(TaskAttemptID reduceId, String url) {
StringBuffer baseUrl = new StringBuffer(url);
if (!url.endsWith("/")) {
baseUrl.append("/");
}
baseUrl.append("mapOutput?job=");
baseUrl.append(reduceId.getJobID());
baseUrl.append("&reduce=");
baseUrl.append(reduceId.getTaskID().getId());
baseUrl.append("&map=");
URI u = URI.create(baseUrl.toString());
return u;
獲取各類信息,而後添加都URI對象中。
}
Back to the source:
maxMapRuntime = Math.max(maxMapRuntime, event.getTaskRunTime());
//track the longest map runtime seen so far
break;
case FAILED:
case KILLED:
case OBSOLETE://the MapTask's output is unusable (failed, killed or obsolete)
obsoleteMapOutput(event.getTaskAttemptId());//mark this attempt's output as obsolete
LOG.info("Ignoring obsolete output of " + event.getTaskStatus() +
" map-task: '" + event.getTaskAttemptId() + "'");//寫日誌
break;
case TIPFAILED://the whole task (TIP) failed
tipFailed(event.getTaskAttemptId().getTaskID());
LOG.info("Ignoring output of failed map TIP: '" +
event.getTaskAttemptId() + "'");//log it
break;
}
}
Back to the source:
if (TaskCompletionEvent.Status.SUCCEEDED == event.getTaskStatus()) {//if the event reports success
++numNewMaps;//count one more newly completed map
}
}
} while (events.length == maxEventsToFetch);
return numNewMaps;
}
Back to the source:
failures = 0;
if (numNewMaps > 0) {
LOG.info(reduce + ": " + "Got " + numNewMaps + " new map-outputs");
}
LOG.debug("GetMapEventsThread about to sleep for " + SLEEP_TIME);
if (!Thread.currentThread().isInterrupted()) {
Thread.sleep(SLEEP_TIME);
}
} catch (InterruptedException e) {
LOG.info("EventFetcher is interrupted.. Returning");
return;
} catch (IOException ie) {
LOG.info("Exception in getting events", ie);
// check to see whether to abort
if (++failures >= MAX_RETRIES) {
throw new IOException("too many failures downloading events", ie);//失敗數量大於重試的數量
}
// sleep for a bit
if (!Thread.currentThread().isInterrupted()) {
Thread.sleep(RETRY_PERIOD);
}
}
}
} catch (InterruptedException e) {
return;
} catch (Throwable t) {
exceptionReporter.reportException(t);
return;
}
}
MapTask and ReduceTask have no direct relationship: a MapTask does not know which nodes the ReduceTasks are on, it only reports its progress events to the MRAppMaster. The ReduceTask calls getMapCompletionEvents over the "umbilical" to ask the MRAppMaster for reports of MapTasks that have finished running. A few MapTasks may fail, but the vast majority succeed, and for every success a Fetcher is sent to retrieve the output data. That information flows through the scheduler, i.e. the ShuffleSchedulerImpl object; there is only the one, and it is an ordinary object, not a thread.
fetchers behaves like a thread pool holding several threads (5 by default); these threads wait for notification triggered by the EventFetcher and, as soon as a MapTask has completed, go and fetch its data.
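The hand-off between the EventFetcher and the Fetchers goes through the scheduler: addKnownMapOutput (seen in resolve above) registers a host that has finished maps and wakes the fetchers, and getHost blocks until such a host exists. A much-simplified sketch of that producer/consumer pattern (this is not the real ShuffleSchedulerImpl, which additionally tracks penalties, duplicate copies and per-host limits):

import java.util.ArrayDeque;
import java.util.Queue;

// simplified stand-in for the pendingHosts bookkeeping in ShuffleSchedulerImpl
class SimpleShuffleScheduler {
  private final Queue<String> pendingHosts = new ArrayDeque<String>();

  // called from the EventFetcher thread when a map-completion event arrives
  public synchronized void addKnownMapOutput(String hostAndPort) {
    pendingHosts.add(hostAndPort);
    notifyAll();                      // wake up any Fetcher waiting in getHost()
  }

  // called from a Fetcher thread; blocks until there is a host to copy from
  public synchronized String getHost() throws InterruptedException {
    while (pendingHosts.isEmpty()) {
      wait();
    }
    return pendingHosts.poll();
  }
}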
Let's look at the run method of the Fetcher thread class:
public void run() {
try {
while (!stopped && !Thread.currentThread().isInterrupted()) {
MapHost host = null;
try {
// If merge is on, block
merger.waitForResource();
// Get a host to shuffle from
host = scheduler.getHost();
//get from the scheduler a host whose MapTasks have completed successfully
metrics.threadBusy();
//mark the thread as busy
// Shuffle
copyFromHost(host);
//start copying this host's data
} finally {
if (host != null) {//the MapHost may still have outputs left to fetch
scheduler.freeHost(host);
//set the host's state back to idle so it can be scheduled again
metrics.threadFree();
}
}
}
} catch (InterruptedException ie) {
return;
} catch (Throwable t) {
exceptionReporter.reportException(t);
}
}
The key part here is copyFromHost, the function that actually fetches the data.
protected void copyFromHost(MapHost host) throws IOException {
// reset retryStartTime for a new host
//this runs on the ReduceTask's node
retryStartTime = 0;
// Get completed maps on 'host'
List<TaskAttemptID> maps = scheduler.getMapsForHost(host);
//get the set of completed MapTasks on the target host
// Sanity check to catch hosts with only 'OBSOLETE' maps,
// especially at the tail of large jobs
if (maps.size() == 0) {
return;//nothing completed here, so just return
}
if(LOG.isDebugEnabled()) {
LOG.debug("Fetcher " + id + " going to fetch from " + host + " for: "
+ maps);
}
// List of maps to be fetched yet
Set remaining = new HashSet(maps);
//the set of MapTasks that have completed and are still waiting to be shuffled
// Construct the url and connect
DataInputStream input = null;
URL url = getMapOutputURL(host, maps);
//build the URL of the node hosting the MapTasks; next, the getMapOutputURL source:
private URL getMapOutputURL(MapHost host, Collection maps
) throws MalformedURLException {
// Get the base url
StringBuffer url = new StringBuffer(host.getBaseUrl());
boolean first = true;
for (TaskAttemptID mapId : maps) {
if (!first) {
url.append(",");
}
url.append(mapId);//append each map ID to the URL
first = false;
}
LOG.debug("MapOutput URL for " + host + " -> " + url.toString());
//log it
return new URL(url.toString());
//return the URL
}
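For illustration only (the job ID, attempt IDs and host name below are made up, and 13562 is the default shuffle port), the URL assembled here ends up looking roughly like:

http://node1.example.com:13562/mapOutput?job=job_1540000000000_0001&reduce=0&map=attempt_1540000000000_0001_m_000003_0,attempt_1540000000000_0001_m_000007_0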
Back to the main code:
try {
setupConnectionsWithRetry(host, remaining, url);
//establish the HTTP connection to the remote host; setupConnectionsWithRetry uses openConnectionWithRetry to open the link:
openConnectionWithRetry(host, remaining, url);
That code in turn uses openConnection(url); let's keep following it.
The main steps of establishing the connection are:
protected synchronized void openConnection(URL url)
throws IOException {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
//the connection is made through HttpURLConnection
if (sslShuffle) {//if shuffle runs over SSL with a trusted certificate
HttpsURLConnection httpsConn = (HttpsURLConnection) conn;
//cast conn to the HTTPS type
try {
httpsConn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());//install an SSL socket factory
} catch (GeneralSecurityException ex) {
throw new IOException(ex);
}
httpsConn.setHostnameVerifier(sslFactory.getHostnameVerifier());
}
connection = conn;
}
setupConnectionsWithRetry then continues with:
setupShuffleConnection(encHash);
//set up the shuffle connection
connect(connection, connectionTimeout);
// verify that the thread wasn't stopped during calls to connect
if (stopped) {
return;
}
verifyConnection(url, msgToEncode, encHash);
}
//at this point the connection has been verified
if (stopped) {
abortConnect(host, remaining);
//this aborts the connection; step into it to see how it puts the remaining maps back and handles the waiting state
return;
}
} catch (IOException ie) {
boolean connectExcpt = ie instanceof ConnectException;
ioErrs.increment(1);
LOG.warn("Failed to connect to " + host + " with " + remaining.size() +
" map outputs", ie);
Back to the main code:
input = new DataInputStream(connection.getInputStream());
//create an input stream object over the connection
try {
// Loop through available map-outputs and fetch them
// On any error, faildTasks is not null and we exit
// after putting back the remaining maps to the
// yet_to_be_fetched list and marking the failed tasks.
TaskAttemptID[] failedTasks = null;
while (!remaining.isEmpty() && failedTasks == null) {
//as long as the list of maps to fetch is not empty and no task has failed yet
try {
failedTasks = copyMapOutput(host, input, remaining, fetchRetryEnabled);
//copy the data out; the source of copyMapOutput is as follows:
try {
ShuffleHeader header = new ShuffleHeader();
header.readFields(input);
mapId = TaskAttemptID.forName(header.mapId);
//get the map ID
compressedLength = header.compressedLength;
decompressedLength = header.uncompressedLength;
forReduce = header.forReduce;
} catch (IllegalArgumentException e) {
badIdErrs.increment(1);
LOG.warn("Invalid map id ", e);
//Don't know which one was bad, so consider all of them as bad
return remaining.toArray(new TaskAttemptID[remaining.size()]);
}
InputStream is = input;
is = CryptoUtils.wrapIfNecessary(jobConf, is, compressedLength);
compressedLength -= CryptoUtils.cryptoPadding(jobConf);
decompressedLength -= CryptoUtils.cryptoPadding(jobConf);
//if decryption or decompression is needed
// Do some basic sanity verification
if (!verifySanity(compressedLength, decompressedLength, forReduce,
remaining, mapId)) {
return new TaskAttemptID[] {mapId};
}
if(LOG.isDebugEnabled()) {
LOG.debug("header: " + mapId + ", len: " + compressedLength +
", decomp len: " + decompressedLength);
}
try {
mapOutput = merger.reserve(mapId, decompressedLength, id);
//reserve a MapOutput for the merge: either in memory or on disk
} catch (IOException ioe) {
// kill this reduce attempt
ioErrs.increment(1);
scheduler.reportLocalError(ioe);
//report the error
return EMPTY_ATTEMPT_ID_ARRAY;
}
// Check if we can shuffle *now* ...
if (mapOutput == null) {
LOG.info("fetcher#" + id + " - MergeManager returned status WAIT ...");
//Not an error but wait to process data.
return EMPTY_ATTEMPT_ID_ARRAY;
}
// The codec for lz0,lz4,snappy,bz2,etc. throw java.lang.InternalError
// on decompression failures. Catching and re-throwing as IOException
// to allow fetch failure logic to be processed
try {
// Go!
LOG.info("fetcher#" + id + " about to shuffle output of map "
+ mapOutput.getMapId() + " decomp: " + decompressedLength
+ " len: " + compressedLength + " to " + mapOutput.getDescription());
mapOutput.shuffle(host, is, compressedLength, decompressedLength,
metrics, reporter);
//copy the Mapper's file contents across the node boundary into the reducer's memory or onto its disk
} catch (java.lang.InternalError e) {
LOG.warn("Failed to shuffle for fetcher#"+id, e);
throw new IOException(e);
}
// Inform the shuffle scheduler
long endTime = Time.monotonicNow();
// Reset retryStartTime as map task make progress if retried before.
retryStartTime = 0;
scheduler.copySucceeded(mapId, host, compressedLength,
startTime, endTime, mapOutput);
//tell the scheduler that the copy of this node's map output file has completed
remaining.remove(mapId);
//this MapTask's output has now been shuffled
metrics.successFetch();
return null; // the exception and failure handling that follows is not covered here
The mapOutput here is the storage that will hold a MapTask's output file; depending on the size of that output and on the memory situation, it can be an in-memory output or an on-disk output. If it is in memory it has to be reserved, because there is more than one Fetcher. We take InMemoryMapOutput as the example.
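A much-simplified sketch of the decision merger.reserve (MergeManagerImpl.reserve) makes; the field names mirror that class, but this is an illustration, not the real implementation (the in-memory budget is derived from properties such as mapreduce.reduce.shuffle.input.buffer.percent and mapreduce.reduce.shuffle.memory.limit.percent):

// illustration of the memory-vs-disk choice; returns a label instead of a MapOutput object
class ReserveSketch {
  private final long memoryLimit;            // total in-memory shuffle budget
  private final long maxSingleShuffleLimit;  // largest single map output allowed in memory
  private long usedMemory = 0;

  ReserveSketch(long memoryLimit, long maxSingleShuffleLimit) {
    this.memoryLimit = memoryLimit;
    this.maxSingleShuffleLimit = maxSingleShuffleLimit;
  }

  /** The real method returns an OnDiskMapOutput, an InMemoryMapOutput, or null (wait). */
  synchronized String reserve(long requestedSize) {
    if (requestedSize > maxSingleShuffleLimit) {
      return "DISK";     // too large for memory: shuffle straight to local disk
    }
    if (usedMemory + requestedSize > memoryLimit) {
      return "WAIT";     // budget exhausted: the Fetcher backs off and retries later
    }
    usedMemory += requestedSize;
    return "MEMORY";     // reserve an in-memory buffer (what InMemoryMapOutput wraps)
  }
}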
Code path:
Fetcher.run --> copyFromHost --> copyMapOutput --> merger.reserve (MergeManagerImpl.reserve) --> InMemoryMapOutput.shuffle
public void shuffle(MapHost host, InputStream input,
long compressedLength, long decompressedLength,
ShuffleClientMetrics metrics,
Reporter reporter) throws IOException {
//copy the spill data from the Mapper across the network
IFileInputStream checksumIn =
new IFileInputStream(input, compressedLength, conf);
//an input stream with checksum verification
input = checksumIn;
// Are map-outputs compressed?
if (codec != null) {
//if compression is involved
decompressor.reset();
//reset the decompressor
input = codec.createInputStream(input, decompressor);
//wrap the input stream with the decompressor
}
try {
IOUtils.readFully(input, memory, 0, memory.length);
//read this partition's data from the Mapper side into the Reducer's in-memory buffer
metrics.inputBytes(memory.length);
reporter.progress();//report progress
LOG.info("Read " + memory.length + " bytes from map-output for " +
getMapId());
/**
* We've gotten the amount of data we were expecting. Verify the
* decompressor has nothing more to offer. This action also forces the
* decompressor to read any trailing bytes that weren't critical
* for decompression, which is necessary to keep the stream
* in sync.
*/
if (input.read() >= 0 ) {
throw new IOException("Unexpected extra bytes from input stream for " +
getMapId());
}
} catch (IOException ioe) {
// Close the streams
IOUtils.cleanup(LOG, input);
// Re-throw
throw ioe;
} finally {
CodecPool.returnDecompressor(decompressor);
//return the decompressor to the pool
}
}
Once the data belonging to this partition has been copied over from the remote spill file, control returns to copyFromHost, the scheduler is told through scheduler.copySucceeded, the MapTask's ID is removed from the remaining set, and the loop moves on to copy the next MapTask's data, until all the data belonging to this partition has been copied.
That is the Fetcher process on the Reducer side: it sends HTTP GET requests to the Mapper side and downloads the files. On the MapTask side there is a corresponding server; we will not dig into the source of this network protocol here, but feel free to study it on your own.