In MapReduce, because of the sort step, the relationship between MapTask and ReduceTask is a workflow architecture rather than a dataflow architecture. Before a MapTask has finished and its output has been sorted and merged, the ReduceTask has no input data, so even if the ReduceTask has already been created it can only sleep and wait for the MapTasks to complete; only then can it fetch data from the MapTask nodes. The final output of a MapTask is a single merged spill file that can be fetched over a web address, so a ReduceTask is usually launched only when the MapTasks are close to finishing; starting it earlier wastes container resources.
The ReduceTask runs as a thread inside the YarnChild JVM. Let's walk through the Reduce phase starting from ReduceTask.run.
public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
throws IOException, InterruptedException, ClassNotFoundException {
job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
if (isMapOrReduce()) {
/* register the phases the reduce task will go through, so the current state can be reported back to the TaskTracker/MRAppMaster */
copyPhase = getProgress().addPhase("copy");
sortPhase = getProgress().addPhase("sort");
reducePhase = getProgress().addPhase("reduce");
}
// start thread that will handle communication with parent
TaskReporter reporter = startReporter(umbilical);
// set up and start the reporter thread so the task can communicate with the TaskTracker/parent
boolean useNewApi = job.getUseNewReducer();
//when the job is initialized on the job client, the new API is used by default; see Job.setUseNewAPI()
initialize(job, getJobID(), reporter, useNewApi);
/*initialize the task; mainly output-related setup such as creating the committer and setting the working directory*/
// check if it is a cleanupJobTask
/*the following if statements each perform the action appropriate to the task type; these methods all belong to the Task class, so they apply regardless of whether the task is a MapTask or a ReduceTask*/
if (jobCleanup) {
runJobCleanupTask(umbilical, reporter);
return;//this attempt exists only for job cleanup; once that is done it stops
}
if (jobSetup) {
runJobSetupTask(umbilical, reporter);
return;
//mainly creates the FileSystem object for the job's working directory
}
if (taskCleanup) {
runTaskCleanupTask(umbilical, reporter);
return;
//mark the task as being in its final phase and delete the working directory
}
Only from here on does the task actually run as a reducer.
// Initialize the codec
codec = initCodec();
RawKeyValueIterator rIter = null;
ShuffleConsumerPlugin shuffleConsumerPlugin = null;
Class combinerClass = conf.getCombinerClass();
CombineOutputCollector combineCollector =
(null != combinerClass) ?
new CombineOutputCollector(reduceCombineOutputCounter, reporter, conf) : null;
//create the combineCollector only if a combiner is configured
Class<? extends ShuffleConsumerPlugin> clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);
//looked up from mapreduce.job.reduce.shuffle.consumer.plugin.class in the configuration; the default is Shuffle.class
shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job);
//instantiate the shuffle plugin object
LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);
ShuffleConsumerPlugin.Context shuffleContext =
new ShuffleConsumerPlugin.Context(getTaskID(), job, FileSystem.getLocal(job), umbilical,
super.lDirAlloc, reporter, codec,
combinerClass, combineCollector,
spilledRecordsCounter, reduceCombineInputCounter,
shuffledMapsCounter,
reduceShuffleBytes, failedShuffleCounter,
mergedMapOutputsCounter,
taskStatus, copyPhase, sortPhase, this,
mapOutputFile, localMapFiles);
//build the context object, a ShuffleConsumerPlugin.Context
shuffleConsumerPlugin.init(shuffleContext);
//what is actually called here is Shuffle's init function; its key parts are excerpted below
this.localMapFiles = context.getLocalMapFiles();
scheduler = new ShuffleSchedulerImpl(jobConf, taskStatus, reduceId,
this, copyPhase, context.getShuffledMapsCounter(),
context.getReduceShuffleBytes(), context.getFailedShuffleCounter());
//create the scheduler that shuffle needs
merger = createMergeManager(context);
//create the merge manager used inside shuffle; the source of createMergeManager is:
return new MergeManagerImpl(reduceId, jobConf, context.getLocalFS(),
context.getLocalDirAllocator(), reporter, context.getCodec(),
context.getCombinerClass(), context.getCombineCollector(),
context.getSpilledRecordsCounter(),
context.getReduceCombineInputCounter(),
context.getMergedMapOutputsCounter(), this, context.getMergePhase(),
context.getMapOutputFile());
//creates the MergeManagerImpl object and the merge threads
rIter = shuffleConsumerPlugin.run();
//copy the output files from every Mapper, merge-sort them, and block until this has completed
// free up the data structures
mapOutputFilesOnDisk.clear();
sortPhase.complete();
//the sort phase is complete
setPhase(TaskStatus.Phase.REDUCE);
//enter the reduce phase
statusUpdate(umbilical);
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
RawComparator comparator = job.getOutputValueGroupingComparator();
//the last stage of the Reduce task: it prepares the map output keyClass ("mapred.mapoutput.key.class" or "mapred.output.key.class"), valueClass ("mapred.mapoutput.value.class" or "mapred.output.value.class") and the grouping Comparator ("mapred.output.value.groupfn.class" or "mapred.output.key.comparator.class")
if (useNewApi) {
//depending on useNewApi, either runNewReducer or runOldReducer is executed; here we follow runNewReducer
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
//writes some information to the progress reporter; obtains a TaskAttemptContext and uses it to create the reducer, the output, the RecordWriter that tracks output statistics, and finally the Context used to collect the reduce results; then reducer.run(reducerContext) starts executing the reduce
} else { //old API
runOldReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
shuffleConsumerPlugin.close();
done(umbilical, reporter);
}
(1) The reduce task has three phases (copy: remotely fetching the map output data; sort: sorting all of that data; reduce: the aggregation done by the reducer we wrote ourselves). A Progress is registered for each of the three phases, used to report status back to the TaskTracker.
(2) Lines 15-40 of the code above are essentially the same as the corresponding part of the source-level analysis of the MapTask run method, which can be consulted for details;
(3) codec = initCodec() checks whether the map output is compressed; if so it returns a compression codec instance, otherwise null. Here we discuss the uncompressed case;
(4) We discuss fully distributed Hadoop, i.e. isLocal == false: a ReduceCopier object reduceCopier is constructed and reduceCopier.fetchOutputs() is called to copy every Mapper's output to the local node;
(5) The copy phase then completes, the next phase is set to sort, and the status information is updated;
(6) The KV iterator is chosen according to isLocal; in the fully distributed case reduceCopier.createKVIterator(job, rfs, reporter) is used as the KV iterator;
(7) The sort phase completes, the next phase is set to reduce, and the status information is updated;
(8) Some configuration is then read and, depending on whether the new API is in use, a different path is taken; here the new API applies, and calling runNewReducer(job, umbilical, reporter, rIter, comparator, keyClass, valueClass) runs the reducer;
(9) done(umbilical, reporter) performs the cleanup at the end of the task: it updates the counters via updateCounters(); if the task needs to commit, it sets the task state to COMMIT_PENDING, reports completion through TaskUmbilicalProtocol, waits for permission and then calls commit to commit the task; it sets the task-done flag; stops the Reporter communication thread; sends one last status report (sendLastUpdate); and reports the final state through TaskUmbilicalProtocol (sendDone).
Some people divide the Reduce Task into five stages: 1. the shuffle stage, also called the copy stage, which remotely copies a slice of data from each MapTask, writing it to disk if it exceeds a certain threshold and keeping it in memory otherwise; 2. the merge stage, where, while data is still being copied, the Reduce Task runs two background threads that merge the in-memory and on-disk files, to keep memory usage and the number of disk files under control; 3. the sort stage: the input to the user's reduce method must be grouped by key, so the copied data has to be sorted; a merge sort is used here, which works because each Map Task's output is already sorted; 4. the reduce stage, which hands each group of data in turn to the user's reduce method; 5. the write stage, which writes the results to HDFS.
Those five stages are a fairly fine-grained breakdown; the code itself distinguishes only three phases: copy, sort and reduce. When we run an MR program from Eclipse, the reduce percentage shown in the console is split across these three phases, each accounting for 33.3%.
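As a side note, that 33.3% split falls straight out of the Progress tree built with addPhase at the top of run(). Below is a minimal standalone sketch using org.apache.hadoop.util.Progress on its own (an illustration only, not taken from the task code; the real task advances phases through its own status reporting):

import org.apache.hadoop.util.Progress;

public class PhaseProgressDemo {
  public static void main(String[] args) {
    Progress root = new Progress();
    Progress copy = root.addPhase("copy");
    Progress sort = root.addPhase("sort");
    Progress reduce = root.addPhase("reduce");

    copy.complete();                  // copy phase finished
    root.startNextPhase();            // move the phase pointer to "sort"
    System.out.println(root.get());   // ~0.33

    sort.complete();                  // sort phase finished
    root.startNextPhase();            // move the phase pointer to "reduce"
    System.out.println(root.get());   // ~0.67

    reduce.complete();                // reduce phase finished
    System.out.println(root.get());   // 1.0
  }
}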
The shuffleConsumerPlugin here is an instance of some class that implements ShuffleConsumerPlugin. It can be chosen through the configuration option mapreduce.job.reduce.shuffle.consumer.plugin.class; by default Shuffle is used. As we analysed in the code, shuffleConsumerPlugin.run, normally Shuffle.run, has to complete first, because only through this step do the merged spill files produced by the Mappers travel over HTTP to the Reducer side; only once the data is there can runNewReducer or runOldReducer proceed. You can think of the shuffle object as the porter of the MapTask output. And it does not hand data to the Reducer while it is still carrying: it first hauls over all of the MapTask data, merges and sorts it, and only then makes it available to the corresponding Reducer.
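Because this is a plugin point, a job could in principle swap in its own implementation. A hedged sketch of reading and setting that property (Shuffle.class is the real default; any replacement would have to implement ShuffleConsumerPlugin):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.ShuffleConsumerPlugin;
import org.apache.hadoop.mapreduce.task.reduce.Shuffle;

public class ShufflePluginConfigDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Shuffle.class is the default; a custom class implementing ShuffleConsumerPlugin could be set instead
    conf.setClass("mapreduce.job.reduce.shuffle.consumer.plugin.class",
                  Shuffle.class, ShuffleConsumerPlugin.class);
    Class<? extends ShuffleConsumerPlugin> plugin =
        conf.getClass("mapreduce.job.reduce.shuffle.consumer.plugin.class",
                      Shuffle.class, ShuffleConsumerPlugin.class);
    System.out.println("Shuffle plugin in use: " + plugin.getName());
  }
}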
In general, MapTasks and ReduceTasks form a many-to-many relationship. Suppose there are M Mappers and N Reducers. The N Reducers correspond to N partitions, so each Mapper's output is divided into N partitions, with each Reducer responsible for one of them. Every Reducer therefore fetches its own share of data from each of the M different Mappers, so each Reducer ends up with M pieces of data from different Mappers; merging those M pieces yields one complete partition, which is sorted if necessary, and only then does it become the Reducer's actual input. This process of moving and reassembling data is called the shuffle. The shuffle is expensive: it consumes a lot of network bandwidth because large volumes of data are transferred, and it also introduces latency, because the M Mappers finish at different speeds while the shuffle can only start once all of them are done, and the Reduce in turn must wait for the shuffle to finish. Strictly speaking this latency is not caused by the shuffle itself: if the Reducer did not need the whole partition present and sorted, it would not have to synchronize with the slowest Mapper. That is the price paid for sorting.
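To make the M-to-N mapping concrete: which of the N partitions a map output record lands in is decided by the job's Partitioner. A minimal sketch with the same logic as the default HashPartitioner (Text/IntWritable are just example types):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class DemoPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numReduceTasks) {
    // mask off the sign bit so the result is always a valid index in [0, numReduceTasks)
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}

Every record a Mapper emits goes through this method, so each Mapper's output is split into numReduceTasks partitions, and Reducer i later fetches partition i from every Mapper.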
So the shuffle plays a very important role in the MapReduce framework. Let's first look at an outline of the Shuffle class:
public class Shuffle<K, V> implements ShuffleConsumerPlugin<K, V>, ExceptionReporter
private ShuffleConsumerPlugin.Context context;
private TaskAttemptID reduceId;
private JobConf jobConf;
private TaskUmbilicalProtocol umbilical;
private ShuffleSchedulerImpl<K, V> scheduler;
private MergeManager<K, V> merger;
private Task reduceTask; //Used for status updates
private Map<TaskAttemptID, MapOutputFile> localMapFiles;
public void init(ShuffleConsumerPlugin.Context context)
public RawKeyValueIterator run() throws IOException, InterruptedException
In ReduceTask.run we saw the call to shuffle.init; inside it the ShuffleSchedulerImpl and MergeManagerImpl objects are created. What they are used for is explained later.
After that comes the call to shuffle.run. Although Shuffle has a run method, it is not a thread; it merely uses that name.
Let's follow ReduceTask.run -> Shuffle.run:
public RawKeyValueIterator run() throws IOException, InterruptedException {
int eventsPerReducer = Math.max(MIN_EVENTS_TO_FETCH,
MAX_RPC_OUTSTANDING_EVENTS / jobConf.getNumReduceTasks());
int maxEventsToFetch = Math.min(MAX_EVENTS_TO_FETCH, eventsPerReducer);
// Start the map-completion events fetcher thread
final EventFetcher eventFetcher =
new EventFetcher(reduceId, umbilical, scheduler, this,
maxEventsToFetch);
eventFetcher.start();
//looking at EventFetcher we can see that it extends Thread, so it is a thread
// Start the map-output fetcher threads
boolean isLocal = localMapFiles != null;
final int numFetchers = isLocal ? 1 :
jobConf.getInt(MRJobConfig.SHUFFLE_PARALLEL_COPIES, 5);
Fetcher[] fetchers = new Fetcher[numFetchers];
//create an array of fetcher threads (effectively a small thread pool)
if (isLocal) {
//if the Mapper and the Reducer are on the same machine, fetch locally
fetchers[0] = new LocalFetcher(jobConf, reduceId, scheduler,
merger, reporter, metrics, this, reduceTask.getShuffleSecret(),
localMapFiles);
//LocalFetcher extends Fetcher and is also a thread
fetchers[0].start();//there is only one local Fetcher
} else {
//the Mappers and the Reducer are not on the same machine, so fetching has to go across several nodes
for (int i=0; i < numFetchers; ++i) {
//start all the Fetchers
fetchers[i] = new Fetcher(jobConf, reduceId, scheduler, merger,
reporter, metrics, this,
reduceTask.getShuffleSecret());
//create a Fetcher thread
fetchers[i].start();
//several cross-node Fetchers are needed, and all of them are started
}
}
// Wait for shuffle to complete successfully
while (!scheduler.waitUntilDone(PROGRESS_FREQUENCY)) {
reporter.progress();
//wait until all the Fetchers are done, reporting progress while waiting
synchronized (this) {
if (throwable != null) {
throw new ShuffleError("error in shuffle in " + throwingThreadName,
throwable);
}
}
}
// Stop the event-fetcher thread
eventFetcher.shutDown();
//shut down the eventFetcher, meaning the shuffle copy is complete and all MapTask data has been copied over
// Stop the map-output fetcher threads
for (Fetcher fetcher : fetchers) {
fetcher.shutDown();//shut down every fetcher
}
// stop the scheduler
scheduler.close();
//the shuffle scheduler is no longer needed either, so close it
copyPhase.complete(); // copy is already complete
//the file-copy phase is finished
What follows is the merge-sort part of the Reduce side.
taskStatus.setPhase(TaskStatus.Phase.SORT);
//switch the reported phase to SORT
reduceTask.statusUpdate(umbilical);
//report to the MRAppMaster through the umbilical and update the status
// Finish the on-going merges...
RawKeyValueIterator kvIter = null;
try {
kvIter = merger.close();
//finish the merging and sorting; on completion an iterator kvIter is returned
} catch (Throwable e) {
throw new ShuffleError("Error while doing final merge " , e);
}
// Sanity check
synchronized (this) {
if (throwable != null) {
throw new ShuffleError("error in shuffle in " + throwingThreadName,
throwable);
}
}
return kvIter;
}
There are only two ways data could move from the MapTask to the ReduceTask: the MapTask pushes it, or the ReduceTask pulls it. Hadoop uses the second, i.e. file copying. Before Shuffle.run is entered, ReduceTask.run has already called its init function, shuffleConsumerPlugin.init(shuffleContext), which creates the scheduler and the merge machinery used for merge-sorting; once inside run, the EventFetcher thread and a number of Fetcher threads are created. A Fetcher's job is to pull, i.e. to fetch data from the MapTask nodes. Be clear, though, that although EventFetcher is also a "fetcher", it fetches events, not the data itself; you can think of it as the event-driven control of the fetch process.
The number of Fetcher threads is not fixed either. In Uber mode the MapTask and the ReduceTask are on the same node and there is only one MapTask, so a single Fetcher is enough, and that Fetcher is a LocalFetcher. Outside Uber mode there may be many MapTasks, generally not on the same node as the ReduceTask; in that case the number of Fetchers is configurable, with a default of 5. The fetchers array effectively acts as the Fetcher thread pool.
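That default of 5 is MRJobConfig.SHUFFLE_PARALLEL_COPIES, i.e. the property mapreduce.reduce.shuffle.parallelcopies. A small example of raising it for a job with many map tasks (the value 10 is arbitrary):

import org.apache.hadoop.conf.Configuration;

public class ShuffleParallelCopiesDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // number of Fetcher threads each ReduceTask uses to copy map output in parallel (default 5)
    conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
    System.out.println(conf.getInt("mapreduce.reduce.shuffle.parallelcopies", 5));
  }
}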
After the EventFetcher and the Fetcher pool are created, the code enters the while loop, but the loop itself does nothing except wait; all the real work is done in the threads, i.e. by the EventFetcher and the Fetchers. The EventFetcher plays the crucial role of the hub.
Let's look at an excerpt of the EventFetcher source, keeping only the key parts:
class EventFetcher extends Thread {
private final TaskAttemptID reduce;
private final TaskUmbilicalProtocol umbilical;
private final ShuffleScheduler scheduler;
private final int maxEventsToFetch;
public void run() {
int failures = 0;
LOG.info(reduce + " Thread started: " + getName());
try {
while (!stopped && !Thread.currentThread().isInterrupted()) {//as long as the thread has not been stopped or interrupted
try {
int numNewMaps = getMapCompletionEvents();
//fetch the map-completion events; next let's look at the getMapCompletionEvents source:
protected int getMapCompletionEvents()
throws IOException, InterruptedException {
int numNewMaps = 0;
TaskCompletionEvent events[] = null;
do {
MapTaskCompletionEventsUpdate update =
umbilical.getMapCompletionEvents(
(org.apache.hadoop.mapred.JobID)reduce.getJobID(),
fromEventIdx,
maxEventsToFetch,
(org.apache.hadoop.mapred.TaskAttemptID)reduce);
//through the umbilical, fetch the map-completion event reports from the MRAppMaster
events = update.getMapTaskCompletionEvents();
//get the details of which specific MapTasks have finished running
LOG.debug("Got " + events.length + " map completion events from " +
fromEventIdx);
assert !update.shouldReset() : "Unexpected legacy state";
//an assertion
// Update the last seen event ID
fromEventIdx += events.length;
// Process the TaskCompletionEvents:
// 1. Save the SUCCEEDED maps in knownOutputs to fetch the outputs.
// 2. Save the OBSOLETE/FAILED/KILLED maps in obsoleteOutputs to stop
// fetching from those maps.
// 3. Remove TIPFAILED maps from neededOutputs since we don't need their
// outputs at all.
for (TaskCompletionEvent event : events) {
//for each event report obtained
scheduler.resolve(event);
//this calls ShuffleSchedulerImpl.resolve; its source is as follows:
public void resolve(TaskCompletionEvent event) {
switch (event.getTaskStatus()) {
case SUCCEEDED://if the map attempt succeeded
URI u = getBaseURI(reduceId, event.getTaskTrackerHttp());//build its URI
addKnownMapOutput(u.getHost() + ":" + u.getPort(),
u.toString(),
event.getTaskAttemptId());
//record the host of this MapTask's node for the Fetchers to use; the source of getBaseURI:
static URI getBaseURI(TaskAttemptID reduceId, String url) {
StringBuffer baseUrl = new StringBuffer(url);
if (!url.endsWith("/")) {
baseUrl.append("/");
}
baseUrl.append("mapOutput?job=");
baseUrl.append(reduceId.getJobID());
baseUrl.append("&reduce=");
baseUrl.append(reduceId.getTaskID().getId());
baseUrl.append("&map=");
URI u = URI.create(baseUrl.toString());
return u;
獲取各類信息,而後添加都URI對象中。
}
Back to the source:
maxMapRuntime = Math.max(maxMapRuntime, event.getTaskRunTime());
//track the longest map runtime seen so far
break;
case FAILED:
case KILLED:
case OBSOLETE://the MapTask's output is unusable (failed, killed or obsolete)
obsoleteMapOutput(event.getTaskAttemptId());//mark this attempt's output as obsolete
LOG.info("Ignoring obsolete output of " + event.getTaskStatus() +
" map-task: '" + event.getTaskAttemptId() + "'");//寫日誌
break;
case TIPFAILED://the whole task (TIP) failed
tipFailed(event.getTaskAttemptId().getTaskID());
LOG.info("Ignoring output of failed map TIP: '" +
event.getTaskAttemptId() + "'");//log it
break;
}
}
Back to the source:
if (TaskCompletionEvent.Status.SUCCEEDED == event.getTaskStatus()) {//if the event reports success
++numNewMaps;//count one more newly completed map
}
}
} while (events.length == maxEventsToFetch);
return numNewMaps;
}
Back to the source:
failures = 0;
if (numNewMaps > 0) {
LOG.info(reduce + ": " + "Got " + numNewMaps + " new map-outputs");
}
LOG.debug("GetMapEventsThread about to sleep for " + SLEEP_TIME);
if (!Thread.currentThread().isInterrupted()) {
Thread.sleep(SLEEP_TIME);
}
} catch (InterruptedException e) {
LOG.info("EventFetcher is interrupted.. Returning");
return;
} catch (IOException ie) {
LOG.info("Exception in getting events", ie);
// check to see whether to abort
if (++failures >= MAX_RETRIES) {
throw new IOException("too many failures downloading events", ie);//失敗數量大於重試的數量
}
// sleep for a bit
if (!Thread.currentThread().isInterrupted()) {
Thread.sleep(RETRY_PERIOD);
}
}
}
} catch (InterruptedException e) {
return;
} catch (Throwable t) {
exceptionReporter.reportException(t);
return;
}
}
MapTask and ReduceTask have no direct relationship: a MapTask does not know which nodes the ReduceTasks are on, it only reports its progress events to the MRAppMaster. The ReduceTask calls getMapCompletionEvents over the "umbilical" to ask the MRAppMaster for reports of MapTasks that have finished running. A few MapTasks may fail, but the vast majority succeed, and for every success a Fetcher is sent to retrieve the output data. That information flows through the scheduler, i.e. the ShuffleSchedulerImpl object; there is only the one, and it is an ordinary object, not a thread.
fetchers behaves like a thread pool holding several threads (5 by default); these threads wait for notification triggered by the EventFetcher and, as soon as a MapTask has completed, go and fetch its data.
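The hand-off between the EventFetcher and the Fetchers goes through the scheduler: addKnownMapOutput (seen in resolve above) registers a host that has finished maps and wakes the fetchers, and getHost blocks until such a host exists. A much-simplified sketch of that producer/consumer pattern (this is not the real ShuffleSchedulerImpl, which additionally tracks penalties, duplicate copies and per-host limits):

import java.util.ArrayDeque;
import java.util.Queue;

// simplified stand-in for the pendingHosts bookkeeping in ShuffleSchedulerImpl
class SimpleShuffleScheduler {
  private final Queue<String> pendingHosts = new ArrayDeque<String>();

  // called from the EventFetcher thread when a map-completion event arrives
  public synchronized void addKnownMapOutput(String hostAndPort) {
    pendingHosts.add(hostAndPort);
    notifyAll();                      // wake up any Fetcher waiting in getHost()
  }

  // called from a Fetcher thread; blocks until there is a host to copy from
  public synchronized String getHost() throws InterruptedException {
    while (pendingHosts.isEmpty()) {
      wait();
    }
    return pendingHosts.poll();
  }
}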
Let's look at the run method of the Fetcher thread class:
public void run() {
try {
while (!stopped && !Thread.currentThread().isInterrupted()) {
MapHost host = null;
try {
// If merge is on, block
merger.waitForResource();
// Get a host to shuffle from
host = scheduler.getHost();
//get from the scheduler a host whose MapTasks have completed successfully
metrics.threadBusy();
//mark the thread as busy
// Shuffle
copyFromHost(host);
//start copying this host's data
} finally {
if (host != null) {//the MapHost may still have outputs left to fetch
scheduler.freeHost(host);
//set the host's state back to idle so it can be scheduled again
metrics.threadFree();
}
}
}
} catch (InterruptedException ie) {
return;
} catch (Throwable t) {
exceptionReporter.reportException(t);
}
}
The key part here is copyFromHost, the function that actually fetches the data.
protected void copyFromHost(MapHost host) throws IOException {
// reset retryStartTime for a new host
//this runs on the ReduceTask's node
retryStartTime = 0;
// Get completed maps on 'host'
List<TaskAttemptID> maps = scheduler.getMapsForHost(host);
//get the set of completed MapTasks on the target host
// Sanity check to catch hosts with only 'OBSOLETE' maps,
// especially at the tail of large jobs
if (maps.size() == 0) {
return;//nothing completed here, so just return
}
if(LOG.isDebugEnabled()) {
LOG.debug("Fetcher " + id + " going to fetch from " + host + " for: "
+ maps);
}
// List of maps to be fetched yet
Set remaining = new HashSet(maps);
//the set of MapTasks that have completed and are still waiting to be shuffled
// Construct the url and connect
DataInputStream input = null;
URL url = getMapOutputURL(host, maps);
//build the URL of the node hosting the MapTasks; next, the getMapOutputURL source:
private URL getMapOutputURL(MapHost host, Collection maps
) throws MalformedURLException {
// Get the base url
StringBuffer url = new StringBuffer(host.getBaseUrl());
boolean first = true;
for (TaskAttemptID mapId : maps) {
if (!first) {
url.append(",");
}
url.append(mapId);//append each map ID to the URL
first = false;
}
LOG.debug("MapOutput URL for " + host + " -> " + url.toString());
//log it
return new URL(url.toString());
//return the URL
}
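For illustration only (the job ID, attempt IDs and host name below are made up, and 13562 is the default shuffle port), the URL assembled here ends up looking roughly like:

http://node1.example.com:13562/mapOutput?job=job_1540000000000_0001&reduce=0&map=attempt_1540000000000_0001_m_000003_0,attempt_1540000000000_0001_m_000007_0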
Back to the main code:
try {
setupConnectionsWithRetry(host, remaining, url);
//establish the HTTP connection to the remote host; setupConnectionsWithRetry uses openConnectionWithRetry to open the link:
openConnectionWithRetry(host, remaining, url);
That code in turn uses openConnection(url); let's keep following it.
The main steps of establishing the connection are:
protected synchronized void openConnection(URL url)
throws IOException {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
//the connection is made through HttpURLConnection
if (sslShuffle) {//if shuffle runs over SSL with a trusted certificate
HttpsURLConnection httpsConn = (HttpsURLConnection) conn;
//cast conn to the HTTPS type
try {
httpsConn.setSSLSocketFactory(sslFactory.createSSLSocketFactory());//install an SSL socket factory
} catch (GeneralSecurityException ex) {
throw new IOException(ex);
}
httpsConn.setHostnameVerifier(sslFactory.getHostnameVerifier());
}
connection = conn;
}
setupConnectionsWithRetry then continues with:
setupShuffleConnection(encHash);
//set up the shuffle connection
connect(connection, connectionTimeout);
// verify that the thread wasn't stopped during calls to connect
if (stopped) {
return;
}
verifyConnection(url, msgToEncode, encHash);
}
//at this point the connection has been verified
if (stopped) {
abortConnect(host, remaining);
//this aborts the connection; step into it to see how it puts the remaining maps back and handles the waiting state
return;
}
} catch (IOException ie) {
boolean connectExcpt = ie instanceof ConnectException;
ioErrs.increment(1);
LOG.warn("Failed to connect to " + host + " with " + remaining.size() +
" map outputs", ie);
Back to the main code:
input = new DataInputStream(connection.getInputStream());
//create an input stream object over the connection
try {
// Loop through available map-outputs and fetch them
// On any error, faildTasks is not null and we exit
// after putting back the remaining maps to the
// yet_to_be_fetched list and marking the failed tasks.
TaskAttemptID[] failedTasks = null;
while (!remaining.isEmpty() && failedTasks == null) {
//as long as the list of maps to fetch is not empty and no task has failed yet
try {
failedTasks = copyMapOutput(host, input, remaining, fetchRetryEnabled);
//copy the data out; the source of copyMapOutput is as follows:
try {
ShuffleHeader header = new ShuffleHeader();
header.readFields(input);
mapId = TaskAttemptID.forName(header.mapId);
//get the map ID
compressedLength = header.compressedLength;
decompressedLength = header.uncompressedLength;
forReduce = header.forReduce;
} catch (IllegalArgumentException e) {
badIdErrs.increment(1);
LOG.warn("Invalid map id ", e);
//Don't know which one was bad, so consider all of them as bad
return remaining.toArray(new TaskAttemptID[remaining.size()]);
}
InputStream is = input;
is = CryptoUtils.wrapIfNecessary(jobConf, is, compressedLength);
compressedLength -= CryptoUtils.cryptoPadding(jobConf);
decompressedLength -= CryptoUtils.cryptoPadding(jobConf);
//if decryption or decompression is needed
// Do some basic sanity verification
if (!verifySanity(compressedLength, decompressedLength, forReduce,
remaining, mapId)) {
return new TaskAttemptID[] {mapId};
}
if(LOG.isDebugEnabled()) {
LOG.debug("header: " + mapId + ", len: " + compressedLength +
", decomp len: " + decompressedLength);
}
try {
mapOutput = merger.reserve(mapId, decompressedLength, id);
//reserve a MapOutput for the merge: either in memory or on disk
} catch (IOException ioe) {
// kill this reduce attempt
ioErrs.increment(1);
scheduler.reportLocalError(ioe);
//report the error
return EMPTY_ATTEMPT_ID_ARRAY;
}
// Check if we can shuffle *now* ...
if (mapOutput == null) {
LOG.info("fetcher#" + id + " - MergeManager returned status WAIT ...");
//Not an error but wait to process data.
return EMPTY_ATTEMPT_ID_ARRAY;
}
// The codec for lz0,lz4,snappy,bz2,etc. throw java.lang.InternalError
// on decompression failures. Catching and re-throwing as IOException
// to allow fetch failure logic to be processed
try {
// Go!
LOG.info("fetcher#" + id + " about to shuffle output of map "
+ mapOutput.getMapId() + " decomp: " + decompressedLength
+ " len: " + compressedLength + " to " + mapOutput.getDescription());
mapOutput.shuffle(host, is, compressedLength, decompressedLength,
metrics, reporter);
//copy the Mapper's file contents across the node boundary into the reducer's memory or onto its disk
} catch (java.lang.InternalError e) {
LOG.warn("Failed to shuffle for fetcher#"+id, e);
throw new IOException(e);
}
// Inform the shuffle scheduler
long endTime = Time.monotonicNow();
// Reset retryStartTime as map task make progress if retried before.
retryStartTime = 0;
scheduler.copySucceeded(mapId, host, compressedLength,
startTime, endTime, mapOutput);
//tell the scheduler that the copy of this node's map output file has completed
remaining.remove(mapId);
//this MapTask's output has now been shuffled
metrics.successFetch();
return null; // the exception and failure handling that follows is not covered here
The mapOutput here is the storage that will hold a MapTask's output file; depending on the size of that output and on the memory situation, it can be an in-memory output or an on-disk output. If it is in memory it has to be reserved, because there is more than one Fetcher. We take InMemoryMapOutput as the example.
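A much-simplified sketch of the decision merger.reserve (MergeManagerImpl.reserve) makes; the field names mirror that class, but this is an illustration, not the real implementation (the in-memory budget is derived from properties such as mapreduce.reduce.shuffle.input.buffer.percent and mapreduce.reduce.shuffle.memory.limit.percent):

// illustration of the memory-vs-disk choice; returns a label instead of a MapOutput object
class ReserveSketch {
  private final long memoryLimit;            // total in-memory shuffle budget
  private final long maxSingleShuffleLimit;  // largest single map output allowed in memory
  private long usedMemory = 0;

  ReserveSketch(long memoryLimit, long maxSingleShuffleLimit) {
    this.memoryLimit = memoryLimit;
    this.maxSingleShuffleLimit = maxSingleShuffleLimit;
  }

  /** The real method returns an OnDiskMapOutput, an InMemoryMapOutput, or null (wait). */
  synchronized String reserve(long requestedSize) {
    if (requestedSize > maxSingleShuffleLimit) {
      return "DISK";     // too large for memory: shuffle straight to local disk
    }
    if (usedMemory + requestedSize > memoryLimit) {
      return "WAIT";     // budget exhausted: the Fetcher backs off and retries later
    }
    usedMemory += requestedSize;
    return "MEMORY";     // reserve an in-memory buffer (what InMemoryMapOutput wraps)
  }
}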
Code path:
Fetcher.run --> copyFromHost --> copyMapOutput --> merger.reserve (MergeManagerImpl.reserve) --> InMemoryMapOutput.shuffle
public void shuffle(MapHost host, InputStream input,
long compressedLength, long decompressedLength,
ShuffleClientMetrics metrics,
Reporter reporter) throws IOException {
//copy the spill data from the Mapper across the network
IFileInputStream checksumIn =
new IFileInputStream(input, compressedLength, conf);
//an input stream with checksum verification
input = checksumIn;
// Are map-outputs compressed?
if (codec != null) {
//if compression is involved
decompressor.reset();
//reset the decompressor
input = codec.createInputStream(input, decompressor);
//wrap the input stream with the decompressor
}
try {
IOUtils.readFully(input, memory, 0, memory.length);
//read this partition's data from the Mapper side into the Reducer's in-memory buffer
metrics.inputBytes(memory.length);
reporter.progress();//report progress
LOG.info("Read " + memory.length + " bytes from map-output for " +
getMapId());
/**
* We've gotten the amount of data we were expecting. Verify the
* decompressor has nothing more to offer. This action also forces the
* decompressor to read any trailing bytes that weren't critical
* for decompression, which is necessary to keep the stream
* in sync.
*/
if (input.read() >= 0 ) {
throw new IOException("Unexpected extra bytes from input stream for " +
getMapId());
}
} catch (IOException ioe) {
// Close the streams
IOUtils.cleanup(LOG, input);
// Re-throw
throw ioe;
} finally {
CodecPool.returnDecompressor(decompressor);
//return the decompressor to the pool
}
}
Once the data belonging to this partition has been copied over from the remote spill file, control returns to copyFromHost, the scheduler is told through scheduler.copySucceeded, the MapTask's ID is removed from the remaining set, and the loop moves on to copy the next MapTask's data, until all the data belonging to this partition has been copied.
That is the Fetcher process on the Reducer side: it sends HTTP GET requests to the Mapper side and downloads the files. On the MapTask side there is a corresponding server; we will not dig into the source of this network protocol here, but feel free to study it on your own.