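Apache YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2. YARN provides resource management and application scheduling for the cluster. The YARN API is used to request and work with cluster resources.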
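As one concrete way of looking at cluster resources, the ResourceManager exposes a REST web service whose cluster metrics endpoint is /ws/v1/cluster/metrics. The following is only a minimal sketch: the helper names (metrics_url, summarize, fetch_summary) and the rm_address parameter are made up for illustration, the sample response values are invented, and it assumes the ResourceManager web address is reachable (8088 is the usual default port).

```python
import json
from urllib.request import urlopen


def metrics_url(rm_address: str) -> str:
    """Build the cluster-metrics URL for a ResourceManager web address."""
    return rm_address.rstrip("/") + "/ws/v1/cluster/metrics"


def summarize(payload: dict) -> dict:
    """Pick a few node and memory figures out of a cluster-metrics response."""
    m = payload["clusterMetrics"]
    return {
        "active_nodes": m["activeNodes"],
        "total_mb": m["totalMB"],
        "allocated_mb": m["allocatedMB"],
        "available_mb": m["availableMB"],
    }


def fetch_summary(rm_address: str) -> dict:
    """Query a live ResourceManager (hypothetical helper, needs a reachable cluster)."""
    with urlopen(metrics_url(rm_address)) as resp:
        return summarize(json.load(resp))


# Invented response body, used here in place of a live cluster.
sample = {
    "clusterMetrics": {
        "activeNodes": 3,
        "totalMB": 24576,
        "allocatedMB": 8192,
        "availableMB": 16384,
    }
}
summary = summarize(sample)
print(summary)
```

Against a real cluster, something like fetch_summary("http://<rm-host>:8088") would return the same kind of dictionary, with <rm-host> standing in for the actual ResourceManager host.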
MapReduce 1:
Responsibilities of the JobTracker:
(1) Job scheduling (matching tasks with TaskTrackers)
(2) Task progress monitoring (keeping track of tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter totals)
(3) Storing the history of completed jobs
Responsibilities of the TaskTracker:
Running tasks and sending progress reports to the JobTracker
Scalability:
MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
YARN is designed to scale up to 10,000 nodes and 100,000 tasks.
Availability:
High availability (HA) is usually achieved by replicating the state needed for another daemon to take over the work needed to provide the service, in the event of the service daemon failing.
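The JobTracker's in-memory state is complex and changes constantly (each task status is updated every few seconds), which makes HA hard to support. In YARN, by contrast, the RM (ResourceManager), NM (NodeManager), and AM (ApplicationMaster) all support HA.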
Utilization:
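In MapReduce 1, each TaskTracker is given fixed-size slots at configuration time, split into map slots (which can only run map tasks) and reduce slots (which can only run reduce tasks). MRv1 can therefore end up with map slots free but no reduce slots available, so reduce tasks have to wait. In addition, a slot that is too large wastes resources, while a slot that is too small may cause the task to fail.
In YARN, each NodeManager manages a pool of resources. The resources are fine-grained, so an application simply requests what it needs.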
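The difference between fixed slots and a resource pool can be shown with a toy model. This is only a sketch and not how either scheduler is implemented: the class names SlotNode and PoolNode, and the sizes used (2 map slots + 2 reduce slots, 4096 MB, 4 vcores), are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """MRv1-style TaskTracker: a fixed number of map and reduce slots."""
    map_slots: int
    reduce_slots: int

    def try_run(self, kind: str) -> bool:
        # A task may only take a free slot of its own kind.
        if kind == "map" and self.map_slots > 0:
            self.map_slots -= 1
            return True
        if kind == "reduce" and self.reduce_slots > 0:
            self.reduce_slots -= 1
            return True
        return False


@dataclass
class PoolNode:
    """YARN-style NodeManager: one pool of memory and vcores."""
    memory_mb: int
    vcores: int

    def try_run(self, memory_mb: int, vcores: int) -> bool:
        # Any container fits while the pool has enough of both resources.
        if memory_mb <= self.memory_mb and vcores <= self.vcores:
            self.memory_mb -= memory_mb
            self.vcores -= vcores
            return True
        return False


# Four reduce tasks arrive on a node configured with 2 map + 2 reduce slots.
slot_node = SlotNode(map_slots=2, reduce_slots=2)
slot_started = sum(slot_node.try_run("reduce") for _ in range(4))

# The same capacity expressed as a pool: 4096 MB and 4 vcores.
pool_node = PoolNode(memory_mb=4096, vcores=4)
pool_started = sum(pool_node.try_run(1024, 1) for _ in range(4))

print(slot_started, pool_started)  # prints "2 4"
```

In the slot model, two reduce tasks wait even though both map slots are idle. In the pool model, all four tasks start because each one only asks for the memory and vcores it needs.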
Multitenancy:
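YARN's biggest advantage is that it has been factored out of Hadoop as a separate layer, so it can support distributed applications other than MapReduce. For example, Spark can use YARN as its cluster manager.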