Manually calculating YARN and MapReduce configuration

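Hadoop YARN can schedule two kinds of resources, memory and CPU. This article describes how to configure YARN's use of both.

As a resource scheduler, YARN should take into account the compute resources of every machine in the cluster and allocate Containers according to the resources each application requests. A Container is the basic unit of resource allocation in YARN and carries a certain amount of memory and CPU.

In a YARN cluster it is important to balance memory, CPU, and disk resources. As a rule of thumb, the cluster's resources are used well when every two containers share one disk and one CPU core.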

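Memory configuration

For memory-related settings, you can follow the Hortonworks document Determine HDP Memory Configuration Settings to configure your cluster.

The memory available to YARN and MapReduce should exclude what the operating system and other Hadoop-related programs need: total reserved memory = memory reserved for the system + memory reserved for HBase.

Use the table below to determine how much memory to reserve: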

Total memory per machine   Memory reserved for system   Memory reserved for HBase
4GB                        1GB                          1GB
8GB                        2GB                          2GB
16GB                       2GB                          2GB
24GB                       4GB                          4GB
48GB                       6GB                          6GB
64GB                       8GB                          8GB
72GB                       8GB                          8GB
96GB                       12GB                         12GB
128GB                      24GB                         24GB
256GB                      32GB                         32GB
512GB                      64GB                         64GB

The maximum number of containers each machine can host is given by the following formula:

containers = min (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

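Where:

  • CORES is the number of CPU cores on the machine
  • DISKS is the number of disks mounted on the machine
  • Total available RAM is the machine's total memory minus the reserved memory determined above
  • MIN_CONTAINER_SIZE is the smallest container size; it has to be chosen for your situation, using the table below as a guide:

Available RAM per machine   Minimum container size
Less than 4GB               256MB
Between 4GB and 8GB         512MB
Between 8GB and 24GB        1024MB
More than 24GB              2048MB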
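
As a rough sketch, the container formula and the minimum-size table can be written in Python as follows. The function names are made up for illustration. Two choices here are assumptions: the table does not say which row applies at exactly 4GB, 8GB, or 24GB, so the boundaries below are one reading of it, and 1.8*DISKS is rounded up, which is what the yarn-utils.py script in this article does.

```python
import math


def min_container_size_mb(available_ram_gb):
    """Minimum container size in MB, following the table above.

    Boundary handling (exactly 4, 8 or 24 GB) is an assumption.
    """
    if available_ram_gb < 4:
        return 256
    if available_ram_gb <= 8:
        return 512
    if available_ram_gb <= 24:
        return 1024
    return 2048


def max_containers(cores, disks, available_ram_gb):
    """containers = min(2*CORES, 1.8*DISKS, available RAM / MIN_CONTAINER_SIZE)."""
    min_size_mb = min_container_size_mb(available_ram_gb)
    by_ram = (available_ram_gb * 1024) // min_size_mb
    # 1.8 * disks is rounded up so the result is a whole number of containers.
    return int(min(2 * cores, math.ceil(1.8 * disks), by_ram))
```

For a machine with 32 cores, 7 disks, and 104GB of available RAM, `max_containers(32, 7, 104)` evaluates min(64, 13, 52).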

The average amount of memory per container is calculated as:

RAM-per-container = max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
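
The same formula as a one-line Python helper, shown only to make the units explicit. The function name is hypothetical, and working in whole megabytes with integer division is an assumption rather than something the formula prescribes.

```python
def ram_per_container_mb(available_ram_mb, containers, min_container_size_mb):
    """RAM-per-container = max(MIN_CONTAINER_SIZE, available RAM / containers), in MB."""
    return max(min_container_size_mb, available_ram_mb // containers)
```

If the even share of memory falls below the minimum container size, the minimum wins.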

With these values, YARN and MapReduce can be configured as follows:

Configuration file      Setting                                 Default     Calculated value
yarn-site.xml           yarn.nodemanager.resource.memory-mb     8192 MB     = containers * RAM-per-container
yarn-site.xml           yarn.scheduler.minimum-allocation-mb    1024 MB     = RAM-per-container
yarn-site.xml           yarn.scheduler.maximum-allocation-mb    8192 MB     = containers * RAM-per-container
yarn-site.xml (check)   yarn.app.mapreduce.am.resource.mb       1536 MB     = 2 * RAM-per-container
yarn-site.xml (check)   yarn.app.mapreduce.am.command-opts      -Xmx1024m   = 0.8 * 2 * RAM-per-container
mapred-site.xml         mapreduce.map.memory.mb                 1024 MB     = RAM-per-container
mapred-site.xml         mapreduce.reduce.memory.mb              1024 MB     = 2 * RAM-per-container
mapred-site.xml         mapreduce.map.java.opts                             = 0.8 * RAM-per-container
mapred-site.xml         mapreduce.reduce.java.opts                          = 0.8 * 2 * RAM-per-container
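
The "Calculated value" column can be sketched as a small Python function that returns the settings as a dictionary. The function name and the helper `heap_opt` are made up for illustration. The code follows the table literally, so it will differ from the yarn-utils.py script in this article, which stops doubling the reduce and ApplicationMaster memory once a container is larger than 2048 MB.

```python
def yarn_mapreduce_settings(containers, ram_per_container_mb):
    """Apply the formulas in the table above; memory values are in MB."""
    ram = ram_per_container_mb

    def heap_opt(container_mb):
        # JVM heap is set to 0.8 of the container size, as in the table.
        return "-Xmx%dm" % int(0.8 * container_mb)

    return {
        "yarn.nodemanager.resource.memory-mb": containers * ram,
        "yarn.scheduler.minimum-allocation-mb": ram,
        "yarn.scheduler.maximum-allocation-mb": containers * ram,
        "yarn.app.mapreduce.am.resource.mb": 2 * ram,
        "yarn.app.mapreduce.am.command-opts": heap_opt(2 * ram),
        "mapreduce.map.memory.mb": ram,
        "mapreduce.reduce.memory.mb": 2 * ram,
        "mapreduce.map.java.opts": heap_opt(ram),
        "mapreduce.reduce.java.opts": heap_opt(2 * ram),
    }
```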

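For example, take a machine with 128G of memory and 32 CPU cores, with 7 disks mounted. According to the table above, 24G is reserved for the system; without HBase, 104G remains available. The containers value is calculated as follows:

containers = min (2*32, 1.8*7, (128-24)/2) = min (64, 12.6, 52) = 12.6, rounded up to 13

The RAM-per-container value is calculated as follows:

RAM-per-container = max (2, (128-24)/13) = max (2, 8) = 8

You can also use the yarn-utils.py script to calculate these values: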

#!/usr/bin/env python
import optparse
from pprint import pprint
import logging
import sys
import math
import ast

''' Reserved for OS + DN + NM,  Map: Memory => Reservation '''
reservedStack = { 4:1, 8:2, 16:2, 24:4, 48:6, 64:8, 72:8, 96:12, 
                   128:24, 256:32, 512:64}
''' Reserved for HBase. Map: Memory => Reservation '''

reservedHBase = {4:1, 8:1, 16:2, 24:4, 48:8, 64:8, 72:8, 96:16, 
                   128:24, 256:32, 512:64}
GB = 1024

def getMinContainerSize(memory):
  if (memory <= 4):
    return 256
  elif (memory <= 8):
    return 512
  elif (memory <= 24):
    return 1024
  else:
    return 2048
  pass

def getReservedStackMemory(memory):
  # "in" replaces dict.has_key(), which was removed in Python 3
  if (memory in reservedStack):
    return reservedStack[memory]
  if (memory <= 4):
    ret = 1
  elif (memory >= 512):
    ret = 64
  else:
    ret = 1
  return ret

def getReservedHBaseMem(memory):
  # "in" replaces dict.has_key(), which was removed in Python 3
  if (memory in reservedHBase):
    return reservedHBase[memory]
  if (memory <= 4):
    ret = 1
  elif (memory >= 512):
    ret = 64
  else:
    ret = 2
  return ret

def main():
  log = logging.getLogger(__name__)
  out_hdlr = logging.StreamHandler(sys.stdout)
  out_hdlr.setFormatter(logging.Formatter(' %(message)s'))
  out_hdlr.setLevel(logging.INFO)
  log.addHandler(out_hdlr)
  log.setLevel(logging.INFO)
  parser = optparse.OptionParser()
  memory = 0
  cores = 0
  disks = 0
  hbaseEnabled = True
  parser.add_option('-c', '--cores', default = 16,
                     help = 'Number of cores on each host')
  parser.add_option('-m', '--memory', default = 64, 
                    help = 'Amount of Memory on each host in GB')
  parser.add_option('-d', '--disks', default = 4, 
                    help = 'Number of disks on each host')
  parser.add_option('-k', '--hbase', default = "True",
                    help = 'True if HBase is installed, False is not')
  (options, args) = parser.parse_args()

  cores = int (options.cores)
  memory = int (options.memory)
  disks = int (options.disks)
  hbaseEnabled = ast.literal_eval(options.hbase)

  log.info("Using cores=" +  str(cores) + " memory=" + str(memory) + "GB" +
            " disks=" + str(disks) + " hbase=" + str(hbaseEnabled))
  minContainerSize = getMinContainerSize(memory)
  reservedStackMemory = getReservedStackMemory(memory)
  reservedHBaseMemory = 0
  if (hbaseEnabled):
    reservedHBaseMemory = getReservedHBaseMem(memory)
  reservedMem = reservedStackMemory + reservedHBaseMemory
  usableMem = memory - reservedMem
  memory -= (reservedMem)
  if (memory < 2):
    memory = 2
    reservedMem = max(0, memory - reservedMem)

  memory *= GB

  # "//" keeps integer division on both Python 2 and Python 3
  containers = int (min(2 * cores,
                         min(math.ceil(1.8 * float(disks)),
                              memory // minContainerSize)))
  if (containers <= 2):
    containers = 3

  log.info("Profile: cores=" + str(cores) + " memory=" + str(memory) + "MB"
           + " reserved=" + str(reservedMem) + "GB" + " usableMem="
           + str(usableMem) + "GB" + " disks=" + str(disks))

  container_ram =  abs(memory // containers)
  if (container_ram > GB):
    container_ram = int(math.floor(container_ram / 512)) * 512
  log.info("Num Container=" + str(containers))
  log.info("Container Ram=" + str(container_ram) + "MB")
  log.info("Used Ram=" + str(int (containers*container_ram/float(GB))) + "GB")
  log.info("Unused Ram=" + str(reservedMem) + "GB")
  log.info("yarn.scheduler.minimum-allocation-mb=" + str(container_ram))
  log.info("yarn.scheduler.maximum-allocation-mb=" + str(containers*container_ram))
  log.info("yarn.nodemanager.resource.memory-mb=" + str(containers*container_ram))
  map_memory = container_ram
  reduce_memory = 2*container_ram if (container_ram <= 2048) else container_ram
  am_memory = max(map_memory, reduce_memory)
  log.info("mapreduce.map.memory.mb=" + str(map_memory))
  log.info("mapreduce.map.java.opts=-Xmx" + str(int(0.8 * map_memory)) +"m")
  log.info("mapreduce.reduce.memory.mb=" + str(reduce_memory))
  log.info("mapreduce.reduce.java.opts=-Xmx" + str(int(0.8 * reduce_memory)) + "m")
  log.info("yarn.app.mapreduce.am.resource.mb=" + str(am_memory))
  log.info("yarn.app.mapreduce.am.command-opts=-Xmx" + str(int(0.8*am_memory)) + "m")
  log.info("mapreduce.task.io.sort.mb=" + str(int(0.4 * map_memory)))
  pass

if __name__ == '__main__':
  try:
    main()
  except(KeyboardInterrupt, EOFError):
    print("\nAborting ... Keyboard Interrupt.")
    sys.exit(1)

Run the following command:

python yarn-utils.py -c 32 -m 128 -d 7 -k False

It returns:

Using cores=32 memory=128GB disks=7 hbase=False
 Profile: cores=32 memory=106496MB reserved=24GB usableMem=104GB disks=7
 Num Container=13
 Container Ram=8192MB
 Used Ram=104GB
 Unused Ram=24GB
 yarn.scheduler.minimum-allocation-mb=8192
 yarn.scheduler.maximum-allocation-mb=106496
 yarn.nodemanager.resource.memory-mb=106496
 mapreduce.map.memory.mb=8192
 mapreduce.map.java.opts=-Xmx6553m
 mapreduce.reduce.memory.mb=8192
 mapreduce.reduce.java.opts=-Xmx6553m
 yarn.app.mapreduce.am.resource.mb=8192
 yarn.app.mapreduce.am.command-opts=-Xmx6553m
 mapreduce.task.io.sort.mb=3276

This gives each container 8G of memory, which seems like a lot. I would rather tune it down to 2G per container based on how the cluster is actually used, in which case the parameters take the following values:

Configuration file      Setting                                 Calculated value
yarn-site.xml           yarn.nodemanager.resource.memory-mb     = 52 * 2 = 104G
yarn-site.xml           yarn.scheduler.minimum-allocation-mb    = 2G
yarn-site.xml           yarn.scheduler.maximum-allocation-mb    = 52 * 2 = 104G
yarn-site.xml (check)   yarn.app.mapreduce.am.resource.mb       = 2 * 2 = 4G
yarn-site.xml (check)   yarn.app.mapreduce.am.command-opts      = 0.8 * 2 * 2 = 3.2G
mapred-site.xml         mapreduce.map.memory.mb                 = 2G
mapred-site.xml         mapreduce.reduce.memory.mb              = 2 * 2 = 4G
mapred-site.xml         mapreduce.map.java.opts                 = 0.8 * 2 = 1.6G
mapred-site.xml         mapreduce.reduce.java.opts              = 0.8 * 2 * 2 = 3.2G

The corresponding XML configuration is:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx3276m</value>
</property>
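
If the values are computed in a script, the property elements can be generated rather than typed by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name `to_property_xml` is made up for illustration, and the output is a compact, unindented fragment to be pasted inside the <configuration> element of the relevant file.

```python
import xml.etree.ElementTree as ET


def to_property_xml(settings):
    """Render a {name: value} mapping as Hadoop <property> elements, one per line."""
    lines = []
    for name, value in settings.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
        # encoding="unicode" makes tostring() return a str instead of bytes.
        lines.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(lines)


print(to_property_xml({
    "yarn.nodemanager.resource.memory-mb": 106496,
    "yarn.scheduler.minimum-allocation-mb": 2048,
}))
```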

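There are also the following parameters:

  • yarn.nodemanager.vmem-pmem-ratio: the maximum amount of virtual memory a task may use for every 1MB of physical memory it uses; the default is 2.1.
  • yarn.nodemanager.pmem-check-enabled: whether to start a thread that checks how much physical memory each task is using and kills the task outright if it exceeds its allocation; the default is true.
  • yarn.nodemanager.vmem-check-enabled: whether to start a thread that checks how much virtual memory each task is using and kills the task outright if it exceeds its allocation; the default is true.

The first parameter means that when a map task is allocated 2G of physical memory in total, its virtual memory is capped at 2*2.1=4.2G, while the heap inside the container can be at most 1.6G (0.8 of the container size). Following the same arithmetic, the number of map tasks YARN can start on each node is 104/2=52.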
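
The arithmetic in the previous paragraph can be checked with a short Python sketch. The function names are hypothetical; the 0.8 heap fraction comes from the java.opts formulas earlier in this article, and 2.1 is the default ratio quoted above. Values are in MB and truncated to whole megabytes.

```python
def container_limits_mb(container_mb, vmem_pmem_ratio=2.1, heap_fraction=0.8):
    """Heap size and virtual-memory ceiling for one container, in MB."""
    return {
        "heap_mb": int(heap_fraction * container_mb),
        "vmem_limit_mb": int(vmem_pmem_ratio * container_mb),
    }


def maps_per_node(node_memory_mb, map_container_mb):
    """How many map containers of the given size fit into the node's YARN memory."""
    return node_memory_mb // map_container_mb
```

For a 2048 MB map container on a node with 106496 MB given to YARN, this yields roughly 1.6G of heap, a 4.2G virtual-memory ceiling, and 52 map containers.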

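CPU configuration

YARN currently divides CPU into virtual CPUs (CPU virtual cores). The virtual CPU is a concept YARN introduced itself. The motivation is that CPUs on different nodes may differ in performance, so each CPU has a different amount of compute power. For example, one physical CPU may be twice as powerful as another, and you can compensate for the difference by assigning more virtual CPUs to the faster one. When submitting a job, users can specify how many virtual CPUs each task needs.

The CPU-related configuration parameters in YARN are:

  • yarn.nodemanager.resource.cpu-vcores: the number of virtual CPUs YARN may use on the node; the default is 8. The current recommendation is to set it to the number of physical CPU cores. If your node has fewer than 8 cores, you need to lower this value, because YARN does not detect the node's physical CPU count on its own.
  • yarn.scheduler.minimum-allocation-vcores: the minimum number of virtual CPUs a single task can request; the default is 1. If a task requests fewer than this, the request is raised to this value.
  • yarn.scheduler.maximum-allocation-vcores: the maximum number of virtual CPUs a single task can request; the default is 32.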
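
The effect of the minimum and maximum settings on a single request can be illustrated with a few lines of Python. This is only a model of the behaviour described in the list above, not YARN's actual code: the function name is made up, and rejecting a request above the maximum with an error is an assumption, since the text only says the maximum is the most a task can request.

```python
def normalize_vcores(requested, minimum=1, maximum=32):
    """Model how a task's vcore request is adjusted to the scheduler limits.

    Defaults mirror the default values quoted in the list above.
    """
    if requested > maximum:
        # Assumption: an oversized request is refused rather than capped.
        raise ValueError("requested %d vcores, maximum is %d" % (requested, maximum))
    # A request below the minimum is raised to the minimum.
    return max(requested, minimum)
```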

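For a cluster with many CPU cores, the defaults above are clearly unsuitable. In my test cluster of 4 nodes, each machine is given 31 CPU cores, with one core left for the operating system, so it can be configured as: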

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>31</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>124</value>
</property>
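
The two values above can be reproduced with a small Python sketch. The function name and parameters are made up for illustration, and the example call assumes 32 physical cores per node, as in the memory example earlier. Multiplying the per-node count by the number of nodes simply mirrors the 31 * 4 = 124 arithmetic implied by the XML; it is not a general recommendation.

```python
def vcore_settings(physical_cores, nodes, reserved_for_os=1):
    """Derive the vcore settings shown above from the hardware description."""
    # Keep at least one vcore even on a very small machine.
    node_vcores = max(1, physical_cores - reserved_for_os)
    return {
        "yarn.nodemanager.resource.cpu-vcores": node_vcores,
        "yarn.scheduler.maximum-allocation-vcores": node_vcores * nodes,
    }


print(vcore_settings(32, 4))
```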

Reposted from http://blog.javachen.com/2015/06/05/yarn-memory-and-cpu-configuration.html
