spark 2.1.1html
最近spark任務(spark on yarn)有一個報錯java
Diagnostics: Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used; 12.2 GB of 23.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1542879939729_30802_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 5901 5899 5901 5901 (bash) 3 4 115843072 361 /bin/bash -c LD_LIBRARY_PATH=/export/App/hadoop-2.6.1/lib/native::/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/export/App/hadoop-2.6.1/lib/native /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp '-XX:+PrintGCDetails' '-XX:+UseG1GC' '-XX:G1HeapRegionSize=32M' '-XX:+UseGCOverheadLimit' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:-UseCompressedClassPointers' '-XX:CompressedClassSpaceSize=3G' '-XX:+PrintGCTimeStamps' '-Xloggc:/export/Logs/hadoop/g1gc.log' -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'app.package.AppClass' --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties 1> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stdout 2> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stderr
|- 6406 5901 5901 5901 (java) 1834301 372741 13026095104 2888407 /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp -XX:+PrintGCDetails -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:-UseCompressedClassPointers -XX:CompressedClassSpaceSize=3G -XX:+PrintGCTimeStamps -Xloggc:/export/Logs/hadoop/g1gc.log -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class app.package.AppClass --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attemptapache
從containerID=container_1542879939729_30802_01_000001,以及org.apache.spark.deploy.yarn.ApplicationMaster,可知這個是yarn的ApplicationMaster,運行的是spark的driver,bash
問題是提交spark任務時參數爲 --driver-moery 10g,並且進程啓動命令中也確實是 -Xmx10240m,爲何container被kill是由於超過11g?app
Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used;oop
跟進spark任務提交過程,詳見:https://www.cnblogs.com/barneywill/p/9820684.htmlui
org.apache.spark.launcher.SparkSubmitCommandBuilderthis
String tsMemory = isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null; String memory = firstNonEmpty(tsMemory, config.get(SparkLauncher.DRIVER_MEMORY), System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), DEFAULT_MEM); cmd.add("-Xmx" + memory);
這裏會取driver memory的值,取的地方有一個優先級,firstNonEmptyspa
org.apache.spark.deploy.SparkSubmitrest
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class if (isYarnCluster) { childMainClass = "org.apache.spark.deploy.yarn.Client"
若是--master yarn時,會提交Client類
org.apache.spark.deploy.yarn.Client
// AM related configurations private val amMemory = if (isClusterMode) { sparkConf.get(DRIVER_MEMORY).toInt } else { sparkConf.get(AM_MEMORY).toInt } private val amMemoryOverhead = { val amMemoryOverheadEntry = if (isClusterMode) DRIVER_MEMORY_OVERHEAD else AM_MEMORY_OVERHEAD sparkConf.get(amMemoryOverheadEntry).getOrElse( math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt } private val amCores = if (isClusterMode) { sparkConf.get(DRIVER_CORES) } else { sparkConf.get(AM_CORES) } // Executor related configurations private val executorMemory = sparkConf.get(EXECUTOR_MEMORY) private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse( math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt
其中會設置amMemoryOverhead 和executorMemoryOverhead
val capability = Records.newRecord(classOf[Resource]) capability.setMemory(amMemory + amMemoryOverhead) capability.setVirtualCores(amCores)
而後會根據amMemory+amMemoryOverhead的值來向yarn申請資源;
一些默認值和配置以下:
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
object YarnSparkHadoopUtil { // Additional memory overhead // 10% was arrived at experimentally. In the interest of minimizing memory waste while covering // the common cases. Memory overhead tends to grow with container size. val MEMORY_OVERHEAD_FACTOR = 0.10 val MEMORY_OVERHEAD_MIN = 384L
org.apache.spark.deploy.yarn.config
private[spark] val DRIVER_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.driver.memoryOverhead") .bytesConf(ByteUnit.MiB) .createOptional private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead") .bytesConf(ByteUnit.MiB) .createOptional
因此默認的driver memory申請方式爲
1 spark.yarn.driver.memoryOverhead 配置優先
2 driverMemory + overhead
其中 overhead = math.max((0.1 * driverMemory).toLong, 384))
因此--driver-memory 10g時向yarn申請的container內存是11g