Scenario:
Total resources: 18 GB memory, 9 vcores
Remaining resources: 14 GB, 5 vcores
4 running applications, each using 1 GB and 1 vcore
5 applications stuck in the ACCEPTED state
Configuration:
Dynamic Resource Pool Configuration:
Min Resources: 1 vcore, 512 MB
Max Resources: 9 vcores, 18 GB
Max Running Apps: 6
Max Application Master Share: 0.4 (limits the fraction of the pool's fair share that can be used to run Application Masters. For example, if set to 1.0, AMs in the leaf pool may use up to 100% of both the memory and the CPU fair share. If the value is -1.0, the feature is disabled and the AM share is not checked. The default is 0.5.)
There are three nodes, each with 6 GB memory and 3 vcores.
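A rough calculation shows why new applications sit in ACCEPTED even though the cluster still has 14 GB / 5 vcores free (this assumes the pool's fair share is the whole cluster and that the 1 GB / 1 vcore held by each running application includes its AM container; the exact figures depend on how this CDH release computes the pool's fair share):
AM resource cap = Max Application Master Share × pool fair share = 0.4 × 18 GB ≈ 7.2 GB and 0.4 × 9 vcores ≈ 3.6 vcores.
Once the AM containers of the running applications reach that cap, the FairScheduler refuses to launch another AM, so the remaining applications stay in ACCEPTED. The bottleneck is the AM share, not total cluster capacity.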
yarn-site.xml
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>yarn.acl.enable</name> <value>true</value> </property> <property> <name>yarn.admin.acl</name> <value>*</value> </property> <property> <name>yarn.log-aggregation-enable</name> <value>true</value> </property> <property> <name>yarn.log-aggregation.retain-seconds</name> <value>259200</value> </property> <property> <name>yarn.resourcemanager.ha.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.ha.automatic-failover.embedded</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.zk-address</name> <value>sz280111:2181,sz280113:2181,sz280112:2181</value> </property> <property> <name>yarn.resourcemanager.store.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> </property> <property> <name>yarn.client.failover-sleep-base-ms</name> <value>100</value> </property> <property> <name>yarn.client.failover-sleep-max-ms</name> <value>2000</value> </property> <property> <name>yarn.resourcemanager.cluster-id</name> <value>yarnRM</value> </property> <property> <name>yarn.resourcemanager.work-preserving-recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.resourcemanager.address.rm49</name> <value>sz280111:8032</value> </property> <property> <name>yarn.resourcemanager.scheduler.address.rm49</name> <value>sz280111:8030</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address.rm49</name> <value>sz280111:8031</value> </property> <property> <name>yarn.resourcemanager.admin.address.rm49</name> <value>sz280111:8033</value> </property> <property> <name>yarn.resourcemanager.webapp.address.rm49</name> <value>sz280111:8088</value> </property> <property> <name>yarn.resourcemanager.webapp.https.address.rm49</name> <value>sz280111:8090</value> </property> <property> <name>yarn.resourcemanager.address.rm61</name> <value>sz280112:8032</value> </property> <property> <name>yarn.resourcemanager.scheduler.address.rm61</name> <value>sz280112:8030</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.address.rm61</name> <value>sz280112:8031</value> </property> <property> <name>yarn.resourcemanager.admin.address.rm61</name> <value>sz280112:8033</value> </property> <property> <name>yarn.resourcemanager.webapp.address.rm61</name> <value>sz280112:8088</value> </property> <property> <name>yarn.resourcemanager.webapp.https.address.rm61</name> <value>sz280112:8090</value> </property> <property> <name>yarn.resourcemanager.ha.rm-ids</name> <value>rm49,rm61</value> </property> <property> <name>yarn.nodemanager.recovery.enabled</name> <value>true</value> </property> <property> <name>yarn.nodemanager.recovery.dir</name> <value>/qhapp/cdh/var/lib/hadoop-yarn/yarn-nm-recovery</value> </property> <property> <name>yarn.resourcemanager.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.resourcemanager.scheduler.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.resourcemanager.admin.client.thread-count</name> <value>1</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>512</value> </property> <property> <name>yarn.scheduler.increment-allocation-mb</name> <value>256</value> 
</property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>2048</value> </property> <property> <name>yarn.scheduler.minimum-allocation-vcores</name> <value>1</value> </property> <property> <name>yarn.scheduler.increment-allocation-vcores</name> <value>1</value> </property> <property> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>8</value> </property> <property> <name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name> <value>1000</value> </property> <property> <name>yarn.am.liveness-monitor.expiry-interval-ms</name> <value>600000</value> </property> <property> <name>yarn.resourcemanager.am.max-attempts</name> <value>2</value> </property> <property> <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name> <value>600000</value> </property> <property> <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name> <value>1000</value> </property> <property> <name>yarn.nm.liveness-monitor.expiry-interval-ms</name> <value>600000</value> </property> <property> <name>yarn.resourcemanager.resource-tracker.client.thread-count</name> <value>50</value> </property> <property> <name>yarn.application.classpath</name> <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value> </property> <property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> </property> <property> <name>yarn.nodemanager.container-monitor.interval-ms</name> <value>3000</value> </property> <property> <name>yarn.resourcemanager.max-completed-applications</name> <value>1000</value> </property> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <property> <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name> <value>org.apache.hadoop.mapred.ShuffleHandler</value> </property> <property> <name>yarn.nodemanager.local-dirs</name> <value>/qhapp/cdh/var/lib/yarn/nm</value> </property> <property> <name>yarn.nodemanager.log-dirs</name> <value>/qhapp/cdh/var/log/yarn/container-logs</value> </property> <property> <name>yarn.nodemanager.webapp.address</name> <value>sz280108:8042</value> </property> <property> <name>yarn.nodemanager.webapp.https.address</name> <value>sz280108:8044</value> </property> <property> <name>yarn.nodemanager.address</name> <value>sz280108:8041</value> </property> <property> <name>yarn.nodemanager.admin-env</name> <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value> </property> <property> <name>yarn.nodemanager.env-whitelist</name> <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME</value> </property> <property> <name>yarn.nodemanager.container-manager.thread-count</name> <value>20</value> </property> <property> <name>yarn.nodemanager.delete.thread-count</name> <value>4</value> </property> <property> <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name> <value>100</value> </property> <property> <name>yarn.nodemanager.localizer.address</name> <value>sz280108:8040</value> </property> <property> <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name> <value>600000</value> </property> <property> <name>yarn.nodemanager.localizer.cache.target-size-mb</name> <value>5120</value> </property> <property> <name>yarn.nodemanager.localizer.client.thread-count</name> <value>5</value> </property> <property> 
<name>yarn.nodemanager.localizer.fetch.thread-count</name> <value>4</value> </property> <property> <name>yarn.nodemanager.log.retain-seconds</name> <value>10800</value> </property> <property> <name>yarn.nodemanager.remote-app-log-dir</name> <value>/tmp/logs</value> </property> <property> <name>yarn.nodemanager.remote-app-log-dir-suffix</name> <value>logs</value> </property> <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>6144</value> </property> <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>3</value> </property> <property> <name>yarn.nodemanager.delete.debug-delay-sec</name> <value>0</value> </property> <property> <name>yarn.nodemanager.health-checker.script.path</name> <value></value> </property> <property> <name>yarn.nodemanager.health-checker.script.opts</name> <value></value> </property> <property> <name>yarn.nodemanager.disk-health-checker.interval-ms</name> <value>120000</value> </property> <property> <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name> <value>0</value> </property> <property> <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name> <value>90.0</value> </property> <property> <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name> <value>0.25</value> </property> <property> <name>mapreduce.shuffle.max.threads</name> <value>80</value> </property> <property> <name>yarn.log.server.url</name> <value>http://sz280111:19888/jobhistory/logs/</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name> <value>nobody</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name> <value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value> </property> <property> <name>yarn.nodemanager.vmem-pmem-ratio</name> <value>6</value> </property> </configuration>
Solution approach 1:
Increase the Application Master max share to 0.8.
"Max Application Master Share" controls the percentage of total cluster memory and CPU that can be used by AM containers. If several jobs are submitted, each AM consumes the memory and CPU needed for its own container.
Once that total exceeds the configured percentage of cluster resources, the next AM waits and does not run until resources are freed.
Reference: https://community.hortonworks.com/questions/77454/tez-job-hang-waiting-for-am-container-to-be-alloca.html
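In Cloudera Manager this is the "Max Application Master Share" field of the Dynamic Resource Pool. If the Fair Scheduler allocation file is maintained by hand instead, the equivalent per-queue setting is maxAMShare; a minimal sketch of fair-scheduler.xml follows (the queue name root.default is only an assumption for illustration):

<?xml version="1.0"?>
<!-- fair-scheduler.xml: per-queue Fair Scheduler settings -->
<allocations>
  <queue name="root.default">
    <!-- Allow AM containers to use up to 80% of this queue's fair share
         (memory and vcores). Default is 0.5; -1.0 disables the check. -->
    <maxAMShare>0.8</maxAMShare>
    <!-- Same limit as the Dynamic Resource Pool "Max Running Apps" above -->
    <maxRunningApps>6</maxRunningApps>
  </queue>
</allocations>

After changing the allocation file (or the Dynamic Resource Pool setting), the Fair Scheduler picks up the new value and the waiting AMs can be scheduled as long as they fit within the new 0.8 share.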