Spark on YARN

Overview

This post documents how to submit Spark jobs written in Java to run on YARN.

Hadoop configuration

The main file to configure is yarn-site.xml. We currently use mapreduce_shuffle, while some companies also add spark_shuffle.

  • Using mapreduce_shuffle only

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
  • Using mapreduce_shuffle & spark_shuffle

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

When a Hadoop MR job is submitted, mapreduce_shuffle is used; when a Spark job is submitted, spark_shuffle is used. Personally, though, I find spark_shuffle's efficiency mediocre, and shuffle is a major bottleneck. Also note: if you use spark_shuffle, you need to copy spark-yarn_2.10-1.4.1.jar into HADOOP_HOME/share/hadoop/lib, otherwise Hadoop fails at runtime with a ClassNotFoundException.
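Enabling spark_shuffle on the NodeManager side is only half of the setup: Spark executors also have to be told to use the external shuffle service. A minimal sketch, following the paths used in this article (adjust to your own installation):

    # Make the shuffle-service jar visible to the NodeManager
    # (assuming the jar sits in Spark's lib directory, as in this article's layout)
    cp /home/cluster/apps/spark/spark-1.4.1/lib/spark-yarn_2.10-1.4.1.jar \
       /home/cluster/apps/hadoop/share/hadoop/lib/

    # Restart the NodeManager so the new aux-service class is loaded
    /home/cluster/apps/hadoop/sbin/yarn-daemon.sh stop nodemanager
    /home/cluster/apps/hadoop/sbin/yarn-daemon.sh start nodemanager

and on the Spark side, in $SPARK_HOME/conf/spark-defaults.conf:

    spark.shuffle.service.enabled  true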

Spark configuration

$SPARK_HOME/conf/spark-env.sh:

export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop

export JAVA_HOME=/home/cluster/share/java1.7
export SCALA_HOME=/home/cluster/share/scala-2.10.5
export HADOOP_HOME=/home/cluster/apps/hadoop
export HADOOP_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop
export SPARK_MASTER_IP=master

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/*:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/*:/home/cluster/apps/hadoop/share/hadoop/common/*:/home/cluster/apps/hadoop/share/hadoop/common/lib/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/*:/home/cluster/apps/hadoop/share/hadoop/tools/lib/*:/home/cluster/apps/spark/spark-1.4.1/lib/*

SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://master:8020/var/log/spark"

Parameter explanations:
YARN_CONF_DIR: the path to the YARN configuration. If this line is missing, export it when submitting a job:

export YARN_CONF_DIR=/home/cluster/apps/hadoop/etc/hadoop

HADOOP_HOME: the Hadoop root directory
HADOOP_CONF_DIR: the Hadoop configuration directory, which Spark reads when, for example, accessing HDFS
SPARK_LIBRARY_PATH: tells Spark where to load native .so libraries from
SPARK_CLASSPATH: the jars Spark needs on its classpath
SPARK_HISTORY_OPTS: options for the Spark history server
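One thing the spark-env.sh above implies but does not show: the history server only lists applications that write event logs into the directory named by spark.history.fs.logDirectory. A minimal sketch of the matching job-side settings, assuming the same HDFS path, in $SPARK_HOME/conf/spark-defaults.conf:

    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs://master:8020/var/log/spark

The log directory has to exist beforehand, and the server is started with the bundled script:

    hadoop fs -mkdir -p /var/log/spark
    /home/cluster/apps/spark/spark-1.4.1/sbin/start-history-server.sh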

Prerequisites

If the job reads or writes HDFS, the NameNode & DataNode must be running,
as well as the YARN daemons: ResourceManager & NodeManager.
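A minimal way to bring all four daemons up is with the stock Hadoop scripts (assuming the HADOOP_HOME configured above):

    /home/cluster/apps/hadoop/sbin/start-dfs.sh    # NameNode, DataNode, SecondaryNameNode
    /home/cluster/apps/hadoop/sbin/start-yarn.sh   # ResourceManager, NodeManagers

jps should then show them all running: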

/home/cluster/apps$ jps
29368 MainGenericRunner
29510 Jps
22885 Main
29210 NodeManager
28952 NameNode
29158 ResourceManager
29023 DataNode

Submitting jobs

  1. PI:
  • yarn-cluster mode:

    /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar  10
  • yarn-client mode:

    /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.SparkPi /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar  10
  2. WordCount:
  • yarn-cluster mode:

    /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-cluster --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/
  • yarn-client mode:

    /home/cluster/apps/spark/spark-1.4.1/bin/spark-submit --master yarn-client --executor-memory 3g   --driver-memory 1g  --class org.apache.spark.examples.JavaWordCount /home/cluster/apps/spark/spark-1.4.1/examples/target/scala-2.10/spark-examples-1.4.1-hadoop2.3.0-cdh5.1.0.jar /data/hadoop/wordcount/

    Result screenshot

    [screenshot: the four submitted applications in the YARN UI]
    Reading the four records from bottom to top: PI in yarn-cluster mode, PI in yarn-client mode, WordCount in yarn-cluster mode, and WordCount in yarn-client mode.
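Note that in yarn-cluster mode the driver runs inside the ApplicationMaster, so driver console output (such as SparkPi's computed value) lands in the application logs rather than in the submitting shell; it can be retrieved afterwards with yarn logs -applicationId <appId>.

For reference, JavaWordCount ships with Spark's examples jar. A minimal sketch of an equivalent Java word count against the Spark 1.x API (class and variable names here are illustrative, not the bundled source):

    import java.util.Arrays;

    import scala.Tuple2;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class WordCount {
      public static void main(String[] args) {
        // On YARN the master is set by spark-submit (--master yarn-cluster / yarn-client),
        // so the code itself only names the application.
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);           // e.g. /data/hadoop/wordcount/
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split(" ")))    // Spark 1.x flatMap expects an Iterable
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        for (Tuple2<String, Integer> t : counts.collect()) {
          System.out.println(t._1() + ": " + t._2());
        }
        sc.stop();
      }
    }

Packaged into a jar, it is submitted exactly like the commands above, with --class pointing at this class instead.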

Respect the original author; please do not repost.
http://blog.csdn.net/stark_summer/article/details/48661317
