The following configuration targets Spark on YARN in cluster mode.
| spark | scala | hadoop |
| --- | --- | --- |
| 2.3.1 | 2.11 | 2.9.1 |
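Before building anything, it helps to confirm the installed toolchain actually matches this table; a quick check, assuming the spark and hadoop binaries are on the PATH:

```bash
# verify the versions against the table above
spark-submit --version   # expect Spark 2.3.1 built for Scala 2.11
hadoop version           # expect Hadoop 2.9.1
```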
Install Spark on the same nodes as the YARN NodeManagers (keep the two node sets consistent).
Build the YARN dependencies into Spark:

```bash
./build/mvn -Pyarn -Phadoop-2.9 -Dhadoop.version=2.9.1 -DskipTests clean package
```

Copy the Hadoop client configs into Spark's conf directory:

```bash
scp hadoop/etc/hadoop/core-site.xml spark/conf
scp hadoop/etc/hadoop/hdfs-site.xml spark/conf
```
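If Spark is installed on several nodes, the same copy typically has to happen on each of them; a minimal sketch, assuming a hypothetical nodes.txt listing one Spark hostname per line:

```bash
# distribute the Hadoop client configs to all Spark nodes
# nodes.txt is hypothetical: one hostname per line
while read -r host; do
  scp hadoop/etc/hadoop/core-site.xml "$host:spark/conf/"
  scp hadoop/etc/hadoop/hdfs-site.xml "$host:spark/conf/"
done < nodes.txt
```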
Configure spark-env.sh

Add the Hadoop environment variables:

```bash
export HADOOP_HOME=/ddhome/bin/hadoop
export HADOOP_CONF_DIR=/ddhome/bin/hadoop/etc/hadoop
```
Upload the Spark jars to HDFS:

```bash
hadoop fs -mkdir -p /spark/jars
hadoop fs -put spark/jars/*.jar /spark/jars
```
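A quick sanity check that the upload landed where spark.yarn.jars will look:

```bash
# confirm the jars are on HDFS
hadoop fs -ls /spark/jars | head
# the count should roughly match the local directory
# (hadoop fs -ls prints one extra "Found N items" header line)
hadoop fs -ls /spark/jars | wc -l
ls spark/jars | wc -l
```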
Configure spark-defaults.conf

Add the spark.yarn.jars parameter:

```
spark.yarn.jars hdfs://masters/spark/jars/*.jar
```
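Spark also accepts spark.yarn.archive as an alternative to the jar glob: package the jars once into a single uncompressed archive and point the config at it. A sketch of that variant (archive name and path are illustrative):

```bash
# alternative to spark.yarn.jars: one archive of all jars
jar cv0f spark-libs.jar -C spark/jars/ .
hadoop fs -mkdir -p /spark
hadoop fs -put spark-libs.jar /spark/
# then in spark-defaults.conf use instead:
# spark.yarn.archive hdfs://masters/spark/spark-libs.jar
```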
Configure spark-env.sh

Add the ZooKeeper parameters:

```bash
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=dddbva:2181,dddbvb:2181,dddcva:2181 -Dspark.deploy.zookeeper.dir=/spark"
```
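To confirm the ensemble addresses are right (and, once a master has registered, that the recovery directory exists), the stock ZooKeeper CLI can be pointed at one of the servers:

```bash
# check connectivity and the spark.deploy.zookeeper.dir node
zkCli.sh -server dddbva:2181 ls /spark
```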
Submit the SparkPi example to verify the setup:

```bash
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  --driver-memory 1024m --executor-memory 1024m --executor-cores 1 \
  spark/examples/jars/spark-examples_2.11-2.3.1.jar 10
```
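In cluster mode the driver runs inside YARN, so SparkPi's result shows up in the application logs rather than on the submitting console; one way to find it (the application id below is a placeholder):

```bash
# list the application, then pull its aggregated logs
yarn application -list -appStates FINISHED
yarn logs -applicationId application_0000000000000_0000 | grep "Pi is roughly"
```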
Troubleshooting

1. spark-sql printed only a few lines of output with no details; at first it looked like the process had died:

```
[root@ddcve hadoop]# spark-sql --master yarn
18/09/04 08:57:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/04 08:57:58 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
18/09/04 08:57:58 WARN HiveConf: HiveConf of name hive.server2.webui.host does not exist
18/09/04 08:58:01 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
```

Solution: change the log level to INFO/DEBUG and the detailed output becomes visible.
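One way to raise the log level, assuming the stock log4j template shipped with Spark 2.3:

```bash
# enable Spark's bundled log4j config and set the root logger
cp spark/conf/log4j.properties.template spark/conf/log4j.properties
# then edit spark/conf/log4j.properties:
# log4j.rootCategory=INFO, console
```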
2. The client kept retrying the ResourceManager at the 0.0.0.0 default address:

```
2014-08-11 20:10:59,795 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-08-11 20:11:01,838 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
```

Solution: add the Hadoop node configuration on the Spark nodes (0.0.0.0:8030 is the built-in default the client falls back to when it cannot see yarn-site.xml).
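For reference, the entry that replaces the 0.0.0.0 default lives in the yarn-site.xml visible to the Spark client; the hostname below is illustrative:

```xml
<!-- yarn-site.xml on the Spark/client node; "dddbva" is an example host -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>dddbva</value>
</property>
```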
3. java.net.UnknownHostException: masters

Solution: copy core-site.xml and hdfs-site.xml from Hadoop into Spark's conf directory.
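`masters` is an HDFS HA nameservice, not a hostname, which is why plain DNS cannot resolve it; it only resolves once the client sees the HA mapping from hdfs-site.xml. Roughly the relevant entries (the namenode ids and addresses below are assumptions for illustration):

```xml
<!-- hdfs-site.xml entries that make the "masters" nameservice resolvable -->
<!-- nn1/nn2 and their addresses are assumed, not from this cluster -->
<property>
  <name>dfs.nameservices</name>
  <value>masters</value>
</property>
<property>
  <name>dfs.ha.namenodes.masters</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.masters.nn1</name>
  <value>dddbva:9000</value>
</property>
```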
4. The Spark-on-YARN jars could not be found; the Spark source has to be compiled with the YARN profile and repackaged.

Solution: build and package manually (the exact commands are in item 5 below).
5. Every submit warned that neither spark.yarn.jars nor spark.yarn.archive was set, and re-uploaded the Spark libraries under SPARK_HOME to HDFS:

```
2018-09-04 08:59:05 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-09-04 08:59:07 INFO Client:54 - Uploading resource file:/tmp/spark-73483914-a54f-4e9e-ad19-5f0326a65c43/__spark_libs__1336817845101923206.zip -> hdfs://masters/user/root/.sparkStaging/application_1535967010469_0004/__spark_libs__1336817845101923206.zip
```

Solution:

```bash
# build Spark with YARN support
./build/mvn -Pyarn -Phadoop-2.9 -Dhadoop.version=2.9.1 -DskipTests clean package
# upload everything under spark/jars to HDFS
hadoop fs -mkdir -p /spark/jars
hadoop fs -put jars/*.jar /spark/jars
# copy spark-defaults.conf.template and add the spark.yarn.jars entry
cp conf/spark-defaults.conf.template conf/spark-defaults.conf
# in conf/spark-defaults.conf:
# spark.yarn.jars hdfs://masters/spark/jars/*.jar
```
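Once the entry is in place, a resubmit with --verbose makes it easy to confirm the setting is actually picked up (and the fallback WARN should be gone):

```bash
# --verbose echoes the effective Spark properties at submit time
spark-submit --verbose --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  spark/examples/jars/spark-examples_2.11-2.3.1.jar 10 2>&1 | grep spark.yarn.jars
```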