寫一個小小的Demo測試一下Spark提交程序的流程java
Maven的pom文件node
<properties> <maven.compiler.source>1.7</maven.compiler.source> <maven.compiler.target>1.7</maven.compiler.target> <encoding>UTF-8</encoding> <spark.version>1.6.1</spark.version> </properties> <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>${spark.version}</version> </dependency> <dependency> <groupId>redis.clients</groupId> <artifactId>jedis</artifactId> <version>2.7.1</version> </dependency> </dependencies> <build> <plugins> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <configuration> <source>1.7</source> <target>1.7</target> </configuration> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.4.3</version> <executions> <execution> <phase>package</phase> <goals> <goal>shade</goal> </goals> <configuration> <filters> <filter> <artifact>*:*</artifact> <excludes> <exclude>META-INF/*.SF</exclude> <exclude>META-INF/*.DSA</exclude> <exclude>META-INF/*.RSA</exclude> </excludes> </filter> </filters> </configuration> </execution> </executions> </plugin> </plugins> </build>
編寫一個蒙特卡羅求PI的代碼redis
import java.util.ArrayList; import java.util.List; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.Function; import org.apache.spark.api.java.function.Function2; import redis.clients.jedis.Jedis; /** * Computes an approximation to pi * Usage: JavaSparkPi [slices] */ public final class JavaSparkPi { public static void main(String[] args) throws Exception { SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi")/*.setMaster("local[2]")*/; JavaSparkContext jsc = new JavaSparkContext(sparkConf); Jedis jedis = new Jedis("192.168.49.151",19000); int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2; int n = 100000 * slices; List<Integer> l = new ArrayList<Integer>(n); for (int i = 0; i < n; i++) { l.add(i); } JavaRDD<Integer> dataSet = jsc.parallelize(l, slices); int count = dataSet.map(new Function<Integer, Integer>() { @Override public Integer call(Integer integer) { double x = Math.random() * 2 - 1; double y = Math.random() * 2 - 1; return (x * x + y * y < 1) ? 1 : 0; } }).reduce(new Function2<Integer, Integer, Integer>() { @Override public Integer call(Integer integer, Integer integer2) { return integer + integer2; } }); jedis.set("Pi", String.valueOf(4.0 * count / n)); System.out.println("Pi is roughly " + 4.0 * count / n); jsc.stop(); } }
前提條件的setMaster("local[2]") 沒有在代碼中hard codeapache
本地模式測試狀況:# Run application locally on 8 cores
spark-submit \
--master local[8] \
--class com.spark.test.JavaSparkPi \
--executor-memory 4g \
--executor-cores 4 \
/home/dinpay/test/Spark-SubmitTest.jar 100
運行結果在本地:運行在本地一塊兒提交8個Task,不會在WebUI的8080端口上看見提交的任務
api
-------------------------------------
spark-submit \
--master local[8] \
--class com.spark.test.JavaSparkPi \
--executor-memory 8G \
--total-executor-cores 8 \
hdfs://192.168.46.163:9000/home/test/Spark-SubmitTest.jar 100 app
運行報錯:java.lang.ClassNotFoundException: com.spark.test.JavaSparkPidom
------------------------------------maven
spark-submit \
--master local[8] \
--deploy-mode cluster \
--supervise \
--class com.spark.test.JavaSparkPi \
--executor-memory 8G \
--total-executor-cores 8 \
/home/dinpay/test/Spark-SubmitTest.jar 100 ide
運行報錯:Error: Cluster deploy mode is not compatible with master "local"oop
====================================================================
Standalone模式client模式 # Run on a Spark standalone cluster in client deploy mode
spark-submit \
--master spark://hadoop-namenode-02:7077 \
--class com.spark.test.JavaSparkPi \
--executor-memory 8g \
--tital-executor-cores 8 \
/home/dinpay/test/Spark-SubmitTest.jar 100
運行結果以下:
-------------------------------------------
spark-submit \
--master spark://hadoop-namenode-02:7077 \
--class com.spark.test.JavaSparkPi \
--executor-memory 4g \
--executor-cores 4g \
hdfs://192.168.46.163:9000/home/test/Spark-SubmitTest.jar 100
運行報錯:java.lang.ClassNotFoundException: com.spark.test.JavaSparkPi
=======================================================================
standalone模式下的cluster模式 # Run on a Spark standalone cluster in cluster deploy mode with supervise
spark-submit \
--master spark://hadoop-namenode-02:7077 \
--class com.spark.test.JavaSparkPi \
--deploy-mode cluster \
--supervise \
--executor-memory 4g \
--executor-cores 4 \
/home/dinpay/test/Spark-SubmitTest.jar 100
運行報錯:java.io.FileNotFoundException: /home/dinpay/test/Spark-SubmitTest.jar (No such file or directory)
-------------------------------------------
spark-submit \
--master spark://hadoop-namenode-02:7077 \
--class com.spark.test.JavaSparkPi \
--deploy-mode cluster \
--supervise \
--driver-memory 4g \
--driver-cores 4 \
--executor-memory 2g \
--total-executor-cores 4 \
hdfs://192.168.46.163:9000/home/test/Spark-SubmitTest.jar 100
運行結果以下:
=============================================
若是代碼中寫定了.setMaster("local[2]");
則提交的集羣模式也會運行driver,可是不會有對應的application並行運行
spark-submit --deploy-mode cluster \--master spark://hadoop-namenode-02:6066 \--class com.dinpay.bdp.rcp.service.Window12HzStat \--driver-memory 2g \--driver-cores 2 \--executor-memory 1g \--total-executor-cores 2 \hdfs://192.168.46.163:9000/home/dinpay/RCP-HZ-TASK-0.0.1-SNAPSHOT.jar若是代碼中限定了.setMaster("local[2]");則提交方式仍是本地模式,會找一臺worker進行本地化運行任務