Submitting Tasks to a Cluster with spark-submit


1. Parameter Selection

Once the code is written and packaged into a jar, it can be submitted to the cluster with bin/spark-submit. The command looks like this:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

In most cases, the parameters above are all you need:

  • --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)

  • --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)

  • --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)

  • --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes (see the example after this list).

  • application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.

  • application-arguments: Arguments passed to the main method of your main class, if any
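
For example, a configuration value that contains spaces must be wrapped in quotes so the shell passes it through as a single argument. The following invocation mirrors the quoting example in the official Spark documentation (the GC options are only illustrative):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  /path/to/examples.jar \
  100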

Here are a few simple spark-submit examples for the different cluster managers:

# Run application locally on 8 cores

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
 
# Run on a Spark standalone cluster in cluster deploy mode with supervise
# make sure that the driver is automatically restarted if it fails with a non-zero exit code

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
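
Because --supervise keeps restarting a failing driver, a driver that fails repeatedly has to be killed explicitly. In standalone cluster mode this can be done through spark-submit itself; the submission ID is printed when the driver is launched (the ID below is a placeholder):

./bin/spark-submit \
  --kill <submission-id> \
  --master spark://207.184.161.138:7077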
   
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \  # can also be `yarn-client` for client mode
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
 
# Run a Python application on a Spark standalone cluster

./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

2. Submission Steps

The following Java code implements a simple count: how many lines of a file contain the letter 'a', and how many contain 'b'.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleSample {
	public static void main(String[] args) {
		String logFile = "/home/bigdata/spark-1.5.1/README.md";
		// The master URL is supplied by spark-submit, so it is not set here.
		SparkConf conf = new SparkConf().setAppName("Simple Application");
		JavaSparkContext sc = new JavaSparkContext(conf);
		// Cache the file, since it is scanned twice below.
		JavaRDD<String> logData = sc.textFile(logFile).cache();

		// Count the lines containing the letter 'a'.
		long numAs = logData.filter(new Function<String, Boolean>() {
			public Boolean call(String s) {
				return s.contains("a");
			}
		}).count();

		// Count the lines containing the letter 'b'.
		long numBs = logData.filter(new Function<String, Boolean>() {
			public Boolean call(String s) {
				return s.contains("b");
			}
		}).count();

		System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

		sc.stop();
	}
}
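
With Java 8 or later, the anonymous Function classes can be replaced by lambdas, since Function is a single-method interface. A minimal sketch of the same filter:

long numAs = logData.filter(s -> s.contains("a")).count();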

Package it into a jar.
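
How the jar is built depends on your project; as a minimal sketch, assuming a Maven project with the Spark dependency scoped as provided (so the cluster's own Spark jars are used at runtime):

mvn clean package
# produces target/spark-test-0.0.1-SNAPSHOT.jar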

Then submit it:

./bin/spark-submit --class cs.spark.SimpleSample --master spark://spark1:7077 /home/jar/spark-test-0.0.1-SNAPSHOT.jar
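
Since no --deploy-mode is given, the driver runs in the default client mode on the submitting machine, so the "Lines with a: ..., lines with b: ..." result is printed straight to the console.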