Once the code is written and packaged into a jar, it can be submitted to the cluster via `bin/spark-submit`. The command takes the following general form:
```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```
In most cases, the options shown above are all you need:
- `--class`: The entry point for your application (e.g. `org.apache.spark.examples.SparkPi`)
- `--master`: The master URL for the cluster (e.g. `spark://23.195.26.187:7077`)
- `--deploy-mode`: Whether to deploy your driver on the worker nodes (`cluster`) or locally as an external client (`client`) (default: `client`)
- `--conf`: Arbitrary Spark configuration property in `key=value` format. For values that contain spaces, wrap `"key=value"` in quotes (see the example after this list).
- `application-jar`: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside your cluster, for instance, an `hdfs://` path or a `file://` path that is present on all nodes.
- `application-arguments`: Arguments passed to the main method of your main class, if any
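Putting these options together, here is a sketch of a complete invocation. The master URL is the one from the example above; the memory setting and GC flags are illustrative placeholders, not recommendations. Note the quotes around the `--conf` value that contains spaces:

```bash
# A sketch combining the options above. spark.executor.memory and
# spark.executor.extraJavaOptions are standard Spark properties, but the
# values here are placeholders chosen for illustration.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://23.195.26.187:7077 \
  --deploy-mode client \
  --conf spark.executor.memory=2g \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  /path/to/examples.jar \
  100
```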
Here are a few simple examples of submitting with spark-submit under different cluster managers:
```bash
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise,
# to make sure that the driver is automatically restarted if it fails
# with a non-zero exit code
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster
# (yarn-cluster can also be `yarn-client` for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
```
A simple counting application, implemented in Java:
```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleSample {
    public static void main(String[] args) {
        String logFile = "/home/bigdata/spark-1.5.1/README.md";
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Cache the file, since it is scanned twice below
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count the lines containing the letter "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count the lines containing the letter "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

        sc.stop();
    }
}
```
Package the application into a jar.
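Assuming the project is built with Maven (the build tool and project layout are assumptions; sbt or a plain `jar` command would work just as well), packaging and staging the jar might look like this:

```bash
# Assumption: a standard Maven project whose pom.xml declares the Spark
# dependency with <scope>provided</scope>, so Spark itself is not bundled
# into the jar -- the cluster supplies it at runtime.
mvn clean package

# The jar ends up under target/; copy it to the path used by the
# submit command below.
cp target/spark-test-0.0.1-SNAPSHOT.jar /home/jar/
```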
Then submit it with:
```bash
./bin/spark-submit \
  --class cs.spark.SimpleSample \
  --master spark://spark1:7077 \
  /home/jar/spark-test-0.0.1-SNAPSHOT.jar
```