Example: spark-submit [--option value] <application jar> [application arguments]
| Parameter | Meaning |
| --- | --- |
| --master MASTER_URL | yarn |
| --deploy-mode DEPLOY_MODE | Where the driver process runs: client or cluster |
| --class CLASS_NAME | The FQCN of the class containing the main method of the application, including the package name. For example, org.apache.spark.examples.SparkPi |
| --name NAME | The name of the application |
| --jars JARS | Third-party jars that the driver and executors depend on |
| --properties-file FILE | Path to a file of application properties; defaults to conf/spark-defaults.conf |

Driver settings:

| Parameter | Meaning |
| --- | --- |
| --driver-cores NUM | Number of CPU cores used by the driver (cluster mode only); defaults to 1 |
| --driver-memory MEM | Amount of memory used by the driver |
| --driver-library-path | Library path for the driver |
| --driver-class-path | Class path for the driver |
| --driver-java-options | Extra JVM options passed to the driver |

Executor settings:

| Parameter | Meaning |
| --- | --- |
| --num-executors NUM | The total number of YARN containers to allocate for this application; defaults to 2. Alternatively, use the spark.executor.instances configuration parameter |
| --executor-cores NUM | Number of processor cores to allocate on each executor; defaults to 1 |
| --executor-memory MEM | The maximum heap size to allocate to each executor; defaults to 1G. Alternatively, use the spark.executor.memory configuration parameter |
| --queue QUEUE_NAME | The YARN queue to submit to; defaults to the default queue |
| --archives ARCHIVES | Comma-separated list of archives to be extracted into the working directory of each executor |
| --files FILES | Comma-separated list of files to be placed in the working directory of each executor |
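Putting the options above together, a typical submission might look like the following sketch. The class is the SparkPi example shipped with Spark; the jar path and all resource values (executors, cores, memory) are illustrative assumptions, not recommendations — adjust them to your distribution and cluster.

```shell
# Sketch of a YARN submission using the options from the table above.
# lib/spark-examples.jar is an assumed path; it varies by Spark distribution.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --name spark-pi \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 2G \
  --driver-memory 1G \
  --queue default \
  lib/spark-examples.jar \
  10   # SparkPi's own argument: the number of slices
```

The trailing 10 is an application argument (it goes after the jar), not a spark-submit option.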
1. Deploy mode overview
2. Deploy mode: Cluster
In cluster mode, the driver runs in the ApplicationMaster on a cluster host chosen by YARN.
This means that the same process, which runs in a YARN container, is responsible for both driving the application and requesting resources from YARN.
The client that launches the application doesn't need to continue running for the entire lifetime of the application.
Cluster mode is not well suited to using Spark interactively.
Spark applications that require user input, such as spark-shell and pyspark, need the Spark driver to run inside the client process that initiates the Spark application.
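Because the driver runs inside the ApplicationMaster in cluster mode, its stdout/stderr go to YARN's container logs rather than to the submitting terminal. A sketch of retrieving them after the application finishes, assuming the application id that spark-submit printed (the id below is a placeholder):

```shell
# Fetch aggregated container logs, including the driver's output.
# application_1400000000000_0001 is a placeholder application id.
yarn logs -applicationId application_1400000000000_0001
```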
3. Deploy mode: Client
In client mode, the driver runs on the host where the job is submitted.
The ApplicationMaster is merely present to request executor containers from YARN.
The client communicates with those containers to schedule work after they start.
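This is why the interactive shells must run in client mode: the REPL process itself is the driver. A minimal sketch, using the --master/--deploy-mode style from the table above (on Spark 1.3, the equivalent single flag is --master yarn-client):

```shell
# The shell process is the driver, so it must stay on the client host.
spark-shell --master yarn --deploy-mode client

# The same applies to the Python shell:
pyspark --master yarn --deploy-mode client
```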
4. References:
https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_ig_running_spark_on_yarn.html
http://spark.apache.org/docs/1.3.0/running-on-yarn.html