Spark 1.0 introduced spark-submit as the unified way to submit applications:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  ... # other options
  <application-jar> \
  [application-arguments]
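The full list of supported options can be printed by spark-submit itself:

./bin/spark-submit --help

The most commonly used options are: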
--class: the entry point of the application;
--master: the master URL of the cluster;
--deploy-mode: the deploy mode of the driver in the cluster (client or cluster);
application-jar: the jar containing the application code; it can be placed on HDFS or on the local file system.
Standalone mode example:
spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master spark://hadoop000:7077 \
  --executor-memory 1G \
  --total-executor-cores 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt
Here --master must be set to the master address of the Spark standalone cluster.
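For reference, that URL corresponds to the host and port the standalone master was started with. A minimal sketch of conf/spark-env.sh on the master node, assuming hadoop000 is the master host (SPARK_MASTER_IP and SPARK_MASTER_PORT are the standard Spark 1.x standalone settings):

# conf/spark-env.sh on the master node
export SPARK_MASTER_IP=hadoop000    # host the standalone master binds to
export SPARK_MASTER_PORT=7077       # port used in the spark:// master URL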
yarn-client mode example:
spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master yarn-client \
  --executor-memory 1G \
  --executor-cores 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt
yarn-cluster mode example:
spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master yarn-cluster \
  --executor-memory 1G \
  --executor-cores 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt
Note: submitting to YARN requires HADOOP_CONF_DIR to be configured, so that Spark can locate the YARN and HDFS configuration.
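For example, assuming a Hadoop installation whose configuration files live under /home/hadoop/hadoop/etc/hadoop (the exact path depends on the environment), the variable could be exported in the shell profile or in conf/spark-env.sh:

# directory containing core-site.xml, hdfs-site.xml, yarn-site.xml
export HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop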
The difference between yarn-client and yarn-cluster comes down to where the Driver runs:
yarn-client:
The Client and the Driver run together, and the ApplicationMaster is only used to request resources. Results are printed to the client console in real time, so the logs are easy to follow (a sketch of capturing them is given below); this mode is the recommended one.
After the job is submitted to YARN, YARN first launches the ApplicationMaster and the Executors, both of which run inside Containers. Note: a single Container runs only one ExecutorBackend.
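Since the driver output in this mode goes to the client console, one simple way to keep a copy of it is to tee the output of the submission to a file. A sketch reusing the yarn-client command above (the log file name wordcount.log is arbitrary):

spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master yarn-client \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt 2>&1 | tee wordcount.log   # print and save the driver output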
yarn-cluster:
The Driver and the ApplicationMaster run together, so the results cannot be shown on the client console; the application has to write its results to HDFS or to a database;
The driver runs on the cluster, and its status can be viewed through the web UI.
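Besides the web UI, the YARN command-line tools can be used to check on a yarn-cluster application; <application-id> below stands for the ID printed by spark-submit at submission time, and fetching logs assumes YARN log aggregation is enabled:

yarn application -list                      # list applications and their current state
yarn logs -applicationId <application-id>   # fetch aggregated logs, including the driver output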