Submitting an Application with spark-submit

spark-submit was introduced in Spark 1.0 as a unified way to submit applications:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  ... # other options
  <application-jar> \
  [application-arguments]

 

--class: the entry point of the application (the full name of the main class);

--master: the master URL of the cluster;

--deploy-mode: where the driver is deployed in the cluster (client or cluster);

application-jar: the jar containing the application code; it can be placed on HDFS or on the local file system;
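
The examples below submit a class named com.luogankun.spark.WordCount packaged in /home/spark/data/spark.jar. Its source is not shown in this post, so the following is only a minimal sketch of what such a main class might look like (package name kept, logic assumed):

package com.luogankun.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // args(0) is the input path passed as [application-arguments],
    // e.g. hdfs://hadoop000:8020/hello.txt
    val conf = new SparkConf().setAppName("SparkSubmit_Demo")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println) // print the word counts on the driver
    sc.stop()
  }
}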

 

Standalone mode example:

spark-submit \
--name SparkSubmit_Demo \
--class com.luogankun.spark.WordCount \
--master spark://hadoop000:7077 \
--executor-memory 1G \
--total-executor-cores 1 \
/home/spark/data/spark.jar \
hdfs://hadoop000:8020/hello.txt

 

In standalone mode, --master must be set to the master address of the Spark cluster;

 

yarn-client mode example:

spark-submit \
--name SparkSubmit_Demo \
--class com.luogankun.spark.WordCount \
--master yarn-client \
--executor-memory 1G \
--total-executor-cores 1 \
/home/spark/data/spark.jar \
hdfs://hadoop000:8020/hello.txt

 

yarn-cluster mode example:

spark-submit \
--name SparkSubmit_Demo \
--class com.luogankun.spark.WordCount \
--master yarn-cluster \
--executor-memory 1G \
--total-executor-cores 1 \
/home/spark/data/spark.jar \
hdfs://hadoop000:8020/hello.txt

 

Note: to submit to YARN, HADOOP_CONF_DIR must be configured so that Spark can find the Hadoop/YARN configuration.
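
For example, before running spark-submit you could export the variable so Spark can pick up core-site.xml and yarn-site.xml (the path below is only an assumed, typical location; adjust it to your installation):

export HADOOP_CONF_DIR=/etc/hadoop/conf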

 

The difference between yarn-client and yarn-cluster: distinguished by where the Driver runs.

yarn-client:

  The Client and the Driver run together, and the ApplicationMaster is only used to request resources; results are printed to the client console in real time and log messages are easy to inspect, so this mode is recommended;

  After the job is submitted to YARN, YARN first launches the ApplicationMaster and the Executors, both of which run inside Containers. Note: each container runs only one ExecutorBackend;

yarn-cluster:

  The Driver runs together with the ApplicationMaster, so the results cannot be displayed on the client console; they must be stored on HDFS or written to a database;

  The driver runs on the cluster, and its status can be checked through the web UI.
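
As a hedged sketch of what this means inside the application code (the output path is an assumption), the last step of the WordCount job above could be either of the following:

counts.collect().foreach(println)                      // yarn-client: results show up on the submitting console
counts.saveAsTextFile("hdfs://hadoop000:8020/output")  // yarn-cluster: persist results to HDFS instead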
