Spark Installation

1. Download address: http://spark.apache.org/downloads.html

2. Extract the archive

tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /opt/module/

 

3. Run the first example program in local mode

bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.11-2.4.4.jar 200
... ...
19/09/05 11:13:27 INFO Executor: Running task 198.0 in stage 0.0 (TID 198)
19/09/05 11:13:27 INFO Executor: Finished task 198.0 in stage 0.0 (TID 198). 824 bytes result sent to driver
19/09/05 11:13:27 INFO TaskSetManager: Starting task 199.0 in stage 0.0 (TID 199, localhost, executor driver, partition 199, PROCESS_LOCAL, 7866 bytes)
19/09/05 11:13:27 INFO TaskSetManager: Finished task 198.0 in stage 0.0 (TID 198) in 6 ms on localhost (executor driver) (199/200)
19/09/05 11:13:27 INFO Executor: Running task 199.0 in stage 0.0 (TID 199)
19/09/05 11:13:27 INFO Executor: Finished task 199.0 in stage 0.0 (TID 199). 781 bytes result sent to driver
19/09/05 11:13:27 INFO TaskSetManager: Finished task 199.0 in stage 0.0 (TID 199) in 9 ms on localhost (executor driver) (200/200)
19/09/05 11:13:27 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
19/09/05 11:13:27 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 3.129 s
19/09/05 11:13:27 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 3.262553 s
Pi is roughly 3.1416157570807877
19/09/05 11:13:27 INFO SparkUI: Stopped Spark web UI at http://vmhome10.com:4040
19/09/05 11:13:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/09/05 11:13:27 INFO MemoryStore: MemoryStore cleared
19/09/05 11:13:27 INFO BlockManager: BlockManager stopped
19/09/05 11:13:27 INFO BlockManagerMaster: BlockManagerMaster stopped
19/09/05 11:13:27 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/09/05 11:13:27 INFO SparkContext: Successfully stopped SparkContext
19/09/05 11:13:27 INFO ShutdownHookManager: Shutdown hook called
19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-7a49f112-3630-4ef6-b4dc-1c46af32c133
19/09/05 11:13:27 INFO ShutdownHookManager: Deleting directory /tmp/spark-6ee58588-7298-4623-b10b-6310e628060d

spark-submit general syntax:

./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
Parameter description (a concrete cluster-mode example follows the list):
--master spark://vmhome10.com:7077: the address of the Master
--class: the entry class of your application (e.g. org.apache.spark.examples.SparkPi)
--deploy-mode: whether to deploy your driver on a worker node (cluster) or run it locally as a client (client) (default: client)
--conf: arbitrary Spark configuration properties in key=value format; if the value contains spaces, wrap it in quotes: "key=value"
application-jar: the packaged application jar including its dependencies. The URL must be visible from every node in the cluster, e.g. an hdfs:// path on shared storage; if it is a file:// path, the same jar must exist at that path on every node
application-arguments: arguments passed to the main() method
--executor-memory 1G: sets the memory available to each executor to 1G
--total-executor-cores 2: sets the total number of CPU cores used by all executors to 2
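
Putting these options together, a submission of the SparkPi example to the standalone master above might look like the following sketch (client deploy mode; the jar path and master URL are the ones used elsewhere in this article, and the final argument is the number of slices SparkPi divides the work into):

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://vmhome10.com:7077 \
--deploy-mode client \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.4.4.jar 100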

 

 

4. Enter the shell programming mode (spark-shell)

bin/spark-shell
19/09/05 11:42:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://vmhome10.com:4040
Spark context available as 'sc' (master = local[*], app id = local-1567654930914).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

If spark-shell is started without specifying a master address, it still starts normally and programs can be executed in it; Spark is in fact running in local mode, which starts only a single process on the local machine and does not connect to a cluster.
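
Local mode can also be requested explicitly with the --master option. For example (a sketch; local[2] means two worker threads on the local machine):

bin/spark-shell --master local[2]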

 

Start the shell with parameters:

bin/spark-shell \
--master spark://vmhome10.com:7077 \
--executor-memory 1g \
--total-executor-cores 2

 

In the Spark shell, a SparkContext has already been initialized by default as the object sc; user code that needs it can simply use sc directly. The SparkSession (available as spark) is the entry point for Spark SQL programming.
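
As a rough illustration of the difference, sc is used for RDD operations while spark is the entry point for DataFrames and Spark SQL. A minimal sketch run inside spark-shell (the input path /home/hadoop/1.txt is the sample file used in the word count below; the view name is made up for illustration):

scala> val df = spark.read.text("/home/hadoop/1.txt")   // DataFrame with a single string column named "value"
scala> df.createOrReplaceTempView("lines")
scala> spark.sql("SELECT count(*) AS line_count FROM lines").show()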

Run a word count in the shell:

scala> sc.textFile("/home/hadoop/1.txt").flatMap(_.split(",")).map((_,1)).reduceByKey(_+_).collect
res2: Array[(String, Int)] = Array((192.168.1.1,2), (mytest,1), (wow,5), (1990,1), (xu.dm,4), (192.168.1.3,1), (dnf,4), (sword,2), (192.168.1.2,2), (hdfs,2), (blade,2), (2000,3))
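
The one-liner above chains several RDD transformations; the same job written step by step (a sketch over the same input file) looks like this:

scala> val lines = sc.textFile("/home/hadoop/1.txt")    // one RDD element per line of the file
scala> val words = lines.flatMap(_.split(","))          // split each line on commas
scala> val pairs = words.map((_, 1))                    // pair each word with a count of 1
scala> val counts = pairs.reduceByKey(_ + _)            // sum the counts for each word
scala> counts.collect                                   // bring the result back to the driver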