In Java and Scala, all you need to do is add a Maven dependency on spark-core to your application.
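For reference, a sketch of that dependency in sbt form; Maven users would add the matching spark-core artifact to their pom.xml (the same coordinates reappear in the simple.sbt file later in this post):

```scala
// Maven coordinates: groupId org.apache.spark, artifactId spark-core_2.11, version 2.0.0.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
```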
In Python, you write the application as a plain script and run it with the bin/spark-submit script that ships with Spark. spark-submit sets up the Spark dependencies for the Python program. It is used as follows:
```shell
/PATH_TO_SPARK/bin/spark-submit my_python_script.py
```
Once the dependency is in place, initialize Spark in your program: use a SparkConf object to configure the application, then use that SparkConf to create a SparkContext object. In Python:

```python
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
```
Run the script with spark-submit as shown above:

```shell
spark-submit spark-app.py
```
In Scala:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
```
In Java:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf().setMaster("local").setAppName("My App");
JavaSparkContext sc = new JavaSparkContext(conf);
```
The examples above show the simplest way to create a SparkContext; you pass just two parameters:

- the cluster URL, local in these examples, which tells Spark how to connect to a cluster; the special value local runs Spark on one thread on the local machine, without connecting to a cluster;
- the application name, My App in these examples, which identifies your application in the cluster manager's UI if you connect to a cluster.
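For illustration, a minimal sketch of the same initialization pointed at a standalone cluster instead of local mode; the host name masterhost is a hypothetical placeholder:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

// Hypothetical standalone-cluster URL; substitute your real master host.
val conf = new SparkConf()
  .setMaster("spark://masterhost:7077") // cluster URL
  .setAppName("My App")                 // application name
val sc = new SparkContext(conf)
```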
Create an empty directory, and inside it create a file simpleApp.scala with the following code:
```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Cache the file's lines, since two actions run over them.
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, lines with b: %s".format(numAs, numBs))
  }
}
```
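Before packaging anything, the same logic can be sanity-checked interactively in spark-shell, where a SparkContext named sc already exists; a sketch (the counts you see depend on your copy of README.md):

```scala
// Inside spark-shell the SparkContext is pre-created as sc, so no SparkConf is needed.
val logData = sc.textFile("README.md").cache()
logData.filter(line => line.contains("a")).count()
logData.filter(line => line.contains("b")).count()
```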
In the same directory, create a file simple.sbt and copy in the following:
```scala
name := "Simple Application"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
```
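As a side note, the %% operator asks sbt to append the Scala binary version to the artifact name, so the dependency line above should be equivalent to the fully spelled-out form below (a sketch, assuming Scala 2.11):

```scala
// Equivalent to the %% form when scalaVersion is 2.11.x.
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
```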
The scalaVersion and the spark-core version in simple.sbt should match your installation: the scala -version command prints your Scala version, launching spark-shell prints both the Spark and Scala versions, and the :quit command exits spark-shell.

The program counts how many lines in the README.md file contain a and how many contain b. Put README.md in the corresponding location of the file system Spark reads from. For example, if you are using HDFS, README.md should sit under the /user/YOUR_USER_NAME/ directory; alternatively, change the path in val logFile = "README.md" to an absolute one, e.g. val logFile = "/user/mint/README.md".
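If you want to be explicit about which file system the path refers to, the path can carry a scheme prefix; a small sketch, reusing the mint user name from the example above:

```scala
// Explicit HDFS URI (assumes your Hadoop configuration supplies the namenode).
val logFile = "hdfs:///user/mint/README.md"
// A local file can be forced the same way:
// val logFile = "file:///home/mint/README.md"
```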
The directory now contains just these two files:

```shell
$ ls
simpleApp.scala simple.sbt
```
Build the jar with sbt:

```shell
$ sbt package
[info] Set current project to Simple Project (in build file:/home/public/program/scala/self-cont-app/)
[info] Updating {file:/home/public/program/scala/self-cont-app/}self-cont-app...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /home/public/program/scala/self-cont-app/target/scala-2.11/classes...
[info] Packaging /home/public/program/scala/self-cont-app/target/scala-2.11/simple-project_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 11 s, completed Sep 8, 2016 3:12:31 PM
```
Finally, submit the packaged application with spark-submit, running locally on 4 cores:

```shell
$ spark-submit --class "SimpleApp" --master local[4] ./target/scala-2.11/simple-project_2.11-1.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
...
Lines with a: 61, lines with b: 27
```