Currently there are three cluster deployment modes: Standalone, Mesos, and YARN.
First, place a compiled version of Spark on each machine in the cluster; this can be done with a shell script or with Docker.
To launch a cluster manually, start the master server with ./sbin/start-master.sh
Once started, the master will print out a spark://HOST:PORT URL for itself, which workers can use to connect to it.
On each worker machine, connect to the master with the command ./sbin/start-slave.sh <master-spark-URL>
We can go to the master's web UI at http://localhost:8080 to check the current status of the cluster.
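As a quick sanity check (just a sketch; it only assumes the master web UI is listening on its default port 8080 on the local machine), we can confirm the UI responds:

from urllib.request import urlopen

# The standalone master's web UI listens on port 8080 by default;
# the page lists registered workers and running applications.
with urlopen("http://localhost:8080") as resp:
    print("master UI reachable, HTTP status:", resp.status)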
To run an application on the cluster, pass the spark://IP:PORT URL of the master to the SparkContext() constructor:
conf = SparkConf().setAppName(appName).setMaster(masterURL)
sc = SparkContext(conf=conf)
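As a minimal, self-contained sketch, assuming a standalone master is already running at spark://master-host:7077 (a placeholder address, not from these notes), connecting to the cluster and running a small job could look like this:

from pyspark import SparkConf, SparkContext

# spark://master-host:7077 is a placeholder; use the URL printed by start-master.sh
conf = SparkConf().setAppName("StandaloneSmokeTest").setMaster("spark://master-host:7077")
sc = SparkContext(conf=conf)

# Run a tiny job to confirm the executors on the workers are reachable
total = sc.parallelize(range(100)).sum()
print("sum of 0..99 =", total)  # expected: 4950

sc.stop()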
Alternatively, launching the application with spark-submit is recommended.
After the cluster is set up, applications can be submitted to it with the spark-submit script; the general format is:
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
This example shows how to launch a Python program with spark-submit:
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# pi.py (abridged from the Pi example bundled with Spark)
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName("PythonPi").getOrCreate()
    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2  # 1000 in the command above
    n = 100000 * partitions
    def f(_):
        # 1 if a random point lands inside the unit circle, else 0
        x, y = random() * 2 - 1, random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0
    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))
    spark.stop()