May 17, 2021 Spark Programming Guide
The first step in a Spark program is to create a SparkContext object, which tells Spark how to access the cluster. Before you create a SparkContext, you need to build a SparkConf object that contains information about your application.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
The appName parameter is the name of your application, and it appears in the cluster UI. The master parameter is the URL of a Spark, Mesos, or YARN cluster, or the special string "local" when running in local mode. In practice, when an application runs on a cluster, you should not hardcode master into your program; instead, pass it when you launch the application with spark-submit. For local tests and unit tests, however, you can pass "local" to run Spark in the same process.
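As an illustration, here is a minimal sketch of a local-mode context as you might use it in a unit test. The application name and the data are made up for this example; "local[2]" runs Spark in-process with two worker threads.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: a local-mode context for a unit test.
val conf = new SparkConf().setAppName("unit-test").setMaster("local[2]")
val sc = new SparkContext(conf)
try {
  // A tiny job to exercise the context.
  val doubled = sc.parallelize(Seq(1, 2, 3)).map(_ * 2).collect()
  assert(doubled.sameElements(Array(2, 4, 6)))
} finally {
  sc.stop() // stop the context so the next test can create a new one
}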
In the Spark shell, a special SparkContext has already been created for you, in a variable called sc. A SparkContext that you create yourself will not work. You can use the --master argument to set which cluster the SparkContext connects to, and the --jars argument to list the JAR packages that need to be added to the classpath; if there are multiple JAR packages, separate them with commas.
For example, to run bin/spark-shell on four cores, use:
$ ./bin/spark-shell --master local[4]
Or, to also add code.jar to the classpath, use:
$ ./bin/spark-shell --master local[4] --jars code.jar
Run spark-shell --help for a complete list of options.
Behind the scenes, spark-shell invokes the more general spark-submit script.
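Because spark-shell delegates to spark-submit, the same options apply when launching a packaged application. As a sketch (the main class and JAR names here are hypothetical):

$ ./bin/spark-submit --class com.example.MyApp --master local[4] --jars deps.jar my-app.jar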