May 17, 2021 Spark Programming Guide
To initialize a Spark Streaming program, you must create a StreamingContext object, which is the main entry point for all Spark Streaming functionality. A StreamingContext object can be created from a SparkConf object.
import org.apache.spark._
import org.apache.spark.streaming._
val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval
The appName parameter is the name your application displays on the cluster UI. The master parameter is a Spark, Mesos, or YARN cluster URL, or the special string "local", which runs the program in local mode. When the program runs on a cluster, you don't want to hardcode master in the program; instead, launch the application with spark-submit and receive the master value from it. For local runs or unit tests, however, you can pass "local" to run Spark Streaming within the same process. Note that the StreamingContext internally creates a SparkContext object, which you can access as ssc.sparkContext.
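Putting the steps above together, a minimal sketch (the application name and local master URL are illustrative):

```scala
import org.apache.spark._
import org.apache.spark.streaming._

// Hypothetical values for illustration; on a real cluster these would
// normally come from spark-submit rather than being hardcoded.
val conf = new SparkConf().setAppName("MyStreamingApp").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

// The underlying SparkContext is created internally and is accessible:
val sc = ssc.sparkContext
```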
The batch interval must be set based on the latency requirements of your application and the resources available in the cluster; see the Performance Tuning section for details.
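Spark's Duration helpers (Milliseconds, Seconds, Minutes) let you express the batch interval at different granularities; the trade-off noted above might look like this:

```scala
import org.apache.spark.streaming._

// Duration helpers for the batch interval:
val subSecond = Milliseconds(500) // lower latency, more scheduling overhead
val typical   = Seconds(1)
val coarse    = Minutes(1)        // higher latency, better throughput per batch
```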
You can also create a StreamingContext object from an existing SparkContext object.
import org.apache.spark.streaming._
val sc = ... // existing SparkContext
val ssc = new StreamingContext(sc, Seconds(1))
After a context is defined, call the streamingContext.start() method to begin receiving and processing data; processing continues until streamingContext.stop() is called.
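The lifecycle described above can be sketched as follows (the socket source on localhost:9999 is purely illustrative):

```scala
import org.apache.spark._
import org.apache.spark.streaming._

val conf = new SparkConf().setAppName("LifecycleExample").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

// Illustrative input source: lines of text from a TCP socket.
val lines = ssc.socketTextStream("localhost", 9999)
lines.print()

ssc.start()            // begin receiving and processing data
ssc.awaitTermination() // block until stop() is called or an error occurs
```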
A few points to note: by default, stop() also closes the SparkContext object. If you only want to stop the StreamingContext, set the optional stopSparkContext parameter of stop() to false.
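For example, to stop only the streaming computation while keeping the SparkContext alive for further use:

```scala
// Stop the streaming computation but keep the SparkContext running
// (stopSparkContext defaults to true).
ssc.stop(stopSparkContext = false)

val sc = ssc.sparkContext // still usable for batch jobs
sc.stop()                 // close it explicitly when fully done
```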