Coding With Fun

Initialize StreamingContext


May 17, 2021 Spark Programming guide



To initialize a Spark Streaming program, a StreamingContext object must be created; it is the main entry point for all Spark Streaming functionality. A StreamingContext object can be created from a SparkConf object.

import org.apache.spark._
import org.apache.spark.streaming._
val conf = new SparkConf().setAppName(appName).setMaster(master)
val ssc = new StreamingContext(conf, Seconds(1))

appName is the name your application will show on the cluster UI. master is a Spark, Mesos, or YARN cluster URL, or the special string "local", which means the program runs in local mode. In practice, when running on a cluster you will not want to hardcode master in the program; instead, launch the application with spark-submit and receive the master value there. For local runs and unit tests, you can pass "local" to run Spark Streaming within the same process. Note that StreamingContext internally creates a SparkContext object, which you can access through ssc.sparkContext
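For example, when running on a cluster you could omit setMaster from the code and supply the master URL at launch time with spark-submit. A hypothetical invocation (the class and jar names are placeholders) might look like:

```shell
# --master supplies the value that would otherwise be
# hardcoded via SparkConf.setMaster in the program.
spark-submit \
  --master yarn \
  --class com.example.StreamingApp \
  streaming-app.jar
```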

The batch interval must be set based on the latency requirements of your application and the resources available in the cluster; you can find detailed information in the Performance Tuning section.

A StreamingContext object can also be created from an existing SparkContext object.

import org.apache.spark.streaming._
val sc = ...                // existing SparkContext
val ssc = new StreamingContext(sc, Seconds(1))

After a context is defined, you must follow these steps:

  • Define the input sources;
  • Define the streaming computation by applying transformations and output operations;
  • Start receiving and processing data with the streamingContext.start() method;
  • Processing continues until streamingContext.stop() is called.
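The steps above can be sketched end to end. This is a minimal word-count example, assuming a text server is listening on localhost:9999 (for instance, one started with nc -lk 9999); the host, port, and application name are illustrative choices, not requirements.

```scala
import org.apache.spark._
import org.apache.spark.streaming._

val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

// 1. Define the input source
val lines = ssc.socketTextStream("localhost", 9999)

// 2. Define the streaming computation: count words per batch
val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
wordCounts.print()

// 3. Start receiving and processing data
ssc.start()

// 4. Block until the computation is stopped
ssc.awaitTermination()
```

Note that at least two local threads ("local[2]") are needed here: one to run the receiver and one to process the received data.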

A few points to note:

  • Once a context has been started, no new streaming computations can be defined or added to it.
  • Once a context has been stopped, it cannot be restarted.
  • Only one StreamingContext can be active in a JVM at the same time.
  • Calling stop() on a StreamingContext also stops the SparkContext object. To stop only the StreamingContext, set the optional stopSparkContext parameter of stop() to false.
  • A SparkContext object can be reused to create multiple StreamingContext objects, provided that the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
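The last two points can be combined into a sketch of reusing one SparkContext across consecutive StreamingContexts; the application name, master, and batch intervals here are arbitrary examples.

```scala
import org.apache.spark._
import org.apache.spark.streaming._

val sc = new SparkContext(
  new SparkConf().setAppName("reuse-example").setMaster("local[2]"))

val ssc1 = new StreamingContext(sc, Seconds(1))
// ... define and run the first streaming computation ...
ssc1.stop(stopSparkContext = false)  // keep the SparkContext alive

// Only after ssc1 has stopped may a new StreamingContext be created.
val ssc2 = new StreamingContext(sc, Seconds(5))
```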