May 17, 2021 Spark Programming guide
A discretized stream, or DStream, is the basic abstraction provided by Spark Streaming. It represents a continuous stream of data, either the input stream received from a source or the processed stream generated by transforming the input stream. Internally, a DStream is represented by a continuous series of RDDs. Each RDD in a DStream contains the data for a defined time interval, as shown in the following image:
Any operation applied to a DStream is translated into operations on the underlying RDDs. In the previous example, the flatMap operation is applied to each RDD in the lines DStream to generate the RDDs of the words DStream.
The process looks like this:
These underlying RDD transformations are computed by the Spark engine. DStream operations hide most of these details and give developers a higher-level API for convenience. The details of these operations are discussed in the following sections.
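The relationship between a DStream and its RDDs can be illustrated with a small plain-Python model. This is only a sketch and does not use the Spark API: a list of strings stands in for one RDD, a list of such batches stands in for a DStream, and the helper names discretize and flat_map, the sample records, and the 1-second interval are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins, not Spark API: a Batch plays the role of one RDD,
# and a list of batches plays the role of a DStream.
Batch = List[str]


def discretize(records: Iterable[Tuple[float, str]], interval: float) -> List[Batch]:
    """Group (timestamp, line) records into consecutive fixed-length intervals."""
    buckets = {}
    for timestamp, line in records:
        buckets.setdefault(int(timestamp // interval), []).append(line)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Intervals with no data still produce an (empty) batch.
    return [buckets.get(i, []) for i in range(first, last + 1)]


def flat_map(stream: List[Batch], fn: Callable[[str], Iterable[str]]) -> List[Batch]:
    """Apply fn to every element of every batch, one batch at a time."""
    return [[out for item in batch for out in fn(item)] for batch in stream]


# Assumed sample input: (arrival time in seconds, text line).
records = [
    (0.2, "hello world"),
    (0.9, "hello spark"),
    (1.5, "spark streaming"),
    (3.1, "dstream"),
]

lines = discretize(records, interval=1.0)              # the "lines" stream
words = flat_map(lines, lambda line: line.split(" "))  # the "words" stream

for i, (line_batch, word_batch) in enumerate(zip(lines, words)):
    print(f"interval {i}: {line_batch} -> {word_batch}")
```

Each batch in lines maps to exactly one batch in words for the same interval, which mirrors how a DStream operation becomes an operation on each underlying RDD. In Spark itself, the engine schedules this per-RDD work, so the developer only writes the stream-level call.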