May 17, 2021 Spark Programming guide
Tuning the memory usage and garbage collection behavior of Spark applications is covered in detail in the Spark Tuning Guide. In this section, we highlight a few strongly recommended settings that reduce garbage collection pauses in Spark Streaming applications and so give more stable batch processing times.
By default, a persisted DStream uses StorageLevel.MEMORY_ONLY_SER, whereas a persisted RDD uses StorageLevel.MEMORY_ONLY. Even though keeping the data in serialized form adds serialization/deserialization overhead, it can significantly reduce garbage collection pauses.
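As a rough illustration of the persistence level described above, the following Scala sketch sets the serialized storage level explicitly on a DStream. The application name, the local master, the 5-second batch interval, and the socket source on localhost:9999 are assumptions made for the example and are not part of the original guide.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PersistenceLevelSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local setup: two threads, one for the receiver and one for processing.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("PersistenceLevelSketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical source: a text server listening on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)
    val words = lines.flatMap(_.split(" "))

    // Keep the RDDs of this stream in memory in serialized form.
    // Calling words.persist() with no argument selects the same level for a DStream.
    words.persist(StorageLevel.MEMORY_ONLY_SER)

    // An output operation is required before the context can be started.
    words.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Serialized storage keeps each partition as a small number of byte arrays rather than many individual objects, which is why it tends to lower garbage collection pressure at the cost of extra CPU work.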
Set spark.streaming.unpersist to true to enable smarter unpersisting of RDDs. With this setting, the system identifies RDDs that no longer need to be kept around and unpersists them. This can reduce the memory used by Spark RDDs and may also improve garbage collection behavior.
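One possible way to apply this setting is through SparkConf when the application is created. The sketch below assumes a local master and a made-up application name. In recent Spark releases this property already defaults to true, so setting it explicitly mainly documents the intent.

```scala
import org.apache.spark.SparkConf

object UnpersistConfigSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local setup; the application name is illustrative only.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("UnpersistConfigSketch")
      // Let Spark Streaming unpersist RDDs it no longer needs.
      .set("spark.streaming.unpersist", "true")

    // Read the value back to confirm what the application will use.
    println(conf.get("spark.streaming.unpersist"))
  }
}
```

The same property can also be passed at submit time, for example with --conf spark.streaming.unpersist=true on spark-submit, which avoids hard-coding it in the application.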