May 17, 2021 Spark Programming guide
GraphX is a new (alpha) Spark API for graphs and graph-parallel computation. GraphX extends the Spark RDD by introducing the Resilient Distributed Property Graph: a directed multigraph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators and an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and graph builders to simplify graph analytics tasks.
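As a minimal sketch of what a property graph looks like in code, the example below builds a small directed multigraph with the GraphX `Graph` constructor. The vertex data (name and role), the edge labels, and the object name `PropertyGraphSketch` are hypothetical and chosen only for illustration; the sketch assumes Spark and GraphX are on the classpath and that a local master is acceptable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object PropertyGraphSketch {
  // Hypothetical schema: vertex property = (name, role), edge property = relationship label.
  def build(sc: SparkContext): Graph[(String, String), String] = {
    val vertices: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (1L, ("alice", "student")),
      (2L, ("bob", "professor")),
      (3L, ("carol", "postdoc"))
    ))
    // A multigraph may hold parallel edges: two distinct edges go from vertex 2 to vertex 1.
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(2L, 1L, "advisor"),
      Edge(2L, 1L, "coauthor"),
      Edge(3L, 1L, "coauthor")
    ))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PropertyGraphSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = build(sc)
      // Each triplet joins an edge with the properties of its source and destination vertices.
      graph.triplets
        .map(t => s"${t.srcAttr._1} -[${t.attr}]-> ${t.dstAttr._1}")
        .collect()
        .foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the properties in ordinary RDDs of tuples and `Edge` values means the same data can be prepared with regular Spark operators before it is handed to `Graph`.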
From social networks to language modeling, the growing scale and importance of graph data have driven the development of many new graph-parallel systems, such as Giraph and GraphLab.
By restricting the types of computation that can be expressed and introducing new techniques to partition and distribute graphs, these systems can execute sophisticated graph algorithms much more efficiently than more general data-parallel systems.
However, the same restrictions that improve performance also make it difficult to express many important stages in a typical graph analytics pipeline: constructing a graph, modifying its structure, or expressing computation that spans multiple graphs. In addition, how we view data depends on our goals, and the same raw data may have many different table and graph views.
As a result, it is often necessary to be able to move between table and graph views of the same data.
However, existing graph analytics pipelines must compose graph-parallel and data-parallel systems, which leads to extensive data movement and duplication and to a complicated programming model.
The goal of the GraphX project is to unify graph-parallel and data-parallel computation in one system with a single composable API. GraphX allows users to view data both as a graph and as a collection (an RDD) without data movement or duplication.
By incorporating recent advances in graph-parallel systems, GraphX is able to optimize the execution of graph operations.
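The idea of treating the same data as both a graph and a collection can be sketched as follows. The example applies an ordinary RDD operator (`filter`) to the vertex collection and two graph operators (`inDegrees` and `subgraph`) to the same `Graph` object. The sample data, the object name `GraphAndCollectionViews`, and its helper functions are hypothetical; the sketch assumes Spark and GraphX are on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object GraphAndCollectionViews {
  type PeopleGraph = Graph[(String, String), String]

  // Hypothetical sample data: vertex property = (name, role), edge property = label.
  def build(sc: SparkContext): PeopleGraph = {
    val vertices = sc.parallelize(Seq[(VertexId, (String, String))](
      (1L, ("alice", "student")),
      (2L, ("bob", "professor")),
      (3L, ("carol", "postdoc"))
    ))
    val edges = sc.parallelize(Seq(
      Edge(2L, 1L, "advisor"),
      Edge(2L, 1L, "coauthor"),
      Edge(3L, 1L, "coauthor")
    ))
    Graph(vertices, edges)
  }

  // Collection view: graph.vertices is an RDD, so ordinary RDD operators apply.
  def countByRole(graph: PeopleGraph, role: String): Long =
    graph.vertices.filter { case (_, (_, r)) => r == role }.count()

  // Graph view: inDegrees is derived from the edge structure.
  def inDegreeOf(graph: PeopleGraph, id: VertexId): Int =
    graph.inDegrees
      .filter { case (vid, _) => vid == id }
      .map(_._2)
      .collect()
      .headOption
      .getOrElse(0)

  // Graph view: subgraph keeps only the edges that satisfy the predicate.
  def edgesWithLabel(graph: PeopleGraph, label: String): Long =
    graph.subgraph(epred = t => t.attr == label).edges.count()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphAndCollectionViews").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = build(sc)
      println(s"professors: ${countByRole(graph, "professor")}")
      println(s"in-degree of vertex 1: ${inDegreeOf(graph, 1L)}")
      println(s"coauthor edges: ${edgesWithLabel(graph, "coauthor")}")
    } finally {
      sc.stop()
    }
  }
}
```

Both kinds of operation run against the same `Graph` value inside one Spark application, so no export to a separate graph system is needed between the steps.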