May 17, 2021 Spark Programming guide
GraphX provides several ways to build a graph from a collection of vertices and edges, either in an RDD or on disk. By default, none of the graph builders repartitions the graph's edges; edges are left in their default partitions (for example, their original blocks in HDFS). Graph.groupEdges requires the graph to be repartitioned because it assumes that identical edges are colocated on the same partition, so you must call Graph.partitionBy before calling groupEdges.
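The ordering requirement between partitionBy and groupEdges can be sketched as follows. This is a minimal illustration, not the guide's own code: the object name GroupEdgesSketch, the helper mergeParallelEdges, the sample edges, and the local[2] master are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Hypothetical example object; not part of GraphX.
object GroupEdgesSketch {
  // Repartition first so identical (src, dst) edges share a partition,
  // then merge them by summing their Int attributes.
  def mergeParallelEdges(graph: Graph[Int, Int]): Graph[Int, Int] =
    graph
      .partitionBy(PartitionStrategy.RandomVertexCut)
      .groupEdges(_ + _)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for experimentation only.
    val conf = new SparkConf().setAppName("GroupEdgesSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: the edge 1 -> 2 appears twice.
      val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
      val graph = Graph.fromEdges(edges, defaultValue = 0)
      mergeParallelEdges(graph).edges.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Skipping the partitionBy call may leave duplicate edges on different partitions, in which case groupEdges would not merge them.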
object GraphLoader {
  def edgeListFile(
      sc: SparkContext,
      path: String,
      canonicalOrientation: Boolean = false,
      minEdgePartitions: Int = 1)
    : Graph[Int, Int]
}
GraphLoader.edgeListFile provides a way to load a graph from a list of edges on disk. It parses an adjacency list of (source vertex ID, destination vertex ID) pairs in the following form, skipping comment lines that begin with #:
# This is a comment
2 1
4 1
1 2
It creates a Graph from the specified edges, automatically creating any vertices mentioned by the edges. All vertex and edge attributes default to 1.
The canonicalOrientation argument allows edges to be reoriented in the positive direction (srcId < dstId), which is required by the connected components algorithm.
The minEdgePartitions argument specifies the minimum number of edge partitions to generate. There may be more edge partitions than specified if, for example, the HDFS file has more blocks.
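As a rough sketch of how the loader might be used, the example below writes the sample edge list above to a temporary file and loads it with canonicalOrientation enabled. The object name EdgeListFileSketch, its helpers writeSample and load, the temporary file, and the local[2] master are assumptions made for illustration. The partition-count argument is left at its default.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader}

// Hypothetical example object; not part of GraphX.
object EdgeListFileSketch {
  // Write the sample edge list to a temporary file and return its file: URI.
  def writeSample(): String = {
    val tmp = Files.createTempFile("edges", ".txt")
    val content = "# This is a comment\n2 1\n4 1\n1 2\n"
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8))
    tmp.toUri.toString
  }

  // Load the edge list, reorienting every edge so that srcId < dstId.
  def load(sc: SparkContext, path: String): Graph[Int, Int] =
    GraphLoader.edgeListFile(sc, path, canonicalOrientation = true)

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for experimentation only.
    val conf = new SparkConf().setAppName("EdgeListFileSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val graph = load(sc, writeSample())
      // Vertices are created from the edges; attributes default to 1.
      graph.vertices.collect().foreach(println)
      graph.edges.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```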
object Graph {
  def apply[VD, ED](
      vertices: RDD[(VertexId, VD)],
      edges: RDD[Edge[ED]],
      defaultVertexAttr: VD = null)
    : Graph[VD, ED]

  def fromEdges[VD, ED](
      edges: RDD[Edge[ED]],
      defaultValue: VD): Graph[VD, ED]

  def fromEdgeTuples[VD](
      rawEdges: RDD[(VertexId, VertexId)],
      defaultValue: VD,
      uniqueEdges: Option[PartitionStrategy] = None): Graph[VD, Int]
}
Graph.apply allows you to create a graph from RDDs of vertices and edges. Duplicate vertices are picked arbitrarily, and vertices that appear in the edge RDD but not in the vertex RDD are assigned the default attribute.
Graph.fromEdges allows you to create a graph from only an RDD of edges. It automatically creates any vertices mentioned by the edges and assigns them the default value.
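The difference between the two builders can be illustrated with a small sketch. The object name GraphBuildersSketch, its helpers, the vertex names, and the "unknown" default are made up for the example, and a local[2] master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of GraphX.
object GraphBuildersSketch {
  // Made-up edges; vertex 3 appears only here, not in the vertex RDD below.
  def sampleEdges(sc: SparkContext): RDD[Edge[String]] =
    sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))

  // Graph.apply: explicit vertices, plus a default for vertices seen only in edges.
  def withVertices(sc: SparkContext): Graph[String, String] = {
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    Graph(vertices, sampleEdges(sc), defaultVertexAttr = "unknown")
  }

  // Graph.fromEdges: every vertex is created from the edges with the default value.
  def edgesOnly(sc: SparkContext): Graph[String, String] =
    Graph.fromEdges(sampleEdges(sc), defaultValue = "unknown")

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for experimentation only.
    val conf = new SparkConf().setAppName("GraphBuildersSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      withVertices(sc).vertices.collect().foreach(println)
      edgesOnly(sc).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```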
Graph.fromEdgeTuples allows you to create a graph from only an RDD of edge tuples. It assigns each edge the value 1, automatically creates any vertices mentioned by the edges, and assigns them the default value. It also supports deduplicating the edges: to deduplicate, pass Some of a PartitionStrategy as the uniqueEdges parameter (for example, uniqueEdges = Some(PartitionStrategy.RandomVertexCut)). A partition strategy is required so that identical edges are colocated on the same partition and can then be deduplicated.
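A minimal sketch of deduplication with fromEdgeTuples is shown below. The object name EdgeTuplesSketch, its helper dedup, the sample tuples, and the local[2] master are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of GraphX.
object EdgeTuplesSketch {
  // Build a graph from raw (src, dst) tuples, merging duplicate edges.
  def dedup(rawEdges: RDD[(VertexId, VertexId)]): Graph[Int, Int] =
    Graph.fromEdgeTuples(
      rawEdges,
      defaultValue = 0,
      uniqueEdges = Some(PartitionStrategy.RandomVertexCut))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for experimentation only.
    val conf = new SparkConf().setAppName("EdgeTuplesSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: the tuple (1, 2) appears twice.
      val raw: RDD[(VertexId, VertexId)] =
        sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))
      dedup(raw).edges.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Leaving uniqueEdges at its default of None skips the repartitioning step and keeps duplicate edges as separate entries.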