Spark GraphX graph builders


May 17, 2021 Spark Programming guide



GraphX provides several ways to build a graph from a collection of vertices and edges, either in an RDD or on disk. By default, none of the graph builders repartitions the graph's edges; the edges are left in their default partitions (for example, their original blocks in HDFS). Graph.groupEdges requires the graph to be repartitioned because it assumes that identical edges are colocated on the same partition, so you must call Graph.partitionBy before you call groupEdges.
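The ordering requirement above can be sketched as follows. This is a minimal illustration, not code from the guide: the object name GroupEdgesSketch, the helper mergeParallelEdges, the sample edges, and the choice of PartitionStrategy.EdgePartition2D are all assumptions made for the example, and it expects Spark with GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Hypothetical example object; the names and data are illustrative only.
object GroupEdgesSketch {
  // Merge parallel edges by summing their attributes.
  def mergeParallelEdges(sc: SparkContext): Graph[Int, Int] = {
    // Two copies of the edge 1 -> 2 and a single edge 2 -> 3.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)
    graph
      .partitionBy(PartitionStrategy.EdgePartition2D) // colocate identical edges first
      .groupEdges(_ + _)                              // then merge them within each partition
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("GroupEdgesSketch")
    val sc = new SparkContext(conf)
    try {
      mergeParallelEdges(sc).edges.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Any PartitionStrategy should work here, since each one sends edges with the same source and destination to the same partition; EdgePartition2D is just one option.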

object GraphLoader {
  def edgeListFile(
      sc: SparkContext,
      path: String,
      canonicalOrientation: Boolean = false,
      minEdgePartitions: Int = 1)
    : Graph[Int, Int]
}

GraphLoader.edgeListFile provides a way to load a graph from a list of edges on disk. It parses an adjacency list of (source vertex ID, destination vertex ID) pairs in the following form, skipping comment lines that begin with #:

# This is a comment
2 1
4 1
1 2

It creates a graph from the specified edges, automatically creating any vertices mentioned by the edges. All vertex and edge attributes default to 1. The canonicalOrientation argument reorients edges in the positive direction (srcId < dstId), which is required by the connected components algorithm. The minEdgePartitions argument specifies the minimum number of edge partitions to generate. There may be more edge partitions than specified if, for example, the HDFS file has more blocks.
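As a rough usage sketch, the sample edge list above can be written to a temporary file and loaded with canonicalOrientation enabled. The object name EdgeListFileSketch, the helper loadSample, and the temporary file are assumptions for illustration; the partition-count argument is left at its default.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, GraphLoader}

// Hypothetical example object; the names and temp file are illustrative only.
object EdgeListFileSketch {
  // Write the sample edge list to a temporary file and load it as a graph.
  def loadSample(sc: SparkContext): Graph[Int, Int] = {
    val path = Files.createTempFile("edges", ".txt")
    Files.write(path, "# This is a comment\n2 1\n4 1\n1 2\n".getBytes("UTF-8"))
    // With canonicalOrientation = true, the line "2 1" is loaded as the edge 1 -> 2.
    GraphLoader.edgeListFile(sc, path.toUri.toString, canonicalOrientation = true)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("EdgeListFileSketch")
    val sc = new SparkContext(conf)
    try {
      val graph = loadSample(sc)
      graph.edges.collect().foreach(println)
      graph.vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the loader does not remove duplicates: after reorientation the lines "2 1" and "1 2" both become the edge 1 -> 2, and both are kept.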

object Graph {
  def apply[VD, ED](
      vertices: RDD[(VertexId, VD)],
      edges: RDD[Edge[ED]],
      defaultVertexAttr: VD = null)
    : Graph[VD, ED]
  def fromEdges[VD, ED](
      edges: RDD[Edge[ED]],
      defaultValue: VD): Graph[VD, ED]
  def fromEdgeTuples[VD](
      rawEdges: RDD[(VertexId, VertexId)],
      defaultValue: VD,
      uniqueEdges: Option[PartitionStrategy] = None): Graph[VD, Int]
}

Graph.apply allows you to create a graph from RDDs of vertices and edges. Duplicate vertices are picked arbitrarily, and vertices that appear in the edge RDD but not in the vertex RDD are assigned the default attribute.
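A small sketch of the default-attribute behavior, under assumed sample data: vertex 3 appears only in the edge RDD, so it should receive the default attribute. The object name GraphApplySketch, the helper build, and the vertex and edge values are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; the names and data are illustrative only.
object GraphApplySketch {
  def build(sc: SparkContext): Graph[String, String] = {
    // Only vertices 1 and 2 are listed explicitly.
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    // The second edge mentions vertex 3, which is missing from the vertex RDD.
    val edges: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows")))
    Graph(vertices, edges, defaultVertexAttr = "unknown")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("GraphApplySketch")
    val sc = new SparkContext(conf)
    try {
      build(sc).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```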

Graph.fromEdges allows you to create a graph from only an RDD of edges. It automatically creates any vertices mentioned by the edges and assigns them the default value.
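For illustration, a minimal sketch with assumed data: no vertex RDD is supplied, so every vertex mentioned by an edge should carry the default value. The object name FromEdgesSketch, the helper build, and the edge weights are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical example object; the names and data are illustrative only.
object FromEdgesSketch {
  def build(sc: SparkContext): Graph[String, Double] = {
    // Edge attributes are kept as given; vertices are derived from the edges.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.5)))
    Graph.fromEdges(edges, defaultValue = "none")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FromEdgesSketch")
    val sc = new SparkContext(conf)
    try {
      build(sc).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```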

Graph.fromEdgeTuples allows you to create a graph from only an RDD of edge tuples. It assigns each edge the value 1, automatically creates any vertices mentioned by the edges, and assigns them the default value. It also supports deduplicating the edges. To deduplicate, pass Some of a PartitionStrategy as the uniqueEdges parameter (for example, uniqueEdges = Some(PartitionStrategy.RandomVertexCut)). A partition strategy is necessary to colocate identical edges on the same partition so that they can be deduplicated.
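The deduplication option might be exercised as in the sketch below, which builds the same tuples with and without uniqueEdges and compares edge counts. The object name FromEdgeTuplesSketch, the helper build, and the sample tuples are assumptions. The sketch deliberately inspects only edge counts: depending on the Spark version, the attribute on a merged edge may reflect how many duplicates were combined rather than staying at 1.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; the names and data are illustrative only.
object FromEdgeTuplesSketch {
  // Build a graph from raw (src, dst) tuples, optionally deduplicating edges.
  def build(sc: SparkContext, dedupe: Boolean): Graph[Int, Int] = {
    // The tuple (1, 2) appears twice.
    val rawEdges: RDD[(VertexId, VertexId)] =
      sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))
    val strategy: Option[PartitionStrategy] =
      if (dedupe) Some(PartitionStrategy.RandomVertexCut) else None
    Graph.fromEdgeTuples(rawEdges, defaultValue = 0, uniqueEdges = strategy)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FromEdgeTuplesSketch")
    val sc = new SparkContext(conf)
    try {
      println(s"edges without dedupe: ${build(sc, dedupe = false).edges.count()}")
      println(s"edges with dedupe:    ${build(sc, dedupe = true).edges.count()}")
    } finally {
      sc.stop()
    }
  }
}
```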