GraphX圖的建立

如何構建圖

GraphX構建圖的方式很簡單,分爲3步:app

  1. 構建邊RDD
  2. 構建頂點RDD
  3. 生成Graph對象
val myVertices: RDD[(Long, String)] = spark.sparkContext.makeRDD(Array((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e")))
val myEdges: RDD[Edge[String]] = spark.sparkContext.makeRDD(Array(Edge(1L, 2L, "is-friends-with"), Edge(2L, 3L, "is-friends-with"),
      Edge(3L, 4L, "is-friends-with"), Edge(4L, 5L, "Likes-status"), Edge(3L, 5L, "Wrote-status")))
val myGraph: Graph[String, String] = Graph(myVertices, myEdges)

Graph伴生對象中定義了apply方法,所以代碼Graph(myVertices, myEdges)其實是調用Graph.apply(),源碼以下,this

def apply[VD: ClassTag, ED: ClassTag](
      vertices: RDD[(VertexId, VD)],
      edges: RDD[Edge[ED]],
      defaultVertexAttr: VD = null.asInstanceOf[VD],
      edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY,
      vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED] = {
    GraphImpl(vertices, edges, defaultVertexAttr, edgeStorageLevel, vertexStorageLevel)
  }

vertices

在GraphX中,對於一個構建好的圖,調用vertices,能夠返回圖的頂點集VertexRDDspa

myGraph.vertices.collect().foreach(println)

(4,d)
(1,a)
(5,e)
(2,b)
(3,c)

VertexRDD源碼以下,scala

abstract class VertexRDD[VD](
    sc: SparkContext,
    deps: Seq[Dependency[_]]) extends RDD[(VertexId, VD)](sc, deps)

VertexRDD[VD]繼承自RDD[(VertexId, VD)],其中VertexId表示頂點Id,GraphX將VertexId定義爲64位的Long類型(type VertexId = Long),VD則表示頂點屬性的類型。3d

edges

在GraphX中,對於一個構建好的圖,調用edges,能夠返回圖的邊集EdgeRDDcode

myGraph.edges.collect().foreach(println)

Edge(1,2,is-friends-with)
Edge(2,3,is-friends-with)
Edge(3,4,is-friends-with)
Edge(3,5,Wrote-status)
Edge(4,5,Likes-status)

EdgeRDD源碼以下,對象

abstract class EdgeRDD[ED](
    sc: SparkContext,
    deps: Seq[Dependency[_]]) extends RDD[Edge[ED]](sc, deps)

EdgeRDD[ED]繼承自RDD[Edge[ED]],其中ED表示邊的屬性類型,Edge[ED]的源碼以下,它用來表示一條包含源頂點、目標頂點和邊屬性的有向邊。blog

case class Edge[@specialized(Char, Int, Boolean, Byte, Long, Float, Double) ED] (
    var srcId: VertexId = 0,
    var dstId: VertexId = 0,
    var attr: ED = null.asInstanceOf[ED])
  extends Serializable

其中,ED爲邊屬性的類型,srcId表示源頂點的VertexId,dstId表示目的頂點的VertexId,attr爲邊的屬性值。繼承

triplets

對於一個構建好的圖,調用triplets返回RDD[EdgeTriplet[VD, ED]],RDD的類型爲EdgeTriplet[VD, ED],ip

myGraph.triplets.collect().foreach(println)

((1,a),(2,b),is-friends-with)
((2,b),(3,c),is-friends-with)
((3,c),(4,d),is-friends-with)
((3,c),(5,e),Wrote-status)
((4,d),(5,e),Likes-status)

EdgeTriplet源碼以下,

class EdgeTriplet[VD, ED] extends Edge[ED] {
  /**
   * The source vertex attribute
   */
  var srcAttr: VD = _ // nullValue[VD]
  /**
   * The destination vertex attribute
   */
  var dstAttr: VD = _ // nullValue[VD]
  /**
   * Set the edge properties of this triplet.
   */
  protected[spark] def set(other: Edge[ED]): EdgeTriplet[VD, ED] = {
    srcId = other.srcId
    dstId = other.dstId
    attr = other.attr
    this
  }
  ...
}

EdgeTriplet[VD, ED]繼承自Edge[ED],其中,VD爲頂點的屬性類型,ED爲邊的屬性類型。此外還增長了兩個成員變量srcAttrdstAttr,分別爲源頂點和目標頂點的屬性值。

GraphX中Graph類及其依賴的UML圖以下,
GraphX的UML.png

參考

  1. Spark GraphX in Action,Michael S. Malak.
  2. https://www.bookstack.cn/read...
相關文章
相關標籤/搜索