GraphX構建圖的方式很簡單,分爲3步:app
val myVertices: RDD[(Long, String)] = spark.sparkContext.makeRDD(Array((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"))) val myEdges: RDD[Edge[String]] = spark.sparkContext.makeRDD(Array(Edge(1L, 2L, "is-friends-with"), Edge(2L, 3L, "is-friends-with"), Edge(3L, 4L, "is-friends-with"), Edge(4L, 5L, "Likes-status"), Edge(3L, 5L, "Wrote-status"))) val myGraph: Graph[String, String] = Graph(myVertices, myEdges)
Graph
伴生對象中定義了apply
方法,所以代碼Graph(myVertices, myEdges)其實是調用Graph.apply()
,源碼以下,this
def apply[VD: ClassTag, ED: ClassTag]( vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD = null.asInstanceOf[VD], edgeStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY, vertexStorageLevel: StorageLevel = StorageLevel.MEMORY_ONLY): Graph[VD, ED] = { GraphImpl(vertices, edges, defaultVertexAttr, edgeStorageLevel, vertexStorageLevel) }
在GraphX中,對於一個構建好的圖,調用vertices
,能夠返回圖的頂點集VertexRDD
,spa
myGraph.vertices.collect().foreach(println) (4,d) (1,a) (5,e) (2,b) (3,c)
VertexRDD
源碼以下,scala
abstract class VertexRDD[VD]( sc: SparkContext, deps: Seq[Dependency[_]]) extends RDD[(VertexId, VD)](sc, deps)
VertexRDD[VD]
繼承自RDD[(VertexId, VD)]
,其中VertexId
表示頂點Id,GraphX將VertexId定義爲64位的Long類型(type VertexId = Long
),VD
則表示頂點屬性的類型。3d
在GraphX中,對於一個構建好的圖,調用edges
,能夠返回圖的邊集EdgeRDD
,code
myGraph.edges.collect().foreach(println) Edge(1,2,is-friends-with) Edge(2,3,is-friends-with) Edge(3,4,is-friends-with) Edge(3,5,Wrote-status) Edge(4,5,Likes-status)
EdgeRDD
源碼以下,對象
abstract class EdgeRDD[ED]( sc: SparkContext, deps: Seq[Dependency[_]]) extends RDD[Edge[ED]](sc, deps)
EdgeRDD[ED]
繼承自RDD[Edge[ED]]
,其中ED
表示邊的屬性類型,Edge[ED]
的源碼以下,它用來表示一條包含源頂點、目標頂點和邊屬性的有向邊。blog
case class Edge[@specialized(Char, Int, Boolean, Byte, Long, Float, Double) ED] ( var srcId: VertexId = 0, var dstId: VertexId = 0, var attr: ED = null.asInstanceOf[ED]) extends Serializable
其中,ED
爲邊屬性的類型,srcId
表示源頂點的VertexId,dstId
表示目的頂點的VertexId,attr
爲邊的屬性值。繼承
對於一個構建好的圖,調用triplets
返回RDD[EdgeTriplet[VD, ED]]
,RDD的類型爲EdgeTriplet[VD, ED],ip
myGraph.triplets.collect().foreach(println) ((1,a),(2,b),is-friends-with) ((2,b),(3,c),is-friends-with) ((3,c),(4,d),is-friends-with) ((3,c),(5,e),Wrote-status) ((4,d),(5,e),Likes-status)
EdgeTriplet
源碼以下,
class EdgeTriplet[VD, ED] extends Edge[ED] { /** * The source vertex attribute */ var srcAttr: VD = _ // nullValue[VD] /** * The destination vertex attribute */ var dstAttr: VD = _ // nullValue[VD] /** * Set the edge properties of this triplet. */ protected[spark] def set(other: Edge[ED]): EdgeTriplet[VD, ED] = { srcId = other.srcId dstId = other.dstId attr = other.attr this } ... }
EdgeTriplet[VD, ED]
繼承自Edge[ED]
,其中,VD
爲頂點的屬性類型,ED
爲邊的屬性類型。此外還增長了兩個成員變量srcAttr
和dstAttr
,分別爲源頂點和目標頂點的屬性值。
GraphX中Graph
類及其依賴的UML圖以下,