Spark SQL Programming: DataSet

                                     Author: Yin Zhengjie (尹正傑)

Copyright notice: this is an original work. Reproduction is prohibited; violators will be held legally liable.

 

 

 

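1. Creating a DataSet

  Tip:
    A Dataset is a strongly typed collection of data, so you must supply the corresponding type information. A concrete example follows.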


scala> case class Person(name: String, age: Long)            // Define a case class
defined class Person

scala> val caseClassDS = Seq(Person("YinZhengjie", 18)).toDS()    // Create a DataSet
caseClassDS: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]

scala> caseClassDS.show                            // DataSet methods are used much like DataFrame methods.
+-----------+---+
|       name|age|
+-----------+---+
|YinZhengjie| 18|
+-----------+---+


scala> caseClassDS.createTempView("person")

scala> spark.sql("select * from person").show
+-----------+---+
|       name|age|
+-----------+---+
|YinZhengjie| 18|
+-----------+---+


scala> 
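
The strong typing mentioned in the tip can be made more concrete with a small standalone sketch. It assumes a local SparkSession; the object name `TypedDatasetSketch` and the helper `adults` are hypothetical names introduced only for illustration. Note that spark-shell imports `spark.implicits._` automatically, whereas a standalone program has to import it before calling `toDS()`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same shape as the Person case class used in the shell session.
case class Person(name: String, age: Long)

object TypedDatasetSketch {
  // Hypothetical helper: keep people whose age is at least minAge.
  // The lambda receives a Person, so a misspelled field name fails at compile time.
  def adults(people: Dataset[Person], minAge: Long): Dataset[Person] =
    people.filter(p => p.age >= minAge)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-dataset-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()
    import spark.implicits._ // provides toDS() and the encoders it needs

    val people = Seq(Person("YinZhengjie", 18), Person("Jason Yin", 20)).toDS()
    // map changes the element type: Dataset[Person] becomes Dataset[String].
    val names: Dataset[String] = adults(people, 20L).map(_.name)
    names.show()

    spark.stop()
  }
}
```

With a DataFrame, the same filter would usually be written against column names as strings, and a typo would only surface at run time.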

 

2. Converting an RDD to a DataSet

scala> case class Person(name: String, age: Long)            // Define a case class
defined class Person

scala> val listRDD = sc.makeRDD(List(("YinZhengjie",18),("Jason Yin",20),("Danny",28)))      // Create an RDD
listRDD: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[84] at makeRDD at <console>:27

scala> val mapRDD = listRDD.map( t => { Person( t._1,t._2) })    // Use the map operator to convert each element of listRDD into a Person
mapRDD: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[102] at map at <console>:30

scala> val ds = mapRDD.toDS                        // Convert the RDD to a DataSet
ds: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]

scala> ds.show
+-----------+---+
|       name|age|
+-----------+---+
|YinZhengjie| 18|
|  Jason Yin| 20|
|      Danny| 28|
+-----------+---+


scala> 
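
Outside the shell, the same conversion can be sketched without relying on implicits at all. The version below is a minimal sketch assuming a local SparkSession; `RddToDatasetSketch` and `toPeople` are hypothetical names. It uses `SparkSession.createDataset` with an explicitly supplied `Encoders.product[Person]`, which is the kind of encoder that `toDS` would otherwise obtain through `spark.implicits._`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

case class Person(name: String, age: Long)

object RddToDatasetSketch {
  // Hypothetical helper: turn (name, age) tuples into a Dataset[Person].
  def toPeople(spark: SparkSession, rows: RDD[(String, Int)]): Dataset[Person] = {
    // First give each element the structure of the case class.
    val people: RDD[Person] = rows.map { case (name, age) => Person(name, age.toLong) }
    // Pass the encoder explicitly instead of resolving it implicitly.
    spark.createDataset(people)(Encoders.product[Person])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-dataset-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()

    val listRDD = spark.sparkContext.makeRDD(
      List(("YinZhengjie", 18), ("Jason Yin", 20), ("Danny", 28)))
    val ds = toPeople(spark, listRDD)
    ds.printSchema() // column names and types come from the Person fields
    ds.show()

    spark.stop()
  }
}
```

Either route produces the same `Dataset[Person]`; the explicit encoder only makes visible what the implicit import does behind the scenes.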

 

3. Converting a DataSet to an RDD

scala> ds.show      // View the DataSet contents
+-----------+---+
|       name|age|
+-----------+---+
|YinZhengjie| 18|
|  Jason Yin| 20|
|      Danny| 28|
+-----------+---+


scala> ds
res6: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]

scala> ds.rdd        // Convert the DataSet to an RDD
res7: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[26] at rdd at <console>:29

scala> res7.collect     // View the RDD contents
res8: Array[Person] = Array(Person(YinZhengjie,18), Person(Jason Yin,20), Person(Danny,28))

scala> 
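
As the `res7` type shows, calling `rdd` on a `Dataset[Person]` yields an `RDD[Person]`, so ordinary RDD operators can work on the case class fields directly. The sketch below illustrates this under the assumption of a local SparkSession; `DatasetToRddSketch` and `totalAge` are hypothetical names. For contrast, it also converts the Dataset to a DataFrame first, in which case `rdd` returns an `RDD[Row]` and fields have to be looked up by name.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, Row, SparkSession}

case class Person(name: String, age: Long)

object DatasetToRddSketch {
  // Hypothetical helper: sum the ages using plain RDD operators.
  // Note: reduce throws on an empty RDD, so this assumes at least one element.
  def totalAge(people: Dataset[Person]): Long =
    people.rdd.map(_.age).reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-to-rdd-sketch")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Person("YinZhengjie", 18), Person("Jason Yin", 20), Person("Danny", 28)).toDS()

    val typed: RDD[Person] = ds.rdd     // elements keep their Person type
    val untyped: RDD[Row] = ds.toDF().rdd // elements are generic Row objects

    println(typed.map(_.name).collect().mkString(", "))
    println(untyped.first().getAs[String]("name")) // field lookup by column name
    println(totalAge(ds))

    spark.stop()
  }
}
```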