While working today I suddenly noticed a difference between Spark 1.6.1 and Spark 1.3.1. Open source really does change things whenever it says it will!!!
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.1</version>
</dependency>
In 1.6.1, spark-sql added DataFrameReader and DataFrameWriter; neither of them exists in 1.3.1. Isn't that quite a find? Come on, let's take a look at how they are used and dig into them together!
// Query a Hive table; sql() returns a DataFrame (the variable is just named rdd out of habit)
var rdd = sqlContext.sql("select * from mydb.t_user")
rdd: org.apache.spark.sql.DataFrame = [id: int, name: string]

rdd.show()

// Write the DataFrame out as JSON through the new DataFrameWriter
rdd.write.json("/upload/data_json")
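The writer is not limited to JSON. As a minimal sketch (the output paths below are just placeholders), the same DataFrame can also be written as Parquet, or the format can be picked by name through the generic save API:

// Same DataFrame, different output formats
rdd.write.parquet("/upload/data_parquet")
rdd.write.format("json").save("/upload/data_json2")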
Read the data back from data_json
val dataFrame = sqlContext.read.json("/upload/data_json/part*")
dataFrame: org.apache.spark.sql.DataFrame = [id: bigint, name: string]
Register the dataFrame as a temporary table (note that registerTempTable returns Unit, so there is nothing useful to assign to a val; the writes below go through dataFrame itself)
dataFrame.registerTempTable("tb_user")
Write to the local file system
// text() expects a single string column; see the sketch below for a DataFrame with several columns
dataFrame.write.text("/upload/data_text")
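In 1.6 the text data source only accepts a single string column, while this dataFrame has two (id and name), so the call above may complain as written. A minimal workaround sketch, assuming the schema shown earlier, is to concatenate the columns into one string first:

// Collapse id and name into a single string column before writing plain text
dataFrame.selectExpr("concat_ws(',', id, name) as value").write.text("/upload/data_text")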
Write to another table
dataFrame.write.saveAsTable("temp_user")
Append data to the table
dataFrame.write.insertInto("temp_user")
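By default saveAsTable refuses to run if the target table already exists. A save mode can be set on the writer beforehand; here is a minimal sketch against the same temp_user table from above:

import org.apache.spark.sql.SaveMode

// Overwrite the existing table, or append to it, instead of failing
dataFrame.write.mode(SaveMode.Overwrite).saveAsTable("temp_user")
dataFrame.write.mode(SaveMode.Append).saveAsTable("temp_user")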
Write to a relational database
val prop = new java.util.Properties()
prop.put("user", "root")
prop.put("password", "123456")

// Write the data in the dataFrame into the relational database
dataFrame.write.jdbc("jdbc:mysql://localhost:3306/hibernate", "t_user", prop)
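One assumption worth calling out: the MySQL JDBC driver jar has to be on the Spark classpath for this to work, and by default the write fails if t_user already exists. A hedged sketch (com.mysql.jdbc.Driver is the classic MySQL driver class name; adjust it to whichever connector you actually ship):

// Make sure the driver class is loadable, then append instead of failing on an existing table
Class.forName("com.mysql.jdbc.Driver")
dataFrame.write.mode("append").jdbc("jdbc:mysql://localhost:3306/hibernate", "t_user", prop)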
Read data from a relational database
Create a dataFrame
Load the relational database's data into a dataFrame through the sqlContext's read.jdbc method
val user_rdd = sqlContext.read.jdbc("jdbc:mysql://localhost:3306/hibernate","t_user",prop)
user_rdd: org.apache.spark.sql.DataFrame = [id: int, username: string, password: string, email: string, birthday: timestamp]
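Once the JDBC data is in a DataFrame it can be used like any other one, for example registered as a temporary table and queried with SQL. A small sketch (the table name tb_jdbc_user is just an example):

// Register the JDBC-backed DataFrame and query it with plain SQL
user_rdd.registerTempTable("tb_jdbc_user")
sqlContext.sql("select id, username, email from tb_jdbc_user").show()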