場景:sql
有兩個表,表能夠是文本或Json數據,結構化後分別是Table1(A,B,C)和Table2(C、D、E),兩個表經過C關聯,要求求出D+E之和,並以(A、B、D+E)三列返回apache
解答:json
思路:SparkSQL支持讀取Json建立表,同時建立的表能夠作聯合查詢,相似傳統Sql語句進行關聯查詢和統計分析app
代碼:ui
package study import org.apache.spark.SparkContext import org.apache.spark.sql.SparkSession object TestDataFrame2{ def main(args:Array[String]):Unit={ import org.apache.spark.sql.SparkSession val spark = SparkSession .builder() .master("local[*]") .appName("Spark SQL basic example") .config("spark.some.config.option", "some-value") .getOrCreate() spark.sql("""create table table1 using org.apache.spark.sql.json options(path "F://0002_BigData//Soft//comoceanspark//src//resources//Table1.json")""") spark.sql("""create table table2 using org.apache.spark.sql.json options(path "F://0002_BigData//Soft//comoceanspark//src//resources//Table2.json")""") spark.sql("show tables").show() spark.sql("select A,B,(D+E) as DE from table1 inner join table2 on table1.C = table2.C order by DE desc limit 5").show() } }
Table1.json:spa
{"A":"A1", "B":30, "C":1} {"A":"A2", "B":31, "C":2} {"A":"A3", "B":32, "C":3} {"A":"A4", "B":33, "C":4} {"A":"A5", "B":34, "C":5} {"A":"A6", "B":35, "C":6} {"A":"A7", "B":36, "C":7} {"A":"A8", "B":37, "C":8} {"A":"A9", "B":38, "C":9}
Table2.json:code
{"C":1, "D":1, "E":1} {"C":2, "D":2, "E":2} {"C":3, "D":3, "E":3} {"C":4, "D":4, "E":4} {"C":5, "D":5, "E":5} {"C":6, "D":6, "E":6} {"C":7, "D":7, "E":7} {"C":8, "D":8, "E":8} {"C":9, "D":9, "E":9}
結果:blog
表顯示get
計算結果顯示:it