Spark SQL裏concat_ws和collect_set的做用

concat_ws: 用指定的字符鏈接字符串mysql

例如:sql

鏈接字符串:數組

concat_ws("_", field1, field2),輸出結果將會是:「field1_field2」。app

數組元素鏈接:ide

concat_ws("_", [a,b,c]),輸出結果將會是:"a_b_c"。ui

collect_set: 把聚合的數據組合成一個數組,通常搭配group by 使用。url

例若有下表T_course;spa

id name course
1 zhang san Chinese
2 zhang san Math
3 zhang san English
spark.sql("select name, collect_set(course) as course_set from T_course group by name");

結果是:code

name course_set
zhang san [Chinese,Math,English]

貼上套牌車項目代碼:orm

public class TpcCompute2 { public static void main(String[] args) { SparkSession spark = SparkSession.builder().enableHiveSupport().appName("TpcCompute2").master("local").getOrCreate(); JavaSparkContext sc = new JavaSparkContext(spark.sparkContext()); sc.setLogLevel("ERROR"); //hphm,id,tgsj,lonlat&
        spark.udf().register("getTpc", new ComputeUDF(), DataTypes.StringType); spark.sql("use traffic"); spark.sql("select hphm,concat_ws('&',collect_set(concat_ws('_',id,kk_lon_lat,tgsj))) as concatValue from t_cltgxx t where t.tgsj>'2015-01-01 00:00:00' group by hphm").show(false); Dataset<Row> cltgxxDF = spark.sql("select hphm,concatValue from (select hphm,getTpc(concat_ws('&',collect_set(concat_ws('_',id,kk_lon_lat,tgsj)))) as concatValue from t_cltgxx t where t.tgsj>'2015-01-01 00:00:00' group by hphm) where concatValue is not null"); cltgxxDF.show(); //建立集合累加器
        CollectionAccumulator<String> acc = sc.sc().collectionAccumulator(); cltgxxDF.foreach(new ForeachFunction<Row>() { @Override public void call(Row row) throws Exception { acc.add(row.getAs("concatValue")); } }); List<String> values = acc.value(); for (String id : accValues) { System.out.println("accValues: " + id); Dataset<Row> resultDF = spark.sql("select hphm,clpp,clys,tgsj,kkbh from t_cltgxx where id in (" + id.split("_")[0] + "," + id.split("_")[1] + ")"); resultDF.show(); Dataset<Row> resultDF2 = resultDF.withColumn("jsbh", functions.lit(new Date().getTime())) .withColumn("create_time", functions.lit(new Timestamp(new Date().getTime()))); resultDF2.show(); resultDF2.write() .format("jdbc") .option("url","jdbc:mysql://lin01.cniao5.com:3306/traffic?characterEncoding=UTF-8") .option("dbtable","t_tpc_result") .option("user","root") .option("password","123456") .mode(SaveMode.Append) .save(); } }

spark.sql語句輸出樣式:

相關文章
相關標籤/搜索