1. agg(expr: Column, exprs: Column*)
Returns a DataFrame containing the computed aggregate values. Calling agg directly on a DataFrame aggregates over the whole DataFrame and is equivalent to calling it after an empty groupBy():

    df.agg(max("age"), avg("salary"))
    df.groupBy().agg(max("age"), avg("salary"))

2. agg(exprs: Map[String, String])
Returns a DataFrame containing the computed aggregate values. The aggregations are given as a Map from column name to aggregate function name:

    df.agg(Map("age" -> "max", "salary" -> "avg"))
    df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))

3. agg(aggExpr: (String, String), aggExprs: (String, String)*)
Returns a DataFrame containing the computed aggregate values. The aggregations are given as (column name, aggregate function name) pairs:

    df.agg("age" -> "max", "salary" -> "avg")
    df.groupBy().agg("age" -> "max", "salary" -> "avg")

Example 1:

    scala> spark.version
    res2: String = 2.0.2

    scala> case class Test(bf: Int, df: Int, duration: Int, tel_date: Int)
    defined class Test

    scala> val df = Seq(Test(1,1,1,1), Test(1,1,2,2), Test(1,1,3,3), Test(2,2,3,3), Test(2,2,2,2), Test(2,2,1,1)).toDF
    df: org.apache.spark.sql.DataFrame = [bf: int, df: int ... 2 more fields]

    scala> df.show
    +---+---+--------+--------+
    | bf| df|duration|tel_date|
    +---+---+--------+--------+
    |  1|  1|       1|       1|
    |  1|  1|       2|       2|
    |  1|  1|       3|       3|
    |  2|  2|       3|       3|
    |  2|  2|       2|       2|
    |  2|  2|       1|       1|
    +---+---+--------+--------+

    scala> df.groupBy("bf", "df").agg(("duration","sum"),("tel_date","min"),("tel_date","max")).show()
    +---+---+-------------+-------------+-------------+
    | bf| df|sum(duration)|min(tel_date)|max(tel_date)|
    +---+---+-------------+-------------+-------------+
    |  2|  2|            6|            1|            3|
    |  1|  1|            6|            1|            3|
    +---+---+-------------+-------------+-------------+

Note: the result no longer contains the original duration and tel_date columns. It holds only the groupBy keys and the fields produced by agg.

Example 2 (PySpark, assuming df is a DataFrame with an event_time column):

    import pyspark.sql.functions as func

    df.agg(func.max("event_time").alias("max_event_tm"), func.min("event_time").alias("min_event_tm"))
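
To make the result of the first example concrete, here is a minimal plain-Python sketch (no Spark required) of what groupBy("bf", "df").agg(...) computes on the same six rows. The helper group_agg and the AGG_FUNCS lookup are hypothetical names introduced only for illustration. This is not how Spark executes the query, only a model of the result. It shows why the output holds just the grouping keys plus one column per aggregate expression.

```python
from collections import defaultdict

# The same six rows as the Test(...) values in the first example.
rows = [
    {"bf": 1, "df": 1, "duration": 1, "tel_date": 1},
    {"bf": 1, "df": 1, "duration": 2, "tel_date": 2},
    {"bf": 1, "df": 1, "duration": 3, "tel_date": 3},
    {"bf": 2, "df": 2, "duration": 3, "tel_date": 3},
    {"bf": 2, "df": 2, "duration": 2, "tel_date": 2},
    {"bf": 2, "df": 2, "duration": 1, "tel_date": 1},
]

# Hypothetical lookup from Spark-style aggregate names to Python built-ins.
AGG_FUNCS = {"sum": sum, "min": min, "max": max}


def group_agg(rows, keys, agg_exprs):
    """Group rows by `keys`, then apply each (column, function name) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    result = []
    for key_values, members in groups.items():
        # Each output row starts with the grouping keys only.
        out = dict(zip(keys, key_values))
        # One extra column per aggregate expression, named like Spark's default.
        for column, func_name in agg_exprs:
            values = [m[column] for m in members]
            out[f"{func_name}({column})"] = AGG_FUNCS[func_name](values)
        result.append(out)
    return result


# A list of pairs may name the same column twice (min and max of tel_date).
agg_exprs = [("duration", "sum"), ("tel_date", "min"), ("tel_date", "max")]
result = group_agg(rows, ["bf", "df"], agg_exprs)
for row in result:
    print(row)

# A dict has unique keys, so the same pairs collapse to one entry per column.
as_map = dict(agg_exprs)
print(as_map)
```

The last two lines illustrate a practical difference between the pair form and the Map form: a map can hold only one entry per column name, so requesting both min and max of tel_date through a map keeps only the last one. A Scala Map literal with a repeated key behaves the same way, which is why the first example uses pairs.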