Null value appeared in non-nullable field

java.lang.NullPointerException: Null value appeared in non-nullable field:
top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
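Some fields that are declared non-nullable were assigned null. There are two ways to fix this:
1. Filter out the rows in which these fields are null.
2. Declare these fields with types that can hold null.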
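The two fixes can be modelled in plain Scala, without Spark, to show what each one does to a row with a missing value. This is only a sketch: `RawRow`, `PersonStrict` and `PersonOpt` are hypothetical names, and a boxed `java.lang.Long` stands in for a nullable column. On a real DataFrame, the first fix would correspond to something like a `where($"col".isNotNull)` filter before calling `.as[...]`.

```scala
// Plain-Scala model of the two fixes; no Spark involved.
// RawRow stands in for a row whose age column is nullable (boxed java.lang.Long).
case class RawRow(name: String, age: java.lang.Long, stat: String)

// Target with a primitive Long: it cannot represent a missing age.
case class PersonStrict(name: String, age: Long, stat: String)

// Target for fix 2: a missing age becomes None.
case class PersonOpt(name: String, age: Option[Long], stat: String)

// Hypothetical rows; only "xyz" (null age) mirrors the data used below.
val raw = Seq(
  RawRow("abc", java.lang.Long.valueOf(25L), "t"),
  RawRow("xyz", null, "s")
)

// Fix 1: drop rows where the non-nullable field is null, then unbox safely.
val strict: Seq[PersonStrict] =
  raw.filter(_.age != null).map(r => PersonStrict(r.name, r.age.longValue, r.stat))

// Fix 2: keep every row; Option(null) is None, so nothing is unboxed from null.
val lenient: Seq[PersonOpt] =
  raw.map(r => PersonOpt(r.name, Option(r.age).map(_.longValue), r.stat))
```

Fix 1 loses the incomplete rows; fix 2 keeps them and pushes the null handling into the type.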
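import spark.implicits._ // assumes an existing SparkSession named spark; enables $"col" and .as[...]

val path: String = ??? // path to the CSV file
val peopleDF = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv(path)
peopleDF.printSchema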
Output:
root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
 |-- stat: string (nullable = true)
peopleDF.where($"age".isNull).show
Output:
+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null|   s|
+----+----+----+
Next, convert the Dataset[Row] into a Dataset[Person]:
case class Person(name: String, age: Long, stat: String) // primitive Long: cannot hold null
val peopleDS = peopleDF.as[Person]
peopleDS.printSchema
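`.as[Person]` by itself does not inspect the data: the conversion is lazy, and rows are only turned into `Person` objects when an operation actually needs them. A rough plain-Scala analogy using a lazy view (no Spark; the sample row is hypothetical):

```scala
// Plain-Scala analogy of a lazy conversion; no Spark involved.
case class Person(name: String, age: Long, stat: String)

// Hypothetical raw rows whose age is a nullable boxed Long.
val rows: Seq[(String, java.lang.Long, String)] = Seq(("xyz", null, "s"))

// Building the lazy view succeeds: no row has been converted yet.
val people = rows.view.map { case (n, a, s) => Person(n, a.longValue, s) }

// Forcing the view converts each row and hits the null age.
val failed =
  try { people.toList; false }
  catch { case _: NullPointerException => true }
```

By analogy, operations expressed on columns can still run on such a Dataset, while a lambda that needs the objects themselves triggers the conversion and fails.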
Run the following code:
peopleDS.where($"age" > 30).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
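No error occurs, because SQL treats null as a valid value: null > 30 evaluates to null instead of failing, and where simply drops that row.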
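Spark SQL follows SQL's three-valued logic. The plain-Scala sketch below models it, with `None` playing the role of NULL; `sqlGreaterThan` and `sqlWhere` are hypothetical helpers, not Spark functions, and the ages are made up.

```scala
// Plain-Scala model of SQL's three-valued comparison (no Spark).
// None plays the role of SQL NULL.
def sqlGreaterThan(a: Option[Long], b: Long): Option[Boolean] =
  a.map(_ > b) // NULL > 30 is NULL, not an exception

// WHERE keeps a row only when the predicate is definitely true;
// both false and NULL drop the row.
def sqlWhere[A](rows: Seq[A])(pred: A => Option[Boolean]): Seq[A] =
  rows.filter(r => pred(r).contains(true))

// Hypothetical ages; None is the null age of row "xyz".
val ages: Seq[(String, Option[Long])] = Seq(("xyz", None), ("abc", Some(25L)))
val kept = sqlWhere(ages)(r => sqlGreaterThan(r._2, 30L))
```

Here `kept` is empty: "abc" fails the comparison and "xyz" yields NULL, so neither row survives, and nothing throws.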
Now run the following code:
peopleDS.filter(_.age > 30).show // filter is lazy; show forces evaluation
It fails with the error shown at the top.
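The reason is that the lambda passed to filter works on Person objects, so every row must first be converted into a Person, and a Scala Long cannot be null.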
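The underlying constraint can be seen without Spark: a boxed `java.lang.Long` may be null, but a primitive `Long` may not, so extracting a primitive from a null box fails. Spark reports this through its own null check in the deserializer (hence the message at the top); the sketch below only shows the constraint itself, plus the `Option` way around it.

```scala
// A boxed java.lang.Long can hold null; scala.Long is a JVM primitive and cannot.
val boxed: java.lang.Long = null

// Getting a primitive out of a null box means calling a method on null,
// which throws a NullPointerException.
val unboxingFails =
  try { val primitive: Long = boxed.longValue; false }
  catch { case _: NullPointerException => true }

// Wrapping in Option first turns null into None instead of throwing.
val safe: Option[Long] = Option(boxed).map(_.longValue)
```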
The fix is to declare the field as an Option:
case class Person(name: String, age: Option[Long], stat: String)
val peopleDS = peopleDF.as[Person] // re-create the Dataset with the new Person definition
peopleDS.filter(_.age.map(_ > 30).getOrElse(false)).show
Result:
+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
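The predicate `_.age.map(_ > 30).getOrElse(false)` can be written more idiomatically as `_.age.exists(_ > 30)`, since `exists` returns false for `None`. The sketch below checks that the two forms agree on a plain Scala collection (hypothetical rows); the same lambda can be passed to `Dataset.filter`, which accepts any `T => Boolean`.

```scala
case class Person(name: String, age: Option[Long], stat: String)

// Hypothetical rows; "xyz" has a missing age.
val people = Seq(Person("xyz", None, "s"), Person("abc", Some(42L), "t"))

// Predicate in the form used above.
val viaGetOrElse = people.filter(_.age.map(_ > 30).getOrElse(false))

// Equivalent, shorter form: exists is false for None.
val viaExists = people.filter(_.age.exists(_ > 30))
```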