Null value appeared in non-nullable field java.lang.NullPointerException

The error

Null value appeared in non-nullable field
java.lang.NullPointerException: Null value appeared in non-nullable field: top level row object
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).

Dataset schema

root
 |-- window: long (nullable = false)
 |-- linkId: long (nullable = false)
 |-- mapVersion: integer (nullable = false)
 |-- passthrough: long (nullable = false)
 |-- resident: long (nullable = false)
 |-- driverId: string (nullable = true)
 |-- inLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)
 |-- outLink: map (nullable = true)
 |    |-- key: long
 |    |-- value: integer (valueContainsNull = false)

Cause of the error

Some fields that are declared non-nullable in the schema were assigned null values.
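The root of the problem can be seen in plain Scala, without Spark: a boxed java.lang.Long is a reference type and may be null, while a Scala primitive Long is a value type and cannot be. A minimal sketch:

```scala
// A boxed java.lang.Long can hold null; a Scala primitive Long cannot.
val boxedAge: java.lang.Long = null      // compiles: reference type, null allowed
// val primitiveAge: Long = null         // would not compile: Long is a value type
println(boxedAge == null)                // true
```

Spark marks primitive-typed fields as nullable = false for the same reason: there is simply no way to store a null in them.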

Solutions

1. Filter out the rows in which these fields are null.

2. Declare the fields with nullable types (e.g. Option).
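Both fixes can be sketched over a plain collection, with no Spark required (in Spark itself, fix 1 would be something like peopleDF.where($"age".isNotNull)). The raw data below mirrors the CSV row with a null age:

```scala
// Raw data analogous to the CSV above: one age is null.
val rawAges: Seq[java.lang.Long] = Seq(25L, null, 40L)

// Fix 1: drop the null values up front, then it is safe to use primitives.
val filteredAges: Seq[Long] = rawAges.filter(_ != null).map(_.longValue)

// Fix 2: keep every row and model the possibly-missing value as Option[Long].
val optionalAges: Seq[Option[Long]] = rawAges.map(Option(_).map(_.longValue))
```

Fix 1 loses rows; fix 2 keeps them and forces every consumer to handle the missing case explicitly.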

Example

val path: String = ???

val peopleDF = spark.read
  .option("inferSchema","true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv(path)

peopleDF.printSchema

Output:

root
 |-- name: string (nullable = true)
 |-- age: long (nullable = false)
 |-- stat: string (nullable = true)

peopleDF.where($"age".isNull).show

Output:

+----+----+----+
|name| age|stat|
+----+----+----+
| xyz|null|   s|
+----+----+----+

Next, convert the Dataset[Row] to a Dataset[Person], where Person is first defined with a primitive age field:

case class Person(name: String, age: Long, stat: String)

val peopleDS = peopleDF.as[Person]

peopleDS.printSchema

Run the following code:

peopleDS.where($"age" > 30).show

Result:

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+

Spark SQL treats the null as a valid value here: the comparison null > 30 evaluates to null, which the WHERE clause treats as false, so the row is silently dropped instead of raising an error.

Run the following code:

peopleDS.filter(_.age > 30).show

This throws the error shown at the top.

The cause: in Scala, Long is a primitive value type and cannot hold null, so Spark fails when it tries to deserialize the null age into Person.
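The failure mode is visible in plain Scala: assigning a null java.lang.Long to a primitive Long triggers implicit unboxing (a call to longValue()), which throws NullPointerException, exactly the exception Spark wraps in its error message. A minimal sketch:

```scala
// Unboxing a null boxed Long into a primitive Long throws NPE.
val boxedNull: java.lang.Long = null
val npeThrown: Boolean =
  try {
    val unboxed: Long = boxedNull        // implicit unboxing calls longValue()
    unboxed > 30                         // never reached
  } catch {
    case _: NullPointerException => true
  }
println(npeThrown)                       // true
```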

The fix: use Option.

case class Person(name: String, age: Option[Long], stat: String)

val peopleDS = peopleDF.as[Person]
peopleDS.filter(_.age.map(_ > 30).getOrElse(false)).show

Result:

+----+---+----+
|name|age|stat|
+----+---+----+
+----+---+----+
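The .map(...).getOrElse(false) pattern used in the filter is equivalent to the shorter Option.exists idiom, shown here on plain values (no Spark needed):

```scala
// Data analogous to the dataset: one missing age.
val maybeAges: Seq[Option[Long]] = Seq(Some(25L), None, Some(40L))

// The pattern used in the filter above:
val viaGetOrElse = maybeAges.filter(_.map(_ > 30).getOrElse(false))

// Equivalent, shorter idiom: exists is false for None.
val viaExists = maybeAges.filter(_.exists(_ > 30))
```

Both keep only the ages that are present and greater than 30; None rows are dropped without any risk of a NullPointerException.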