Spark 2.3.1 Usage Tips

Spark SQL: inferring the schema via reflection when reading a JSON file

import spark.implicits._  // required for the StudentInfo encoder

case class StudentInfo(id: Long, name: String, age: Int)

// as[] (a type parameter), not as(): converts the DataFrame to a typed Dataset
val example = spark.read.json("/data/result.json").as[StudentInfo]
example.show()
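A minimal sketch of what the reflection-based read assumes (the file path and field values here are hypothetical): Spark's JSON reader expects one JSON object per line, and the resulting typed Dataset gives compile-time access to the case-class fields:

```scala
// Hypothetical contents of /data/result.json, one object per line:
// {"id": 1, "name": "Alice", "age": 20}
// {"id": 2, "name": "Bob", "age": 17}

import spark.implicits._

val students = spark.read.json("/data/result.json").as[StudentInfo]

// Typed operations can use the case-class fields directly:
students.filter(_.age >= 18).map(_.name).show()
```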

Defining a schema dynamically

When you need to define a different schema depending on the data:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Split the schema string first, then map each field name to a StructField
val schemaInfo = "name age"
val fields = schemaInfo.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

// Split each input line into columns and wrap it in a Row
val rowRDD = peopleRDD.map(_.split(" ")).map(attributes => Row(attributes(0), attributes(1)))

val peopleDF = spark.createDataFrame(rowRDD, schema)

peopleDF.show()
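The point of building the StructType from a string is that the column list can come from the data itself rather than being hard-coded. A hedged sketch under that assumption (the header line, the column names, and `peopleRDD` are all placeholders, not from the original post):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assume the column list arrives at runtime, e.g. from the file's header line.
val header = "name age city"  // hypothetical header
val schema = StructType(
  header.split(" ").map(StructField(_, StringType, nullable = true)))

// Each data row must split into the same number of columns as the header;
// Row.fromSeq builds a Row of matching arity.
val rowRDD = peopleRDD.map(_.split(" ")).map(cols => Row.fromSeq(cols))

val df = spark.createDataFrame(rowRDD, schema)
df.printSchema()
```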

Spark 2.3.1 on YARN

Resource limit parameters passed to spark-submit not taking effect

Settings such as executor-memory 2g configured at spark-submit time did not take effect. A colleague had run into the same problem; the workaround was to allocate executors dynamically (see the official documentation):

--conf spark.yarn.maxAppAttempts=1 \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=4 \
--conf spark.dynamicAllocation.initialExecutors=4
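Put together, a full spark-submit invocation with dynamic allocation might look like the following sketch. The main class and jar name are placeholders; note also that dynamic allocation requires explicitly enabling it and, on YARN, running the external shuffle service on the NodeManagers:

```shell
# Sketch only: com.example.MyApp and my-app.jar are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  my-app.jar
```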