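If the dataset is too large, converting the DataFrame directly to a list pulls everything into memory at once and can run out of memory. Instead, you can iterate over the records with foreachPartition: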
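import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// On Windows, point Hadoop at the local install directory.
System.setProperty("hadoop.home.dir", "h:\\hadoop2.3.7");
String master = "local";
String name = "wordcount" + System.currentTimeMillis();
SparkSession spark = SparkSession.builder().appName(name).master(master).getOrCreate();
// Read the JSON file into a DataFrame, then turn each row back into a JSON string.
Dataset<Row> dataset = spark.read().json("src/j.json");
Dataset<String> jsons = dataset.toJSON();
JavaRDD<String> rdd = jsons.javaRDD();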
// Process each partition through its iterator instead of collecting all rows into a list.
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
    @Override
    public void call(Iterator<String> iter) throws Exception {
        while (iter.hasNext()) {
            String next = iter.next();
            System.out.println("Got " + next);
        }
    }
});
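Inside call(...), the records often need to be handed to something else (a database, a file, a service) rather than printed. One possible pattern is to drain the partition iterator in small fixed-size batches, so that only one batch is held in memory at a time. The following is a minimal sketch in plain Java, without Spark, so it can be run on its own. The class PartitionBatcher, the method processInBatches, and the batch size are hypothetical names chosen for illustration and are not part of the original post or of the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class PartitionBatcher {

    // Hypothetical helper: drains the iterator in chunks of at most batchSize
    // records and passes each chunk to flush. Returns the number of records seen.
    static int processInBatches(Iterator<String> iter, int batchSize,
                                Consumer<List<String>> flush) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int count = 0;
        while (iter.hasNext()) {
            batch.add(iter.next());
            count++;
            if (batch.size() == batchSize) {
                // Hand over a copy so the buffer can be reused safely.
                flush.accept(new ArrayList<>(batch));
                batch.clear();
            }
        }
        // Flush whatever is left over after the last full batch.
        if (!batch.isEmpty()) {
            flush.accept(new ArrayList<>(batch));
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the Iterator<String> that Spark passes to call(...).
        Iterator<String> partition =
                Arrays.asList("{\"id\":1}", "{\"id\":2}", "{\"id\":3}").iterator();
        int n = processInBatches(partition, 2,
                b -> System.out.println("batch of " + b.size() + ": " + b));
        System.out.println("records: " + n);
    }
}
```

In a Spark job, the body of call(Iterator<String> iter) could delegate to a helper like this one, with flush replaced by whatever write operation the job actually needs.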
Reference: https://blog.csdn.net/wyqwilliam/article/details/81142324