1. ERROR Utils: Aborting task java.io.IOException: key out of order: For after package
Here is the code first:
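import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public static void main(String[] args) throws ClassNotFoundException {
    SparkConf sparkConf = new SparkConf().setAppName("SparkTest-MapFile").setMaster("local");
    // Register the Hadoop Writable types with Kryo.
    sparkConf.registerKryoClasses(new Class<?>[] {
        Class.forName("org.apache.hadoop.io.IntWritable"),
        Class.forName("org.apache.hadoop.io.Text")
    });
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    // Read the input file and print every line.
    JavaRDD<String> jpr = ctx.textFile("/README.md");
    jpr.foreach(v -> {
        System.out.println(v);
    });

    // Word count: split on spaces, then sum the occurrences of each word.
    JavaRDD<String> words = jpr.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
    JavaPairRDD<String, Integer> counts = words.mapToPair(w -> new Tuple2<String, Integer>(w, 1))
        .reduceByKey((x, y) -> x + y);

    // Convert to Writable types and save as a MapFile.
    // This is the call that fails with "key out of order".
    counts.mapToPair(v -> new Tuple2<Text, IntWritable>(new Text(v._1()), new IntWritable(v._2())))
        .saveAsNewAPIHadoopFile("/tmp/test6", Text.class, IntWritable.class, MapFileOutputFormat.class);
}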
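In principle there should be nothing wrong with this code, yet it kept failing with the error above and I could not see why. I searched everywhere without finding a fix, or anyone else who had hit the same problem, so I started trying things at random. Switching MapFileOutputFormat to SequenceFileOutputFormat made the save succeed, which only puzzled me more. Later I happened to read that Hadoop's Map storage format (MapFile) is ordered, and that made me wonder whether I had to sort the data myself before saving. I changed the code to add .sortByKey(), ran it again, and it completed successfully.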
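import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public static void main(String[] args) throws ClassNotFoundException {
    SparkConf sparkConf = new SparkConf().setAppName("SparkTest-MapFile").setMaster("local");
    // Register the Hadoop Writable types with Kryo.
    sparkConf.registerKryoClasses(new Class<?>[] {
        Class.forName("org.apache.hadoop.io.IntWritable"),
        Class.forName("org.apache.hadoop.io.Text")
    });
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);

    // Read the input file and print every line.
    JavaRDD<String> jpr = ctx.textFile("/README.md");
    jpr.foreach(v -> {
        System.out.println(v);
    });

    // Word count: split on spaces, then sum the occurrences of each word.
    JavaRDD<String> words = jpr.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
    JavaPairRDD<String, Integer> counts = words.mapToPair(w -> new Tuple2<String, Integer>(w, 1))
        .reduceByKey((x, y) -> x + y);

    // Convert to Writable types, sort by key, then save as a MapFile.
    // The added sortByKey() is the only change from the failing version.
    counts.mapToPair(v -> new Tuple2<Text, IntWritable>(new Text(v._1()), new IntWritable(v._2())))
        .sortByKey()
        .saveAsNewAPIHadoopFile("/tmp/test6", Text.class, IntWritable.class, MapFileOutputFormat.class);
}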
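My guess is that a JavaPairRDD has to be sorted by hand before it can be saved in the map format. I don't know whether that is really the reason; if anyone knows for sure, please let me know.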
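For what it's worth, the guess fits how Hadoop's MapFile.Writer behaves, as far as I can tell: each appended key is compared with the previous one, and a key that sorts earlier is rejected with the "key out of order" IOException. Below is a minimal sketch of that check using only the JDK. `OrderCheckSketch`, `SortedWriter` and `writeAll` are hypothetical names made up for illustration, and plain String comparison stands in for the comparison on Text keys; this is not the Hadoop implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class OrderCheckSketch {

    /** Hypothetical stand-in for a sorted-file writer; NOT the Hadoop class. */
    static class SortedWriter {
        private String lastKey = null;
        private final List<String> written = new ArrayList<>();

        void append(String key, int value) throws IOException {
            // Reject any key that sorts before the previously appended key.
            if (lastKey != null && key.compareTo(lastKey) < 0) {
                throw new IOException("key out of order: " + key + " after " + lastKey);
            }
            lastKey = key;
            written.add(key + "=" + value);
        }
    }

    /** Appends all keys in the given order; returns null on success, else the error message. */
    static String writeAll(List<String> keys) {
        SortedWriter writer = new SortedWriter();
        try {
            for (String k : keys) {
                writer.append(k, 1);
            }
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Unsorted input, like the output of reduceByKey: the second append fails.
        List<String> unsorted = Arrays.asList("package", "For", "Spark");
        System.out.println(writeAll(unsorted)); // prints "key out of order: For after package"

        // The same keys sorted first, which is the role sortByKey() plays: all appends succeed.
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);
        System.out.println(writeAll(sorted)); // prints "null"
    }
}
```

This would also explain why SequenceFileOutputFormat worked without sorting: a plain SequenceFile has no index to keep in order, so nothing checks the order of the keys.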