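Background
While developing with Spark RDDs today I hit a Buffer Overflow error. The YARN logs showed that the Kryo serialization buffer had overflowed, and the log message recommended increasing the value of spark.kryoserializer.buffer.max. I looked up how to set the Kryo serialization buffer size and am recording the methods here.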
20/01/08 17:12:55 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 4, s015.test.com, executor 1): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 10300408. To avoid this, increase spark.kryoserializer.buffer.max value.
    at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:315)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:367)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
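To get a feel for the numbers in that message, the sketch below pulls the "required" byte count out of the exception text and converts it to MiB. The class and method names (KryoOverflowMessage, requiredBytes, toMiBRoundedUp) are made up for illustration and are not part of Spark or Kryo. As far as I understand it, "required" is what the failing write needed at that moment, so treat the result as a rough hint rather than the exact value to configure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Spark or Kryo API.
final class KryoOverflowMessage {
    // Matches the "Available: <n>, required: <n>" fragment of the exception text.
    private static final Pattern SIZES =
            Pattern.compile("Available: (\\d+), required: (\\d+)");

    /** Returns the "required" byte count, or -1 if the fragment is absent. */
    static long requiredBytes(String message) {
        Matcher m = SIZES.matcher(message);
        return m.find() ? Long.parseLong(m.group(2)) : -1L;
    }

    /** Converts a byte count to whole MiB, rounding up. */
    static long toMiBRoundedUp(long bytes) {
        long mib = 1024L * 1024L;
        return (bytes + mib - 1) / mib;
    }

    public static void main(String[] args) {
        String msg = "Kryo serialization failed: Buffer overflow. "
                + "Available: 0, required: 10300408.";
        long required = requiredBytes(msg);
        // Prints: 10300408 bytes is about 10 MiB
        System.out.println(required + " bytes is about " + toMiBRoundedUp(required) + " MiB");
    }
}
```

Here the failing write needed roughly 10 MiB on top of a buffer that was already full, which is why the log suggests raising spark.kryoserializer.buffer.max.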
Method 1: Set spark.kryoserializer.buffer.max through the --conf option
spark-submit accepts many options when submitting a Spark job. One of them, --conf, can set the size of spark.kryoserializer.buffer.max, as follows.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf spark.kryoserializer.buffer.max=512m \
  ... # other options
  <application-jar> \
  [application-arguments]
The --conf spark.kryoserializer.buffer.max=512m option above sets the maximum size of the Kryo serialization buffer to 512 MiB.
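A value such as 512m is a size string, and the Spark configuration docs state that spark.kryoserializer.buffer.max must be less than 2048m. The following is a minimal sketch of that arithmetic, assuming only the single-letter suffixes k, m and g and treating a bare number as MiB. BufferMaxSize and its methods are hypothetical names, not Spark's own parser, which accepts more suffix forms.

```java
import java.util.Locale;

// Hypothetical helper for sanity-checking a value before passing it to --conf.
final class BufferMaxSize {
    // spark.kryoserializer.buffer.max must stay below 2048 MiB.
    static final long LIMIT_MIB = 2048L;

    /** Parses "512m", "1g" or "65536k" into MiB; a bare number is read as MiB. */
    static long toMiB(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        if (v.isEmpty()) {
            throw new IllegalArgumentException("Empty size value");
        }
        char last = v.charAt(v.length() - 1);
        if (Character.isDigit(last)) {
            return Long.parseLong(v);
        }
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (last) {
            case 'k': return n / 1024L;   // KiB to MiB, rounded down
            case 'm': return n;
            case 'g': return n * 1024L;   // GiB to MiB
            default:
                throw new IllegalArgumentException("Unsupported size suffix in: " + value);
        }
    }

    /** True when the value is positive and below the 2048 MiB limit. */
    static boolean isValidBufferMax(String value) {
        long mib = toMiB(value);
        return mib > 0 && mib < LIMIT_MIB;
    }

    public static void main(String[] args) {
        System.out.println(isValidBufferMax("512m")); // true
        System.out.println(isValidBufferMax("2g"));   // false: 2048 MiB is not below the limit
    }
}
```

So 512m is comfortably within range, while a value of 2g would be rejected.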
Method 2: Set spark.kryoserializer.buffer.max on the SparkConf object in the program
1. Set Kryo as the serializer
// Use Kryo as the serializer (the default is Java serialization)
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
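2. Set the value of spark.kryoserializer.buffer.max
// Two ways to set it; use only one of them.
// Current key: the value carries a unit suffix.
sparkConf.set("spark.kryoserializer.buffer.max", "128m");
// Legacy key (deprecated since Spark 1.4): the value is a plain number of MiB.
sparkConf.set("spark.kryoserializer.buffer.max.mb", "128");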
3. Check that the Kryo parameter was set successfully
// Print the value to check that it was set
System.out.println(sparkConf.get("spark.kryoserializer.buffer.max"));
References
[1] [Big Data Advance] How to set the spark.kryoserializer.buffer.max value. https://www.jianshu.com/p/1326199ec3f5
[2] Spark official docs: Submitting Applications. http://spark.apache.org/docs/latest/submitting-applications.html
Original source: https://www.cnblogs.com/JasonCeng/p/12169233.html