PredictionIO + Universal Recommender: A Summary of Issues in Rapidly Building and Deploying a Recommendation Engine (3)

Although PredictionIO + Universal Recommender can help small and medium-sized companies quickly build and deploy a personalized recommendation engine based on collaborative filtering over user behavior (viewed purely at the engine level, the development cost is close to zero), some prerequisites still apply. For example, the organization should already be running reasonably stable Hadoop and Spark clusters, and should have at least a few developers and operations staff who know the Spark platform well; otherwise the technical team will spend a long time stepping into pitfalls and on trial and error.

This article lists some Spark-related exceptions you may encounter while using PredictionIO + Universal Recommender, together with the reasoning behind the fixes and the final solutions, for reference.

 

1. java.lang.StackOverflowError during training

This one is fairly simple. Per the documentation, specifying the memory sizes as parameters when launching training avoids the problem, for example:

pio train -- --driver-memory 8g --executor-memory 8g --verbose
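A StackOverflowError is strictly about thread stack depth (commonly a very deep RDD lineage) rather than heap size, so raising the JVM stack size can also help. A hedged variant, assuming the standard spark-submit pass-through options:

pio train -- --driver-memory 8g --executor-memory 8g --driver-java-options "-Xss8m" --conf "spark.executor.extraJavaOptions=-Xss8m" --verbose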

 

2. NoSuchMethodError on SparkContext.emptyRDD during training

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.emptyRDD(Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/EmptyRDD;
        at com.actionml.URAlgorithm.getRanksRDD(URAlgorithm.scala:534)
        at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:340)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
        at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)

This is caused by a mismatch between the Spark versions of the build environment and the runtime environment.

The runtime here is Spark 2.1.1. Checking the Spark source on GitHub shows that the emptyRDD method still exists:

/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)

However, compared with older versions, the return type has changed: it is no longer EmptyRDD. Code compiled against 1.4.0 therefore builds fine but fails at runtime on 2.1.1; the two versions of the method are binary-incompatible.
If you have already modified build.sbt in the way described in my previous memo, this problem can be avoided.
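For reference, a minimal build.sbt sketch along those lines, assuming Spark 2.1.1 on the cluster: pin the Spark dependencies to the runtime version and mark them "provided", so that the binaries you compile against are the same ones that run the job.

val sparkVersion = "2.1.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)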
 
 
3. Jersey version mismatch between YARN and Spark
[INFO] [ServerConnector] Started ServerConnector@bd93bc3{HTTP/1.1}{0.0.0.0:4040}
[INFO] [Server] Started @6428ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
        at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 21 more
Fix: in the sparkConf section of engine.json, set:

"spark.hadoop.yarn.timeline-service.enabled": "false",
 
For more background on this issue, see: https://markobigdata.com/2016/08/01/apache-spark-2-0-0-installation-and-configuration/
 
 
4. YARN's empty-parameter handling bug
[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7772d266{/jobs,null,UNAVAILABLE}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:775)
        at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:773)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:773)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:867)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
        at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
        at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
        at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
        at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is a YARN bug: an empty parameter is not handled properly.
 
Workaround: force-setting a placeholder value in spark-env.sh sidesteps the problem. Add the following line to spark/conf/spark-env.sh:
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop"
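For intuition, here is a simplified Scala sketch of the failure mode (an illustration, not the actual Spark source): setEnvFromInputString splits each KEY=VALUE entry on '=' and reads index 1 without checking that a value exists, so an empty entry throws exactly this ArrayIndexOutOfBoundsException: 1.

// "EMPTY" has no '=' part, so split("=") yields a single-element array
Seq("HADOOP_CONF_DIR=/etc/hadoop", "EMPTY").foreach { pair =>
  val parts = pair.split("=")
  println(s"${parts(0)} -> ${parts(1)}") // ArrayIndexOutOfBoundsException: 1 on "EMPTY"
}

Force-setting SPARK_YARN_USER_ENV guarantees the string handed to setEnvFromInputString is a well-formed KEY=VALUE pair.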

 

5. YARN symlink bug
[WARN] [TaskSetManager] Lost task 3.0 in stage 173.0 (TID 250, bigdata01, executor 3): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/hadoop/apache-hadoop/hadoop/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
jar:file:/home/hadoop/apache-hadoop/hadoop-2.7.2/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar

        at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
        at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:570)
        at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
        at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
        at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I'm not sure whether this counts as a bug. In any case, if your YARN configuration refers to the Hadoop directory through a symlink, this problem can occur. See https://interset.zendesk.com/hc/en-us/articles/230751687-PhoenixToElasticSearchJob-Fails-with-Multiple-ES-Hadoop-versions-detected-in-the-classpath-bootstrap

The fix is equally simple: on the NodeManagers, change every configuration entry that refers to the Hadoop directory via a symlink to the real path.
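To find the offending entries, first resolve the symlink (a sketch, assuming, as the duplicated paths in the trace suggest, that /home/hadoop/apache-hadoop/hadoop is a symlink to hadoop-2.7.2):

readlink -f /home/hadoop/apache-hadoop/hadoop
# => /home/hadoop/apache-hadoop/hadoop-2.7.2

Then substitute the resolved prefix for the symlinked one in the NodeManager settings (for example, the yarn.nodemanager.local-dirs path in yarn-site.xml) and restart the NodeManagers.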

 

6. Spark job fails with a bind error

[WARN] [Utils] Service 'sparkDriver' could not bind on port 0. Attempting port 1.
[ERROR] [SparkContext] Error initializing SparkContext.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 60 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:437)
        at sun.nio.ch.Net.bind(Net.java:429)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
For this error, many articles online suggest editing spark-env.sh and adding export SPARK_LOCAL_IP="127.0.0.1". But that advice only applies to a single-machine Spark setup. This variable is the address Spark uses to call back to the local machine, so it should be set to the machine's real IP address (check it with ifconfig).
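For example, in spark/conf/spark-env.sh (the address below is an illustrative placeholder; substitute the host's real IP):

export SPARK_LOCAL_IP="192.168.1.10"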