Although PredictionIO + Universal Recommender lets small and medium-sized businesses quickly build and deploy a personalized recommendation engine based on collaborative filtering over user behavior, and the development cost at the engine level is close to zero, some prerequisites still apply. For example, the organization should ideally already run reasonably stable Hadoop and Spark clusters, and should have at least some developers and operations staff who are familiar with the Spark platform; otherwise the technical team will spend a long time on pitfalls and trial and error.
This article lists some Spark-related errors you may encounter while using PredictionIO + Universal Recommender, along with the reasoning behind each diagnosis and the final fix, for reference.
1. java.lang.StackOverflowError when running training
This one is simple. As the documentation explains, the problem can be avoided by specifying larger memory sizes when launching training, for example:
pio train -- --driver-memory 8g --executor-memory 8g --verbose
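Everything after the first -- is passed through to spark-submit, so any other spark-submit flag can be tuned the same way (for example --num-executors; the right values depend on your data volume and cluster).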
2. NoSuchMethodError: SparkContext.emptyRDD when running training
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.SparkContext.emptyRDD(Lscala/reflect/ClassTag;)Lorg/apache/spark/rdd/EmptyRDD;
    at com.actionml.URAlgorithm.getRanksRDD(URAlgorithm.scala:534)
    at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:340)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:285)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:175)
This is caused by a mismatch between the Spark version the engine was compiled against and the one it runs on. The method descriptor in the error expects emptyRDD to return org.apache.spark.rdd.EmptyRDD, which is the Spark 1.x signature; since Spark 2.x the method is declared with a plain RDD return type, so a jar compiled against one version fails to link against the other:
/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: RDD[T] = new EmptyRDD[T](this)
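The fix is to make the two environments agree. A minimal sketch, assuming PredictionIO was installed from a source distribution and that SPARK_HOME in conf/pio-env.sh points at the runtime Spark (the version numbers below are placeholders; use the ones your cluster actually runs):

# 1. Check which Spark the runtime actually launches
$SPARK_HOME/bin/spark-submit --version

# 2. If it differs from what PredictionIO was built against, rebuild the
#    distribution against the runtime versions (flags as documented for
#    PredictionIO 0.11.x source builds)
./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.1 -Delasticsearch.version=5.3.0

# 3. Rebuild the engine so it links against the rebuilt assemblies
pio build --clean --verbose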
3. NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig when training on YARN

[INFO] [ServerConnector] Started ServerConnector@bd93bc3{HTTP/1.1}{0.0.0.0:4040}
[INFO] [Server] Started @6428ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
    at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 21 more
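YARN's TimelineClient depends on the old jersey 1.x client classes, which Spark 2.x no longer ships. A common workaround, assuming your jobs do not rely on the YARN timeline service, is to disable it for the training run so the client is never instantiated:

pio train -- --conf spark.hadoop.yarn.timeline-service.enabled=false

Setting yarn.timeline-service.enabled to false in yarn-site.xml, or adding the jersey 1.x client jars to the driver classpath, should have the same effect.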
4. ArrayIndexOutOfBoundsException in YarnSparkHadoopUtil.setEnvFromInputString when training on YARN

[INFO] [ContextHandler] Stopped o.s.j.s.ServletContextHandler@7772d266{/jobs,null,UNAVAILABLE}
[WARN] [YarnSchedulerBackend$YarnSchedulerEndpoint] Attempted to request executors before the AM has registered!
[WARN] [MetricsSystem] Stopping a MetricsSystem that is not running
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:154)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$setEnvFromInputString$1.apply(YarnSparkHadoopUtil.scala:152)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.setEnvFromInputString(YarnSparkHadoopUtil.scala:152)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:775)
    at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$6.apply(Client.scala:773)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:773)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:867)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.predictionio.workflow.WorkflowContext$.apply(WorkflowContext.scala:45)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:59)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:250)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

setEnvFromInputString splits each comma-separated entry of SPARK_YARN_USER_ENV on '=' and reads the second element, so an entry that is not in KEY=VALUE form triggers the exception above. Setting the variable in the proper form avoids it:
export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop"
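If you need to forward more than one variable, SPARK_YARN_USER_ENV takes a comma-separated list, and every entry must contain an '='. For example (the JAVA_HOME path is a placeholder):

export SPARK_YARN_USER_ENV="HADOOP_CONF_DIR=/home/hadoop/apache-hadoop/etc/hadoop,JAVA_HOME=/usr/java/default"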
5. Multiple ES-Hadoop versions detected in the classpath

[WARN] [TaskSetManager] Lost task 3.0 in stage 173.0 (TID 250, bigdata01, executor 3): java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/hadoop/apache-hadoop/hadoop/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
jar:file:/home/hadoop/apache-hadoop/hadoop-2.7.2/var/yarn/local-dir/usercache/hadoop/appcache/application_1504083960020_0030/container_e235_1504083960020_0030_01_000005/universal-recommender-assembly-0.6.0-deps.jar
    at org.elasticsearch.hadoop.util.Version.<clinit>(Version.java:73)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:570)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:107)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Whether or not this counts as a bug, the bottom line is: if the YARN configuration refers to the Hadoop directory through a symlink, this problem can occur. Note that the two "conflicting" jars above are actually the same file, reached once through the symlink and once through the real path, which is why ES-Hadoop's version check sees two copies. See https://interset.zendesk.com/hc/en-us/articles/230751687-PhoenixToElasticSearchJob-Fails-with-Multiple-ES-Hadoop-versions-detected-in-the-classpath-bootstrap
The fix is equally simple: on every NodeManager, change any configuration entry that points at the Hadoop directory through a symlink to use the real path instead.
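A minimal sketch of the check, assuming the layout from the log above (the symlink name and paths come from this particular cluster and will differ on yours):

# If this prints a different path, the configured directory is a symlink
readlink -f /home/hadoop/apache-hadoop/hadoop
# -> /home/hadoop/apache-hadoop/hadoop-2.7.2

# In yarn-site.xml on each NodeManager, make directory properties such as
# yarn.nodemanager.local-dirs use the resolved path, e.g.
#   /home/hadoop/apache-hadoop/hadoop-2.7.2/var/yarn/local-dir
# then restart the NodeManager.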
6. Spark job fails to start: service 'sparkDriver' could not bind
[WARN] [Utils] Service 'sparkDriver' could not bind on port 0. Attempting port 1.
[ERROR] [SparkContext] Error initializing SparkContext.
Exception in thread "main" java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 60 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:437)
    at sun.nio.ch.Net.bind(Net.java:429)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:745)
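"Cannot assign requested address" means the driver is trying to bind to an address this machine does not own, typically because the hostname resolves to a wrong or stale IP. A sketch of the usual checks, under the assumption that name resolution is the culprit (the IP below is a placeholder for the machine's real address; spark.driver.bindAddress requires Spark 2.1+):

# Verify that the local hostname resolves to an address this host actually owns
hostname
getent hosts "$(hostname)"

# If it does not, bind the driver explicitly to a known-good local IP
export SPARK_LOCAL_IP=192.168.1.10
pio train -- --conf spark.driver.bindAddress=192.168.1.10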