Compiling Spark from Source and Resolving Problems

Compiling from source may feel a bit masochistic, but it helps you understand the internals and lays the groundwork for digging deeper and for fixing configuration problems later; otherwise you may be at a loss when something goes wrong. This post walks through the Spark build process [adapted from http://www.iteblog.com/archives/1038]. Open-source software evolves quickly, though: Spark is already at 1.5 and Hadoop at 2.6, so adjust these steps to your actual environment.

Spark has now been updated to 1.0.0; this blog's earlier post "Spark 1.0.0 officially released on May 30" already covered some of its new features, and Spark 1.0.0 does bring many welcome improvements. This article explains how to compile the Spark 1.0.0 source with Maven. The main steps are as follows:

1. First, download the source code from the official Spark website.

# wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz
# tar -zxf spark-1.0.0.tgz
2. Set the MAVEN_OPTS parameter

When compiling Spark, Maven needs a lot of memory; otherwise you will see an error like the following:

Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:545)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)

The solution is:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
3. Cannot run program "javac": java.io.IOException

If the following error appears during compilation, set your Java path:

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:
compile (scala-compile-first) on project spark-core_2.10: wrap:
java.io.IOException: Cannot run program "javac": java.io.IOException:
error=2, No such file or directory -> [Help 1]
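This error means the shell running Maven cannot find javac on its PATH. A minimal fix is to export JAVA_HOME and prepend its bin directory to the PATH; the JDK location below is only an example, so point it at your actual installation:

```shell
# Example JDK location -- replace with the actual install path on your machine
export JAVA_HOME=/usr/java/jdk1.7.0_55
# Put the JDK's bin directory (which contains javac) on the PATH
export PATH="$JAVA_HOME/bin:$PATH"
# Confirm the variable is set before re-running mvn
echo "$JAVA_HOME"
```

Re-run the mvn command in the same shell after exporting these variables.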
4. Please set the SCALA_HOME

This error obviously means SCALA_HOME is not set; download a Scala distribution and set the variable:

[ERROR] Failed to execute goal org.apache.maven.plugins:
maven-antrun-plugin:1.7:run (default) on project spark-core_2.10:
An Ant BuildException has occured: Please set the SCALA_HOME
(or SCALA_LIBRARY_PATH if scala is on the path) environment
variables and retry.
[ERROR] around Ant part ...<fail message="Please set the SCALA_HOME
(or SCALA_LIBRARY_PATH if scala is on the path) environment variables
and retry.">... @ 6:126 in spark-1.0.0/core/target/antrun/build-main.xml
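Spark 1.0.0 builds against Scala 2.10, so point SCALA_HOME at a 2.10.x distribution. A sketch, assuming Scala was unpacked under /opt/scala-2.10.4 (the directory is an example, not a required location):

```shell
# Example Scala location -- Spark 1.0.0 expects a Scala 2.10.x distribution
export SCALA_HOME=/opt/scala-2.10.4
# Optionally put the scala tools on the PATH as well
export PATH="$SCALA_HOME/bin:$PATH"
echo "$SCALA_HOME"
```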
5. Choose the matching Hadoop and YARN versions

Because different versions of HDFS are not protocol-compatible, if you want your Spark build to read data from HDFS you must compile Spark against the matching HDFS version. This is selected at build time through the hadoop.version property; by default, Spark builds against Hadoop 1.0.4.

Hadoop version    Profile required
0.23.x            hadoop-0.23
1.x to 2.1.x      (none)
2.2.x             hadoop-2.2
2.3.x             hadoop-2.3
2.4.x             hadoop-2.4

(1) For Apache Hadoop 1.x and Cloudera CDH's MR1 releases, which do not include YARN, Spark can be compiled with the following commands:

# Apache Hadoop 1.2.1
mvn -Dhadoop.version=1.2.1 -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v1
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package

(2) For Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and other Hadoop distributions that ship with YARN, enable the "yarn-alpha" or "yarn" profile and set the YARN version through yarn.version. The options are:

YARN version       Profile required
0.23.x to 2.1.x    yarn-alpha
2.2.x and later    yarn

Spark can then be compiled with the following commands:

# Apache Hadoop 2.0.5-alpha
mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests clean package

# Cloudera CDH 4.2.0
mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package

# Apache Hadoop 2.2.X
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package

# Apache Hadoop 2.3.X
mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

# Apache Hadoop 2.4.X
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# Different versions of HDFS and YARN.
mvn -Pyarn-alpha -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package

  (1) Of course, Spark can also be built with sbt; this blog's earlier post "Compiling the Spark 0.9.1 source" covers that in detail.
  (2) Compiling Spark yourself teaches you a lot, but you can just as well download a pre-built Spark; that choice is entirely yours.
  (3) This article is adapted from "Compiling the Spark 1.0.0 source with Maven, with error fixes": http://www.iteblog.com/archives/1038
  (4) The top level of the downloaded Spark source tree contains a make-distribution.sh script that packages a Spark distribution. The script simply invokes Maven to do the build, and can be run like this:

./make-distribution.sh --tgz -Phadoop-2.2 -Pyarn -DskipTests -Dhadoop.version=2.2.0


If you see output like the following, congratulations: the build succeeded!

[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [2.172s]
[INFO] Spark Project Core ................................ SUCCESS [3:14.405s]
[INFO] Spark Project Bagel ............................... SUCCESS [22.606s]
[INFO] Spark Project GraphX .............................. SUCCESS [56.679s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:14.616s]
[INFO] Spark Project ML Library .......................... SUCCESS [1:31.366s]
[INFO] Spark Project Tools ............................... SUCCESS [15.484s]
[INFO] Spark Project Catalyst ............................ SUCCESS [1:13.788s]
[INFO] Spark Project SQL ................................. SUCCESS [1:22.578s]
[INFO] Spark Project Hive ................................ SUCCESS [1:10.762s]
[INFO] Spark Project REPL ................................ SUCCESS [36.957s]
[INFO] Spark Project YARN Parent POM ..................... SUCCESS [2.290s]
[INFO] Spark Project YARN Stable API ..................... SUCCESS [38.067s]
[INFO] Spark Project Assembly ............................ SUCCESS [23.663s]
[INFO] Spark Project External Twitter .................... SUCCESS [19.490s]
[INFO] Spark Project External Kafka ...................... SUCCESS [24.782s]
[INFO] Spark Project External Flume Sink ................. SUCCESS [24.539s]
[INFO] Spark Project External Flume ...................... SUCCESS [27.308s]
[INFO] Spark Project External ZeroMQ ..................... SUCCESS [21.148s]
[INFO] Spark Project External MQTT ....................... SUCCESS [2:00.741s]
[INFO] Spark Project Examples ............................ SUCCESS [54.435s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:58.481s
[INFO] Finished at: Tue Sep 16 19:20:10 CST 2014
[INFO] Final Memory: 76M/1509M
[INFO] ------------------------------------------------------------------------

