Compiling from source may feel a bit masochistic, but it helps you understand the internals and lays the groundwork for going deeper and for troubleshooting configuration problems later; otherwise, when something goes wrong you may have nowhere to start. This article walks through the Spark build process [from: http://www.iteblog.com/archives/1038]. Open-source software moves fast, though: Spark is already at 1.5 and Hadoop at 2.6, so you will need to explore and adapt these steps to your actual situation.
Spark has now reached 1.0.0. The post 《Spark 1.0.0於5月30日正式發佈》 on this blog already covered some of its new features, and it brings many welcome improvements. This article explains how to compile the Spark 1.0.0 source code with Maven. The main steps are as follows:
1. Download the source code from the Spark website
# wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz
# tar -zxf spark-1.0.0.tgz
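All the Maven commands below are run from inside the extracted source tree (the tarball above should unpack into a spark-1.0.0 directory):

# cd spark-1.0.0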
2. Set the MAVEN_OPTS parameter
Maven needs a lot of memory to compile Spark; otherwise you will see an error like the following:
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:545)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
The fix is:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
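This export only lasts for the current shell session. To make it permanent you can append it to your shell profile (a minimal sketch, assuming bash and a ~/.bashrc):

# append the setting to ~/.bashrc and reload it
# echo 'export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"' >> ~/.bashrc
# source ~/.bashrc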
3. Cannot run program "javac": java.io.IOException
If the following error appears during the build, set your Java path (a sketch of the fix follows the error below).
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:
compile (scala-compile-first) on project spark-core_2.10: wrap:
java.io.IOException: Cannot run program "javac": java.io.IOException:
error=2, No such file or directory -> [Help 1]
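The build cannot find the javac binary, so the fix is to point JAVA_HOME at a JDK (not a JRE) and put its bin directory on the PATH. A minimal sketch; the /usr/java/jdk1.7.0_67 path is only an example, substitute your own installation:

# export JAVA_HOME=/usr/java/jdk1.7.0_67   # example path, adjust to your JDK
# export PATH=$JAVA_HOME/bin:$PATH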
4. Please set the SCALA_HOME
This error plainly means SCALA_HOME is not set: download a Scala distribution and set the variable (a sketch follows the error below).
[ERROR] Failed to execute goal org.apache.maven.plugins:
maven-antrun-plugin:1.7:run (default) on project spark-core_2.10:
An Ant BuildException has occured: Please set the SCALA_HOME
(or SCALA_LIBRARY_PATH if scala is on the path) environment
variables and retry.
[ERROR] around Ant part ...<fail message="Please set the SCALA_HOME
(or SCALA_LIBRARY_PATH if scala is on the path) environment variables
and retry.">... @ 6:126 in spark-1.0.0/core/target/antrun/build-main.xml
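A minimal sketch of the fix, assuming Scala 2.10.4 (the series Spark 1.0.0 builds against) installed under /usr/local; adjust the download URL and paths to your own environment:

# wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz   # example URL, adjust as needed
# tar -zxf scala-2.10.4.tgz -C /usr/local
# export SCALA_HOME=/usr/local/scala-2.10.4
# export PATH=$SCALA_HOME/bin:$PATH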
5. Choose the matching Hadoop and YARN versions
Different HDFS versions are not protocol-compatible, so if you want your Spark to read data from HDFS you must build Spark against the matching HDFS version. You select it at build time with the hadoop.version property; by default, Spark builds against Hadoop 1.0.4.
Hadoop version      Profile required
0.23.x              hadoop-0.23
1.x to 2.1.x        (none)
2.2.x               hadoop-2.2
2.3.x               hadoop-2.3
2.4.x               hadoop-2.4
(1) Apache Hadoop 1.x and Cloudera CDH "mr1" distributions do not include YARN, so for them we can build Spark with the following commands:
# Apache Hadoop 1.2.1
mvn -Dhadoop.version=1.2.1 -DskipTests clean package

# Cloudera CDH 4.2.0 with MapReduce v1
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
mvn -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
(2) Apache Hadoop 2.x, 0.23.x, Cloudera CDH, and some other Hadoop distributions ship with YARN, so for them you enable the "yarn-alpha" or "yarn" profile and set the YARN version with the yarn.version property. The options are:
YARN version        Profile required
0.23.x to 2.1.x     yarn-alpha
2.2.x and later     yarn
We can build Spark with the following commands:
# Apache Hadoop 2.0.5-alpha
mvn -Pyarn-alpha -Dhadoop.version=2.0.5-alpha -DskipTests clean package

# Cloudera CDH 4.2.0
mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests clean package

# Apache Hadoop 0.23.x
mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package

# Apache Hadoop 2.2.x
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package

# Apache Hadoop 2.3.x
mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

# Apache Hadoop 2.4.x
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package

# Different versions of HDFS and YARN.
mvn -Pyarn-alpha -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7
-DskipTests clean package
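After a successful build, the fat assembly jar should land under assembly/target/scala-2.10/; a quick sanity check (the exact file name varies with the Spark and Hadoop versions you chose):

# list the built assembly jar, e.g. spark-assembly-1.0.0-hadoop2.2.0.jar
# ls assembly/target/scala-2.10/spark-assembly-*.jar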
Of course, (1) we can also build Spark with sbt; the post 《Spark 0.9.1源碼編譯》 on this blog describes this in detail for reference.
(2) Building Spark yourself teaches you a great deal, but you are free to simply download a pre-built Spark instead; that is entirely up to you.
(3) This article originally appeared as 《用Maven編譯Spark 1.0.0源碼以錯誤解決》: http://www.iteblog.com/archives/1038
(4) The top level of the downloaded Spark source tree contains a make-distribution.sh script for packaging a Spark distribution. make-distribution.sh simply invokes Maven to do the build, and can be run like this:
./make-distribution.sh --tgz -Phadoop-2.2 -Pyarn -DskipTests -Dhadoop.version=2.2.0
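If the script finishes successfully, it should assemble a runnable distribution under dist/ and, because of the --tgz flag, also leave a tarball in the source root; a rough way to check (the exact tarball name depends on the versions chosen):

# ls dist/
# ls spark-*.tgz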
If you see output like the following, then congratulations, the build succeeded!
[WARNING] See http://docs.codehaus.org/display/MAVENUSER/Shade+Plugin
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM .......................... SUCCESS [2.172s]
[INFO] Spark Project Core ................................ SUCCESS [3:14.405s]
[INFO] Spark Project Bagel ............................... SUCCESS [22.606s]
[INFO] Spark Project GraphX .............................. SUCCESS [56.679s]
[INFO] Spark Project Streaming ........................... SUCCESS [1:14.616s]
[INFO] Spark Project ML Library .......................... SUCCESS [1:31.366s]
[INFO] Spark Project Tools ............................... SUCCESS [15.484s]
[INFO] Spark Project Catalyst ............................ SUCCESS [1:13.788s]
[INFO] Spark Project SQL ................................. SUCCESS [1:22.578s]
[INFO] Spark Project Hive ................................ SUCCESS [1:10.762s]
[INFO] Spark Project REPL ................................ SUCCESS [36.957s]
[INFO] Spark Project YARN Parent POM ..................... SUCCESS [2.290s]
[INFO] Spark Project YARN Stable API ..................... SUCCESS [38.067s]
[INFO] Spark Project Assembly ............................ SUCCESS [23.663s]
[INFO] Spark Project External Twitter .................... SUCCESS [19.490s]
[INFO] Spark Project External Kafka ...................... SUCCESS [24.782s]
[INFO] Spark Project External Flume Sink ................. SUCCESS [24.539s]
[INFO] Spark Project External Flume ...................... SUCCESS [27.308s]
[INFO] Spark Project External ZeroMQ ..................... SUCCESS [21.148s]
[INFO] Spark Project External MQTT ....................... SUCCESS [2:00.741s]
[INFO] Spark Project Examples ............................ SUCCESS [54.435s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:58.481s
[INFO] Finished at: Tue Sep 16 19:20:10 CST 2014
[INFO] Final Memory: 76M/1509M
[INFO] ------------------------------------------------------------------------