Recently, in order to fix a bug in Spark 2.1, I have been making a lot of changes to the Spark source code, and each change needs to be compiled and tested. Building the whole Spark project takes around half an hour even on a good day, so I mostly rebuild and repackage only the sub-project I actually changed.
Spark's official documentation already explains how to build a single submodule with mvn: http://spark.apache.org/docs/latest/building-spark.html#building-submodules-individually
Building submodules individually with mvn does save quite a bit of time, but when you are changing the code frequently, each mvn build is still fairly slow.
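For reference, that page selects a single module with Maven's `-pl` flag. A minimal sketch for spark-core (the `_2.11` suffix assumes a Scala 2.11 build, as in Spark 2.1; adjust the artifact id for your version):

```bash
# Build and install only the spark-core module, selected by artifact id with -pl.
# Adding -am ("also make") would additionally rebuild the modules it depends on.
./build/mvn -pl :spark-core_2.11 -DskipTests clean install
```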
I remembered the official docs saying that, for developers, building with sbt is recommended for faster iteration, so I went looking through the developer tools page: http://spark.apache.org/developer-tools.html
There I found a section named Running Build Targets For Individual Projects, which reads:
```
$ # sbt
$ build/sbt package

$ # Maven
$ build/mvn package -DskipTests -pl assembly
```
What a trap. I haven't built Spark with sbt much, but I have used sbt before: `build/sbt package` builds the entire project, it is not a sub-project build at all.
I went through every piece of official documentation related to building and found nothing.
In the end, I dug into Spark's sbt build definition, that is, the project/SparkBuild.scala file, and found a way to build a sub-project with sbt.
Below is how to rebuild and repackage spark-core. We use sbt's interactive (REPL) mode; the rough flow looks like this:
```
➜  spark git:(branch-2.1.0) ✗ ./build/sbt -Pyarn -Phadoop-2.6 -Phive
...
[info] Set current project to spark-parent (in build file:/Users/stan/Projects/spark/)
> project core
[info] Set current project to spark-core (in build file:/Users/stan/Projects/spark/)
> package
[info] Updating {file:/Users/stan/Projects/spark/}tags...
[info] Resolving jline#jline;2.12.1 ...
...
[info] Packaging /Users/stan/Projects/spark/core/target/scala-2.11/spark-core_2.11-2.1.0.jar ...
[info] Done packaging.
[success] Total time: 213 s, completed 2017-2-15 16:58:15
```
Finally, replace the old `spark-core_2.11-2.1.0.jar` under the `jars` directory (or `assembly/target/scala-2.11/jars`) with the freshly built one, and you are done.
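As a concrete sketch, based on the paths in the packaging output above (adjust for your own checkout and Spark/Scala versions):

```bash
# Drop the freshly built spark-core jar into the runtime jars directory.
cp core/target/scala-2.11/spark-core_2.11-2.1.0.jar assembly/target/scala-2.11/jars/
```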
The key to selecting a sub-project is the `project` command. How do you know which sub-projects are defined? For that, refer to the BuildCommons definition in project/SparkBuild.scala:
```scala
object BuildCommons {

  private val buildLocation = file(".").getAbsoluteFile.getParentFile

  val sqlProjects@Seq(catalyst, sql, hive, hiveThriftServer, sqlKafka010) = Seq(
    "catalyst", "sql", "hive", "hive-thriftserver", "sql-kafka-0-10"
  ).map(ProjectRef(buildLocation, _))

  val streamingProjects@Seq(
    streaming, streamingFlumeSink, streamingFlume, streamingKafka, streamingKafka010
  ) = Seq(
    "streaming", "streaming-flume-sink", "streaming-flume",
    "streaming-kafka-0-8", "streaming-kafka-0-10"
  ).map(ProjectRef(buildLocation, _))

  val allProjects@Seq(
    core, graphx, mllib, mllibLocal, repl, networkCommon, networkShuffle, launcher, unsafe, tags, sketch, _*
  ) = Seq(
    "core", "graphx", "mllib", "mllib-local", "repl", "network-common", "network-shuffle", "launcher", "unsafe",
    "tags", "sketch"
  ).map(ProjectRef(buildLocation, _)) ++ sqlProjects ++ streamingProjects

  val optionallyEnabledProjects@Seq(mesos, yarn, java8Tests, sparkGangliaLgpl,
    streamingKinesisAsl, dockerIntegrationTests) =
    Seq("mesos", "yarn", "java8-tests", "ganglia-lgpl", "streaming-kinesis-asl",
      "docker-integration-tests").map(ProjectRef(buildLocation, _))

  val assemblyProjects@Seq(networkYarn, streamingFlumeAssembly, streamingKafkaAssembly,
    streamingKafka010Assembly, streamingKinesisAslAssembly) =
    Seq("network-yarn", "streaming-flume-assembly", "streaming-kafka-0-8-assembly",
      "streaming-kafka-0-10-assembly", "streaming-kinesis-asl-assembly")
      .map(ProjectRef(buildLocation, _))

  val copyJarsProjects@Seq(assembly, examples) = Seq("assembly", "examples")
    .map(ProjectRef(buildLocation, _))

  val tools = ProjectRef(buildLocation, "tools")
  // Root project.
  val spark = ProjectRef(buildLocation, "spark")

  val sparkHome = buildLocation

  val testTempDir = s"$sparkHome/target/tmp"

  val javacJVMVersion = settingKey[String]("source and target JVM version for javac")
  val scalacJVMVersion = settingKey[String]("source and target JVM version for scalac")
}
```
Take this part as an example:
```scala
val sqlProjects@Seq(catalyst, sql, hive, hiveThriftServer, sqlKafka010) = Seq(
  "catalyst", "sql", "hive", "hive-thriftserver", "sql-kafka-0-10"
).map(ProjectRef(buildLocation, _))
```
These are the SQL-related sub-projects: catalyst, sql, hive, hive-thriftserver, and sql-kafka-0-10.
If we need to build catalyst, we just enter sbt and run `project catalyst` to select it; from then on, commands such as compile and package apply only to that project.
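At the sbt prompt that boils down to the following (output omitted):

```
> project catalyst
> package
```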
Thanks to a comment from @鳳凰木 on Zhihu, there is also a non-REPL way to build a sub-project: to build the hive project, for example, we can run `build/sbt hive/package` directly from the Spark source directory.
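The same project/task syntax should work for any of the sub-projects listed in BuildCommons, and several tasks can be chained in a single invocation; a sketch (profiles as in the interactive example above):

```bash
# One-shot packaging of spark-core, without entering the sbt shell.
build/sbt -Pyarn -Phadoop-2.6 -Phive core/package

# Multiple project/task pairs run in order within one sbt invocation.
build/sbt catalyst/package sql/package
```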
Another example: `build/sbt "~catalyst/test-only *FoldablePropagationSuite"` runs the tests of the catalyst project, but only the suites whose names end with FoldablePropagationSuite.
The `~` prefix is extremely useful during development: it enables continuous testing. If a test case fails, you can go and edit the code without leaving sbt; as soon as you save, sbt runs the tests again.
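`~` is not limited to tests; for instance, a continuous compile of spark-core (assuming the same checkout as above) would look like:

```bash
# Recompile spark-core automatically every time a source file is saved.
build/sbt "~core/compile"
```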
If you need to run the whole test suite of a single sub-project, just run `build/sbt sql/test` (which tests the sql project).
Now building Spark is finally pleasant.
There are a few more useful build tips at http://spark.apache.org/developer-tools.html.