[Original] Uncle's Experience Sharing (5): How to Add Dependencies When Submitting Spark Jobs via Oozie

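Ways to add dependencies to a Spark job:

1 When running in local mode, you can add dependencies via --jars;

2 When running on YARN, you can add dependencies via spark.yarn.jars;

Neither approach works on Oozie. First, Oozie cannot, and should not, run jobs in local mode. Second, if you configure spark.yarn.jars, you will find that it has no effect at all. Let's look at why.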

Check the LauncherMapper log:

 

Spark Version 2.1.1

Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

 

Oozie Spark action configuration

=================================================================

...

                    --conf

                    spark.yarn.jars=hdfs://hdfs_name/jarpath/*.jar

                    --conf

                    spark.yarn.jars=hdfs://hdfs_name/oozie/share/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar

 

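As the log shows, Oozie appends its own spark.yarn.jars setting after the one supplied by the application. So how does Spark handle two entries with the same key?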

 

org.apache.spark.deploy.SparkSubmit

    val appArgs = new SparkSubmitArguments(args)

 

org.apache.spark.launcher.SparkSubmitOptionParser

        if (!handle(name, value)) {

 

org.apache.spark.deploy.SparkSubmitArguments

  override protected def handle(opt: String, value: String): Boolean = {

  ...

      case CONF =>

        value.split("=", 2).toSeq match {

          case Seq(k, v) => sparkProperties(k) = v

          case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")

        }
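The CONF branch quoted above can be exercised in isolation to see what happens with a repeated key. The following is only a minimal sketch, not Spark's actual code path: `DuplicateConfDemo` and `parseConfs` are hypothetical names introduced for illustration, and the sketch throws an exception where Spark calls `SparkSubmit.printErrorAndExit`. The two values are the ones taken from the LauncherMapper log.

```scala
import scala.collection.mutable

object DuplicateConfDemo {
  // Hypothetical helper: applies the same split-and-assign logic as the
  // CONF branch of SparkSubmitArguments.handle to a list of --conf values.
  def parseConfs(confValues: Seq[String]): mutable.HashMap[String, String] = {
    val sparkProperties = new mutable.HashMap[String, String]()
    confValues.foreach { value =>
      // Split on the first '=' only, so values may themselves contain '='.
      value.split("=", 2).toSeq match {
        case Seq(k, v) => sparkProperties(k) = v // a later entry replaces an earlier one
        case _ => throw new IllegalArgumentException(s"Spark config without '=': $value")
      }
    }
    sparkProperties
  }

  def main(args: Array[String]): Unit = {
    val props = parseConfs(Seq(
      // supplied by the application
      "spark.yarn.jars=hdfs://hdfs_name/jarpath/*.jar",
      // appended afterwards by Oozie
      "spark.yarn.jars=hdfs://hdfs_name/oozie/share/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar"
    ))
    // Only one entry survives for the key, and it is the one that came last.
    println(props("spark.yarn.jars"))
  }
}
```

Because the properties live in a mutable map keyed by the option name, the order of the `--conf` arguments alone decides which value is kept.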

 

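So a later value simply overwrites an earlier one, and the last setting wins. Here that is Oozie's setting, not the one supplied by the application. The application therefore has to package its special dependencies into its own jar. This can be done with Maven's maven-assembly-plugin by configuring <dependencySets><dependencySet><includes><include>. The full descriptor is as follows: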

 

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"

          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">

    <!-- TODO: a jarjar format would be better -->

    <id>jar-with-dependencies</id>

    <formats>

        <format>jar</format>

    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>

    <dependencySets>

        <dependencySet>

            <outputDirectory>/</outputDirectory>

            <useProjectArtifact>true</useProjectArtifact>

            <unpack>true</unpack>

            <scope>runtime</scope>

            <includes>

                <include>redis.clients:jedis</include>

                <include>org.apache.commons:commons-pool2</include>

            </includes>

        </dependencySet>

    </dependencySets>

</assembly>

 

This is simply the content of the default jar-with-dependencies.xml, copied out with the includes configuration added.
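Once the dependencies are bundled, it can be useful to confirm at job startup that the classes are actually visible. The following is a minimal sketch under assumptions: `BundledDependencyCheck` and `missingClasses` are hypothetical names, and the two class names are assumed representatives of the jedis and commons-pool2 artifacts listed in the includes above.

```scala
object BundledDependencyCheck {
  // Hypothetical helper: returns the class names that cannot be loaded
  // from the current classpath.
  def missingClasses(classNames: Seq[String]): Seq[String] =
    classNames.filter { name =>
      try {
        Class.forName(name)
        false // loaded fine, so not missing
      } catch {
        case _: ClassNotFoundException | _: LinkageError => true
      }
    }

  def main(args: Array[String]): Unit = {
    // Assumed representative classes from the two bundled artifacts.
    val missing = missingClasses(Seq(
      "redis.clients.jedis.Jedis",
      "org.apache.commons.pool2.impl.GenericObjectPool"
    ))
    if (missing.nonEmpty) {
      sys.error(s"Classes not found in the application jar: ${missing.mkString(", ")}")
    }
  }
}
```

Calling such a check before any Redis access makes a packaging mistake fail fast with a clear message, instead of surfacing later as a NoClassDefFoundError inside a task.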
