Building a jar with IDEA + Maven + Scala and running it on Spark on YARN

Configuring the Maven project

In pom.xml, declare the dependencies needed for Spark development; pick the version that matches your Spark release from the Maven Central repository.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
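
Since the job will run on a YARN cluster that already ships Spark's own classes, an optional refinement (not part of the original setup, but common practice) is to mark the dependency as `provided` so it is excluded from the packaged jar:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
    <!-- provided: supplied by the cluster at runtime, left out of the jar -->
    <scope>provided</scope>
</dependency>
```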

Build methods

Building the package with IDEA Artifacts

(Screenshots of the IDEA Artifacts configuration steps omitted.)

Building the package with Maven

  • To build the package with Maven, you only need to add the following plugin (maven-shade-plugin) to pom.xml:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.handlers</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.schemas</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>cn.mucang.sensor.SensorMain</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
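
With the plugin in place, the shaded jar is produced by the normal Maven lifecycle, since the shade goal is bound to the package phase above. A typical invocation (the jar name below is hypothetical; it follows your pom's artifactId and version):

```shell
# the package phase triggers the shade goal configured above
mvn clean package
# the shaded jar is written to target/, e.g. target/nginxlogs-1.0.jar
```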

Example Scala code

import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object InfoOutput {
  def main(args: Array[String]): Unit = {
    // Don't hard-code setMaster here: settings in code override spark-submit,
    // so a hard-coded "local[*]" would prevent the job from running on YARN.
    // The master is supplied by spark-submit (--master yarn).
    val sparkConf = new SparkConf().setAppName("NginxLog")
    val sc = new SparkContext(sparkConf)
    val fd = sc.textFile("hdfs:///xxx/logs/access.log")
    // Keep only requests to .baidu.com and split each log line into fields
    val logRDD = fd.filter(_.contains(".baidu.com")).map(_.split(" "))
    logRDD.persist(StorageLevel.DISK_ONLY)
    // Count occurrences of the third field (the IP) and keep the 10 most frequent;
    // countByValue() returns an unordered Map, so sort before taking the top 10
    val ipTop = logRDD.map(v => v(2)).countByValue().toSeq.sortBy(-_._2).take(10)
    ipTop.foreach(println)
  }
}
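
The top-10 counting step can be sanity-checked without a cluster using plain Scala collections; below is a minimal sketch with made-up sample data (the IPs and the `TopNCheck` object are hypothetical, not from the original project):

```scala
object TopNCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample of the extracted third field (the IP column)
    val ips = Seq("1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3", "1.1.1.1", "2.2.2.2")
    // Same shape as countByValue: count occurrences, then sort descending by count
    val top = ips.groupBy(identity).map { case (ip, xs) => (ip, xs.size) }
      .toSeq.sortBy(-_._2).take(2)
    top.foreach(println) // (1.1.1.1,3) then (2.2.2.2,2)
  }
}
```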


Uploading the jar

  • Use scp to upload the jar to the server where spark-submit runs; the jar is in the project's out directory
  • Since no third-party dependencies are bundled, the resulting jar is very small; submit the job with spark-submit:
spark-submit --class InfoOutput --verbose --master yarn --deploy-mode cluster nginxlogs.jar
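
In cluster deploy mode the driver runs inside a YARN container, so the println output does not appear on the submitting terminal; it can be retrieved afterwards from the aggregated container logs (the application id below is a placeholder):

```shell
# fetch driver and executor logs once the application finishes
yarn logs -applicationId application_1234567890123_0001
```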