Maven project

Configure the packages required for Spark development in the project's pom.xml, picking the artifact that matches your Spark version from the Maven Central Repository.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
</dependency>
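When the jar is later submitted to a cluster that already ships the Spark runtime (as in the spark-submit step below), the Spark dependency is usually given the provided scope so it is not bundled into the fat jar. A minimal sketch, assuming the cluster supplies Spark 2.3.1 at runtime:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.1</version>
    <!-- assumption: the YARN cluster provides Spark, so keep it out of the shaded jar -->
    <scope>provided</scope>
</dependency>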
Artifacts
Building the package

To build the package with Maven, just add the following plugin (maven-shade-plugin) to pom.xml:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.handlers</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/spring.schemas</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>cn.mucang.sensor.SensorMain</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
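With the plugin in place, a regular Maven build runs the shade goal during the package phase and writes the shaded jar to target/:

mvn clean package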
Scala code

import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object InfoOutput {
  def main(args: Array[String]): Unit = {
    // The master is supplied by spark-submit (--master yarn below); hardcoding
    // setMaster("local[*]") here would silently override it.
    val sparkConf = new SparkConf().setAppName("NginxLog")
    val sc = new SparkContext(sparkConf)
    val fd = sc.textFile("hdfs:///xxx/logs/access.log")
    // Keep only the .baidu.com lines and split each log line on spaces
    val logRDD = fd.filter(_.contains(".baidu.com")).map(_.split(" "))
    logRDD.persist(StorageLevel.DISK_ONLY)
    // countByValue() returns a driver-side Map; sort by count before taking 10,
    // otherwise take(10) would return ten arbitrary entries rather than the top ten
    val ipTop = logRDD.map(v => v(2)).countByValue().toSeq.sortBy(-_._2).take(10)
    ipTop.foreach(println)
  }
}
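countByValue() ships every distinct IP and its count to the driver, which can become a bottleneck on high-cardinality logs. A distributed alternative, sketched under the same assumption that field 2 holds the client IP, aggregates on the executors and brings only the ten largest counts back:

// Count per IP with reduceByKey, then fetch only the 10 largest counts
val ipTop = logRDD.map(v => (v(2), 1L))
  .reduceByKey(_ + _)
  .top(10)(Ordering.by(_._2))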
Jar

Use scp to upload the jar to the server you submit from; with an IDEA artifact build the jar is under the project's out directory (a Maven build as above puts it under target/).

spark-submit --class InfoOutput --verbose --master yarn --deploy-mode cluster nginxlogs.jar
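In cluster deploy mode the driver runs on YARN, so the println output goes to the container logs rather than the local console. Assuming log aggregation is enabled on the cluster, it can be retrieved after the application finishes with:

yarn logs -applicationId <applicationId>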