Setting up a Scala development environment for Spark took me a few days a while back, but it is finally working; the details are at http://www.cnblogs.com/ljy2013/p/4964201.html . The next step is to test that environment with an example, so I used the classic big-data example: WordCount. The steps are described in detail below.
1. First, based on the environment set up earlier, create a Maven project to hold the Scala code. The project uses the standard Maven layout, with the main sources under src/main/scala (see the pom.xml below).
2. Write the code
package com.yiban.datacenter.Spark_demo

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

/**
 * @author ${user.name}
 */
object App {

  def foo(x: Array[String]) = x.foldLeft("")((a, b) => a + b)

  def main(args: Array[String]) {
    // Hadoop configuration: without this, an error is thrown in local mode
    val hadoopconf = new Configuration()
    hadoopconf.setBoolean("fs.hdfs.impl.disable.cache", true)
    val fileSystem = FileSystem.get(hadoopconf)

    // Spark configuration: run on the YARN cluster here
    val conf = new SparkConf().setAppName("wordcount").setMaster("yarn-cluster")
    val sc = new SparkContext(conf)

    // classic WordCount: read input, split into words, count, write to HDFS
    val wordcount = sc.textFile("/user/liujiyu/input", 1)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("/user/liujiyu/sparkwordcountoutput")

    val data = Array(1, 2, 3, 4, 5)
    val data2 = Seq(1, 2, 3)
    val distData = sc.parallelize(data)
    distData.saveAsTextFile("/user/liujiyu/spark-demo")
  }
}
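Before submitting to YARN, it can help to sanity-check the same logic in local mode (the fs.hdfs.impl.disable.cache comment above refers to local runs). Below is a minimal local-mode sketch of the same job, not part of the original project; the object name and the input/output paths are placeholders.

package com.yiban.datacenter.Spark_demo

import org.apache.spark.{SparkConf, SparkContext}

// Hedged sketch: the same WordCount logic with master "local[*]",
// so it can be tested without a YARN cluster. Paths are illustrative.
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("wordcount-local").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.textFile("input.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("wordcount-output")
    sc.stop()
  }
}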
3. The contents of the pom.xml file are as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.yiban.datacenter</groupId>
  <artifactId>Spark-demo</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderfull scala app</description>
  <inceptionYear>2015</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.10.5</scala.version>
    <scala.compat.version>2.10</scala.compat.version>
  </properties>

  <repositories>
    <repository>
      <id>cloudera-repo-releases</id>
      <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.0-cdh5.4.4</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>2.2.4</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issues like NoDefClassError, ... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
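The pom pulls in scalatest and specs2 with test scope, and the surefire plugin picks up classes matching *Test / *Suite under src/test/scala. As an illustration only (not part of the original project), a minimal ScalaTest suite exercising the word-count logic could look like the sketch below; the class name is hypothetical.

package com.yiban.datacenter.Spark_demo

import org.scalatest.FunSuite

// Hedged example: a tiny ScalaTest suite that checks the split-and-count
// logic without starting a SparkContext.
class WordCountSuite extends FunSuite {
  test("splitting and counting words") {
    val words = "a b a".split(" ")
    val counts = words.groupBy(identity).mapValues(_.length)
    assert(counts("a") == 2)
    assert(counts("b") == 1)
  }
}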
4. Run mvn clean package to package the project.
5. Copy the packaged jar to the cluster to run it.
Run it with the following command:
spark-submit --class "com.yiban.datacenter.Spark_demo.App" --master yarn-cluster Spark-demo-0.0.1-SNAPSHOT.jar
When the job finishes, the results appear under the output paths used in the code; just check the corresponding paths on HDFS to see the output.
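If you prefer to check the output programmatically instead of with the HDFS shell, a small sketch using the same Hadoop FileSystem API the job already imports could list the result files. This is an illustration only: the object name is hypothetical, the path is the WordCount output path used above, and it assumes the cluster configuration is on the classpath.

package com.yiban.datacenter.Spark_demo

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hedged sketch: list the WordCount output files on HDFS and print their paths.
object CheckOutput {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val out = new Path("/user/liujiyu/sparkwordcountoutput")
    fs.listStatus(out).foreach(status => println(status.getPath))
  }
}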