Spark local environment setup and writing the first demo program
Go to http://spark.apache.org/downloads.html and download Spark. I downloaded spark-1.6.1-bin-hadoop2.6 (Spark version 1.6.1), and also downloaded hadoop-2.6.0.tar.gz.
Spark runs on top of Hadoop and calls into the Hadoop libraries at runtime; if the Hadoop runtime environment is not configured, you will get errors.
At this point, type spark-shell at the cmd prompt. If it starts up normally, the setup succeeded.
POM:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.credo</groupId>
    <artifactId>spark-test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.1</version>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.5.1</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                        <compilerArgument>-proc:none</compilerArgument>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
Main method:
package org.credo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.UUID;

/**
 * Created by ZhaoQian on 2016/6/12.
 */
public class spark {

    public static void main(String[] args) {
        System.out.println("================spark begin==============================");
        System.setProperty("hadoop.home.dir", "D:\\software\\bigdata\\hadoop-2.6.0");

        // Create a Java version of the Spark context.
        SparkConf sparkConf = new SparkConf().setAppName("wordCount");
        JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);

        // Read a file.
        JavaRDD<String> input = javaSparkContext.textFile("D:\\logger\\server.log2");

        /* The plain anonymous-class version: */
//        JavaRDD<String> words = input.flatMap(
//                new FlatMapFunction<String, String>() {
//                    @Override
//                    public Iterable<String> call(String s) throws Exception {
//                        return Arrays.asList(s.split(" "));
//                    }
//                }
//        );
//        // Convert to key-value pairs and count.
//        JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
//            @Override
//            public Tuple2<String, Integer> call(String s) throws Exception {
//                return new Tuple2<String, Integer>(s, 1);
//            }
//        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
//            @Override
//            public Integer call(Integer v1, Integer v2) throws Exception {
//                return v1 + v2;
//            }
//        });

        // Split into words. The commented-out code above uses anonymous classes; below are lambda expressions.
        JavaRDD<String> words = input
                .flatMap((FlatMapFunction<String, String>) s -> Arrays.asList(s.split(" ")));
        JavaPairRDD<String, Integer> counts = words
                .mapToPair((PairFunction<String, String, Integer>) s -> new Tuple2<>(s, 1))
                .reduceByKey((Function2<Integer, Integer, Integer>) (v1, v2) -> v1 + v2);

        // Save the word counts to a file as ("word", count) pairs.
        counts.saveAsTextFile("D://logger//" + UUID.randomUUID().toString());
        System.out.println("================spark end==============================");
    }
}
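If you only want a quick look at the result in the console instead of writing part files, you can pull a few pairs back to the driver with take(). The sketch below is just an illustration: the class name WordCountConsole is mine, the input path is reused from the demo above, and the master is assumed to come from the run configuration (for example -Dspark.master=local, as described in the next section).

package org.credo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

public class WordCountConsole {
    public static void main(String[] args) {
        // Master is expected to be supplied via the run configuration (e.g. -Dspark.master=local).
        SparkConf conf = new SparkConf().setAppName("wordCountConsole");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaPairRDD<String, Integer> counts = sc
                .textFile("D:\\logger\\server.log2")   // assumed sample input, same as the demo above
                .flatMap((FlatMapFunction<String, String>) s -> Arrays.asList(s.split(" ")))
                .mapToPair((PairFunction<String, String, Integer>) s -> new Tuple2<>(s, 1))
                .reduceByKey((Function2<Integer, Integer, Integer>) (v1, v2) -> v1 + v2);

        // take(20) brings at most 20 (word, count) pairs back to the driver for inspection.
        List<Tuple2<String, Integer>> sample = counts.take(20);
        for (Tuple2<String, Integer> t : sample) {
            System.out.println(t._1() + " : " + t._2());
        }
        sc.stop();
    }
}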
Fixing the error "A master URL must be set in your configuration"
When running the Spark test program SparkPi, clicking Run produced the error "A master URL must be set in your configuration".
The message shows that the master the program should run against cannot be found, so it has to be configured. The master URL passed to Spark can take forms such as local, local[N], local[*], spark://HOST:PORT, or a YARN/Mesos master.
In the IDE run configuration, enter "-Dspark.master=local" under VM options; this tells the program to run locally in a single thread. Run it again and it works.
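Alternatively, the master can be set in code on the SparkConf so no VM option is needed. A minimal sketch (the class name LocalMasterExample is just for illustration):

package org.credo;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalMasterExample {
    public static void main(String[] args) {
        // Equivalent to passing -Dspark.master=local in VM options, but hard-coded on the SparkConf.
        SparkConf conf = new SparkConf()
                .setAppName("wordCount")
                .setMaster("local[*]"); // "local" = one thread, "local[*]" = one thread per core
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("master = " + conf.get("spark.master"));
        sc.stop();
    }
}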
Failed to locate the winutils binary in the hadoop binary path (java.io.IOException) [caused by a missing file or permission problem, or by the Hadoop environment not being configured correctly]: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html
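A simple way to catch this early is to verify winutils.exe before creating the SparkContext. This is a minimal pre-flight sketch, assuming the hadoop-2.6.0 path used in the demo above; the class name WinutilsCheck is just for illustration.

package org.credo;

import java.io.File;

public class WinutilsCheck {
    public static void main(String[] args) {
        // Assumed local Hadoop unpack location; adjust to where hadoop-2.6.0 was extracted.
        String hadoopHome = "D:\\software\\bigdata\\hadoop-2.6.0";
        System.setProperty("hadoop.home.dir", hadoopHome);

        // Spark on Windows looks for %HADOOP_HOME%\bin\winutils.exe; fail fast if it is missing.
        File winutils = new File(hadoopHome, "bin\\winutils.exe");
        if (!winutils.isFile()) {
            throw new IllegalStateException("winutils.exe not found at " + winutils
                    + " - download it and place it under " + hadoopHome + "\\bin");
        }
        System.out.println("winutils found: " + winutils.getAbsolutePath());
    }
}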