Versions:
Maven 3.5.0, Scala IDE for Eclipse 4.6.1, spark-2.1.1-bin-hadoop2.7, kafka_2.11-0.8.2.1, JDK 1.8
Basic environment:
1. Specify JDK 1.8
Add the following settings to the pom.xml configuration file:
- <properties>
- <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- <encoding>UTF-8</encoding>
- <java.version>1.8</java.version>
- <maven.compiler.source>1.8</maven.compiler.source>
- <maven.compiler.target>1.8</maven.compiler.target>
- </properties>
- <plugin>
- <groupId>org.apache.maven.plugins</groupId>
- <artifactId>maven-compiler-plugin</artifactId>
- <configuration>
- <source>1.8</source>
- <target>1.8</target>
- </configuration>
- </plugin>
After these changes, the pom.xml file looks like this:
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
- <modelVersion>4.0.0</modelVersion>
-
- <groupId>Test</groupId>
- <artifactId>test</artifactId>
- <version>0.0.1-SNAPSHOT</version>
- <packaging>jar</packaging>
-
- <name>test</name>
- <url>http://maven.apache.org</url>
-
- <properties>
- <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- <encoding>UTF-8</encoding>
- <!-- 配置JDK爲1.8 -->
- <java.version>1.8</java.version>
- <maven.compiler.source>1.8</maven.compiler.source>
- <maven.compiler.target>1.8</maven.compiler.target>
- </properties>
-
- <dependencies>
- <dependency>
- <groupId>junit</groupId>
- <artifactId>junit</artifactId>
- <version>3.8.1</version>
- <scope>test</scope>
- </dependency>
- </dependencies>
-
- <build>
- <plugins>
- <!-- 配置JDK爲1.8 -->
- <plugin>
- <groupId>org.apache.maven.plugins</groupId>
- <artifactId>maven-compiler-plugin</artifactId>
- <configuration>
- <source>1.8</source>
- <target>1.8</target>
- </configuration>
- </plugin>
-
- <!-- 配置打包依賴包maven-assembly-plugin -->
- <plugin>
- <artifactId>maven-assembly-plugin</artifactId>
- <configuration>
- <descriptorRefs>
- <descriptorRef>jar-with-dependencies</descriptorRef>
- </descriptorRefs>
- <archive>
- <manifest>
- <mainClass></mainClass>
- </manifest>
- </archive>
- </configuration>
- <executions>
- <execution>
- <id>make-assembly</id>
- <phase>package</phase>
- <goals>
- <goal>single</goal>
- </goals>
- </execution>
- </executions>
- </plugin>
- </plugins>
- </build>
- </project>
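To verify that the compiler settings took effect, you can build once and inspect the class-file version; a minimal sketch, assuming the com.spark.main.JavaDirectKafkaCompare class from section 5 below (major version 52 corresponds to Java 8):
- mvn clean compile
- # the class path below assumes the source file from section 5; adjust for your own sources
- javap -verbose target/classes/com/spark/main/JavaDirectKafkaCompare.class | grep major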
2. Configure the Spark dependency jars
Check the jar versions shipped under spark-2.1.1-bin-hadoop2.7/jars:
Then search the remote Maven repository at http://mvnrepository.com for the matching artifacts.
2.1 Configure spark-core_2.11-2.1.1.jar
Add the following to pom.xml:
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.11</artifactId>
- <version>2.1.1</version>
- <scope>runtime</scope>
- </dependency>
So that the dependencies are bundled into the jar during packaging later, change the <scope>provided</scope> suggested by mvnrepository.com to <scope>runtime</scope>.
2.2 Configure spark-streaming_2.11-2.1.1.jar
Add the following to pom.xml:
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming_2.11</artifactId>
- <version>2.1.1</version>
- <scope>runtime</scope>
- </dependency>
As above, change <scope>provided</scope> to <scope>runtime</scope> so the dependency is bundled during packaging. Note that runtime-scoped artifacts are not on javac's compile classpath; if mvn install fails to compile against the Spark classes, fall back to the default compile scope, which jar-with-dependencies also bundles.
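To double-check which artifacts will end up in the assembled jar, Maven can print the resolved dependency tree together with each artifact's scope; jar-with-dependencies bundles compile- and runtime-scoped artifacts and skips provided and test:
- mvn dependency:tree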
3. Configure Spark + Kafka
3.1 Configure spark-streaming-kafka-0-8_2.11-2.1.1.jar
Add the following to pom.xml:
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
- <version>2.1.1</version>
- </dependency>
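The streaming job in section 5 consumes a topic named test. If it does not exist yet, it can be created with the CLI shipped with kafka_2.11-0.8.2.1; a sketch, assuming Kafka is installed under /usr/local/kafka and ZooKeeper listens on the default port 2181 of the broker host:
- # install path and ZooKeeper address are assumptions; adjust to your environment
- cd /usr/local/kafka/kafka_2.11-0.8.2.1
- bin/kafka-topics.sh --create --zookeeper 192.168.168.200:2181 --replication-factor 1 --partitions 1 --topic test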
4. Complete pom.xml
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
- <modelVersion>4.0.0</modelVersion>
-
- <groupId>Test</groupId>
- <artifactId>test</artifactId>
- <version>0.0.1-SNAPSHOT</version>
- <packaging>jar</packaging>
-
- <name>test</name>
- <url>http://maven.apache.org</url>
-
- <properties>
- <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- <encoding>UTF-8</encoding>
- <!-- 配置JDK爲1.8 -->
- <java.version>1.8</java.version>
- <maven.compiler.source>1.8</maven.compiler.source>
- <maven.compiler.target>1.8</maven.compiler.target>
- </properties>
-
- <dependencies>
- <dependency>
- <groupId>junit</groupId>
- <artifactId>junit</artifactId>
- <version>3.8.1</version>
- <scope>test</scope>
- </dependency>
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-core_2.11</artifactId>
- <version>2.1.1</version>
- <scope>runtime</scope>
- </dependency>
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming_2.11</artifactId>
- <version>2.1.1</version>
- <scope>runtime</scope>
- </dependency>
- <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11 -->
- <dependency>
- <groupId>org.apache.spark</groupId>
- <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
- <version>2.1.1</version>
- </dependency>
- </dependencies>
-
- <build>
- <plugins>
- <!-- 配置JDK爲1.8 -->
- <plugin>
- <groupId>org.apache.maven.plugins</groupId>
- <artifactId>maven-compiler-plugin</artifactId>
- <configuration>
- <source>1.8</source>
- <target>1.8</target>
- </configuration>
- </plugin>
-
- <!-- 配置打包依賴包maven-assembly-plugin -->
- <plugin>
- <artifactId>maven-assembly-plugin</artifactId>
- <configuration>
- <descriptorRefs>
- <descriptorRef>jar-with-dependencies</descriptorRef>
- </descriptorRefs>
- <archive>
- <manifest>
- <mainClass></mainClass>
- </manifest>
- </archive>
- </configuration>
- <executions>
- <execution>
- <id>make-assembly</id>
- <phase>package</phase>
- <goals>
- <goal>single</goal>
- </goals>
- </execution>
- </executions>
- </plugin>
- </plugins>
- </build>
- </project>
5. Upload the locally developed Spark code to the Spark cluster and run it
JavaDirectKafkaCompare.java
- package com.spark.main;
-
- import java.util.HashMap;
- import java.util.HashSet;
- import java.util.Arrays;
- import java.util.Iterator;
- import java.util.Map;
- import java.util.Set;
- import java.util.regex.Pattern;
-
- import scala.Tuple2;
- import kafka.serializer.StringDecoder;
-
- import org.apache.spark.SparkConf;
- import org.apache.spark.api.java.function.*;
- import org.apache.spark.streaming.api.java.*;
- import org.apache.spark.streaming.kafka.KafkaUtils;
- import org.apache.spark.streaming.Durations;
-
- public class JavaDirectKafkaCompare {
-
- public static void main(String[] args) throws Exception {
- /**
- * setMaster("local[2]") must allocate at least two threads: one receives messages, the other processes them.
- * Durations.seconds(2) makes the stream read a batch from Kafka every two seconds.
- */
- SparkConf sparkConf = new SparkConf().setAppName("JavaDirectKafkaWordCount").setMaster("local[2]");
- JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
- /**
- * checkpoint("hdfs://192.168.168.200:9000/checkpoint") writes checkpoints to HDFS to guard against data loss
- */
- jssc.checkpoint("hdfs://192.168.168.200:9000/checkpoint");
- /**
- * Configure the Kafka connection parameters
- */
- Set<String> topicsSet = new HashSet<>(Arrays.asList("test"));
- Map<String, String> kafkaParams = new HashMap<>();
- kafkaParams.put("metadata.broker.list", "192.168.168.200:9092");
-
- // Create direct kafka stream with brokers and topics
- JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
- jssc,
- String.class,
- String.class,
- StringDecoder.class,
- StringDecoder.class,
- kafkaParams,
- topicsSet
- );
-
- // Get the lines, split them into words, count the words and print
- /**
- * Tuple2._2() returns the second element of the pair, i.e. the message value
- */
- JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
- @Override
- public String call(Tuple2<String, String> tuple2) {
- return tuple2._2();
- }
- });
-
- String sfzh = "432922196105276721";
- JavaDStream<String> wordCounts = lines.filter(new Function<String, Boolean>(){
- @Override
- public Boolean call(String s) throws Exception {
- /**
- * Keep only the records that contain the given ID-card number
- */
- if(s.contains(sfzh)){
- System.out.println("Matched record: " + s);
- return true;
- }
- return false;
- }
- });
- wordCounts.print();
- // Start the computation
- jssc.start();
- jssc.awaitTermination();
- }
-
- }
Right-click the project and choose Run As > Maven install. After a successful build, the target directory contains test-0.0.1-SNAPSHOT-jar-with-dependencies.jar; copy that jar to the SPARK_HOME/myApp directory on the Linux cluster.
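One way to copy the jar over, as a sketch assuming SSH access to the cluster node and the SPARK_HOME path used below:
- # user and host are assumptions; adjust to your cluster
- scp target/test-0.0.1-SNAPSHOT-jar-with-dependencies.jar root@192.168.168.200:/usr/local/spark/spark-2.1.1-bin-hadoop2.7/myApp/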
Then run:
- cd /usr/local/spark/spark-2.1.1-bin-hadoop2.7;
- bin/spark-submit --class "com.spark.main.JavaDirectKafkaCompare" --master local[4] myApp/test-0.0.1-SNAPSHOT-jar-with-dependencies.jar;
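--master local[4] runs the job inside a single local JVM, which is fine for a first test. To submit to the standalone cluster instead, point --master at the cluster master; a sketch, assuming the master runs on 192.168.168.200 with the default port 7077:
- # master URL is an assumption; the Spark master web UI (port 8080) shows the real one
- bin/spark-submit --class "com.spark.main.JavaDirectKafkaCompare" --master spark://192.168.168.200:7077 myApp/test-0.0.1-SNAPSHOT-jar-with-dependencies.jar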
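While the job is running, you can type test lines into the console producer that ships with Kafka; any line containing the ID number 432922196105276721 will be matched and printed by the filter. A sketch, run from the Kafka installation directory:
- bin/kafka-console-producer.sh --broker-list 192.168.168.200:9092 --topic test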
6. Offline Maven repository (attached)
Download: http://pan.baidu.com/s/1eS7Ywme (password: y3qz)