Flink Study Notes: Setting Up a Flink Development Environment

These are study notes for the course "Flink Big Data Project in Practice" (《Flink大數據項目實戰》). If you want to learn Flink, one of the hottest big-data computing frameworks, systematically through video lessons, the recommended course is:

 

Flink Big Data Project in Practice: http://t.cn/EJtKhaz

 

1. Creating a Flink Project and Managing Dependencies

1.1 Creating a Flink Project

The official documentation describes two ways to create a Flink project:

https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/java_api_quickstart.html

 

Option 1:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.6.2

 

 

Option 2:

$ curl https://flink.apache.org/q/quickstart.sh | bash -s 1.6.2

Here we use the first option to create the Flink project.

 

Open a terminal, change into the target directory, and create the Flink project with Maven:

mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java  -DarchetypeVersion=1.6.2

 

 

During project generation you will be prompted for a groupId, artifactId, version, and package.
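For illustration, the prompts can be answered like this (all values below are placeholders; choose whatever fits your own project):

Define value for property 'groupId': com.example.flink
Define value for property 'artifactId': flink-demo
Define value for property 'version' 1.0-SNAPSHOT: : 1.0-SNAPSHOT
Define value for property 'package' com.example.flink: : com.example.flink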


The Flink project has been created successfully.

Open the IntelliJ IDEA tool and click Open.

Select the Flink project you just created.

The Flink project has now been imported into the IDEA development environment.

Package the project with Maven to test the build:

mvn clean package

Refresh the target directory and you can see the Flink project jar that was just built.
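For reference, the jar built above contains the StreamingJob and BatchJob skeletons generated by the quickstart archetype. A minimal runnable job body might look like the sketch below (the socket host and port are placeholders; start a test source first with nc -lk 9999):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // Obtain the execution environment for a streaming job
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: read lines from a local socket (nc -lk 9999)
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Print every incoming line to stdout
        lines.print();

        // Trigger execution; the job name is arbitrary
        env.execute("Minimal streaming job");
    }
}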

 

 

1.2 Flink Dependencies

Core Dependencies:

1. The core dependencies are packaged in flink-dist*.jar.

2. They contain everything Flink needs to run: coordination, networking, checkpoints, failover, the APIs, operations (such as windowing), resource management, and so on.

Note: the core dependencies are not packaged with your application (<scope>provided</scope>).

3. The core dependencies are kept as small as possible to avoid dependency conflicts.

 

Add the core dependencies to the pom file:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.6.2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.6.2</version>
    <scope>provided</scope>
</dependency>

Note: these are not bundled into the application jar.

 

 

User Application Dependencies:

connectors, formats, or libraries (CEP, SQL, ML).

Note: application dependencies are bundled with the application jar (just keep the default scope).

Add application dependencies to the pom file:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>1.6.2</version>
</dependency>

 

Note: choose application dependencies as needed; they are bundled with the application jar and can be packaged into a fat jar with the Maven Shade plugin (a configuration sketch is shown in section 1.5).

1.3 About Scala Versions

Scala versions are not binary compatible with one another (a Flink application developed against Scala 2.12 cannot use dependencies built for Scala 2.11).

 

Developers who use only Java can work with any Scala version; Scala developers need to pick the Scala version that matches their application's.
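The Scala version shows up as a suffix on Flink artifactIds, so the match is made in the pom. A minimal illustration (which suffixes are actually published depends on the Flink release; 1.6.2 ships _2.11 artifacts):

<!-- all Scala-dependent artifacts in one project must share the same suffix -->
<artifactId>flink-streaming-java_2.11</artifactId>
<artifactId>flink-connector-kafka-0.10_2.11</artifactId>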

1.4 Hadoop Dependencies

Do not add Hadoop dependencies directly to the Flink application. Instead, export the Hadoop classpath:

export HADOOP_CLASSPATH=`hadoop classpath`

The Flink components pick up this environment variable when they start.

 

 

Special case: if the Flink application needs Hadoop input/output formats, it is enough to add the Hadoop compatibility wrappers:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.6.2</version>
</dependency>
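As a sketch of what the wrappers enable (assuming the Flink 1.6 DataSet API; the HDFS path is a placeholder), a Hadoop TextInputFormat can be used as a Flink input format like this:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;

public class HadoopFormatExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Wrap Hadoop's TextInputFormat so Flink can read from it;
        // the input path below is a placeholder.
        DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                HadoopInputs.readHadoopFile(new TextInputFormat(),
                        LongWritable.class, Text.class, "hdfs:///path/to/input"));

        input.print();
    }
}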

1.5 Packaging the Flink Project

A Flink Maven project can be packaged with the maven-shade-plugin; the actual packaging command is mvn clean package.
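A minimal sketch of the shade-plugin configuration in the pom (the main class is a placeholder; the pom generated by the quickstart archetype already contains a more complete version):

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.0.0</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <!-- Placeholder: set this to your job's entry class -->
                                <mainClass>com.example.flink.StreamingJob</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>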

2. Building Flink Yourself

2.1 Installing Maven

1. Download

Download the installation package from the Maven website; here we use apache-maven-3.3.9-bin.tar.gz.

 

2. Extract

Upload the apache-maven-3.3.9-bin.tar.gz package to the master node, then extract it with tar:

tar -zxvf apache-maven-3.3.9-bin.tar.gz

 

3. Create a symbolic link

ln -s apache-maven-3.3.9 maven

 

4. Configure the environment variables

vi ~/.bashrc

export MAVEN_HOME=/home/hadoop/app/maven

export PATH=$MAVEN_HOME/bin:$PATH

 

5. Apply the environment variables

source ~/.bashrc

 

6. Check the Maven version

mvn -version

 

7. Configure the Aliyun mirror in settings.xml

Add the Aliyun mirror:

<mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexus osc</name>
    <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
</mirror>

2.2 Installing the JDK

Building Flink requires JDK 8 or later. JDK 1.8 is already installed here, so the installation and configuration steps are not repeated; the version check looks like this:

[hadoop@cdh01 conf]$ java -version

java version "1.8.0_51"

Java(TM) SE Runtime Environment (build 1.8.0_51-b16)

Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

2.3 Downloading the Source Code

Visit GitHub at https://github.com/apache/flink to get the Flink clone URL: https://github.com/apache/flink.git

 

Open a terminal on the Flink master node, change into the /home/hadoop/opensource directory, and clone the Flink source with git:

git clone https://github.com/apache/flink.git

 

 

Error 1: if git is not installed on the Linux machine, you will get the following error:

bash: git: command not found

 

Solution: install git as follows:

1. Install the packages required to build git (note: install as the root user)

yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel

 

yum install  gcc perl-ExtUtils-MakeMaker

 

2. Remove any existing git installation

yum remove git

 

3. Download the git source

 

First install wget:

yum -y install wget

 

Then download the git source with wget:

wget https://www.kernel.org/pub/software/scm/git/git-2.0.5.tar.gz

 

Extract git:

tar xzf git-2.0.5.tar.gz

 

 

Build and install git:

cd git-2.0.5

make prefix=/usr/local/git all

sudo make prefix=/usr/local/git install

echo 'export PATH=$PATH:/usr/local/git/bin' >> ~/.bashrc

source ~/.bashrc

 

Check the git version:

git --version

 

 

Error 2: git clone https://github.com/apache/flink.git fails with:

Cloning into 'flink'...

fatal: unable to access 'https://github.com/apache/flink.git/': SSL connect error

 

Solution:

Upgrade the nss package: yum update nss

2.4 Checking Out the Desired Flink Version

List the Flink version tags with:

git tag

 

Check out the corresponding version (here we use Flink 1.6.2):

git checkout release-1.6.2

 

2.5 Building Flink

Enter the Flink source root directory /home/hadoop/opensource/flink and build Flink with Maven:

mvn clean install -DskipTests -Dhadoop.version=2.6.0

 

 

The build fails:

[INFO] BUILD FAILURE

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 06:58 min

[INFO] Finished at: 2019-01-18T22:11:54-05:00

[INFO] Final Memory: 106M/454M

[INFO] ------------------------------------------------------------------------

[ERROR] Failed to execute goal on project flink-mapr-fs: Could not resolve dependencies for project org.apache.flink:flink-mapr-fs:jar:1.6.2: Could not find artifact com.mapr.hadoop:maprfs:jar:5.2.1-mapr in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.

[ERROR]

[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

[ERROR]

[ERROR] After correcting the problems, you can resume the build with the command

[ERROR]   mvn <goals> -rf :flink-mapr-fs

The build fails because the maprfs artifact needed by flink-mapr-fs cannot be resolved from the configured mirror; it has to be downloaded and installed manually.

Solution:

1. Download the maprfs jar

Manually download maprfs-5.2.1-mapr.jar from: https://repository.mapr.com/nexus/content/groups/mapr-public/com/mapr/hadoop/maprfs/5.2.1-mapr/

2. Upload it to the master node

Upload the downloaded maprfs-5.2.1-mapr.jar to the /home/hadoop/downloads directory on the master node.

3. Install it manually

Install the missing jar into the local repository:

mvn install:install-file -DgroupId=com.mapr.hadoop -DartifactId=maprfs -Dversion=5.2.1-mapr -Dpackaging=jar  -Dfile=/home/hadoop/downloads/maprfs-5.2.1-mapr.jar

4. Resume the build

Resume the Maven build with -rf from the flink-mapr-fs module (the modules that already built successfully are skipped):

mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3  -rf :flink-mapr-fs

Another error:

[INFO] BUILD FAILURE

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 05:51 min

[INFO] Finished at: 2019-01-18T22:39:20-05:00

[INFO] Final Memory: 108M/480M

[INFO] ------------------------------------------------------------------------

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project flink-mapr-fs: Compilation failure: Compilation failure:

[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[70,44] package org.apache.hadoop.fs does not exist

[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,45] cannot find symbol

[ERROR] symbol:   class Configuration

[ERROR] location: package org.apache.hadoop.conf

[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,93] cannot find symbol

[ERROR] symbol:   class Configuration

The org.apache.hadoop.fs package cannot be found, so compilation fails.

Solution:

Add the following dependency to the pom file of the flink-mapr-fs module:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
</dependency>

Resume the build again:

mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3  -rf :flink-mapr-fs

The build fails yet again:

[ERROR] Failed to execute goal on project flink-avro-confluent-registry: Could not resolve dependencies for project org.apache.flink:flink-avro-confluent-registry:jar:1.6.2: Could not find artifact io.confluent:kafka-schema-registry-client:jar:3.3.1 in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]

[ERROR]

The kafka-schema-registry-client-3.3.1.jar artifact is missing.

Solution:

Manually download kafka-schema-registry-client-3.3.1.jar from the following address:

http://packages.confluent.io/maven/io/confluent/kafka-schema-registry-client/3.3.1/kafka-schema-registry-client-3.3.1.jar

Upload the downloaded kafka-schema-registry-client-3.3.1.jar to the /home/hadoop/downloads directory on the master node.

 

Install the missing kafka-schema-registry-client-3.3.1.jar into the local repository:

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=3.3.1 -Dpackaging=jar  -Dfile=/home/hadoop/downloads/kafka-schema-registry-client-3.3.1.jar

Resume the build:

mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3  -rf :flink-mapr-fs

 
