Flink big data project in practice: http://t.cn/EJtKhaz
The official documentation describes two ways to create a Flink project:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/quickstart/java_api_quickstart.html
Option 1:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.flink \
-DarchetypeArtifactId=flink-quickstart-java \
-DarchetypeVersion=1.6.2
Option 2:
$ curl https://flink.apache.org/q/quickstart.sh | bash -s 1.6.2
Here we use the first approach to create the Flink project.
Open a terminal, change to the target directory, and create the Flink project with Maven:
mvn archetype:generate -DarchetypeGroupId=org.apache.flink -DarchetypeArtifactId=flink-quickstart-java -DarchetypeVersion=1.6.2
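If you prefer to skip the interactive prompts, the same archetype can be generated in Maven's batch mode; the groupId, artifactId, version, and package values below are placeholder examples, not values from the original post:

```shell
# Batch-mode (-B) archetype generation; coordinate values are example placeholders
mvn archetype:generate -B \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.6.2 \
  -DgroupId=com.example \
  -DartifactId=flink-demo \
  -Dversion=1.0-SNAPSHOT \
  -Dpackage=com.example.flink
```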
During project generation you will be prompted for the groupId, artifactId, version, and package.
The Flink project is created successfully.
Open IDEA and click Open.
Select the Flink project you just created.
The Flink project is now imported into IDEA.
Package and test the build with Maven:
mvn clean package
Refresh the target directory to see the freshly packaged Flink project.
Core Dependencies:
1. The core dependencies are packaged in flink-dist*.jar.
2. They include the essentials such as coordination, networking, checkpoints, failover, APIs, operations (such as windowing), and resource management.
Note: core dependencies are not packaged with the application (<scope>provided</scope>).
3. The core dependencies are kept as small as possible to avoid dependency conflicts.
Add the core dependencies to the pom file:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>1.6.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>1.6.2</version>
<scope>provided</scope>
</dependency>
Note: these are not packaged with the application.
User Application Dependencies:
connectors, formats, or libraries (CEP, SQL, ML)
Note: application dependencies are packaged with the application (just keep the default scope).
Add the application dependencies to the pom file:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_2.11</artifactId>
<version>1.6.2</version>
</dependency>
Note: pick application dependencies as needed; they are packaged with the application, which can be done with the Maven Shade plugin.
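For reference, a minimal maven-shade-plugin configuration along the lines of what the Flink quickstart pom generates might look like the following; the plugin version and the main class name are assumptions, not values from the original post:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Sets the entry point recorded in the shaded jar's manifest;
               com.example.StreamingJob is a placeholder class name -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.StreamingJob</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn clean package` produces a single fat jar that bundles the application dependencies while the `provided`-scoped core dependencies stay out.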
Scala versions are not binary compatible with each other (a Flink application built on Scala 2.12 cannot depend on Scala 2.11 artifacts).
Java-only developers can pick any Scala version; Scala developers need to pick the Scala version matching their application's.
Do not add Hadoop dependencies directly to the Flink application; instead, set:
export HADOOP_CLASSPATH=`hadoop classpath`
Flink components pick up this environment variable at startup.
Special case: if the Flink application needs to use Hadoop input/output formats, just add the Hadoop compatibility wrappers:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-compatibility_2.11</artifactId>
<version>1.6.2</version>
</dependency>
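The Flink documentation describes reading files through a Hadoop InputFormat with these wrappers; a minimal DataSet sketch might look like the following (the input path is a placeholder, and this assumes the flink-java, flink-streaming-java, flink-hadoop-compatibility, and hadoop-common artifacts discussed above are on the classpath):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;

public class HadoopFormatExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read a text file via the wrapped Hadoop TextInputFormat;
        // records arrive as (byte offset, line) tuples.
        DataSet<Tuple2<LongWritable, Text>> input = env.createInput(
                HadoopInputs.readHadoopFile(new TextInputFormat(),
                        LongWritable.class, Text.class, "hdfs:///path/to/input"));
        input.first(10).print();
    }
}
```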
A Flink Maven project can be packaged with the maven-shade-plugin; the packaging command is mvn clean package.
1. Download
Download the installation package from the Maven website; here we use apache-maven-3.3.9-bin.tar.gz.
2. Extract
Upload the apache-maven-3.3.9-bin.tar.gz package to the master node, then extract it with tar:
tar -zxvf apache-maven-3.3.9-bin.tar.gz
3. Create a symlink
ln -s apache-maven-3.3.9 maven
4. Configure environment variables
vi ~/.bashrc
export MAVEN_HOME=/home/hadoop/app/maven
export PATH=$MAVEN_HOME/bin:$PATH
5. Apply the environment variables
source ~/.bashrc
6. Check the Maven version
mvn -version
7. Configure the Aliyun mirror in settings.xml
Add the Aliyun mirror:
<mirror>
<id>nexus-osc</id>
<mirrorOf>*</mirrorOf>
<name>Nexus osc</name>
<url>http://maven.aliyun.com/nexus/content/repositories/central</url>
</mirror>
Building Flink requires JDK 8 or above; JDK 1.8 is already installed here (installation steps omitted). Check the version:
[hadoop@cdh01 conf]$ java -version
java version "1.8.0_51"
Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)
Visit GitHub at https://github.com/apache/flink to get the Flink clone URL: https://github.com/apache/flink.git
Open a terminal on the Flink master node, go to the /home/hadoop/opensource directory, and fetch the Flink source with git clone:
git clone https://github.com/apache/flink.git
Error 1: if git is not installed on Linux, you will see:
bash: git: command not found
Fix: install git as follows:
1. Install the packages needed to build git (note: run as root)
yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
yum install gcc perl-ExtUtils-MakeMaker
2. Remove any existing git
yum remove git
3. Download the git source
Install wget first:
yum -y install wget
Download the git source with wget:
wget https://www.kernel.org/pub/software/scm/git/git-2.0.5.tar.gz
Extract git:
tar xzf git-2.0.5.tar.gz
Build and install git:
cd git-2.0.5
make prefix=/usr/local/git all
sudo make prefix=/usr/local/git install
echo 'export PATH=$PATH:/usr/local/git/bin' >> ~/.bashrc
source ~/.bashrc
Check the git version:
git --version
Error 2: git clone https://github.com/apache/flink.git fails with:
Cloning into 'flink'...
fatal: unable to access 'https://github.com/apache/flink.git/': SSL connect error
Fix:
Upgrade the nss package: yum update nss
List the Flink release tags with:
git tag
Check out the corresponding Flink version (here we use Flink 1.6.2):
git checkout release-1.6.2
Go to the Flink source root (/home/hadoop/opensource/flink) and build Flink with Maven:
mvn clean install -DskipTests -Dhadoop.version=2.6.0
This fails with:
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:58 min
[INFO] Finished at: 2019-01-18T22:11:54-05:00
[INFO] Final Memory: 106M/454M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project flink-mapr-fs: Could not resolve dependencies for project org.apache.flink:flink-mapr-fs:jar:1.6.2: Could not find artifact com.mapr.hadoop:maprfs:jar:5.2.1-mapr in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :flink-mapr-fs
The build fails on flink-mapr-fs; the missing artifact has to be downloaded and installed manually.
Fix:
1. Download the maprfs jar
Manually download the maprfs-5.2.1-mapr.jar package from: https://repository.mapr.com/nexus/content/groups/mapr-public/com/mapr/hadoop/maprfs/5.2.1-mapr/
2. Upload it to the master node
Upload the downloaded maprfs-5.2.1-mapr.jar package to the /home/hadoop/downloads directory on the master node.
3. Install it manually
Install the missing jar into the local repository:
mvn install:install-file -DgroupId=com.mapr.hadoop -DartifactId=maprfs -Dversion=5.2.1-mapr -Dpackaging=jar -Dfile=/home/hadoop/downloads/maprfs-5.2.1-mapr.jar
4. Resume the build
Continue the Flink build with Maven (resuming from flink-mapr-fs skips the modules already built):
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs
This fails again with:
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:51 min
[INFO] Finished at: 2019-01-18T22:39:20-05:00
[INFO] Final Memory: 108M/480M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project flink-mapr-fs: Compilation failure: Compilation failure:
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[70,44] package org.apache.hadoop.fs does not exist
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,45] cannot find symbol
[ERROR] symbol: class Configuration
[ERROR] location: package org.apache.hadoop.conf
[ERROR] /home/hadoop/opensource/flink/flink-filesystems/flink-mapr-fs/src/main/java/org/apache/flink/runtime/fs/maprfs/MapRFileSystem.java:[73,93] cannot find symbol
[ERROR] symbol: class Configuration
The org.apache.hadoop.fs package cannot be resolved.
Fix:
Add the following dependency to the pom file of the flink-mapr-fs module:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
Continue the build:
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs
Another error:
[ERROR] Failed to execute goal on project flink-avro-confluent-registry: Could not resolve dependencies for project org.apache.flink:flink-avro-confluent-registry:jar:1.6.2: Could not find artifact io.confluent:kafka-schema-registry-client:jar:3.3.1 in nexus-osc (http://maven.aliyun.com/nexus/content/repositories/central) -> [Help 1]
[ERROR]
The build is missing the kafka-schema-registry-client-3.3.1.jar package.
Fix:
Manually download the kafka-schema-registry-client-3.3.1.jar package.
Upload the downloaded kafka-schema-registry-client-3.3.1.jar to the /home/hadoop/downloads directory on the master node.
Install the missing kafka-schema-registry-client-3.3.1.jar package manually:
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=3.3.1 -Dpackaging=jar -Dfile=/home/hadoop/downloads/kafka-schema-registry-client-3.3.1.jar
Continue the build:
mvn clean install -Dmaven.test.skip=true -Dhadoop.version=2.7.3 -rf :flink-mapr-fs