Flink教程-快速開始

Flink教程-快速開始

Apache Flink 是一個開源的分佈式批數據以及流數據處理平臺。目前已經升級爲 Apache 頂級開源項目。不管是 Spark 仍是 Flink,他們的主要優點都是基於內存運行機器學習算法,運行速度很是快,並且 Flink 支持迭代計算。做爲大數據挖掘工程師兩個工具都必須掌握。
Flink 剛剛開源,國內關注人數不是不少,源代碼量也不大,可是看 Spark 的源碼就有點困難了,因此學習 Flink,也能學習到一個優秀的分佈式框架是怎麼樣一步一步構建起來的。html

Flink安裝準備

Flink運行支持 Linux、蘋果、Windows 主流平臺。不過最好仍是使用 Linux。下面給出安裝前的準備:java

  1. 安裝 Jdk1.7.X 或者以上的版本
  2. 在 Flink 官網下載對應 Hadoop 預編譯版本

將預編譯版本解壓,進入解壓縮文件,爲了方便,後文統一稱此目錄爲:FLINK_HOME。算法

開始安裝

單機快速嘗試

單機嘗試很是簡單,直接執行命令:apache

  1. Linux用戶: sh bin/start-local.sh
  2. Windows用戶,在命令窗戶輸入:bin\start-local.bat

等待其出現以下提示以後:api

D:\Java\flink\flink-0.10.1>bin\start-local.bat
Starting Flink job manager. Webinterface by default on http://localhost:8081/.
Don't close this batch window. Stop job manager by pressing Ctrl+C.

在瀏覽器中輸入:http://localhost:8081/,Flink默認監聽8081端口,防止其餘進程佔用此端口。此時出現下面的管理界面:
Flink Web 管理界面
能夠發現這個界面和 Spark 的管理界面的邏輯差很少,主要是管理正在運行的Job,已經完成的 Job,以及Task 管理和 Job 管理,Task 應該是管理 Job 的,之後再仔細分析裏面的邏輯。瀏覽器

跑第一個例子

下面火燒眉毛先來跑一個分佈式系統最經典的例子:WordCount,下面以 FLINK_HOME 的 README.txt 文件做爲示例文件,測試 WordCount 程序,在 Windows 上面運行代碼以及運行過程以下圖:微信

D:\Java\flink\flink-0.10.1>bin\flink.bat run .\examples\WordCount.jar file:/D:/Java/flink/flink-0.10.1/README.txt file:/D:/Java/flink/flink-0.10.1/wordcount-result.txt
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.li
b.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
01/15/2016 16:30:51     Job execution switched to status RUNNING.
01/15/2016 16:30:51     CHAIN DataSource (at getTextDataSet(WordCount.java:142)
(org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED
01/15/2016 16:30:51     CHAIN DataSource (at getTextDataSet(WordCount.java:142)
(org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING
01/15/2016 16:30:52     CHAIN DataSource (at getTextDataSet(WordCount.java:142)
(org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to RUNNING
01/15/2016 16:30:52     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED
01/15/2016 16:30:52     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING
01/15/2016 16:30:52     CHAIN DataSource (at getTextDataSet(WordCount.java:142)
(org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FINISHED
01/15/2016 16:30:52     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to RUNNING
01/15/2016 16:30:53     DataSink (CsvOutputFormat (path: file:/D:/Java/flink/flink-0.10.1/wordcount-result.txt, delimiter:  ))(1/1) switched to SCHEDULED
01/15/2016 16:30:53     DataSink (CsvOutputFormat (path: file:/D:/Java/flink/flink-0.10.1/wordcount-result.txt, delimiter:  ))(1/1) switched to DEPLOYING
01/15/2016 16:30:53     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to FINISHED
01/15/2016 16:30:53     DataSink (CsvOutputFormat (path: file:/D:/Java/flink/flink-0.10.1/wordcount-result.txt, delimiter:  ))(1/1) switched to RUNNING
01/15/2016 16:30:53     DataSink (CsvOutputFormat (path: file:/D:/Java/flink/flink-0.10.1/wordcount-result.txt, delimiter:  ))(1/1) switched to FINISHED
01/15/2016 16:30:53     Job execution switched to status FINISHED.

能夠看到輸出日誌很是詳細,很方便就清楚整個運行流程,獲得輸出文件 wordcount-result.txt 前面10條內容以下 :app

1 1
13 1
5d002 1
740 1
about 1
account 1
administration 1
algorithms 1
and 7
another 1
any 2

我的微信公衆號

歡迎關注本人微信公衆號,會定時發送關於大數據、機器學習、Java、Linux 等技術的學習文章,並且是一個系列一個系列的發佈,無任何廣告,純屬我的興趣。
Clebeg能量集結號框架

相關文章
相關標籤/搜索