Building on the environment from Flume Installation and Simple Usage (Part 1), add hadoop-2.7.3 and set up a Hadoop pseudo-distributed cluster.
For setting up the Hadoop pseudo-distributed cluster, see: http://blog.csdn.net/qq_38799155/article/details/77748831
As the hadoop user, configure the environment:
$ vi .bashrc
Add the following:
export FLUME_HOME=/home/hadoop/flume
export PATH=$PATH:$FLUME_HOME/bin
Then source it so the change takes effect:
$ source .bashrc
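To confirm the variables took effect, an optional quick check:
$ echo $FLUME_HOME
$ which flume-ng
Both should point into /home/hadoop/flume.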
$ cd /home/hadoop/flume/conf
$ cp flume-env.sh.template flume-env.sh
$ vi flume-env.sh
Add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
Then verify the installation by checking the Flume version:
$ flume-ng version
Flume can listen on a port and capture the data transmitted to it; the example below uses a netcat source.
Create a Flume configuration file:
$ cd /home/hadoop/flume/
$ mkdir example
$ cp conf/flume-conf.properties.template example/netcat.conf
Edit the netcat.conf file under /home/hadoop/flume/example/:
$ cd /home/hadoop/flume/example/
$ vi netcat.conf
Modify it as follows (netcat.conf configures the agent to receive, in real time, data typed in another terminal):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel that buffers events in memory
a1.channels.c1.type = memory
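# capacity: max events the channel can buffer;
# transactionCapacity: max events handled per transaction (must not exceed capacity)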
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Run the Flume agent, listening on port 44444 of the local machine (run this from the example directory, since the config file path is relative):
$ flume-ng agent -c conf -f netcat.conf -n a1 -Dflume.root.logger=INFO,console
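The same command with long options is equivalent, shown here only for readability:
$ flume-ng agent --conf conf --conf-file netcat.conf --name a1 -Dflume.root.logger=INFO,console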
Open another terminal, connect to port 44444 on localhost with telnet, and type some test data:
$ telnet localhost 44444
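If telnet is not installed, netcat can be used instead (assuming nc is available on the system):
$ nc localhost 44444
Each line typed here should show up as an event on the agent's console.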
If the lines you type appear as events in the agent's console log, the agent started successfully; the console is also where you watch the data Flume collects.
The spooling directory source monitors a configured directory for newly added files and reads the data out of them. Two caveats: a file copied into the spool directory must not be opened and edited afterwards, and the spool directory must not contain subdirectories. In the example below, agent local1 watches the directory and forwards events over Avro to localhost:60000, where agent a1 receives them and writes them to HDFS:
$ cd /home/hadoop/flume/
$ cp conf/flume-conf.properties.template example/spool1.conf
$ cp conf/flume-conf.properties.template example/spool2.conf
$ cd /home/hadoop/flume/example/
$ vi spool1.conf
Modify the contents as follows:
# Name the components
local1.sources = r1
local1.sinks = k1
local1.channels = c1
# Source
local1.sources.r1.type = spooldir
local1.sources.r1.spoolDir = /home/hadoop/avro_data
# Sink
local1.sinks.k1.type = avro
local1.sinks.k1.hostname = localhost
local1.sinks.k1.port = 60000
# Channel
local1.channels.c1.type = memory
# Bind the source and sink to the channel
local1.sources.r1.channels = c1
local1.sinks.k1.channel = c1
$ cd /home/hadoop/flume/example/
$ vi spool2.conf
Modify the contents as follows:
# Name the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 60000
# Sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData
# 0 disables time-based file rolling
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
# Channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
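Note that the roll-interval property needs the hdfs. prefix to take effect; with hdfs.rollInterval = 0, files roll only on the sink's size/count thresholds rather than on a timer. If you want to pre-create the target directory as a sanity check of HDFS connectivity (optional, and assuming fs.defaultFS points at hdfs://hadoop:9000):
$ hdfs dfs -mkdir -p /home/hadoop/hadoop-2.7.3/flumeData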
4. Open two terminals and run the following commands to start the two Flume agents, both from /home/hadoop/flume/example/. Start the HDFS-writing agent (a1) first, so its Avro source is already listening when local1 begins sending:
$ flume-ng agent -c conf -f spool2.conf -n a1
$ flume-ng agent -c conf -f spool1.conf -n local1
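To watch the HDFS sink activity shown in step 6 directly on the console, the a1 agent can be started with the same logger flag used in the netcat example; you can also confirm the Avro source is listening before starting local1:
$ flume-ng agent -c conf -f spool2.conf -n a1 -Dflume.root.logger=INFO,console
$ netstat -tln | grep 60000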
5. Populate the monitored avro_data directory on the local filesystem.
Create the avro_data folder under the hadoop user's home directory:
$ cd /home/hadoop/
$ mkdir avro_data
$ cd avro_data
$ vi avro_data.txt
Add content like the following (any sample rows will do):
1,first_name,age,address
2,James,55,6649 N Blue Gum St
3,Art,62,8 W Cerritos Ave #54
4,Lenna,56,639 Main St
5,Donette,2,34 Center St
6,YuKi,35,1 State Route 27
7,Ammy,28,322 New Horizon Blvd
8,Abel,26,37275 SSt Rt 17m M
9,Leota,52,7 W Jackson Blvd
10,Kris,36,228 Runamuck P1 #2808
11,Kiley,32,25 E 75th St #69
12,Simona,32,3 Mcauley Dr
13,Sage,25,5 Boston Ave #88
14,Mitsue,23,7 Eads St
15,Mattile,12,73 State Road 434 E
Then:
$ cd /home/hadoop/avro_data/
$ cat avro_data.txt
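Because files in the spool directory must not be edited after they land there (the caveat above), a safer pattern for adding more data is to write the file somewhere else and move it in once it is complete (a sketch; the file name is arbitrary):
$ vi /tmp/more_data.txt
$ mv /tmp/more_data.txt /home/hadoop/avro_data/
By default, Flume renames each processed file with a .COMPLETED suffix.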
6. Check the agent writing to HDFS and confirm that the data was captured and written to HDFS:
17/09/19 02:37:15 INFO hdfs.BucketWriter: Creating hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834441.tmp
17/09/19 02:37:20 INFO hdfs.BucketWriter: Closing hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834441.tmp
17/09/19 02:37:20 INFO hdfs.BucketWriter: Renaming hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834441.tmp to hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834441
17/09/19 02:37:20 INFO hdfs.BucketWriter: Creating hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834442.tmp
17/09/19 02:37:50 INFO hdfs.BucketWriter: Closing hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834442.tmp
17/09/19 02:37:50 INFO hdfs.BucketWriter: Renaming hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834442.tmp to hdfs://hadoop:9000/home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834442
7. View the files in HDFS through the web UI (for Hadoop 2.x, the NameNode UI listens on port 50070 by default, e.g. http://hadoop:50070).
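The same check from the command line (the file names will match those logged in step 6):
$ hdfs dfs -ls /home/hadoop/hadoop-2.7.3/flumeData
$ hdfs dfs -cat /home/hadoop/hadoop-2.7.3/flumeData/FlumeData.1505759834441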
Flume ships with a large number of built-in sources; among them, Avro Source, Thrift Source, Spooling Directory Source, and Kafka Source offer good performance and cover a wide range of use cases. Some reference material on sources follows: