根據需求,首先定義如下3大要素
採集源,即source——監控文件內容更新 : exec ‘tail -F file’
下沉目標,即sink——HDFS文件系統 : hdfs sink
Source和sink之間的傳遞通道——channel,可用file channel 也能夠用 內存channel
agent1.sources = source1
agent1.sinks = sin k1
agent1.channels = channel1
# Describe/configure tail -F source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /root/flumedata/logs/text.txt
agent1.sources.source1.channels = channel1
#configure host for source
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = host
agent1.sources.source1.interceptors.i1.hostHeader = hostname
# Describe sink1
agent1.sinks.sink1.type = hdfs
#a1.sinks.k1.channel = c1
agent1.sinks.sink1.hdfs.path =hdfs://hadoop01:9000/weblog/flume-collection/%y-%m-%d/%H-%M
agent1.sinks.sink1.hdfs.filePrefix = access_log
agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
agent1.sinks.sink1.hdfs.batchSize= 10
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.writeFormat =Text
agent1.sinks.sink1.hdfs.rollSize = 10
agent1.sinks.sink1.hdfs.rollCount = 100
agent1.sinks.sink1.hdfs.rollInterval = 6
agent1.sinks.sink1.hdfs.round = true
agent1.sinks.sink1.hdfs.roundValue = 1
agent1.sinks.sink1.hdfs.roundUnit = minute
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.keep-alive = 120
agent1.channels.channel1.capacity = 500000
agent1.channels.channel1.transactionCapacity = 600
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
而後往:/root/flumedata/logs/text.txt 這個文件中追加日期
while true
do
date >> /root/flumedata/logs/text.txt
done
tail -f 和 tail -F的區別:
tail -f 當文件變了,不會再輸出
tail -F當文件變了,還會再輸出
因此,咱們能夠利用tail -F實現斷點續傳的功能:
a1.sources.r2.command=
tail -n +$(tail -n1 /root/log) -F /root/data/nginx.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/log";fflush("")}' /root/log-
若是有多個source,那必需要配置多個:a1.sources.r2.command