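Download Flume 1.7 from the official website.
My first idea was to go straight from IIS ---> Flume ---> HDFS,
but collection kept failing: Flume could not connect directly to the remote HDFS.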
22 二月 2017 14:59:04,566 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:443) - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://192.168.1.75:9008/iis/2017-02-22/u_ex151127.log.1487746609021.tmp
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:682)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:232)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:504)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:406)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:675)
    ... 6 more
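The 10000 ms in this error is the HDFS sink's call timeout (hdfs.callTimeout, whose default is 10000 ms): opening the file did not finish in time. The post does not pin down why. A quick first check is whether the Windows machine can open a TCP connection to the NameNode address from the hdfs:// URI at all. The Python sketch below does only that; the function name port_reachable is hypothetical, not part of Flume or Hadoop.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and connects, honouring the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all raise OSError subclasses
        return False
```

Run on the Windows host, port_reachable("192.168.1.75", 9008) returning False suggests a network, firewall or bind-address problem. True only rules out the most basic failure, because the HDFS client must also reach the DataNodes to write blocks.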
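So I settled on a compromise: collect with Flume on Windows, forward to Flume on Linux, and write to HDFS from there.
IIS --> (Windows) Flume --> (Linux) Flume --> HDFS
The configuration file of the collecting Flume agent on Windows is as follows: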
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = C:\\inetpub\\logs\\LogFiles\\W3SVC4
a1.sources.r1.fileHeader = true
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = fileName
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.75
a1.sinks.k1.port = 44444
# Use a channel which buffers events in memory
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=1000
a1.channels.c1.keep-alive=30
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
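The key part is pointing the sink at the Flume agent on Linux (an avro sink to 192.168.1.75:44444). The spool directory is the log folder of one IIS site, C:\inetpub\logs\LogFiles\W3SVC4; the backslashes are doubled in the config because the backslash is the escape character in a properties file.
The receiving Flume agent on Linux is configured as follows: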
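The ignorePattern line is easy to misread because of the doubled backslash: after the properties file is loaded, `\\.` becomes `\.`, so the regular expression Flume compiles is `^(.)*\.tmp$`, matched against each file's name. The Python sketch below uses `re` as a stand-in for `java.util.regex` (they behave the same for this simple pattern) to show which names would be skipped; the sample file names are made up.

```python
import re

# ignorePattern as Flume sees it once "\\." in the .conf file has become "\."
IGNORE_PATTERN = re.compile(r"^(.)*\.tmp$")

def is_ignored(file_name: str) -> bool:
    """True if the spooling directory source would skip this file (whole-name match)."""
    return IGNORE_PATTERN.fullmatch(file_name) is not None

# Hypothetical file names, just to show the pattern's effect
for name in ["u_ex170222.log", "u_ex170222.log.tmp", "notes.tmp.bak"]:
    print(name, "->", "ignored" if is_ignored(name) else "collected")
```

One related caveat from the Flume user guide: files must not be modified after they land in a spooling directory, so the log file IIS is still appending to deserves attention.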
tier1.sources=source1
tier1.channels=channel1
tier1.sinks=sink1
tier1.sources.source1.type=avro
tier1.sources.source1.bind=192.168.1.75
tier1.sources.source1.port=44444
tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000
tier1.channels.channel1.transactionCapacity=1000
tier1.channels.channel1.keep-alive=30
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.rollInterval = 0
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.filePrefix = localhost-%Y-%m-%d
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
tier1.sinks.sink1.hdfs.idleTimeout = 60
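With rollInterval, rollSize and rollCount all set to 0, the HDFS sink never rolls files by time, size or event count; a file is closed only after idleTimeout (60 seconds) passes without new events. The file name starts with hdfs.filePrefix, whose %Y-%m-%d escapes are filled in from the agent's local clock because useLocalTimeStamp = true, followed by a numeric counter, and it carries the in-use suffix .tmp while open (compare u_ex151127.log.1487746609021.tmp in the error log). The Python function below approximates that naming. Treating the escapes as strftime codes and the counter as an epoch-millisecond value are assumptions, and hdfs_file_name is a hypothetical helper, not Flume's code.

```python
from datetime import datetime

def hdfs_file_name(file_prefix: str, when: datetime, counter_ms: int,
                   in_use: bool = True, in_use_suffix: str = ".tmp") -> str:
    """Approximate an HDFS sink file name (sketch; not Flume's implementation).

    Assumptions: the prefix only uses %Y/%m/%d, which mean the same as in
    strftime; counter_ms stands in for the sink's numeric counter, which looks
    like an epoch-millisecond timestamp in the error log.
    """
    prefix = when.strftime(file_prefix)  # e.g. localhost-%Y-%m-%d -> localhost-2017-02-22
    name = f"{prefix}.{counter_ms}"      # prefix + "." + counter
    # The in-use suffix is only present while the file is still open
    return name + in_use_suffix if in_use else name

print(hdfs_file_name("localhost-%Y-%m-%d", datetime(2017, 2, 22, 14, 56), 1487746609021))
```

Under these assumptions the call prints localhost-2017-02-22.1487746609021.tmp, and the .tmp suffix disappears once idleTimeout closes the file.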
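Start the Linux agent with:
./flume-ng agent -c ../conf -f ../conf/avro_hdfs.conf -n tier1 -Dflume.root.logger=DEBUG,console
Both agents must be started from Flume's bin directory, because the configuration paths are relative to it. The Windows agent is started with:
flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\avro.conf --name a1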
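Two details have to line up across these commands and files: the value passed to -n / --name must equal the property prefix in the config file (tier1 on Linux, a1 on Windows), otherwise Flume finds no sources, channels or sinks for the agent; and the Windows avro sink's hostname/port must match the Linux avro source's bind/port. The Python sketch below cross-checks both for two config files copied side by side. The parser is deliberately minimal (no line continuations, ':' separators or escape handling) and all function names are hypothetical.

```python
from typing import Dict, List

def parse_conf(text: str) -> Dict[str, str]:
    """Minimal key=value reader for a Flume .conf file; skips blanks and # / ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def avro_endpoints(props: Dict[str, str], agent: str, kind: str, host_key: str):
    """Yield (host, port) for every avro source or sink declared for `agent`."""
    for name in props.get(f"{agent}.{kind}", "").split():
        prefix = f"{agent}.{kind}.{name}."
        if props.get(prefix + "type") == "avro":
            yield props.get(prefix + host_key), props.get(prefix + "port")

def check_avro_link(sender_text: str, sender_agent: str,
                    receiver_text: str, receiver_agent: str) -> List[str]:
    """Return a list of problems; an empty list means the two agents line up."""
    sender, receiver = parse_conf(sender_text), parse_conf(receiver_text)
    problems: List[str] = []
    # The --name / -n value must match the prefix used in the file
    for props, agent in ((sender, sender_agent), (receiver, receiver_agent)):
        if f"{agent}.sources" not in props:
            problems.append(f"no '{agent}.sources' key: does --name match the file?")
    sinks = list(avro_endpoints(sender, sender_agent, "sinks", "hostname"))
    sources = list(avro_endpoints(receiver, receiver_agent, "sources", "bind"))
    if not sinks:
        problems.append("sender has no avro sink")
    if not sources:
        problems.append("receiver has no avro source")
    # A bind of 0.0.0.0 accepts connections on any interface
    if sinks and sources and not any(
            sp == rp and rh in (sh, "0.0.0.0")
            for sh, sp in sinks for rh, rp in sources):
        problems.append(f"avro sink {sinks} does not target avro source {sources}")
    return problems
```

For this setup, check_avro_link(open("avro.conf").read(), "a1", open("avro_hdfs.conf").read(), "tier1") should return an empty list.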