Collecting IIS Logs into HDFS with Flume

1. Download Flume 1.7

Download the Flume 1.7 release from the official Apache Flume website.

2. Write the Flume configuration files

The initial idea was a single pipeline: IIS ---> Flume --> HDFS.

But collection kept failing with errors; the Windows agent could not connect directly to the remote HDFS:

22 二月 2017 14:59:04,566 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:443)  - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://192.168.1.75:9008/iis/2017-02-22/u_ex151127.log.1487746609021.tmp
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:682)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:232)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:504)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:406)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:675)
    ... 6 more

So in the end a compromise was chosen: a Flume agent on Windows collects the logs and forwards them to a Flume agent on Linux, which then writes to HDFS.
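Before giving up on the single-agent design, one mitigation worth trying is raising the HDFS sink's call timeout, whose default is the 10000 ms visible in the stack trace. A hypothetical fragment for the original single-agent setup (assuming the sink is named k1, as in the config below):

```properties
# Assumed single-agent config: give slow remote HDFS calls more time
# before the sink aborts with "Callable timed out" (default is 10000 ms)
a1.sinks.k1.hdfs.callTimeout = 60000
```

In this case the timeout persisted, which suggested a connectivity problem between the Windows host and the HDFS DataNodes rather than mere slowness, hence the two-tier workaround.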

IIS --> (Windows) Flume --> (Linux) Flume --> HDFS

The configuration file for the collecting Flume agent on Windows is as follows:

a1.sources = r1
a1.sinks = k1
a1.channels = c1
 
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = C:\\inetpub\\logs\\LogFiles\\W3SVC4
a1.sources.r1.fileHeader = true
a1.sources.r1.basenameHeader = true
a1.sources.r1.basenameHeaderKey = fileName
a1.sources.r1.ignorePattern = ^(.)*\\.tmp$
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.75
a1.sinks.k1.port = 44444
 
# Use a channel which buffers events in memory
a1.channels.c1.type=memory  
a1.channels.c1.capacity=10000  
a1.channels.c1.transactionCapacity=1000  
a1.channels.c1.keep-alive=30  

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The key points are that the sink targets the address of the Flume agent on Linux, and the spooling directory is the log folder of the relevant IIS site: C:\inetpub\logs\LogFiles\W3SVC4
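A side note on the ignorePattern above: it is a regex applied to file names, and the doubled backslash is properties-file escaping for `\.`, so the effective pattern is `^(.)*\.tmp$`. A quick illustration of which names it skips, using Python's re module (the file names are just examples in the IIS naming style):

```python
import re

# Effective regex from the config: in the properties file "\\." unescapes to "\."
ignore = re.compile(r"^(.)*\.tmp$")

# Completed IIS log files are collected; in-progress .tmp files are skipped
print(bool(ignore.match("u_ex151127.log")))      # → False (collected)
print(bool(ignore.match("u_ex151127.log.tmp")))  # → True  (ignored)
```

This matters because IIS may still be appending to a file; without the pattern, the spooldir source would fail on files that change after being picked up.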

The configuration for the receiving Flume agent on Linux is as follows:

tier1.sources=source1
tier1.channels=channel1  
tier1.sinks=sink1  
      
tier1.sources.source1.type=avro
tier1.sources.source1.bind=192.168.1.75
tier1.sources.source1.port=44444
tier1.sources.source1.channels=channel1
      
tier1.channels.channel1.type=memory  
tier1.channels.channel1.capacity=10000  
tier1.channels.channel1.transactionCapacity=1000  
tier1.channels.channel1.keep-alive=30  
      
tier1.sinks.sink1.channel=channel1  

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = hdfs://127.0.0.1:9008/iis
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.rollInterval = 0
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.filePrefix = localhost-%Y-%m-%d
tier1.sinks.sink1.hdfs.useLocalTimeStamp = true
tier1.sinks.sink1.hdfs.idleTimeout = 60
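The %Y-%m-%d escape sequences in hdfs.filePrefix are expanded by Flume from each event's timestamp (here the receiving host's local time, since useLocalTimeStamp = true), so output files land with names like localhost-2017-02-22.<counter>. The directives are the same ones strftime uses; a small illustration with an arbitrary example date:

```python
from datetime import datetime

# Flume expands %Y-%m-%d in hdfs.filePrefix from the event timestamp;
# strftime uses the same format directives (the date here is arbitrary)
ts = datetime(2017, 2, 22)
print(ts.strftime("localhost-%Y-%m-%d"))  # → localhost-2017-02-22
```

Setting rollInterval, rollSize, and rollCount all to 0 disables time-, size-, and count-based rolling, so files are closed only by idleTimeout (60 s with no new events), keeping one file per burst of activity.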
 

3. Start Flume on Linux

./flume-ng agent -c ../conf -f ../conf/avro_hdfs.conf -n tier1 -Dflume.root.logger=DEBUG,console

4. Start Flume on Windows

It needs to be started from Flume's bin directory:

flume-ng.cmd agent --conf ..\conf --conf-file ..\conf\avro.conf --name a1