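Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It supports customizing the various data senders in a logging system to collect data, and it can perform simple processing on that data and write it to various data receivers (such as text files, HDFS, and HBase).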
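Flume is made up of three main components: the Source, which collects data; the Channel, which buffers it; and the Sink, which writes it to its destination.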
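The approach that changes existing programs the least is to have Flume read the log files those programs already write. This gives essentially seamless integration, with no modification to the existing programs at all.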
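There are two ways for a Source to read files directly: ExecSource, which runs a command such as tail -F <file> and continuously emits the newest lines, and SpoolSource (the spooling directory source), which watches a configured directory and reads in any new files placed there. When using SpoolSource, keep two restrictions in mind:
1) Files copied into the spool directory must not be opened and edited afterwards.
2) The spool directory must not contain subdirectories.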
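In practice this can be combined with log4j: set log4j's file rolling to once per minute and copy the rolled files into the directory that the spool source monitors. log4j has a TimeRolling plugin that can place the files log4j rolls directly into the spool directory. This gives near-real-time collection, with a delay of about one minute.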
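A minimal log4j 1.x sketch of the one-minute rolling described above. It uses the core DailyRollingFileAppender rather than the TimeRolling plugin the text mentions, whose exact name and options are not given here. The appender name R and the paths are hypothetical. With this appender the rolled files stay next to the active file, so a separate step (for example a cron job, not shown) still has to move finished files into the spool directory; moving them with a rename on the same filesystem avoids Flume picking up a half-copied file.

```properties
# Sketch only: core log4j 1.2 DailyRollingFileAppender (hypothetical appender name R and paths)
log4j.rootLogger=INFO, R
log4j.appender.R=org.apache.log4j.DailyRollingFileAppender
# Active log file; keep it OUTSIDE the Flume spool directory
log4j.appender.R.File=/var/log/myapp/app.log
# Minute-level pattern: roll over once per minute (on the first log event after each
# minute boundary), producing files such as app.log.2014-06-19-18-16
log4j.appender.R.DatePattern='.'yyyy-MM-dd-HH-mm
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c - %m%n
```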
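After Flume has finished transferring a file, it renames the file by adding the suffix .COMPLETED (the suffix can be changed in the configuration file).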
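As a sketch of the SpoolSource settings just described, the agent below watches a hypothetical directory /var/log/myapp/spool and sets the suffix explicitly. spooldir, spoolDir and fileSuffix are Flume property names; .COMPLETED is already the default, so that line only matters if you want a different suffix. The memory channel and logger sink are placeholders following the same pattern as the example.conf used later in this article.

```properties
# Hypothetical agent a1 that reads files dropped into a spool directory
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = spooldir
# Directory to watch; it must not contain subdirectories (hypothetical path)
a1.sources.r1.spoolDir = /var/log/myapp/spool
# Suffix added to a file once it has been fully ingested (.COMPLETED is the default)
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.channels = c1

# Placeholder channel and sink
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```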
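ExecSource vs. SpoolSource:
ExecSource can collect logs in real time, but if Flume is not running or the command fails, the log data is not collected, so the completeness of the log data cannot be guaranteed. SpoolSource cannot collect data in real time, but by rolling files every minute it can come close. If the application cannot roll its log files every minute, the two collection methods can be used together.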
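For the ExecSource side, a minimal sketch assuming a hypothetical log file /var/log/myapp/app.log. tail -F keeps following the file by name even after it is rotated, and restart / restartThrottle tell Flume to rerun the command if it exits. This only mitigates a failing command; lines written while the Flume agent itself is down are still missed, which is the completeness gap described above.

```properties
# Hypothetical ExecSource that tails a live log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/myapp/app.log
# Rerun the command if it exits (default: false)
a1.sources.r1.restart = true
# Wait 10000 ms before rerunning it (this is also the default)
a1.sources.r1.restartThrottle = 10000
# Also log the command's stderr, useful when diagnosing a failing command
a1.sources.r1.logStdErr = true
a1.sources.r1.channels = c1
```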
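Several Channel types are available: MemoryChannel, JDBC Channel, MemoryRecoverChannel, and FileChannel. MemoryChannel delivers high throughput but cannot guarantee data integrity, since buffered events are lost if the agent stops. The official documentation recommends replacing MemoryRecoverChannel with FileChannel. FileChannel guarantees data integrity and consistency. When configuring a FileChannel, put its directories on a different disk from the one where the application stores its log files, to improve performance.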
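A sketch of the FileChannel advice above. checkpointDir and dataDirs are FileChannel properties; the /data2 mount point is a hypothetical second disk, separate from the disk holding the application's log files.

```properties
# FileChannel persists events to disk so they survive an agent restart
a1.channels.c1.type = file
# Hypothetical paths on a disk other than the one holding the application's logs
a1.channels.c1.checkpointDir = /data2/flume/checkpoint
# dataDirs accepts a comma-separated list, so data can be spread over several disks
a1.channels.c1.dataDirs = /data2/flume/data
```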
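When storing data, a Sink can write to a file system, a database, or Hadoop. When the log volume is small, the data can be stored in the file system and saved at a fixed time interval. When the log volume is large, the log data can be stored in Hadoop (HDFS), which makes later data analysis easier.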
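The two sink choices above might look like this. file_roll and hdfs are Flume sink types, while the output directory, the NameNode address, and the 300-second roll interval are hypothetical values. Only one of the two should be active for sink k1, and the HDFS sink also needs the Hadoop client libraries on Flume's classpath.

```properties
# Option A (small log volume): write events to local files,
# starting a new file every 300 seconds (hypothetical directory)
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /var/flume/out
a1.sinks.k1.sink.rollInterval = 300
a1.sinks.k1.channel = c1

# Option B (large log volume): write events to HDFS instead.
# To use it, delete option A and uncomment these lines (hypothetical NameNode address)
#a1.sinks.k1.type = hdfs
#a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs
# DataStream writes plain event bodies instead of the default SequenceFile format
#a1.sinks.k1.hdfs.fileType = DataStream
#a1.sinks.k1.hdfs.rollInterval = 300
#a1.sinks.k1.channel = c1
```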
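Installing and configuring Flume is fairly simple. Download the Flume 1.5.0 binary package: http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz
Then just unpack it: tar -zvxf apache-flume-1.5.0-bin.tar.gz
Change into the Flume directory and create example.conf: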
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = echo 'hello'

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume: bin/flume-ng agent --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
Log output:
14/06/19 18:16:29 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
14/06/19 18:16:29 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:example.conf
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Processing:k1
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Processing:k1
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Invalid property specified: conf
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Configuration property ignored: mple.conf = A single-node Flume configuration
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Agent configuration for 'mple' does not contain any channels. Marking it as invalid.
14/06/19 18:16:29 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'mple'. It will be removed.
14/06/19 18:16:29 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Creating channels
14/06/19 18:16:29 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Created channel c1
14/06/19 18:16:29 INFO source.DefaultSourceFactory: Creating instance of source r1, type exec
14/06/19 18:16:29 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
14/06/19 18:16:29 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
14/06/19 18:16:29 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1730d54 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
14/06/19 18:16:29 INFO node.Application: Starting Channel c1
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/06/19 18:16:29 INFO node.Application: Starting Sink k1
14/06/19 18:16:29 INFO node.Application: Starting Source r1
14/06/19 18:16:29 INFO source.ExecSource: Exec source starting with command:echo 'hello'
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/06/19 18:16:29 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/06/19 18:16:29 INFO source.ExecSource: Command [echo 'hello'] exited with 0
14/06/19 18:16:29 INFO sink.LoggerSink: Event: { headers:{} body: 27 68 65 6C 6C 6F 27 'hello' }