Flume NG Getting Started

Getting Started

    • What is Flume NG?
      • What's Changed?
    • Getting Flume NG
      • Building From Source
    • Configuration
      • flume-ng global options
      • flume-ng agent options
      • flume-ng avro-client options
    • Providing Feedback

What is Flume NG?

Flume NG aims to be significantly simpler, smaller, and easier to deploy than Flume OG. In doing so, we do not commit to maintaining backward compatibility of Flume NG with Flume OG. We're currently soliciting feedback from those who are interested in testing Flume NG for correctness, ease of use, and potential integration with other systems.

What's Changed?

Flume NG (Next Generation) is a huge departure from Flume OG (Original Generation) in its implementation, although many of the original concepts are the same. If you're already familiar with Flume, here's what you need to know.

  • You still have sources and sinks and they still do the same thing. They are now connected by channels.
  • Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast but non-durable event delivery and a file-based channel for durable event delivery.
  • There are no more logical or physical nodes. We call all physical nodes agents, and agents can run zero or more sources and sinks.
  • There's no master and no ZooKeeper dependency anymore. At this time, Flume runs with a simple file-based configuration system.
  • Just about everything is a plugin, some end-user facing, some for tool and system developers. Pluggable components include channels, sources, sinks, interceptors, sink processors, and event serializers.
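The durability distinction between channels in the list above can be sketched in configuration. This is only an illustration: the file-channel property names (checkpointDir, dataDirs) and the paths shown follow Flume 1.x conventions and may differ in your release.

```properties
# In-memory channel: fast, but events are lost if the agent dies.
agent1.channels.ch1.type = memory

# File-backed channel: events survive an agent restart.
# checkpointDir/dataDirs names are per Flume 1.x; check your release's docs.
agent1.channels.ch2.type = file
agent1.channels.ch2.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.ch2.dataDirs = /var/lib/flume/data
```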

Please file JIRAs and/or vote for the features you feel are important.

Getting Flume NG

Flume is available as a source tarball and binary in the Downloads section of the Flume website. If you are not planning on creating patches for Flume, the binary is likely the easiest way to get started.

Building From Source

To build Flume NG from source, you'll need git, the Sun JDK 1.6, Apache Maven 3.x, about 90 MB of local disk space, and an Internet connection.

1. Check out the source

$ git clone https://git-wip-us.apache.org/repos/asf/flume.git flume
$ cd flume
$ git checkout trunk

2. Compile the project

The Apache Flume build requires more memory than the default configuration. We recommend you set the following Maven options:

export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
# Build the code and run the tests (note: use mvn install, not mvn package, since we deploy Jenkins SNAPSHOT jars daily, and Flume is a multi-module project)
$ mvn install
# ...or build the code without running the tests
$ mvn install -DskipTests

(Please note that Flume requires the Google Protocol Buffers compiler to be on the path for the build to succeed. You can download and install it by following the instructions here.)

This produces two types of packages in flume-ng-dist/target. They are:

  • apache-flume-ng-dist-1.4.0-SNAPSHOT-bin.tar.gz - A binary distribution of Flume, ready to run.
  • apache-flume-ng-dist-1.4.0-SNAPSHOT-src.tar.gz - A source-only distribution of Flume.

If you're a user and you just want to run Flume, you probably want the -bin version. Copy one out, decompress it, and you're ready to go.

$ cp flume-ng-dist/target/apache-flume-1.4.0-SNAPSHOT-bin.tar.gz .
$ tar -zxvf apache-flume-1.4.0-SNAPSHOT-bin.tar.gz
$ cd apache-flume-1.4.0-SNAPSHOT-bin

3. Create your own properties file based on the working template (or create one from scratch)

$ cp conf/flume-conf.properties.template conf/flume.conf

4. (Optional) Create your flume-env.sh file based on the template (or create one from scratch). The flume-ng executable looks for and sources a file named "flume-env.sh" in the conf directory specified by the --conf/-c command-line option. One use case for flume-env.sh is to specify debugging or profiling options via JAVA_OPTS when developing your own custom Flume NG components such as sources and sinks.

$ cp conf/flume-env.sh.template conf/flume-env.sh
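For example, a flume-env.sh that enables remote JVM debugging of the agent might look like the following sketch; the debug options and port are illustrative, not required by Flume.

```shell
# Illustrative flume-env.sh contents: pass JVM options to the agent.
# The JDWP settings and port 5005 below are examples only.
JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
```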

5. Configure and run Flume NG

After you've configured Flume NG (see below), you can run it with the bin/flume-ng executable. This script has a number of arguments and modes.

Configuration

Flume uses a Java property file based configuration format. You must tell Flume which file to use via the -f <file> option (see above) when running an agent. The file can live anywhere, but historically - and in the future - the conf directory is the correct place for config files.

Let's start with a basic example. Copy and paste this into conf/flume.conf:

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
 
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
 
# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger
 
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1

This example creates a memory channel (i.e. an unreliable or "best effort" transport), an Avro RPC source, and a logger sink, and connects them together. Any events received by the Avro source are routed to channel ch1 and delivered to the logger sink. It's important to note that defining components is only the first half of configuring Flume; they must be activated by listing them in the <agent>.channels, <agent>.sources, and <agent>.sinks sections. Multiple sources, sinks, and channels may be listed, separated by spaces.
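For instance, if a second sink were defined (log-sink2 is a hypothetical name used here only for illustration), the activation lines would read:

```properties
# Activate components by listing them, space-separated.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1 log-sink2
```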

For full details, please see the javadoc for the org.apache.flume.conf.properties.PropertiesFileConfigurationProvider class.

This is a listing of the sources, sinks, and channels implemented at this time. Each plugin has its own optional and required configuration properties, so please see the javadocs (for now).

Component                 | Type                                                        | Description                                                                                                                                                                                                                          | Implementation Class
--------------------------|-------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------
Channel                   | memory                                                      | In-memory, fast, non-durable event transport                                                                                                                                                                                         | MemoryChannel
Channel                   | file                                                        | A channel for reading, writing, mapping, and manipulating a file                                                                                                                                                                     | FileChannel
Channel                   | jdbc                                                        | JDBC-based, durable event transport (Derby-based)                                                                                                                                                                                    | JDBCChannel
Channel                   | recoverablememory                                           | A durable channel implementation that uses the local file system for its storage                                                                                                                                                     | RecoverableMemoryChannel
Channel                   | org.apache.flume.channel.PseudoTxnMemoryChannel             | Mainly for testing purposes. Not meant for production use.                                                                                                                                                                           | PseudoTxnMemoryChannel
Channel                   | (custom type as FQCN)                                       | Your own Channel impl.                                                                                                                                                                                                               | (custom FQCN)
Source                    | avro                                                        | Avro Netty RPC event source                                                                                                                                                                                                          | AvroSource
Source                    | exec                                                        | Execute a long-lived Unix process and read from stdout                                                                                                                                                                               | ExecSource
Source                    | netcat                                                      | Netcat style TCP event source                                                                                                                                                                                                        | NetcatSource
Source                    | seq                                                         | Monotonically incrementing sequence generator event source                                                                                                                                                                           | SequenceGeneratorSource
Source                    | org.apache.flume.source.StressSource                        | Mainly for testing purposes. Not meant for production use. Serves as a continuous source of events where each event has the same payload. The payload consists of some number of bytes (specified by size property, defaults to 500) where each byte has the signed value Byte.MAX_VALUE (0x7F, or 127). | org.apache.flume.source.StressSource
Source                    | syslogtcp                                                   | -                                                                                                                                                                                                                                    | SyslogTcpSource
Source                    | syslogudp                                                   | -                                                                                                                                                                                                                                    | SyslogUDPSource
Source                    | org.apache.flume.source.avroLegacy.AvroLegacySource         | -                                                                                                                                                                                                                                    | AvroLegacySource
Source                    | org.apache.flume.source.thriftLegacy.ThriftLegacySource     | -                                                                                                                                                                                                                                    | ThriftLegacySource
Source                    | org.apache.flume.source.scribe.ScribeSource                 | -                                                                                                                                                                                                                                    | ScribeSource
Source                    | (custom type as FQCN)                                       | Your own Source impl.                                                                                                                                                                                                                | (custom FQCN)
Sink                      | hdfs                                                        | Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more)                                                                                                                                  | HDFSEventSink
Sink                      | org.apache.flume.sink.hbase.HBaseSink                       | A simple sink that reads events from a channel and writes them to HBase.                                                                                                                                                             | org.apache.flume.sink.hbase.HBaseSink
Sink                      | org.apache.flume.sink.hbase.AsyncHBaseSink                  | -                                                                                                                                                                                                                                    | org.apache.flume.sink.hbase.AsyncHBaseSink
Sink                      | logger                                                      | Log events at INFO level via configured logging subsystem (log4j by default)                                                                                                                                                         | LoggerSink
Sink                      | avro                                                        | Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection)                                                                                           | AvroSink
Sink                      | file_roll                                                   | -                                                                                                                                                                                                                                    | RollingFileSink
Sink                      | irc                                                         | -                                                                                                                                                                                                                                    | IRCSink
Sink                      | null                                                        | /dev/null for Flume - blackhole all events received                                                                                                                                                                                  | NullSink
Sink                      | (custom type as FQCN)                                       | Your own Sink impl.                                                                                                                                                                                                                  | (custom FQCN)
ChannelSelector           | replicating                                                 | -                                                                                                                                                                                                                                    | ReplicatingChannelSelector
ChannelSelector           | multiplexing                                                | -                                                                                                                                                                                                                                    | MultiplexingChannelSelector
ChannelSelector           | (custom type)                                               | Your own ChannelSelector impl.                                                                                                                                                                                                       | (custom FQCN)
SinkProcessor             | default                                                     | -                                                                                                                                                                                                                                    | DefaultSinkProcessor
SinkProcessor             | failover                                                    | -                                                                                                                                                                                                                                    | FailoverSinkProcessor
SinkProcessor             | load_balance                                                | Provides the ability to load-balance flow over multiple sinks.                                                                                                                                                                       | LoadBalancingSinkProcessor
SinkProcessor             | (custom type as FQCN)                                       | Your own SinkProcessor impl.                                                                                                                                                                                                         | (custom FQCN)
Interceptor$Builder       | host                                                        | -                                                                                                                                                                                                                                    | HostInterceptor$Builder
Interceptor$Builder       | timestamp                                                   | TimestampInterceptor                                                                                                                                                                                                                 | TimestampInterceptor$Builder
Interceptor$Builder       | static                                                      | -                                                                                                                                                                                                                                    | StaticInterceptor$Builder
Interceptor$Builder       | regex_filter                                                | -                                                                                                                                                                                                                                    | RegexFilteringInterceptor$Builder
Interceptor$Builder       | (custom type as FQCN)                                       | Your own Interceptor$Builder impl.                                                                                                                                                                                                   | (custom FQCN)
EventSerializer$Builder   | text                                                        | -                                                                                                                                                                                                                                    | BodyTextEventSerializer$Builder
EventSerializer$Builder   | avro_event                                                  | -                                                                                                                                                                                                                                    | FlumeEventAvroEventSerializer$Builder
EventSerializer           | org.apache.flume.sink.hbase.SimpleHbaseEventSerializer      | -                                                                                                                                                                                                                                    | SimpleHbaseEventSerializer
EventSerializer           | org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer | -                                                                                                                                                                                                                                    | SimpleAsyncHbaseEventSerializer
EventSerializer           | org.apache.flume.sink.hbase.RegexHbaseEventSerializer       | -                                                                                                                                                                                                                                    | RegexHbaseEventSerializer
HbaseEventSerializer      | (custom type as FQCN)                                       | Custom implementation of serializer for HBaseSink. Your own HbaseEventSerializer impl.                                                                                                                                               | (custom FQCN)
AsyncHbaseEventSerializer | (custom type as FQCN)                                       | Custom implementation of serializer for AsyncHbase sink. Your own AsyncHbaseEventSerializer impl.                                                                                                                                    | (custom FQCN)
EventSerializer$Builder   | (custom type as FQCN)                                       | Custom implementation of serializer for all sinks except for HBaseSink and AsyncHBaseSink. Your own EventSerializer$Builder impl.                                                                                                    | (custom FQCN)

The flume-ng executable lets you run a Flume NG agent or an Avro client, which is useful for testing and experiments. Either way, you must specify a command (e.g. agent or avro-client) and a conf directory (--conf <conf dir>). All other options are command-specific.

To start the Flume server using the flume.conf above:

bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1

Notice that the agent name is specified by -n agent1 and must match an agent name given in -f conf/flume.conf.

Your output should look something like this:

$ bin/flume-ng agent --conf conf/ -f conf/flume.conf -n agent1
2012-03-16 16:36:11,918 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 1
2012-03-16 16:36:11,921 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:110)] Node manager starting
2012-03-16 16:36:11,928 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 10
2012-03-16 16:36:11,929 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:114)] Node manager started
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-03-16 16:36:11,930 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:87)] Configuration provider started
2012-03-16 16:36:11,930 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:189)] Checking file:conf/flume.conf for changes
2012-03-16 16:36:11,931 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:196)] Reloading configuration file:conf/flume.conf
2012-03-16 16:36:11,936 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.properties.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:225)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {avro-source1=ComponentConfiguration[avro-source1]
  CONFIG: {port=41414, channels=ch1, type=avro, bind=0.0.0.0}
  RUNNER:   ComponentConfiguration[runner]
    CONFIG: {}
 
 
}
CHANNELS: {ch1=ComponentConfiguration[ch1]
  CONFIG: {type=memory}
 
}
SINKS: {log-sink1=ComponentConfiguration[log-sink1]
  CONFIG: {type=logger, channel=ch1}
  RUNNER:   ComponentConfiguration[runner]
    CONFIG: {}
 
 
}
2012-03-16 16:36:11,936 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:119)] Post-validation flume configuration contains configuation  for agents: [agent1]
2012-03-16 16:36:11,937 (conf-file-poller-0) [DEBUG - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:67)] Creating instance of channel ch1 type memory
2012-03-16 16:36:11,944 (conf-file-poller-0) [DEBUG - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:73)] Creating instance of source avro-source1, type avro
2012-03-16 16:36:11,957 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:69)] Creating instance of sink log-sink1 typelogger
2012-03-16 16:36:11,963 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:52)] Node configuration change:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:41414 } }} sinkRunners:{log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@79f6f296 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel@43b09468} }
2012-03-16 16:36:11,974 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:122)] Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:41414 }
2012-03-16 16:36:11,975 (Thread-1) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:123)] Polling sink runner starting
2012-03-16 16:36:12,352 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.source.AvroSource.start(AvroSource.java:132)] Avro source started
 

flume-ng global options

Option               | Description
---------------------|----------------------------------------------------
--conf,-c <conf>     | Use configs in <conf> directory
--classpath,-C <cp>  | Append to the classpath
--dryrun,-d          | Do not actually start Flume, just print the command
-Dproperty=value     | Sets a JDK system property value

flume-ng agent options

When given the agent command, a Flume NG agent is started with the given configuration file (required).

Option                 | Description
-----------------------|------------------------------------------------------------------
--conf-file,-f <file>  | Indicates which configuration file you want to run with (required)
--name,-n <agentname>  | Indicates the name of the agent we're running (required)

flume-ng avro-client options

Runs an Avro client that sends either a file or data from stdin to a specified host and port where a Flume NG Avro source is listening.

Option                   | Description
-------------------------|----------------------------------------------------------------
--host,-H <hostname>     | Specifies the hostname of the Flume agent (may be localhost)
--port,-p <port>         | Specifies the port on which the Avro source is listening
--filename,-F <filename> | Sends each line of <filename> to Flume (optional)
--headerFile,-R <file>   | Header file containing headers as key/value pairs on each new line

The avro-client treats each line of input (terminated by \n, \r, or \r\n) as a separate event. Think of the avro-client command as cat for Flume. For example, the following creates one event per line of the input file and sends them to port 41414, on which Flume's Avro source is listening.
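Since avro-client reads stdin when no --filename is given, you can also pipe data into it. This sketch assumes an agent with an Avro source is already listening on localhost:41414, as configured above:

```shell
# Send a single one-line event from stdin to the running Avro source.
$ echo "hello flume" | bin/flume-ng avro-client --conf conf -H localhost -p 41414
```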

In a new window, type the following:

$ bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console

You should see something like this:

2012-03-16 16:39:17,124 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:175)] Finished
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:178)] Closing reader
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:183)] Closing transceiver
2012-03-16 16:39:17,129 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:73)] Exiting

And in the first window, where the server is running, you'll see:

2012-03-16 16:39:16,738 (New I/O server boss #1 ([id: 0x49e808ca, /0:0:0:0:0:0:0:0:41414])) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /1
27.0.0.1:39577 => /127.0.0.1:41414] OPEN
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] BOU
ND: /127.0.0.1:41414
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] CON
NECTED: /127.0.0.1:39577
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] DISCONNECTED
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] UNBOUND
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] CLOSED
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@5c1ae90c }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6aba4211 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6a47a0d4 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@48ff4cf }
...

Congratulations! You have Apache Flume up and running.

Providing Feedback

For help building, configuring, and running Flume, the best place to go is the user mailing list. Send an email to user-subscribe@flume.apache.org to subscribe, and post to user@flume.apache.org once you've subscribed. The archives are available at http://mail-archives.apache.org/mod_mbox/incubator-flume-user/ (through 2012/7) and http://mail-archives.apache.org/mod_mbox/flume-user/.

If you believe you've found a bug or want to request a feature or improvement, don't be shy. Please file a JIRA for the Flume project at https://issues.apache.org/jira/browse/FLUME. For the NG version, set the "Affects Version" field to the appropriate milestone/release. Feel free to note anything you're unsure of, and we'll follow up with you for details as needed. Note that you must create an Apache JIRA account before you can file issues.

 

下面爲原文


 

 

Getting Started

What is Flume NG?

Flume NG aims to be significantly simpler, smaller, and easier to deploy than Flume OG. In doing so, we do not commit to maintaining backward compatibility of Flume NG with Flume OG. We're currently soliciting feedback from those who are interested in testing Flume NG for correctness, ease of use, and potential integration with other systems.

What's Changed?

Flume NG (Next Generation) is a huge departure from Flume OG (Original Generation) in its implementation although many of the original concepts are the same. If you're already familiar with Flume, here's what you need to know.

  • You still have sources and sinks and they still do the same thing. They are now connected by channels.
  • Channels are pluggable and dictate durability. Flume NG ships with an in-memory channel for fast, but non-durable event delivery and a file-based channel for durable event delivery.
  • There's no more logical or physical nodes. We call all physical nodes agents and agents can run zero or more sources and sinks.
  • There's no master and no ZooKeeper dependency anymore. At this time, Flume runs with a simple file-based configuration system.
  • Just about everything is a plugin, some end user facing, some for tool and system developers. Pluggable components include channels, sources, sinks, interceptors, sink processors, and event serializers.

Please file JIRAs and/or vote for features you feel are important.

Getting Flume NG

Flume is available as a source tarball and binary on the Downloads section of the Flume Website. If you are not planning on creating patches for Flume, the binary is likely the easiest way to get started.

Building From Source

To build Flume NG from source, you'll need git, the Sun JDK 1.6, Apache Maven 3.x, about 90MB of local disk space and an Internet connection.

1. Check out the source

$ git clone https://git-wip-us.apache.org/repos/asf/flume.git flume
$ cd flume
$ git checkout trunk

2. Compile the project

The Apache Flume build requires more memory than the default configuration. We recommend you set the following Maven options:

export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
 
# Build the code and run the tests (note: use mvn install, not mvn package, since we deploy Jenkins SNAPSHOT jars daily, and Flume is a multi-module project)
$ mvn install
# ...or build the code without running the tests
$ mvn install -DskipTests

(Please note that Flume requires that Google Protocol Buffers compiler be in the path for the build to be successful. You download and install it by following the instructions here.)

This produces two types of packages in flume-ng-dist/target. They are:

  • apache-flume-ng-dist-1.4.0-SNAPSHOT-bin.tar.gz - A binary distribution of Flume, ready to run.
  • apache-flume-ng-dist-1.4.0-SNAPSHOT-src.tar.gz - A source-only distribution of Flume.

If you're a user and you just want to run Flume, you probably want the -bin version. Copy one out, decompress it, and you're ready to go.

 
$ cp flume-ng-dist/target/apache-flume-1.4.0-SNAPSHOT-bin.tar.gz .
$ tar -zxvf apache-flume-1.4.0-SNAPSHOT-bin.tar.gz
$ cd apache-flume-1.4.0-SNAPSHOT-bin

3. Create your own properties file based on the working template (or create one from scratch)

 
$ cp conf/flume-conf.properties.template conf/flume.conf

4. (Optional) Create your flume-env.sh file based on the template (or create one from scratch). The flume-ng executable looks for and sources a file named "flume-env.sh" in the conf directory specified by the --conf/-c commandline option. One use case for using flume-env.sh would be to specify debugging or profiling options via JAVA_OPTS when developing your own custom Flume NG components such as sources and sinks.

 
$ cp conf/flume-env.sh.template conf/flume-env.sh

5. Configure and Run Flume NG

After you've configured Flume NG (see below), you can run it with the bin/flume-ng executable. This script has a number of arguments and modes.

Configuration

Flume uses a Java property file based configuration format. It is required that you tell Flume which file to use by way of the -f <file> option (see above) when running an agent. The file can live anywhere, but historically - and in the future - the conf directory will be the correct place for config files.

Let's start with a basic example. Copy and paste this into conf/flume.conf:

# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
 
# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
 
# Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger
 
# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1

This example creates a memory channel (i.e. an unreliable or "best effort" transport), an Avro RPC source, and a logger sink and connects them together. Any events received by the Avro source are routed to the channel ch1 and delivered to the logger sink. It's important to note that defining components is the first half of configuring Flume; they must be activated by listing them in the <agent>.channels, <agent>.sources, and sections. Multiple sources, sinks, and channels may be listed, separated by a space.

For full details, please see the javadoc for the org.apache.flume.conf.properties.PropertiesFileConfigurationProvider class.

This is a listing of the implemented sources, sinks, and channels at this time. Each plugin has its own optional and required configuration properties so please see the javadocs (for now).

Component

Type

Description

Implementation Class

Channel

memory

In-memory, fast, non-durable event transport

一個將event存儲在內容中,快速傳輸但沒法持久化的channel。

MemoryChannel

Channel

file

A channel for reading, writing, mapping, and manipulating a file

一個對文件進行讀、寫、映射和操做的channel

FileChannel

Channel

jdbc

JDBC-based, durable event transport (Derby-based)

基於JDBC,支持持久化的channel

JDBCChannel

Channel

recoverablememory

A durable channel implementation that uses the local file system for its storage

一個使用本地文件系統實現持久化的channel

RecoverableMemoryChannel

Channel

org.apache.flume.channel.PseudoTxnMemoryChannel

Mainly for testing purposes. Not meant for production use.

用於測試,不用於生產

PseudoTxnMemoryChannel

Channel

(custom type as FQCN)

Your own Channel impl.

自定義channel

(custom FQCN)

Source

avro

Avro Netty RPC event source

AvroSource

Source

exec

Execute a long-lived Unix process and read from stdout

執行一個長鏈接Unix進程並從標準輸出設備讀取數據

ExecSource

Source

netcat

Netcat style TCP event source

NetcatSource

Source

seq

Monotonically incrementing sequence generator event source

單調遞增序列發生器的事件source

SequenceGeneratorSource

Source

org.apache.flume.source.StressSource

Mainly for testing purposes. Not meant for production use. Serves as a continuous source of events where each event has the same payload. The payload consists of some number of bytes (specified by size property, defaults to 500) where each byte has the signed value Byte.MAX_VALUE (0x7F, or 127).

主要用於測試,不適合用於生產。用於接收每一個擁有相同的有效負載的event。那有效負載包含一組字節(經過 size屬性指定,默認爲500)每一個字節都是最大值(Byte.MAX_VALUE(0X7F或者127))

org.apache.flume.source.StressSource

Source

syslogtcp

 

SyslogTcpSource

Source

syslogudp

 

SyslogUDPSource

Source

org.apache.flume.source.avroLegacy.AvroLegacySource

 

AvroLegacySource

Source

org.apache.flume.source.thriftLegacy.ThriftLegacySource

 

ThriftLegacySource

Source

org.apache.flume.source.scribe.ScribeSource

 

ScribeSource

Source

(custom type as FQCN)

Your own Source impl.

自定義Source

(custom FQCN)

Sink

hdfs

Writes all events received to HDFS (with support for rolling, bucketing, HDFS-200 append, and more)

將全部接收到events寫到HDFS(支持回滾,桶裝和追加以及其餘)

HDFSEventSink

Sink

org.apache.flume.sink.hbase.HBaseSink

A simple sink that reads events from a channel and writes them to HBase.

一個簡單的sink用於將從channel讀到的數據寫到HBase

org.apache.flume.sink.hbase.HBaseSink

Sink

org.apache.flume.sink.hbase.AsyncHBaseSink

 

org.apache.flume.sink.hbase.AsyncHBaseSink

Sink

logger

Log events at INFO level via configured logging subsystem (log4j by default)

經過配置日誌子系統將INFO級別的events打印出來。

LoggerSink

Sink

avro

Sink that invokes a pre-defined Avro protocol method for all events it receives (when paired with an avro source, forms tiered collection)

一個調用預先定義好的Avro protocol方法來處理接收的全部event的sink(與avro source配對,造成分層收集)

AvroSink

Sink

file_roll

 

RollingFileSink

Sink

irc

 

IRCSink

Sink

null

/dev/null for Flume - blackhole all events received

event黑洞,有來無回

NullSink

Sink

(custom type as FQCN)

Your own Sink impl.

自定義sink

(custom FQCN)

ChannelSelector

replicating

 

ReplicatingChannelSelector

ChannelSelector

multiplexing

 

MultiplexingChannelSelector

ChannelSelector

(custom type)

Your own ChannelSelector impl.

(custom FQCN)

SinkProcessor

default

 

DefaultSinkProcessor

SinkProcessor

failover

 

FailoverSinkProcessor

SinkProcessor

load_balance

Provides the ability to load-balance flow over multiple sinks.

當存在多個sink時實現負載均衡

LoadBalancingSinkProcessor

SinkProcessor

(custom type as FQCN)

Your own SinkProcessor impl.

(custom FQCN)

Interceptor$Builder

host

 

HostInterceptor$Builder

Interceptor$Builder

timestamp

TimestampInterceptor

TimestampInterceptor$Builder

Interceptor$Builder

static

 

StaticInterceptor$Builder

Interceptor$Builder

regex_filter

 

RegexFilteringInterceptor$Builder

Interceptor$Builder

(custom type as FQCN)

Your own Interceptor$Builder impl.

(custom FQCN)

EventSerializer$Builder

text

 

BodyTextEventSerializer$Builder

EventSerializer$Builder

avro_event

 

FlumeEventAvroEventSerializer$Builder

EventSerializer

org.apache.flume.sink.hbase.SimpleHbaseEventSerializer

 

SimpleHbaseEventSerializer

EventSerializer

org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer

 

SimpleAsyncHbaseEventSerializer

EventSerializer

org.apache.flume.sink.hbase.RegexHbaseEventSerializer

 

RegexHbaseEventSerializer

HbaseEventSerializer

Custom implementation of serializer for HBaseSink.
(custom type as FQCN)

Your own HbaseEventSerializer impl.

(custom FQCN)

AsyncHbaseEventSerializer

Custom implementation of serializer for AsyncHbase sink.
(custom type as FQCN)

Your own AsyncHbaseEventSerializer impl.

(custom FQCN)

EventSerializer$Builder

Custom implementation of serializer for all sinks except for HBaseSink and AsyncHBaseSink.
(custom type as FQCN)

Your own EventSerializer$Builder impl.

(custom FQCN)

The flume-ng executable lets you run a Flume NG agent or an Avro client which is useful for testing and experiments. No matter what, you'll need to specify a command (e.g. agent or avro-client) and a conf directory (--conf <conf dir>). All other options are command-specific.

To start the flume server using the flume.conf above:

bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1

Notice that the agent name is specified by -n agent1 and must match a agent name given in -f conf/flume.conf

Your output should look something like this:

$ bin/flume-ng agent --conf conf/ -f conf/flume.conf -n agent1
2012-03-16 16:36:11,918 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 1
2012-03-16 16:36:11,921 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:110)] Node manager starting
2012-03-16 16:36:11,928 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:58)] Starting lifecycle supervisor 10
2012-03-16 16:36:11,929 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:114)] Node manager started
2012-03-16 16:36:11,926 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-03-16 16:36:11,930 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:87)] Configuration provider started
2012-03-16 16:36:11,930 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:189)] Checking file:conf/flume.conf for changes
2012-03-16 16:36:11,931 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:196)] Reloading configuration file:conf/flume.conf
2012-03-16 16:36:11,936 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.properties.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:225)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {avro-source1=ComponentConfiguration[avro-source1]
  CONFIG: {port=41414, channels=ch1, type=avro, bind=0.0.0.0}
  RUNNER:   ComponentConfiguration[runner]
    CONFIG: {}
 
 
}
CHANNELS: {ch1=ComponentConfiguration[ch1]
  CONFIG: {type=memory}
 
}
SINKS: {log-sink1=ComponentConfiguration[log-sink1]
  CONFIG: {type=logger, channel=ch1}
  RUNNER:   ComponentConfiguration[runner]
    CONFIG: {}
 
 
}
2012-03-16 16:36:11,936 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:119)] Post-validation flume configuration contains configuation  for agents: [agent1]
2012-03-16 16:36:11,937 (conf-file-poller-0) [DEBUG - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:67)] Creating instance of channel ch1 type memory
2012-03-16 16:36:11,944 (conf-file-poller-0) [DEBUG - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:73)] Creating instance of source avro-source1, type avro
2012-03-16 16:36:11,957 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:69)] Creating instance of sink log-sink1 typelogger
2012-03-16 16:36:11,963 (conf-file-poller-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.onNodeConfigurationChanged(DefaultLogicalNodeManager.java:52)] Node configuration change:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:AvroSource: { bindAddress:0.0.0.0 port:41414 } }} sinkRunners:{log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@79f6f296 counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel@43b09468} }
2012-03-16 16:36:11,974 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:122)] Avro source starting:AvroSource: { bindAddress:0.0.0.0 port:41414 }
2012-03-16 16:36:11,975 (Thread-1) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:123)] Polling sink runner starting
2012-03-16 16:36:12,352 (lifecycleSupervisor-1-1) [DEBUG - org.apache.flume.source.AvroSource.start(AvroSource.java:132)] Avro source started
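The configuration being validated in the log above corresponds to a properties file along these lines. This is a sketch reconstructed from the logged CONFIG maps (agent1, avro-source1, ch1, log-sink1), not necessarily the exact file used:

```properties
# conf/flume.conf — a single agent named agent1
agent1.sources = avro-source1
agent1.channels = ch1
agent1.sinks = log-sink1

# Avro source listening on all interfaces, port 41414
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = ch1

# In-memory channel (fast, not persistent)
agent1.channels.ch1.type = memory

# Logger sink that prints each event at INFO level
agent1.sinks.log-sink1.type = logger
agent1.sinks.log-sink1.channel = ch1
```

An agent using a file like this would typically be started with something like `$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent1 -Dflume.root.logger=DEBUG,console`.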
 
 

flume-ng global options

Option                 Description
--conf,-c <conf>       Use configs in <conf> directory
--classpath,-C <cp>    Append to the classpath
--dryrun,-d            Do not actually start Flume, just print the command
-Dproperty=value       Sets a JDK system property value

flume-ng agent options

When given the agent command, a Flume NG agent is started with a given configuration file (required).

Option                 Description
--conf-file,-f <file>  Indicates which configuration file you want to run with (required)
--name,-n <agentname>  Indicates the name of the agent to run (required)

flume-ng avro-client options

Run an Avro client that sends either a file or data from stdin to a specified host and port where a Flume NG Avro source is listening.

Option                    Description
--host,-H <hostname>      Specifies the hostname of the Flume agent (may be localhost)
--port,-p <port>          Specifies the port on which the Avro source is listening
--filename,-F <filename>  Sends each line of <filename> to Flume (optional)
--headerFile,-R <file>    Header file containing headers as key/value pairs, one pair per line
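The header file format described above (one key/value pair per line) can be sketched as a tiny parser. The `key=value` syntax and the parser below are illustrative assumptions for this sketch, not Flume's actual implementation:

```python
def parse_header_file(text: str) -> dict:
    """Parse a header file with one key=value pair per line.

    Hypothetical parser illustrating the documented format;
    Flume's own parsing may differ in detail.
    """
    headers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        headers[key.strip()] = value.strip()
    return headers

print(parse_header_file("host = web01\nenv = prod"))
```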

The Avro client treats each line (terminated by \n, \r, or \r\n) as an event. Think of the avro-client command as cat for Flume. For instance, the following creates one event per Linux user and sends it to Flume's avro source on localhost:41414.
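The line-splitting behaviour described above can be modeled in a few lines of Python. This is an illustrative sketch of the documented semantics (a line ends at \n, \r, or \r\n), not Flume's code:

```python
def split_into_events(data: bytes) -> list:
    """Split a byte stream into one event body per line,
    mirroring the documented avro-client behaviour."""
    # bytes.splitlines() breaks on \n, \r, and \r\n only
    return data.splitlines()

# Mixed line terminators still yield one event per line
events = split_into_events(b"root:x:0:0\r\ndaemon:x:1:1\nbin:x:2:2\r")
print(len(events))  # -> 3
```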

In a new window type the following:

$ bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console

You should see something like this:

2012-03-16 16:39:17,124 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:175)] Finished
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:178)] Closing reader
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:183)] Closing transceiver
2012-03-16 16:39:17,129 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:73)] Exiting

And in your first window, where the server is running:

2012-03-16 16:39:16,738 (New I/O server boss #1 ([id: 0x49e808ca, /0:0:0:0:0:0:0:0:41414])) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /1
27.0.0.1:39577 => /127.0.0.1:41414] OPEN
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] BOU
ND: /127.0.0.1:41414
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] CON
NECTED: /127.0.0.1:39577
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] DISCONNECTED
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] UNBOUND
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] CLOSED
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@5c1ae90c }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6aba4211 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6a47a0d4 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@48ff4cf }
...

Congratulations! You have Apache Flume running!

Providing Feedback

For help building, configuring, and running Flume (NG or otherwise), the best place is always the user mailing list. Send an email to user-subscribe@flume.apache.org to subscribe, and to user@flume.apache.org to post once you've subscribed. The archives are available at http://mail-archives.apache.org/mod_mbox/incubator-flume-user/ (up through part of July 2012) and http://mail-archives.apache.org/mod_mbox/flume-user/ (from July 2012 onwards).

If you believe you've found a bug or wish to file a feature request or improvement, don't be shy. Go to https://issues.apache.org/jira/browse/FLUME and file a JIRA against the version of Flume you're using. For NG, please set the "Affects Version" field to the appropriate milestone / release. Leave any field you're not sure about blank; we'll ask for details if we need them. Note that you must create an Apache JIRA account and log in before you can file issues.

 

As my abilities are limited, mistakes of one kind or another are inevitable; please don't hesitate to point them out.
