Flume Installation and Configuration Guide (Part 2)

 Note: environment: skylin-linux

How to download Flume:

wget http://www.apache.org/dyn/closer.lua/flume/1.6.0/apache-flume-1.6.0-bin.tar.

Once the download completes, extract the archive with tar:

tar -zvxf apache-flume-1.6.0-bin.tar.

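Go into Flume's conf directory, create the file with touch flume.conf, and then copy the template over it with cp flume-conf.properties.template flume.conf

Edit the configuration file with vim/gedit flume.conf. Note that a Flume conf file uses the Java properties file format, that is, key-value pairs.

In the Flume configuration file, we need to: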

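     1. Name the Agent currently in use.

     2. Name the source under the Agent.

     3. Name the channel under the Agent.

     4. Name the sink under the Agent.

     5. Bind the source and the sink together through the channel.

In general, Flume may contain several Agents, so each one needs its own name to tell them apart. The names must not repeat: keep every name unique!

For example: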

# The Agent is named agent_name
# The source is named source_name, and so on
agent_name.sources = source_name
agent_name.channels = channel_name
agent_name.sinks = sink_name

The configuration above corresponds to a single Agent with a single sink and a single channel, as shown in the figure below.

To configure n sinks and m channels on one Agent (n>1, m>1), list the names on the same key, separated by spaces:

# The Agent is named agent_name
# The sources are named source_name and source_name1, and so on
agent_name.sources = source_name source_name1
agent_name.channels = channel_name channel_name1
agent_name.sinks = sink_name sink_name1

The configuration above describes one Agent with two sources, two sinks and two channels, as shown in the figure.

 

The above covers multiple sinks, channels and sources. For multiple Agents, simply give each Agent a unique name!
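
As a minimal sketch of the multi-Agent case, the fragment below declares two Agents in one file. The names agent_a and agent_b, and all of their component names, are hypothetical and chosen only for illustration. Each Agent's keys carry its own name as a prefix, so the two sets of definitions do not collide.

```properties
# Hypothetical first Agent
agent_a.sources = source_a
agent_a.channels = channel_a
agent_a.sinks = sink_a

# Hypothetical second Agent, with its own unique name
agent_b.sources = source_b
agent_b.channels = channel_b
agent_b.sinks = sink_b
```

When Flume is started, the Agent to run is selected by name (for example through the --name option of flume-ng agent), so only the keys with that prefix take effect.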

Flume supports a wide variety of sources, sinks and channels. The supported types are as follows:

Sources:
  • Avro Source
  • Thrift Source
  • Exec Source
  • JMS Source
  • Spooling Directory Source
  • Twitter 1% firehose Source
  • Kafka Source
  • NetCat Source
  • Sequence Generator Source
  • Syslog Sources
  • Syslog TCP Source
  • Multiport Syslog TCP Source
  • Syslog UDP Source
  • HTTP Source
  • Stress Source
  • Legacy Sources
  • Thrift Legacy Source
  • Custom Source
  • Scribe Source

Channels:
  • Memory Channel
  • JDBC Channel
  • Kafka Channel
  • File Channel
  • Spillable Memory Channel
  • Pseudo Transaction Channel

Sinks:
  • HDFS Sink
  • Hive Sink
  • Logger Sink
  • Avro Sink
  • Thrift Sink
  • IRC Sink
  • File Roll Sink
  • Null Sink
  • HBaseSink
  • AsyncHBaseSink
  • MorphlineSolrSink
  • ElasticSearchSink
  • Kite Dataset Sink
  • Kafka Sink

 You can mix and match the types above to suit your needs. For example, with an Avro source, a Memory channel and an HDFS sink for storage, the configuration continues from the one above like this:

# The Agent is named agent_name
# The source is named Avro, and so on
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS

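Once the Agent's components are named, you still need to describe each of its sources, sinks and channels. The sections below go through them one by one.


Source configuration

Note: in an Agent with N (N>1) sources, each source must be configured separately. First set the source's type, then set the properties that belong to that type. The general pattern is: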

agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value

A concrete example, with an Avro source:

# The Agent is named agent_name
# The source is named Avro, and so on
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS

#---------- source configuration ----------#
agent_name.sources.Avro.type = avro
agent_name.sources.Avro.bind = localhost
agent_name.sources.Avro.port = 9696
# Bind the source to the MemoryChannel channel
agent_name.sources.Avro.channels = MemoryChannel
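
The same pattern applies to other source types. As a sketch, assume an Exec source that follows a log file. The source name ExecSrc and the path /var/log/app.log are hypothetical; type and command are the Exec source's own properties.

```properties
agent_name.sources = ExecSrc
agent_name.channels = MemoryChannel

agent_name.sources.ExecSrc.type = exec
# Hypothetical log file; each line the command prints becomes one event
agent_name.sources.ExecSrc.command = tail -F /var/log/app.log
agent_name.sources.ExecSrc.channels = MemoryChannel
```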

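Channel configuration

 Flume provides various channels to pass data between sources and sinks. Like a source, a channel needs its properties configured: for N (N>0) channels, each one must have its properties set individually. The general template is: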

agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value

A concrete example: if we choose the memory channel type, we first set the channel's type:

agent_name.channels.MemoryChannel.type = memory
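
Beyond type, a memory channel is usually sized explicitly. A minimal sketch, with illustrative values rather than recommendations: capacity is the maximum number of events held in the channel, and transactionCapacity is the maximum number of events handled per transaction (it should not exceed capacity).

```properties
agent_name.channels.MemoryChannel.type = memory
# Illustrative values only; tune them to the available heap
agent_name.channels.MemoryChannel.capacity = 10000
agent_name.channels.MemoryChannel.transactionCapacity = 1000
```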

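So far, however, we have only set the channel's own properties. We still need to link it to the source and the sink, that is, bind them. The binding is set as follows. These lines can be written in the source and sink sections respectively, or grouped together next to the channel section. Note that a source uses the key channels, while a sink uses the key channel:

agent_name.sources.Avro.channels = MemoryChannel
agent_name.sinks.HDFS.channel = MemoryChannel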
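
The two binding keys differ for a reason: a source can write to several channels, so its channels key takes a space-separated list, whereas a sink reads from exactly one channel through its channel key. A sketch, assuming a second channel FileChannel and a second sink Logger, both hypothetical names added only for illustration:

```properties
agent_name.channels = MemoryChannel FileChannel
agent_name.sinks = HDFS Logger

agent_name.channels.MemoryChannel.type = memory
agent_name.channels.FileChannel.type = file
agent_name.sinks.Logger.type = logger

# One source fanning out to two channels
agent_name.sources.Avro.channels = MemoryChannel FileChannel

# Each sink drains exactly one channel
agent_name.sinks.HDFS.channel = MemoryChannel
agent_name.sinks.Logger.channel = FileChannel
```

With the default (replicating) channel selector, every event from the source is copied to each listed channel.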

Sink configuration

Sink configuration is similar to source configuration. Its general format is:

agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value

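A concrete example: with the sink type set to HDFS, the configuration looks like this (the HDFS sink's path property is named hdfs.path):

agent_name.sinks.HDFS.type = hdfs
agent_name.sinks.HDFS.hdfs.path = HDFS's path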
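
A slightly fuller sketch of an HDFS sink follows. The NameNode address and directory are hypothetical placeholders and the values are illustrative; hdfs.fileType and hdfs.rollInterval are optional HDFS sink properties.

```properties
agent_name.sinks.HDFS.type = hdfs
agent_name.sinks.HDFS.channel = MemoryChannel
# Hypothetical target directory
agent_name.sinks.HDFS.hdfs.path = hdfs://namenode:8020/flume/events
# Write an uncompressed data stream instead of the default SequenceFile
agent_name.sinks.HDFS.hdfs.fileType = DataStream
# Time-based roll: start a new file every 60 seconds
# (the size and count based roll triggers still apply)
agent_name.sinks.HDFS.hdfs.rollInterval = 60
```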

That concludes the detailed walkthrough of the Flume configuration file. For completeness, here is a full configuration:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.


# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'

#define agent
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink kafkaSink

#
# For each one of the sources, the type is defined
# Default mode: agent.sources.seqGenSrc.type = seq / netcat / avro
agent.sources.seqGenSrc.type = avro
agent.sources.seqGenSrc.bind = localhost
agent.sources.seqGenSrc.port = 9696
##### Data source #####
#agent.sources.seqGenSrc.command = tail -F /home/gongxijun/Qunar/data/data.log

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

#+++++++++++++++ Define sink +++++++++++++++++++++#
# Each sink's type must be defined


# Superseded by the hbase type on the next line, so it is commented out
#agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.type = hbase
agent.sinks.loggerSink.channel = memoryChannel
# Table name
agent.sinks.loggerSink.table = flume
# Column family
agent.sinks.loggerSink.columnFamily = gxjun
agent.sinks.loggerSink.serializer = org.apache.flume.sink.hbase.MyHbaseEventSerializer
#agent.sinks.loggerSink.serializer  = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.loggerSink.zookeeperQuorum=localhost:2181
agent.sinks.loggerSink.znodeParent= /hbase

#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel 

# Each channel's type is defined.
#memory
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.keep-alive = 10

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
#agent.channels.memoryChannel.checkpointDir = /home/gongxijun/Qunar/data
#agent.channels.memoryChannel.dataDirs = /home/gongxijun/Qunar/data , /home/gongxijun/Qunar/tmpData
agent.channels.memoryChannel.capacity = 10000000
agent.channels.memoryChannel.transactionCapacity = 10000



#define the sink2 kafka

#+++++++++++++++ Define sink +++++++++++++++++++++#
# Each sink's type must be defined


# Superseded by the KafkaSink type on the next line, so it is commented out
#agent.sinks.kafkaSink.type = logger
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink

agent.sinks.kafkaSink.channel = memoryChannel
#agent.sinks.kafkaSink.server=localhost:9092
agent.sinks.kafkaSink.topic= kafka-topic
agent.sinks.kafkaSink.batchSize = 20
agent.sinks.kafkaSink.brokerList = localhost:9092
#Specify the channel the sink should use
agent.sinks.kafkaSink.channel = memoryChannel 

The layout of this configuration is shown below:

 

References:

http://www.tutorialspoint.com/apache_flume/apache_flume_configuration.htm

 

 

Author: 龔細軍

Please credit the source when quoting: http://www.cnblogs.com/gongxijun/p/5661037.html
