Getting Started with Flume

Date: 2019-11-08

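1. Flume basics:

Flume: a log collection framework

Background:

Logs are scattered across many machines, yet we want to run statistical analysis on them with a big data platform.

The logs therefore have to be moved from the other servers and collected onto the cluster, with monitoring, and the process needs to be timely, fault tolerant, and load balanced.

Flume is normally driven by a configuration file, which is how the collection of each kind of data is set up.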

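Overview:

flume.apache.org

Distributed, highly reliable, highly available, efficient, and highly extensible

Collects, aggregates, and moves large volumes of log data

webserver ==> flume ==> HDFS

A simple framework built on streaming data flows, with fault-tolerance mechanisms

Supports online applications

Only the Agent configuration needs to be managed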

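Comparison with similar frameworks:

Scribe: Facebook, C++, no longer maintained

Chukwa: Yahoo, Java, no longer maintained

Neither of the two above handles load balancing well

Fluentd: developed in Ruby

Logstash: one of the components of the ELK stack, also widely used

Flume: developed by Cloudera/Apache in Java, widely used

Flume NG, version 1.5 or later, is what is generally used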

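Flume architecture and components:

Source collects events, Channel gathers and buffers them, Sink writes them out

The Documentation section of the official site describes these in detail

Channel: a buffer pool

Many-to-one: several agents write to a single downstream agent

One-to-many: a single source writes to several channels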
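The one-to-many case can be sketched as a single agent whose source copies every event into two channels, each drained by its own sink. This is only a minimal illustration, not a configuration from the original notes: the agent name a1, the netcat source, the port, and the two logger sinks are assumptions chosen to mirror the exercises later in the post. The replicating selector is Flume's default and is written out here only for clarity.

```properties
# Fan-out sketch (assumed names): one source, two channels, two sinks
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# The source lists both channels, so every event is copied to each of them
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Each sink drains exactly one channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = logger
a1.sinks.k2.channel = c2
```

In a real flow the two sinks would normally differ (for example one logger sink and one HDFS sink); two logger sinks are used here only to keep the sketch self-contained.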

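2. Flume in practice:

Configuring Flume:

In the conf directory:

Copy flume-env.sh.template to flume-env.sh

Set the JAVA_HOME path in it

Start Flume with flume-ng from the bin directory

The key to using Flume is writing the agent configuration file

Exercise 1:

Collect data from a given network port and print it to the console

netcat source + memory channel + logger sink

Agent configuration file: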

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
#a1.channels.c1.capacity = 1000
#a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

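A source can feed several channels (hence the plural property name channels), whereas a sink can be attached to only one channel

Shell command:

bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

--conf: the directory holding the global configuration files

--conf-file: the configuration file for this agent

--name: the name of the agent

console: print the log to the console

Once the command above is running, open another terminal:

Connect with telnet:

telnet localhost 44444

(localhost is the host name and 44444 is the port number)

[INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 6B 69 6E 67 68 65 79 0D kinghey. }

The line above is what the Flume console prints when it receives data; an event is the basic unit of data that Flume transfers

On a Mac, press Control + C to exit Flume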

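Exercise 2:

Monitor a file and print newly appended data to the console in real time

Choice of agent components:

exec source + memory channel + logger sink

Adapting the agent file from the official documentation is enough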

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
# The file to monitor (a comment must sit on its own line; text after the value would become part of the command)
a1.sources.r1.command = tail -F /usr/local/mycode/data/data.log
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

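Exercise 3:

Collect the logs on server A onto server B in real time

Machine A: exec source + memory channel + avro sink

Machine B: avro source + memory channel + logger sink

When two machines need to communicate, avro is generally used to transfer the data:

In this example, data newly appended to the log file on machine A is sent to machine B over avro and shown on its console

When configuring, the key is to make the host names (hostname on the sink, bind on the source) and the ports (port) match on both sides

When starting, start the receiving end (machine B) first so that the port is open, then start machine A

When receiving, only machine B's console shows output; machine A merely hands the events to its avro sink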
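The notes give no configuration for this exercise, so the following is only a sketch of how the two agents might look. The agent names agentA and agentB, the host name machine-b, and port 44444 are hypothetical placeholders, and the log path is simply reused from Exercise 2. Both agents are written in one file for brevity; each machine would start only its own agent by name.

```properties
# --- Machine A (assumed agent name: agentA): exec source -> memory channel -> avro sink
agentA.sources = r1
agentA.channels = c1
agentA.sinks = k1

agentA.sources.r1.type = exec
agentA.sources.r1.command = tail -F /usr/local/mycode/data/data.log
agentA.sources.r1.shell = /bin/sh -c
agentA.sources.r1.channels = c1

agentA.channels.c1.type = memory

# hostname and port must point at the avro source on machine B
agentA.sinks.k1.type = avro
agentA.sinks.k1.hostname = machine-b
agentA.sinks.k1.port = 44444
agentA.sinks.k1.channel = c1

# --- Machine B (assumed agent name: agentB): avro source -> memory channel -> logger sink
agentB.sources = r1
agentB.channels = c1
agentB.sinks = k1

# bind to all interfaces so machine A can connect; the port matches the avro sink above
agentB.sources.r1.type = avro
agentB.sources.r1.bind = 0.0.0.0
agentB.sources.r1.port = 44444
agentB.sources.r1.channels = c1

agentB.channels.c1.type = memory

agentB.sinks.k1.type = logger
agentB.sinks.k1.channel = c1
```

With this layout, machine B would run the flume-ng command from Exercise 1 with --name agentB first, and machine A would then run it with --name agentA; lines appended to the log on machine A should appear as events on machine B's console.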
