好程序員 big data learning path: Logstash vs. Flume. Neither has a cluster concept; both Logstash and Flume run as standalone components.
Logstash is developed in JRuby.
Component comparison:
logstash: input → filter → output
flume: source → channel → sink
Pros and cons:
logstash:
Simple to install, with a small footprint.
Has a filter component, which gives it data-filtering and data-splitting capabilities (see the sketch after this list).
Integrates seamlessly with ES.
Fault-tolerant during collection: if the process crashes or the connection drops, it resumes from the last recorded read offset.
In short, its main use is collecting log data.
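As a quick illustration of the filter stage, here is a minimal sketch that splits each input line on spaces, using the same mutate/split technique as the example configs later in this article:
input { stdin {} }
filter {
  # split each incoming line into an array of space-separated fields
  mutate { split => { "message" => " " } }
}
output { stdout { codec => rubydebug } }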
flume:
Stronger than logstash in terms of high availability.
Flume emphasizes data safety: data transfer in Flume is controlled by transactions.
Flume can be applied to many kinds of data-transfer scenarios.
Data integration
Upload the logstash .gz package and extract it.
You can create a conf directory under the logstash directory to hold configuration files.
I. Starting from the command line
1.bin/logstash -e 'input { stdin {} } output { stdout{} }'
stdin/stdout (standard input and output streams); the -e flag passes the pipeline configuration directly on the command line.
hello xixi
2018-09-12T21:58:58.649Z hadoop01 hello xixi
hello haha
2018-09-12T21:59:19.487Z hadoop01 hello haha
2.bin/logstash -e 'input { stdin {} } output { stdout{codec => rubydebug} }'
hello xixi
{
    "message" => "hello xixi",
    "@version" => "1",
    "@timestamp" => "2018-09-12T22:00:49.612Z",
    "host" => "hadoop01"
}
3. Writing to an ES cluster (the ES cluster must be started first)
bin/logstash -e 'input { stdin {} } output { elasticsearch {hosts => ["192.168.88.81:9200"]} stdout{} }'
After you enter data, ES automatically creates the index and the mapping.
hello haha
2018-09-12T22:13:05.361Z hadoop01 hello haha
bin/logstash -e 'input { stdin {} } output { elasticsearch {hosts => ["192.168.88.81:9200", "192.168.88.82:9200"]} stdout{} }'
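You can confirm the auto-created index with ES's cat API (a quick check, assuming the same host as above):
curl 'http://192.168.88.81:9200/_cat/indices?v'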
4. Writing to a Kafka cluster (the Kafka cluster must be started first)
bin/logstash -e 'input { stdin {} } output { kafka { topic_id => "test1" bootstrap_servers => "node01:9092,node02:9092,node03:9092" } stdout{} }'
II. Starting from a config file
The ZooKeeper, Kafka, and ES clusters must all be running.
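For reference, typical startup commands on each node look roughly like this (a sketch; paths depend on your installation):
bin/zkServer.sh start   # ZooKeeper
bin/kafka-server-start.sh -daemon config/server.properties   # Kafka broker
bin/elasticsearch -d   # Elasticsearch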
1. Integrating with Kafka
vi logstash-kafka.conf
Start it:
bin/logstash -f logstash-kafka.conf (-f: specify the config file)
Then start a Kafka console consumer on another node to verify (an example follows the config below).
input {
  file {
    path => "/root/data/test.log"
    discover_interval => 5
    start_position => "beginning"
  }
}
output {
  kafka {
    topic_id => "test1"
    codec => plain { format => "%{message}" charset => "UTF-8" }
    bootstrap_servers => "node01:9092,node02:9092,node03:9092"
  }
}
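A console-consumer invocation for that verification step might look like this (a sketch, assuming the ZooKeeper-based consumer CLI of the Kafka versions used in this walkthrough):
bin/kafka-console-consumer.sh --zookeeper node01:2181 --topic test1 --from-beginning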
2. Kafka-ES data integration
vi logstash-es.conf
bin/logstash -f logstash-es.conf
Start a Kafka console consumer on another node.
input {
  file {
    type => "gamelog"
    path => "/log/*/*.log"
    discover_interval => 10
    start_position => "beginning"
  }
}
output {
  elasticsearch { index => "gamelog-%{+YYYY.MM.dd}" hosts => ["node01:9200", "node02:9200", "node03:9200"] }
}
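With the %{+YYYY.MM.dd} pattern, each day's events land in an index of their own; for the sample timestamps above, that would be gamelog-2018.09.12.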
The data integration process
Logstash placement is flexible: run it on whichever node has the most free resources.
1. Start a logstash instance that monitors the logserver directory and collects the data into Kafka.
2. Start another logstash instance that monitors a Kafka topic and ships its data to Elasticsearch.
Data integration example
Two logstash instances must be started, each with its own config file, to complete the pipeline.
1. Collecting data into Kafka
cd conf
Create the config file: vi gs-kafka.conf
input {
  file {
    codec => plain { charset => "GB2312" }
    path => "/root/basedir/*/*.txt"
    discover_interval => 5
    start_position => "beginning"
  }
}
output {
  kafka {
    topic_id => "gamelogs"
    codec => plain { format => "%{message}" charset => "GB2312" }
    bootstrap_servers => "node01:9092,node02:9092,node03:9092"
  }
}
Create the corresponding Kafka topic:
bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --replication-factor 1 --partitions 1 --topic gamelogs
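You can confirm the topic exists by listing topics (same ZooKeeper-era CLI as above):
bin/kafka-topics.sh --list --zookeeper hadoop01:2181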
2. Start logstash on hadoop01
bin/logstash -f conf/gs-kafka.conf
3.在hadoop02上啓動另一個logstach
cd logstach/conf
vi kafka-es.conf
input {
  kafka {
    type => "accesslogs"
    codec => "plain"
    auto_offset_reset => "smallest"
    group_id => "elas1"
    topic_id => "accesslogs"
    zk_connect => "node01:2181,node02:2181,node03:2181"
  }
  kafka {
    type => "gamelogs"
    auto_offset_reset => "smallest"
    codec => "plain"
    group_id => "elas2"
    topic_id => "gamelogs"
    zk_connect => "node01:2181,node02:2181,node03:2181"
  }
}
filter {
  if [type] == "accesslogs" {
    json { source => "message" remove_field => [ "message" ] target => "access" }
  }
  if [type] == "gamelogs" {
    mutate {
      split => { "message" => " " }
      add_field => {
        "event_type" => "%{message[3]}"
        "current_map" => "%{message[4]}"
        "current_X" => "%{message[5]}"
        "current_y" => "%{message[6]}"
        "user" => "%{message[7]}"
        "item" => "%{message[8]}"
        "item_id" => "%{message[9]}"
        "current_time" => "%{message[12]}"
      }
      remove_field => [ "message" ]
    }
  }
}
output {
  if [type] == "accesslogs" {
    elasticsearch { index => "accesslogs" codec => "json" hosts => ["node01:9200", "node02:9200", "node03:9200"] }
  }
  if [type] == "gamelogs" {
    elasticsearch { index => "gamelogs1" codec => plain { charset => "UTF-16BE" } hosts => ["node01:9200", "node02:9200", "node03:9200"] }
  }
}
bin/logstash -f conf/kafka-es.conf
4. Modifying or appending any data in the files under basedir will trigger creation of the ES index.
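For example (the file name here is hypothetical; anything matching /root/basedir/*/*.txt works):
echo "a new game log line" >> /root/basedir/2018/sample.txt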
5. The index data is stored on disk under the configured /data/esdata path.
6. Searching for specific fields in the web UI
With the default analyzer, Chinese text is split into single characters, so a plain term lookup only matches one character at a time; a query_string query analyzes the search text and can match whole Chinese phrases.
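For example, a query_string search against the gamelogs1 index built above might look like this (a sketch; the search text is a placeholder):
curl -XPOST 'http://node01:9200/gamelogs1/_search?pretty' -d '{
  "query": { "query_string": { "query": "your search text" } }
}'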