Collecting Nginx Logs with Flume + ZooKeeper + Kafka

Environment

Software    Version
CentOS      3.10.0-862.el7.x86_64 (kernel)
jdk         1.8
zookeeper   3.4.10
kafka       1.1.0
flume       1.6.0

Host    IP
c1      192.168.1.200
c1_1    192.168.1.201
c1_2    192.168.1.202

All hosts use the same user: hadoop.

Prerequisites

Set up passwordless SSH between the hosts

This step is critical: if it is not configured correctly, connectivity between the Hadoop and Kafka cluster nodes will break.

[hadoop@c1 ~]$ ssh-keygen
[hadoop@c1 ~]$ sudo vim /etc/ssh/sshd_config
    ...
    PubkeyAuthentication yes
    ...
[hadoop@c1 ~]$ sudo systemctl restart sshd
[hadoop@c1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # allow ssh to the local host
[hadoop@c1 ~]$ sudo vim /etc/hosts  # add the IP/hostname mappings for all three hosts
    ...
    192.168.1.200 c1
    192.168.1.201 c1_1
    192.168.1.202 c1_2
    ...
[hadoop@c1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@c1_1
[hadoop@c1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@c1_2

Repeat the steps above on the other two hosts. When finished, ssh to every host (including the local one) to confirm that no password is required, as in the quick check below.
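A quick way to verify all three hosts at once (a sketch, using the hostnames defined in /etc/hosts above):

for h in c1 c1_1 c1_2
do
    ssh hadoop@$h hostname  # should print the remote hostname without asking for a password
done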

Installing the Software

# download JDK 1.8+ (--strip-components=1 drops the jdk1.8.0_171 top-level dir so the path matches JAVA_HOME below)
[hadoop@c1 ~]$ mkdir -p ~/app/jdk1.8.0 && tar -zxvf jdk-8u171-linux-x64.tar.gz -C ~/app/jdk1.8.0 --strip-components=1
# download Flume 1.6
[hadoop@c1 ~]$ tar -zxvf apache-flume-1.6.0-bin.tar.gz -C ~/app/
# download ZooKeeper 3.4.10
[hadoop@c1 ~]$ tar -zxvf zookeeper-3.4.10.tar.gz -C ~/app/
# download kafka_2.11-1.1.0
[hadoop@c1 ~]$ tar -xzf kafka_2.11-1.1.0.tgz -C ~/app/
# environment variables
[hadoop@c1 ~]$ vim .bash_profile
    ...
    export JAVA_HOME=/home/hadoop/app/jdk1.8.0
    export PATH=$JAVA_HOME/bin:$PATH
    export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-bin
    export PATH=$FLUME_HOME/bin:$PATH
    export ZK_HOME=/home/hadoop/app/zookeeper-3.4.10
    export PATH=$ZK_HOME/bin:$PATH
    export KAFKA_HOME=/home/hadoop/app/kafka_2.11-1.1.0
    export PATH=$KAFKA_HOME/bin:$PATH
    ...
[hadoop@c1 ~]$ source .bash_profile
# copy the software and environment variables to the other hosts
[hadoop@c1 ~]$ scp -r ~/app hadoop@c1_1:~
[hadoop@c1 ~]$ scp -r ~/app hadoop@c1_2:~
[hadoop@c1 ~]$ scp .bash_profile hadoop@c1_1:~
[hadoop@c1 ~]$ scp .bash_profile hadoop@c1_2:~
# then run `source .bash_profile` on each of the other hosts
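To confirm the copy and the environment variables took effect everywhere, a quick check (a sketch; java -version prints to stderr, hence the redirect):

for h in c1 c1_1 c1_2
do
    ssh hadoop@$h "source ~/.bash_profile; java -version 2>&1 | head -1"
done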

Configuration Files

  • Flume configuration

    # vim ${FLUME_HOME}/conf/nginx_kafka.conf
    nginx-kafka.sources = r1
    nginx-kafka.sinks = k1
    nginx-kafka.channels = c1
    
    nginx-kafka.sources.r1.type = exec
    nginx-kafka.sources.r1.command = tail -f /home/hadoop/data/access.log
    nginx-kafka.sources.r1.shell = /bin/sh -c
    # Kafka sink syntax for Flume 1.6
    nginx-kafka.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    nginx-kafka.sinks.k1.brokerList = c1:9092
    nginx-kafka.sinks.k1.topic = nginxtopic
    nginx-kafka.sinks.k1.batchSize = 10
    
    nginx-kafka.channels.c1.type = memory
    nginx-kafka.sources.r1.channels = c1
    nginx-kafka.sinks.k1.channel = c1
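    # (optional, illustrative values) the memory channel above runs with Flume's
    # defaults; explicit sizing would look like this:
    # nginx-kafka.channels.c1.capacity = 10000
    # nginx-kafka.channels.c1.transactionCapacity = 100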
  • ZooKeeper configuration

    # cp ${ZK_HOME}/conf/zoo_sample.cfg ${ZK_HOME}/conf/zoo.cfg && vim ${ZK_HOME}/conf/zoo.cfg
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/home/hadoop/data/zookeeper
    clientPort=2181
    # Note: the local host's own server entry must not use its hostname; it has to
    # be 0.0.0.0, otherwise the nodes cannot connect
    server.1=0.0.0.0:2888:3888
    server.2=c1_1:2888:3888
    server.3=c1_2:2888:3888

    Create this node's id for the ZooKeeper cluster:

    echo "1">/home/hadoop/data/zookeeper/myid

    Repeat the same steps on the other hosts; the x in server.x must match the value written in that host's myid (see the sketch below).
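    On c1_1, for instance (a sketch derived from the 0.0.0.0 rule above):

    # zoo.cfg on c1_1 -- its own entry becomes 0.0.0.0
    server.1=c1:2888:3888
    server.2=0.0.0.0:2888:3888
    server.3=c1_2:2888:3888
    # myid on c1_1
    mkdir -p /home/hadoop/data/zookeeper && echo "2" > /home/hadoop/data/zookeeper/myid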

  • Kafka configuration

    Only a few settings in the Kafka configuration file need to change:

    # ${KAFKA_HOME}/config/server.properties
    broker.id=0
    host.name=c1
    listeners=PLAINTEXT://192.168.1.200:9092
    advertised.listeners=PLAINTEXT://c1:9092
    zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181

    broker.id starts at 0 and must be unique within the cluster.

    listeners should be set to the host's IP address;
    advertised.listeners should be set to its hostname.

    These settings work for me. (For reference: listeners is the address the broker binds to, while advertised.listeners is the address it publishes for clients to connect to, so clients must be able to resolve the hostname.)

    Apply the same changes to the Kafka configuration on the other hosts, each with its own broker.id, IP, and hostname, as in the sketch below.
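    For c1_1, server.properties would become (a sketch following the rules above):

    broker.id=1
    host.name=c1_1
    listeners=PLAINTEXT://192.168.1.201:9092
    advertised.listeners=PLAINTEXT://c1_1:9092
    zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181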

Writing the Cluster Start/Stop Scripts

  • ZooKeeper cluster scripts

    # vim start_zookeeper.sh
    #!/bin/bash
    echo "start zkServer..."
    for i in c1 c1_1 c1_2
    do
        ssh hadoop@$i "source ~/.bash_profile; zkServer.sh start"
    done
    # vim stop_zookeeper.sh
    #!/bin/bash
    echo "stop zkServer..."
    for i in c1 c1_1 c1_2
    do
        ssh hadoop@$i "source ~/.bash_profile; zkServer.sh stop"
    done

    chmod a+x start_zookeeper.sh stop_zookeeper.sh
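    After starting, the state of every node can be checked with the same loop pattern; one node should report Mode: leader and the rest Mode: follower (a sketch):

    for i in c1 c1_1 c1_2
    do
        ssh hadoop@$i "source ~/.bash_profile; zkServer.sh status"
    done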

  • Kafka cluster scripts

    # vim start_kafka.sh
    #!/bin/sh
    echo "start kafka..."
    for i in c1 c1_1 c1_2
    do
        # \${KAFKA_HOME} is escaped so it expands on the remote host, after .bash_profile is sourced
        ssh hadoop@$i "source ~/.bash_profile; kafka-server-start.sh -daemon \${KAFKA_HOME}/config/server.properties"
        echo "done"
    done
    # vim stop_kafka.sh
    #!/bin/sh
    echo "stop kafka..."
    for i in c1 c1_1 c1_2
    do
        ssh hadoop@$i "source ~/.bash_profile; kafka-server-stop.sh"
    done

    chmod a+x start_kafka.sh stop_kafka.sh
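    Likewise, a quick loop confirms a Kafka process is running on every host (a sketch; the jps output of a healthy node is shown in the next section):

    for i in c1 c1_1 c1_2
    do
        ssh hadoop@$i "source ~/.bash_profile; jps | grep Kafka"
    done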

Running the Pipeline

  1. Start the services

    # start zookeeper
    [hadoop@c1 ~]$ ./start_zookeeper.sh
    [hadoop@c1 ~]$ zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /home/hadoop/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    # start kafka
    [hadoop@c1 ~]$ ./start_kafka.sh
    [hadoop@c1 ~]$ jps
    2953 QuorumPeerMain  # the ZooKeeper process
    3291 Kafka  # the Kafka process
    3359 Jps
  2. Create the topic

    [hadoop@c1 ~]$ kafka-topics.sh --create --zookeeper c1:2181,c1_1:2181,c1_2:2181 --replication-factor 3 --partitions 1 --topic nginxtopic
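    # (optional, a sketch) inspect the partition and replica assignment:
    [hadoop@c1 ~]$ kafka-topics.sh --describe --zookeeper c1:2181,c1_1:2181,c1_2:2181 --topic nginxtopic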
  3. Verify the topic

    [hadoop@c1 ~]$ kafka-topics.sh --zookeeper c1:2181,c1_1:2181,c1_2:2181 --list
    nginxtopic
  4. Start a consumer

    [hadoop@c1 ~]$ kafka-console-consumer.sh --bootstrap-server c1:9092,c1_1:9092,c1_2:9092 --topic nginxtopic --from-beginning
  5. Simulate log traffic

    # vim create_log.sh
    #!/bin/sh
    # the access.log-* files are real logs pulled down from a production environment;
    # write to the same path the Flume exec source is tailing
    cat access.log-* | while read -r line
    do
        echo "$line" >> /home/hadoop/data/access.log
        sleep 0.$(($RANDOM % 5 + 1))  # throttle so the log is not written too fast
    done
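    # make the script executable and run it in the background (a sketch; assumes
    # the access.log-* files sit in the current directory):
    [hadoop@c1 ~]$ chmod +x create_log.sh && ./create_log.sh &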
  6. Start Flume

    Open a new terminal window:

    [hadoop@c1 ~]$ flume-ng agent --conf ${FLUME_HOME}/conf --conf-file ${FLUME_HOME}/conf/nginx_kafka.conf --name nginx-kafka -Dflume.root.logger=DEBUG,console
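    As an end-to-end smoke test (a sketch, assuming the paths above), append one line to the tailed file and watch it arrive in the consumer window:

    [hadoop@c1 ~]$ echo "smoke test $(date)" >> /home/hadoop/data/access.log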

After a short wait:

Flume log output:

(screenshot)

kafka-console-consumer output:

(screenshot)

At this point the whole pipeline is up and running.

Troubleshooting

  • not in the sudoers file. This incident will be reported
    The user lacks sudo privileges; edit /etc/sudoers as root (preferably with visudo):

    ...
    ## Allow root to run any commands anywhere 
    root    ALL=(ALL)   ALL
    hadoop  ALL=(ALL)   ALL
    ...
  • The SSH key was added but a password is still required

    chmod 700 ~/.ssh
    chmod 644 ~/.ssh/authorized_keys
  • zookeeper: It is probably not running

    1. Passwordless SSH to the other hosts may not be working.
    2. The myid file may not have been written correctly.

Check zookeeper.out for detailed error messages.
