| Software | Version |
|---|---|
| CentOS | 3.10.0-862.el7.x86_64 (kernel) |
| JDK | 1.8 |
| ZooKeeper | 3.4.10 |
| Kafka | 1.1.0 |
| Flume | 1.6.0 |

| Host | IP |
|---|---|
| c1 | 192.168.1.200 |
| c1_1 | 192.168.1.201 |
| c1_2 | 192.168.1.202 |
All machines use the hadoop user.

Passwordless SSH is critical: if it is not set up correctly, connectivity between the Hadoop and Kafka cluster nodes will fail.
```
[hadoop@c1 ~]$ ssh-keygen
[hadoop@c1 ~]$ sudo vim /etc/ssh/sshd_config
...
PubkeyAuthentication yes
...
[hadoop@c1 ~]$ sudo systemctl restart sshd
# allow ssh to the local machine itself
[hadoop@c1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# map all three hostnames to their IPs
[hadoop@c1 ~]$ sudo vim /etc/hosts
...
192.168.1.200 c1
192.168.1.201 c1_1
192.168.1.202 c1_2
...
[hadoop@c1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@c1_1
[hadoop@c1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@c1_2
```
Repeat the steps above on the other two machines. When finished, verify that every machine can ssh to every other machine (including itself) without a password prompt, as in the sketch below.
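A quick check, assuming the hosts entries and keys above are in place:

```
# run on each node; every line should print the remote hostname with no password prompt
for h in c1 c1_1 c1_2; do
    ssh hadoop@$h hostname
done
```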
```
# jdk 1.8
[hadoop@c1 ~]$ mkdir -p ~/app/jdk1.8 && tar -zxvf jdk-8u171-linux-x64.tar.gz -C ~/app/jdk1.8
# flume 1.6
[hadoop@c1 ~]$ tar -zxvf apache-flume-1.6.0-bin.tar.gz -C ~/app/
# zookeeper 3.4.10
[hadoop@c1 ~]$ tar -zxvf zookeeper-3.4.10.tar.gz -C ~/app/
# kafka_2.11-1.1.0
[hadoop@c1 ~]$ tar -xzf kafka_2.11-1.1.0.tgz -C ~/app/
# environment variables (the paths must match the directories the tarballs extract to)
[hadoop@c1 ~]$ vim .bash_profile
...
export JAVA_HOME=/home/hadoop/app/jdk1.8/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH
export ZK_HOME=/home/hadoop/app/zookeeper-3.4.10
export PATH=$ZK_HOME/bin:$PATH
export KAFKA_HOME=/home/hadoop/app/kafka_2.11-1.1.0
export PATH=$KAFKA_HOME/bin:$PATH
...
[hadoop@c1 ~]$ source .bash_profile
# copy the software and environment variables to the other hosts
[hadoop@c1 ~]$ scp -r ~/app hadoop@c1_1:~
[hadoop@c1 ~]$ scp -r ~/app hadoop@c1_2:~
[hadoop@c1 ~]$ scp .bash_profile hadoop@c1_1:~
[hadoop@c1 ~]$ scp .bash_profile hadoop@c1_2:~
# then run `source .bash_profile` on each of the other hosts
```
Flume configuration file
```
# vim ${FLUME_HOME}/conf/nginx_kafka.conf
nginx-kafka.sources = r1
nginx-kafka.sinks = k1
nginx-kafka.channels = c1

nginx-kafka.sources.r1.type = exec
nginx-kafka.sources.r1.command = tail -f /home/hadoop/data/access.log
nginx-kafka.sources.r1.shell = /bin/sh -c

# Flume 1.6 style Kafka sink
nginx-kafka.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
nginx-kafka.sinks.k1.brokerList = c1:9092
nginx-kafka.sinks.k1.topic = nginxtopic
nginx-kafka.sinks.k1.batchSize = 10

nginx-kafka.channels.c1.type = memory

nginx-kafka.sources.r1.channels = c1
nginx-kafka.sinks.k1.channel = c1
```
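The memory channel above runs with default buffer sizes. If the log is bursty, the standard Flume memory-channel properties `capacity` and `transactionCapacity` can be raised (illustrative values, not part of the original setup):

```
# optional memory-channel tuning (illustrative values)
nginx-kafka.channels.c1.capacity = 10000
nginx-kafka.channels.c1.transactionCapacity = 100
```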
ZooKeeper configuration file
```
# cp ${ZK_HOME}/conf/zoo_sample.cfg ${ZK_HOME}/conf/zoo.cfg && vim ${ZK_HOME}/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/data/zookeeper
clientPort=2181
# Note: the local host's own server entry must be 0.0.0.0 rather than its hostname,
# otherwise the other nodes cannot connect.
server.1=0.0.0.0:2888:3888
server.2=c1_1:2888:3888
server.3=c1_2:2888:3888
```
Create the ZooKeeper cluster id (myid)
echo "1">/home/hadoop/data/zookeeper/myid
Repeat on the other hosts: the `x` in `server.x` must match that host's `myid` value, and on each host the local entry in zoo.cfg is the one replaced with 0.0.0.0. The ids can be set from c1 as sketched below.
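Since server.2 is c1_1 and server.3 is c1_2, the matching ids can be set remotely (assuming passwordless SSH already works):

```
[hadoop@c1 ~]$ ssh hadoop@c1_1 'mkdir -p /home/hadoop/data/zookeeper && echo "2" > /home/hadoop/data/zookeeper/myid'
[hadoop@c1 ~]$ ssh hadoop@c1_2 'mkdir -p /home/hadoop/data/zookeeper && echo "3" > /home/hadoop/data/zookeeper/myid'
```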
Kafka configuration file

Only a few settings in the Kafka configuration file need to be changed:
```
# ${KAFKA_HOME}/config/server.properties
broker.id=0
host.name=c1
listeners=PLAINTEXT://192.168.1.200:9092
advertised.listeners=PLAINTEXT://c1:9092
zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181
```
- `broker.id`: starts at 0 and must be unique within the cluster.
- `listeners`: set to the host's IP address.
- `advertised.listeners`: set to the hostname.

This combination works because `listeners` is the address the broker actually binds to, while `advertised.listeners` is the address handed to clients in metadata responses, so clients end up connecting by hostname.

Make the same changes to the Kafka configuration file on the other hosts, adjusting the per-host values.
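For example, on c1_1 the per-host values would look like this (a sketch inferred from the host table above, not taken from the original setup):

```
# ${KAFKA_HOME}/config/server.properties on c1_1
broker.id=1
host.name=c1_1
listeners=PLAINTEXT://192.168.1.201:9092
advertised.listeners=PLAINTEXT://c1_1:9092
zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181
```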
ZooKeeper cluster scripts
```
# vim start_zookeeper.sh
#!/bin/bash
echo "start zkServer..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile; zkServer.sh start"
done
```
```
# vim stop_zookeeper.sh
#!/bin/bash
echo "stop zkServer..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile; zkServer.sh stop"
done
```
```
chmod a+x start_zookeeper.sh stop_zookeeper.sh
```
Kafka cluster scripts
```
# vim start_kafka.sh
#!/bin/sh
echo "start kafka..."
for i in c1 c1_1 c1_2
do
    # escape ${KAFKA_HOME} so it expands on the remote host after .bash_profile is sourced
    ssh hadoop@$i "source ~/.bash_profile; kafka-server-start.sh -daemon \${KAFKA_HOME}/config/server.properties"
done
echo "done"
```
```
# vim stop_kafka.sh
#!/bin/sh
echo "stop kafka..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile; kafka-server-stop.sh"
done
```
```
chmod a+x start_kafka.sh stop_kafka.sh
```
Start the services
```
# start zookeeper
[hadoop@c1 ~]$ ./start_zookeeper.sh
[hadoop@c1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
# start kafka
[hadoop@c1 ~]$ ./start_kafka.sh
[hadoop@c1 ~]$ jps
2953 QuorumPeerMain   # zookeeper process
3291 Kafka            # kafka process
3359 Jps
```
Create a topic
```
[hadoop@c1 ~]$ kafka-topics.sh --create --zookeeper c1:2181,c1_1:2181,c1_2:2181 --replication-factor 3 --partitions 1 --topic nginxtopic
```
Check the topic
```
[hadoop@c1 ~]$ kafka-topics.sh --zookeeper c1:2181,c1_1:2181,c1_2:2181 --list
```
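To also check replication and leader assignment for the new topic:

```
[hadoop@c1 ~]$ kafka-topics.sh --zookeeper c1:2181,c1_1:2181,c1_2:2181 --describe --topic nginxtopic
```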
Start a consumer
```
[hadoop@c1 ~]$ kafka-console-consumer.sh --bootstrap-server c1:9092,c1_1:9092,c1_2:9092 --topic nginxtopic --from-beginning
```
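Before wiring up Flume, the pipeline can be smoke-tested with the console producer; anything typed here should show up in the consumer above:

```
[hadoop@c1 ~]$ kafka-console-producer.sh --broker-list c1:9092,c1_1:9092,c1_2:9092 --topic nginxtopic
```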
Simulate the log
```
# vim create_log.sh
#!/bin/sh
# access.log-* are real logs pulled down from the production environment
cat access.log-* | while read -r line
do
    # the output path must match the file tailed in nginx_kafka.conf
    echo "$line" >> /home/hadoop/data/access.log
    # sleep 0.1-0.5s so the log is not written too fast
    sleep 0.$(($RANDOM % 5 + 1))
done
```
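To run it (the target directory must exist first, and running in the background keeps the shell free):

```
[hadoop@c1 ~]$ mkdir -p /home/hadoop/data
[hadoop@c1 ~]$ chmod a+x create_log.sh
[hadoop@c1 ~]$ nohup ./create_log.sh &
```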
Start Flume

Open a new terminal window.

```
# the relative conf/ paths assume the command is run from ${FLUME_HOME}
[hadoop@c1 ~]$ flume-ng agent --conf-file conf/nginx_kafka.conf -c conf/ --name nginx-kafka -Dflume.root.logger=DEBUG,console
```
After a short wait, the Flume console prints its DEBUG logs and the kafka-console-consumer window prints the incoming log lines.
At this point the whole pipeline is up and running.
not in the sudoers file. This incident will be reported

The user has no sudo privileges; as root, edit /etc/sudoers (ideally with visudo):
```
...
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL
...
```
The SSH key has already been copied but a password is still required

```
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
```
zookeeper: It is probably not running

Check zookeeper.out for the detailed error message.
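In ZooKeeper 3.4.x, zookeeper.out is written to the directory from which zkServer.sh was started (ZOO_LOG_DIR defaults to the current directory), so with the scripts above it should land in the hadoop user's home directory:

```
[hadoop@c1 ~]$ tail -n 100 ~/zookeeper.out
```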