Background:
We recently found that the blade-chassis switch serving 2 of the nodes in the company's 3-node Kafka cluster is at risk of failure: its ports randomly flap up/down. So those 2 brokers need to be migrated off temporarily and migrated back after the switch is repaired.
Below is the whole process reproduced in a test environment (scale-out + scale-in).
Assume the original 3 Kafka nodes are node1, node2 and node3.
Prepare 2 idle servers (assumed here to be node4 and node5).
OS version: CentOS 7
node1 192.168.2.187
node2 192.168.2.188
node3 192.168.2.189
node4 192.168.2.190
node5 192.168.2.191
Scaling out Kafka is done in 2 steps:
1. Scale out the zk nodes
2. Scale out the kafka brokers
First, deploy the required software on node4 and node5:
cd /root/
tar xf zookeeper-3.4.9.tar.gz
tar xf kafka_2.11-0.10.1.0.tar.gz
tar xf jdk1.8.0_101.tar.gz
mv kafka_2.11-0.10.1.0 zookeeper-3.4.9 jdk1.8.0_101 /usr/local/
cd /usr/local/
ln -s zookeeper-3.4.9 zookeeper-default
ln -s kafka_2.11-0.10.1.0 kafka-default
ln -s jdk1.8.0_101 jdk-default
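These commands assume the JDK under /usr/local/jdk-default is already on each node's PATH. If it is not, a minimal sketch of a profile snippet (the file name jdk.sh is just an assumed convention):

cat > /etc/profile.d/jdk.sh <<'EOF'
## assumed profile snippet: point JAVA_HOME at the symlinked JDK and put its bin on PATH
export JAVA_HOME=/usr/local/jdk-default
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile.d/jdk.sh
java -version    ## should report 1.8.0_101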
Part 1: Scaling out the zk nodes
1. On node4, run:
mkdir /usr/local/zookeeper-default/data/
vim /usr/local/zookeeper-default/conf/zoo.cfg

On top of the existing configuration, add the last 2 lines (server.4 and server.5):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

## Empty the data directory to avoid stale data
rm -fr /usr/local/zookeeper-default/data/*
## Write the matching myid file into the zk data directory
echo 4 > /usr/local/zookeeper-default/data/myid
2. Start the zk process on node4:
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status

Output similar to:

ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-default/bin/../conf/zoo.cfg
Mode: follower

/usr/local/zookeeper-default/bin/zkCli.sh

echo stat | nc 127.0.0.1 2181

Output similar to:

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:50072[1](queued=0,recved=6,sent=6)
 /127.0.0.1:50076[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/2/13
Received: 24
Sent: 23
Connections: 2
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63
3. On node5, run:
mkdir /usr/local/zookeeper-default/data/
vim /usr/local/zookeeper-default/conf/zoo.cfg

Add the last 2 lines (server.4 and server.5):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888

## Empty the data directory to avoid stale data
rm -fr /usr/local/zookeeper-default/data/*
## Write the matching myid file into the zk data directory
echo 5 > /usr/local/zookeeper-default/data/myid
4. Start the zk process on node5:
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status

echo stat | nc 127.0.0.1 2181

Output similar to:

Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Clients:
 /127.0.0.1:45582[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Connections: 1
Outstanding: 0
Zxid: 0x10000009a
Mode: follower
Node count: 63

You can also use echo mntr | nc 127.0.0.1 2181, which gives a more detailed result, similar to:

zk_version  3.4.9-1757313, built on 08/23/2016 06:50 GMT
zk_avg_latency  0
zk_max_latency  194
zk_min_latency  0
zk_packets_received  101436
zk_packets_sent  102624
zk_num_alive_connections  4
zk_outstanding_requests  0
zk_server_state  follower
zk_znode_count  141
zk_watch_count  190
zk_ephemerals_count  7
zk_approximate_data_size  10382
zk_open_file_descriptor_count  35
zk_max_file_descriptor_count  102400
5. Once we have confirmed the 2 new zk nodes are healthy, we need to update the config on the original 3 zk nodes and then restart them.
Modify the zk config on node1, node2 and node3 as follows:
vim /usr/local/zookeeper-default/conf/zoo.cfg

Add the last 2 lines (server.4 and server.5):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-default/data/
clientPort=2181
maxClientCnxns=2000
maxSessionTimeout=240000
server.1=192.168.2.187:2888:3888
server.2=192.168.2.188:2888:3888
server.3=192.168.2.189:2888:3888
server.4=192.168.2.190:2888:3888
server.5=192.168.2.191:2888:3888
When restarting, note the order: restart the follower nodes first (in my case the followers are node2 and node3, and the leader is node1), and the leader last.
/usr/local/zookeeper-default/bin/zkServer.sh stop
/usr/local/zookeeper-default/bin/zkServer.sh status
/usr/local/zookeeper-default/bin/zkServer.sh start
/usr/local/zookeeper-default/bin/zkServer.sh status
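After all 5 zk nodes have been restarted, it is worth confirming that every member of the ensemble is serving and that exactly one leader exists. A small check loop, assuming nc is installed and the IPs listed above:

for ip in 192.168.2.187 192.168.2.188 192.168.2.189 192.168.2.190 192.168.2.191; do
    ## print the role each node reports via the stat four-letter command
    echo -n "$ip  "
    echo stat | nc $ip 2181 | grep Mode
done
## expected: one "Mode: leader" and four "Mode: follower"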
Part 2: Scaling out the kafka brokers
1. On node4 (192.168.2.190):
mkdir -pv /usr/local/kafka-default/kafka-logs
vim /usr/local/kafka-default/config/server.properties

The file after editing:

broker.id=4    # note: change this per broker
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.190:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181    # note: change this
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true
2. Start the kafka process on node4:
/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
3. On node5 (192.168.2.191):
mkdir -pv /usr/local/kafka-default/kafka-logs
vim /usr/local/kafka-default/config/server.properties

The file after editing:

broker.id=5    # note: change this per broker
listeners=PLAINTEXT://:9094,TRACE://:9194
advertised.listeners=PLAINTEXT://192.168.2.191:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/usr/local/kafka-default/kafka-logs
num.partitions=3
num.recovery.threads.per.data.dir=1
log.retention.hours=24
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=192.168.2.187:2181,192.168.2.188:2181,192.168.2.189:2181,192.168.2.190:2181,192.168.2.191:2181    # note: change this
zookeeper.connection.timeout.ms=6000
default.replication.factor=2
compression.type=gzip
offsets.retention.minutes=2880
controlled.shutdown.enable=true
delete.topic.enable=true
4. Start the kafka process on node5:
/usr/local/kafka-default/bin/kafka-server-start.sh -daemon /usr/local/kafka-default/config/server.properties
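Before any data is moved, it is also worth confirming that both new brokers registered themselves in ZooKeeper. A quick check (a sketch; the one-shot zkCli invocation below assumes ZooKeeper 3.4.x behaviour):

/usr/local/zookeeper-default/bin/zkCli.sh -server 127.0.0.1:2181 ls /brokers/ids
## the last line of output should list all broker ids, e.g. [1, 2, 3, 4, 5]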
5. Test that everything works
Here we can first run a self-test with kafka-console-producer.sh and kafka-console-consumer.sh to confirm the cluster works normally, and then check in kafka-manager whether any replicas need to be rebalanced.
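For reference, a minimal self-test could look like the following sketch. The topic name test-scaleout is made up, and port 9094 matches the listeners configured above:

## create a small test topic spread across the brokers
/usr/local/kafka-default/bin/kafka-topics.sh --create --zookeeper 192.168.2.187:2181 --replication-factor 2 --partitions 3 --topic test-scaleout
## produce a few messages interactively (type some lines, then Ctrl+C)
/usr/local/kafka-default/bin/kafka-console-producer.sh --broker-list 192.168.2.190:9094,192.168.2.191:9094 --topic test-scaleout
## consume them back from the beginning through one of the new brokers
/usr/local/kafka-default/bin/kafka-console-consumer.sh --bootstrap-server 192.168.2.190:9094 --topic test-scaleout --from-beginning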
Part 3: Migrating data off the at-risk broker nodes (I need this step in my situation; a pure scale-out does not):
Here we can use the kafka-manager web UI to perform the topic reassignment. It is straightforward, so no screenshots are included; a command-line alternative is sketched below.
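If kafka-manager is not available, the same reassignment can be done with the kafka-reassign-partitions.sh tool that ships with Kafka. A sketch, where the topic name my-topic is a placeholder and brokers 1, 4 and 5 are the ones that should keep the data:

## list the topics to move
cat > /tmp/topics-to-move.json <<'EOF'
{"version": 1, "topics": [{"topic": "my-topic"}]}
EOF
## ask kafka to generate a candidate assignment onto brokers 1, 4 and 5
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --topics-to-move-json-file /tmp/topics-to-move.json --broker-list "1,4,5" --generate
## save the proposed plan as /tmp/reassignment.json, then execute it and verify progress
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --reassignment-json-file /tmp/reassignment.json --execute
/usr/local/kafka-default/bin/kafka-reassign-partitions.sh --zookeeper 192.168.2.187:2181 --reassignment-json-file /tmp/reassignment.json --verify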
Part 4: Taking node2 and node3 offline
1. Stop the zk processes on node2 and node3 and let the zk ensemble automatically elect a new leader.
2. Stop the kafka processes on node2 and node3 and let the kafka controller be automatically re-elected (a pre-shutdown check is sketched below).
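Before actually stopping the kafka processes in step 2, it is safer to confirm that no partition replicas remain on broker ids 2 and 3. A sketch that assumes the original brokers used ids 1, 2 and 3 and the 0.10.x output format of kafka-topics.sh --describe:

## print any partitions that still have a replica on broker 2 or 3
/usr/local/kafka-default/bin/kafka-topics.sh --describe --zookeeper 192.168.2.187:2181 | awk '$7 == "Replicas:" && $8 ~ /(^|,)[23](,|$)/'
## no output means brokers 2 and 3 no longer hold any data and can be shut down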
## Problems you may run into:
During the migration, a consumer group threw errors while we were reassigning its topics; after the business team restarted their consumers, the errors disappeared.
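When that happens, it can help to look at the group's state while the reassignment runs. A hedged example; the group name my-group is a placeholder, and the flags below are for new-consumer groups (ZooKeeper-based consumers would use the --zookeeper form of the tool instead):

## show partition assignment and lag for the group
/usr/local/kafka-default/bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server 192.168.2.190:9094 --describe --group my-group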