A 360-degree test: does Kafka lose data, and is its high availability good enough?

Date: 2019-12-04

Please study this diagram carefully, paying particular attention to the marked points. We will come back to it more than once.

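Background

Can Kafka really be used for services that require high availability? The official answer is yes, and the latest version even supports transactions, but we had doubts about its performance.
Kafka's performance varies greatly with the configured ACK level. We ran this test to find the scenarios it suits, so that we can adapt our configuration when we deploy Kafka.
The testing also explored many message-loss scenarios. Unlike most tests, which look only at the Kafka cluster itself, this one also covers message loss on the business side. The whole solution has to be treated as one piece to reach the highest level of availability; its parts should not be considered in isolation.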

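Test goals

Cluster high availability: the minimum cluster size, configuration, and constraints needed to achieve it
No message loss: the configuration and conventions needed to guarantee it

Test environment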

broker:

3 machines
8 cores
16 GB memory
1 TB SSD
CentOS 6.8
kafka_2.12-0.10.2.0
broker JVM settings: Xms=8G Xmx=8G

client:

8 cores
16 GB memory
CentOS 6.8

Original article; please credit the source when reposting (http://sayhiai.com)

Test scenarios

Cluster high-reliability configuration:

zookeeper.connection.timeout.ms=15000
zookeeper.session.timeout.ms=15000
default.replication.factor=3
num.partitions=6
min.insync.replicas=2
unclean.leader.election.enable=false
log.flush.interval.ms=1000

Producer ack configuration:

acks=all
retries=3
request.timeout.ms=5000

Message size: 1024 bytes

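Failover test

Test method

Take one node offline and measure the recovery time and the service level during the failure.

Test procedure

Change replica.lag.time.max.ms from 10s to 60s (the longer interval makes the change easier to observe), then kill Broker 0, pick 3 partitions, and observe how the ISR changes, as follows:
Of the stages observed, the second and third see a degraded enqueue success rate:

During stage two, Partitions 96/97/98 are all unwritable, and the enqueue success rate drops to 0%.
During stage three, Partition 96 accepts writes again, but Partitions 97/98 still do not, because their writes wait for an ack from Broker 0, which has been killed; the enqueue success rate drops to 33%.

In practice, however, there was no throughput at all during stages two and three, because the load-test tool kept reporting connection failures and stopped writing.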

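Cause analysis

A Kafka partition leader is elected by the Controller, while the ISR list is maintained by the leader.
The lease for the former is defined by the Controller; the lease for the latter is set by the broker configuration replica.lag.time.max.ms.
Stage two is therefore short, because its length is determined by the Controller's lease, and stage three is long, because its length is determined by replica.lag.time.max.ms.
When Broker 0 is killed, the former affects the enqueue success rate of the 1/3 of partitions for which Broker 0 was the leader, and the latter affects the 2/3 of partitions for which Broker 0 was a follower.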

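HA conclusion

During failover Kafka is unavailable for about 10 seconds, a duration determined by replica.lag.time.max.ms. Applications therefore need to handle the exceptions raised in this window and set a reasonable retry count and backoff algorithm.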
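
The retry-and-backoff handling recommended here could look roughly like the following. This is a minimal Python sketch that is not tied to any Kafka client: `send_with_retry`, `backoff_delays`, the `send` callable, and all delay values are hypothetical names and numbers chosen for illustration.

```python
import random
import time


def backoff_delays(retries, base=2.0, factor=2.0, cap=8.0):
    """Return the wait in seconds before each retry: base, base*factor, ... up to cap."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def send_with_retry(send, message, retries=3, base=2.0, factor=2.0, cap=8.0,
                    jitter=0.0, sleep=time.sleep):
    """Call send(message); on failure wait and retry up to `retries` more times.

    `send` is a hypothetical callable that raises an exception on failure.
    Returns the number of attempts used; re-raises the last error if all fail.
    """
    delays = backoff_delays(retries, base, factor, cap)
    for attempt in range(retries + 1):
        try:
            send(message)
            return attempt + 1
        except Exception:
            if attempt == retries:
                raise  # out of retries: the caller should log or persist the message
            # Optional random jitter keeps many clients from retrying in lockstep.
            sleep(delays[attempt] + random.uniform(0, jitter))
```

With these illustrative defaults the waits are 2 s, 4 s and 8 s, 14 s in total, which is longer than the roughly 10-second unavailability window described above.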

Stress test

Test method

Test script:

./kafka-producer-perf-test.sh --topic test003 --num-records 1000000 --record-size 1024 --throughput -1 --producer.config ../config/producer.properties

Test results

Unthrottled throughput

[root@l-monitor-logstash2.pub.prod.aws.dm bin]# time ./kafka-producer-perf-test.sh --topic ack001 --num-records 1000000 --record-size 1024 --throughput -1 --producer.config ../config/producer.properties
[2017-09-14 21:26:57,543] WARN Error while fetching metadata with correlation id 1 : {ack001=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
81112 records sent, 16219.2 records/sec (15.84 MB/sec), 1416.2 ms avg latency, 1779.0 max latency.
92070 records sent, 18414.0 records/sec (17.98 MB/sec), 1671.7 ms avg latency, 1821.0 max latency.
91860 records sent, 18368.3 records/sec (17.94 MB/sec), 1670.3 ms avg latency, 1958.0 max latency.
91470 records sent, 18294.0 records/sec (17.87 MB/sec), 1672.3 ms avg latency, 2038.0 max latency.
91050 records sent, 18202.7 records/sec (17.78 MB/sec), 1678.9 ms avg latency, 2158.0 max latency.
92670 records sent, 18534.0 records/sec (18.10 MB/sec), 1657.6 ms avg latency, 2223.0 max latency.
89040 records sent, 17808.0 records/sec (17.39 MB/sec), 1715.0 ms avg latency, 2481.0 max latency.
86370 records sent, 17274.0 records/sec (16.87 MB/sec), 1767.5 ms avg latency, 2704.0 max latency.
91290 records sent, 18254.3 records/sec (17.83 MB/sec), 1670.2 ms avg latency, 2553.0 max latency.
92220 records sent, 18444.0 records/sec (18.01 MB/sec), 1658.1 ms avg latency, 2626.0 max latency.
90240 records sent, 18048.0 records/sec (17.63 MB/sec), 1669.9 ms avg latency, 2733.0 max latency.
1000000 records sent, 17671.591150 records/sec (17.26 MB/sec), 1670.61 ms avg latency, 2764.00 ms max latency, 1544 ms 50th, 2649 ms 95th, 2722 ms 99th, 2753 ms 99.9th.
real 0m57.409s
user 0m14.544s
sys 0m2.072s

Throughput limited to 10,000 records/sec

[root@l-monitor-logstash2.pub.prod.aws.dm bin]# time ./kafka-producer-perf-test.sh --topic ack003 --num-records 1000000 --record-size 1024 --throughput 10000 --producer.config ../config/producer.properties
[2017-09-15 10:51:53,184] WARN Error while fetching metadata with correlation id 1 : {ack003=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-09-15 10:51:53,295] WARN Error while fetching metadata with correlation id 4 : {ack003=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
49766 records sent, 9953.2 records/sec (9.72 MB/sec), 34.9 ms avg latency, 358.0 max latency.
50009 records sent, 10001.8 records/sec (9.77 MB/sec), 23.9 ms avg latency, 39.0 max latency.
50060 records sent, 10008.0 records/sec (9.77 MB/sec), 23.9 ms avg latency, 49.0 max latency.
49967 records sent, 9991.4 records/sec (9.76 MB/sec), 23.6 ms avg latency, 38.0 max latency.
50014 records sent, 10000.8 records/sec (9.77 MB/sec), 24.0 ms avg latency, 51.0 max latency.
50049 records sent, 10007.8 records/sec (9.77 MB/sec), 23.5 ms avg latency, 37.0 max latency.
49978 records sent, 9995.6 records/sec (9.76 MB/sec), 23.5 ms avg latency, 44.0 max latency.
49803 records sent, 9958.6 records/sec (9.73 MB/sec), 23.7 ms avg latency, 47.0 max latency.
50229 records sent, 10045.8 records/sec (9.81 MB/sec), 23.6 ms avg latency, 46.0 max latency.
49980 records sent, 9996.0 records/sec (9.76 MB/sec), 23.5 ms avg latency, 36.0 max latency.
50061 records sent, 10010.2 records/sec (9.78 MB/sec), 23.6 ms avg latency, 36.0 max latency.
49983 records sent, 9996.6 records/sec (9.76 MB/sec), 23.4 ms avg latency, 37.0 max latency.
49978 records sent, 9995.6 records/sec (9.76 MB/sec), 23.9 ms avg latency, 55.0 max latency.
50061 records sent, 10012.2 records/sec (9.78 MB/sec), 24.3 ms avg latency, 55.0 max latency.
49981 records sent, 9996.2 records/sec (9.76 MB/sec), 23.5 ms avg latency, 42.0 max latency.
49979 records sent, 9991.8 records/sec (9.76 MB/sec), 23.8 ms avg latency, 39.0 max latency.
50077 records sent, 10013.4 records/sec (9.78 MB/sec), 23.6 ms avg latency, 41.0 max latency.
49974 records sent, 9994.8 records/sec (9.76 MB/sec), 23.4 ms avg latency, 36.0 max latency.
50067 records sent, 10011.4 records/sec (9.78 MB/sec), 23.8 ms avg latency, 65.0 max latency.
49963 records sent, 9992.6 records/sec (9.76 MB/sec), 23.5 ms avg latency, 54.0 max latency.
1000000 records sent, 9997.300729 records/sec (9.76 MB/sec), 24.24 ms avg latency, 358.00 ms max latency, 23 ms 50th, 28 ms 95th, 39 ms 99th, 154 ms 99.9th.
real 1m40.808s
user 0m16.620s
sys 0m1.260s
Further runs (summary lines only):
Throughput 5,000: 1000000 records sent, 4999.275105 records/sec (4.88 MB/sec), 22.94 ms avg latency, 127.00 ms max latency, 23 ms 50th, 27 ms 95th, 31 ms 99th, 41 ms 99.9th.
Throughput 20,000: 1000000 records sent, 18990.827430 records/sec (18.55 MB/sec), 954.74 ms avg latency, 2657.00 ms max latency, 739 ms 50th, 2492 ms 95th, 2611 ms 99th, 2650 ms 99.9th.
Throughput 30,000: 1000000 records sent, 19125.212768 records/sec (18.68 MB/sec), 1527.07 ms avg latency, 3020.00 ms max latency, 1582 ms 50th, 2815 ms 95th, 2979 ms 99th, 3011 ms 99.9th.
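
The summary lines above all share one layout, so they can be compared programmatically. The following is a minimal Python sketch that parses a summary line and recomputes MB/sec from records/sec and the 1024-byte record size. The function names and the regular expression are our own illustration, written against the output shown here rather than against any documented format guarantee.

```python
import re

# Matches only the final summary line, which says "ms max latency";
# the interim progress lines omit the "ms" before "max latency".
SUMMARY = re.compile(
    r"(?P<records>\d+) records sent, (?P<rps>[\d.]+) records/sec "
    r"\((?P<mb>[\d.]+) MB/sec\), (?P<avg>[\d.]+) ms avg latency, "
    r"(?P<max>[\d.]+) ms max latency"
)


def parse_summary(line):
    """Return the headline numbers of a summary line, or None if it does not match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return {
        "records": int(m["records"]),
        "records_per_sec": float(m["rps"]),
        "mb_per_sec": float(m["mb"]),
        "avg_latency_ms": float(m["avg"]),
        "max_latency_ms": float(m["max"]),
    }


def mb_per_sec(records_per_sec, record_size=1024):
    """Recompute throughput in MB/sec (1 MB = 1024 * 1024 bytes here)."""
    return records_per_sec * record_size / (1024 * 1024)
```

Recomputing the value this way is a quick sanity check that a pasted log line has not been truncated or mixed up with another run.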

12 partitions, throughput limited to 26,000 records/sec

[root@l-monitor-logstash2.pub.prod.aws.dm bin]# time ./kafka-producer-perf-test.sh --topic ack001 --num-records 1000000 --record-size 1024 --throughput 26000 --producer.config ../config/producer.properties
129256 records sent, 25840.9 records/sec (25.24 MB/sec), 31.9 ms avg latency, 123.0 max latency.
129794 records sent, 25953.6 records/sec (25.35 MB/sec), 28.6 ms avg latency, 73.0 max latency.
130152 records sent, 26025.2 records/sec (25.42 MB/sec), 28.3 ms avg latency, 64.0 max latency.
130278 records sent, 26045.2 records/sec (25.43 MB/sec), 28.1 ms avg latency, 55.0 max latency.
130106 records sent, 26010.8 records/sec (25.40 MB/sec), 27.9 ms avg latency, 45.0 max latency.
130080 records sent, 26005.6 records/sec (25.40 MB/sec), 27.7 ms avg latency, 41.0 max latency.
130093 records sent, 26013.4 records/sec (25.40 MB/sec), 74.5 ms avg latency, 343.0 max latency.
1000000 records sent, 25904.051394 records/sec (25.30 MB/sec), 38.33 ms avg latency, 343.00 ms max latency, 28 ms 50th, 122 ms 95th, 242 ms 99th, 321 ms 99.9th.
real 0m39.395s
user 0m12.204s
sys 0m1.616s

CPU and memory showed no change. Network rx/tx: 170 Mbps / 120 Mbps; disk IoUtil: 6%. One million records complete within 2 minutes.

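Stress-test conclusions

The main factors affecting commit efficiency are: partition count + timeout + message size + throughput

Unthrottled: with ack=all and no throughput limit, TPS holds at around 20,000 (17,671 in the run shown above) with an average latency of about 1,600 ms. 99.9% of records are committed and acknowledged within roughly two seconds, and the slowest records took more than 5 seconds.
Timeout: with the timeout set to 5 seconds or more, every commit succeeds (ack). As the timeout is lowered step by step towards 3 seconds, large numbers of timeouts start to appear. The official default is 30 seconds; given how complex real networks are, we recommend setting this parameter to 10 seconds. If timeouts still occur, the client needs to catch the exception and handle it specially.
Message size: at 512 bytes, TPS reaches 30,000/s; at around 2 KB, TPS falls to 9,000/s. TPS is roughly inversely proportional to message size.
Traffic: limiting throughput to about 13,000 reduces contention and gives the best results. Average latency drops to 24 ms and maximum latency is only a little over 300 ms, which is a very high service level.
Partition count: adding partitions significantly raises processing capacity, but the partition count also affects failure recovery time. This test case mainly covers 6 partitions. Testing shows that going to 12 partitions nearly doubles capacity, but adding more beyond that brings no further significant gain.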
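
To see how message size trades off against TPS, multiply the two. The following is a small Python sketch using only figures quoted in this article: 512 bytes at 30,000/s, 1024 bytes at the 17,671/s measured in the unthrottled run, and about 2 KB at 9,000/s. Treating "about 2 KB" as exactly 2048 bytes is an assumption, and `mib_per_sec` is a name made up for this example.

```python
def mib_per_sec(message_bytes, tps):
    """Payload bytes written per second, expressed in MiB/s."""
    return message_bytes * tps / (1024 * 1024)


# (message size in bytes, messages per second) as quoted in the text
MEASUREMENTS = [(512, 30000), (1024, 17671), (2048, 9000)]

for size, tps in MEASUREMENTS:
    print(f"{size:>5} B x {tps:>6}/s = {mib_per_sec(size, tps):.2f} MiB/s")
```

The byte rate comes out at about 14.6, 17.3 and 17.6 MiB/s, so it stays within a narrow band while TPS changes more than threefold. In these runs the cluster behaves as if it were limited by bytes per second rather than by message count.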

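Final conclusion: assuming a healthy network, with ack=all, a 10-second timeout, 3 retries, and 6 partitions, the cluster can sustain 13,000 messages/s with an average write latency under 30 ms and a maximum latency under 500 ms. To raise TPS, increase the partition count to 12, which achieves efficient writes at 26,000/s.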

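Backlog test

In theory, Kafka production and consumption are not affected by a message backlog; a backlog only takes up disk space. Backlog here means the number of messages in the topic, regardless of whether they have been consumed.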

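Conclusions

Kafka uses a time-based SLA (service level guarantee); important messages are retained for 3 days.

Performance

Baseline configuration: 1 KB messages with ack=all, i.e. all replicas in sync. To keep messages reliable, every test uses 3 replicas.

3 replicas, 1 partition: 6,000-8,000 TPS
3 replicas, 6 partitions: 13,000-16,000 TPS
3 replicas, 12 partitions: 26,000-28,000 TPS

Note: on the producer side, consider the case of sending messages one at a time and calling future.get() after each send to confirm it. TPS then drops sharply to below 2,000. Make sure you really need this; otherwise use asynchronous sends with a callback.

Compared with 16,000 TPS in ACK mode, sends without ACK reach 130,000 TPS (the bottleneck is mainly network and IO, with bandwidth saturated). When throughput is limited to about 10,000 with ACK enabled, which closely matches our business profile, Kafka is both efficient and highly available, with an average latency of only 24 ms. The producer best practice is a 10-second timeout and 3 retries.

Consumers are equally efficient: with 6 partitions in ack mode, average latency is about 20 ms. The actual processing time depends on the processing capacity of the consumer.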

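Kafka message reliability

Write 3 replicas, enable ack=all, and flush to disk every second. A message travels Client --> Leader --> Replica. The leader waits for acks from all in-sync replicas and then acks to the client, so the message is confirmed several times along the way. Messages whose ack fails are retried. This mode ensures that no data is lost. To reach this level of reliability, be sure to configure Kafka according to the best practices provided by the architecture team (parameters differ considerably between Kafka versions).
There are three message delivery modes; synchronous sending in Kafka is at-least-once (as of version 0.10), so consumers must process messages idempotently. A duplicate can arise as follows: the producer sends a message to the leader, the leader replicates it to all followers and receives their confirmation, and then the leader goes down before returning the ack to the producer. The producer retries the send, one of the followers is promoted to leader, and the duplicate is created.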
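
The idempotent handling asked of consumers can be sketched as follows. This is a minimal Python illustration that assumes every message carries a unique ID assigned by the producer; that ID is our assumption, not something Kafka 0.10 provides. `IdempotentHandler` is a hypothetical name, and the in-memory set stands in for what would need to be a durable store in a real service.

```python
class IdempotentHandler:
    """Apply each message at most once, keyed by a producer-assigned message ID."""

    def __init__(self, apply):
        self._apply = apply   # business logic, called once per distinct ID
        self._seen = set()    # illustration only: lost on restart

    def handle(self, message_id, payload):
        """Return True if the payload was applied, False if it was a duplicate."""
        if message_id in self._seen:
            return False      # duplicate caused by a producer retry: skip it
        self._apply(payload)
        self._seen.add(message_id)
        return True
```

The ID is recorded only after the business logic succeeds, so a failure in `apply` leaves the message eligible for redelivery instead of silently dropping it.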

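Impact of scaling out and of failures

Single node down: production and consumption are briefly affected. Recovery time depends on leader election time and on the number of partitions (about 10 seconds of ISR detection time). ACK mode combined with retries ensures that no data is lost during the failure. This is position 2 in the diagram above.
Scaling out: equivalent to a node coming online, with no impact on users. How long the node takes to become available depends on how much data it has to catch up on (a simple network copy). In our experience some message fetches take longer, but the impact is small, and no obvious jitter appeared during load testing. We recommend that consumers set a longer timeout (including for asynchronous processing). This is position 3 in the diagram above.
>=2 nodes down (for example a data-center power outage): the service is unavailable. Recovery requires two nodes to reach a synchronized state and depends on the total data volume. The disk is fsynced every second, so in the extreme case (all nodes down) at most 1 second of data is lost.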
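
The two-node case follows from the configuration used in this test (replication factor 3, min.insync.replicas=2, acks=all): a leader rejects acks=all writes once fewer than min.insync.replicas replicas are in sync. The Python sketch below encodes only that rule. It is a simplification that assumes every partition has a replica on every broker, as in this 3-broker cluster, and that the ISR has already shrunk to the surviving replicas; the function names are made up for this example.

```python
def can_accept_write(in_sync_replicas, min_insync_replicas=2):
    """With acks=all, a write is accepted only if enough replicas are in sync."""
    return in_sync_replicas >= min_insync_replicas


def writable_after_failures(failed_brokers, replication_factor=3,
                            min_insync_replicas=2):
    """Can a partition still take acks=all writes after N brokers have failed?"""
    surviving = max(0, replication_factor - failed_brokers)
    return can_accept_write(surviving, min_insync_replicas)


for failed in range(4):
    state = "writable" if writable_after_failures(failed) else "unavailable"
    print(f"{failed} broker(s) down: {state}")
```

With these settings the cluster tolerates one failed broker and becomes unavailable for writes at two, which matches the behaviour listed above.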

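When is data lost?

Sending in batch mode: if the producer is not shut down gracefully while its buffer still holds data, the buffered data is lost. This is position 1 in the diagram above.
Consuming in batch mode: if messages are fetched and then processed asynchronously in a thread pool, and the thread pool is not shut down gracefully, the messages being processed are lost. This is position 4 in the diagram above.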
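
A graceful shutdown for the thread-pool case might look like the sketch below. It is written in plain Python with no Kafka client: `poll` and `process` are hypothetical callables standing in for the real fetch and business logic, and `run_consumer` and `request_stop` are names invented for this example.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor

stop_requested = threading.Event()


def request_stop(signum=None, frame=None):
    """Signal handler: ask the loop to stop instead of dying mid-batch."""
    stop_requested.set()


def run_consumer(poll, process, workers=4):
    """Poll batches until a stop is requested, then drain the pool before returning.

    `poll` returns a list of messages; `process` handles one message.
    Returns the number of messages handed to the pool.
    """
    submitted = 0
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        while not stop_requested.is_set():
            for message in poll():
                pool.submit(process, message)
                submitted += 1
    finally:
        # Block until every submitted task has finished. Only after this point
        # is it safe to commit offsets and let the process exit.
        pool.shutdown(wait=True)
    return submitted


if __name__ == "__main__":
    # "kill -15" (SIGTERM) now requests a drain instead of killing the process.
    signal.signal(signal.SIGTERM, request_stop)
```

The key design choice is that the signal handler only sets a flag; the waiting happens in the main loop's `finally` block, so in-flight messages finish before the process ends.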

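Risks

The stress-test TPS figures are for reference only. In production the service will show jitter caused by network latency, failed disks, traffic peaks and troughs, and so on. Producers and consumers must record every message that fails processing, so that the data can be replayed in extreme cases.
Do not put large blocks of unnecessary data in messages; message size has a direct, roughly linear effect on service quality. (Keep messages under 2 KB.)
On the consumer side, besides idempotency, incorrect use of asynchronous thread pools (for example with an unbounded queue) is a frequent cause of consumer failures. Consume with care.
If a topic has 6 partitions and you run 7 consumer machines, one of them will sit idle. Take this Kafka limitation into account in your design.
The Kafka producer enables batch sending by default, which means that if your producer crashes, the messages in its buffer are lost. Make sure producers are stopped with "kill -15" so that the service has a chance to flush. If your messages are important, also write them to a log file. Weigh the trade-offs before deciding.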
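
The unbounded-queue risk mentioned in this list can be avoided by making task submission block when too much work is pending. Python's `ThreadPoolExecutor` queues submitted tasks without a limit, so the sketch below wraps it with a semaphore. `BoundedExecutor` and its parameters are hypothetical names for illustration, and the same idea applies to thread pools in other languages.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedExecutor:
    """Thread pool whose submit() blocks once max_pending tasks are queued or running."""

    def __init__(self, workers, max_pending):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args):
        # Blocks the polling thread when the pool is saturated, so the consumer
        # stops fetching instead of piling messages up in memory.
        self._slots.acquire()
        try:
            future = self._pool.submit(fn, *args)
        except Exception:
            self._slots.release()
            raise
        # Free the slot when the task finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self):
        """Wait for all pending tasks to finish."""
        self._pool.shutdown(wait=True)
```

Because the polling thread itself is slowed down, memory use stays proportional to `max_pending` rather than to how far the consumer has fallen behind.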
