A Discussion of Kafka Quotas (Part 1)

Date: 2019-12-04

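Kafka introduced quota management in version 0.9.0.0 so that brokers can throttle the requests sent by clients. Kafka currently supports two broad kinds of quota management:

Network bandwidth quota management: defines a bandwidth threshold that limits the request rate. The threshold is expressed in bytes per second (bytes/s). This feature was introduced in 0.9.0.0.
CPU quota management: defines a CPU usage threshold that limits the request rate. The threshold is given as a percentage, e.g. quota = 50 means a ceiling of 50% CPU usage. This feature was introduced in 0.11.0.0.

This article focuses on network bandwidth quota management. CPU quota management will be covered in the next article.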

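1. What can quotas do?

Once bandwidth-based quotas are set, Kafka can:

1. Limit the rate at which follower replicas fetch messages from leader replicas

2. Limit the production rate of producers

3. Limit the consumption rate of consumers

2. Quota scopes

Quotas can currently be set at three levels:

1. client id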

2. user

3. user + client id

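The first is client id, i.e. the value of the common client.id parameter in the new-version clients. Once a quota is set for a client id, every client program that uses that client id is subject to it.

The second is user, i.e. a user principal in the authentication system once authentication is enabled. For example, for the Kerberos user user1/kafka.host1.com@REALM, the user name Kafka resolves is 'user1'. This resolution rule can be changed with the sasl.kerberos.principal.to.local.rules parameter, but that is outside the scope of this article.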
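The mapping from a Kerberos principal to a quota user name can be illustrated with a minimal Python sketch. The function short_name is hypothetical and only mimics the behaviour described above (drop the realm, then drop the host part). It is not Kafka's actual rule engine, and custom sasl.kerberos.principal.to.local.rules are not modelled.

```python
def short_name(principal: str) -> str:
    """Return the primary component of a Kerberos principal.

    Simplified stand-in for the default mapping: strip the realm
    (everything after '@'), then strip the host/instance part
    (everything after '/').
    """
    without_realm = principal.split("@", 1)[0]
    return without_realm.split("/", 1)[0]


print(short_name("user1/kafka.host1.com@REALM"))  # user1
```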

The third is user + client id, a pair made up of the previous two. It is the most fine-grained quota scope.

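Each of these three scopes can also have a default quota (out of the box there is no quota, i.e. the default is effectively unlimited): a client id default, a user default, and a user + client id default. The last of these splits further into four sub-scopes:

user default + client id specific value
user specific value + client id specific value
user default + client id default
user specific value + client id default

In total there are therefore eight possible quota scope settings. Their precedence, from highest to lowest, is:

1. user specific value + client id specific value (a specific quota set for user + client id)
2. user specific value + client id default (a specific quota for the user, a default quota for client id)
3. user (a specific quota set for the user)
4. user default + client id specific value (a default quota for user, a specific quota for the client id)
5. user default + client id default (default quotas for both user and client id)
6. user default (a default quota set for user)
7. client id (a specific quota set for the client id)
8. client id default (a default quota set for client id)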

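When several quota rules conflict, this ordering decides which one applies. For example, suppose we configure a 100MB/s quota for user = 'good-user' and a 50MB/s quota for [user = 'good-user', client id = 'producer-1']. When good-user sends messages with the producer named 'producer-1', Kafka ensures its request rate does not exceed 50MB/s, because the second rule overrides the first.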
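The precedence order can be sketched in Python as a lookup that tries the eight scopes from most to least specific. This is an illustrative model only: the table layout, the "<default>" marker and the resolve_quota function are assumptions made for the example, not Kafka's internal data structures.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "<default>"  # hypothetical marker for a default-scope entry

# Key: (user, client_id). None means that dimension is not part of the rule.
QuotaTable = Dict[Tuple[Optional[str], Optional[str]], int]


def resolve_quota(quotas: QuotaTable, user: str, client_id: str) -> Optional[int]:
    """Return the first matching quota in bytes/s, highest precedence first."""
    candidates = [
        (user, client_id),     # 1. user specific + client id specific
        (user, DEFAULT),       # 2. user specific + client id default
        (user, None),          # 3. user specific
        (DEFAULT, client_id),  # 4. user default + client id specific
        (DEFAULT, DEFAULT),    # 5. user default + client id default
        (DEFAULT, None),       # 6. user default
        (None, client_id),     # 7. client id specific
        (None, DEFAULT),       # 8. client id default
    ]
    for key in candidates:
        if key in quotas:
            return quotas[key]
    return None  # no rule matched: no quota, i.e. unlimited


MB = 1024 * 1024
quotas = {
    ("good-user", None): 100 * MB,
    ("good-user", "producer-1"): 50 * MB,
}
print(resolve_quota(quotas, "good-user", "producer-1"))  # 52428800
print(resolve_quota(quotas, "good-user", "producer-2"))  # 104857600
```

With the two rules from the good-user example, producer-1 resolves to the 50MB/s pair rule, while any other client id of the same user falls back to the 100MB/s user rule.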

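3. How to set quotas

We go through the three capabilities listed in section 1 in turn.

3.1 Limiting the rate at which follower replicas fetch from leader replicas

Method 1: set the dynamic broker parameters leader.replication.throttled.rate and follower.replication.throttled.rate.

The former controls the rate at which leader replicas serve FETCH requests, the latter the rate at which follower replicas send them. Because they are dynamic parameters, their values can be changed on the fly without restarting the broker. Suppose we want to throttle both leader and follower replication on brokers 0 and 1 to 100MB/s (104857600 bytes/s):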

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'leader.replication.throttled.rate=104857600,follower.replication.throttled.rate=104857600' --entity-type brokers --entity-name 0
Completed Updating config for entity: brokers '0'.

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'leader.replication.throttled.rate=104857600,follower.replication.throttled.rate=104857600' --entity-type brokers --entity-name 1
Completed Updating config for entity: brokers '1'.

Run the following command to check that the configuration took effect:

bin/kafka-configs.sh --zookeeper localhost:2181 --describe --entity-type brokers
Configs for brokers '0' are leader.replication.throttled.rate=104857600,follower.replication.throttled.rate=104857600
Configs for brokers '1' are leader.replication.throttled.rate=104857600,follower.replication.throttled.rate=104857600
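Since the same command is repeated per broker and the rate must be given in bytes/s, a small Python helper can generate the command lines. This is only a convenience sketch: the function names are made up for the example, the ZooKeeper address is the one used in this article, and nothing is executed against a cluster.

```python
import shlex
from typing import List


def mb_per_sec_to_bytes(mb: int) -> int:
    """Convert MB/s to bytes/s using 1 MB = 1024 * 1024 bytes."""
    return mb * 1024 * 1024


def replication_throttle_command(broker_id: int, rate_mb: int,
                                 zookeeper: str = "localhost:2181") -> List[str]:
    """Build the kafka-configs.sh argument list for one broker."""
    rate = mb_per_sec_to_bytes(rate_mb)
    config = (f"leader.replication.throttled.rate={rate},"
              f"follower.replication.throttled.rate={rate}")
    return ["bin/kafka-configs.sh", "--zookeeper", zookeeper, "--alter",
            "--add-config", config,
            "--entity-type", "brokers", "--entity-name", str(broker_id)]


for broker_id in (0, 1):
    print(shlex.join(replication_throttle_command(broker_id, 100)))
```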

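In addition, leader.replication.throttled.replicas and follower.replication.throttled.replicas must be set on the topic. These parameters specify which replicas to throttle. To apply the throttle to every replica of the topic, use the * wildcard: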

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'leader.replication.throttled.replicas=*,follower.replication.throttled.replicas=*' --entity-type topics --entity-name test

Completed Updating config for entity: topic 'test'.

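Method 2: set the throttle when running a partition reassignment

The throttle can also be set when running a partition reassignment with the kafka-reassign-partitions.sh (or .bat) script, as follows (again 100MB/s):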

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --execute --reassignment-json-file to-be-reassigned.json --throttle 104857600

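Under the hood, the script uses the --throttle argument to set leader.replication.throttled.rate and follower.replication.throttled.rate, so it is essentially the same as Method 1. Note that the script only sets the quota on brokers that take part in the reassignment. Other brokers are unaffected.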

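3.2 Limiting the producer rate

Method 1: set the static broker parameter quota.producer.default

For example, adding quota.producer.default=15728640 to server.properties keeps the throughput of every producer connected to that broker below 15MB/s. The advantage of this parameter is that it limits all producers on the cluster. That is also its drawback: it treats every producer the same and cannot be tuned for individual clients. For this reason the community marked it "Deprecated" in 0.11.0.0 and recommends always using dynamic parameters to throttle producers.

Method 2: set the dynamic parameter producer_byte_rate

First, a default for all client ids. Suppose we want every producer program to stay below 20MB/s, i.e. 20971520 bytes/s: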

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type clients --entity-default
Completed Updating config for entity: default client-id.

Next, we give the producer with client.id='producer-1' its own limit of 10MB/s, i.e. 10485760 bytes/s:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=10485760' --entity-type clients --entity-name producer-1
Completed Updating config for entity: client-id 'producer-1'.

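Here is a simple experiment to verify this. We start two producer programs with client.id producer-1 and producer-2 and check that their throughput stays below the configured thresholds: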

bin/kafka-producer-perf-test.sh --topic t1 --throughput -1 --num-records 9000000 --record-size 500 --producer-props bootstrap.servers=localhost:9092 acks=-1 client.id=producer-2
......
9000000 records sent, 41632.936278 records/sec (19.85 MB/sec), 1563.41 ms avg latency, 6488.00 ms max latency, 912 ms 50th, 5576 ms 95th, 6169 ms 99th, 6474 ms 99.9th.

As shown, the throughput of producer-2 is held below 20MB/s. Next we try producer-1 (its threshold is lower, so we send fewer messages this time to speed up the experiment):

bin/kafka-producer-perf-test.sh --topic t1 --throughput -1 --num-records 3000000 --record-size 500 --producer-props bootstrap.servers=localhost:9092 acks=-1 client.id=producer-1
......
3000000 records sent, 20771.020273 records/sec (9.90 MB/sec), 3128.39 ms avg latency, 8960.00 ms max latency, 986 ms 50th, 8784 ms 95th, 8941 ms 99th, 8953 ms 99.9th.
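The MB/sec figures reported by the two runs can be cross-checked from the records/sec values and the 500-byte record size. The short Python sketch below redoes that arithmetic, assuming 1 MB = 1024 * 1024 bytes, which is consistent with the numbers printed above.

```python
def mb_per_sec(records_per_sec: float, record_size_bytes: int) -> float:
    """Throughput in MB/s for fixed-size records (1 MB = 1024 * 1024 bytes)."""
    return records_per_sec * record_size_bytes / (1024 * 1024)


p2 = mb_per_sec(41632.936278, 500)  # producer-2, default quota of 20 MB/s
p1 = mb_per_sec(20771.020273, 500)  # producer-1, specific quota of 10 MB/s

print(f"producer-2: {p2:.2f} MB/sec")  # producer-2: 19.85 MB/sec
print(f"producer-1: {p1:.2f} MB/sec")  # producer-1: 9.90 MB/sec
```

Both values land just under their thresholds: producer-2 under the 20MB/s client id default and producer-1 under its own 10MB/s quota.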

Setting quotas for a user works much like it does for a client id. To set the global default:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type users --entity-default
Completed Updating config for entity: default user-principal.

For a specific user (user1):

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type users --entity-name user1

Completed Updating config for entity: user-principal 'user1'.

Finally, quotas for the (user + client id) scope. Again, the defaults come first:

Default client id quota for user1:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type users --entity-name user1 --entity-type clients --entity-default
Completed Updating config for entity: user-principal 'user1', default client-id.

Default user quota for producer-1:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type users --entity-default --entity-type clients --entity-name producer-1
Completed Updating config for entity: default user-principal, client-id 'producer-1'.

Then the specific value:

Quota for user1 + producer-1:

bin/kafka-configs.sh --zookeeper localhost:2181 --alter --add-config 'producer_byte_rate=20971520' --entity-type users --entity-name user1 --entity-type clients --entity-name producer-1
Completed Updating config for entity: user-principal 'user1', client-id 'producer-1'.
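The commands in this section differ only in their --entity-type / --entity-name / --entity-default flags. The Python sketch below shows how those flags follow from the chosen scope. The helper names and the "<default>" marker are assumptions made for illustration, and the code only builds argument lists, it does not run them.

```python
import shlex
from typing import List, Optional

DEFAULT = "<default>"  # hypothetical marker meaning "use --entity-default"


def scope_flags(entity_type: str, name: Optional[str]) -> List[str]:
    """Flags for one dimension (users or clients); empty if it is unused."""
    if name is None:
        return []
    if name == DEFAULT:
        return ["--entity-type", entity_type, "--entity-default"]
    return ["--entity-type", entity_type, "--entity-name", name]


def quota_command(config_key: str, byte_rate: int,
                  user: Optional[str] = None,
                  client_id: Optional[str] = None) -> List[str]:
    """Build a kafka-configs.sh argument list for the given quota scope."""
    if user is None and client_id is None:
        raise ValueError("at least one of user or client_id is required")
    return (["bin/kafka-configs.sh", "--zookeeper", "localhost:2181",
             "--alter", "--add-config", f"{config_key}={byte_rate}"]
            + scope_flags("users", user)
            + scope_flags("clients", client_id))


# user1 with the client id default, then user1 + producer-1.
print(shlex.join(quota_command("producer_byte_rate", 20971520,
                               user="user1", client_id=DEFAULT)))
print(shlex.join(quota_command("producer_byte_rate", 20971520,
                               user="user1", client_id="producer-1")))
```

Passing "consumer_byte_rate" as config_key produces the equivalent commands for consumers.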

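3.3 Limiting the consumer rate

As with producers, there are two methods:

Method 1: set the static broker parameter quota.consumer.default

This parameter works exactly like quota.producer.default but applies to consumers, so we will not repeat the details.

Method 2: set the dynamic parameter consumer_byte_rate

This parameter is the exact counterpart of producer_byte_rate but applies to consumers, so we will not repeat the details.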

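4. The quota algorithm

Put simply, let O be the currently observed rate, T the threshold we configured, and W a time window. We want O to drop to T or below over the window W (if O is already below T, nothing needs to be done), so the broker has to delay for some time. If we call that delay X, the following equation holds: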

O * W = (W + X) * T

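Both sides count the same amount of data: what was observed at rate O over W must average out to T once the delay X is added. Solving gives X = (O - T) / T * W, which is the formula Kafka uses to compute the wait time. In the actual implementation, Kafka derives W from two parameters: W = quota.window.num * quota.window.size.seconds. The former is the number of sampling windows and the latter is the size of each window. The latter in particular also comes into play in CPU quota management. For this article we can simply work with W.

When Kafka detects that a quota has been exceeded, the broker does not return an error. Instead it slows down the offending client directly: it computes the required delay X and forces the client to wait that long, which brings its rate back under the quota. This is completely transparent to the client, and the client does not need to implement any special strategy of its own to cope. In fact, some clients react to this kind of situation by retrying continuously, which only aggravates the problem being solved.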
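The formula can be worked through with concrete numbers. In the Python sketch below, the function name throttle_delay and the sample values (an observed rate of 15MB/s against a 10MB/s quota, and W = 11 * 1 seconds) are chosen purely for illustration. The window settings are an assumption, so check the values configured on your brokers.

```python
def throttle_delay(observed_rate: float, quota: float, window_seconds: float) -> float:
    """Delay X in seconds from X = (O - T) / T * W; zero if under quota."""
    if observed_rate <= quota:
        return 0.0
    return (observed_rate - quota) / quota * window_seconds


MB = 1024 * 1024
O = 15 * MB  # observed rate in bytes/s (illustrative)
T = 10 * MB  # configured quota in bytes/s (illustrative)
W = 11 * 1   # quota.window.num * quota.window.size.seconds (assumed values)

X = throttle_delay(O, T, W)
print(X)  # 5.5

# Sanity check against the original equation: O * W = (W + X) * T
print(O * W == (W + X) * T)  # True
```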

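5. Possible problems

As noted above, throttling is enforced on the broker side. Although the broker's deliberate "sleep" is transparent to clients, it can cause client requests to time out. In a Kafka environment with throttling enabled it is therefore important to raise request.timeout.ms appropriately.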
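One rough way to reason about this is to compare the delay from the section 4 formula with the client's request.timeout.ms. The Python sketch below is a back-of-the-envelope check only: the function name, the sample rates and the 30000 ms timeout are assumptions for illustration, and a real broker may compute or bound the delay differently.

```python
def exceeds_request_timeout(observed_rate: float, quota: float,
                            window_seconds: float, request_timeout_ms: int) -> bool:
    """True if the estimated throttle delay reaches the request timeout."""
    # X = (O - T) / T * W from section 4, clamped at zero and converted to ms.
    delay_ms = max(0.0, (observed_rate - quota) / quota * window_seconds) * 1000
    return delay_ms >= request_timeout_ms


MB = 1024 * 1024
# A burst at 50 MB/s against a 10 MB/s quota with W = 11 s gives X = 44 s.
print(exceeds_request_timeout(50 * MB, 10 * MB, 11, 30000))  # True
print(exceeds_request_timeout(50 * MB, 10 * MB, 11, 60000))  # False
```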
