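"Flink Consuming Kafka, Written for Busy People" already walked through how Flink consumes Kafka at the source-code level. One point was left unclear there, though: how does the offset end up stored in state?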
How Kafka offsets are stored in state
This post picks up where "Flink Consuming Kafka, Written for Busy People" left off.
    // get the records for each topic partition
    // partitionDiscoverer.discoverPartitions has already ensured that
    // subscribedPartitionStates contains only this task's KafkaTopicPartitions
    for (KafkaTopicPartitionState<TopicPartition> partition : subscribedPartitionStates()) {

        // take only the records that belong to this task
        List<ConsumerRecord<byte[], byte[]>> partitionRecords =
            records.records(partition.getKafkaPartitionHandle());

        for (ConsumerRecord<byte[], byte[]> record : partitionRecords) {
            // the deserializer passed in, i.e. the user-defined deserializationSchema
            final T value = deserializer.deserialize(record);

            // when the user-defined deserializationSchema's isEndOfStream returns true,
            // the fetcher stops running
            if (deserializer.isEndOfStream(value)) {
                // end of stream signaled
                running = false;
                break;
            }

            // emit the actual record. this also updates offset state atomically
            // and deals with timestamps and watermark generation
            emitRecord(value, partition, record.offset(), record);
        }
    }
Here the subscribedPartitionStates() method simply returns the subscribedPartitionStates field.
Tracing the call further down, we eventually reach:
    protected void emitRecordWithTimestamp(
            T record,
            KafkaTopicPartitionState<KPH> partitionState,
            long offset,
            long timestamp) throws Exception {

        if (record != null) {
            // no watermarks
            if (timestampWatermarkMode == NO_TIMESTAMPS_WATERMARKS) {
                // fast path logic, in case there are no watermarks generated in the fetcher

                // emit the record, using the checkpoint lock to guarantee
                // atomicity of record emission and offset state update
                synchronized (checkpointLock) {
                    sourceContext.collectWithTimestamp(record, timestamp);
                    // set the offset in state (this actually updates subscribedPartitionStates;
                    // snapshotState later reads the values from subscribedPartitionStates)
                    partitionState.setOffset(offset);
                }
            } else if (timestampWatermarkMode == PERIODIC_WATERMARKS) {
                emitRecordWithTimestampAndPeriodicWatermark(record, partitionState, offset, timestamp);
            } else {
                emitRecordWithTimestampAndPunctuatedWatermark(record, partitionState, offset, timestamp);
            }
        } else {
            // if the record is null, simply just update the offset state for partition
            synchronized (checkpointLock) {
                partitionState.setOffset(offset);
            }
        }
    }
The offset is written to subscribedPartitionStates only after sourceContext has emitted the record, and both steps happen while holding checkpointLock.
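The emit-then-set-offset pattern can be illustrated with a small standalone sketch. It does not use Flink: `PartitionStateSketch` and `EmitUnderLockSketch` are hypothetical stand-ins for `KafkaTopicPartitionState` and the fetcher, a plain list plays the role of `sourceContext`, and the sketch assumes that whoever takes the snapshot holds the same lock.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for KafkaTopicPartitionState: holds the last emitted offset. */
class PartitionStateSketch {
    // Hypothetical sentinel meaning "no record emitted yet".
    static final long OFFSET_NOT_SET = -1L;

    private long offset = OFFSET_NOT_SET;

    long getOffset() {
        return offset;
    }

    void setOffset(long offset) {
        this.offset = offset;
    }
}

/** Hypothetical stand-in for the fetcher's emit path (no watermark handling). */
class EmitUnderLockSketch {
    private final Object checkpointLock = new Object();
    // Stands in for sourceContext: records "sent downstream".
    private final List<String> emitted = new ArrayList<>();
    private final PartitionStateSketch partitionState = new PartitionStateSketch();

    /** Emits the record (if any) and then advances the offset, both under the lock. */
    void emitRecord(String record, long offset) {
        synchronized (checkpointLock) {
            if (record != null) {
                emitted.add(record);
            }
            // A null record still advances the offset, as in the excerpt above.
            partitionState.setOffset(offset);
        }
    }

    /** Returns {number of emitted records, current offset} as one consistent view. */
    long[] snapshot() {
        synchronized (checkpointLock) {
            return new long[] {emitted.size(), partitionState.getOffset()};
        }
    }
}
```

Because both the emission and the offset update sit inside one synchronized block, a snapshot taken under the same lock can never observe a record that was emitted without its offset, or an offset whose record has not yet been emitted.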
When FlinkKafkaConsumer takes a snapshot, it reads the offsets from the fetcher's subscribedPartitionStates:
    // read the current values from the fetcher's subscribedPartitionStates
    HashMap<KafkaTopicPartition, Long> currentOffsets = fetcher.snapshotCurrentState();

    if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
        // the map cannot be asynchronously updated, because only one checkpoint call can happen
        // on this function at a time: either snapshotState() or notifyCheckpointComplete()
        pendingOffsetsToCommit.put(context.getCheckpointId(), currentOffsets);
    }

    for (Map.Entry<KafkaTopicPartition, Long> kafkaTopicPartitionLongEntry : currentOffsets.entrySet()) {
        unionOffsetStates.add(
            Tuple2.of(kafkaTopicPartitionLongEntry.getKey(), kafkaTopicPartitionLongEntry.getValue()));
    }
At this point, whenever a checkpoint runs, the corresponding offsets are stored in state.
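The snapshot step can be sketched without Flink as follows. Everything here is a simplified assumption: `SnapshotSketch` is a hypothetical class, partitions are plain strings, an in-memory list stands in for `unionOffsetStates`, and clearing that list before each snapshot is a choice made for this sketch rather than something shown in the excerpt.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, Flink-free model of copying offsets into state at checkpoint time. */
class SnapshotSketch {
    // partition -> last emitted offset; stands in for subscribedPartitionStates
    private final Map<String, Long> subscribedPartitionStates = new HashMap<>();
    // checkpoint id -> offsets captured by that checkpoint
    private final Map<Long, Map<String, Long>> pendingOffsetsToCommit = new HashMap<>();
    // stands in for the unionOffsetStates list state
    private final List<Map.Entry<String, Long>> unionOffsetStates = new ArrayList<>();

    /** Called on the emit path after a record has been sent downstream. */
    void setOffset(String partition, long offset) {
        subscribedPartitionStates.put(partition, offset);
    }

    /** Copies the current offsets so later emits cannot change the snapshot. */
    Map<String, Long> snapshotCurrentState() {
        return new HashMap<>(subscribedPartitionStates);
    }

    /** Captures the offsets for one checkpoint and writes them into the list state. */
    void snapshotState(long checkpointId) {
        Map<String, Long> currentOffsets = snapshotCurrentState();
        pendingOffsetsToCommit.put(checkpointId, currentOffsets);

        // Assumption of this sketch: the list holds only the latest snapshot.
        unionOffsetStates.clear();
        for (Map.Entry<String, Long> entry : currentOffsets.entrySet()) {
            unionOffsetStates.add(
                new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
        }
    }

    Map<String, Long> pendingFor(long checkpointId) {
        return pendingOffsetsToCommit.get(checkpointId);
    }

    List<Map.Entry<String, Long>> getUnionOffsetStates() {
        return unionOffsetStates;
    }
}
```

The defensive copy in `snapshotCurrentState` is the point of the sketch: offsets recorded for a checkpoint stay fixed even though the emit path keeps advancing the live map.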
This article is also published on the blog "shengjk1" (CSDN).