6 Kafka Streams Joins

Joining means taking a KStream and/or a KTable and creating a new KStream or KTable from it.

There are 4 kinds of joins (SQL-like).

https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#joining

 

Join Constraints - Co-Partitioning of Data

A join can only happen when the data is co-partitioned. Otherwise the join is not possible and Kafka Streams will fail with a runtime error.

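Co-partitioning means that the stream and/or table on both sides of the join have the same number of partitions, so that records with the same key end up in the same partition number on each side.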

For example, if we have two topics, test1 and test2, both of them must have the same number of partitions.

If not, Kafka Streams will not be able to join them using the join operation. The only way around this is to write one of the data sets into a new topic that has the same number of partitions as the other.
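To see why the partition counts have to match, here is a minimal sketch in plain Java (no Kafka dependency). It assumes a simplified partitioner, `hashCode()` modulo the partition count; Kafka's default partitioner hashes the serialized key with murmur2 instead, so the concrete partition numbers differ, but the effect is the same. The class and method names are made up for illustration.

```java
import java.util.List;

public class CoPartitionSketch {
    // Simplified stand-in for a partitioner (assumption: the real default
    // partitioner uses a murmur2 hash of the serialized key, not hashCode()).
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // True if every key lands on the same partition number in both topics.
    public static boolean sameAssignment(List<String> keys, int leftPartitions, int rightPartitions) {
        for (String key : keys) {
            if (partitionFor(key, leftPartitions) != partitionFor(key, rightPartitions)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c");
        // test1 and test2 both with 3 partitions: keys line up.
        System.out.println(sameAssignment(keys, 3, 3)); // true
        // test1 with 3 partitions, test2 with 4: key "c" goes to 0 vs 3.
        System.out.println(sameAssignment(keys, 3, 4)); // false
    }
}
```

Once the same key can sit in different partition numbers on each side, the task that reads partition 0 of test1 no longer sees the matching records of test2, which is why the join cannot work.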

GlobalKTable

If your KTable data is reasonably small and can fit on each instance of your Kafka Streams application, you can read it as a GlobalKTable.

With a GlobalKTable, you can join any stream to your table even if the data doesn't have the same number of partitions. It works like the distributed cache in classical Hadoop MapReduce.
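The idea can be modelled in plain Java as a lookup against a full local copy of the table. This is only a rough sketch of the concept, not the Kafka Streams API: the class name, the method name and the `key:streamValue,tableValue` output format are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GlobalTableSketch {
    // Joins the stream records seen by ONE application instance against a
    // complete local copy of the table. Because the copy holds every key,
    // it does not matter which stream partitions this instance reads.
    public static List<String> joinAgainstFullCopy(List<String[]> streamRecords,
                                                   Map<String, String> fullTableCopy) {
        List<String> joined = new ArrayList<>();
        for (String[] record : streamRecords) {
            String key = record[0];
            String streamValue = record[1];
            String tableValue = fullTableCopy.get(key);
            if (tableValue != null) { // inner-join behaviour: drop records without a match
                joined.add(key + ":" + streamValue + "," + tableValue);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("u1", "Alice");
        List<String[]> clicks = List.of(new String[]{"u1", "click"}, new String[]{"u2", "view"});
        System.out.println(joinAgainstFullCopy(clicks, users)); // [u1:click,Alice]
    }
}
```

The price for this convenience is that every instance stores the whole table, which is why it only suits reasonably small data.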

Different Types of Joins

There are 3 combinations of Kafka Streams objects that can be joined:

  • KStream - KStream
  • KTable - KTable
  • KStream - KTable

Inner Join

Join the data only if it has a match on both sides.

 

Left Join

Take all the data from the left side, whether or not it has a match on the right.
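The difference between the inner and the left join can be sketched with two plain Java maps standing in for the two sides. This is a simplified model and not the Kafka Streams DSL: names and the `left|right` value format are assumptions for the example, and time windows are ignored. A missing right side shows up as `null`, similar to how a left join hands a null right value to the joiner.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JoinSketch {
    // Inner join: emit a result only for keys present on both sides.
    public static Map<String, String> innerJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String rightValue = right.get(e.getKey());
            if (rightValue != null) {
                out.put(e.getKey(), e.getValue() + "|" + rightValue);
            }
        }
        return out;
    }

    // Left join: emit a result for every left key; the right value is null when unmatched.
    public static Map<String, String> leftJoin(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            String rightValue = right.get(e.getKey()); // may be null
            out.put(e.getKey(), e.getValue() + "|" + rightValue);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("k1", "L1", "k2", "L2");
        Map<String, String> right = Map.of("k1", "R1", "k3", "R3");
        System.out.println(innerJoin(left, right)); // only k1
        System.out.println(leftJoin(left, right));  // k1 and k2, k2 paired with null
    }
}
```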

 

Outer Join

  • Only available for KStream / KStream joins
  • It's a left join combined with a right join
  • From the API doc, it looks like this:
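Independently of the API doc, the "left join combined with a right join" behaviour can be modelled in plain Java. This is a simplified sketch with made-up names: real KStream / KStream joins are windowed, which is left out here, and the `left|right` value format is just an assumption for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class OuterJoinSketch {
    // Outer join: emit a result for every key that appears on either side.
    // The missing side is represented by null.
    public static Map<String, String> outerJoin(Map<String, String> left, Map<String, String> right) {
        Set<String> allKeys = new TreeSet<>(left.keySet());
        allKeys.addAll(right.keySet());

        Map<String, String> out = new TreeMap<>();
        for (String key : allKeys) {
            String leftValue = left.get(key);   // null if only on the right
            String rightValue = right.get(key); // null if only on the left
            out.put(key, leftValue + "|" + rightValue);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = Map.of("k1", "L1", "k2", "L2");
        Map<String, String> right = Map.of("k1", "R1", "k3", "R3");
        System.out.println(outerJoin(left, right)); // {k1=L1|R1, k2=L2|null, k3=null|R3}
    }
}
```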
