I didn't catch the Developer Voices episode, but it's on my listening list now!
At a low level, I'd guess we do many of the same things: batching writes, aggressively colocating and caching reads, leveraging multipart uploads, and doing the standard tail-at-scale tricks to manage S3 latency. We've also been testing with Antithesis, and we've reached out to Kyle Kingsbury.
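To make the tail-at-scale point concrete, here's a minimal sketch of one of those tricks, request hedging: if a read is slow, fire a duplicate and take whichever finishes first. This illustrates the general technique only, not Bufstream's actual implementation; `fetch` stands in for an S3 GET.

```python
import concurrent.futures


def hedged_fetch(fetch, key, hedge_after=0.05):
    """Issue a request; if it hasn't completed within `hedge_after`
    seconds, issue a duplicate and return whichever finishes first.
    Classic tail-latency mitigation from "The Tail at Scale"."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(fetch, key)
        done, _ = concurrent.futures.wait([first], timeout=hedge_after)
        if done:
            return first.result()
        # First request is slow: hedge with an identical second request.
        second = pool.submit(fetch, key)
        done, _ = concurrent.futures.wait(
            [first, second],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        pool.shutdown(wait=False)
```

In practice you'd cap hedged requests (e.g. only hedge past the p95 latency) so duplicates stay a small fraction of total load.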
Zoomed out a bit, a few differences from WarpStream jump out:
- Directionally, we want Bufstream to _understand_ the data flowing through it. We see so many Kafka teams struggling to manage data quality and effectively govern data usage, and we think they'd be better served by a queue that can do more than shuttle bytes around. Naturally, we come at that problem with a bias toward Protobuf.
- Bufstream runs fully isolated from Buf in its default configuration, and it doesn't rely on a proprietary metadata service.
- Bufstream supports transactions and exactly-once semantics [0]. We see these modern Kafka features used often, especially with Kafka Streams and Connect. Per their docs, WarpStream doesn't support them yet.
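For readers who haven't used them: Kafka transactions bracket a batch of produces so that read_committed consumers see all of the records or none of them. A minimal sketch of the producer side, written against a generic `producer` object whose method names follow the usual client shape (`begin_transaction`, `commit_transaction`, `abort_transaction`); this is an illustration of the Kafka API pattern, not Bufstream-specific code:

```python
def produce_atomically(producer, topic, records):
    """Publish `records` to `topic` in one Kafka transaction:
    either every record becomes visible to read_committed
    consumers, or none of them do."""
    producer.begin_transaction()
    try:
        for key, value in records:
            producer.produce(topic, key=key, value=value)
        producer.commit_transaction()
    except Exception:
        # Any failure rolls back the entire batch.
        producer.abort_transaction()
        raise
```

Kafka Streams and Connect lean on exactly this pattern (plus committing consumer offsets inside the same transaction) for their exactly-once guarantees, which is why support for it matters.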
Disaggregating storage and compute is a well-trodden path for infrastructure in the cloud, and it's past time for Kafka to join the party. I'm excited to see what shakes out of the next few years of innovation in this space.
[0]: https://buf.build/docs/bufstream/kafka-compatibility/conform...