> By writing data directly to S3-compatible object storage, Bufstream can dispense with expensive disks and nearly eliminate inter-zone networking, slashing costs dramatically.
I am curious what the situation looks like for self-cloud-ers. If you own the servers, it's less clear how much advantage there is to using an S3 object store over direct-attached disk storage. But reciprocally, getting good at Ceph and the Ceph Object Gateway (its S3-compatible API), and then being able to apply and tune that storage knowledge at the platform level, makes sense versus running a separate storage service for x, y, and z.
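Part of the appeal is that the same client code works against either target; here's a minimal sketch, assuming a self-hosted RGW endpoint (the endpoint, credentials, bucket, and key names are all placeholders):

```python
import boto3

# The same S3 client code works against AWS S3 or a self-hosted Ceph RADOS
# Gateway; only the endpoint (and credentials) change.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.internal:7480",  # hypothetical RGW endpoint
    aws_access_key_id="RGW_ACCESS_KEY",
    aws_secret_access_key="RGW_SECRET_KEY",
)

# The same bucket/object calls an S3-backed streaming system would make
# against any S3-compatible store.
s3.put_object(Bucket="stream-segments", Key="topic-0/segment-00001", Body=b"...")
obj = s3.get_object(Bucket="stream-segments", Key="topic-0/segment-00001")
print(obj["Body"].read())
```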
Still, I think there is huge potential for something like Pulsar, with its BookKeeper ledgers of data, to rise. Our object stores don't seem to be great at data locality or replication; being able to tap those properties could yield some incredible systems efficiencies and speeds that object storage has abstracted away from us and has to brute-force instead.
I'm curious what architectural flourishes there are in your latency tuning. Is this still object-store based, or something different?
> For particularly latency-sensitive or latency-tolerant workloads, operators can tune how aggressively Bufstream trades latency for cost.
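As a rough back-of-the-envelope for what that knob can mean in dollars (assuming roughly AWS's published ~$0.005 per 1,000 PUT requests; the flush intervals and writer counts below are made up):

```python
# Sketch of the latency/cost tradeoff for an S3-backed log: flushing buffered
# batches less often means higher end-to-end latency but far fewer PUTs.
PUT_COST = 0.005 / 1000          # dollars per PUT request (assumed AWS pricing)
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_put_cost(flush_interval_s: float, writers: int) -> float:
    """Object-store PUT cost if each writer flushes one batch per interval."""
    puts = writers * SECONDS_PER_MONTH / flush_interval_s
    return puts * PUT_COST

# 10 writers flushing every 50ms (lower latency) vs every 500ms (lower cost):
print(monthly_put_cost(0.05, 10))   # ~$2,592/month
print(monthly_put_cost(0.50, 10))   # ~$259/month
```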
The upcoming plan to keep an Iceberg store materialized and available for querying sounds so, so cool. Nice. You have my attention & interest!!
> In the coming months, we'll also allow Bufstream operators to opt into storing some topics as Apache Iceberg tables. Kafka consumers will still be able to read from the topic, but SQL engines will also be able to query the data directly.
It'll be neat to see how this differs from the connector-based architecture. Whether maintenance, latency, or efficiency turns out to be the biggest winner here would make an excellent deep dive.
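Roughly, I picture the two read paths like this (just a sketch with placeholder names, not Bufstream's actual API):

```python
# Path 1: an ordinary Kafka consumer keeps working unchanged.
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "orders",                                  # hypothetical topic name
    bootstrap_servers="bufstream.internal:9092",
    auto_offset_reset="earliest",
)
for record in consumer:
    print(record.value)

# Path 2: the same topic exposed as an Iceberg table, queryable by any
# Iceberg-aware SQL engine, e.g. something like (Spark SQL shown as one example):
# spark.sql("SELECT customer_id, SUM(amount) FROM catalog.kafka.orders GROUP BY customer_id")
```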