I'm curious to hear if you have examples of any database using only object storage as a backend, because back when I started, I couldn't find any.
https://docs.datomic.com/operation/architecture.html
(However they cheat with dynamo lol)
There are also some listed here:
https://davidgomes.com/separation-of-storage-and-compute-and...
And as you mention, Datomic uses DynamoDB as well (so, not a pure S3 solution). What I'm proposing is to use only object storage for everything: pay the price in latency, but don't give up on throughput, cost, or consistency. The differentiator is that this comes with strict serializability guarantees, so it is not an eventually consistent system (https://jepsen.io/consistency/models/strong-serializable).
No matter how sophisticated the caching is, if you want to retain strict serializability, writes must be confirmed by S3 and reads must be validated against S3 before returning, which puts a lower bound on latency.
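To make that latency floor concrete, here's a minimal sketch (Python/boto3, not the actual implementation from the post) of what "confirmed by S3" and "validated against S3" imply. It assumes S3's conditional writes (If-None-Match); the bucket and key layout are made up:

    # Commit is acknowledged only after S3 confirms the conditional put.
    # Assumes S3 conditional writes (If-None-Match); names are hypothetical.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "my-db-bucket"  # hypothetical

    def commit(version: int, payload: bytes) -> bool:
        """Write manifest version N; fails if someone else wrote it first."""
        try:
            s3.put_object(
                Bucket=BUCKET,
                Key=f"manifest/{version:020d}",
                Body=payload,
                IfNoneMatch="*",  # compare-and-swap: reject if the key already exists
            )
            return True  # S3 confirmed: the write is durable and visible
        except ClientError as e:
            if e.response["Error"]["Code"] == "PreconditionFailed":
                return False  # lost the race; re-read and retry
            raise

    def read_latest_manifest_key() -> str | None:
        """A strictly serializable read has to check S3 before returning."""
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="manifest/")
        keys = [obj["Key"] for obj in resp.get("Contents", [])]
        return max(keys) if keys else None

Each of those calls is at least one S3 round-trip, and you can't cache it away without giving up the guarantee.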
I focused a lot on throughput, which is the dimension we can really optimize.
Hopefully that's clear from the blog, though.
Basically an in-memory database which uses S3 as cold storage. Definitely an interesting approach, but no transactions AFAICT.
Take a look at Delta Lake
https://notes.eatonphil.com/2024-09-29-build-a-serverless-ac...
I think DuckDB is very close to this. It's a bit different, because it's mostly for read-heavy workloads.
https://duckdb.org/docs/extensions/httpfs/s3api
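For anyone who hasn't tried it, the read path is roughly this (Python duckdb client; the bucket/path and credentials are placeholders), which is why it fits read-heavy workloads so well:

    # Sketch: DuckDB reading Parquet directly from S3 via the httpfs extension.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs; LOAD httpfs;")
    con.execute("SET s3_region='us-east-1';")
    con.execute("SET s3_access_key_id='...'; SET s3_secret_access_key='...';")

    # Queries scan object storage directly; there's no write/commit protocol here.
    count = con.execute(
        "SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')"
    ).fetchone()
    print(count)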
(BTW great article, excellent read!)
> In Databricks service deployments, we use a separate lightweight coordination service to ensure that only one client can add a record with each log ID.
The key difference is that Delta Lake implements MVCC and relies on a total ordering of transaction IDs. That's something I wanted to avoid, because it creates forced synchronization points (multiple clients have to fight for IDs). It's certainly a trade-off: in my case you are forced to read the latest version or retry (but then you get strict serializability), while in Delta Lake you can rely on snapshot isolation, which might give you slightly stale but consistent data and minimizes retries on reads.
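To illustrate the synchronization point I mean: with a totally ordered log, every writer has to claim the next ID, so concurrent writers collide and retry. A rough sketch of that pattern (not Delta Lake's actual code; it reuses the hypothetical conditional-put commit() from the earlier sketch):

    # Every writer races for log ID n+1; only one wins, the rest retry.
    def commit_with_total_order(get_latest_version, commit, payload: bytes) -> int:
        """Claim the next log ID, retrying until we win the race."""
        while True:
            n = get_latest_version() + 1
            if commit(n, payload):   # only one client can create ID n
                return n
            # someone else took ID n: re-read state, re-validate, try again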
It also seems that you can't get transactions across different tables? Another interesting tradeoff.