They don't have the full suite of GCS's capabilities (https://cloud.google.com/storage/docs/request-preconditions#...) but it's something.
I'm curious to hear if you have examples of any database using only object storage as a backend, because back when I started, I couldn't find any.
My approach on S3 would be to make sure the ETag of an object changes whenever other transactions looking at it must be blocked. That makes it easy to use conditional reads (https://docs.aws.amazon.com/AmazonS3/latest/userguide/condit...) on COPY or GET operations.
For writes, I would use PUT on a temporary staging area and then a conditional COPY + DELETE afterward. This is certainly slower than GCS, but I think it should work.
Locking without modifying the object is the part that needs some optimization though.
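For what it's worth, here's a rough boto3 sketch of those two primitives (bucket and key names are placeholders, and the retry/abort policy is left out):

    # Sketch of the ETag-based primitives described above, using boto3.
    # Bucket and key names are placeholders; retries/aborts are omitted.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "my-db-bucket"

    def conditional_read(key, expected_etag):
        """GET that fails with 412 if the object's ETag has changed."""
        try:
            resp = s3.get_object(Bucket=BUCKET, Key=key, IfMatch=expected_etag)
            return resp["Body"].read()
        except ClientError as e:
            if e.response["Error"]["Code"] == "PreconditionFailed":
                return None  # another transaction touched the object
            raise

    def staged_write(key, data):
        """PUT to a staging key, then COPY onto the target and clean up.

        Note: CopySourceIfMatch is a precondition on the *source*
        (staging) object, not on the target, which is why the target's
        ETag still has to be bumped explicitly for blocking to work.
        """
        staging_key = "staging/" + key
        put = s3.put_object(Bucket=BUCKET, Key=staging_key, Body=data)
        s3.copy_object(
            Bucket=BUCKET,
            Key=key,
            CopySource={"Bucket": BUCKET, "Key": staging_key},
            CopySourceIfMatch=put["ETag"],
        )
        s3.delete_object(Bucket=BUCKET, Key=staging_key)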
https://docs.datomic.com/operation/architecture.html
(However, they cheat with DynamoDB, lol)
There are also some listed here:
https://davidgomes.com/separation-of-storage-and-compute-and...
And as you mention, Datomic uses DynamoDB as well (so, not a pure S3 solution). What I'm proposing is to use only object storage for everything, pay the price in latency, but not give up on throughput, cost, or consistency. The differentiator is that this comes with strict serializability guarantees, so this is not an eventually consistent system (https://jepsen.io/consistency/models/strong-serializable).
No matter how sophisticated the caching is, if you want to retain strict serializability, writes must be confirmed by S3 and reads must be validated against S3 before returning, which puts a lower bound on latency.
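To make the read-validation part concrete, here is an illustrative sketch; the cache layout is my own assumption, not how GlassDB actually does it:

    # Illustrative only: even a cache hit costs one round trip to S3,
    # because the cached ETag has to be re-validated before returning.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    BUCKET = "my-db-bucket"
    cache = {}  # key -> (etag, value); assumed layout, not GlassDB's

    def validated_read(key):
        if key in cache:
            etag, value = cache[key]
            try:
                # "Has this changed?" A 304 means the cache is still current.
                resp = s3.get_object(Bucket=BUCKET, Key=key, IfNoneMatch=etag)
            except ClientError as e:
                if e.response["ResponseMetadata"]["HTTPStatusCode"] == 304:
                    return value  # validated against S3, safe to return
                raise
        else:
            resp = s3.get_object(Bucket=BUCKET, Key=key)
        body = resp["Body"].read()
        cache[key] = (resp["ETag"], body)
        return body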
I focused a lot on throughput, which is the dimension we can actually optimize.
Hopefully that's clear from the blog, though.
Basically an in-memory database which uses S3 as cold storage. Definitely an interesting approach, but no transactions AFAICT.
Take a look at Delta Lake
https://notes.eatonphil.com/2024-09-29-build-a-serverless-ac...
* https://rockset.com/blog/separate-compute-storage-rocksdb/
* https://github.com/rockset/rocksdb-cloud
Keep in mind Rockset is definitely a bit biased towards vector search use cases.
BTW, the comparison was only there to give an idea about isolation levels; it wasn't meant to be a feature-to-feature comparison.
Perhaps I didn't make it prominent enough, but at some point I say that many SQL databases have key-value stores at their core, and implement a SQL layer on top (e.g. https://www.cockroachlabs.com/docs/v22.1/architecture/overvi...).
Basically, SQL can be a feature added later on top of a solid KV store as the base.
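As a toy illustration of that layering (a deliberate simplification, not CockroachDB's actual encoding):

    # Toy example: SQL rows flattened onto a key-value store.
    import json

    kv = {}  # stand-in for the underlying KV store

    def sql_insert(table, pk, row):
        # "INSERT INTO users VALUES (...)" becomes a single KV write.
        kv[table + "/" + str(pk)] = json.dumps(row)

    def sql_select_by_pk(table, pk):
        # "SELECT * FROM users WHERE id = ?" becomes a point lookup.
        raw = kv.get(table + "/" + str(pk))
        return json.loads(raw) if raw is not None else None

    sql_insert("users", 42, {"id": 42, "name": "ada"})
    print(sql_select_by_pk("users", 42))  # {'id': 42, 'name': 'ada'}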
Nicely detailed here: https://simonwillison.net/2024/Oct/13/zero-latency-sqlite-st... and https://developers.cloudflare.com/durable-objects/best-pract...
GlassDB is much more accessible for smaller-volume workloads, but gets very costly at high volume because of the per-transaction requests to S3. In turn, the consistency model is easier to reason about because the system is entirely stateless.
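To put rough numbers on it (approximate S3 us-east-1 list prices, and the requests-per-transaction counts are just assumptions):

    # Back-of-the-envelope only: prices are approximate us-east-1 list
    # prices, and the requests-per-transaction figures are assumptions.
    PUT_PRICE = 0.005 / 1000   # ~$ per PUT/COPY/POST request
    GET_PRICE = 0.0004 / 1000  # ~$ per GET request

    def monthly_request_cost(tx_per_second, puts_per_tx=3, gets_per_tx=3):
        tx_per_month = tx_per_second * 86400 * 30
        return tx_per_month * (puts_per_tx * PUT_PRICE + gets_per_tx * GET_PRICE)

    print(round(monthly_request_cost(1)))     # a few tens of dollars/month
    print(round(monthly_request_cost(1000)))  # tens of thousands/month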
I think DuckDB is very close to this. It's a bit different, because it's mostly for read-heavy workloads.
https://duckdb.org/docs/extensions/httpfs/s3api
(BTW great article, excellent read!)
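For example, with the DuckDB Python client you can query Parquet straight off S3 through httpfs (the bucket path and credentials below are placeholders):

    # Minimal sketch with the DuckDB Python client; bucket, prefix, and
    # credentials are placeholders.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    con.execute("""
        CREATE SECRET (
            TYPE s3,
            KEY_ID 'AKIA_PLACEHOLDER',
            SECRET 'SECRET_PLACEHOLDER',
            REGION 'us-east-1'
        );
    """)

    # Read-heavy analytics straight off object storage, no download step.
    print(con.execute(
        "SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')"
    ).fetchone())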
However, it will be much simpler with the new conditional writes.
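For example, the create-only flavor (PUT with If-None-Match: *) turns a claim into a single request; a minimal boto3 sketch, with placeholder names:

    # Minimal sketch of S3's newer conditional write (create-only PUT).
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def put_if_absent(bucket, key, data):
        """True if we created the object, False if it already existed."""
        try:
            s3.put_object(Bucket=bucket, Key=key, Body=data, IfNoneMatch="*")
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "PreconditionFailed":
                return False  # another writer got there first
            raise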
But this is entirely possible. You can wrap GlassDB transactions and encode multiple keys into the same object at a higher level. Transactions across different objects will still preserve the same isolation.
The current version is meant to be a base from which to build higher-level APIs, somewhat like FoundationDB.
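Purely as an illustration of that wrapping, with a generic transactional get/set API standing in (not GlassDB's actual interface):

    # Hypothetical txn.get/txn.set API used only for illustration; it is
    # NOT GlassDB's real interface. Several logical keys live inside one
    # stored object, so they commit or abort together with that object.
    import json

    def update_fields(txn, object_key, updates):
        raw = txn.get(object_key)                 # transactional read
        doc = json.loads(raw) if raw else {}
        doc.update(updates)                       # touch multiple logical keys
        txn.set(object_key, json.dumps(doc))      # one isolated write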
Is it the cheapest possible storage in existence? No, not if you compare it to raw disks you put in a rack yourself, but I also feel that wouldn't be an entirely fair comparison.
The flipside is that Cloudflare DO will be a lot faster.
Interesting that all these similar solutions are popping up now.
I think it would be interesting to combine a SQLite-per-object approach with transactions across different objects on top.
When I moved from S3 to DO, my bill went from hundreds to $20/mo. The only thing that changed was the hosting provider.
> In Databricks service deployments, we use a separate lightweight coordination service to ensure that only one client can add a record with each log ID.
The key difference is that Delta Lake implements MVCC and relies on a total ordering of transaction IDs, which is something I didn't want to do, to avoid forced synchronization points (multiple clients need to fight for IDs). This is certainly a trade-off: in my case you are forced to read the latest version or retry (but then you get strict serializability), while in Delta Lake you can rely on snapshot isolation, which might give you slightly stale but consistent data and minimizes retries on reads.
It also seems that you can't get transactions across different tables? Another interesting tradeoff.
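To illustrate the synchronization point: with totally ordered IDs, every committer has to claim the next log entry, so concurrent writers race for it. A rough sketch, loosely modeled on Delta Lake's _delta_log naming and using S3's newer create-only PUT (in practice Delta Lake on S3 used the separate coordination service quoted above):

    # Illustrative sketch of contention on totally ordered log IDs.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def try_claim(bucket, log_id, record):
        try:
            s3.put_object(
                Bucket=bucket,
                Key="_delta_log/%020d.json" % log_id,
                Body=record,
                IfNoneMatch="*",  # create-only: fails if this ID is taken
            )
            return True
        except ClientError as e:
            if e.response["Error"]["Code"] == "PreconditionFailed":
                return False  # another writer won this ID
            raise

    def commit(bucket, latest_id, record):
        log_id = latest_id + 1
        # Losers must retry with a higher ID (a real implementation would
        # also re-read the winning entries and re-check for conflicts).
        while not try_claim(bucket, log_id, record):
            log_id += 1
        return log_id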