Transactional Object Storage?

(blog.mbrt.dev)

93 points mbrt | 1 comments | 17 Nov 24 13:20 UTC

Onavo ◴[18 Nov 24 06:13 UTC] No.42170159[source]▶

Congrats on reinventing the data lake? This is actually how most of the newer generations of "cloud native" databases work, where they separate compute and storage. The key is that they have a more sophisticated caching layer so that the latency cost of a query can be amortized across requests.

replies(2): >>42170647 #>>42238491 #

mbrt ◴[18 Nov 24 08:09 UTC] No.42170647[source]▶

>>42170159 #

It's my understanding that the newer generation of data lakes still makes use of a tiny, strongly consistent metadata database to keep track of what is where. This is orders of magnitude smaller than what you'd have by putting everything in the same database, but it's still there. This is also the case in newer data streaming platforms (e.g. https://www.warpstream.com/blog/kafka-is-dead-long-live-kafk...).

I'm curious to hear if you have examples of any database using only object storage as a backend, because back when I started, I couldn't find any.

replies(3): >>42170771 #>>42239063 #>>42241578 #

1. vineyardmike ◴[26 Nov 24 00:45 UTC] No.42241578[source]▶

>>42170647 #

> if you have examples of any database using only object storage as a backend

I think DuckDB is very close to this. It's a bit different, because it's mostly for read-heavy workloads.

https://duckdb.org/docs/extensions/httpfs/s3api

(BTW great article, excellent read!)
