
93 points mbrt | 1 comment
Onavo ◴[] No.42170159[source]
Congrats on reinventing the data lake? This is actually how most of the newer generations of "cloud native" databases work, where they separate compute and storage. The key is that they have a more sophisticated caching layer so that the latency cost of a query can be amortized across requests.
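
To make the amortization point concrete, here is a minimal sketch of a read-through cache sitting in front of object storage. Everything in it is an assumption for illustration: `FakeObjectStore` stands in for S3/GCS and counts round trips instead of actually paying the latency, and `ReadThroughCache` is a plain LRU, far simpler than what a real "cloud native" database ships.

```python
from collections import OrderedDict


class FakeObjectStore:
    """Hypothetical stand-in for S3/GCS: counts round trips instead of sleeping."""

    def __init__(self, blobs):
        self._blobs = dict(blobs)
        self.round_trips = 0

    def get(self, key):
        # Every call here represents one slow network request.
        self.round_trips += 1
        return self._blobs[key]


class ReadThroughCache:
    """Small LRU cache: a miss fetches from the store, a hit stays local."""

    def __init__(self, store, capacity=2):
        self.store = store
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            # Hit: mark as most recently used, no round trip.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: pay the object storage latency once, then remember the result.
        value = self.store.get(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


store = FakeObjectStore({"pages/0001": b"alpha", "pages/0002": b"beta"})
cache = ReadThroughCache(store, capacity=2)

# 100 reads of the same object cost a single round trip to the store.
for _ in range(100):
    cache.get("pages/0001")
print(store.round_trips)
```

The first read pays the full latency and the other 99 are served locally, which is the sense in which the cost is spread across requests.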
replies(2): >>42170647 #>>42238491 #
mbrt ◴[] No.42170647[source]
It's my understanding that the newer generation of data lakes still makes use of a tiny, strongly consistent metadata database to keep track of what is where. This is orders of magnitude smaller than what you'd have by putting everything in the same database, but it's still there. This is also the case in newer data streaming platforms (e.g. https://www.warpstream.com/blog/kafka-is-dead-long-live-kafk...).
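
A rough sketch of that split, to show what I mean. This is not how any specific product does it: `ObjectStore`, `MetadataCatalog`, and the `events` table name are all made up for illustration, and a lock-protected dict stands in for the strongly consistent database. The bulk data lives in immutable objects; the catalog only maps a table name to a version and a list of object keys, and commits go through a compare-and-swap on the version.

```python
import threading


class ObjectStore:
    """Hypothetical bulk storage: immutable blobs keyed by name."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class MetadataCatalog:
    """Tiny consistent map: table name -> (version, tuple of object keys)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tables = {}

    def read(self, table):
        with self._lock:
            # Unknown tables start at version 0 with no objects.
            return self._tables.get(table, (0, ()))

    def commit(self, table, expected_version, keys):
        """Compare-and-swap: succeed only if nobody committed in between."""
        with self._lock:
            version, _ = self._tables.get(table, (0, ()))
            if version != expected_version:
                return False
            self._tables[table] = (version + 1, tuple(keys))
            return True


store = ObjectStore()
catalog = MetadataCatalog()

# Writer: upload the data first, then publish it with a small metadata commit.
store.put("events/part-0001", b"row1\nrow2\n")
version, keys = catalog.read("events")
ok = catalog.commit("events", version, keys + ("events/part-0001",))

# A second writer still holding the old version loses the race.
stale = catalog.commit("events", version, ("events/part-9999",))
print(ok, stale, catalog.read("events"))
```

The catalog only ever holds a version number and a handful of keys per table, which is why it can stay so small compared to the data itself.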

I'm curious to hear if you have examples of any database using only object storage as a backend, because back when I started, I couldn't find any.

replies(3): >>42170771 #>>42239063 #>>42241578 #
1. vineyardmike ◴[] No.42241578[source]
> if you have examples of any database using only object storage as a backend

I think DuckDB is very close to this. It's a bit different, because it's mostly for read-heavy workloads.

https://duckdb.org/docs/extensions/httpfs/s3api

(BTW great article, excellent read!)