
224 points | mlissner | 1 comment
jdnier ◴ No.45776183
Yesterday there was a somewhat similar DuckDB post, "Frozen DuckLakes for Multi-User, Serverless Data Access". https://news.ycombinator.com/item?id=45702831
replies(2): >>45776288 >>45777908
1. pacbard ◴ No.45777908
I set up something similar at work, but before the DuckLake format was available, so it just uses manually generated Parquet files saved to a bucket plus a light DuckDB catalog that exposes the Parquet files as views. This lets us update the Parquet files through our ETL process and only refresh the catalog when there is a schema change.
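A minimal sketch of what such a view-based catalog could look like (the bucket path and table names here are hypothetical, purely for illustration):

```sql
-- Illustrative only: bucket and table names are made up.
-- A "light" DuckDB catalog file whose views point at Parquet in object storage.
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE VIEW orders AS
    SELECT * FROM read_parquet('s3://example-bucket/orders/*.parquet');

CREATE OR REPLACE VIEW customers AS
    SELECT * FROM read_parquet('s3://example-bucket/customers/*.parquet');
```

Because the views are resolved at query time, the ETL job can overwrite or add Parquet files in place and readers see the new data automatically; the catalog file itself only needs to be regenerated when the schema (the set of tables or their columns) changes.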

We didn't find the frozen DuckLake setup useful for our use case, mostly because a frozen catalog runs somewhat counter to the DuckLake philosophy and the cost-benefit wasn't there over a regular DuckDB catalog. It also made updates cumbersome: you have to pull the DuckLake catalog, commit the changes, and re-upload it (instead of just updating the Parquet files directly). I get that we're giving up DuckLake's time travel, but that's not critical for us, and if it ever becomes important, we'd just stand up a PostgreSQL database to manage the catalog.