Iceberg, the Right Idea – The Wrong Spec

We just finished implementing Iceberg on top of a large set of Parquet files stored in S3. Being able to turn a lot of data files into a SQL database is a neat idea, but I absolutely understand the pain and confusion the author writes about, especially around how Iceberg handles metadata. It creates a lot of metadata files and makes a real mess of the directory. Some queries that I know should resolve to a single Parquet file take up to 30 seconds.

I don’t think we’ll scrap it, and there are certainly ways to speed up the slow parts of querying the catalog. But I’m also rooting for DuckLake to make this much more approachable by not shying away entirely from the idea of using a database.