439 points by david927

What are you working on? Any new ideas which you're thinking about?
slau No.44416757
A Parquet file compactor. I have a client whose data lakes are partitioned by date, and obviously they end up with thousands of files, each containing anywhere from a single row to a few thousand rows.

I’d estimate 30-40% of their S3 bill could be eliminated just by properly compacting and sorting the data. I took it as an opportunity to learn DuckDB and decided to build a tool that does this. I’ll release it as FOSS tomorrow or Tuesday.
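The comment doesn't say how the tool works, so what follows is only a minimal sketch of one way to plan this kind of compaction. It is not the author's tool. It assumes a hypothetical Hive-style `date=...` key layout, a made-up `ParquetObject` listing (key plus size, as an S3 listing would give you), and a 128 MiB target file size. It only generates DuckDB `COPY (SELECT ... FROM read_parquet([...]) ORDER BY ...) TO ... (FORMAT PARQUET)` statements and does not execute them. Actually running them against S3 (e.g. via DuckDB's httpfs extension, with credentials) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ParquetObject:
    # Hypothetical listing entry: full object URI and its size in bytes.
    key: str
    size_bytes: int


def partition_of(key: str) -> str:
    """Return the partition prefix (e.g. '.../date=2024-06-01') of an object key."""
    return key.rsplit("/", 1)[0]


def plan_compaction(objects, target_bytes=128 * 1024 * 1024, min_files=2):
    """Group files per partition into batches of at most ~target_bytes.

    Batches with fewer than min_files files are dropped, since rewriting
    a single file on its own gains nothing.
    """
    by_partition = defaultdict(list)
    for obj in objects:
        by_partition[partition_of(obj.key)].append(obj)

    plans = []
    for partition, objs in sorted(by_partition.items()):
        batch, batch_size = [], 0
        for obj in sorted(objs, key=lambda o: o.key):
            # Flush the current batch once the next file would overflow it.
            if batch and batch_size + obj.size_bytes > target_bytes:
                plans.append((partition, batch))
                batch, batch_size = [], 0
            batch.append(obj)
            batch_size += obj.size_bytes
        if batch:
            plans.append((partition, batch))
    return [(p, b) for p, b in plans if len(b) >= min_files]


def sql_quote(s: str) -> str:
    """Quote a string literal for SQL by doubling embedded single quotes."""
    return "'" + s.replace("'", "''") + "'"


def compaction_sql(partition, batch, sort_by, out_name):
    """Build a DuckDB COPY statement that merges and sorts one batch.

    sort_by is assumed to be a list of trusted column names; they are
    inserted into the SQL without quoting.
    """
    file_list = ", ".join(sql_quote(o.key) for o in batch)
    order = ", ".join(sort_by)
    target = sql_quote(f"{partition}/{out_name}")
    return (
        f"COPY (SELECT * FROM read_parquet([{file_list}]) ORDER BY {order}) "
        f"TO {target} (FORMAT PARQUET);"
    )


if __name__ == "__main__":
    listing = [
        ParquetObject("s3://lake/events/date=2024-06-01/part-0001.parquet", 4_000),
        ParquetObject("s3://lake/events/date=2024-06-01/part-0002.parquet", 6_000),
        ParquetObject("s3://lake/events/date=2024-06-02/part-0001.parquet", 9_000),
    ]
    for i, (partition, batch) in enumerate(plan_compaction(listing)):
        print(compaction_sql(partition, batch, ["event_time"], f"compacted-{i:04d}.parquet"))
```

Each statement rewrites one batch as a single file sorted by the chosen columns. Deleting the original small files, and only after the new file is confirmed written, is a separate step this sketch doesn't cover. Sorting matters as much as merging: it can let Parquet's per-row-group min/max statistics skip more data on filtered reads.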

1. No.44416777