
439 points david927 | 4 comments

What are you working on? Any new ideas you're thinking about?
1. slau
A Parquet file compactor. I have a client whose data lakes are partitioned by date, and they inevitably end up with thousands of files, each containing anywhere from a single row to a few thousand rows.

I’d estimate 30-40% of their S3 bill could be eliminated just by properly compacting and sorting the data. I took it as an opportunity to learn DuckDB, and decided to build a tool that does this. I’ll release it tomorrow or Tuesday as FOSS.
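
Not the released tool itself, but a minimal sketch of the core idea in Python with DuckDB: glob the small files in one date partition, sort them, and rewrite them as a single Parquet file. The bucket names, partition layout, and sort keys below are made up, and S3 credentials are assumed to be configured already.

    import duckdb

    con = duckdb.connect()
    # httpfs gives DuckDB s3:// access; assumes credentials are already set up
    # (e.g. via a DuckDB secret or environment variables).
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")

    # Hypothetical bucket, partition layout, and sort keys, for illustration only.
    partition = "date=2024-01-01"
    con.execute(f"""
        COPY (
            SELECT *
            FROM read_parquet('s3://my-bucket/events/{partition}/*.parquet')
            ORDER BY user_id, event_time  -- sorted data compresses and prunes better
        )
        TO 's3://my-bucket/events-compacted/{partition}/part-0.parquet'
        (FORMAT parquet, COMPRESSION zstd)
    """)

One large sorted file per partition replaces thousands of tiny ones, which is where the storage and per-request S3 savings come from.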

3. zX41ZdbW
Load the data into a ClickHouse MergeTree table instead? https://clickhouse.com/docs/engines/table-engines/mergetree-...
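
For context, a rough sketch (Python, via the clickhouse_connect client) of what a MergeTree table for this kind of data might look like; the schema, column names, and keys are invented for illustration, not taken from the thread.

    import clickhouse_connect

    # MergeTree keeps rows sorted by the ORDER BY key and merges small parts
    # in the background, so the tiny-file problem is handled by the engine.
    client = clickhouse_connect.get_client(host="localhost")
    client.command("""
        CREATE TABLE IF NOT EXISTS events
        (
            event_date Date,
            user_id    UInt64,
            event_time DateTime,
            payload    String
        )
        ENGINE = MergeTree
        PARTITION BY toYYYYMM(event_date)
        ORDER BY (user_id, event_time)
    """)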
4. slau
Published here: https://codeberg.org/unticks/comparqter