
119 points by tosh | 1 comment
ikesau No.42157797
> This means your optimizations need to be applied by hand, which is sustainable if your data starts changing.

Seems like a missing "un" here

Compelling article! I've already found DuckDB to be the most ergonomic tool for quick and dirty wrangling; it's good to know it can handle massive jobs too.
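
For the curious, this is the kind of quick and dirty wrangling I mean (just a sketch; 'events.csv' and the column names are made-up placeholders):

    import duckdb

    # Query the CSV in place: no schema definition or import step first
    # ('events.csv' and user_id are placeholder names)
    duckdb.sql("""
        SELECT user_id, count(*) AS n
        FROM 'events.csv'
        GROUP BY user_id
        ORDER BY n DESC
        LIMIT 10
    """).show()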

1. Nihilartikel No.42159012
I regularly use DuckDB on datasets of 1B+ rows, with nasty string columns that may be over 10 MB per value in the outliers. Mostly it just works, and fast too! When it doesn't, I'll usually just dump to Parquet and hit it with Spark SQL, but that is the exception rather than the rule.
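
Roughly, that escape hatch looks like this (just a sketch; the database, table, and file names are placeholders, not my actual setup):

    import duckdb
    from pyspark.sql import SparkSession

    # DuckDB side: dump the troublesome table to Parquet
    # ('my.duckdb', 'events', and 'events.parquet' are placeholder names)
    con = duckdb.connect("my.duckdb")
    con.execute("COPY (SELECT * FROM events) TO 'events.parquet' (FORMAT PARQUET)")

    # Spark side: read the Parquet back and run the heavy query with Spark SQL
    spark = SparkSession.builder.getOrCreate()
    spark.read.parquet("events.parquet").createOrReplaceTempView("events")
    spark.sql("SELECT user_id, count(*) FROM events GROUP BY user_id").show()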