
224 points mlissner | 3 comments
wewewedxfgdf ◴[] No.45775817[source]
I tried DuckDB - liked it a lot - was ready to go further.

But I found it a real hassle to get it to use the right number of threads and the right amount of memory.

This led to lots of crashes. If you look at the project's GitHub issues you will see many out-of-memory (OOM) errors.

And then there was some index bug that caused crashes seemingly unrelated to memory.

Life is too short for crashy database software so I reluctantly dropped it. I was disappointed because it was exactly what I was looking for.

replies(4): >>45776001 #>>45776020 #>>45776900 #>>45777350 #
1. thenaturalist ◴[] No.45777350[source]
How long ago was this? Or can you share more context about the data and memory sizes you experienced this with?

DuckDB introduced spilling to disk and some other memory-management tweaks about a year ago: https://duckdb.org/2024/07/09/memory-management

replies(1): >>45777572 #
2. wewewedxfgdf ◴[] No.45777572[source]
3 days ago.

The final straw was an index which generated fine on MacOS and failed on Linux - exact same code.

Machine had plenty of RAM.

The thing is, it is really the application's responsibility to regulate its behavior based on available memory. Crashing should not be an option, but that's the way DuckDB is built.

replies(1): >>45783704 #
3. alex-korr ◴[] No.45783704[source]
I had the same experience - everything runs great on an AWS Linux EC2 instance with 32GB of memory, while the same workload in a Docker container on ECS with 32GB allocated gets an OOM. For smaller workloads DuckDB is fantastic, but there's a certain point at which Spark or Snowflake start to make more sense.