(hwisnu.bearblog.dev)

398 points sebg | 2 comments | 14 Nov 24 11:48 UTC

imranq ◴[20 Nov 24 12:37 UTC] No.42193396[source]▶

>>42135303 (OP) #

This presentation does a good job distilling why FireDucks is so fast:

https://fireducks-dev.github.io/files/20241003_PyConZA.pdf

The main reasons are:

* multithreading

* rewriting base pandas functions like dropna in C++

* a built-in compiler that removes unused code

Pretty impressive, especially given that you just write import fireducks.pandas as pd instead of import pandas as pd and you are good to go.

However, I think that if you are using a pandas function that wasn't rewritten, you might not see the speedups.
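To make the "compiler to remove unused code" point above concrete, here is a toy sketch in plain Python. It is not FireDucks' implementation and it uses no FireDucks or pandas API. LazyTable, assign and collect are hypothetical names made up for illustration. The only idea it shows is deferring work until a result is requested, then skipping any derived column the result does not depend on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class LazyTable:
    """Hypothetical deferred table: records column definitions, computes nothing yet."""
    rows: List[Row]
    # column name -> (function of a row, names of the columns it reads)
    derived: Dict[str, Tuple[Callable[[Row], float], Tuple[str, ...]]] = field(default_factory=dict)
    # log of derived columns that were actually evaluated (shared across the plan)
    computed: List[str] = field(default_factory=list)

    def assign(self, name, func, reads):
        # Only record the definition. Assumes `reads` lists base columns
        # or columns defined earlier in the plan.
        new = dict(self.derived)
        new[name] = (func, tuple(reads))
        return LazyTable(self.rows, new, self.computed)

    def _needed(self, wanted):
        # Walk the dependencies backwards from the requested columns.
        needed, stack = set(), list(wanted)
        while stack:
            col = stack.pop()
            if col in needed:
                continue
            needed.add(col)
            if col in self.derived:
                stack.extend(self.derived[col][1])
        return needed

    def collect(self, columns):
        # Evaluate only the derived columns the requested output depends on.
        needed = self._needed(columns)
        order = [name for name in self.derived if name in needed]
        out = []
        for row in self.rows:
            work = dict(row)
            for name in order:
                func, _ = self.derived[name]
                work[name] = func(work)
            out.append({c: work[c] for c in columns})
        self.computed.extend(order)
        return out


table = LazyTable([{"price": 10.0, "qty": 2.0}, {"price": 4.0, "qty": 3.0}])
plan = (
    table
    .assign("total", lambda r: r["price"] * r["qty"], ["price", "qty"])
    .assign("unused", lambda r: r["price"] ** 0.5, ["price"])  # never requested
    .assign("doubled", lambda r: r["total"] * 2, ["total"])
)
result = plan.collect(["doubled"])
print(result)         # [{'doubled': 40.0}, {'doubled': 24.0}]
print(plan.computed)  # ['total', 'doubled']
```

The "unused" column is defined but never evaluated, because nothing in the requested output reads it. An eager library would have computed it at the assign step.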

replies(1): >>42193761 #

faizshah ◴[20 Nov 24 13:28 UTC] No.42193761[source]▶

>>42193396 #

It’s not clear to me why this would be faster than Polars, DuckDB, Vaex or ClickHouse. They seem to be taking the same approach: multithreading, optimizing the query plan, using Arrow, and optimizing core functions like group by.

replies(2): >>42193939 #>>42195630 #

1. mettamage ◴[20 Nov 24 13:53 UTC] No.42193939[source]▶

>>42193761 #

Maybe it isn’t? Maybe they just want a fast pandas api?

replies(1): >>42195755 #

2. geysersam ◴[20 Nov 24 16:48 UTC] No.42195755[source]▶

>>42193939 (TP) #

According to their benchmarks they are faster. Not by a lot, but still significantly.

FireDucks: Pandas but Faster