FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
397 points sebg | 3 comments
1. __mharrison__ ◴[] No.42197729[source]
Lots of Pandas hate in this thread. However, for folks with lots of lines of Pandas in production, FireDucks can be a lifesaver.

I've had the chance to play with it on some of my code, and queries that ran in 8+ minutes came down to 20 seconds.

Re-writing in Polars involves more code changes.

However, with Pandas 2.2+ and arrow, you can use .pipe to move data to Polars, run the slow computation there, and then zero copy back to Pandas. Like so...

    (df
     # slow part
     .groupby(...)
     .agg(...)
    )
to:

    import polars as pl

    def polars_agg(df):
      # convert to Polars, aggregate there, then return a Pandas DataFrame
      return (pl.from_pandas(df)
        .group_by(...)
        .agg(...)
        .to_pandas()
      )

    (df
      .pipe(polars_agg)
    )
replies(1): >>42205439 #
2. dr_kiszonka ◴[] No.42205439[source]
Very effective!
replies(1): >>42210418 #
3. __mharrison__ ◴[] No.42210418[source]
I'm hoping there's a double meaning in that. ;)