Lots of Pandas hate in this thread. However, for folks with lots of lines of Pandas in production, Fireducks can be a lifesaver.
I've had the chance to play with it on some of my code, and queries that ran in 8+ minutes came down to 20 seconds.
Re-writing in Polars involves more code changes.
However, with Pandas 2.2+ and Arrow, you can use .pipe to move data to Polars, run the slow computation there, and then zero-copy back to Pandas. Like so...
    (df
      # slow part
      .groupby(...)
      .agg(...)
    )

to:

    def polars_agg(df):
        return (pl.from_pandas(df)
          .group_by(...)
          .agg(...)
          .to_pandas()
        )

    (df
      .pipe(polars_agg)
    )