
FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
374 points sebg | 1 comments
ssivark No.42194215
Setting aside complaints about the Pandas API, it's frustrating that we might see the community of a popular "standard" tool fragment into two or even three ecosystems (for libraries with slightly incompatible APIs) -- seemingly all with the value proposition of "making it faster". Based on the machine learning experience over the last decade, this kind of churn in tooling is somewhat exhausting.

I wonder how much of this is fundamental to the common approach of writing libraries in Python with the processing-heavy parts delegated to C/C++ -- that the expressive parts cannot be fast and the fast parts cannot be expressive. Also, whether Rust (for polars and other newer-generation libraries) changes this tradeoff substantially enough.
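As a rough illustration of that split, here is a minimal sketch using only the standard library. The function names are made up for the example, and it assumes CPython, where the built-in `sum()` runs its loop in C. Both functions return the same result; the only difference is whether the loop runs in the interpreter or in compiled code. Timings will vary by machine, so none are quoted here.

```python
import timeit


def python_level_sum(values):
    """Add the values one by one in an interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def delegated_sum(values):
    """Hand the whole loop to the built-in sum() (implemented in C in CPython)."""
    return sum(values)


data = list(range(200_000))

# Both paths compute the same result; only where the loop runs differs.
assert python_level_sum(data) == delegated_sum(data)

loop_time = timeit.timeit(lambda: python_level_sum(data), number=5)
builtin_time = timeit.timeit(lambda: delegated_sum(data), number=5)
print(f"Python loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

The delegated version is fast only because the operation was anticipated by whoever wrote the compiled part; anything custom inside the loop body falls back to the interpreter, which is the tension described above.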

replies(2): >>42194643 #>>42194990 #
1. tgtweak No.42194643
I think it's a natural path of software life that compatibility often stands in the way of improving the API.

This really does seem like a rare case where everything speeds up without breaking compatibility. If you want a fast, revised API for your new project (or to rework your existing one), then you have a solution for that with Polars. If you just want your existing code/workloads to run faster, you have a solution for that now.

It's OK to have a slow, compatible, static codebase to build things on, then optimize as needed.

Trying to "fix" the API would break a ton of existing code, including existing plugins. Orphaning those projects and codebases would be the wrong move; those things take a decade to flesh out.

This really doesn't seem like the worst outcome, and doesn't seem to be creating a huge fragmented mess.