←back to thread

FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
374 points sebg | 5 comments | | HN request time: 0.833s | source
Show context
rich_sasha ◴[] No.42193043[source]
It's a bit sad for me. I find the biggest issue for me with pandas is the API, not the speed.

So many foot guns, poorly thought through functions, 10s of keyword arguments instead of good abstractions, 1d and 2d structures being totally different objects (and no higher-order structures). I'd take 50% of the speed for a better API.

I looked at Polars, which looks neat, but seems made for a different purpose (data pipelines rather than building models semi-interactively).

To be clear, this library might be great, it's just a shame for me that there seems no effort to make a Pandas-like thing with better API. Maybe time to roll up my sleeves...

replies(22): >>42193093 #>>42193139 #>>42193143 #>>42193309 #>>42193374 #>>42193380 #>>42193693 #>>42193936 #>>42194067 #>>42194113 #>>42194302 #>>42194361 #>>42194490 #>>42194544 #>>42194670 #>>42195628 #>>42196720 #>>42197192 #>>42197489 #>>42198158 #>>42199832 #>>42200060 #
1. amelius ◴[] No.42193374[source]
Yes. Pandas turns 10x developers into .1x developers.
replies(1): >>42193785 #
2. berkes ◴[] No.42193785[source]
It does to me. Well, a 1x developer into a .01x dev in my case.

My conclusion was that pandas is not for developers. But for one-offs by managers, data-scientists, scientists, and so on. And maybe for "hackers" who cludge together stuff 'till it works and then hopefully never touch it.

Which made me realize such thoughts can come over as smug, patronizing or belittling. But they do show how software can be optimized for different use-cases.

The danger then lies into not recognizing these use-cases when you pull in smth like pandas. "Maybe using panda's to map and reduce the CSVs that our users upload to insert batches isn't a good idea at all".

This is often worsened by the tools/platforms/lib devs or communities not advertising these sweet spots and limitations. Not in the case of Pandas though: that's really clear about this not being a lib or framework for devs, but a tool(kit) to do data analysis with. Kudo's for that.

replies(2): >>42194002 #>>42201883 #
3. analog31 ◴[] No.42194002[source]
I'm one of those people myself, and have whittled my Pandas use down to displaying pretty tables in Jupyter. Everything else I do in straight Numpy.
replies(1): >>42196508 #
4. theLiminator ◴[] No.42196508{3}[source]
Imo numpy is not better than pandas for the things you'd use pandas for, though polars is far superior.
5. fastasucan ◴[] No.42201883[source]
>My conclusion was that pandas is not for developers. But for one-offs by managers, data-scientists, scientists, and so on. And maybe for "hackers" who cludge together stuff 'till it works and then hopefully never touch it.

It doesn't work for me so it can't work for anyone?