
FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)
397 points sebg | 1 comments
__mharrison__ ◴[] No.42197676[source]
Many of the complaints about Pandas here (and around the internet) are about the weird API. However, if you follow a few best practices, you never run into the issues folks are complaining about.

I wrote a nice article about chaining for Ponder. (Sadly, it looks like the Snowflake acquisition has removed that.) My book, Effective Pandas 2, goes deep into my best practices.
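
For anyone unfamiliar with the term, "chaining" here most likely means pandas method chaining: writing a transformation as one pipeline of method calls instead of a series of in-place mutations. A minimal sketch of the idea, not taken from the removed article or the book; the data and the column names (`city`, `temp_f`) are made up for illustration:

```python
import pandas as pd

# Hypothetical data; the column names are illustrative only.
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temp_f": [32.0, 50.0, 68.0, None],
})

# One chain, read top to bottom: no intermediate variables, no inplace=True.
summary = (
    raw
    .dropna(subset=["temp_f"])                                  # drop rows missing a reading
    .assign(temp_c=lambda df: (df["temp_f"] - 32) * 5 / 9)      # add a derived column
    .groupby("city", as_index=False)
    .agg(mean_temp_c=("temp_c", "mean"))                        # named aggregation
    .sort_values("mean_temp_c", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Each step returns a new DataFrame, so `raw` is left untouched and any step can be commented out to inspect the intermediate result.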

replies(1): >>42197792 #
otsaloma ◴[] No.42197792[source]
I don't quite agree, but if this were true, what would you tell a junior colleague in a code review? You can't use this function/argument/convention/etc. that you found in the official API documentation because... I don't like it? I think any team-maintained Pandas codebase will unavoidably drift toward inconsistent, bad code. If you're always working alone, then it can of course be a bit better.
replies(1): >>42197912 #
1. __mharrison__ ◴[] No.42197912[source]
I have strong opinions about Pandas. I've used it since it came out and have converged on patterns that make it easy to use.

(Disclaimer: I'm a corporate trainer and feed my family teaching folks how to work with their data using Pandas.)

When I teach about "readable" code, I caveat that it should be "readable for a specific audience". I hold that if you are a professional, that audience is other professionals. You should write code for professionals and not for newbies. Newbies should be trained up to write professional code. YMMV, but that is my bias based on experience seeing this work at some of the biggest companies in the world.