FireDucks: Pandas but Faster

(hwisnu.bearblog.dev)

398 points sebg | 2 comments | 14 Nov 24 11:48 UTC

rich_sasha ◴[20 Nov 24 11:56 UTC] No.42193043[source]▶

It's a bit sad for me. I find the biggest issue with pandas is the API, not the speed.

So many foot guns, poorly thought through functions, tens of keyword arguments instead of good abstractions, 1d and 2d structures being totally different objects (and no higher-order structures). I'd take 50% of the speed for a better API.

I looked at Polars, which looks neat, but seems made for a different purpose (data pipelines rather than building models semi-interactively).

To be clear, this library might be great, it's just a shame for me that there seems to be no effort to make a Pandas-like thing with a better API. Maybe time to roll up my sleeves...

replies(22): >>42193093 #>>42193139 #>>42193143 #>>42193309 #>>42193374 #>>42193380 #>>42193693 #>>42193936 #>>42194067 #>>42194113 #>>42194302 #>>42194361 #>>42194490 #>>42194544 #>>42194670 #>>42195628 #>>42196720 #>>42197192 #>>42197489 #>>42198158 #>>42199832 #>>42200060 #

stared ◴[20 Nov 24 15:03 UTC] No.42194490[source]▶

>>42193043 #

Yes, every time I write df[df.sth == val], a tiny part of me dies.

For comparison, dplyr offers a lot of elegant functionality, and the functional approach in Pandas often feels like an afterthought. If R is cleaner than Python, that says a lot (as a side note: the same story for ggplot2 and matplotlib).

Another surprise for friends coming from non-Python backgrounds is the lack of column-level type enforcement. You write df.loc[:, "col1"] and hope it works, with all checks happening at runtime. It would be amazing if Pandas integrated something like Pydantic out of the box.
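To make the complaint concrete, here is a minimal sketch of what an up-front column check could look like. The helper `validate_columns` and its `schema` argument are hypothetical names made up for illustration, not part of pandas or Pydantic, and the check still runs at runtime; it only moves the failure to one explicit place.

```python
import pandas as pd


def validate_columns(df: pd.DataFrame, schema: dict[str, str]) -> pd.DataFrame:
    """Hypothetical helper: fail early if a column is missing or has an unexpected dtype."""
    for name, expected in schema.items():
        if name not in df.columns:
            raise KeyError(f"missing column: {name!r}")
        actual = str(df[name].dtype)
        if actual != expected:
            raise TypeError(f"column {name!r}: expected {expected}, got {actual}")
    return df


# Dtypes are set explicitly so the example does not depend on platform defaults.
df = pd.DataFrame({
    "col1": pd.Series([1, 2, 3], dtype="int64"),
    "col2": pd.Series([0.5, 1.5, 2.5], dtype="float64"),
})

# Passes: both columns exist with the declared dtypes.
checked = validate_columns(df, {"col1": "int64", "col2": "float64"})
```

Calling `validate_columns(df, {"col1": "float64"})` on the same frame would raise a `TypeError`, which is roughly the kind of guarantee one would want before touching `df.loc[:, "col1"]`.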

I still remember when Pandas first came out—it was fantastic to have a tool that replaced hand-rolled data structures using NumPy arrays and column metadata. But that was quite a while ago, and the ecosystem has evolved rapidly since then, including Python’s gradual shift toward type checking.

replies(3): >>42195076 #>>42197375 #>>42202116 #

bdjsiqoocwk ◴[21 Nov 24 08:02 UTC] No.42202116[source]▶

>>42194490 #

Nonsense, if you understand why df[df.sh == val] works, you'll see it's great. If you don't, you can also do df.query("sh == val").
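For readers who have not seen the two spellings side by side, a small sketch follows. The frame, the column `x`, and the value of `val` are made up for illustration. One assumption worth flagging: when `val` is a local Python variable rather than a column, `query` needs it written as `@val`.

```python
import pandas as pd

df = pd.DataFrame({"sh": [1, 2, 1, 3], "x": [10, 20, 30, 40]})
val = 1

# The comparison builds a boolean Series with one entry per row of df.
mask = df.sh == val

# Indexing with that Series keeps the rows where the mask is True.
by_mask = df[mask]

# Same filter as a query string; @val refers to the local variable val.
by_query = df.query("sh == @val")
```

Both results contain the rows with index labels 0 and 2, so the bracket form is just "build a mask, then select with it" written on one line.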

replies(1): >>42214311 #

1. stared ◴[22 Nov 24 14:57 UTC] No.42214311[source]▶

>>42202116 #

If you type df[df2.sh == val] you will understand why it is not great.
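A minimal sketch of the failure mode being hinted at, with made-up data: when the two frames happen to share the same index, pandas accepts a mask built from the other frame without complaint and filters on the wrong values.

```python
import pandas as pd

df = pd.DataFrame({"sh": [1, 2, 3]})
df2 = pd.DataFrame({"sh": [3, 2, 1]})  # same default index, different values
val = 1

# Intended: rows of df where df.sh equals val (index label 0).
intended = df[df.sh == val]

# Mistyped: the mask comes from df2, so df is filtered by df2's values.
# No error is raised here; the result is index label 2, where df.sh is 3.
mistyped = df[df2.sh == val]
```

As far as I know, pandas only warns or raises when the mask's index does not line up with the frame's index, so the mistake goes unnoticed in exactly the case where the two frames look alike.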

replies(1): >>42218233 #

2. bdjsiqoocwk ◴[22 Nov 24 23:33 UTC] No.42218233[source]▶

>>42214311 (TP) #

That might or might not make sense depending on what df1 and df2 contain.

But what are you saying, that if you type the wrong thing you might get wrong results? Yes, coding is like that.

What's your point? Make a point.
