(hwisnu.bearblog.dev)

398 points sebg | 1 comments | 14 Nov 24 11:48 UTC | HN request time: 0.204s | source

Show context

rich_sasha ◴[20 Nov 24 11:56 UTC] No.42193043[source]▶

It's a bit sad for me. I find the biggest issue for me with pandas is the API, not the speed.

So many foot guns, poorly thought through functions, 10s of keyword arguments instead of good abstractions, 1d and 2d structures being totally different objects (and no higher-order structures). I'd take 50% of the speed for a better API.

I looked at Polars, which looks neat, but seems made for a different purpose (data pipelines rather than building models semi-interactively).

To be clear, this library might be great, it's just a shame for me that there seems no effort to make a Pandas-like thing with better API. Maybe time to roll up my sleeves...

replies(22): >>42193093 #>>42193139 #>>42193143 #>>42193309 #>>42193374 #>>42193380 #>>42193693 #>>42193936 #>>42194067 #>>42194113 #>>42194302 #>>42194361 #>>42194490 #>>42194544 #>>42194670 #>>42195628 #>>42196720 #>>42197192 #>>42197489 #>>42198158 #>>42199832 #>>42200060 #

1. otsaloma ◴[20 Nov 24 15:10 UTC] No.42194544[source]▶

>>42193043 #

Agreed, never had a problem with the speed of anything NumPy or Arrow based.

Here's my alternative: https://github.com/otsaloma/dataiter https://dataiter.readthedocs.io/en/latest/_static/comparison...

Planning to switch to NumPy 2.0 strings soon. Other than that I feel all the basic operations are fine and solid.

Note for anyone else rolling up their sleeves: You can get quite far with pure Python when building on top of NumPy (or maybe Arrow). The only thing I found needing more performance was group-by-aggregate, where Numba seems to work OK, although a bit difficult as a dependency.

↑

FireDucks: Pandas but Faster