
183 points by jonbaer | 2 comments
drej | No.45125792
Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, from the development perspective as well as for debugging, onboarding, sharing, migrating, and so on.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what an m6.xlarge is and how it differs from a c6g.large.
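To make the contrast concrete, here is the same toy aggregation written both ways, SQL via DuckDB and the DataFrame API via Polars. The table and column names are invented for illustration; this assumes a recent Polars (group_by spelling) and relies on DuckDB's ability to query an in-scope frame by its variable name:

    import duckdb
    import polars as pl

    # Toy data; the shape and values are made up for illustration.
    df = pl.DataFrame({
        "region": ["eu", "eu", "us", "us"],
        "amount": [10.0, 20.0, 5.0, 15.0],
    })

    # SQL flavor: DuckDB scans the Polars frame `df` in place by name.
    sql_result = duckdb.sql(
        "SELECT region, SUM(amount) AS total FROM df GROUP BY region"
    ).pl()

    # DataFrame flavor: the same aggregation through the Polars API.
    df_result = df.group_by("region").agg(
        pl.col("amount").sum().alias("total")
    )

Both produce the same result; the argument above is that the SQL version is the one an analyst can read, share, and port between engines without learning a library.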

1. drej | No.45126294
Fun aside: I actually used Polars for a bit. The first time I tried it, I thought it was broken, because it finished processing so quickly I assumed it had silently exited or something.

So I'm definitely a fan, IF you need the DataFrame API. My point was that most people don't need it, and it often just stands in the way. That's all.
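For context, part of why Polars can feel instant is its lazy API: it builds a query plan and only executes on collect(), pushing filters into the scan and running in parallel. A minimal sketch, with a made-up file name:

    import polars as pl

    # Lazily scan a hypothetical Parquet file; nothing is read yet.
    lazy = (
        pl.scan_parquet("events.parquet")
        .filter(pl.col("status") == "ok")
        .group_by("user_id")
        .agg(pl.len().alias("n_events"))
    )

    # The whole plan executes here, with the filter pushed into the
    # scan and the work spread across cores.
    result = lazy.collect()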

2. orochimaaru | No.45128630
Polars is very nice. I've used it off and on. The option to write Rust UDFs for performance, plus the easy integration of Rust with Python via PyO3, will make it a real contender.
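A sketch of what that looks like from the Python side, assuming a hypothetical my_udfs extension module has already been compiled from Rust with PyO3 and maturin. The module and its double_all function are invented stand-ins for real numeric work; map_batches is the actual Polars hook for per-batch functions:

    import polars as pl

    # `my_udfs` is a hypothetical extension module built from Rust
    # with PyO3 + maturin; double_all(list[float]) -> list[float]
    # runs as native compiled code.
    from my_udfs import double_all

    df = pl.DataFrame({"x": [1.0, 2.0, 3.0]})

    out = df.with_columns(
        pl.col("x")
        .map_batches(lambda s: pl.Series(double_all(s.to_list())))
        .alias("x2")
    )

To avoid the to_list round trip through Python objects entirely, the pyo3-polars crate lets a Rust function be registered as a Polars expression plugin and called zero-copy.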

Yes, I know Spark and Scala exist. I use them. But the JVM-based engine and the clunky Python gateway (Py4J) hurt both performance and capacity usage. Having your primary processing engine natively compiled and running in the same process always helps.
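For contrast, a plain Python UDF in PySpark illustrates the boundary being described (standard PySpark API, toy data): the JVM engine has to hand every value to a separate Python worker process and back.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    spark = SparkSession.builder.appName("udf-demo").getOrCreate()
    df = spark.createDataFrame([(1.0,), (2.0,)], ["x"])

    # Each value is serialized out of the JVM, shipped to a Python
    # worker process, doubled there, and serialized back. That round
    # trip is the overhead an in-process native engine avoids.
    double = udf(lambda v: v * 2.0, DoubleType())
    df.withColumn("x2", double("x")).show()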