(pola.rs)

183 points jonbaer | 1 comments | 04 Sep 25 03:01 UTC

drej ◴[04 Sep 25 10:51 UTC] No.45125792

Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, both for development and for debugging, onboarding, sharing, migrating, etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.

replies(7): >>45125845 #>>45126294 #>>45127389 #>>45127993 #>>45128144 #>>45128518 #>>45134858 #

1. spenczar5 ◴[04 Sep 25 15:07 UTC] No.45128144

>>45125792 #

I agree, but there are other possibilities in between those two extremes, like Quivr [1]. Schemas are good, but they can be defined in Python and you get a lot more composability and modularity than you would find in SQL (or pandas, realistically).

1: https://github.com/B612-Asteroid-Institute/quivr
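
To illustrate what "schemas defined in Python" with composability can look like, here is a minimal sketch using only the standard library. It is not Quivr's API; the names `Position`, `Observation`, and `validate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class Position:
    # Hypothetical schema: one typed field per column.
    x: float
    y: float


@dataclass(frozen=True)
class Observation:
    # Schemas compose: an Observation embeds a Position schema.
    object_id: str
    position: Position


def validate(record) -> None:
    """Recursively check that each field holds a value of its declared type."""
    for f in fields(record):
        value = getattr(record, f.name)
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
        if is_dataclass(value):
            validate(value)  # descend into the nested schema


# Usage: a well-typed record passes silently; a mistyped one raises TypeError.
validate(Observation("obj-1", Position(1.0, 2.5)))
```

The point of the sketch is only that a schema expressed as a class can be reused inside other schemas and checked in one place, which is harder to express in plain SQL DDL.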

Polars Cloud and Distributed Polars now available