
183 points, submitted by jonbaer
drej (No.45125792)
Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, for development as well as for debugging, onboarding, sharing, migrating, etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.
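To make the point concrete, here is a minimal sketch (an illustration, not from the comment) of a typical "group, filter, sort" transformation written as one SQL query. It uses Python's built-in `sqlite3` as a stand-in for an engine like DuckDB or Athena; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# sqlite3 (stdlib) stands in for DuckDB/Athena/Snowflake here;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5), ("carol", 2.0)],
)

# The whole transformation is a single declarative query: easy to read,
# paste into a console, share, or port to another SQL engine.
QUERY = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 3
ORDER BY total DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('alice', 17.5), ('bob', 5.0)]
```

Nothing in the query refers to cluster sizing or instance types; the engine decides how to execute it.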

spenczar5 (No.45128144), replying to drej
I agree, but there are other possibilities in between those two extremes, like Quivr [1]. Schemas are good, but they can be defined in Python and you get a lot more composability and modularity than you would find in SQL (or pandas, realistically).

1: https://github.com/B612-Asteroid-Institute/quivr
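As a rough illustration of "schemas defined in Python" that compose, here is a sketch using plain `dataclasses`. This is NOT Quivr's API; the `Coordinates` and `Observation` schemas and the `column_names`/`in_box` helpers are hypothetical, chosen only to show a nested schema and a reusable function that works against it.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


# Hypothetical schemas (not Quivr's API): one schema nested inside another.
@dataclass(frozen=True)
class Coordinates:
    ra: float   # degrees
    dec: float  # degrees


@dataclass(frozen=True)
class Observation:
    obs_id: str
    time_mjd: float
    coords: Coordinates  # reuses the Coordinates schema as a sub-structure


def column_names(schema, prefix=""):
    """Flatten a (possibly nested) dataclass schema into dotted column names."""
    names = []
    for f in fields(schema):
        if is_dataclass(f.type):
            # Recurse into nested schemas, e.g. coords -> coords.ra, coords.dec
            names.extend(column_names(f.type, prefix + f.name + "."))
        else:
            names.append(prefix + f.name)
    return names


def in_box(rows: List[Observation], ra_min: float, ra_max: float) -> List[Observation]:
    """A reusable, typed filter that any code holding Observations can import."""
    return [r for r in rows if ra_min <= r.coords.ra <= ra_max]


obs = [
    Observation("a", 60000.0, Coordinates(10.0, -5.0)),
    Observation("b", 60001.0, Coordinates(200.0, 30.0)),
]
print(column_names(Observation))  # ['obs_id', 'time_mjd', 'coords.ra', 'coords.dec']
print([o.obs_id for o in in_box(obs, 0.0, 90.0)])  # ['a']
```

The design point is that the schema and the functions over it live in ordinary Python modules, so they can be imported, tested, and combined like any other code.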