183 points jonbaer | 1 comments
drej ◴[] No.45125792[source]
Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, from the development perspective as well as for debugging, onboarding, sharing, migrating, etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.

replies(7): >>45125845 #>>45126294 #>>45127389 #>>45127993 #>>45128144 #>>45128518 #>>45134858 #
robertkoss ◴[] No.45125845[source]
That is a false dichotomy. You can use SQL tools but still have to choose the instance type.

Especially when considering testability and composability, using a DataFrame API inside regular languages like Python is far superior IMO.

replies(2): >>45125950 #>>45126643 #
1. gigatexal ◴[] No.45126643[source]
Yeah it makes no sense.

Why is the dataframe approach getting hate when you’re talking about runtime details?

Sure, folks find the almost conversational style of SQL easier to understand than the dataframe API, but the other points make no difference.

If you’re a competent dev/data person and are productive with the dataframe API, then yay. Also, for setup, creating test data and such, it’s all objects and functions after all; if anything it’s better than the horribad experience of ORMs.
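
To make the "objects and functions" point concrete, here is a minimal sketch, assuming pandas as the dataframe library (the thread doesn't name one). The orders schema and the function names `make_orders`, `drop_refunds` and `total_per_customer` are made up for illustration. Test data is just a function that returns a frame, and each transformation is a plain function you can test on its own.

```python
import pandas as pd


def make_orders() -> pd.DataFrame:
    """Build a tiny in-memory test fixture (hypothetical schema)."""
    return pd.DataFrame(
        {
            "customer": ["a", "a", "b", "c"],
            "amount": [10.0, -5.0, 20.0, 7.5],
        }
    )


def drop_refunds(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows with a positive amount."""
    return df[df["amount"] > 0]


def total_per_customer(df: pd.DataFrame) -> pd.DataFrame:
    """Sum amounts per customer, returned as a flat frame."""
    return df.groupby("customer", as_index=False)["amount"].sum()


# Steps compose with .pipe, and each one can be unit-tested in isolation.
result = make_orders().pipe(drop_refunds).pipe(total_per_customer)
totals = dict(zip(result["customer"], result["amount"]))
```

Whether this beats the equivalent SQL is a matter of taste, but no database or fixture files are needed to exercise any single step.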