Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.
Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.
I've been using Malloy [2], which compiles to SQL (like Typescript compiles to Javascript), so instead of editing a 1000 line SQL script, it's only 18 lines of Malloy.
I'd love to see a blog post comparing a pandas approach to cleaning to an SQL/Malloy approach.
[1] https://www.youtube.com/watch?v=PFUZlNQIndo [2] https://www.malloydata.dev/
That's pandas. Polars builds on much of the same 50 years of progress in database research by offering a lazy DataFrame API which does query optimization, morsel-based columnar execution, predicate pushdown into file I/O, etc, etc.
Disclaimer: I work for Polars on said query execution.