Polars Cloud and Distributed Polars now available

(pola.rs)

183 points jonbaer | 1 comments | 04 Sep 25 03:01 UTC

drej ◴[04 Sep 25 10:51 UTC] No.45125792[source]▶

Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, from the development perspective as well as for debugging, onboarding, sharing, migrating, etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.

replies(7): >>45125845 #>>45126294 #>>45127389 #>>45127993 #>>45128144 #>>45128518 #>>45134858 #

mrtimo ◴[04 Sep 25 14:01 UTC] No.45127389[source]▶

>>45125792 #

I agree with this 100%. The creator of duckdb argues, in the first 5 minutes of his talk here [1], that people using pandas are missing out on 50 years of progress in database research.

I've been using Malloy [2], which compiles to SQL (like TypeScript compiles to JavaScript), so instead of editing a 1000-line SQL script, it's only 18 lines of Malloy.

I'd love to see a blog post comparing a pandas approach to cleaning to an SQL/Malloy approach.

[1] https://www.youtube.com/watch?v=PFUZlNQIndo [2] https://www.malloydata.dev/

replies(3): >>45127742 #>>45128223 #>>45128330 #

fumeux_fume ◴[04 Sep 25 15:15 UTC] No.45128223[source]▶

>>45127389 #

In the same talk, Mark acknowledges that "for data science workflows, database systems are frustrating and slow." Granted, DuckDB is an attempt to fix that, but most data scientists don't get to choose what database the data is stored in.

replies(1): >>45128536 #

willvarfar ◴[04 Sep 25 15:42 UTC] No.45128536[source]▶

>>45128223 #

(I use duckdb to query data stored in parquet files)

replies(1): >>45130116 #

mrtimo ◴[04 Sep 25 17:54 UTC] No.45130116[source]▶

>>45128536 #

Same. But I use Malloy, which uses duckdb to query data stored in hundreds of parquet files (as if they were one big file).

replies(1): >>45135211 #

willvarfar ◴[05 Sep 25 05:21 UTC] No.45135211[source]▶

>>45130116 #

I haven't looked at Malloy, but I do regularly scan lots of parquet files using wildcards etc. from duckdb. It's a neat built-in duckdb feature.