
119 points tosh | 1 comments
1. lmeyerov No.42162004
Always a fan of query plan articles!

Note: the dig at dataframe libs deserves some care, in case you read it as meaning duckdb can optimize and they cannot.

Dask, Polars, and others pick a lazy default in order to make distribution and other optimizations easier. As long as you stay in their pure ('vectorized') fragments, the same scheduler rewriting opportunity is there.
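To make the point concrete, here is a minimal sketch of why laziness opens the door to rewriting: once operations are recorded as a plan instead of being run immediately, a rewrite pass can reorder them before execution. The `Scan`, `Project`, and `Filter` classes and the `optimize` / `execute` functions are hypothetical toys in plain Python, not the API of Dask, Polars, or duckdb, and the only rule shown is a simple filter pushdown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy plan nodes; NOT the API of Dask, Polars, or duckdb.
@dataclass(frozen=True)
class Scan:
    rows: tuple  # tuple of dict rows

@dataclass(frozen=True)
class Project:
    child: object
    columns: tuple

@dataclass(frozen=True)
class Filter:
    child: object
    column: str
    predicate: Callable[[object], bool]

def optimize(plan):
    """Push a Filter below a Project when the projection keeps the filtered column."""
    if isinstance(plan, Filter):
        child = optimize(plan.child)
        if isinstance(child, Project) and plan.column in child.columns:
            pushed = optimize(Filter(child.child, plan.column, plan.predicate))
            return Project(pushed, child.columns)
        return Filter(child, plan.column, plan.predicate)
    if isinstance(plan, Project):
        return Project(optimize(plan.child), plan.columns)
    return plan

def execute(plan):
    """Nothing runs until here; building the plan above was only bookkeeping."""
    if isinstance(plan, Scan):
        return [dict(row) for row in plan.rows]
    if isinstance(plan, Project):
        return [{c: row[c] for c in plan.columns} for row in execute(plan.child)]
    if isinstance(plan, Filter):
        return [row for row in execute(plan.child) if plan.predicate(row[plan.column])]
    raise TypeError(f"unknown plan node: {plan!r}")

def is_high(score):
    return score >= 80

rows = (
    {"id": 1, "score": 10, "note": "a"},
    {"id": 2, "score": 95, "note": "b"},
    {"id": 3, "score": 80, "note": "c"},
)

# The user wrote "project, then filter"; the optimizer flips the order.
plan = Filter(Project(Scan(rows), ("id", "score")), "score", is_high)
optimized = optimize(plan)
result = execute(optimized)
```

An eager library has already materialized the projection by the time the filter call arrives, so there is no plan left to rewrite; that is the distinction, rather than one family of tools being able to optimize and the other not.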

This is a subtle but important distinction when looking at these frameworks. We are making our new graph query language 'gfql' dataframe-native so it can run naturally & natively as a step in the pipelines people are already building, and also so that it automatically runs as optimized CPU/GPU columnar ops. At the same time, because we want to leave room for query plan optimization, we are staying declarative / lazy, even if the generated & interpreted code uses an eager DF runtime. I'm optimistic that, long-term, lazy DF systems used as output targets will do the query planner work for us, but for the eager framework targets, the query planning has to be on our side.
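As a rough illustration of "declarative on top, eager underneath", the sketch below keeps the query as data, lets a small planner choose the step order, and only then calls eager functions that materialize results immediately. Everything here is hypothetical: the `query` dict shape, `plan`, `run`, `eager_filter`, and `eager_join` are made up for illustration and are not gfql syntax or any dataframe library's API.

```python
# Hypothetical eager "runtime": every call materializes its result immediately.
def eager_filter(rows, key, value):
    return [row for row in rows if row[key] == value]

def eager_join(edges, nodes, edge_key, node_key):
    by_key = {}
    for node in nodes:
        by_key.setdefault(node[node_key], []).append(node)
    return [{**edge, **node} for edge in edges for node in by_key.get(edge[edge_key], [])]

def plan(query, optimize=True):
    """Planner on 'our side': with optimize=True, filter nodes before joining."""
    join = ("join", query["hop"])
    where = ("filter", query["where"])
    return [where, join] if optimize else [join, where]

def run(steps, nodes, edges):
    """Interpret the planned steps by calling the eager runtime in order."""
    joined = None
    for op, (key, value) in steps:
        if op == "filter":
            if joined is None:
                nodes = eager_filter(nodes, key, value)    # filter before the join
            else:
                joined = eager_filter(joined, key, value)  # filter after the join
        else:  # "join": key is the edge column, value is the node column
            joined = eager_join(edges, nodes, key, value)
    return joined

nodes = [
    {"id": "a", "kind": "person"},
    {"id": "b", "kind": "bot"},
    {"id": "c", "kind": "person"},
]
edges = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "c", "dst": "a"},
]

# Declarative query: attach each edge to its source node, keeping only persons.
query = {"hop": ("src", "id"), "where": ("kind", "person")}

fast = run(plan(query), nodes, edges)
naive = run(plan(query, optimize=False), nodes, edges)
```

Both orders return the same rows, but the planned order joins against two nodes instead of three. With an eager target nothing downstream will make that choice, so it has to happen before the first eager call.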