Polars Cloud and Distributed Polars now available

(pola.rs)

183 points jonbaer | 2 comments | 04 Sep 25 03:01 UTC

cbb330 ◴[04 Sep 25 09:51 UTC] No.45125448[source]▶

>>45123034 (OP) #

Can you dive a bit deeper into the comparison with Spark RDDs?

replies(1): >>45125509 #

ritchie46 ◴[04 Sep 25 10:03 UTC] No.45125509[source]▶

>>45125448 #

I am not an expert on Spark RDDs, but AFAIK they are a lower-level data structure that offers resilience and a lower-level map-reduce API.

Polars Cloud maps the Polars API/DSL to distributed compute. This is more akin to Spark's high level DataFrame API.

With regard to implementation, we create stages that run parts of the Polars IR (intermediate representation) on our OSS streaming engine. Those stages run on one or many workers and create data that is shuffled between stages. The scheduler is responsible for creating the distributed query plan and distributing the work.
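
To make the stage/shuffle idea concrete, here is a toy sketch in plain Python. It is not Polars or Polars Cloud code: every name in it (`stage_partial_sum`, `shuffle`, `stage_final_sum`, `run_query`) is hypothetical, the "workers" are just list entries, and the real scheduler and streaming engine are certainly more involved. It only illustrates a two-stage group-by sum where partial results are hash-partitioned by key between the stages.

```python
from collections import defaultdict

# Toy model only: these names are hypothetical, not part of any Polars API.


def stage_partial_sum(rows):
    """Stage 1 (runs once per worker): pre-aggregate the local rows by key."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def shuffle(partials, num_workers):
    """Route every key to exactly one worker, so all partials for a key meet."""
    buckets = [defaultdict(list) for _ in range(num_workers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_workers][key].append(value)
    return buckets


def stage_final_sum(bucket):
    """Stage 2 (runs once per worker): combine the shuffled partial sums."""
    return {key: sum(values) for key, values in bucket.items()}


def run_query(partitions, num_workers=3):
    """Stand-in for a scheduler: run stage 1, shuffle, run stage 2, gather."""
    partials = [stage_partial_sum(rows) for rows in partitions]
    buckets = shuffle(partials, num_workers)
    result = {}
    for bucket in buckets:
        # Keys are disjoint across buckets, so update() never overwrites.
        result.update(stage_final_sum(bucket))
    return result


# Three input partitions, as if each lived on a different worker.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6), ("a", 7)],
]
totals = run_query(partitions)
print(totals)
```

The pre-aggregation in stage 1 is there so that only one partial sum per key per worker crosses the shuffle, instead of every raw row.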

replies(1): >>45125866 #

1. ayhanfuat ◴[04 Sep 25 11:05 UTC] No.45125866[source]▶

>>45125509 #

Can you tell us a little about the status of Iceberg write support? Partitioning, maintenance, etc.

replies(1): >>45127484 #

2. ritchie46 ◴[04 Sep 25 14:11 UTC] No.45127484[source]▶

>>45125866 (TP) #

We have full Iceberg read support. We have done some preliminary work on Iceberg write support. I think we will ship that once we have decided which catalog we will add, since the Iceberg write API is intertwined with that choice.
