Can you dive a bit deeper into the comparison with Spark RDDs?
Polars Cloud maps the Polars API/DSL to distributed compute, so it is closer to Spark's high-level DataFrame API than to RDDs.
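Here is a toy illustration of why the DataFrame-vs-RDD distinction matters. It is not Polars or Spark code, and every class and function name in it is hypothetical. In an RDD-style API the engine sees only opaque Python callables. In a DataFrame-style API the query is a plan made of inspectable nodes, so an optimizer can rewrite it (here, pushing a filter ahead of a projection) before anything runs.

```python
from dataclasses import dataclass
from typing import Any, Union

Row = dict[str, Any]


# RDD-style: transformations are opaque callables, so the engine cannot tell
# which columns they read and cannot safely reorder them.
def rdd_style(rows: list[Row]) -> list[Row]:
    with_total = map(lambda r: {**r, "total": r["price"] * r["qty"]}, rows)
    return [r for r in with_total if r["qty"] > 1]


# DataFrame-style (hypothetical plan nodes): the query is plain data.
@dataclass(frozen=True)
class WithProduct:
    name: str   # new column to create
    left: str   # first input column
    right: str  # second input column


@dataclass(frozen=True)
class Filter:
    column: str
    greater_than: Any


Node = Union[WithProduct, Filter]


def push_down_filters(plan: list[Node]) -> list[Node]:
    """Move each Filter ahead of projections that do not produce its column."""
    out: list[Node] = []
    for node in plan:
        if isinstance(node, Filter):
            i = len(out)
            # Step back past projections that don't create the filtered column.
            while (i > 0 and isinstance(out[i - 1], WithProduct)
                   and out[i - 1].name != node.column):
                i -= 1
            out.insert(i, node)
        else:
            out.append(node)
    return out


def execute(plan: list[Node], rows: list[Row]) -> list[Row]:
    """Interpret the plan node by node over in-memory rows."""
    for node in plan:
        if isinstance(node, WithProduct):
            rows = [{**r, node.name: r[node.left] * r[node.right]} for r in rows]
        else:
            rows = [r for r in rows if r[node.column] > node.greater_than]
    return rows


data = [{"price": 2.0, "qty": 1}, {"price": 3.0, "qty": 4}]
plan: list[Node] = [WithProduct("total", "price", "qty"), Filter("qty", 1)]
optimized = push_down_filters(plan)
print(optimized)  # the Filter node now comes first
print(execute(optimized, data) == rdd_style(data))  # True
```

Both paths return the same rows, but only the plan-based one could be reordered, because the filter's input column is visible to the optimizer. A real engine does far more (predicate and projection pushdown, join reordering, and so on); this only shows the basic idea.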
On the implementation side, we create stages that each run part of the Polars IR (internal representation) on our OSS streaming engine. Each stage runs on one or more workers and produces data that is shuffled between stages. The scheduler is responsible for creating the distributed query plan and distributing the work.
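As a rough mental model of the stage, shuffle, stage pattern, here is a single-process toy of a group-by sum. It is not Polars Cloud's scheduler or API. The function names are hypothetical, and hash partitioning is assumed as the shuffle strategy purely for illustration. Stage 1 pre-aggregates each input partition and splits its output into one bucket per downstream worker. The shuffle routes bucket `w` from every stage-1 task to worker `w`, and stage 2 merges what it receives.

```python
from collections import defaultdict
from typing import Iterable

Row = tuple[str, int]  # (group key, value)


def stage1_partial_sum(partition: list[Row], num_out: int) -> list[dict[str, int]]:
    """Runs on one worker: aggregate locally, then hash-partition for the shuffle."""
    partial: dict[str, int] = defaultdict(int)
    for key, value in partition:
        partial[key] += value
    buckets: list[dict[str, int]] = [{} for _ in range(num_out)]
    for key, total in partial.items():
        buckets[hash(key) % num_out][key] = total
    return buckets


def stage2_final_sum(inputs: Iterable[dict[str, int]]) -> dict[str, int]:
    """Runs on one worker: merge every bucket the shuffle routed to it."""
    result: dict[str, int] = defaultdict(int)
    for bucket in inputs:
        for key, total in bucket.items():
            result[key] += total
    return dict(result)


def run_group_by_sum(partitions: list[list[Row]], num_workers: int) -> dict[str, int]:
    """Toy scheduler: stage 1 per input partition, shuffle, stage 2 per worker."""
    stage1_out = [stage1_partial_sum(p, num_workers) for p in partitions]
    # Shuffle: worker w receives bucket w from every stage-1 task.
    shuffled = [[out[w] for out in stage1_out] for w in range(num_workers)]
    final: dict[str, int] = {}
    for worker_inputs in shuffled:
        # Each key hashes to exactly one worker, so the per-worker results are disjoint.
        final.update(stage2_final_sum(worker_inputs))
    return final


partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(sorted(run_group_by_sum(partitions, num_workers=2).items()))
# [('a', 4), ('b', 2), ('c', 4)]
```

Pre-aggregating before the shuffle keeps the data moved between stages proportional to the number of distinct keys per partition rather than the number of rows. In a real system each stage would run on a separate machine and the buckets would travel over the network.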