
Optimizing Datalog for the GPU

(danglingpointers.substack.com)
98 points by blakepelton | 1 comment
ux266478 No.45812302
Curious: why use CUDA and HIP? These frameworks are rather opinionated about kernel design, and they seem suboptimal for implementing a language runtime when SPIR-V is right there, particularly in the case of Datalog.
lmeyerov No.45812962
(I've been a big fan of this work for years now.)

From the nearby perspective of building GFQL, an embeddable OSS GPU graph dataframe query language that sits somewhere between Cypher and DuckDB/pandas/Spark, at an even higher level on top of pandas, cudf, etc.:
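For a flavor of what that higher-level surface looks like, here is a small GFQL-style sketch based on my reading of the public pygraphistry docs; treat the exact imports and operator names as illustrative assumptions rather than a definitive API:

    # Hypothetical two-hop GFQL-style query (details assumed from public docs).
    import pandas as pd
    import graphistry
    from graphistry import n, e_forward

    edges_df = pd.DataFrame({"src": ["a", "a", "b"], "dst": ["b", "c", "c"]})
    nodes_df = pd.DataFrame({"id": ["a", "b", "c"],
                             "type": ["account", "ip", "ip"]})

    g = graphistry.edges(edges_df, "src", "dst").nodes(nodes_df, "id")
    g2 = g.chain([
        n({"type": "account"}),  # start at account-typed nodes
        e_forward(),             # follow outgoing edges one hop
        n(),                     # land on any node
    ])
    print(g2._nodes)  # the matched subgraph's nodes, as a dataframe

The point of the comment above is that the same query shape can run on pandas frames on CPU or cudf frames on GPU, without the query author changing anything.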

It's nice using higher-level languages with rich libraries underneath: we get to focus on the foundational algorithm and data ecosystem problems while still achieving crazy numbers.

cudf gives us optimized GPU joins, so jumping from cheap personal CPU or GPU boxes to 80GB server GPUs, with deep 2B-edge whole-graph queries running in about a second, has been nice and essentially free of extra work on our side :)

We want our focus on getting regular graph operations fully data parallel in the way we want while staying easy for users, and on figuring out areas like bigger-than-memory execution and data lakes, so we defer lower-level efforts to when a Rust etc. rewrite is more clearly merited. I do see value in starting low-level when the target value and workload are obvious (e.g., building vector indexes / DBs), but when breaking new ground at every point, there is value in going where you can roll & extend faster.
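To make the join-centric point concrete, here is a minimal sketch of one way a whole-graph traversal can be expressed as dataframe joins so cudf's GPU join engine does the heavy lifting. This is my own illustration under assumed column names and a toy graph, not GFQL's actual internals:

    # Minimal sketch: BFS-style reachability as GPU dataframe joins.
    # Illustrative only -- not GFQL's actual internals.
    import cudf

    # Toy edge list; in practice this could be billions of rows on an 80GB GPU.
    edges = cudf.DataFrame({
        "src": [0, 0, 1, 2, 3],
        "dst": [1, 2, 3, 3, 4],
    })

    frontier = cudf.DataFrame({"node": [0]})  # start from node 0
    visited = frontier

    # Iterate one-hop joins to a fixed point: each step is one
    # data-parallel hash join, which cudf already optimizes.
    while len(frontier) > 0:
        hop = frontier.merge(edges, left_on="node", right_on="src")
        reached = hop[["dst"]].rename(columns={"dst": "node"}).drop_duplicates()
        frontier = reached[~reached["node"].isin(visited["node"])]
        visited = cudf.concat([visited, frontier])

    print(visited)  # every node reachable from node 0

Each iteration is one big join plus a dedup, which is exactly the shape of work GPU dataframe engines are good at, and the same pattern scales from a laptop to a server GPU without code changes.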