
Building Databases over a Weekend

(www.denormalized.io)
81 points by ambrood | 1 comment
Gepsens (No.42201022)
I remember that two years ago someone proposed adding stream processing to DataFusion, and PRs followed. But IMO stream processing is an entirely different beast, even if some people could use DF's SQL engine for it. There are Rust projects like Arroyo for that.
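To make the "different beast" point concrete, here is a minimal Python sketch. The function names are hypothetical and not taken from DataFusion or Arroyo. A batch engine can scan a finite input and return one answer, while a streaming operator has to keep per-key state and emit updated results as it goes, because its input may never end.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator


def batch_count(rows: Iterable[str]) -> dict[str, int]:
    # Batch: the input is finite, so scan all of it and return a single answer.
    return dict(Counter(rows))


def streaming_count(events: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Streaming: the input may be unbounded, so keep running state per key
    # and emit an updated result after every event instead of at the end.
    state: defaultdict[str, int] = defaultdict(int)
    for key in events:
        state[key] += 1
        yield key, state[key]
```

For the input ["a", "b", "a"], batch_count returns one dict with a=2 and b=1, while streaming_count yields ("a", 1), ("b", 1), ("a", 2): the same total, but delivered incrementally and with state that must live for as long as the stream does.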
1. necubi (No.42201199)
Creator of Arroyo here—we agree that stream processing is a different beast and needs different infrastructure from a batch engine like DataFusion.

Our approach has been to take pieces of DF (including the SQL frontend and expression engine) and embed them in our own dataflow and operators. This allows us to support low latency, distribution, watermark processing, and consistent checkpointing.

But the great thing about DF is that it’s designed as a toolkit for SQL-oriented data processing, so it’s relatively easy to pick and use just the pieces you need.
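A minimal sketch of what "watermark processing" and "checkpointing" mean inside a custom streaming operator, under loudly stated assumptions: it is written in Python rather than Rust, it is not Arroyo's code or DataFusion's API, and the names Event, TumblingWindowSum, on_watermark, checkpoint, and restore are hypothetical. The expr callable stands in for an expression compiled by a SQL engine, which is the kind of piece the comment above describes embedding in its own operators.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    ts: int       # event time; assumed non-negative (e.g. seconds)
    key: str
    value: float


@dataclass
class TumblingWindowSum:
    """Hypothetical operator: per-key SUM(expr) over fixed, non-overlapping windows."""

    width: int
    # Stand-in for an expression compiled by a SQL engine; here a plain callable.
    expr: Callable[[Event], float]
    # (window_start, key) -> running sum for windows that have not been emitted.
    state: dict = field(default_factory=lambda: defaultdict(float))
    # Event-time watermark: "no events older than this are expected any more".
    watermark: int = 0

    def window_start(self, ts: int) -> int:
        return ts - ts % self.width

    def on_event(self, ev: Event) -> None:
        start = self.window_start(ev.ts)
        if start + self.width <= self.watermark:
            return  # window already emitted; the simplest policy is to drop late data
        self.state[(start, ev.key)] += self.expr(ev)

    def on_watermark(self, wm: int) -> list[tuple[int, str, float]]:
        # Watermarks only move forward; emit every window ending at or before it.
        self.watermark = max(self.watermark, wm)
        ready = sorted(k for k in self.state if k[0] + self.width <= self.watermark)
        return [(start, key, self.state.pop((start, key))) for start, key in ready]

    def checkpoint(self) -> dict:
        # Snapshot of this one operator's state only; a distributed engine must
        # also coordinate snapshots across operators and source offsets.
        return {"state": dict(self.state), "watermark": self.watermark}

    def restore(self, snap: dict) -> None:
        self.state = defaultdict(float, snap["state"])
        self.watermark = snap["watermark"]
```

With width=10 and expr=lambda e: e.value * 2, events at ts 3 and 7 both land in window [0, 10) even if they arrive out of order, and that window is emitted only once on_watermark(10) is called. An event at ts 5 arriving after that is dropped. Dropping late data is just the simplest choice here; engines commonly offer other ways to handle it.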