We leverage DuckDB as the stream processing engine, which gives SQLFlow the ability to process tens of thousands of messages a second using ~250MiB of memory!
DuckDB also supports a rich ecosystem of sinks and connectors!
https://sql-flow.com/docs/category/tutorials/
https://github.com/turbolytics/sql-flow
We were tired of running JVMs for simple stream processing, and also tired of writing bespoke one-off stream processors.
I would love your feedback, criticisms and/or experiences!
Thank you
The stream comes from the continuous flow of data out of Kafka.
SQLFlow exposes a variable for batch size. Setting the batch size to 1 makes SQLFlow read a single Kafka message, apply the processor SQL logic, and ensure the SQL results are successfully committed to the sink before moving on to the next message.
SQLFlow provides at-least-once delivery guarantees. It only commits the source message once it has successfully written to the pipeline output (sink).
https://sql-flow.com/docs/operations/handling-errors
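To make that concrete, here's a minimal Python sketch of the batch-size-of-1 loop: pull one message, run SQL over it with DuckDB, write to the sink, and only then commit the Kafka offset. This is not SQLFlow's actual code; the broker address, topic name, processor SQL, and the stdout "sink" are placeholder assumptions.

    import json

    import duckdb
    import pyarrow as pa
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": "sqlflow-sketch",           # hypothetical consumer group
        "enable.auto.commit": False,            # commit manually, only after the sink write
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["input-topic"])         # assumed topic name

    con = duckdb.connect()

    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue

        # A "batch" of exactly one message (assumes JSON object payloads).
        con.register("batch", pa.Table.from_pylist([json.loads(msg.value())]))

        # Placeholder processor SQL; the pipeline SQL queries the "batch" table.
        rows = con.execute("SELECT * FROM batch").fetchall()

        # "Sink" write: stdout here, Kafka/Parquet/etc. in a real pipeline.
        for row in rows:
            print(row)

        # Commit the source offset only after the sink write succeeded;
        # this ordering is what gives at-least-once delivery.
        consumer.commit(message=msg, asynchronous=False)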
The batch table is just a convention that allows for seamless batch size configuration. If your throughput is low, or if you require message-by-message processing, SQLFlow can be configured with a batch size of 1. If you need higher throughput and can tolerate the latency, the batch size can be increased.
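For illustration, here's a rough sketch of what that convention looks like with a larger batch: buffer N messages, expose them to DuckDB as a table named batch, and run the same user SQL regardless of N. The batch size, schema, and SQL below are made up for the example and are not SQLFlow's actual configuration.

    import duckdb
    import pyarrow as pa

    BATCH_SIZE = 1000  # raise for throughput, lower (down to 1) for per-message latency


    def process_batch(con, messages):
        """Run the pipeline's SQL over one buffered batch of messages."""
        con.register("batch", pa.Table.from_pylist(messages))
        # By convention the user-defined SQL always targets the "batch" table,
        # so the same query works for a batch of 1 or a batch of 10,000.
        return con.execute(
            "SELECT user_id, count(*) AS clicks FROM batch GROUP BY user_id"
        ).fetchall()


    if __name__ == "__main__":
        con = duckdb.connect()
        # Fake buffered messages standing in for BATCH_SIZE Kafka records.
        messages = [{"user_id": i % 3, "event": "click"} for i in range(BATCH_SIZE)]
        print(process_batch(con, messages))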