Show HN: DuckDB for Kafka Stream Processing

(sql-flow.com)

75 points dm03514 | 3 comments | 08 Dec 25 17:25 UTC

Hello Everyone! We built SQLFlow as a lightweight stream processing engine.

We leverage DuckDB as the stream processing engine, which gives SQLFlow the ability to process tens of thousands of messages per second using ~250 MiB of memory!

DuckDB also supports a rich ecosystem of sinks and connectors!

https://sql-flow.com/docs/category/tutorials/

https://github.com/turbolytics/sql-flow

We were tired of running JVMs for simple stream processing, and also of bespoke one-off stream processors.

I would love your feedback, criticisms and/or experiences!

Thank you


mihevc ◴[08 Dec 25 18:38 UTC] No.46195958[source]▶

>>46195007 (OP) #

How does this compare to https://github.com/Query-farm/tributary ?

replies(2): >>46196154 #>>46196322 #

1. rustyconover ◴[08 Dec 25 19:10 UTC] No.46196322[source]▶

>>46195958 #

The next major release of Tributary will support Avro, Protobuf, and JSON, along with the Schema Registry. It will also bring the ability to write to Kafka with transactions.

But really you should get excited for DuckDB Labs to build out materialized views. Materialized views where you can ingest more streaming data to update aggregates. This way you could just keep pushing rows through aggregates from Kafka.
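To make the idea of "pushing rows through aggregates" concrete, here is a minimal Python sketch of incrementally maintained aggregate state. It is not DuckDB's design or API: `AggState` and `IncrementalView` are hypothetical names invented for illustration, and the sketch only assumes that each group keeps enough state (a count and a running sum) to answer COUNT, SUM and AVG without rescanning earlier rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AggState:
    """Running state for one group: enough to answer COUNT, SUM and AVG."""
    count: int = 0
    total: float = 0.0

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


class IncrementalView:
    """Hypothetical stand-in for a materialized view fed by a stream."""

    def __init__(self):
        # One AggState per group key, created lazily on first use.
        self.groups = defaultdict(AggState)

    def ingest(self, rows):
        # Fold each new (key, value) row into the existing state
        # instead of recomputing the aggregate from scratch.
        for key, value in rows:
            state = self.groups[key]
            state.count += 1
            state.total += value

    def snapshot(self):
        # Current (count, sum, avg) per group, ordered by key.
        return {k: (s.count, s.total, s.avg) for k, s in sorted(self.groups.items())}


view = IncrementalView()
view.ingest([("checkout", 10.0), ("search", 2.0)])  # first batch from the stream
view.ingest([("checkout", 30.0)])                   # later rows update the same state
result = view.snapshot()
```

After the second `ingest`, the `checkout` group reflects both batches, which is the behavior a streaming materialized view would presumably provide natively.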

It is going to be a POWERHOUSE for streaming analytics.

Contact DuckDB Labs if you want to sponsor the work on materialized views: https://duckdb.org/roadmap

replies(2): >>46197772 #>>46200924 #

2. buremba ◴[08 Dec 25 21:18 UTC] No.46197772[source]▶

>>46196322 (TP) #

Exactly. I have also been playing with DuckDB for streaming use cases, but it feels hacky to issue micro-batching queries on streaming data in short intervals.
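For readers unfamiliar with the pattern, a rough sketch of micro-batching follows. It is an illustration under stated assumptions, not how SQLFlow or any particular tool works: the standard-library `sqlite3` module stands in for DuckDB purely so the example runs without extra dependencies, the message shape (`city`, `amount`) is made up, and batches are cut by size where a real consumer would usually also flush on a timer.

```python
import json
import sqlite3


def process_batch(con, messages):
    """Load one micro-batch into a scratch table and aggregate it with SQL."""
    con.execute("DELETE FROM batch")  # each batch starts from an empty table
    con.executemany(
        "INSERT INTO batch (city, amount) VALUES (?, ?)",
        [(m["city"], m["amount"]) for m in map(json.loads, messages)],
    )
    return con.execute(
        "SELECT city, COUNT(*), SUM(amount) FROM batch GROUP BY city ORDER BY city"
    ).fetchall()


def micro_batches(stream, batch_size):
    """Yield fixed-size chunks of the stream, plus a final partial chunk."""
    batch = []
    for message in stream:
        batch.append(message)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# In-memory database as a stand-in for the embedded SQL engine (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE batch (city TEXT, amount REAL)")

# Hypothetical JSON messages, as they might arrive from a Kafka topic.
stream = [
    '{"city": "Oslo", "amount": 5.0}',
    '{"city": "Lima", "amount": 1.5}',
    '{"city": "Oslo", "amount": 2.5}',
    '{"city": "Lima", "amount": 4.0}',
]
outputs = [process_batch(con, b) for b in micro_batches(stream, batch_size=3)]
```

Each batch is aggregated in isolation, so the second result knows nothing about the first. Carrying state across batches is left to the caller, which is part of why the approach can feel hacky.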

DuckDB has everything that streaming engines such as Flink have; it just needs to support managing intermediate aggregate states and scheduling the materialized views itself.

3. trueno ◴[09 Dec 25 03:27 UTC] No.46200924[source]▶

>>46196322 (TP) #

Is this to be used in an analytics application backend sort of scenario?

I am familiar with materialized views / dynamic tables from enterprise-grade cloud lake type offerings, but I've never quite understood where DuckDB, though impressive, fits into everyone's use case. I've toyed with it for personal things; it's very cool having a local instance of something akin to Snowflake when it comes to processing and aggregating on Big Data™, but generally I don't see it used in operational settings. For application development, people are generally tied to SQLite and Postgres.

It all does seem really cool though, I guess I'm just not feeling creative enough to conjure up a stream-to-duckdb use case. Feel free to bombard me with cool ideas.
