
IBM to acquire Confluent

(www.confluent.io)
443 points by abd12
itsanaccount
And the enshittification treadmill continues. Great time to be a Kafka alternative.

I'll start.

https://github.com/tansu-io/tansu

spyspy
`SELECT * FROM mytable ORDER BY timestamp ASC`
alexjplant
Ah yes, and every consumer should just do this in a while (true) loop as producers write to it. Very efficient and simple with no possibility of lock contention or hot spots. Genius, really.
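For the record, the naive loop being lampooned looks something like this (a minimal sketch: the table name and `timestamp` column come from the comment above, while the schema, the `id` cursor, and the `handle` callback are illustrative assumptions):

    import sqlite3  # stand-in for whatever SQL database is in play
    import time

    def handle(payload):
        # Hypothetical consumer callback.
        print("consumed:", payload)

    conn = sqlite3.connect("events.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mytable "
        "(id INTEGER PRIMARY KEY, timestamp REAL, payload TEXT)"
    )

    last_seen = 0
    while True:  # the while (true) loop in question
        # Poll for rows newer than the last one we consumed.
        rows = conn.execute(
            "SELECT id, payload FROM mytable WHERE id > ? ORDER BY timestamp ASC",
            (last_seen,),
        ).fetchall()
        for row_id, payload in rows:
            handle(payload)
            last_seen = row_id
        time.sleep(0.5)  # without this, the polling really would hammer the table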
CharlieDigital
I've implemented a distributed worker system on top of this paradigm.

I used ZMQ to connect the nodes; the worker nodes would connect to an indexer/coordinator node that effectively did a `SELECT ... FROM ... ORDER BY ... ASC`.

It's easier than you might think, and all the bits here ended up at probably < 1000 SLOC all told.

    - The coordinator node ingests from a SQL table
    - Each row in the table has a discriminator key; rows are stacked by key into an in-memory list-of-lists, which gives the ordering
    - Worker nodes are started with _n_ threads
    - Each thread sends a "ready" message to the coordinator, and the coordinator replies with a "work" message
    - On each cycle, the coordinator advances the pointer on the outer list, locks the current child list, and marks its first item as "pending"
    - When a worker thread finishes, it sends a "completed" message to the coordinator, and the coordinator replies with another "work" message
    - The coordinator unlocks the child list the work item originated from and dequeues the finished item
    - When the pointer reaches the end of the outer list, it cycles back to the beginning and starts over, skipping any child lists marked as locked (i.e., those with a pending work item)
Effectively a distributed event loop with the events queued up via a simple SQL query (rough sketches of both halves below).
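To make the message flow concrete, here is a minimal sketch of the coordinator side, assuming pyzmq and the classic REQ/ROUTER pattern; `fetch_rows`, the port, and the `ready`/`completed`/`no-work` wire format are illustrative assumptions, not the original implementation:

    import zmq
    from collections import defaultdict, deque

    def fetch_rows():
        # Stand-in for the SQL ingest; returns (discriminator, payload) pairs.
        return [("cust-a", b"job-1"), ("cust-b", b"job-2"), ("cust-a", b"job-3")]

    # Stack rows into the in-memory list-of-lists, keyed by discriminator.
    queues = defaultdict(deque)
    for key, payload in fetch_rows():
        queues[key].append(payload)

    locked = set()   # child lists whose head item is "pending" (in flight)
    in_flight = {}   # worker identity -> discriminator it is working on

    ctx = zmq.Context()
    sock = ctx.socket(zmq.ROUTER)
    sock.bind("tcp://*:5555")

    keys, pos = list(queues), 0
    while any(queues.values()) or in_flight:
        ident, _, msg = sock.recv_multipart()  # REQ peers: [identity, b"", body]
        if msg == b"completed":
            # Unlock the child list the finished item came from and dequeue it.
            key = in_flight.pop(ident)
            queues[key].popleft()
            locked.discard(key)
        # Advance the pointer, skipping locked child lists.
        for _ in range(len(keys)):
            key = keys[pos]
            pos = (pos + 1) % len(keys)
            if key not in locked and queues[key]:
                locked.add(key)            # head item is now "pending"
                in_flight[ident] = key
                sock.send_multipart([ident, b"", queues[key][0]])
                break
        else:
            sock.send_multipart([ident, b"", b"no-work"])

Note that the head item is only peeked at when handed out and is dequeued on completion, which is what gives the pending/unlock semantics in the list above.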

Dead simple design, extremely robust, very high throughput, and very easy to scale workers both horizontally (more nodes) and vertically (more threads). ZMQ made it easy to connect the remote threads to the centralized coordinator, and the system was effectively self-balancing because each worker would only re-queue its thread once it finished its work. Very easy to manage, but it did not have hot failover, since we kept the materialized "2D" work queue in memory; we very rarely had issues with this, though.
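The worker side, under the same assumptions (`process` is a hypothetical stand-in for the actual work), would look roughly like:

    import threading
    import zmq

    def process(job):
        # Hypothetical work function.
        print("processing", job)

    def worker_thread():
        sock = zmq.Context.instance().socket(zmq.REQ)
        sock.connect("tcp://localhost:5555")  # coordinator address (assumed)
        sock.send(b"ready")                   # queue this thread for work
        while True:
            job = sock.recv()
            if job == b"no-work":
                break
            process(job)
            sock.send(b"completed")           # report done, ask for more

    # Each worker node starts _n_ threads; n = 4 here for illustration.
    threads = [threading.Thread(target=worker_thread) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

This matches the self-balancing property described above: a thread only asks for more work after finishing its current item.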

ahoka
Yeah, but that's like doing actual engineering. Instead you should just point to Kafka and say that it's going to make your horrible architecture scale magically. That's how the pros do it.
tormeh
Kafka isn't magic, but it's close. If a single-node solution like an SQL database can handle your load, then why shouldn't you stick with SQL? Kafka is not for you. Kafka is for workloads that would DDoS Postgres.