
66 points | enether | 1 comment

The space is confusing to say the least.

Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases.

I’m trying to make sense of the tradeoffs between:

- async fire-and-forget pub/sub vs. sync RPC-like point-to-point communication

- simple FIFO vs. priority queues and delay queues

- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g. Kafka’s client-driven model)

There's also a fair amount of ideology/emotional attachment - some folks root for underdogs written in their favorite programming language, others reflexively dismiss anything that's not "enterprise-grade". And of course, vendors are always in the mix trying to steer the conversation toward their own solution.

If you’ve built a production system in the last few years:

1. What queue did you choose?

2. What didn't work out?

3. Where did you regret adding complexity?

4. And if you stuck with a DB-based queue — did it scale?

I’d love to hear war stories, regrets, and opinions.

wordofx ◴[] No.44019353[source]
Postgres. Doing ~70k messages/second on average. Nothing huge, but we don’t need anything dedicated yet.
replies(3): >>44019364 #>>44019616 #>>44029804 #
lawn ◴[] No.44019364[source]
I'm curious about how people use Postgres as a message queue. Do you rely on libraries or do you run a custom implementation?
replies(5): >>44019417 #>>44019421 #>>44019433 #>>44019689 #>>44023708 #
padjo ◴[] No.44019433[source]
You can go an awfully long way with just SELECT … FOR UPDATE … SKIP LOCKED
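
For anyone who hasn't seen the pattern, here is a rough sketch of a worker loop body built on that query. It is an illustration only: the `jobs` table, its columns, and the `process_one` helper are hypothetical, and `conn` is assumed to be a DB-API connection to Postgres using the `%s` parameter style (as psycopg2 does). Note that the row lock is held for as long as `handler` runs.

```python
# Hypothetical schema (an assumption, not from the thread):
#   CREATE TABLE jobs (id bigserial PRIMARY KEY, payload text, done boolean DEFAULT false);

# Lock one pending row; rows already locked by other workers are skipped.
CLAIM_SQL = """
SELECT id, payload
FROM jobs
WHERE NOT done
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""

MARK_DONE_SQL = "UPDATE jobs SET done = true WHERE id = %s"


def process_one(conn, handler):
    """Claim one job, run handler on its payload, and mark it done.

    Everything happens in a single transaction, so the row lock is held
    until commit. Returns True if a job was processed, False if none was
    available.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()  # nothing to do; end the transaction
            return False
        job_id, payload = row
        handler(payload)  # the lock is still held while this runs
        cur.execute(MARK_DONE_SQL, (job_id,))
        conn.commit()  # releases the lock and publishes done = true
        return True
    except Exception:
        conn.rollback()  # job becomes visible to other workers again
        raise
    finally:
        cur.close()
```

If the worker dies mid-task, the connection drops, the transaction rolls back, and the row is claimable again. The cost is one open transaction per in-flight job.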
replies(1): >>44023060 #
Spivak ◴[] No.44023060[source]
I've never found a satisfying way to avoid holding the lock for the full duration of the task while staying resilient to workers dying. And Postgres isn't happy holding a bunch of locks like that. You end up having to register and track workers with health checks, plus a cleanup job to prune old workers, so you can give jobs exclusivity for a time.
replies(3): >>44023198 #>>44023407 #>>44024113 #
1. nick0garvey ◴[] No.44024113[source]
Hold the lock and write a row with a timestamp at the time you read.

That row indicates you are the one processing the data and no one else should. When reading, abort the read if someone else wrote that row first.

When you are finished processing, hold the lock and update the row you added before to indicate processing is complete.

The timestamp can be used to time out the request.