66 points enether | 1 comment

The space is confusing to say the least.

Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases.

I’m trying to make sense of the tradeoffs between:

- async fire-and-forget pub/sub vs. sync RPC-like point to point communication

- simple FIFO vs. priority queues and delay queues

- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g. Kafka’s client-driven model)

There's also a fair amount of ideology/emotional attachment - some folks root for underdogs written in their favorite programming language, others reflexively dismiss anything that's not "enterprise-grade". And of course, vendors are always in the mix trying to steer the conversation toward their own solution.

If you’ve built a production system in the last few years:

1. What queue did you choose?

2. What didn't work out?

3. Where did you regret adding complexity?

4. And if you stuck with a DB-based queue — did it scale?

I’d love to hear war stories, regrets, and opinions.

1. yesnomaybe No.44020950
Been on Kafka (MSK) for a couple of years. To my surprise, the programming model, and getting everything set up properly, sit behind a steep learning curve. For example, at some point I had a timestamp header but only much later realised that it all ends up as number[] on the consumer side. So I lost data. My fault, but still. I came to the realisation that the programming model, especially in MSK, is rather unintuitive.
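To illustrate the header pitfall: Kafka header values are raw bytes, so a timestamp has to be encoded on the producer side and decoded explicitly on the consumer side. A minimal sketch, assuming the consumer receives the header as a number[] of byte values and that the timestamp was written as ASCII decimal digits. The helper names are hypothetical and not part of any Kafka client.

```typescript
// Hypothetical helper: encode an epoch-millisecond timestamp as ASCII digits,
// one byte value per character.
function encodeTimestampHeader(epochMs: number): number[] {
  return String(epochMs)
    .split("")
    .map((ch) => ch.charCodeAt(0));
}

// Hypothetical helper: turn the received byte values back into a number.
// Throws instead of silently returning garbage when the header is not numeric.
function decodeTimestampHeader(raw: number[]): number {
  const text = raw.map((b) => String.fromCharCode(b)).join("");
  const value = Number(text);
  if (text.trim() === "" || !isFinite(value)) {
    throw new Error(`header is not a numeric timestamp: "${text}"`);
  }
  return value;
}

// Round trip: what the producer wrote is what the consumer reads back.
const sent = 1747000000000;
const onTheWire = encodeTimestampHeader(sent); // number[] of byte values
const received = decodeTimestampHeader(onTheWire);
```

Failing loudly on a malformed header is a deliberate choice here: a decode error surfaces the mismatch early rather than letting a bad value flow downstream.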

I found it hard to shift mentally from MSK and its event triggers back to regular consumers spun up in containers etc., but that is more an MSK issue than a Kafka one.

I am currently swapping out the whole pub/sub layer for MongoDB change streams, which I have found to be working really well. For queuing, it attempts to lock on read so I can scale consumers with retry / backoff etc. Broadcast is simple and lock-free, with auto-delete in Mongo.
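For anyone wondering what "lock on read" can look like, here is a rough sketch of the general pattern, not the actual implementation described above. It uses an in-memory array as a stand-in for a jobs collection. In MongoDB the check-and-set would have to be a single atomic operation (for example a findOneAndUpdate on one document); the synchronous function below plays that role. All names and fields are hypothetical.

```typescript
// In-memory stand-in for a jobs collection; field names are assumptions.
interface Job {
  id: string;
  payload: string;
  lockedUntil: number; // epoch ms; 0 means never locked
  attempts: number;
}

// Claim the first job whose lock is free or has expired, and take a lease on it.
// A consumer that crashes simply lets the lease run out, so the job is retried.
function claimNext(jobs: Job[], now: number, leaseMs: number): Job | undefined {
  for (const job of jobs) {
    if (job.lockedUntil <= now) {
      job.lockedUntil = now + leaseMs;
      job.attempts += 1;
      return job;
    }
  }
  return undefined;
}

// Capped exponential backoff for a consumer that found nothing to claim
// or that wants to delay a retry.
function backoffMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const jobs: Job[] = [
  { id: "a", payload: "first", lockedUntil: 0, attempts: 0 },
  { id: "b", payload: "second", lockedUntil: 0, attempts: 0 },
];

const c1 = claimNext(jobs, 1000, 5000); // claims "a", leased until 6000
const c2 = claimNext(jobs, 1000, 5000); // claims "b"
const c3 = claimNext(jobs, 1000, 5000); // nothing free: undefined
const c4 = claimNext(jobs, 7000, 5000); // lease on "a" expired, claimed again
```

The lease timestamp is what lets multiple consumers scale out without a separate lock service: each one only sees jobs that nobody currently holds, and a completed job would be deleted or marked done (omitted here).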

I will have to see how it really scales, and I'm sure I'm trading one problem for another, but it will definitely help to remove a moving part. Overall, the app is rather low volume with the occasional spike. I would have stayed with Kafka if there were, let's say, >100 rpm on the core functions.