
66 points by enether

The space is confusing to say the least.

Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases.

I’m trying to make sense of the tradeoffs between:

- async fire-and-forget pub/sub vs. sync RPC-like point-to-point communication

- simple FIFO vs. priority queues and delay queues

- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g. Kafka’s client-driven model)

There's also a fair amount of ideology/emotional attachment - some folks root for underdogs written in their favorite programming language, others reflexively dismiss anything that's not "enterprise-grade". And of course, vendors are always in the mix trying to steer the conversation toward their own solution.

If you’ve built a production system in the last few years:

1. What queue did you choose?

2. What didn't work out?

3. Where did you regret adding complexity?

4. And if you stuck with a DB-based queue — did it scale?

I’d love to hear war stories, regrets, and opinions.

Jemaclus (No.44006992)
For large applications in a service-oriented architecture, I leverage Kafka 100% of the time. With Confluent Cloud and Amazon MSK, infra is relatively trivial to maintain. There's really no reason to use anything else for this.

For smaller projects of "job queues," I tend to use Amazon SQS or RabbitMQ.

But just for clarity, Kafka is not really a message queue -- it's a persistent, structured log that can be used as a message queue. More specifically, you can replay messages by resetting the offset. In a queue, once you pop an item off, it's no longer there -- it's gone as soon as it's consumed. With Kafka, you leave the message where it is and move an offset instead. This means, for example, that you can have many clients read from the same topic without issue.
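(If it helps to picture it, here's a rough Python sketch of that replay behavior using the confluent-kafka client -- the broker address, topic name, and group id are all made up:)

    # Illustrative only: a consumer that "rewinds" by assigning an explicit offset.
    from confluent_kafka import Consumer, TopicPartition

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",  # assumed local broker
        "group.id": "billing-service",          # each group tracks its own offsets
        "auto.offset.reset": "earliest",
    })

    # Replay from the beginning: partition 0 of a hypothetical "orders" topic, offset 0.
    consumer.assign([TopicPartition("orders", 0, 0)])

    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        # The record stays in the log after we read it; we only advance our offset.
        print(msg.offset(), msg.value())

A second consumer group with a different group.id reads the exact same records, independently, without affecting anyone else's position.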

SQS and other MQs don't have that persistence -- once you consume the message and ack, the message disappears and you can't "replay" it via the queue system. You have to re-submit the message to process it again. This also means you can really only have one logical consumer per queue, because once a message is consumed, it's no longer available to anyone else.
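(The equivalent consume-and-ack flow with boto3, again purely illustrative -- the queue URL below is a placeholder:)

    # Illustrative only: receive a message from SQS, process it, then delete ("ack") it.
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-jobs"

    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        print(msg["Body"])  # stand-in for real processing
        # Once deleted, the message is gone for everyone -- there is no offset to rewind.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])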

There are pros and cons to either mechanism, and there's significant overlap in the usage of the two systems, but they are designed to serve different purposes.

The analogy I tend to use is that Kafka is like reading a book. You read a page, you turn the page. But if you get confused, you can flip back and reread a previous page. An MQ like RabbitMQ or Sidekiq is more like the line at the grocery store: once the customer pays, they walk out and they're gone. You can't go back and re-process their cart.

Again, pros and cons to both approaches.

"What didn't work out?" -- I've learned in my career that, in general, I really like replayability, so Kafka is typically my first choice, unless I know that re-creating the messages are trivial, in which case I am more inclined to lean toward RabbitMQ or SQS. I've been bitten several times by MQs where I can't easily recreate the queue, and I lose critical messages.

"Where did you regret adding complexity?" -- Again, smaller systems that are just "job queues" (versus service-to-service async communication) don't need a whole lot of complexity. So I've learned that if it's a small system, go with an MQ first (any of them are fine), and go with Kafka only if you start scaling beyond a single simple system.

"And if you stuck with a DB-based queue -- did it scale?" -- I've done this in the past. It scales until it doesn't. Given my experience with MQs and Kafka, I feel it's a trivial amount of work to set up an MQ/Kafka, and I don't get anything extra by using a DB-based queue. I personally would avoid these, unless you have a compelling reason to use it (eg, your DB isn't huge, and you can save money).

mlhpdx (No.44019334)
We build applications very differently. SQS queues with 1000s of clients have been a go-to for me for over a decade. And the opposite as well — 1000s of queues (one per client device; they're free). Zero maintenance, zero cost when unused. Absurd scalability.
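(Sketch of what I mean, in boto3 terms -- the queue name, region, and device id are illustrative:)

    # Sketch only: one SQS queue per client device, created on demand.
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")

    device_id = "device-42"
    # create_queue is idempotent if the queue already exists with the same attributes.
    queue_url = sqs.create_queue(QueueName=f"per-device-{device_id}")["QueueUrl"]

    sqs.send_message(QueueUrl=queue_url, MessageBody='{"cmd": "sync"}')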
Jemaclus (No.44030872)
Certainly. There are many paths to victory here.

One thing to consider is whether you _want_ your producers to be aware of the clients or not. If you use SQS, then your producer needs to be aware of where it's sending the message. In event-driven architecture, ideally producers don't care who's listening. They just broadcast a message: "Hey, this thing just happened." And anyone who wants to subscribe can subscribe. The analogy is a radio tower -- the radio broadcaster has no idea who's listening, but thousands and thousands of people can tune in and listen.

Contrast to making a phone call, where you have to know who it is that you're dialing and you can only talk to one person at a time.

There are pros and cons to both, but there's tremendous value in large applications for making the producer responsible for producing, but not having to worry about who is consuming. Particularly in organizations with large teams where coordinating that kind of thing can be a big pain.
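(To make that contrast concrete, here's a minimal illustrative producer with confluent-kafka -- the broker, topic, and payload are invented; the point is that nothing about the consumers appears anywhere in this code:)

    # Illustrative only: the producer knows the topic name and nothing about who's listening.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

    # "Hey, this thing just happened" -- publish the event and move on.
    producer.produce("user-signed-up", key=b"user-123", value=b'{"plan": "pro"}')
    producer.flush()

    # Any number of services can subscribe later (billing, email, analytics, ...),
    # each with its own group.id and its own offsets into the same topic.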

But you're absolutely right: queues/topics are basically free, and you can have as many as you want! I've certainly done it the SQS way that you describe many times!

As I mentioned, there are many paths to victory. Mine works really well for me, and it sounds like yours works really well for you. That's fantastic :)