Postgres LISTEN/NOTIFY does not scale

1. sorentwo ◴[10 Jul 25 20:55 UTC] No.44525509[source]▶

Postgres LISTEN/NOTIFY was a consistent pain point for Oban (background job processing framework for Elixir) for a while. The payload size limitations and connection pooler issues alone would cause subtle breakage.

It was particularly ironic because Elixir has a fantastic distribution and pubsub story thanks to distributed Erlang. That’s much more commonly used in apps now compared to 5 or so years ago, when 40-50% of apps weren’t clustered. The shift is thanks to the rise of platforms like Fly that made clustering easier, and the decline of Heroku, which made it nearly impossible.
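
To illustrate the payload size limitation mentioned above: the Postgres documentation states that, in the default configuration, a NOTIFY payload must be shorter than 8000 bytes. A minimal sketch of one way to guard against that, assuming a hypothetical job shape with an `id` and `args` (the function name and the `ref` fallback field are illustrative, not Oban's implementation):

```python
import json

# Postgres documents that, in the default configuration, a NOTIFY payload
# must be shorter than 8000 bytes.
MAX_NOTIFY_PAYLOAD_BYTES = 7999


def build_notify_payload(job_id: int, args: dict) -> str:
    """Return a JSON payload small enough for NOTIFY.

    Hypothetical helper: if the full payload is too large, fall back to a
    reference that carries only the job id, so the listener can fetch the
    row from the table instead.
    """
    full = json.dumps({"id": job_id, "args": args}, separators=(",", ":"))
    # Measure encoded bytes, not characters, since the limit is in bytes.
    if len(full.encode("utf-8")) <= MAX_NOTIFY_PAYLOAD_BYTES:
        return full
    return json.dumps({"id": job_id, "ref": True}, separators=(",", ":"))
```

The resulting string could then be passed as the second argument to `pg_notify(channel, payload)`. This only addresses payload size; it does nothing for the connection pooler issues.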

replies(3): >>44525640 #>>44526115 #>>44535609 #

2. cpursley ◴[10 Jul 25 21:09 UTC] No.44525640[source]▶

>>44525509 (TP) #

How did you resolve this? Did you consider listening to the WAL?

replies(2): >>44525760 #>>44525833 #

3. sorentwo ◴[10 Jul 25 21:21 UTC] No.44525760[source]▶

>>44525640 #

We have Postgres based pubsub, but encourage people to use a distributed Erlang based notifier instead whenever possible. Another important change was removing insert triggers, partially for the exact reasons mentioned in this post.

replies(1): >>44527140 #

4. parthdesai ◴[10 Jul 25 21:30 UTC] No.44525833[source]▶

>>44525640 #

Distributed Erlang if the application is clustered, Redis if it is not.

Source: Dev at one of the companies that hit this issue with Oban

5. alberth ◴[10 Jul 25 22:00 UTC] No.44526115[source]▶

>>44525509 (TP) #

I didn’t realize Oban didn’t use Mnesia (Erlang built-in).

replies(1): >>44526299 #

6. sorentwo ◴[10 Jul 25 22:17 UTC] No.44526299[source]▶

>>44526115 #

Very, very few applications use Mnesia. There’s absolutely no way I would recommend it over Postgres.

replies(3): >>44526730 #>>44527665 #>>44528009 #

7. arcanemachiner ◴[10 Jul 25 23:08 UTC] No.44526730{3}[source]▶

>>44526299 #

I have heard that Mnesia is very unreliable, which is a damn shame.

I wonder if that is fixable, or just inherent to its design.

replies(1): >>44527065 #

8. sb8244 ◴[11 Jul 25 00:04 UTC] No.44527065{4}[source]▶

>>44526730 #

My understanding is that mnesia is sort of a relic. Really hard to work with and lots of edge / failure cases.

I'm not sure if it should be salvaged?

9. MuffinFlavored ◴[11 Jul 25 00:16 UTC] No.44527140{3}[source]▶

>>44525760 #

> Another important change was removing insert triggers, partially for the exact reasons mentioned in this post.

What did you replace them with instead?

replies(1): >>44527560 #

10. sorentwo ◴[11 Jul 25 01:28 UTC] No.44527560{4}[source]▶

>>44527140 #

In-app notifications, which can be disabled. Our triggers were only used to get subsecond job dispatching though.
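
A rough sketch of the general idea of in-app notifications replacing an insert trigger, assuming a hypothetical in-process notifier: the application publishes the event itself right after the insert, and the notifier can be switched off. The class and function names are illustrative and the list stands in for the jobs table; this is not Oban's code.

```python
from typing import Callable


class InAppNotifier:
    """Hypothetical in-process pubsub that can be disabled."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled
        self._subscribers = {}  # channel name -> list of callbacks

    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(channel, []).append(callback)

    def notify(self, channel: str, message: dict) -> int:
        """Deliver message to subscribers; return how many received it."""
        if not self.enabled:
            return 0
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def insert_job(queue: str, job: dict, notifier: InAppNotifier, store: list) -> None:
    """Insert a job, then announce it from the application, not a trigger."""
    store.append((queue, job))  # stand-in for the database INSERT
    notifier.notify("insert", {"queue": queue, "id": job["id"]})
```

With the notifier disabled the insert still succeeds; workers would then only pick the job up on their next poll, which is the subsecond dispatch trade-off described above.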

11. asg0451 ◴[11 Jul 25 01:46 UTC] No.44527665{3}[source]▶

>>44526299 #

can you explain why?

replies(2): >>44528019 #>>44531214 #

12. tecleandor ◴[11 Jul 25 02:53 UTC] No.44528009{3}[source]▶

>>44526299 #

I think RabbitMQ still uses it by default for its metadata storage. Is it problematic?

replies(1): >>44528351 #

13. spooneybarger ◴[11 Jul 25 02:56 UTC] No.44528019{4}[source]▶

>>44527665 #

Mnesia along with clustering was a recipe for split-brain disasters a few years ago. I assume that hasn't been addressed.

14. schaum ◴[11 Jul 25 04:18 UTC] No.44528351{4}[source]▶

>>44528009 #

They are in the process of migrating away from it https://www.rabbitmq.com/docs/metadata-store

15. ahoka ◴[11 Jul 25 12:10 UTC] No.44531214{4}[source]▶

>>44527665 #

I have only worked with a product that used it, so no direct experience, but one problem that was often mentioned is split-brains happening very frequently.

16. nightpool ◴[11 Jul 25 18:41 UTC] No.44535609[source]▶

>>44525509 (TP) #

What about Heroku made Erlang clustering difficult? It's had the same DNS clustering feature that Fly has, and they've had it since 2017: https://devcenter.heroku.com/articles/dyno-dns-service-disco....
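
For readers unfamiliar with DNS-based discovery, a minimal sketch of the general technique: resolve one hostname to all of its addresses and derive a candidate node name per address. The hostname, the `app` basename, and the function name are assumptions for illustration, not Heroku's or Fly's actual API.

```python
import socket


def discover_nodes(hostname, basename="app", resolver=socket.getaddrinfo):
    """Return sorted candidate node names, one per resolved address.

    `resolver` defaults to socket.getaddrinfo and is injectable so the
    function can be exercised without real DNS.
    """
    infos = resolver(hostname, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address. A set drops duplicate entries.
    ips = sorted({info[4][0] for info in infos})
    return [f"{basename}@{ip}" for ip in ips]
```

Discovery only yields candidate names; the nodes still have to be able to open connections to one another for clustering to work.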

replies(1): >>44535732 #

17. sorentwo ◴[11 Jul 25 18:55 UTC] No.44535732[source]▶

>>44535609 #

The problem was with restrictive connections, not DNS based discovery for clustering. It wasn't possible (as far as I'm aware) to connect directly from one dyno to another through tcp/udp.

replies(1): >>44535918 #

18. nightpool ◴[11 Jul 25 19:16 UTC] No.44535918{3}[source]▶

>>44535732 #

That is not an issue when using Private Spaces, which have been available since 2015.