Postgres LISTEN/NOTIFY does not scale

(www.recall.ai)

570 points davidgu | 2 comments | 07 Jul 25 14:05 UTC | HN request time: 0.455s | source

Show context

JoelJacobson ◴[11 Jul 25 10:52 UTC] No.44530687[source]▶

Hey folks, I ran into similar scalability issues and ended up building a benchmark tool to analyze exactly how LISTEN/NOTIFY behaves as you scale up the number of listeners.

Turns out that all Postgres versions from 9.6 through current master scale linearly with the number of idle listeners — about 13 μs extra latency per connection. That adds up fast: with 1,000 idle listeners, a NOTIFY round-trip goes from ~0.4 ms to ~14 ms.

To better understand the bottlenecks, I wrote both a benchmark tool and a proof-of-concept patch that replaces the O(N) backend scan with a shared hash table for the single-listener case — and it brings latency down to near-O(1), even with thousands of listeners.

Full benchmark, source, and analysis here: https://github.com/joelonsql/pg-bench-listen-notify

No proposals yet on what to do upstream, just trying to gather interest and surface the performance cliff. Feedback welcome.

replies(3): >>44533379 #>>44538362 #>>44544402 #

cryptonector ◴[11 Jul 25 15:36 UTC] No.44533379[source]▶

>>44530687 #

That's pretty cool.

IMO LISTEN/NOTIFY is badly designed as an interface to begin with because there is no way to enforce access controls (who can notify; who can listen) nor is there any way to enforce payload content type (e.g., JSON). It's very unlike SQL to not have a `CREATE CHANNEL` and `GRANT` commands for dealing with authorization to listen/notify.

If you have authz then the lack of payload content type constraints becomes more tolerable, but if you add a `CREATE CHANNEL` you might as well add something there regarding payload types, or you might as well just make it so it has to always be JSON.

With a `CREATE CHANNEL` PG could provide:

  - authz for listen
  - authz for notify
  - payload content type constraints
    (maybe always JSON if you CREATE
    the channel)
  - select different serialization
    semantics (to avoid this horrible,
    no good, very bad locking behavior)
  - backwards-compatibility for listen/
    notify on non-created channels

replies(1): >>44537258 #

maxbond ◴[11 Jul 25 22:08 UTC] No.44537258[source]▶

>>44533379 #

> there is no way to enforce access controls

(I thought this was a fun puzzle, so don't take this as advice or as disagreement with your point.)

There is the option to use functions with SECURITY DEFINER to hack around this, but the cleanest way to do it (in the current API) would be to encrypt your messages on the application side using an authenticated system (eg AES-GCM). You can then apply access control to the keys. (Compromised services could still snoop on when adjacent channels were in use, however.)

replies(1): >>44539921 #

1. cryptonector ◴[12 Jul 25 06:52 UTC] No.44539921[source]▶

>>44537258 #

Yes, I've thought about this too, but it's annoying to have to resort to that, no?

replies(1): >>44540005 #

2. maxbond ◴[12 Jul 25 07:18 UTC] No.44540005[source]▶

>>44539921 (TP) #

Absolutely, Postgres is fantastic but LISTEN/NOTIFY is it's weakest feature. It's convenient, it has the potential to open up compelling use cases, it very nearly works, but has all these nasty limitations and rough edges that cause people to steer clear. I think a lot of people don't know it exists, you almost never hear it mentioned in discussions about async job queues in Postgres (which would seem like and obvious use case). I don't think it's ever been mentioned on the Postgres.FM podcast (I'm sure they're aware of it but it speaks to the lack of usage). I'd love to see it get some love in future releases, and I agree that access control is necessary for it to really work.

↑