
570 points by davidgu | 6 comments
leontrolski No.44525560
I'd be interested in how dumb-ol' polling would compare here (the FOR UPDATE SKIP LOCKED method https://leontrolski.github.io/postgres-as-queue.html). One day I will set up some benchmarks, as this is the kind of thing people argue about a lot without much evidence either way.
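
For concreteness, here's a minimal sketch of what one iteration of that polling approach runs (the jobs table and its columns are made up for illustration, not taken from the linked post):

    BEGIN;

    -- claim one pending job, skipping rows other workers have locked
    SELECT id, payload
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;

    -- ...process the job in application code...

    -- mark it done (or DELETE it), then release the lock
    UPDATE jobs SET status = 'done' WHERE id = $1;
    COMMIT;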

Wasn't aware of this AccessExclusiveLock behaviour - a reminder (and shameless plug 2) of how Postgres locks interact: https://leontrolski.github.io/pglockpy.html

replies(9): >>44525593 #>>44525651 #>>44525828 #>>44525857 #>>44527315 #>>44527425 #>>44527778 #>>44528689 #>>44533402 #
singron No.44525857
Polling is the way to go, but it's also very tricky to get right. In particular, it's non-trivial to make a reliable queue that's also fast when transactions are held open and vacuum isn't able to clean tuples. E.g. "get the first available tuple" might have to skip over 1000s of dead tuples.

Holding transactions open is an anti-pattern for sure, but it's occasionally useful. E.g. pg_repack keeps a transaction open while it runs, and I believe vacuum holds an open transaction for part of its run too. It's also nice if your database doesn't melt whenever this happens by accident.

replies(3): >>44527188 #>>44528430 #>>44530342 #
atombender No.44530342
Dead tuples are a real and significant problem, not just because queries have to skip over them, but because the statistics that drive the planner don't account for them.

I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.

It turned out that with a nearly empty table carrying about 10-20k dead tuples, the planner switched to a different index scan and would overfetch tons of pages just to discard them, as they contained only dead tuples. What I didn't realize is that the planner statistics don't account for dead tuples, and ANALYZE doesn't take them into account either. So the planner ended up thinking the table was much bigger than it actually was.
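
The mismatch is at least easy to observe, since the dead tuples that the planner's statistics miss do show up in the monitoring views (jobs being a hypothetical table name):

    SELECT n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'jobs';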

It's really important for these use cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load the vacuum runs pretty much continuously.
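
A sketch of that per-table tuning (the storage parameters are real, but the thresholds are illustrative rather than a recommendation):

    ALTER TABLE jobs SET (
        autovacuum_vacuum_scale_factor  = 0,    -- don't wait for a fraction of the table
        autovacuum_vacuum_threshold     = 100,  -- vacuum after ~100 dead tuples
        autovacuum_analyze_scale_factor = 0,
        autovacuum_analyze_threshold    = 100,  -- keep planner statistics fresh too
        autovacuum_vacuum_cost_delay    = 0     -- don't throttle the autovacuum worker
    );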

Another option is to avoid deleting rows and instead use a column to mark rows as complete, which, together with a partial index, can avoid dead tuples. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.
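
A sketch of that variant, again with hypothetical names:

    ALTER TABLE jobs ADD COLUMN done boolean NOT NULL DEFAULT false;
    CREATE INDEX jobs_pending_idx ON jobs (id) WHERE NOT done;

    -- workers claim work through the small partial index...
    SELECT id FROM jobs
    WHERE NOT done
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;

    -- ...and completion becomes an UPDATE instead of a DELETE
    UPDATE jobs SET done = true WHERE id = $1;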

replies(1): >>44535859 #
1. singron No.44535859
Unfortunately, updating the row also creates dead tuples. It's very tricky!
replies(1): >>44535896 #
2. atombender No.44535896
It does, but because of how indexes work, I believe the planner won't be skewed by the presence of dead tuples (though the bloat can cause the live data to be spread across many more blocks and therefore generate more I/O), as long as you run autoanalyze semi-regularly.
replies(1): >>44536811 #
3. singron No.44536811
It depends on whether you are getting Heap-Only Tuple (HOT) updates or not: https://www.postgresql.org/docs/current/storage-hot.html

In this case, you might have enough dead tuples spread across your heap that you would expect a lot of HOT updates. But if you are processing in insertion order, you will probably also be processing in heap order, and you can actually get zero HOT updates, since the other tuples in each page aren't fully dead yet. You could try a lower fillfactor to avoid this, but that's also bad for performance, so it might not help.
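
For anyone experimenting with this, the knob and the counter to watch look roughly like so (illustrative; note that existing pages only pick up a new fillfactor after a rewrite, e.g. VACUUM FULL):

    ALTER TABLE jobs SET (fillfactor = 50);  -- leave free space in each page for HOT

    -- check whether updates are actually HOT
    SELECT n_tup_upd, n_tup_hot_upd
    FROM pg_stat_user_tables
    WHERE relname = 'jobs';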

replies(1): >>44537020 #
4. atombender No.44537020
If you have a "done" column that you filter on using a partial index, then it would never use HOT updates anyway, since HOT requires that none of the modified columns have an index.
replies(1): >>44537423 #
5. menthe No.44537423
False.

As of PG16, an update can still be HOT when the columns it modifies are covered only by summarizing indexes, such as BRIN.

https://www.postgresql.org/docs/16/storage-hot.html
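
E.g., a sketch with a hypothetical created_at column:

    CREATE INDEX jobs_created_brin ON jobs USING brin (created_at);
    -- On PG16+, an update that modifies only created_at (and unindexed
    -- columns) can still be HOT, because a BRIN index is "summarizing":
    -- it doesn't point at individual tuple versions.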

Besides, you probably don't want "done" jobs in the same table as pending or retriable jobs; as you scale up, you'll likely want to archive them, which provides various operational advantages at no cost.

replies(1): >>44537457 #
6. atombender No.44537457
Not false. Nobody would ever use BRIN for this. I'm talking about regular indexes, which do prevent HOT.

If you read my earlier comment properly, you'll notice the "done" column is there to avoid deleting rows on the hot path and to keep dead tuples from messing up the planner. I agree that a table shouldn't contain done jobs, but then you risk running into the dead-tuple problem. Both approaches are a compromise.