https://github.com/cpursley/walex?tab=readme-ov-file#walex (there's a few useful links in here)
* It gives an indication of how much you need to grow before this Postgres functionality starts being a blocker.
* Folks encountering this issue—and its confusing log line—in the future will be able to find this post and quickly understand the issue.
Features that seem harmless at small scale can break everything at large scale.
The post author is too focused on using NOTIFY in only one way.
This post fails to explain WHY they are sending a NOTIFY. Not much use telling us what doesn’t work without telling us the actual business goal.
It’s crazy to send a NOTIFY for every transaction; they should be debounced/grouped.
The point of a NOTIFY is to let some other system know something has changed. Don’t do it every transaction.
It was particularly ironic because Elixir has a fantastic distribution and pubsub story thanks to distributed Erlang. That’s much more commonly used in apps now compared to 5 or so years ago, when 40-50% of apps weren’t clustered, thanks to the rise of platforms like Fly that made it easier, and the decline of Heroku that made it nearly impossible.
Wasn't aware of this AccessExclusiveLock behaviour - a reminder (and shameless plug 2) of how Postgres locks interact: https://leontrolski.github.io/pglockpy.html
Becomes a problem if you are inserting 40 items into the order_items table.
What I already know
- Unique indexes slow inserts since db has to acquire a full table lock
- Case statements in Where break query planner/optimizer and require full table scans
- Read only postgres functions should be marked as `STABLE PARALLEL SAFE`
Like if it needs to be very consistent I would use an unlogged table (since we're worried about "scale" here) and then `FOR UPDATE SKIP LOCKED` like others have mentioned. Otherwise what exactly is notify doing that can't be done after the first transaction?
Edit: in fact, how can they send an HTTP call for something and not be able to do a `NOTIFY` after as well?
One possible way I could understand what they wrote is that somewhere in their code, within the same transaction, there are notifies which conditionally trigger and it would be difficult to know which ones to notify again in another transaction after the fact. But they must know enough to make the HTTP call, so why not NOTIFY?
Can’t find the function that does that, and I’ve not seen it used in the wild yet, idk if there’s gotchas
Edit: found it, it’s pg_logical_emit_message
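For reference, it takes (transactional, prefix, content) and returns the WAL position of the message; a minimal sketch:

```sql
-- Emit a message into the WAL without touching any table. transactional=true
-- means it is only decoded if the surrounding transaction commits.
SELECT pg_logical_emit_message(true, 'my_app', '{"event": "order_created"}');
```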
Source: Dev at one of the companies that hit this issue with Oban
Holding transactions open is an anti-pattern for sure, but it's occasionally useful. E.g. pg_repack keeps a transaction open while it runs, and I believe vacuum also holds an open transaction part of the time too. It's also nice if your database doesn't melt whenever this happens on accident.
They’re using it wrong and blaming Postgres.
Instead they should use Postgres properly and architect their system to match how Postgres works.
There’s correct ways to notify external systems of events via NOTIFY, they should use them.
It'd be nice to have a method that would block for N seconds waiting for a new entry.
You can also use a streaming replication connection, but it often is not enabled by default.
I feel like somebody needs to write a book on system architecture for Gen Z that's just filled with memes. A funny cat pic telling people not to use the wrong tool will probably make more of an impact than an old fogey in a comment section wagging his finger.
> When a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database (ref) during the commit phase of the transaction, effectively serializing all commits.
Am I missing something? This seems like something the original authors of the system should have done due diligence on before implementing a write-heavy workload.
Might be a bit tricky to get debezium to decode the logical event, not sure
You'd have to at least accompany your memes with empirics. What is write-heavy? A number you might hit if your startup succeeds with thousands of concurrent users on your v1 naive implementation?
Else you just get another repeat of everyone cargo-culting Mongo because they heard that Postgres wasn't web scale for their app with 0 users.
…of course, you need dedup/support for duplicate messages on the notify stream if you do this, but that’s table stakes in a lot of messaging scenarios anyway.
This does sacrifice ordering and increases the risk of duplicates in the message stream, though.
If each tenant gets an instance I would call that a “shard” but in that pattern there’s no need for cross-shard references.
Maybe in the analytics stack but that can be async and eventually consistent.
In my experience, this means you make sure the polling solution is complete and correct, and the notifier gets reduced to a wake-up signal. This signal doesn't even need to carry the actionable change content, if the poller can already pose efficient queries for whatever "new stuff" it needs.
This approach also allows the poller to keep its own persistent cursor state if there is some stateful sequence to how it consumes the DB content. It automatically resynchronizes and the notification channel does not need to be kept in lock-step with the consumption.
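A minimal sketch of that shape (hypothetical table/channel names), where the trigger sends a contentless wake-up and the poller owns all the actual querying:

```sql
-- The notification carries no actionable content; it only wakes the poller,
-- which re-derives "new stuff" from its own persisted cursor.
CREATE FUNCTION wake_poller() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('new_work', '');  -- empty payload on purpose
  RETURN NULL;                        -- return value ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_wake
AFTER INSERT ON events
FOR EACH STATEMENT EXECUTE FUNCTION wake_poller();
```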
I wonder if that is fixable, or just inherent to its design.
Opaque to who? If there's a piece of business logic that says "After this table's record is updated, you MUST update this other table", what advantages are there to putting that logic in the application?
When (not if) some other application updates that record you are going to have a broken database.
Some things are business constraints, and as such they should be moved into the database if at all possible. The application should never enforce constraints such as "either this column or that column is NULL, but at least one must be NULL and both must never be NULL at the same time".
Your database enforces constraints; what advantages are there to code the enforcement into every application that touches the database over simply coding the constraints into the database?
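To make that concrete, the "exactly one of the two is NULL" rule above is a one-line CHECK constraint (hypothetical table/column names):

```sql
-- Exactly one of card_id / bank_account_id must be NULL:
ALTER TABLE payments
  ADD CONSTRAINT exactly_one_of_card_or_bank
  CHECK ((card_id IS NULL) <> (bank_account_id IS NULL));
```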
Do you expect it to be faster to do the trigger logic in the application? Wouldn't be slower to execute two statements from the application (even if they are in a transaction) than to rely on triggers?
It’s unsurprising to me that an AI company appears to have chosen exactly the wrong tool for the job.
SQS may have been a good "boring" choice for this?
What were the TPS numbers? What was the workload like? How big is the difference in %?
Use LISTEN/NOTIFY. You will get a lot of utility out of it before you’re anywhere close to these problems.
However, in 2025 I'd pick Redis or MQTT for this kind of role. I'm typically in multi-lang environments. Is there something better?
What did you replace them with instead?
I also found LISTEN/NOTIFY to not work well at this scale and used a polling based approach with a back off when no work was found.
Quite an interesting problem and a bit challenging to get right at scale.
It both polls (configurable per queue) and supports listen/notify simply to inform workers that they can wake up early to trigger polling, and this can be turned off globally with a notifications=false flag.
(Shameless plug [1]) I'm working on DBOS, where we implemented durable workflows and queues on top of Postgres. For queues, we use FOR UPDATE SKIP LOCKED for task dispatch, combined with exponential backoff and jitter to reduce contention under high load when many workers are polling the same table.
Would love to hear feedback from you and others building similar systems.
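For anyone unfamiliar with the pattern, here's a generic sketch of SKIP LOCKED dispatch (not DBOS's actual code; hypothetical tasks table):

```sql
-- Claim up to 10 pending tasks, skipping rows other workers hold locks on:
WITH claimed AS (
  SELECT id FROM tasks
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE tasks t
SET status = 'running', claimed_at = now()
FROM claimed
WHERE t.id = claimed.id
RETURNING t.id, t.payload;
```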
pg_logical_slot_get_binary_changes returns the same entries as the replication connection. It just has no support for long-polling.
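E.g., with the text variant (a sketch using the contrib test_decoding plugin):

```sql
-- Create a logical slot, then poll it; each call returns whatever has been
-- decoded since the last call. It never blocks waiting for more.
SELECT pg_create_logical_replication_slot('my_slot', 'test_decoding');
SELECT lsn, xid, data FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);
```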
That's where we use it at my work. We have host/networking deployment pipelines that used to have up to one minute of latency on each step because each was run on a one-minute cron. A short Python script/service that handled the LISTENing + adding NOTIFYs when the next step was ready removed the latency, and we'll never do enough for the load on the db to matter.
Databases can do a lot of stuff, and if you're not hurting for DB performance it can be a good idea to just... do it in the database. The advantage is that, if the DB does it, you're much less likely to break things. Putting data constraints in application code can be done, but then you're just waiting for the day those constraints are broken.
Not to mention that pubsub allows multiple consumers for a single message, whereas FOR UPDATE is single consumer by design.
I've landed on Postgres/ClickHouse/NATS, since together they handle nearly any conceivable workload, covering relational, columnar, and messaging/streaming very well. It's also not painful at all to use, as it's lightweight and fast/easy to spin up in a simple docker compose. Postgres is of course the core, and you don't always need all three, but they complement each other very well imo. This has been my "go to" for a while.
The people who design it walk away after a few years, so they don't give a crap what happens. The rest of us have to struggle to support or try to replace whatever the lumbering monstrosity is.
There is work happening currently to make Kafka behave more like a queue: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
I think it’s a reasonable assumption. Based on the second half of your comment, you clearly don’t think highly of “AI companies,” but I think that’s a separate issue.
That is tricky due to transactions and visibility. How do you write the poller to not miss events that were written by a long/blocked transaction? You'd have to set the poller scan to a long time window (e.g. "process events that were written since now minus 5 minutes") and then make sure transactions are cancelled hard before those 5 minutes.
You should split your system into specialized components:

- Kafka for event transport (you're likely already doing this)
- An LSM-tree DB for write-heavy structured data (e.g. Cassandra)
- Keep Postgres for queries that benefit from relational features in certain parts of your architecture
In my linked example, on getting the item from the queue, you immediately set the status to something that you're not polling for - does Postgres still have to skip past these tuples (even in an index) until they're vacuumed up?
In databases where your domain is also your physical data model, coupling business logic to the database can work quite well, if the DBMS supports that.
https://medium.com/@paul_42036/entity-workflows-for-event-dr...
An INSERT never results in a full table lock (as in "the lock would prevent other inserts or selects on the table").
Any expression used in the WHERE clause that isn't indexed will probably result in a Seq Scan. CASE expressions are no different than e.g. a function call regarding this.
A stable function marked as "STABLE" (or even immutable) can be optimized differently (e.g. can be "inlined"), so yes that's a good recommendation.
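E.g., a sketch of the recommendation (hypothetical function):

```sql
-- STABLE: result doesn't change within a statement, so the planner may
-- inline it; PARALLEL SAFE: it may run inside parallel workers.
CREATE FUNCTION active_user_count() RETURNS bigint AS $$
  SELECT count(*) FROM users WHERE active
$$ LANGUAGE sql STABLE PARALLEL SAFE;
```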
Recordings can and should be streamed to an object store. Parallel processes can do transcription on those objects; bonus: when they inevitably have a bug in transcription, retranscribing meetings is easy.
The output of transcription can be a single file also stored in the object store with a single completion message notification, or if they really insist on “near real-time”, a message on a queue for every N seconds. Much easier to scale your queue than your DB, eg Kafka partitions.
A handful of consumers can read those messages and insert into the DB. Benefit is you have a fixed and controllable write load into the database, and your client workload never overloads the DB because you’re buffering that with the much more distributed object store (which is way simpler than running another database engine).
I like that. Sounds like a synonym for "Platform Engineering". :-)
I remember being amazed that lambda architecture was considered a kind of reference, when it looked to me more like a workaround.
We like to build IT cathedrals, until we have to run them.
- The batch size needs to be adaptive for performance, latency, and recovering smoothly after downtime.
- The polling timeouts, frequency etc the same.
- You need to avoid hysteresis.
- You want to be super careful about not disturbing the main application by placing heavy load on the database or accidentally locking tables/rows
- You likely want multiple distributed workers in case of a network partition to keep handling events
It’s hard to get right especially when the databases at the time did not support SKIP LOCKED.
In retrospect I wish I had listened to the WAL. Much easier.
[1] https://speakerdeck.com/gunnarmorling/ins-and-outs-of-the-ou...
[2] https://www.infoq.com/articles/wonders-of-postgres-logical-d...
[3] https://www.morling.dev/blog/mastering-postgres-replication-...
I guess some will argue that their business logic is special and really is so tightly coupled to the data definition that it belongs in the database, and I’m not going to claim those use cases don’t exist, but I’ve seen over-coupling far more often than under-coupling.
This is why I say: Applications come and go, but data is forever.
None of this means you have to or even should use stored procedures, triggers, or listen/notify. I'm just making the point that there is no clean separation between "data" and "business logic".
The documentation doesn’t mention any caveats in this direction, and they had 3 periods of downtime in 4 days, so I don’t think it’s a given that testing would have hit this problem.
Maybe I missed it in some folded up embedded content, or some graph (or maybe I'm probably just blind...), but is it mentioned at which point they started running into issues? The quoted bit about "10s of thousands of simultaneous writers" is all I can find.
What is the qualitative and quantitative nature of relevant workloads? Depending on the answers, some people may not care.
I asked ChatGPT to research it and this is the executive summary:
For PostgreSQL’s LISTEN/NOTIFY, a realistic safe throughput is:

- Up to ~100–500 notifications/sec: Handles well on most systems with minimal tuning. Low risk of contention.
- ~500–2,000 notifications/sec: Reasonable with good tuning (short transactions, fast listeners, few concurrent writers). May start to see lock contention.
- ~2,000–5,000 notifications/sec: Pushing the upper bounds. Requires careful batching, dedicated listeners, possibly separate Postgres instances for pub/sub.
- >5,000 notifications/sec: Not recommended for sustained load. You’ll likely hit serialization bottlenecks due to the global commit lock held during NOTIFY.
My other reference for a slightly different problem is https://www.thatguyfromdelhi.com/2020/12/what-postgres-sql-c...
However, I've been in several situations where scaling the queue brings down the database, and therefore the app, and am thus of the opinion you probably shouldn't couple these systems too tightly.
There are pros and cons, of course.
Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
This seems like another case where Postgres gets free marketing due to companies hitting its technical limits. I get why they choose to make lemonade in these cases with an eng blog post, but this is a way too common pattern on HN. Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business. OpenAI had a similar blog post a couple of months ago where they revealed they were probably spending more than quarter of a million a month on an Azure managed Postgres, and it had stopped scaling so they were having to slowly abandon it, where I made the same comment [1].
Postgres is a great DB for what you pay, but IMHO well capitalized blitzscaling startups shouldn't be using it. If you buy a database - and realistically most Postgres users do anyway as they're paying for a cloud managed db - then you might as well just buy a commercial DB with an integrated queue engine. I have a financial COI because I have a part time job there in the research division (on non-DB stuff), so keep that in mind, but they should just migrate to an Oracle Database. It has a queue engine called TxEQ which is implemented on top of database tables with some C code for efficient blocking polls. It scales horizontally by just adding database nodes whilst retaining ACID transactions, and you can get hosted versions of them in all the major clouds. I'm using it in a project at the moment and it's been working well. In particular the ability to dequeue a message into the same transaction that does other database writes is very useful, as is the exposed lock manager.
Beyond scaling horizontally the nice thing about TxEQ/AQ is that it's a full message queue broker with all the normal features you'd expect. Delayed messages, exception queues, queue browsing, multi-consumer etc. LISTEN/NOTIFY is barely a queue at all, really.
For startups like this, the amount of time, money and morale they are losing with all these constant stories of firefights just doesn't make sense to me. It doesn't have to be Oracle, there are other DBs that can do this too. But "We discovered X about Postgres" is a eng blog cliché by this point. You're paying $$$ to a cloud and GPU vendor anyway, just buy a database and get back to work!
Maybe you also don't know what ChatGPT Research is (the Enterprise version, if you really need to know), or what Executive Summary implies, but here's a snippet of the 28 sources used:
Of course, you have the people that correctly use em-dashes, too.
For startups, Postgres is a fantastic first choice. But plan ahead: as your workload grows, you’ll likely need to migrate or augment your stack.
1) the Postgres documentation does not mention that Notify causes a global lock or lock of any sort (I checked). That’s crazy to me; if something causes a lock, the documentation should tell you it does and what kind. Performance notes also belong in documentation for dbs.
2) why the hell does notify require a lock in the first place? Reading the comment this design seems insane; there’s no good reason to queue up notifications for transactions that aren’t committed. Just add the notifications in commit order with no lock, you’re building a db with concurrency, get used to it.
I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.
It turned out that with a nearly empty table of about 10-20k dead tuples, the planner switched to using a different index scan, and would overfetch tons of pages just to discard them, as they only contained dead tuples. What I didn't realize is that the planner statistics don't care about dead tuples, and ANALYZE doesn't take them into account. So the planner started to think the table was much bigger than it actually was.
It's really important for these use cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load, the vacuum runs pretty much continuously.
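Something along these lines (hypothetical table; the exact numbers need testing against your workload):

```sql
ALTER TABLE jobs SET (
  autovacuum_vacuum_scale_factor = 0.0,  -- don't wait for a % of the table
  autovacuum_vacuum_threshold    = 1000, -- vacuum after ~1000 dead tuples
  autovacuum_vacuum_cost_delay   = 0     -- don't throttle the vacuum worker
);
```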
Another option is to avoid deleting rows, but instead use a column to mark rows as complete, which together with a partial index can avoid dead tuples. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.
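A sketch of that variant (hypothetical schema): pending rows stay in a small partial index, and completed rows simply fall out of it:

```sql
ALTER TABLE jobs ADD COLUMN done boolean NOT NULL DEFAULT false;
CREATE INDEX jobs_pending_idx ON jobs (created_at) WHERE NOT done;
-- consumers mark completion instead of deleting:
--   UPDATE jobs SET done = true WHERE id = ...;
-- a separate periodic job deletes old done rows (and vacuums).
```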
To make those claims it's necessary to know what work is being done while the lock is held. This includes a bunch of various resource cleanup, which should be cheap, and RecordTransactionCommit() which will grab a lock to insert a WAL record, wait for it to get flushed to disk and potentially also for it to get acknowledged by a synchronous replica. So the expected throughput is somewhere between hundreds and tens of thousands of notifies per second. But as far as I can tell this conclusion is only available from PostgreSQL source code and some assumptions about typical storage and network performance.
Turns out that all Postgres versions from 9.6 through current master scale linearly with the number of idle listeners — about 13 μs extra latency per connection. That adds up fast: with 1,000 idle listeners, a NOTIFY round-trip goes from ~0.4 ms to ~14 ms.
To better understand the bottlenecks, I wrote both a benchmark tool and a proof-of-concept patch that replaces the O(N) backend scan with a shared hash table for the single-listener case — and it brings latency down to near-O(1), even with thousands of listeners.
Full benchmark, source, and analysis here: https://github.com/joelonsql/pg-bench-listen-notify
No proposals yet on what to do upstream, just trying to gather interest and surface the performance cliff. Feedback welcome.
Maybe throw in a dedicated key-value store like Redis or Valkey.
Oh and maybe something S3 compatible like MinIO, Garage or SeaweedFS for storing bunches of binary data.
With all of that, honestly it should cover most of the common workloads out there! Of course, depends on how specialized vs generic you like your software to be.
I've seen people who disagree with that statement and say that having a separate back end component often leads to overfetching and in-database processing is better. I've worked on some systems where the back end is essentially just passing data to and from stored procedures.
It was blazing fast, but working with it absolutely sucked - though for whatever reason the people who believe that seem to hold those views quite strongly.
Back to the topic: Lots of potential bugs and data corruption issues are solved by moving part of the business logic to the database. Other people already covered two things: data validation and queue atomicity.
On the other hand, lots of potential issues can also arise by putting other parts of business logic to the database, for example, calling HTTPS endpoints from inside the DB itself is highly problematic.
The reality is that the world is not black and white, and being an engineer is about navigating this grey area.
There are things that work better, are safer and simpler to do on the database, and things that work better, are safer and simpler in code. And those things might change depending on context, technology, requirements, size of project, experience of contributors, etc.
Forcing round pegs into square holes will always lead to brittle code and brittle products, often for more cost (mental and financial!) than actually using each tool correctly.
https://github.com/daitangio/pque
I evaluated listen/notify but it seems to lose messages if no one is listening, so its use case seems pretty limited to me (my 2 cents).
Anyway, if you need to scale, I suggest a dedicated queue server like RabbitMQ.
In the beginning having fewer parts to connect and maintain lets the needs and bottlenecks of the actual application emerge.
If it was listen/notify in such a scenario at some volume where optimizing it isn’t in the cards… so be it. It would be some time down the road before sharding a function into a specific subsystem like what you described.
Appreciate learning about the Postgres/ClickHouse/NATS combo. If there's an article on the three together that you liked, I'd be happy to read and learn.
The only reasons I can think of:
- you're rewriting a legacy system and migrate parts incrementally
- data compliance
- you're running a dangerous database setup
I try my best to avoid putting any business logic inside databases and see stored procedures only as a temporary solution.
Sending webhooks, as an example, often has zero need to go back and update the database, but I've seen that exact example take down several different managed databases ( i.e., not just postgres ).
Asynchronous protocols frequently result in callback-based or generator-style APIs on the client side, which are hard to implement safely and intuitively. For example, consider building a real-time SDK for something like NATS. Once a message arrives, you need to invoke a user-defined callback to handle it. At that point, you're faced with a design decision: either call the callback synchronously (which risks blocking the socket reading loop), or do it asynchronously (which raises issues like backpressure handling).
Also, SDKs are often developed by different people, each with their own design philosophy and coding style, leading to inconsistency and subtle bugs.
So this isn't only about NATS. Just last week, we ran into two critical bugs in two separate Kafka SDKs at work.
Both have their place in business.
Yes, that's what a snippet generally is. The generated document from my very basic research prompt is over 300k in length. There are also sources from the official mailing lists, graphile, and various community discussions.
I'm not going to post the entire output because it is completely beside the point. In my original post, I explicitly asked "What is the qualitative and quantitative nature of relevant workloads?" exactly because it's not clear from the blog post. If, for example, they only started hitting these issues with 10k simultaneous reads/writes, then it's reasonable to assume that many people who don't have such high workloads won't really care.
The ChatGPT snippet was included to show that that's what ChatGPT research told me. Nothing more. I basically typed a 2-line prompt and asked it to include the original article. Anyone who thinks that what I posted is authoritative in any way shouldn't be considering doing this type of work.
It really is, and it’s often surprising to me how basic some of the issues being discovered are. Like Figma, who waited a shocking amount of time to add [0] PgBouncer and read replicas. This is such a well-trod path that it’s baffling to me why you wouldn’t add it once it’s clear you have a winning product. At the very least, PgBouncer (or PgCat, or any other connection pooler / proxying service) - it adds negligible cost per month (in comparison to DB read replicas) to run a couple of containers with a load balancer.
Re: Oracle, as much as I despise the company for its litigious practices, I’ll hand it to you that the features your DB has are astonishing. RAC is absolutely incredible (on paper - I’ve never used it).
[0]: https://www.figma.com/blog/how-figma-scaled-to-multiple-data...
The DB can make much stronger guarantees about transactions and updates the closer that logic happens to itself. In the world of cloud computing, this can be a cost savings for ingress/egress too.
listen/notify isn't necessarily a replacement for Redis or other pub/sub systems, and Redis pub/sub and similar isn't necessarily a replacement for, idk, Kafka or similar queue/messaging systems
but a lot of companies have (by modern standards) surprisingly small amounts of data, where even an increase of 2, 3, 4x still isn't that big. In that case listen/notify and similar might just work fine :shrug:
also, the same is true the other way around: depending on your application you can go Redis-only, as long as your data volume stays small enough and your transactional/sync needs are reasonably simple (with WATCH+EXEC, NX/XX options etc., and maybe some Redis-side Lua scripts, you can do quite a lot for data synchronization). The issue with that is that stylistically Redis data-sync/transaction code is often much more similar to writing atomic data structures than to SQL transactions, and even for SQL transactions there is a trend of devs severely overestimating what they provide, so often you are better off not touching it when you can avoid it. Also, BTW, Redis has something very similar to SQLite or NOTIFY where "basically" (oversimplified by a lot) there is only one set of writes done at a time ;) (and then afterwards distributed to replicas), just that outside of some micro Lua scripts you don't really run much logic beyond some NX/XX checks etc., so it's not blocking much, and it's "more or less" all just in memory, not touching a WAL (again, oversimplified).
The DB is - or should be - the source of truth for your application. Also, since practically everyone is using cloud RDBMS with (usually) networked storage, the latency is atrocious. Given those, it seems silly to rely on an application to react to and direct changes to related data.
For example, if you want to soft-delete customer data while maintaining the ability to hard-delete, then instead of having an is_deleted and/or deleted_at column, have a duplicate table or tables, and an AFTER DELETE trigger on the originals that move the tuples to the other tables.
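A sketch of that trigger (hypothetical tables):

```sql
CREATE TABLE customers_deleted (LIKE customers INCLUDING ALL);

CREATE FUNCTION archive_customer() RETURNS trigger AS $$
BEGIN
  INSERT INTO customers_deleted SELECT OLD.*;  -- move the tuple to the archive
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_archive
AFTER DELETE ON customers
FOR EACH ROW EXECUTE FUNCTION archive_customer();
```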
Or if you want to have get_or_create without multiple round trips (and you don’t have Postgres’ MERGE … RETURNING), you can easily accomplish this with a stored procedure.
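In Postgres specifically, the same single round trip can also be had with INSERT ... ON CONFLICT wrapped in a function (a sketch, assuming a unique constraint on name):

```sql
CREATE FUNCTION get_or_create_tag(p_name text) RETURNS bigint AS $$
  INSERT INTO tags (name) VALUES (p_name)
  ON CONFLICT (name) DO UPDATE SET name = EXCLUDED.name  -- no-op update so that
  RETURNING id                                           -- RETURNING sees the row
$$ LANGUAGE sql;
```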
Using database features shouldn’t be seen as verboten or outdated. What should be discouraged is not treating things like stored procedures and triggers as code. They absolutely should be in VCS, should go the same review process as anything else, and should be well-documented.
RDBMS are extremely well-tested pieces of software that do their job incredibly well. To think that you could do better, or even equally as well, is hubris. If you want to trade those guarantees for “velocity” go right ahead, but you also need to take into account the inevitable incidents and recoveries that will occur.
More broadly, not knowing how to write SQL is a very solvable problem, and frankly anyone accessing an RDBMS as a regular part of their job should know it. Even if you’re always using an ORM, you should understand what it’s doing so you can understand the EXPLAIN output you’ll probably be looking at eventually.
With Entity Framework Code First, Microsoft made it possible for generations of developers to barely touch a database.
A lot of Devs have poor database skills nowadays.
Which suits the cloud sellers who want to push managed platforms
For sharding, Vitess kind of supports them; Citus fully supports them.
You’re correct that they do impact performance to an extent, but as a counter argument, if your data is incorrect, it doesn’t matter how quickly you wrote it.
Not to mention the difficulty in maintaining referential integrity with all of that duplicated data. There are various workarounds, but at that point it feels very much like we’re recreating a shared DB, but shittily, and netting zero benefits.
Sure, you need transactions for processing things in a queue (mark as "taken out" but don't remove yet, then remove, or "place back in" (or into a failed-messages inbox) on timeout or similar can be _very_ important for queue systems).
But the moment the "fail safe if something dies while processing a message" behavior becomes directly coupled to DB transactions, you have created something very brittle and cumbersome.
To be fair, that might still be the best solution for some situations.
But the better solution is to treat a queue as a message-passing system and handle messages as messages, with the appropriate delivery semantics. And if you can't, because e.g. your idempotency logic is super unreliable, then there indeed is a problem, but it's not in the missing cross-system transactions; it's in how you write that logic (missing _tooling_, strict code guidelines people actually comply with, interface regression checks, tests (including prop/fuzz tests, regression tests, integration/e2e tests etc., not just "dumb" unit tests)).
> just migrate to an Oracle Database.
In my experience Oracle DB is very powerful but also very cumbersome in a lot of ways, and if you need things only they can provide, you most likely already fucked up big time somewhere else in your design/architecture. Sure, if you are at that point, Oracle can easily be the cheaper solution. But preferably you never end up there.
As a side note, there are also a lot of decent plugins which can provide similar capabilities to PG, but they tend to have the issue that they aren't part of managed PG solutions, and self-managing PG (or most other reasonably powerful DBs) can be a huge pain, and then yes, Oracle can be a solution.
Still, the number of startups that have had an overall good experience with Oracle is, in my experience, basically zero. (But there are some pretty big companies/projects I know of which have had an overall good experience with Oracle.)
> constant stories of firefights
If you mean stories on HN, then that isn't a meaningful metric; you only hear about the "interesting" stories, which are mostly about firefighting or "using pg for everything is great", but rarely the majority of in-between stories and boring, silent successes. If it's about stories from your career and from asking dev friends about their experience, then it is more meaningful. But that, too, is a bubble (like this answer of mine is, without question, in a bubble).
Generally I think people really overestimate how representative HN is. Idk about the US, but outside of it _huge_ parts of the IT industry are not represented on HN in any meaningful way. I would say in my country HN is _at most_ representative of 1/4 of the industry, but that 1/4 also contains many of the very, very motivated software developers, and very few of the "that's my work, but not my calling", bread-and-butter software devs, who are often 1/3 to over 1/2 of devs in most countries as far as I can tell.
But, oversimplified, it's a case of "high queue load f*s up the availability/timings of other DB operations" (and of the queue itself).
And that's a generic problem you have, even if just due to generic CPU/WAL/disk load, if you put your queue into your DB, even if that specific lock were somehow solved with some atomic concurrent algorithm or similar (not sure that's even possible).
So in general, make your storage DB and your queue different services (and your cache too), even if they use the same kind of storage. (Though technically there are clever in-between solutions which run their own queue service but still use your DB for final storage, with a ton of caching, in-memory locking etc. to remove a huge part of the load from the DB.)
You only need to cover three scenarios and it's very simple to implement: record added +1, record removed -1, record moved +1 & -1.
If you have counts that are more complicated, it doesn't work but this solution easily beats semi-frequent COUNT queries.
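A sketch of those three cases as one trigger (hypothetical items/folder_counts schema, one folder_counts row per folder):

```sql
CREATE FUNCTION bump_folder_count() RETURNS trigger AS $$
BEGIN
  IF TG_OP IN ('DELETE', 'UPDATE') THEN   -- removed, or moved out
    UPDATE folder_counts SET n = n - 1 WHERE folder_id = OLD.folder_id;
  END IF;
  IF TG_OP IN ('INSERT', 'UPDATE') THEN   -- added, or moved in
    UPDATE folder_counts SET n = n + 1 WHERE folder_id = NEW.folder_id;
  END IF;
  RETURN NULL;  -- AFTER trigger: return value ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_count
AFTER INSERT OR DELETE OR UPDATE OF folder_id ON items
FOR EACH ROW EXECUTE FUNCTION bump_folder_count();
```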
Really, FKs are typically implemented internally by RDBMSes as TRIGGERs that do what you expect FKs to do, so in a sense they are nothing more than syntactic sugar.
IMO LISTEN/NOTIFY is badly designed as an interface to begin with because there is no way to enforce access controls (who can notify; who can listen) nor is there any way to enforce payload content type (e.g., JSON). It's very unlike SQL to not have a `CREATE CHANNEL` and `GRANT` commands for dealing with authorization to listen/notify.
If you have authz then the lack of payload content type constraints becomes more tolerable, but if you add a `CREATE CHANNEL` you might as well add something there regarding payload types, or you might as well just make it so it has to always be JSON.
With a `CREATE CHANNEL` PG could provide:

- authz for listen
- authz for notify
- payload content type constraints (maybe always JSON if you CREATE the channel)
- select different serialization semantics (to avoid this horrible, no good, very bad locking behavior)
- backwards-compatibility for listen/notify on non-created channels
The standard workflow for processing something from a queue is to keep track of all the messages you have already processed in the transactional database and simply request the remaining unprocessed messages. Often this is as simple as storing the last successfully processed message ID in the database and updating it in the same transaction that has processed the message. If an error occurs you roll the transaction back, which also rolls back the last message ID. The consumer will automatically re-request the failed message on the next attempt, giving you out of the box idempotency for at least once messaging.
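A sketch of that loop (hypothetical tables; the :params are filled in by the app):

```sql
BEGIN;
-- lock our cursor row so only one instance of this consumer runs at a time
SELECT last_id FROM consumer_cursor WHERE consumer = 'billing' FOR UPDATE;
-- fetch the next unprocessed batch
SELECT * FROM messages WHERE id > :last_id ORDER BY id LIMIT 100;
-- ... process the batch, then advance the cursor ...
UPDATE consumer_cursor SET last_id = :max_processed_id WHERE consumer = 'billing';
COMMIT;  -- on error, ROLLBACK also rewinds the cursor, so the batch is retried
```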
"LISTEN/NOTIFY got us to this level of concurrency; here's how we diagnosed the performance cliff, and here's what we're doing now."
Which is like... cool, you were able to scale pretty far and create a lot of value before you needed to find a new solution.

Or you could have a worker whose only job is to listen to the wal / logical replication stream and then NOTIFY. Being the only one to do so would not burden other transactions.
Or you could have a worker whose only job is to listen to the wal / logical replication stream and then publish on some non-PG pubsub system.
`pg_logical_emit_message()` perpetuates/continues the lack of authz around `NOTIFY`.
I.e., the connection pool API has to be designed with this in mind.
For that matter connection pools also need to be designed with the ability to run code upon connecting to create TEMP schema elements because PG lacks GLOBAL TEMP.
> When a NOTIFY query is issued during a transaction, it acquires a global lock on the entire database (ref) during the commit phase of the transaction, effectively serializing all commits.
It only serializes commits where NOTIFY was issued as part of the transaction, right? Transactions which did not call NOTIFY should not be affected?
Plus, for queues, it's so much easier to leverage database constraints and transactions to implement global concurrency limit, rate limit, and deduplication.
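E.g., deduplication can be as small as a unique key (hypothetical schema):

```sql
-- A duplicate enqueue of the same logical job becomes a no-op:
INSERT INTO jobs (dedup_key, payload)
VALUES ('reindex:order:1234', '{"order_id": 1234}')
ON CONFLICT (dedup_key) DO NOTHING;
```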
So pgflow is really agnostic, and Postgres is its fundamental dependency. All components are modular and ready to be adapted to other runtimes.
It's just that Supabase is what I use and I figured out this will be my first platform, but the abstraction to port to others is there!
Anyway, the article indicates that the fix was very simple and primarily in the application layer. Makes me wonder if someone was getting "creative" when they used LISTEN/NOTIFY.
I think the sentiment is to use it "for everything" in 99% of business cases, which involve a few hundred GB of data and some thousands of QPS, and can be handled by PG very well.
Doing the extra work in stored procedures is noticeably faster than relying on triggers.
No big tech companies or unicorn type startups are using Oracle. Is your claim that they are all wrong?
>> Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business
This is why I suggest starting with some kind of normal queue / stream mechanism and columnar DB if needed. It isn't even harder than using one DB, particularly if you are using niche features.
It is arguable. Let's say your team knows Postgres well from a relational standpoint. Now they need to do something with messages and require some type of notification/messaging. There is a learning curve here anyway. I'd argue they should spend it on more standard approaches, which are not harder to start with. Of course, if you know that your site/service will only be used by yourself and your grandmother, do whatever you want (just use a text file, or better yet just call her instead).
In this case, you might have enough dead tuples across your heap that you might get a lot of HOT updates. If you are processing in insertion order, you will also probably process in heap order, and you can actually get 0 HOT updates since the other tuples in the page aren't fully dead yet. You could try using a lower fillfactor to avoid this, but that's also bad for performance so it might not help.
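For reference, the fillfactor knob mentioned is a per-table setting (a sketch; it only affects newly written pages):

```sql
-- Keep pages half-empty so updates can land on the same page (HOT),
-- at the cost of a larger table:
ALTER TABLE jobs SET (fillfactor = 50);
```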
That said, I tend to reach for redis/rabbit or kafka relatively early, depending on my specific needs and what's in use. My main historical use of a dbms queue is sending/tracking emails when the email service I was using was having hiccups.
(I thought this was a fun puzzle, so don't take this as advice or as disagreement with your point.)
There is the option to use functions with SECURITY DEFINER to hack around this, but the cleanest way to do it (in the current API) would be to encrypt your messages on the application side using an authenticated system (eg AES-GCM). You can then apply access control to the keys. (Compromised services could still snoop on when adjacent channels were in use, however.)
As of PG16, HOT updates are tolerated against summarizing indexes, such as BRIN.
https://www.postgresql.org/docs/16/storage-hot.html
Besides, you probably don't want "done" jobs in the same table as pending or retriable jobs - as you scale up, you likely want to archive them as it provides various operational advantages, at no cost.
Now if I could specify that I want Order.customer_name to come from a specific other model and be updated automatically the ORM could automatically create a trigger to update that field when the customer table is updated.
Obviously this is a very simplistic example but there are many more, including versioning and soft deletes, that could be incredibly useful. But the key is that the ORM has to generate the code for the triggers and stored procedures. Doing that manually is possible now, but (a) it uses a different language than even regular SQL, which not everyone is familiar with, and (b) there is no type checking for what you are doing. The ORM model definitions are the main source of truth about the shape of your database, so I want to use them as such.
If you read my earlier comment properly, you'll notice the "done" column is there to avoid deleting rows on the hot path and to avoid dead tuples messing up the planner. I agree that a table should not contain done jobs, but then you risk running into the dead tuple problem. Both approaches are a compromise.
I'm amused at how op brags about the huge scale at which they operate, but instead of even considering fixing the issue (both for themselves and for others), they just switched to something else for pubsub.
Really the primary reason not to try stuff like this is (at least for me), feel that I won't paint myself into a corner with Postgres. I can always add a table here or a join there and things will work. If I need columnar, I use ClickHouse and NATS for messaging. I know these well but still gravitate toward Postgres because I feel it can grow in whatever direction is needed. However, it is true, I have thought about trying to just use NATS KV and make all services stateful receiving notifications when things change. It does seem that it could massively simplify some things but expect there could be some sharp edges in the face of unknown requirements. If one could just design for exactly the problem at hand it would be different but it never seems to work out like that.
Maybe throw your colleagues out the window instead if they don't know what they are talking about. I'm not anti/pro SPROC at all, but I am anti anti-reality. People that don't understand the vast differences in latencies between in process and out of process work should not exist in the industry.
Generally customers don't care about religious views. Make understanding the actual machine and associated latencies your religion instead. The reason to write a stored proc or do some processing in the database is entirely about data locality, not to keep the drooling masses from messing things up. A library is fine for that.
Data locality is king. Everything comes down to physical things such as blocks on the SSD, network interconnect, RAM, L3, L2, L1 cache and registers. Are those customer fields in the same page as whatever else you need? If so, most of the work is already done. Yes, you can save some network bandwidth transferring things that aren't needed but does it matter? It might but it might not. The key is to know what matters and reason about things from the perspective of the machines actually doing the work.
W.R.T. unicorn type startups; yes, my argument is that they are all wrong and should be using a different database. There's competitive advantage to be had there.
Updates of the lifecycle properties can also help coordinate multiple pollers so that they never work on the same item, but they can have overlapping query terms so that each poller is capable of picking up a particular item in the absence of others getting there first.
You also need some kind of lease/timeout policy to recognize orphaned items. I.e. claimed in the DB but not making progress. Workers can and should have exception handling and compensating updates to report failures and put items "back in the queue", but worst case this update may be missing. Some process, or even some human operator, needs to eventually compensate on behalf of the AWOL worker.
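A sketch of that compensating sweep (hypothetical schema and timeout):

```sql
-- Put items claimed by AWOL workers back in the queue:
UPDATE tasks
SET status = 'pending', claimed_by = NULL
WHERE status = 'running'
  AND claimed_at < now() - interval '10 minutes';
```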
In my view, you always need this kind of table-scanning logic, even if using something like AMQP for work dispatch. You get in trouble when you fool yourself into imagining "exactly once" semantics actually exists. The message-passing layer could opportunistically scale out the workload, but a relational backstop can make sure that the real system of record is coherent and reflecting the business goals. Sometimes, you can just run this relational layer as the main work scheduler and skip the whole message-passing build-out.
https://www.postgresql.org/message-id/flat/CAM527d_s8coiXDA4...
https://www.postgresql.org/message-id/flat/175222328116.3157...
That means for moderate cases you do not even have to care about this. 99% of PostgreSQL instances out there are not big "scale".
As a sr. engineer, it is your responsibility to decide whether you will build for "scale" from day zero or ignore it, mindful that it will not affect you until a certain point.
RDBMS are an old fogey tool. It takes a really old fogey to suggest storing records at fixed byte intervals directly on the disk - is that your proposed alternative? Or perhaps you grew up in the microservices era and that's already become old fogey.
Multi-master transactional databases are an open area of research, as far as I'm aware, but read-only replication is a solved problem. Therefore your write traffic, including your transaction overhead, has to fit within one server's capacity, while your read traffic can scale horizontally as much as you like.
Don't get me wrong. Correctness is a great default, much easier to reason about.
When trying to make good use of RDBMS transactional semantics, I think an important mental shift is to think of there being multiple async processing domains rather than a single magical transaction space. DB transactions are just communication events, not actual business work. This is how the relational DB can become the message broker.
The agents need to do something akin to 2-phase commit protocols to record their "intent" and their "result" across different business resources. But, for a failure-prone, web style network of agents, I would not expose actual DB 2-phase commit protocols. Instead, the relational model reifies the 2-phase-like state ambiguity of particular business resources as tuples, and the agents communicate important phases of their work process with simpler state update transactions.
It's basically the same pattern as with safe use of AMQP, just replacing one queue primitive with another. Both approaches require delayed acknowledgement patterns, so tasks can be routed to an agent but not removed from the system until after the agent reports the work complete. Either approach has a lost or orphaned task hazard if naively written to dequeue tasks earlier in the work process. An advantage of the RDBMS-based message broker is that you can also use SQL to supervise all the lifecycle state, or even intervene to clean up after agent failures.
In this approach, don't scale up a central RDBMS by disabling all its useful features in a mad dash for speed. Instead, think of the network of async agents (human or machine) and the RDBMS message broker(s) sized for their respective traffic. This agent network and communication workload can often be partitioned to reach scaling goals. E.g. specific business resources might go into different "home" zones with distinct queues and agent pools. Their different lifecycle states do not need to exist under a single, common transaction control.