Show HN: Open source, logical multi-master PostgreSQL replication

(github.com)

131 points pgedge_postgres | 5 comments | 09 Oct 25 22:53 UTC | HN request time: 0.851s | source

Show context

sgarland ◴[10 Oct 25 00:33 UTC] No.45534417[source]▶

>>45533870 (OP) #

You do not want multi-master. If you think you do, think again.

Source: I have operated a large multi-master Postgres cluster.

replies(4): >>45534725 #>>45534761 #>>45535023 #>>45535213 #

1. pgedge_postgres ◴[10 Oct 25 03:48 UTC] No.45535213[source]▶

>>45534417 #

There's a lot of ways to approach the common problems found when running multi-master / active-active PostgreSQL. (A complete guide on this, specifically using PostgreSQL in general, was written by one of our solutions engineers, Shaun Thomas: https://www.pgedge.com/blog/living-on-the-edge)

Could you elaborate on what problems you experienced?

replies(1): >>45538499 #

2. sgarland ◴[10 Oct 25 12:59 UTC] No.45538499[source]▶

>>45535213 (TP) #

To clarify, I was working with 2nd Quadrant BDR (now Enterprise DB Postgres Distributed), running on some extremely large EC2 instances, in a global mesh - IIRC, five regions. Also in fairness, EDB told us that we were the largest mesh they had seen, and so we frequently ran into fun edge cases.

Each node had N replicas running vanilla Postgres attached, which were on EC2s with node-local NVMe drives for higher performance. This was absolutely necessary for the application. There were also a smattering of Aurora Postgres instances attached, which the data folk used for analytics.

In no particular order:

* DDL is a nightmare. BDR by default will replicate DDL statements across the mesh, but the locking characteristics combined with the latency between `ap-southeast-2` and `us-east-1` (for example) meant that we couldn't use it; thus, we had to execute it separately on each node. Also, since the attached Aurora instances were blissfully unaware of anything but themselves, for any table-level operations (e.g. adding a column), we had to execute it on those first, lest we start building up WAL at an uncomfortable pace due to replication errors.

* You know how it's common to run without FK constraints, because "scalability," etc.? Imagine the fun of having devs manage referential integrity combined with eventual consistency across a global mesh.

* Things like maximum network throughput start to become concerns. Tbf, this is more due to modern development's tendency to use JSON everywhere, and to have heavily denormalized tables, but it's magnified by the need to have those changes replicated globally.

* Hiring is _hard_. I can already hear people saying, "well, you were running on bare EC2s," and sure, that requires Linux administration knowledge as a baseline - I promise you, that's a benefit. To effectively manage a multi-master RDBMS cluster, you need to know how to expertly administrate and troubleshoot the RDBMS itself, and to fully understand the implications and effects of some of those settings, you need to have a good handle on Linux. You're also almost certainly going to be doing some kernel parameter tuning. Plus, in the modern tech world, infra is declared in IaC, so you need to understand Terraform, etc. You're probably going to be writing various scripts, so you need to know shell and Python.

There were probably more, but those are the main ones that come to mind.

replies(2): >>45539029 #>>45540532 #

3. asah ◴[10 Oct 25 13:49 UTC] No.45539029[source]▶

>>45538499 #

"DDL is a nightmare"

Can I ask more about this? I assume you created a procedure around making DDL changes to the global cluster... what was that procedure like? what tools did you use (create) to automate/script this? what failure modes did it encounter?

replies(1): >>45539707 #

4. sgarland ◴[10 Oct 25 14:48 UTC] No.45539707{3}[source]▶

>>45539029 #

Bold of you to assume it was automated. The process I used was tmux with pane synchronization.

I asked to automate it (probably would've just been a shell script, _maybe_ Python, issuing SQL commands to stdin), but people were afraid of unknown unknowns.

5. bonesmoses ◴[10 Oct 25 16:08 UTC] No.45540532[source]▶

>>45538499 #

I don't recall which customer you may have been, but the standard solution to that specific DDL issue with BDR is to use Stream Triggers to enable row versioning. One of the 2ndQuadrant customers used it extensively for multi-region cross-version app schema migrations that could last for months.

Essentially what that boils down to is you create stream triggers that intercept the logical stream and modify it to fit the column orientation by version. So during the transition, the triggers would be deployed to specific nodes while modifications are rolled out. Once everything was on the new version, triggers were all dropped until the next migration.

Spock doesn't have anything like that _yet_, but as you observed, being unable to use DDL replication significantly increases complexity, and tmux is a poor substitute.

↑