
366 points virtualwhys | 3 comments
jongjong ◴[] No.41898538[source]
For most cases, MVCC sounds like over-engineering. From the problem description:

> The goal of MVCC in a DBMS is to allow multiple queries to read and write to the database simultaneously without interfering with each other when possible.

How is that a problem for most use cases?

If there is a read query which is taking a long time, over many rows, and some of the later rows happen to be updated mid-read while the earlier rows are not... It's not really a problem for the vast majority of applications. Why is it better for all rows to be delivered out of date than for just the first half fetched to be out of date? It's not ideal in either case, but it's unavoidable that some requests will sometimes return out-of-date data. It seems like a tiny advantage.
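For what it's worth, the anomaly being traded away here is usually called read skew, and it can produce a state that never existed, not merely a stale one. A tiny illustrative sketch (plain Python, not real database internals; the "transfer" and row names are made up) of a reader scanning two rows while a concurrent write commits between them:

```python
# Illustrative sketch only: two "rows" with the invariant a + b == 100.
rows = {"a": 100, "b": 0}

# MVCC-style reader: pins a consistent snapshot up front. A real DBMS
# does this with row versioning, not a full copy.
snapshot = dict(rows)

# Row-at-a-time reader reads row "a" first...
seen_a = rows["a"]  # sees 100

# ...then a concurrent transfer commits mid-read (preserves a + b == 100)...
rows["a"] -= 40
rows["b"] += 40

# ...then the reader reads row "b".
seen_b = rows["b"]  # sees 40

# The row-at-a-time reader observed a state that never existed:
print(seen_a + seen_b)                # 140 -- invariant appears broken
# The snapshot reader sees a consistent, merely older, state:
print(snapshot["a"] + snapshot["b"])  # 100
```

So the difference is not "all stale" versus "half stale": the non-snapshot read can violate invariants that held at every single point in time.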

I suspect the real need to implement MVCC arose out of the desire for databases like Postgres to implement atomic transactions as a magical black box.

IMO, two-phase commit is a simpler solution to this problem. It's not possible to fully hide concurrency complexity from the user; it ends up with tradeoffs.

replies(1): >>41898609 #
kccqzy ◴[] No.41898609[source]
One person's over engineering is another person's essential feature. I personally like the fact that Postgres supports the serializable isolation level that simplifies application programming.

> It's not really a problem for the vast majority of applications.

This is true, but I don't even want to have to think about when it is indeed not really a problem versus the few cases when it is.

replies(1): >>41898746 #
1. magicalhippo ◴[] No.41898746[source]
> I personally like the fact that Postgres supports the serializable isolation level that simplifies application programming.

Not sure how PG implements it, but I tried it in SQLAnywhere in a case where I did need it, and only found out a bit too late how bad it was: the docs stated it was very detrimental to performance, but they didn't explicitly say why, and it was much worse than I had assumed.

I assumed it meant the transaction would lock the table, do its thing, and release on commit/rollback. And of course, that would hurt performance a lot if there was high contention. But no, that's not what it did. It was much, much worse.

Instead of taking a lock on the whole table, it locked all the rows. Which went as swimmingly as you could expect on a table with thousands upon thousands of rows.

Not sure why they did it this way, but yeah, I had to ditch that and go with the good old retry loop.
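The retry loop in question is usually a wrapper that reruns the whole transaction body when the database reports a serialization failure. A hedged sketch in Python, where `SerializationFailure` is a stand-in for whatever the driver raises (in Postgres this corresponds to SQLSTATE 40001; the exact exception type depends on your driver):

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error
    (e.g. SQLSTATE 40001 in Postgres)."""


def run_with_retry(txn, max_attempts=5):
    # Rerun the whole transaction body on serialization failures.
    # Any other exception propagates immediately.
    for attempt in range(max_attempts):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # A real loop would typically back off (sleep) here.


# Hypothetical transaction body that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure()
    return "ok"

print(run_with_retry(flaky_txn))  # ok
```

The important design point is that the whole transaction is retried from the top, since the aborted attempt's reads may be invalid.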

replies(1): >>41900778 #
2. sitharus ◴[] No.41900778[source]
One of the best things about postgresql is the documentation. They document not only the features, but the constraints and considerations for using it and why they exist.

So from reading https://www.postgresql.org/docs/17/transaction-iso.html#XACT... we can tell that using serializable transactions only locks data actually used.

replies(1): >>41901173 #
3. magicalhippo ◴[] No.41901173[source]
Yeah, that's much, much better. I also note that it goes for a retry instead of locking.