MVCC – the part of PostgreSQL we hate the most (2023)

(www.cs.cmu.edu)

382 points virtualwhys | 4 comments | 20 Oct 24 15:30 UTC

dfox ◴[20 Oct 24 20:47 UTC] No.41898326[source]▶

> Oracle and MySQL do not have this problem in their MVCC implementation because their secondary indexes do not store the physical addresses of new versions. Instead, they store a logical identifier (e.g., tuple id, primary key) that the DBMS then uses to look up the current version’s physical address. Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.

An interesting behavior of MySQL that I have observed (~500GB database, with a schema that is more document-oriented than relational) is that when you update a single row, doing SELECT id WHERE something; UPDATE what WHERE id=id is orders of magnitude faster than UPDATE what WHERE something. I suspect that the indirection described in the quote is the reason for this behavior. But the normal workload will not do that, and this only slows down ad-hoc DML when you fix some inconsistency.
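For readers who want to see the two shapes side by side, here is a minimal sketch. It uses SQLite only because it ships with Python and needs no server; the `docs` table, its columns, and the function names are hypothetical, and the example shows only that both forms change the same rows on a static table. It does not reproduce the MySQL performance difference described above.

```python
import sqlite3


def make_db():
    # Hypothetical table standing in for the "document oriented" schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, kind TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO docs (id, kind, status) VALUES (?, ?, ?)",
        [(1, "invoice", "new"), (2, "receipt", "new"), (3, "invoice", "new")],
    )
    conn.commit()
    return conn


def update_one_step(conn, kind, status):
    # "UPDATE what WHERE something": filter and write in a single statement.
    cur = conn.execute("UPDATE docs SET status = ? WHERE kind = ?", (status, kind))
    conn.commit()
    return cur.rowcount


def update_two_step(conn, kind, status):
    # "SELECT id WHERE something; UPDATE what WHERE id=id":
    # resolve the primary keys first, then update each row by key.
    ids = [row[0] for row in conn.execute("SELECT id FROM docs WHERE kind = ?", (kind,))]
    conn.executemany("UPDATE docs SET status = ? WHERE id = ?", [(status, i) for i in ids])
    conn.commit()
    return len(ids)
```

Note that the two-step form is only equivalent when nothing else modifies the table between the SELECT and the UPDATE.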

replies(2): >>41898716 #>>41902118 #

whazor ◴[20 Oct 24 21:40 UTC] No.41898716[source]▶

>>41898326 #

A SELECT is a read-only operation and can be performed in parallel. However, an UPDATE actually writes and might lock the table, whereas UPDATE ... WHERE id=id allows for row-level locking. There is also the risk of missing newly inserted records between the SELECT and the UPDATE.

replies(1): >>41898733 #

1. nine_k ◴[20 Oct 24 21:43 UTC] No.41898733[source]▶

>>41898716 #

SELECT FOR UPDATE was invented to address this.
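A minimal sketch of that pattern, under some assumptions: `conn` is a DB-API 2.0 connection whose driver uses `%s` placeholders and opens a transaction implicitly on the first statement (psycopg2 and PyMySQL both behave this way by default), and the `items` table with its `id`, `kind`, and `status` columns is hypothetical. The rows returned by the SELECT stay locked until the transaction commits or rolls back, so the follow-up UPDATE by id cannot race with another writer on those rows.

```python
def lock_then_update(conn, kind, new_status):
    """Lock matching rows with SELECT ... FOR UPDATE, then update them by id.

    Assumes a DB-API 2.0 connection with autocommit off and the `%s`
    parameter style. Table and column names are hypothetical.
    """
    cur = conn.cursor()
    try:
        # Take row locks on everything that matches the filter.
        cur.execute("SELECT id FROM items WHERE kind = %s FOR UPDATE", (kind,))
        ids = [row[0] for row in cur.fetchall()]

        # Update each locked row by primary key.
        for row_id in ids:
            cur.execute(
                "UPDATE items SET status = %s WHERE id = %s",
                (new_status, row_id),
            )

        # Committing releases the row locks.
        conn.commit()
        return ids
    except Exception:
        # Rolling back also releases the locks and discards partial updates.
        conn.rollback()
        raise
    finally:
        cur.close()
```

Whether this also blocks concurrent inserts of new matching rows depends on the engine and isolation level, so it is worth checking against the specific database in use.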

replies(2): >>41898851 #>>41905546 #

2. fipar ◴[20 Oct 24 22:03 UTC] No.41898851[source]▶

>>41898733 (TP) #

Or just select + update in a transaction, which, IIRC, with the default isolation level will use optimistic locking for the select part, unlike select for update.

replies(1): >>41910708 #

3. ◴[21 Oct 24 16:03 UTC] No.41905546[source]▶

>>41898733 (TP) #

4. nuttingd ◴[22 Oct 24 02:45 UTC] No.41910708[source]▶

>>41898851 #

You would need to use serializable isolation for this to hold true. Any isolation level less than serializable will use the snapshot that was active at the time of the select.

In Postgres, even with the serializable isolation level, all transactions that touch the same rows must also be using the serializable isolation level or it's not really enforced. This is one aspect of serializable isolation in Postgres that seemed like a major gotcha for real-world application development. There's no future-proof solution: new code can be added that doesn't use serializable isolation, and then the isolation assumptions of the earlier code are no longer valid.

FOR UPDATE is the only real solution in my eyes.
