
382 points virtualwhys | 3 comments
dfox ◴[] No.41898326[source]
> Oracle and MySQL do not have this problem in their MVCC implementation because their secondary indexes do not store the physical addresses of new versions. Instead, they store a logical identifier (e.g., tuple id, primary key) that the DBMS then uses to look up the current version’s physical address. Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.

An interesting MySQL behavior I have observed (~500GB database, with a schema that is more document-oriented than relational): when updating a single row, running SELECT id WHERE something; followed by UPDATE what WHERE id=id is orders of magnitude faster than a single UPDATE what WHERE something. I suspect the indexing design quoted above is the reason for this behavior. That said, the normal workload does not do this, so it only slows down ad-hoc DML when fixing some inconsistency.
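
For illustration, the two-step pattern can be sketched as follows. This is a minimal sketch, not the actual workload: it uses Python's built-in sqlite3 module so it runs anywhere, and the table `docs`, its columns, and the helper `update_via_primary_key` are all hypothetical names. It only shows the shape of the pattern and does not reproduce the MySQL performance difference.

```python
import sqlite3


def update_via_primary_key(conn, new_status, tag):
    """Resolve the matching ids first, then update each row by primary key.

    `docs`, `tag`, and `status` are hypothetical names for this sketch.
    """
    # Step 1: SELECT id WHERE something
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM docs WHERE tag = ? ORDER BY id", (tag,)
        )
    ]
    # Step 2: UPDATE what WHERE id = <id>, one statement per resolved key,
    # all committed together by the `with conn` block.
    with conn:
        conn.executemany(
            "UPDATE docs SET status = ? WHERE id = ?",
            [(new_status, row_id) for row_id in ids],
        )
    return ids


# Small in-memory table to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, tag TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO docs (tag, status) VALUES (?, ?)",
    [("a", "old"), ("b", "old"), ("a", "old")],
)
conn.commit()

updated_ids = update_via_primary_key(conn, "fixed", "a")
statuses = [row[0] for row in conn.execute("SELECT status FROM docs ORDER BY id")]
```

Whether this beats a single UPDATE with the original predicate depends on the engine and the query plan, so it is worth measuring on the real database before relying on it.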

replies(2): >>41898716 #>>41902118 #
1. fforflo ◴[] No.41902118[source]
I have a couple of read-heavy >2TB Postgres instances, document-oriented too. You're right that bulk updates can be too slow. Too often I end up doing the updates incrementally (in batches) or even using COPY.
replies(1): >>41903043 #
2. andruby ◴[] No.41903043[source]
You also want to keep transactions short to avoid lock contention. Outside an explicit transaction block, every statement runs as its own transaction, so chunking a large update into smaller statements helps a lot on busy databases.
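
A rough sketch of the chunking idea, assuming a hypothetical `events` table with a `processed` flag. It uses Python's built-in sqlite3 module only so that it runs without a server; the helper name `update_in_batches` and the batch size are illustrative, and on Postgres or MySQL the same loop would go through that database's own driver.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Update pending rows in small chunks, committing after each chunk.

    Each loop iteration is its own short transaction, so locks are held
    only for the duration of one batch. `events` and `processed` are
    hypothetical names for this sketch.
    """
    total = 0
    while True:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE events SET processed = 1 WHERE id IN ("
                "  SELECT id FROM events WHERE processed = 0"
                "  ORDER BY id LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:  # nothing left to update
            break
        total += cur.rowcount
    return total


# Small in-memory table to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, processed INTEGER)")
conn.executemany("INSERT INTO events (processed) VALUES (?)", [(0,)] * 2500)
conn.commit()

total_updated = update_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM events WHERE processed = 0"
).fetchone()[0]
```

The trade-off is that the overall change is no longer atomic: if the loop stops partway, some rows are updated and some are not, so the update should be safe to re-run.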
replies(1): >>41904988 #
3. atombender ◴[] No.41904988[source]
Avoiding long transactions is also about preventing the transaction from holding back vacuuming. Postgres will not vacuum dead tuples that are still visible to old transactions (the oldest snapshot a backend holds shows up as backend_xmin in the pg_stat_activity view).

Long transactions can also cause surprising lock waits, because many locks persist until the end of the transaction, even if the transaction is no longer doing anything. This can block DDL operations as well as things like REINDEX.