MVCC – the part of PostgreSQL we hate the most (2023)

(www.cs.cmu.edu)

nightfly ◴[20 Oct 24 18:26 UTC] No.41897421[source]▶

> MySQL and Oracle store a compact delta between the new and current versions (think of it like a git diff).

Doesn't git famously _not_ store diffs and instead follows the same storage pattern postgres uses here and stores the full new and old objects?

replies(6): >>41897457 #>>41897486 #>>41897759 #>>41897885 #>>41899164 #>>41899189 #

jmholla ◴[20 Oct 24 18:31 UTC] No.41897457[source]▶

>>41897421 #

That is correct. Each version of a file is a separate blob. There is some compression done by packing to make cloning faster, but the raw form git works with is these blobs.

replies(2): >>41897535 #>>41898446 #

1. simonw ◴[20 Oct 24 19:15 UTC] No.41897771[source]▶

>>41897535 (TP) #

Saying "that's incorrect" is a lot more productive than saying "that's a lie".

Calling something a lie implies that the incorrect information was deliberate.

2. ori_b ◴[20 Oct 24 19:21 UTC] No.41897810[source]▶

>>41897535 (TP) #

Git does both. When you create a commit, it stores a full (zipped) copy of the object, without any deltas.
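A minimal sketch of what "a full (zipped) copy" means for a blob, assuming a SHA-1 repository. The function name `loose_object` is made up for illustration; only the header-plus-content layout and the zlib compression reflect git's loose object format.

```python
import hashlib
import zlib
from typing import Tuple


def loose_object(content: bytes) -> Tuple[str, bytes]:
    """Return (object id, compressed bytes) for a blob stored as a loose object."""
    # Header is the object type, a space, the decimal length, then a NUL byte.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()  # assumption: SHA-1 object format
    return object_id, zlib.compress(store)


oid, compressed = loose_object(b"")
print(oid)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty blob id

# The full content comes back from this one object; no other version is needed.
assert zlib.decompress(compressed) == b"blob 0\0"
```

The id depends only on the content, which is why two commits with the same file contents share one blob.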

Periodically (I believe it used to be every thousand commits, though I'm not sure what the heuristic is today), git will take the loose objects and compress them into a pack.

The full blob format is how git manipulates objects internally: an object has to be extracted from its pack, with all deltas applied, before git can do anything useful with it.

It's also worth noting that accessing a deltified object is slow (O(n) in the number of deltas), so the length of the delta chain is limited. Because deltification is really just a compression format, it doesn't matter how or where the deltas are done -- the trivial "no deltas" option will work just fine if you want to implement that.
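To illustrate the O(n) point, here is a toy delta chain. The instruction format (tuples of "copy" and "insert") is a simplification invented for this sketch, not git's binary pack encoding, and `apply_delta` and `resolve` are hypothetical names.

```python
from typing import List, Tuple, Union

# A delta is a list of instructions against a base:
# ("copy", offset, length) reuses bytes from the base,
# ("insert", data) adds new bytes.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def apply_delta(base: bytes, delta: List[Op]) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def resolve(full_base: bytes, chain: List[List[Op]]) -> Tuple[bytes, int]:
    """Rebuild the newest version; returns (content, number of deltas applied)."""
    content = full_base
    for delta in chain:  # one pass per link, so cost grows with chain length
        content = apply_delta(content, delta)
    return content, len(chain)


v1 = b"hello world\n"
chain = [
    [("copy", 0, 6), ("insert", b"git\n")],     # v2 = b"hello git\n"
    [("copy", 0, 9), ("insert", b" packs\n")],  # v3 = b"hello git packs\n"
]
content, steps = resolve(v1, chain)
print(content, steps)
```

An empty chain is the "no deltas" case: `resolve(v1, [])` just returns the full object, which is why the logical result never depends on whether deltas were used.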

You can trivially verify this by creating commits and looking in '.git/objects/*' for loose objects, running 'git repack', and then looking in '.git/objects/pack' for the deltified packs.
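If you'd rather count than eyeball the directories, something like this works as a rough check. It assumes the standard on-disk layout (two-hex-digit fan-out directories for loose objects, `*.pack` files under `objects/pack`); `count_objects` is a name made up for this sketch.

```python
import os
import re
from typing import Tuple


def count_objects(git_dir: str) -> Tuple[int, int]:
    """Return (loose object count, pack file count) for a .git directory."""
    objects = os.path.join(git_dir, "objects")
    if not os.path.isdir(objects):
        return 0, 0
    loose = 0
    for name in os.listdir(objects):
        path = os.path.join(objects, name)
        # Loose objects live in fan-out directories named by two hex digits.
        if re.fullmatch(r"[0-9a-f]{2}", name) and os.path.isdir(path):
            loose += len(os.listdir(path))
    pack_dir = os.path.join(objects, "pack")
    packs = 0
    if os.path.isdir(pack_dir):
        packs = sum(1 for f in os.listdir(pack_dir) if f.endswith(".pack"))
    return loose, packs


# Run from the top of a repository, before and after repacking.
print(count_objects(".git"))
```

Run it after a few commits, repack, and run it again; the pack count should go up. The loose count only drops if the repack also prunes the now-redundant loose objects (e.g. 'git repack -d').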

3. haradion ◴[20 Oct 24 19:32 UTC] No.41897887[source]▶

>>41897535 (TP) #

The file contents are logically distinct blobs. Packfiles will aggregate and delta-compress similar blobs, but that's all at a lower level than the logical model.

replies(1): >>41902053 #

4. arp242 ◴[20 Oct 24 22:21 UTC] No.41898972[source]▶

>>41897535 (TP) #

Sjeez, tone it down. People can be incorrect without lying.

5. thaumasiotes ◴[21 Oct 24 08:57 UTC] No.41902053[source]▶

>>41897887 #

Is that relevant to something? The logical model is identical for every source control system. Deltas are a form of compression for storage in every source control system.

replies(1): >>41904694 #

6. haradion ◴[21 Oct 24 14:41 UTC] No.41904694{3}[source]▶

>>41902053 #

> The logical model is identical for every source control system.

Most source control systems have some common logical concepts (e.g. files and directories), but there's actually significant divergence between their logical models. For instance:

- Classic Perforce (as opposed to Perforce Streams) has a branching model that's very different from Git's; "branches" are basically just directories, and branching/merging is tracked on a per-file basis rather than a per-commit basis. It also tracks revisions by an incrementing ID rather than hashes.
- Darcs and Pijul represent the history of a file as an unordered set of patches; a "branch" is basically just a set of patches to apply to the file's initial (empty) state.

All of that is above the physical state, which also differs:

- Perforce servers track files' revision histories in a directory hierarchy that mirrors the repository's file structure rather than building a pseudo-directory hierarchy over a flat object store.
- Fossil stores everything in an SQLite database.

> Is that relevant to something?

Yes. You can use a VCS reasonably effectively if you understand its logical model but not its physical storage model. It doesn't work so well the other way around.
