366 points virtualwhys
nightfly No.41897421
> MySQL and Oracle store a compact delta between the new and current versions (think of it like a git diff).

Doesn't git famously _not_ store diffs, instead following the same storage pattern Postgres uses here: storing full copies of both the old and new objects?

jmholla No.41897457
That is correct. Each version of a file is stored as a separate blob. Packing applies some compression to make cloning faster, but the raw format git works with is these blobs.
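To make "each version of a file is a separate blob" concrete, here is a minimal Python sketch of how git derives a blob's ID, assuming a default SHA-1 repository (SHA-256 repositories hash differently). The helper name `git_blob_id` is hypothetical; the point is that the hash covers a short header plus the complete file contents, so two versions of a file yield two independent blobs rather than one blob and a diff.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob (sketch).

    Git hashes the header b"blob <size>\\0" followed by the full, raw file
    contents, so every distinct version of a file gets its own blob.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


v1 = b"hello\n"
v2 = b"hello, world\n"

# Two versions of the same file: two separate blobs, each holding the
# complete text rather than a delta against the other version.
print(git_blob_id(v1))
print(git_blob_id(v2))
```

For the same bytes, the result should match what `git hash-object --stdin` prints.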
thaumasiotes [dead post] No.41897535
[flagged]
ori_b No.41897810
Git does both. When you create a commit, git stores a full (zlib-compressed) copy of each new object, without any deltas.
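A rough Python sketch of what such a loose object looks like on disk, assuming SHA-1 object IDs: the file under `.git/objects/` holds the zlib-compressed header plus the full content, and is named by the hash of that uncompressed data (first two hex digits as the directory). `encode_loose_object` and `decode_loose_object` are illustrative names, not git commands, and the compression level used here may differ from git's default; that changes the compressed bytes, not what they inflate to.

```python
from __future__ import annotations

import hashlib
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Sketch of git's loose-object encoding (SHA-1 repositories assumed).

    Returns the path relative to .git/objects and the bytes stored there:
    a zlib-compressed copy of header + full content, with no deltas.
    """
    raw = b"%s %d\x00" % (obj_type.encode(), len(content)) + content
    oid = hashlib.sha1(raw).hexdigest()
    # The first two hex digits name the fan-out directory.
    path = f"{oid[:2]}/{oid[2:]}"
    return path, zlib.compress(raw)


def decode_loose_object(data: bytes) -> tuple[str, bytes]:
    """Inverse of encode_loose_object: inflate, then split off the header."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content")
    return obj_type, content
```

Because nothing here refers to any other object, reading a loose object is a single inflate, with no chain of deltas to resolve.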

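Periodically, git takes the loose objects and compresses them into a pack. I believe it used to be every thousand commits; current git instead packs automatically once there are roughly more than `gc.auto` loose objects (6700 by default), a check performed by `git gc --auto`.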
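As a hedged sketch of that automatic check: `git gc --auto` estimates the loose-object count by sampling a single fan-out directory (`.git/objects/17`) and comparing it with `gc.auto` / 256, since object IDs are spread evenly across the 256 directories. The Python function below models that check; its name is hypothetical, details may vary between git versions, and it ignores the separate pack-count trigger (`gc.autoPackLimit`).

```python
# Hypothetical model of git's "too many loose objects" estimate (SHA-1 repos).
HEX_DIGITS = set("0123456789abcdef")
LOOSE_NAME_LEN = 38  # 40 hex digits minus the 2 used for the directory name


def too_many_loose_objects(names_in_objects_17, gc_auto=6700):
    """Return True if an automatic gc would likely pack loose objects.

    Object IDs are spread evenly over the 256 fan-out directories, so the
    count in one directory (git samples objects/17) approximates 1/256 of
    the total number of loose objects.
    """
    if gc_auto <= 0:
        return False  # gc.auto = 0 disables automatic packing
    per_dir_limit = -(-gc_auto // 256)  # ceiling division
    count = sum(
        1
        for name in names_in_objects_17
        if len(name) == LOOSE_NAME_LEN and set(name) <= HEX_DIGITS
    )
    return count > per_dir_limit
```

In a real repository the input would be something like `os.listdir(".git/objects/17")`. With the default threshold of 6700, this sketch reports that packing is due once that directory holds 28 or more loose objects.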

The full blob format is how git manipulates objects internally: before git can do anything useful with an object, it has to extract it from the pack and apply all of its deltas.
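To illustrate "with all deltas applied", here is a Python sketch that rebuilds an object from its base and a delta in git's pack delta format as described in git's pack-format documentation: two size headers (little-endian, 7 bits per byte), then a stream of copy-from-base and insert-literal instructions. The function names are hypothetical and error handling is minimal.

```python
from __future__ import annotations


def _read_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a delta size header: 7 bits per byte, high bit = more bytes."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild a full object from its base object and a pack delta (sketch)."""
    src_size, pos = _read_size(delta, 0)
    dst_size, pos = _read_size(delta, pos)
    if src_size != len(base):
        raise ValueError("delta was made against a different base")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy: bits 0-3 flag which offset bytes follow, bits 4-6 size bytes.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of zero means 64 KiB
            out += base[offset:offset + size]
        elif op:
            # Insert: the next `op` bytes (1..127) are literal new data.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != dst_size:
        raise ValueError("result size does not match delta header")
    return bytes(out)
```

For example, `apply_delta(b"hello, world\n", bytes([0x0d, 0x0b, 0x90, 0x07, 0x04]) + b"git\n")` copies the first 7 bytes of the base and appends `git\n`, giving `b"hello, git\n"`.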

It's also worth noting that accessing a deltified object is slow (O(n) in the number of deltas in its chain), so git limits the length of delta chains (`pack.depth`, 50 by default). Because deltification is really just a compression format, it doesn't matter how or where the deltas are computed; the trivial "no deltas" option will work just fine if you want to write your own implementation.
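Both points can be seen in a toy model. This Python sketch uses a deliberately simplified, hypothetical delta encoding (shared-prefix length plus new suffix) rather than git's real one; `max_depth` stands in for git's delta-depth limit, and `max_depth=0` is the "no deltas" option. Reading an object walks back to the nearest full copy and replays each delta, so the work grows with chain length.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Entry:
    """Either a full object (base_id is None) or a delta against base_id."""
    base_id: str | None
    data: bytes    # full content, or the toy delta payload
    depth: int = 0  # number of deltas between this entry and a full copy


def make_delta(base: bytes, target: bytes) -> bytes:
    # Hypothetical toy format: 4-byte shared-prefix length, then new suffix.
    n = 0
    while n < min(len(base), len(target)) and base[n] == target[n]:
        n += 1
    return n.to_bytes(4, "big") + target[n:]


def apply_toy_delta(base: bytes, delta: bytes) -> bytes:
    n = int.from_bytes(delta[:4], "big")
    return base[:n] + delta[4:]


class ToyStore:
    def __init__(self, max_depth: int = 50):
        self.max_depth = max_depth  # 0 means every object is stored in full
        self.entries: dict[str, Entry] = {}

    def put(self, oid: str, content: bytes, base_id: str | None = None) -> None:
        base = self.entries.get(base_id) if base_id else None
        if base is not None and base.depth < self.max_depth:
            delta = make_delta(self.get(base_id), content)
            self.entries[oid] = Entry(base_id, delta, base.depth + 1)
        else:
            # No usable base, or the chain is already at the limit: full copy.
            self.entries[oid] = Entry(None, content)

    def get(self, oid: str) -> bytes:
        # Walk back to the nearest full copy, then replay deltas: O(depth).
        chain = []
        entry = self.entries[oid]
        while entry.base_id is not None:
            chain.append(entry.data)
            entry = self.entries[entry.base_id]
        content = entry.data
        for delta in reversed(chain):
            content = apply_toy_delta(content, delta)
        return content
```

Capping the depth trades a little extra storage (an occasional full copy) for a bounded cost on every read.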

You can trivially verify this by creating commits and looking in '.git/objects/*' for loose objects, running 'git repack', and then looking in '.git/objects/pack' for the deltified packs.
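For that check, a small Python helper (hypothetical name `summarize_objects`) can list what is in `.git/objects` before and after repacking: loose objects sit in two-hex-digit fan-out directories, while packs are `*.pack` files under `objects/pack`. It only inspects the directory layout and does not parse pack contents.

```python
from __future__ import annotations

from pathlib import Path

HEX = set("0123456789abcdef")


def summarize_objects(git_dir: str | Path) -> dict[str, list[str]]:
    """List loose object IDs and pack file names under <git_dir>/objects."""
    objects = Path(git_dir) / "objects"
    loose = []
    for fanout in sorted(objects.iterdir()):
        # Loose objects live in two-hex-digit fan-out directories.
        if fanout.is_dir() and len(fanout.name) == 2 and set(fanout.name) <= HEX:
            for obj in sorted(fanout.iterdir()):
                loose.append(fanout.name + obj.name)
    pack_dir = objects / "pack"
    packs = sorted(p.name for p in pack_dir.glob("*.pack")) if pack_dir.is_dir() else []
    return {"loose": loose, "packs": packs}
```

Calling `summarize_objects(".git")` in a scratch repository before and after `git repack` should show new `*.pack` entries; with `git repack -d`, the loose objects that were packed are also removed.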