←back to thread

12 points datastack | 1 comments | | HN request time: 0.204s | source

I recently came up with a backup strategy that seems so simple I assume it must already exist — but I haven’t seen it in any mainstream tools.

The idea is:

The latest backup (timestamped) always contains a full copy of the current source state.

Any previous backups are stored as deltas: files that were deleted or modified compared to the next (newer) version.

There are no version numbers — just timestamps. New versions can be inserted naturally.

Each time you back up:

1. Compare the current source with the latest backup.

2. For files that changed or were deleted: move them into a new delta folder (timestamped).

3. For new/changed files: copy them into the latest snapshot folder (only as needed).

4. Optionally rotate old deltas to keep history manageable.

This means:

The latest backup is always a usable full snapshot (fast restore).

Previous versions can be reconstructed by applying reverse deltas.

If the source is intact, the system self-heals: corrupted backups are replaced on the next run.

Only one full copy is needed, like a versioned rsync mirror.

As time goes by, losing old versions is low-impact.

It's user friendly since the latest backup can be browsed through with regular file explorers.

Example:

Initial backup:

latest/ a.txt # "Hello" b.txt # "World"

Next day, a.txt is changed and b.txt is deleted:

latest/ a.txt # "Hi" backup-2024-06-27T14:00:00/ a.txt # "Hello" b.txt # "World"

The newest version is always in latest/, and previous versions can be reconstructed by applying the deltas in reverse.

I'm curious: has this been done before under another name? Are there edge cases I’m overlooking that make it impractical in real-world tools?

Would love your thoughts.

Show context
rawgabbit ◴[] No.44409907[source]
What happens if in the process of all this read write rewrite, data is corrupted?
replies(1): >>44410779 #
1. datastack ◴[] No.44410779[source]
In this algo nothing is rewritten. A diff between source and latest is made, the changed or deleted files archives to a folder and the latest folder updated with source, like r sync. No more IO than any other backup tool. Versions other than the last one are never touched again