12 points | datastack

I recently came up with a backup strategy that seems so simple I assume it must already exist — but I haven’t seen it in any mainstream tools.

The idea is:

The latest backup (timestamped) always contains a full copy of the current source state.

All previous backups are stored as reverse deltas: the files that were deleted or modified compared to the next (newer) version.

There are no version numbers, just timestamps, so new backups slot naturally into the sequence.

Each time you back up (a rough code sketch follows this list):

1. Compare the current source with the latest backup.

2. For files that changed or were deleted: move their old copies from the latest snapshot into a new, timestamped delta folder.

3. For new or changed files: copy them from the source into the latest snapshot folder (only what actually changed).

4. Optionally rotate old deltas to keep history manageable.
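Here is that sketch in Python, covering steps 1 to 3; whole-file comparison stands in for real change detection, the source/ and latest/ paths are placeholders, and the rotation in step 4 is left out:

    import filecmp
    import shutil
    from datetime import datetime
    from pathlib import Path

    SOURCE = Path("source")   # placeholder source directory
    LATEST = Path("latest")   # always holds the newest full snapshot

    def run_backup():
        # Timestamped delta folder (note: ':' in the name is not allowed on Windows).
        delta = Path(f"backup-{datetime.now():%Y-%m-%dT%H:%M:%S}")
        LATEST.mkdir(exist_ok=True)

        # Steps 1 and 2: anything in the latest snapshot that no longer matches the
        # source (changed or deleted) gets its old copy moved into the delta folder.
        for backed_up in [p for p in LATEST.rglob("*") if p.is_file()]:
            rel = backed_up.relative_to(LATEST)
            src = SOURCE / rel
            if not src.exists() or not filecmp.cmp(src, backed_up, shallow=False):
                old_copy = delta / rel
                old_copy.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(backed_up), str(old_copy))

        # Step 3: copy new or changed files from the source into the latest snapshot
        # (changed files were just moved out above, so they show up as missing here).
        for src in [p for p in SOURCE.rglob("*") if p.is_file()]:
            dest = LATEST / src.relative_to(SOURCE)
            if not dest.exists():
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)

    run_backup()

Rotation (step 4) would then just mean deleting the oldest backup-* folders, since newer snapshots never depend on older deltas.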

This means:

The latest backup is always a usable full snapshot (fast restore).

Previous versions can be reconstructed by applying reverse deltas.

If the source is intact, the system self-heals: corrupted files in the latest snapshot are simply replaced on the next run.

Only one full copy is needed, like a versioned rsync mirror.

As time goes by, losing old versions is low-impact.

It's user-friendly, since the latest backup can be browsed with a regular file explorer.

Example:

Initial backup:

    latest/
      a.txt   # "Hello"
      b.txt   # "World"

Next day, a.txt is changed and b.txt is deleted:

    latest/
      a.txt   # "Hi"
    backup-2024-06-27T14:00:00/
      a.txt   # "Hello"
      b.txt   # "World"

The newest version is always in latest/, and previous versions can be reconstructed by applying the deltas in reverse.
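To make "applying the deltas in reverse" concrete, here is a minimal restore sketch (the side-by-side folder layout and the restored/ output path are assumptions):

    import shutil
    from pathlib import Path

    def restore(target_ts: str, out: Path, root: Path = Path(".")):
        # Reconstruct the tree as it was just before the backup run at target_ts.
        shutil.copytree(root / "latest", out)   # start from the newest full snapshot

        # Walk the deltas newest-first, back to and including the target timestamp;
        # older copies overwrite newer ones, so each file ends at its old version.
        for delta in sorted(root.glob("backup-*"), reverse=True):
            if delta.name < f"backup-{target_ts}":
                break   # everything older than the target is not needed
            for old in [p for p in delta.rglob("*") if p.is_file()]:
                dest = out / old.relative_to(delta)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(old, dest)

    restore("2024-06-27T14:00:00", Path("restored"))

(Note that the sketch only ever adds files back: a file created after the target time is still present in the output, because the deltas only record what was modified or deleted.)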

I'm curious: has this been done before under another name? Are there edge cases I’m overlooking that make it impractical in real-world tools?

Would love your thoughts.

vrighter:
I used to work on backup software. Our first version did exactly that. It was a selling point. We later switched to a deduplication-based approach.
datastack (OP):
Exciting!

Yes, the deduplicated approach is superior if you can accept needing dedicated software to read the data, or can rely on a file system that supports it (like hard links on Unix).
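For illustration, the hard-link flavour of that (each snapshot looks like a full copy, but unchanged files share storage with the previous snapshot) might look roughly like this; the paths and the comparison strategy are assumptions, not how any particular product does it:

    import filecmp
    import os
    import shutil
    from pathlib import Path

    def snapshot_with_hardlinks(source: Path, prev_snap: Path, new_snap: Path):
        # Every snapshot is browsable as a full tree, but unchanged files are hard
        # links into the previous snapshot, so their data exists on disk only once.
        for src in [p for p in source.rglob("*") if p.is_file()]:
            rel = src.relative_to(source)
            dest = new_snap / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            prev = prev_snap / rel
            if prev.exists() and filecmp.cmp(src, prev, shallow=False):
                os.link(prev, dest)        # unchanged: share storage via a hard link
            else:
                shutil.copy2(src, dest)    # new or changed: store a fresh copy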

I'm looking for a cross-platform solution that is simple and can restore files without any app (in case I don't maintain my app for the next twenty years).

I'm curious whether the software you worked on used a proprietary format, relied on Linux, or used some other method of deduplication.

vrighter:
The deduplication in the product I worked on was implemented by me and a colleague, in a custom format. The point of it was to do inline deduplication on a best-effort basis, i.e. to handle the case where the system does NOT have enough memory to store hashes for every single block. That might have resulted in some duplicated data if you didn't have enough memory, instead of slowing to a crawl by hitting the disk (spinning rust, at the time) for every block we wanted to deduplicate.
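Sketched very roughly, the best-effort idea could look like this (an illustration of the concept only, not the actual product code or on-disk format; the table bound and the in-memory store are placeholders):

    import hashlib
    from collections import OrderedDict

    MAX_TABLE_ENTRIES = 1_000_000   # bounded by whatever RAM is actually available

    class BestEffortDedup:
        def __init__(self):
            self.table = OrderedDict()   # block hash -> offset of an already-stored block
            self.store = []              # stand-in for the real backing store

        def write_block(self, block: bytes) -> int:
            digest = hashlib.sha256(block).digest()
            if digest in self.table:
                return self.table[digest]       # known duplicate: reuse the existing block
            offset = len(self.store)
            self.store.append(block)            # store it, even if it duplicates a block
                                                # whose hash we have already forgotten
            if len(self.table) >= MAX_TABLE_ENTRIES:
                self.table.popitem(last=False)  # evict the oldest hash instead of
                                                # spilling the table to disk
            self.table[digest] = offset
            return offset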