stapedium | No.43717547
I’m just a small business & homelab guy, so I’ll probably never use one of these big distributed file systems. But when people start talking petabytes, I always wonder if these things are actually backed up and what you use for backup and recovery?
ted_dunning | No.43720813
It is common for the backup of these systems to be a secondary data center.

Remember that there are two purposes for backup: one is hardware failures, the other is fat fingers. Hardware failures are dealt with by redundancy, which always means keeping redundant information across multiple failure domains. Those domains can be as small as a cache line or as big as a data center, and these failures can be handled transparently and automagically by modern file systems.
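As a rough sketch of the placement idea (the node names, domain levels, and helper below are invented for illustration, not any particular file system's API), the point is just that no two copies of a piece of data land in the same failure domain at whatever level you care about:

    # Hypothetical cluster map: node -> its failure domain at each level.
    NODES = {
        "node-a1": {"rack": "r1", "datacenter": "dc1"},
        "node-a2": {"rack": "r1", "datacenter": "dc1"},
        "node-b1": {"rack": "r2", "datacenter": "dc1"},
        "node-c1": {"rack": "r3", "datacenter": "dc2"},
    }

    def place_replicas(n_replicas, level="rack"):
        """Pick n_replicas nodes such that no two share a failure domain at `level`."""
        chosen, used = [], set()
        for node, domains in NODES.items():
            if domains[level] not in used:
                chosen.append(node)
                used.add(domains[level])
            if len(chosen) == n_replicas:
                return chosen
        raise RuntimeError("not enough distinct failure domains at this level")

    print(place_replicas(3, level="rack"))        # e.g. ['node-a1', 'node-b1', 'node-c1']
    print(place_replicas(2, level="datacenter"))  # e.g. ['node-a1', 'node-c1']

Spreading across data centers is the same rule as spreading across racks, just with a bigger domain, which is why the secondary data center ends up being the backup.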

With fat fingers, the failure domain has no natural boundary other than time. As such, snapshots kept in the file system are the best choice, especially if you have a copy-on-write file system that can keep snapshots with very little overhead.
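A toy illustration of why copy-on-write makes this cheap: taking a snapshot just records a reference to the current immutable block map, and later writes create a new map instead of overwriting the shared one. Everything here (class and field names) is made up for illustration:

    class CowStore:
        """Toy copy-on-write store: a snapshot is a reference to the
        current immutable block map, so taking one copies no data."""

        def __init__(self):
            self.live = {}        # block id -> data
            self.snapshots = {}   # snapshot name -> frozen block map

        def write(self, block_id, data):
            # Copy the map, not the blocks; older snapshots keep the old map.
            self.live = dict(self.live)
            self.live[block_id] = data

        def snapshot(self, name):
            self.snapshots[name] = self.live   # O(1), shares all existing blocks

        def rollback(self, name):
            self.live = dict(self.snapshots[name])

    store = CowStore()
    store.write("a", "v1")
    store.snapshot("before-fat-finger")
    store.write("a", "oops")               # the fat-fingered write
    store.rollback("before-fat-finger")
    print(store.live["a"])                 # -> v1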

There is also the special case of adversarial fat fingering, which shows up as ransomware. The answer is again snapshots, but the core problem is timely detection, since otherwise you may not have a single clean point in time to recover from.
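As a sketch of the detection side, one crude signal is a sudden spike in how much changed between consecutive snapshots; the snapshot names, counts, and threshold below are invented for illustration, and real detection would use richer signals (entropy of writes, renamed extensions, etc.):

    snapshots = [                      # oldest to newest
        {"name": "hourly-09", "files_changed": 1_200},
        {"name": "hourly-10", "files_changed": 950},
        {"name": "hourly-11", "files_changed": 850_000},   # encryption sweep starts
        {"name": "hourly-12", "files_changed": 900_000},
    ]

    CHANGE_SPIKE = 100_000   # arbitrary threshold for this example

    def last_clean_snapshot(snaps):
        """Return the newest snapshot taken before the change rate spiked."""
        clean = None
        for snap in snaps:
            if snap["files_changed"] >= CHANGE_SPIKE:
                break
            clean = snap
        return clean

    print(last_clean_snapshot(snapshots)["name"])   # -> hourly-10

The later you notice, the more snapshots you have to keep around to be sure one of them predates the attack.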

ghugccrghbvr | No.43728986
Disaster at all?