stapedium ◴[] No.43717547[source]
I’m just a small business & homelab guy, so I’ll probably never use one of these big distributed file systems. But when people start talking petabytes, I always wonder if these things are actually backed up and what you use for backup and recovery?
replies(5): >>43717690 #>>43718697 #>>43720813 #>>43724292 #>>43726423 #
shermantanktop ◴[] No.43718697[source]
Backup and recovery is a process with a non-zero failure rate. The more you test it, the lower the rate, but there is always a failure mode.

With these systems, the runtime guarantees of data integrity are very high and the rate of actual data loss is very low. And best of all, component failure is constantly happening as a normal activity in the system, so the recovery machinery gets exercised all the time rather than only during a crisis.

So once the data integrity guarantees of your runtime system are better than those of your backup process, why back up?

There are still reasons, but they become more specific to the data being stored and less important as a general datastore feature.
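
As a rough, back-of-the-envelope illustration of that trade-off (the numbers below are made up, and real failures are correlated rather than independent): losing data at runtime requires every replica to fail, while a restore only has to fail once.

    # Toy comparison, with made-up numbers: chance of losing a piece of data
    # in a 3-way replicated store vs. chance that a rarely-tested restore fails.

    p_replica_loss = 1e-4     # assumed probability that any single replica is lost
    replicas = 3
    p_restore_failure = 1e-2  # assumed probability that a backup restore fails

    # Data is gone only if all replicas fail; this treats failures as independent,
    # which correlated disasters (fire, bad config push) deliberately violate.
    p_runtime_loss = p_replica_loss ** replicas

    print(f"runtime data loss: {p_runtime_loss:.0e}")     # 1e-12
    print(f"failed restore:    {p_restore_failure:.0e}")  # 1e-02

The comparison only holds as long as the runtime failures really are independent, which is exactly the caveat the replies below raise.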

replies(1): >>43719218 #
Eikon ◴[] No.43719218[source]
> why backup?

Because of mistakes and malicious actors...

replies(1): >>43719485 #
overfeed ◴[] No.43719485[source]
...and the "Disaster" in "Disaster recovery" may have been both localized and extensive (fire, flooding, a major earthquake, brownouts due to a faulty transformer, building collapse, a solvent tanker driving through the wall into the server room, a massive sinkhole, etc.)
replies(1): >>43720567 #
shermantanktop ◴[] No.43720567[source]
Yes, the dreaded fiber vs. backhoe. But if your distributed file system is geographically redundant, you're not exposed to that, at least from an integrity POV. It sucks that 1/3 or 1/5 or whatever of your serving fleet just disappeared, but backup won't help with that.
replies(1): >>43722349 #
overfeed ◴[] No.43722349[source]
> But if your distributed file system is geographically redundant

Redundancy and backups are not the same thing! There's some overlap, but treating them as interchangeable will occasionally result in terrible outcomes, like a config change that leaves all 5/5 datacenters fragmented and unable to form a quorum, at which point you discover your services have circular dependencies while you're trying to bootstrap the foundational ones. Local backups would solve this: each DC could load its last known good config. Rebuilding the consensus that redundancy depends on, by contrast, requires coordination with hosts that are now unreachable.
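
A rough sketch of that distinction, where everything (datacenter names, snapshot file, helper functions) is hypothetical: the redundancy path needs a quorum of peers that no longer answer, while the backup path only needs the local disk.

    # Illustrative sketch of why a local backup breaks the circular dependency
    # described above. All names (PEERS, last_known_good.json) are hypothetical.

    import json, os

    PEERS = ["dc1", "dc2", "dc3", "dc4", "dc5"]   # 5/5 datacenters, now fragmented
    QUORUM = len(PEERS) // 2 + 1                  # need 3 of 5 to rebuild consensus
    SNAPSHOT = "last_known_good.json"             # hypothetical local backup file

    def reachable(peers):
        # During the partition nothing answers, so consensus can never form.
        return []

    def bootstrap():
        if len(reachable(PEERS)) + 1 >= QUORUM:
            return "config rebuilt via quorum"    # redundancy path (unavailable here)
        if os.path.exists(SNAPSHOT):
            with open(SNAPSHOT) as f:
                return json.load(f)               # backup path: needs no coordination
        raise RuntimeError("no quorum and no local backup: bootstrap is stuck")

    # Simulate having taken a backup before the bad config push:
    with open(SNAPSHOT, "w") as f:
        json.dump({"version": "known-good"}, f)

    print(bootstrap())                            # {'version': 'known-good'}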