
621 points sebg | 1 comments
stapedium ◴[] No.43717547[source]
I’m just a small business & homelab guy, so I’ll probably never use one of these big distributed file systems. But when people start talking petabytes, I always wonder whether these things are actually backed up. What do you use for backup and recovery?
1. KaiserPro ◴[] No.43726423[source]
Depends on what the data is.

Because of the replication factor here, I assume that this filesystem is optimised for read throughput rather than capacity. Either way, there is a concept of "nearline" storage. It's a storage tier that is designed to be accessed only by a backup agent. The general idea is that it stores a snapshot of the main file system every n hours.

After that you have as many snapshots as you can afford.
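
To make the "snapshot every n hours, keep as many as you can afford" idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the function name `nearline_snapshots`, the fixed 6-hour interval, and the simple "drop the oldest when full" policy are hypothetical and not taken from any particular filesystem or backup product.

```python
from collections import deque
from datetime import datetime, timedelta


def nearline_snapshots(start, interval_hours, total_hours, max_snapshots):
    """Return the snapshot timestamps still held on the nearline tier.

    Hypothetical model: one snapshot every `interval_hours`, and the tier
    can only afford to hold `max_snapshots` of them at once.
    """
    # A deque with maxlen silently drops the oldest entry once it is full,
    # which stands in for expiring the oldest snapshot.
    retained = deque(maxlen=max_snapshots)
    for elapsed in range(0, total_hours + 1, interval_hours):
        retained.append(start + timedelta(hours=elapsed))
    return list(retained)


# Two days of 6-hourly snapshots, with room for only four of them.
kept = nearline_snapshots(
    datetime(2025, 1, 1), interval_hours=6, total_hours=48, max_snapshots=4
)
for ts in kept:
    print(ts.isoformat())
```

In this toy model the budget is a snapshot count. A real setup would more likely budget by bytes, since snapshots are usually incremental and vary in size.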