
621 points by sebg | 1 comment
huntaub:
I think the author is spot on. There are a few dimensions along which you should evaluate these systems: theoretical limits, efficiency, and practical limits.

From a theoretical point of view, as others have pointed out, parallel distributed file systems have existed for years -- most notably Lustre. These file systems should be capable of scaling their storage and throughput out to, effectively, infinity -- if you add enough nodes.
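
To put rough numbers on that (all of them made up for illustration), the ideal scale-out math is just multiplication, minus whatever replication eats:

    # Idealized parallel-FS scale-out math -- illustrative numbers only.
    nodes = 100
    disk_per_node_tib = 60           # assumed raw NVMe per storage node
    node_throughput_gibps = 5        # assumed per-node read throughput
    replication = 3                  # copies kept of each chunk

    usable_tib = nodes * disk_per_node_tib / replication
    aggregate_read_gibps = nodes * node_throughput_gibps
    print(f"usable: {usable_tib:.0f} TiB, reads: {aggregate_read_gibps} GiB/s")
    # -> usable: 2000 TiB, reads: 500 GiB/s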

Then you start to ask: how much storage and throughput can I get from a node with X TiB of disk? That's where you start evaluating efficiency. I ran some calculations (against FSx for Lustre, since I'm an AWS guy), and it appears you can run 3FS in AWS for about 12-30% less than FSxL, depending on the replication factor you choose -- which is good, but not great considering that you're now managing the cluster yourself.
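
For intuition on where a 12-30% gap could come from, here's the shape of the comparison as a tiny Python sketch -- every price below is a placeholder I'm assuming, not real AWS pricing:

    # Hypothetical $/GiB-month: self-managed 3FS on NVMe instances vs. FSxL.
    # All prices are illustrative placeholders, not actual AWS rates.
    raw_cost_per_gib_month = 0.05    # assumed instance-storage cost
    replication = 2                  # copies the 3FS cluster keeps
    ops_overhead = 1.10              # assumed 10% slack for headroom/ops

    fsxl_cost_per_gib_month = 0.145  # assumed FSxL effective price

    self_managed = raw_cost_per_gib_month * replication * ops_overhead
    savings = 1 - self_managed / fsxl_cost_per_gib_month
    print(f"self-managed: ${self_managed:.3f}/GiB-mo ({savings:.0%} cheaper)")
    # -> self-managed: $0.110/GiB-mo (24% cheaper)
    # Higher replication factors eat into the savings.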

Then the third thing you ask is about practical limits: anecdotally, can people actually configure these file systems at the deployment size I want? (This is where you hear things like "oh, it's hard to get Ceph to 1 TiB/s.") That remains to be seen for something like 3FS.
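
And on the "1 TiB/s is hard" point, even the naive node-count arithmetic is sobering -- the per-node throughput and scaling efficiency below are assumptions, not measurements:

    # Paper math: nodes needed for 1 TiB/s of aggregate reads (assumed figures).
    target_gibps = 1024.0        # 1 TiB/s
    per_node_gibps = 4.0         # assumed deliverable read throughput per node
    scaling_efficiency = 0.6     # assumed loss to network/metadata/stragglers

    nodes = target_gibps / (per_node_gibps * scaling_efficiency)
    print(f"~{nodes:.0f} nodes")   # ~427 -- before any of the tuning problems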

Ultimately, I believe that storage and data are key to how these AI companies operate -- so it makes sense that DeepSeek would build something like this in-house to get the properties they're looking for. My hope is that we, at Archil, can find a better set of defaults that work for most people, without needing to manage a giant cluster or even worry about how things are replicated.

KaiserPro:
The other important thing to ask is: what is the filesystem designed to be used for?

For example, 3FS looks like it's optimised for read throughput (which makes sense: like most training workloads, it's read heavy), while write operations look very heavy.
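
One reason writes look heavy in a replicated design like this (generic reasoning on my part, not something from the 3FS docs): a read can be served by a single replica, but every logical write fans out to all of them, plus the metadata update:

    # Generic r-way replication IO amplification -- not 3FS-specific.
    r = 3                              # assumed replication factor
    logical_ops = 1_000_000
    physical_reads = logical_ops       # a read hits one replica
    physical_writes = logical_ops * r  # a write lands on every replica
    print(f"write amplification: {physical_writes / logical_ops:.0f}x")  # 3x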

Can you scale the metadata server? What is the cost of metadata operations? Is there a throttling mechanism to stop a single client from sucking up all of the metadata server's IO? Does it support locking? Is it a COW filesystem?
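
On the throttling question specifically, one common generic shape is a per-client token bucket in front of the metadata service. A minimal sketch, purely illustrative -- I'm not claiming 3FS (or Lustre, or Ceph) does it this way:

    import time

    class TokenBucket:
        """Per-client limiter for metadata ops (stat/open/rename)."""
        def __init__(self, rate_per_s: float, burst: float):
            self.rate, self.burst = rate_per_s, burst
            self.tokens, self.last = burst, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False               # caller should back off and retry

    bucket = TokenBucket(rate_per_s=500, burst=100)  # assumed per-client budget
    if not bucket.allow():
        pass  # e.g. delay the RPC instead of hitting the metadata server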