
621 points sebg | 2 comments
jamesblonde No.43716889
Architecturally, it is a scale-out metadata filesystem [ref]. Other related distributed file systems are Colossus (Google), Tectonic (Meta), ADLSv2 (Microsoft), HopsFS (Hopsworks), and I think PolarFS (Alibaba). They all use different distributed row-oriented DBs for storing metadata. 3FS uses FoundationDB, Colossus uses BigTable, Tectonic some KV store, ADLSv2 (not sure), HopsFS uses RonDB.
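A rough sketch of what "metadata in a distributed row/KV store" looks like. The dict below stands in for the actual store (FoundationDB in 3FS's case); the key layout and helper names are illustrative, not any filesystem's real schema. Keying directory entries by (parent inode, name) turns lookups into point reads and directory listings into range/prefix scans, both of which a distributed store can shard across nodes:

```python
# Stand-in for the distributed KV store backing the metadata service.
kv = {}

def add_entry(parent_ino, name, ino, is_dir):
    # Directory entry: one row keyed by (parent inode, name).
    kv[("dentry", parent_ino, name)] = ino
    # Inode attributes: a separate row keyed by inode number.
    kv[("inode", ino)] = {"is_dir": is_dir, "size": 0}

def lookup(parent_ino, name):
    # A single point read resolves one path component.
    ino = kv[("dentry", parent_ino, name)]
    return ino, kv[("inode", ino)]

def listdir(parent_ino):
    # A prefix scan over ("dentry", parent_ino, *) — a range read in a real store.
    return sorted(k[2] for k in kv if k[0] == "dentry" and k[1] == parent_ino)

ROOT = 1
kv[("inode", ROOT)] = {"is_dir": True, "size": 0}
add_entry(ROOT, "data", 2, is_dir=True)
add_entry(2, "shard-000.bin", 3, is_dir=False)
```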

What's important here with 3FS is that it supports (1) a FUSE client - it just makes life so much easier - and (2) NVMe storage - so that training pipelines aren't disk-I/O bound (you can't always split files small enough and parallelize reads/writes enough against an S3 object store).
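The kind of access pattern that benefits: many concurrent positional reads against one large file, which NVMe serves well. A minimal sketch, assuming a local temp file as a stand-in for the dataset and an arbitrary chunk size; `os.pread` lets every worker read at its own offset without sharing a file position:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_read(path, chunk_size=4096, workers=8):
    """Read a file as fixed-size ranges in parallel and reassemble it."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        def read_range(off):
            # Positional read: no seek, so threads don't contend on the offset.
            return os.pread(fd, chunk_size, off)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            chunks = list(pool.map(read_range, range(0, size, chunk_size)))
    finally:
        os.close(fd)
    return b"".join(chunks)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10_000)
data = parallel_read(f.name)
os.unlink(f.name)
```

Against an object store, each of those ranges would be a separate HTTP range GET with far higher per-request latency, which is why small random reads from training pipelines tend to bottleneck there.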

Disclaimer: I worked on HopsFS. HopsFS adds tiered storage - NVMe for recent data and S3 for archival.
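The tiering idea in one function. This is a toy age-based policy of my own, not HopsFS's actual placement logic; the one-week threshold is a made-up number:

```python
import time

NVME_MAX_AGE_S = 7 * 24 * 3600  # hypothetical cutoff: demote after a week cold

def choose_tier(last_access_ts, now=None):
    """Recently touched blocks stay on NVMe; cold blocks go to S3."""
    now = time.time() if now is None else now
    return "nvme" if now - last_access_ts < NVME_MAX_AGE_S else "s3"

now = 1_000_000.0
hot = choose_tier(now - 60, now=now)                 # accessed a minute ago
cold = choose_tier(now - 30 * 24 * 3600, now=now)    # untouched for a month
```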

[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

replies(5): >>43716985 #>>43717053 #>>43717220 #>>43719689 #>>43720601 #
MertsA No.43720601
>Tectonic some KV store,

Tectonic is built on ZippyDB which is a distributed DB built on RocksDB.

>What's important here with 3FS is that it supports (1) a FUSE client - it just makes life so much easier

Tectonic also has a FUSE client built for GenAI workloads on clusters backed by 100% NVMe storage.

https://engineering.fb.com/2024/03/12/data-center-engineerin...

Personally, what stands out to me about 3FS isn't just that it has a FUSE client, but that they made it a hybrid of a FUSE client and a native IO path. You open the file as normal, but once you have an fd you use their native library to do the actual IO. You still need to adapt your AI training code to use 3FS natively if you want to avoid FUSE overhead, but the FUSE client handles all the metadata operations that a fully native client would otherwise have had to implement.

https://github.com/deepseek-ai/3FS/blob/ee9a5cee0a85c64f4797...
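The shape of that hybrid pattern, sketched with stand-ins: the file is opened through the ordinary VFS path (which would traverse FUSE on a 3FS mount), and only the data path is handed to a native library. Here `native_read` is a hypothetical placeholder for 3FS's native IO call, simulated with `os.pread`; the real client submits IO from user space and bypasses FUSE entirely:

```python
import os
import tempfile

def native_read(fd, length, offset):
    # Placeholder for the native-library data path. A real 3FS client would
    # take over this fd and issue the read without going through the kernel
    # FUSE round-trip; os.pread just simulates the result here.
    return os.pread(fd, length, offset)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello, 3FS")

# Open + metadata go through the normal (FUSE-mediated) path...
fd = os.open(f.name, os.O_RDONLY)
try:
    # ...but bulk data IO goes through the native library.
    payload = native_read(fd, 5, 7)
finally:
    os.close(fd)
os.unlink(f.name)
```

The design win is that the native library only has to implement the hot data path; opens, permissions, and directory traversal stay on the FUSE side for free.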

replies(1): >>43723212 #
1. Scaevolus No.43723212
Being able to opt-in to the more complex and efficient user-mode IO path for critical use cases is a very good idea.
replies(1): >>43724347 #
2. carlhjerpe No.43724347
While not the same, Ceph storage is accessible as object storage, as a filesystem (both FUSE and kernel clients), and as block storage.