←back to thread

621 points sebg | 2 comments | | HN request time: 0.445s | source
Show context
jamesblonde ◴[] No.43716889[source]
Architecturally, it is a scale-out metadata filesystem [ref]. Other related distributed file systems are Collosus, Tectonic (Meta), ADLSv2 (Microsoft), HopsFS (Hopsworks), and I think PolarFS (Alibaba). They all use different distributed row-oriented DBs for storing metadata. S3FS uses FoundationDB, Collosus uses BigTable, Tectonic some KV store, ADLSv2 (not sure), HopsFS uses RonDB.

What's important here with S3FS is that it supports (1) a fuse client - it just makes life so much easiter - and (2) NVMe storage - so that training pipelines aren't Disk I/O bound (you can't always split files small enough and parallel reading/writing enough to a S3 object store).

Disclaimer: i worked on HopsFS. HopsFS adds tiered storage - NVMe for recent data and S3 for archival.

[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

replies(5): >>43716985 #>>43717053 #>>43717220 #>>43719689 #>>43720601 #
1. objectivefs ◴[] No.43717220[source]
There is also ObjectiveFS that supports FUSE and uses S3 for both data and metadata storage, so there is no need to run any metadata nodes. Using S3 instead of a separate database also allows scaling both data and metadata with the performance of the S3 object store.
replies(1): >>43726938 #
2. halifaxbeard ◴[] No.43726938[source]
OFS was a drop in replacement for EFS and tbh it's insanely good value for the problem space.