←back to thread

621 points sebg | 2 comments | | HN request time: 0.598s | source
Show context
jamesblonde ◴[] No.43716889[source]
Architecturally, it is a scale-out metadata filesystem [ref]. Other related distributed file systems are Collosus, Tectonic (Meta), ADLSv2 (Microsoft), HopsFS (Hopsworks), and I think PolarFS (Alibaba). They all use different distributed row-oriented DBs for storing metadata. S3FS uses FoundationDB, Collosus uses BigTable, Tectonic some KV store, ADLSv2 (not sure), HopsFS uses RonDB.

What's important here with S3FS is that it supports (1) a fuse client - it just makes life so much easiter - and (2) NVMe storage - so that training pipelines aren't Disk I/O bound (you can't always split files small enough and parallel reading/writing enough to a S3 object store).

Disclaimer: i worked on HopsFS. HopsFS adds tiered storage - NVMe for recent data and S3 for archival.

[ref]: https://www.hopsworks.ai/post/scalable-metadata-the-new-bree...

replies(5): >>43716985 #>>43717053 #>>43717220 #>>43719689 #>>43720601 #
1. nickfixit ◴[] No.43716985[source]
I've been using JuiceFS since the start for my AI stacks. Similar and used postgresql for the meta.
replies(1): >>43717041 #
2. jamesblonde ◴[] No.43717041[source]
JuiceFS is very good. I didn't have it as a scaleout metadata FS - it supports lots of DBs (single host and distributed DBs).