Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS

This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" and please pardon me if my questions are silly or naive!

1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?

2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

4. Is the idea that I could literally run ClickHouse or Postgres with a Regatta volume as the backing store?

5. I have to ask - how do you think about open source here?

6. Can I mount on multiple servers? What are the limits there? (e.g., a Lambda function.)

I haven't played with it yet, so maybe doing so would help answer my questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.

Wow, thanks for the nice note! No questions are silly. I'll also note that we now have a docs site (https://docs.regattastorage.com), and feel free to email me (hleath [at] regattastorage.com) if I don't fully address your questions.

> If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?

We don't actually do caching on your instance's disk. Instead, data is cached in the Linux page cache (in memory), just as it would be for a regular hard drive, and Regatta provides a durable, shared cache that automatically expands with the working set size of your application. For example, if you were trying to work with data in the 50 GiB range, Regatta would automatically cache all 50 GiB -- allowing you to access it with sub-millisecond latency.
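To see the page cache effect for yourself, you can time two back-to-back sequential reads of the same file. This is only a minimal sketch, not a Regatta tool: `timed_read` and `compare_reads` are names made up for illustration, and the demo writes a scratch file to the system temp directory so it runs anywhere. To try it against a mount, pass a path on that mount instead. Note that the first read may already be warm if the file was written or read recently.

```python
import os
import tempfile
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file sequentially; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(path):
    """Read the same file twice; the second pass is usually served from memory."""
    first = timed_read(path)
    second = timed_read(path)
    return first, second


if __name__ == "__main__":
    # Scratch file for the demo; replace with a file on your mount to test it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        scratch = tmp.name
    try:
        (n1, t1), (n2, t2) = compare_reads(scratch)
        print(f"first read:  {n1} bytes in {t1 * 1000:.2f} ms")
        print(f"second read: {n2} bytes in {t2 * 1000:.2f} ms")
    finally:
        os.remove(scratch)
```

The timings depend on the machine and on what is already cached, so treat them as a rough indication rather than a benchmark.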

> Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

For now, yes -- the speed is highly dependent on latency -- which is highly dependent on distance between your instance and Regatta. Today, we are only in AWS, but we are looking to launch in other clouds by the end of the year. Shoot me an email if there's somewhere specifically that you're interested in.
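To make the latency point concrete: if a client issues file operations one at a time and each one waits for a network round trip, the round-trip time caps how many it can complete per second. The sketch below is just that arithmetic. The round-trip figures are illustrative assumptions, not measurements of Regatta or of any particular region pair.

```python
def max_serial_ops_per_second(round_trip_ms: float) -> float:
    """Upper bound on strictly sequential operations when each waits one round trip."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


if __name__ == "__main__":
    # Assumed, illustrative round-trip times in milliseconds.
    scenarios = [
        ("nearby, same region (assumed)", 0.5),
        ("neighboring region (assumed)", 10.0),
        ("distant region or other cloud (assumed)", 50.0),
    ]
    for label, rtt_ms in scenarios:
        ops = max_serial_ops_per_second(rtt_ms)
        print(f"{label}: {rtt_ms} ms round trip -> at most {ops:.0f} serial ops/s")
```

Parallel or pipelined requests can exceed this bound, which is why throughput-heavy workloads tolerate distance better than chatty, metadata-heavy ones.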

> I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

There are a couple of different questions bundled together in this. Today, Regatta exposes an NFSv3 file system that you can mount. We are working on a new protocol which will be mounted via FUSE. However, in Docker environments, we also provide a CSI driver (for use with K8s) and a Docker volume plugin (for use with just Docker) that handles the mounting for you. We haven't released these publicly yet, so shoot me an email if you want early access.
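For anyone who hasn't mounted NFSv3 by hand before, here is a rough sketch of what a generic mount invocation looks like on Linux, built as an argument list in Python. Everything specific in it is a placeholder: `nfs.example.invalid`, the export path `/`, and the mount point `/mnt/regatta` are made up for illustration, and the real endpoint and recommended options are whatever the docs site says. The helper only builds and prints the command; it doesn't run it.

```python
import shlex


def nfs3_mount_command(server, export, mount_point, extra_options=()):
    """Build (but do not run) a Linux `mount` command for an NFSv3 export."""
    # `vers=3` selects NFS protocol version 3; callers may append other options.
    options = ["vers=3", *extra_options]
    return [
        "mount",
        "-t", "nfs",
        "-o", ",".join(options),
        f"{server}:{export}",
        mount_point,
    ]


if __name__ == "__main__":
    # Placeholder values only; not a real Regatta endpoint.
    cmd = nfs3_mount_command("nfs.example.invalid", "/", "/mnt/regatta")
    print(shlex.join(cmd))
```

Running the printed command normally requires root and an existing mount point directory. Inside containers, a CSI driver or volume plugin takes care of this step.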

> Is the idea that I could literally run ClickHouse or Postgres with a Regatta volume as the backing store?

Yes, you should be able to run a database on Regatta.

> I have to ask - how do you think about open source here?

We are in the process of open sourcing all of the client code (CSI driver, mount helper, FUSE), but we don't currently have plans to open source the server code. We see the value of Regatta in managing the infrastructure so you don't have to, and even if we released the server as open source, it would be difficult to run on your own.

> Can I mount on multiple servers? What are the limits there? (e.g., a Lambda function.)

Yes, you can mount on multiple servers simultaneously! We haven't specifically stress-tested the number of clients we support, but we should be good for O(100s) of mounts. Unfortunately, AWS locks down Lambda so we can't mount arbitrary file systems in that environment specifically.

> efs performance

Yes, the challenge here is specifically around the semantics of NFS itself and the latency of the EFS service. We think we have a path to solving both of these in the next month or two.