Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS

I love this space, and I have tried and failed to get cloud providers to work on it directly :). We could not get the Avere folks to admit that their block-based thing on object store was a mistake, but they were also the only real game in town.

That said, I feel like writeback caching is a bit ... risky? That is, you aren't treating the object store as the source of truth. If your caching layer goes down after a write is ack'ed but before it's "replicated" to S3, people lose their data, right?

I think you'll end up wanting to offer customers the ability to do strongly-consistent writes (and cache invalidation). You'll also likely end up wanting to add operator control for "oh and don't cache these, just pass through to the backing store" (e.g., some final output that isn't intended to get reused anytime soon).

Finally, don't sleep on NFSv4.1! It ticks a bunch of compliance boxes for various industries, and then they will pay you :). Supporting FUSE is great for folks who can do it, but you'd want them to start by just pointing their NFS client at you, then "upgrading" to FUSE for better performance.

> That is, you aren't treating the object store as the source of truth. If your caching layer goes down after a write is ack'ed but before it's "replicated" to S3, people lose their data, right?

This is exactly why we're building our caching layer to be highly-durable, like S3 itself. We will make sure that the data in the cache is safe, even if servers go down. This is what gives us the confidence to respond to the client before the data is in S3. The big difference between the data living in our cache and the data living in S3 is cost and performance, not necessarily durability.

> I think you'll end up wanting to offer customers the ability to do strongly-consistent writes (and cache invalidation). You'll also likely end up wanting to add operator control for "oh and don't cache these, just pass through to the backing store" (e.g., some final output that isn't intended to get reused anytime soon).

I think this is exactly right. I think that storage systems are too often too hands off about the data (oh, give us the bytes and we will store them for you). I believe that there are gains to be had by asking the users to tell you more about what they're doing. If you have a directory which is only used to read files and a directory which is only used to write files, then you probably want to have different cache strategies for those directories? I believe we can deliver this with good enough UX for most people to use.

> Finally, don't sleep on NFSv4.1! It ticks a bunch of compliance boxes for various industries, and then they will pay you :). Supporting FUSE is great for folks who can do it, but you'd want them to start by just pointing their NFS client at you, then "upgrading" to FUSE for better performance.

I certainly don't, and this is why we are supporting NFSv3 right now. That's not going away any time soon. We want to offer something that's highly compatible with the industry at large today (NFS-based, we can talk specifics about whether or not that should be v3 or v4) and then something that is high-performance for the early adopters who can use something like FUSE. I think that both things are required to get the breadth of customers that we're looking for.