
580 points | huntaub | 6 comments

Hey HN, I’m Hunter, the founder of Regatta Storage (https://regattastorage.com). Regatta Storage is a new cloud file system that provides unlimited pay-as-you-go capacity, local-like performance, and automatic synchronization to S3-compatible storage. For example, you can use Regatta to instantly access massive data sets in S3 with Spark, PyTorch, or pandas without paying for large, local disks or waiting for the data to download.
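Concretely, because the file system shows up as an ordinary POSIX path, existing data tools need no changes. A minimal sketch with pandas (the mount point is hypothetical; a temp directory stands in for it here):

```python
import os
import tempfile

import pandas as pd

# A temp directory stands in for a Regatta mount point like /mnt/regatta
# (hypothetical path); any tool that speaks POSIX paths works unchanged.
mount = tempfile.mkdtemp()

# Write a data set through the mount; Regatta would sync it to S3 in its
# native file format behind the scenes.
pd.DataFrame({"user": ["a", "b"], "clicks": [3, 7]}).to_csv(
    os.path.join(mount, "events.csv"), index=False)

# Read it back like a local file -- no download step, no provisioned disk.
df = pd.read_csv(os.path.join(mount, "events.csv"))
print(int(df["clicks"].sum()))  # 10
```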

Check out an overview of how the service works here: https://www.youtube.com/watch?v=xh1q5p7E4JY, and you can try it for free at https://regattastorage.com after signing up for an account. We wanted to let you try it without an account, but we figured that “Hacker News shares a file system and S3 bucket” wouldn’t be the best experience for the community.

I built Regatta after spending nearly a decade building and operating at-scale cloud storage at places like Amazon’s Elastic File System (EFS) and Netflix. During my 8 years at EFS, I learned a lot about how teams thought about their storage usage. Users frequently told me that they loved how simple and scalable EFS was, and -- like S3 -- they didn’t have to guess how much capacity they needed up front.

When I got to Netflix, I was surprised that there wasn’t more usage of EFS. If you looked around, it seemed like a natural fit. Every application needed a POSIX file system. Lots of applications had unclear or spiky storage needs. Often, developers wanted their storage to last beyond the lifetime of an individual instance or container. In fact, if you looked across all Netflix applications, some ridiculous amount of money was being spent on empty storage space because each of these local drives had to be overprovisioned for potential usage.

However, in many cases, EFS wasn’t the perfect choice for these workloads. Moving workloads from local disks to NFS often introduced performance issues. Further, applications that treated their local disks as ephemeral had to manually “clean up” leftover data in a persistent storage system.

At this point, I realized that there was a missing solution in the cloud storage market which wasn’t being filled by either block or file storage, and I decided to build Regatta.

Regatta is a pay-as-you-go cloud file system that automatically expands with your application. Because it automatically synchronizes with S3 using native file formats, you can connect it to existing data sets and use recently written file data directly from S3. When data isn’t actively being used, it’s removed from the Regatta cache, so you only pay for the backing S3 storage. Finally, we’re developing a custom file protocol which allows us to achieve local-like performance for small-file workloads and Lustre-like scale-out performance for distributed data jobs.

Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.
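That write path can be sketched as a toy model (the names and structure here are illustrative, not Regatta’s actual code): the cache applies each operation synchronously, then queues it for asynchronous replay to the S3 bucket. This is what makes an operation like directory rename cheap for clients, even though S3 has no rename primitive.

```python
import queue

# Toy model of a durable write-back cache in front of S3 (illustrative).
class CacheFS:
    def __init__(self):
        self.files = {}                # path -> bytes (the durable cache)
        self.s3_queue = queue.Queue()  # ops awaiting propagation to S3

    def write(self, path, data):
        self.files[path] = data           # strongly consistent, visible now
        self.s3_queue.put(("put", path))  # async upload to the bucket

    def rename_dir(self, old, new):
        # Rename is a cheap metadata update in the cache; against S3 it
        # later becomes per-object copy+delete, hidden from the client.
        for path in list(self.files):
            if path.startswith(old + "/"):
                self.files[new + path[len(old):]] = self.files.pop(path)
        self.s3_queue.put(("rename", old, new))

fs = CacheFS()
fs.write("logs/a.txt", b"hi")
fs.rename_dir("logs", "archive")
print(sorted(fs.files))  # ['archive/a.txt']
```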

We’re excited to see users share our vision for Regatta. We have teams who are using us to build totally serverless Jupyter notebook servers for their AI researchers who prefer to upload and share data using the S3 web UI. We have teams who are using us as a distributed caching layer on top of S3 for low-latency access to common files. We have teams who are replacing their thin-provisioned Ceph boot volumes with Regatta for significant savings. We can’t wait to see what other things people will build and we hope you’ll give us a try at regattastorage.com.

We’d love to get any early feedback from the community, ideas for future direction, or experiences in this space. I’ll be in the comments for the next few hours to respond!

jitl ◴[] No.42175213[source]
I’m very interested in this as a backing disk for SQLite/DuckDB/parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to “pin” and “unpin” some subdirectories from local cache.

Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme but at that point I’d just use S3 myself.
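That user-space pin/unpin idea can be sketched in a few lines (the `/mnt/*` paths are hypothetical; temp directories stand in for them here so the sketch is runnable):

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins for /mnt/magic-nfs and /mnt/instance-nvme (hypothetical paths).
NFS = Path(tempfile.mkdtemp())
NVME = Path(tempfile.mkdtemp())

def pin(subdir: str) -> Path:
    """Copy a subtree to local NVMe; readers get full POSIX lock/mmap
    semantics from the local copy."""
    shutil.copytree(NFS / subdir, NVME / subdir, dirs_exist_ok=True)
    return NVME / subdir

def unpin(subdir: str) -> Path:
    """Drop the local copy and fall back to the network mount."""
    shutil.rmtree(NVME / subdir, ignore_errors=True)
    return NFS / subdir

(NFS / "db").mkdir()
(NFS / "db" / "main.sqlite").write_bytes(b"...")
local = pin("db")
print((local / "main.sqlite").exists())  # True
```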

Any thoughts about providing a custom file system or how to assemble this out of parts on top of the NFS mount?

replies(1): >>42175408 #
huntaub ◴[] No.42175408[source]
Hey -- I think this is something that's in-scope for our custom protocol that we're working on. I'd love to chat more about your needs to make sure that we build something that will work great for you. Would you mind shooting an email to hleath [at] regattastorage.com and we can chat more?
replies(1): >>42184410 #
1. juancampa ◴[] No.42184410[source]
We're also interested in SQLite shared by multiple processes on something like Regatta but my concerns are the issues described in the SQLite documentation about NFS [1]. Notably "SQLite relies on exclusive locks for write operations, and those have been known to operate incorrectly for some network filesystems."

[1] https://sqlite.org/useovernet.html
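For context, SQLite’s write locking goes through POSIX byte-range locks (`fcntl`), and the failure mode on network file systems is that the lock call can appear to succeed without actually excluding other clients. A minimal probe of the lock call itself (a sketch only; a passing result on one machine says nothing about cross-client correctness):

```python
import fcntl
import tempfile

def can_lock(path):
    """Try to take and release an exclusive fcntl byte-range lock."""
    with open(path, "wb") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(f, fcntl.LOCK_UN)
            return True
        except OSError:
            return False

# On a healthy local file system this succeeds; on some NFS mounts it can
# fail outright -- or, worse, succeed without really excluding others.
with tempfile.NamedTemporaryFile() as tmp:
    print(can_lock(tmp.name))
```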

replies(1): >>42184481 #
2. huntaub ◴[] No.42184481[source]
Ah, yes — there are some specific file-locking concerns with NFSv3 (notably that locks aren’t built as leases, as they are in NFSv4). Let me dig into this, but I know we will be able to support locks correctly with our custom protocol when we launch it by the end of the year.
replies(2): >>42185966 #>>42186463 #
3. juancampa ◴[] No.42185966[source]
One more question. How does it handle large files that are frequently modified in arbitrary locations (like a SQLite file)? Will it only upload the "diffs" to S3? I'm guessing it doesn't have to scan the whole file to determine what's changed since it can keep track of what's "dirty".

I ask because last time I checked, S3 wouldn't let you "patch" an object. So you'd have to push the diff as separate objects and then "reconstruct" the original file client-side as different chunks are read, right?

replies(1): >>42186260 #
4. huntaub ◴[] No.42186260{3}[source]
That's correct re: the S3 API. What we do is we "merge" multiple write requests together to minimize the cost to you and the number of requests to S3. For example, if you write a file 1,000 times in the span of a minute, we would merge that into a single PutObject request to S3. Of course, we force flush the data every few minutes (even if it's being written frequently) in order to make sure that there's an up-to-date copy in S3.
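A toy version of that coalescing logic (illustrative names, and the flush interval is a guess for the sketch, not Regatta’s actual number):

```python
FLUSH_INTERVAL = 120.0  # seconds; assumed value for illustration

class Coalescer:
    """Merge many writes to the same file into few PutObject requests."""
    def __init__(self):
        self.dirty = {}       # path -> latest buffered contents
        self.last_flush = {}  # path -> time the flush clock started
        self.puts = 0         # PutObject requests actually issued

    def write(self, path, data, now):
        self.dirty[path] = data                 # overwrite in place, no request
        self.last_flush.setdefault(path, now)   # start the flush clock
        if now - self.last_flush[path] >= FLUSH_INTERVAL:
            self.flush(path, now)               # force-flush a hot file

    def flush(self, path, now):
        if path in self.dirty:
            self.puts += 1                      # one PutObject for many writes
            del self.dirty[path]
            self.last_flush[path] = now

c = Coalescer()
for i in range(1000):                   # 1,000 writes in under a minute...
    c.write("a.txt", b"v%d" % i, now=i * 0.05)
c.flush("a.txt", now=60)
print(c.puts)                           # ...become a single PutObject
```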
5. mdaniel ◴[] No.42186463[source]
I would really enjoy hearing why SMBv4 or the hundreds of other protocols are somehow insufficient for your needs. The thought of "how hard can a custom protocol be?!" makes me shudder, to say nothing of the burden -- ours and yours -- of maintaining endpoint implementations for all the bazillions of places one would want to consume a network mount
replies(1): >>42187584 #
6. huntaub ◴[] No.42187584{3}[source]
Ultimately, we're just working on a different problem space than these protocols. That's not to say the existing protocols are bad; I absolutely believe they're great. Our ultimate goal, though, is to replace block storage with a file-layer protocol, and that requires different semantics than the existing file protocols support.

I don't at all disagree that it's a hard problem! That's part of what makes it so fun to work on.