573 points by huntaub | 15 comments

Hey HN, I’m Hunter, the founder of Regatta Storage (https://regattastorage.com). Regatta Storage is a new cloud file system that provides unlimited pay-as-you-go capacity, local-like performance, and automatic synchronization to S3-compatible storage. For example, you can use Regatta to instantly access massive data sets in S3 with Spark, PyTorch, or pandas without paying for large local disks or waiting for the data to download.
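As a rough sketch of what that looks like (the mount point and file path below are just placeholders), reading a Parquet file that lives in S3 becomes an ordinary local read once the file system is mounted:

    # Sketch only -- assumes a Regatta file system is mounted at /mnt/regatta
    # and the backing S3 bucket contains data/events.parquet.
    import pandas as pd

    # Reads through the local mount; Regatta fetches the bytes from S3 and
    # caches them, so repeated reads are served at local-disk-like latency.
    df = pd.read_parquet("/mnt/regatta/data/events.parquet")
    print(df.head())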

Check out an overview of how the service works here: https://www.youtube.com/watch?v=xh1q5p7E4JY, and you can try it for free at https://regattastorage.com after signing up for an account. We wanted to let you try it without an account, but we figured that “Hacker News shares a file system and S3 bucket” wouldn’t be the best experience for the community.

I built Regatta after spending nearly a decade building and operating at-scale cloud storage at places like Amazon’s Elastic File System (EFS) and Netflix. During my 8 years at EFS, I learned a lot about how teams thought about their storage usage. Users frequently told me that they loved how simple and scalable EFS was, and -- like S3 -- they didn’t have to guess how much capacity they needed up front.

When I got to Netflix, I was surprised that there wasn’t more usage of EFS. If you looked around, it seemed like a natural fit. Every application needed a POSIX file system. Lots of applications had unclear or spiky storage needs. Often, developers wanted their storage to last beyond the lifetime of an individual instance or container. In fact, if you looked across all Netflix applications, a ridiculous amount of money was being spent on empty storage space because each of these local drives had to be overprovisioned for potential usage.

However, in many cases, EFS wasn’t the perfect choice for these workloads. Moving workloads from local disks to NFS often ran into performance issues. Further, applications that treated their local disks as ephemeral would have to manually “clean up” leftover data in a persistent storage system.

At this point, I realized that there was a missing solution in the cloud storage market which wasn’t being filled by either block or file storage, and I decided to build Regatta.

Regatta is a pay-as-you-go cloud file system that automatically expands with your application. Because it automatically synchronizes with S3 using native file formats, you can connect it to existing data sets and use recently written file data directly from S3. When data isn’t actively being used, it’s removed from the Regatta cache, so you only pay for the backing S3 storage. Finally, we’re developing a custom file protocol which allows us to achieve local-like performance for small-file workloads and Lustre-like scale-out performance for distributed data jobs.
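To make the “native file formats” point concrete, here’s a minimal sketch (the bucket name and paths are made up, and the S3 copy appears only after the asynchronous sync completes, not instantly):

    # Sketch only -- bucket and key names are hypothetical.
    import boto3

    # Write through the file system mount like any local file.
    with open("/mnt/regatta/results/summary.csv", "w") as f:
        f.write("run,loss\n1,0.42\n")

    # Once the asynchronous sync has propagated the write, the same data is
    # readable as a plain object in the backing bucket.
    obj = boto3.client("s3").get_object(
        Bucket="my-backing-bucket", Key="results/summary.csv"
    )
    print(obj["Body"].read().decode())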

Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.
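Here’s a rough sketch of the client side (the endpoint hostname, export path, and mount options below are just placeholders):

    # Sketch only -- endpoint, export path, and options are placeholders;
    # mounting requires root privileges.
    import os
    import subprocess

    # Mount the Regatta caching fleet like any NFSv3 server.
    subprocess.run(
        ["mount", "-t", "nfs", "-o", "vers=3",
         "fs-example.regattastorage.com:/export", "/mnt/regatta"],
        check=True,
    )

    # A directory rename is a single metadata operation against the durable
    # cache; the change propagates to the backing S3 bucket asynchronously.
    os.rename("/mnt/regatta/staging", "/mnt/regatta/complete")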

We’re excited to see users share our vision for Regatta. We have teams who are using us to build totally serverless Jupyter notebook servers for their AI researchers who prefer to upload and share data using the S3 web UI. We have teams who are using us as a distributed caching layer on top of S3 for low-latency access to common files. We have teams who are replacing their thin-provisioned Ceph boot volumes with Regatta for significant savings. We can’t wait to see what other things people will build and we hope you’ll give us a try at regattastorage.com.

We’d love to get any early feedback from the community, ideas for future direction, or experiences in this space. I’ll be in the comments for the next few hours to respond!

1. mritchie712 ◴[] No.42175067[source]
Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea on the pricing and performance for Regatta?

Service Tier: Zonal
Location: us-central1
10 TiB instance at $0.35/TiB/hr
Monthly cost: $2,560.00

Performance Estimate:
Read IOPS: 92,000
Write IOPS: 26,000
Read Throughput: 2,600 MiB/s
Write Throughput: 880 MiB/s
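Back-of-the-envelope, the hourly rate lines up with that monthly figure (assuming roughly 730 hours in a month):

    # Quick sanity check of the Filestore cost above.
    tib = 10
    rate_per_tib_hr = 0.35
    hours_per_month = 730
    print(tib * rate_per_tib_hr * hours_per_month)  # ~2555, close to the quoted $2,560/mo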

0 - https://www.definite.app/blog/duckdb-datawarehouse

replies(2): >>42175238 #>>42175360 #
2. huntaub ◴[] No.42175238[source]
Yes, you should be in our target market. I don't think that I can give a cost estimate without having a good sense of what percentage of your data you're actively using at any given time, but we should absolutely support the performance numbers that you're talking about. I'd love to chat more in detail, feel free to send me a note at hleath [at] regattastorage.com.
replies(1): >>42175518 #
3. _bare_metal ◴[] No.42175360[source]
Out of curiosity, why not go bare metal in a managed colocation? Is that for the geographic spread? Or unpredictable load?

Every few months of this spend is like buying a server

Edit: back at my PC and checked, relevant bare metal is ~$500/mo, amortized:

https://baremetalsavings.com/c/LtxKMNj

Edit 2: for 100 TB.

replies(3): >>42175534 #>>42176875 #>>42189500 #
4. mritchie712 ◴[] No.42175518[source]
I'll send you a note!

Found this in the docs:

> By default, Regatta file systems can provide up to 10 Gbps of throughput and 10,000 IOPS across all connected clients.

Is that the lower bound? The 50 TiB Filestore instance has 104 Gbps read throughput (albeit at a relatively high price point).

replies(1): >>42175541 #
5. mritchie712 ◴[] No.42175534[source]
agreed, one month of 50 TiB is $12,800!

we're using Filestore out of convenience right now, but actively exploring alternatives.

6. huntaub ◴[] No.42175541{3}[source]
That's just the limit that we apply to new file systems. We should be able to support your 104 Gbps of read throughput.
7. nine_k ◴[] No.42176875[source]
Hiring someone who knows how to manage bare metal (with failover and stuff) may take time %)
replies(1): >>42177892 #
8. wongarsu ◴[] No.42177892{3}[source]
You pay a datacenter to put it in a rack and connect power and uplinks, then treat it like a big EC2 instance (minus the built-in firewall). Now you just need someone who knows how to secure an EC2 instance and run your preferred software there (with failover and stuff).

If you run a single-digit number of servers and replace them every 5 years, you will probably never get a hardware failure. If you're unlucky and it still happens, get someone to diagnose what's wrong, ship replacement parts to the data center, and pay their tech to install them in your server.

Bare metal at scale is difficult. A small number of bare metal servers is easy. If your needs are average enough you can even just rent them so you don't have capital costs and aren't responsible for fixing hardware issues.

replies(4): >>42179198 #>>42183127 #>>42184135 #>>42185150 #
9. swyx ◴[] No.42183127{4}[source]
Sounds like an opportunity for someone (you?) to offer an abstraction slightly above bare metal that handles the stuff you described, charging more than bare metal but less than the other stuff. How much daylight is there between those prices?
replies(1): >>42189476 #
10. tempest_ ◴[] No.42184135{4}[source]
We run on our own stuff at our shop.

Some things that are hidden in the cloud providers' costs are redundant networking, redundant internet connections, and redundant disks.

Likely still cheaper than the cloud, obviously, but you will need to stomach downtime if something breaks.

11. kingnothing ◴[] No.42185150{4}[source]
Are you going to risk your entire business over "probably never get a hardware failure" that, if it hits, might result in days of downtime to resolve? I wouldn't.
replies(1): >>42189295 #
12. nine_k ◴[] No.42189295{5}[source]
Just pay 2x for the hardware and have a hot standby, 1990s-style. Practice switching between the boxes every month or so; it should be imperceptible to customers and nearly a non-event for the ops.
replies(1): >>42196263 #
13. nthh ◴[] No.42189476{5}[source]
I'm sure there are companies in this space providing private clouds on bare metal; I wonder how that would be to operate at scale, though.
14. nthh ◴[] No.42189500[source]
This is compelling, but it would be useful to compare upfront costs here. Investing $20,000+ in a server isn't feasible for many. I'd also be curious to know how much a failsafe (perhaps "heatable" cold storage, at least for the example) would cost.
15. kingnothing ◴[] No.42196263{6}[source]
How many hours of labor does that take every month you fail over? What about hot hard-drive spares? Do you want networking redundancy? How about data backups? A second set of hot servers in another physical data center?

All of that costs money and time. You're probably better off using cloud hosting and focusing on your unique offering than having that expertise and coordination in house.