Interesting. What sort of companies can take advantage of this?
I work for a non-profit doing digital preservation for a number of universities in the US. We store huge amounts of data in S3, Glacier and Wasabi, and provide services and workflows to help depositors comply with legal requirements, access controls, provable data integrity, archival best practices, etc.
There are some for-profits in this space as well. It's not a huge or highly profitable space, but I do think there are other business opportunities out there where organizations want to store geographically distributed copies of their data (for safety) and run that data through processing pipelines.
The trick, of course, is to identify which organizations have a similar set of needs and then build that. In our case, we've spent a lot of time working around data access costs, and there are some cases where we just can't avoid them. They can really be considerable when you're working with large data sets, and if you can solve the problem of data transfer costs from the get-go, you'll be way ahead of many existing services built on S3 and Glacier.
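To put rough numbers on why transfer costs matter at scale, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `monthly_egress_cost` is made up, the 30 TB/month volume is a hypothetical workload, and the 0.09 per-GB rate is a placeholder rather than a quoted price, so check your provider's current pricing before relying on it.

```python
def monthly_egress_cost(egress_tb: float, price_per_gb: float) -> float:
    """Return the monthly transfer bill in dollars for a flat per-GB rate.

    Uses decimal units (1 TB = 1000 GB) and ignores tiered pricing,
    request fees, and free allowances, which real bills include.
    """
    return egress_tb * 1000 * price_per_gb


# Hypothetical workload: 30 TB of egress per month.
EGRESS_TB = 30

# Placeholder rates (assumptions, not quoted prices).
metered = monthly_egress_cost(EGRESS_TB, price_per_gb=0.09)
zero_egress = monthly_egress_cost(EGRESS_TB, price_per_gb=0.0)

print(f"metered provider: ${metered:,.2f}/month")
print(f"zero-egress provider: ${zero_egress:,.2f}/month")
print(f"difference: ${metered - zero_egress:,.2f}/month")
```

The gap scales linearly with volume, which is why it dominates for large data sets and barely registers for small ones.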
Basically, using R2 allows you to undercut competitors' pricing. It also means I don't need to build out a separate CDN to host my files, because Cloudflare will do that for me, too.
Competitors built out and maintain their own equivalent CDNs and storage solutions that are ~10x more expensive to maintain and operate than going through Cloudflare. Basically, Cloudflare is doing to CDNs and storage what AWS and friends did to compute.
But reality is a bit more complicated than that. Migrating data + pointers to that data, en masse, isn't super easy (although things like Sippy make it easier).
In addition, there's all the capex that's gone into building systems around the assumptions of their blend of data centers, homegrown CDNs, and mix of storage systems. There's a sunk cost fallacy at play, as well as the inertia of knowing how to maintain the old system and not having any experience with the new system.
It's not impossible, but it'd require a lot of willpower and energy that these companies (who are 10+ years into their life cycles) don't really possess.
Having seen the inside of orgs like that before, starting from scratch is ~10x-100x easier, depending on the blend of bureaucracy on the menu.
And the difference is that you will fail your customers when that time comes, because you'll just get suspended (we've seen some cases here on the forum), and you'll have to come here to complain so the CEO/CTO resumes things for you.
In their docs they explicitly state it as an attractive feature to leverage, so that’d surprise me.
That being said, I’m not planning to serve particularly large files with any meaningful frequency, so in my particular case I’m not concerned about that possibility. (I’m distributing low bitrate audio, and small images, mostly).
If I were trying to build YouTube or whatever I’d be more concerned.
That being said, with their storage pricing and network set up as they are, I think they make plenty of money off of a hypothetical YouTube clone.
I do think they’ll raise prices eventually. But it’s a highly competitive space, so it feels like there’s a stable ceiling.
> I’m distributing low bitrate audio, and small images, mostly
This means the cache-size would be much smaller though.
Re cache-size, maybe I've misunderstood what you mean by cache size limiting, but yeah that's my point – I don't need a massive cache size for my application. My data doesn't lend itself much to large and distributed spikes. Egress is spiky, but centralized to a few files at a time. e.g. if there were to be a single day where 1TB were downloaded at once, 80% of it would be concentrated into ~20 400MB-sized files.
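The arithmetic behind that spike scenario can be sketched as follows. The function `hot_set_profile` and its parameter names are hypothetical, and decimal units (1 TB = 1,000,000 MB) are assumed; the inputs are the figures from the comment above.

```python
MB_PER_TB = 1_000_000  # decimal units (assumption)


def hot_set_profile(total_tb: float, hot_share: float,
                    hot_files: int, file_mb: float) -> dict:
    """Summarize a traffic spike concentrated on a few files.

    total_tb  -- total egress during the spike
    hot_share -- fraction of that egress going to the hot files
    hot_files -- number of hot files
    file_mb   -- size of each hot file
    """
    hot_mb = total_tb * MB_PER_TB * hot_share
    per_file_mb = hot_mb / hot_files
    return {
        "hot_egress_gb": hot_mb / 1000,
        "downloads_per_file": per_file_mb / file_mb,
        # Cache space needed to hold every hot file once.
        "hot_set_size_gb": hot_files * file_mb / 1000,
    }


# Figures from the comment: 1 TB in a day, 80% on ~20 files of 400 MB each.
profile = hot_set_profile(total_tb=1, hot_share=0.8, hot_files=20, file_mb=400)
for name, value in profile.items():
    print(f"{name}: {value:g}")
```

Under those assumptions, about 800 GB of the spike is served from a hot set of only about 8 GB, with each file downloaded on the order of 100 times, so a small cache absorbs most of the traffic.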
> They were also seeing ~30TB of daily egress on a non-enterprise plan, which would absolutely never happen in my case – 1TB of daily egress would be a p99.9 event.
I don't understand which media company you'd be competing against if you use just 30TB/month of bandwidth.