From S3 to R2: An economic opportunity

(dansdatathoughts.substack.com)
274 points dangoldin | 39 comments
simonsarris ◴[] No.38118991[source]
Cloudflare has been attacking the S3 egress problem by creating Sippy: https://developers.cloudflare.com/r2/data-migration/sippy/

It allows you to incrementally migrate off of providers like S3 and onto the egress-free Cloudflare R2. Very clever idea.

He calls R2 an undiscovered gem and IMO this is the gem's undiscovered gem. (Understandable since Sippy is very new and still in beta)
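
A rough sketch of the incremental idea, assuming Sippy works as a copy-on-read layer: serve the object from R2 if it is already there, otherwise fetch it from S3 once and keep a copy. Plain dicts stand in for the two buckets, and `IncrementalMigrator` and `source_reads` are made-up names for illustration, not Cloudflare's API.

```python
class IncrementalMigrator:
    """Toy read-through migration: serve from the new store, fall back to the old one."""

    def __init__(self, source, destination):
        self.source = source            # stands in for the S3 bucket
        self.destination = destination  # stands in for the R2 bucket
        self.source_reads = 0           # each of these would incur S3 egress

    def get(self, key):
        # Already migrated: no request to the old provider at all.
        if key in self.destination:
            return self.destination[key]
        # First read: fetch from the old provider (raises KeyError if missing).
        value = self.source[key]
        self.source_reads += 1
        # Keep a copy so later reads never touch the old provider again.
        self.destination[key] = value
        return value


s3 = {"a.png": b"A", "b.png": b"B"}
r2 = {}
migrator = IncrementalMigrator(s3, r2)
migrator.get("a.png")   # fetched from s3 and copied into r2
migrator.get("a.png")   # served from r2
print(migrator.source_reads, sorted(r2))
```

Under that assumption you pay S3 egress at most once per object, and objects nobody requests are never transferred.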

replies(4): >>38119194 #>>38120069 #>>38120641 #>>38122400 #
1. ravetcofx ◴[] No.38119194[source]
What are the economics that Amazon and other providers have egress fees and R2 doesn't? Is it acting as a loss leader or does this model still make money for CloudFlare?
replies(9): >>38119285 #>>38119489 #>>38119521 #>>38119701 #>>38119768 #>>38119769 #>>38120649 #>>38121416 #>>38125131 #
2. NicoJuicy ◴[] No.38119285[source]
You pay for the capacity of your network.

Cloudflare has huge ingress capacity, because they need it to protect sites against DDoS.

They basically already pay for their R2 bandwidth (= egress) because of that.

Additionally, with their SDN (software-defined networking) they can fine-tune some of the data flow/bandwidth too.

That's how I understood it, fyi.

Some more info can be found from when they started (or co-founded, not sure) the Bandwidth Alliance.

Eg.

https://blog.cloudflare.com/aws-egregious-egress/

https://blog.cloudflare.com/bandwidth-alliance/

replies(2): >>38119446 #>>38119785 #
3. miselin ◴[] No.38119446[source]
Also, for the CDN case that R2 seems to be targeting - regardless of the origin of the data (R2 or S3), chances are pretty good that Cloudflare is already paying for the egress anyway.
replies(2): >>38119491 #>>38120065 #
4. Nextgrid ◴[] No.38119489[source]
Completely free egress is a loss leader, but the true cost is so little (at least 90x less than what AWS charges) that it pays for itself in the form of more CloudFlare marketshare/mindshare.
replies(1): >>38119872 #
5. NicoJuicy ◴[] No.38119491{3}[source]
I'm not sure about that.

A CDN keeps the data nearby, reducing the need to pay egress to the big bandwidth providers.

( not an expert though)

replies(1): >>38120638 #
6. candiddevmike ◴[] No.38119521[source]
Greed on the cloud providers part, I think. You'd expect egress fees to enable cheaper compute, but there are other cloud providers out there like Hetzner with cheaper compute and egress, so the economics don't really add up.
replies(2): >>38119947 #>>38126264 #
7. kazen44 ◴[] No.38119701[source]
also, egress fees are a sort of vendor lock-in, because getting data out of the cloud is vastly more expensive than putting new data into the cloud.
replies(2): >>38119793 #>>38120093 #
8. chatmasta ◴[] No.38119768[source]
Amazon doesn't have a unit cost for egress. They charge you for the stuff you put through their pipe, while paying their transit providers only for the size of the pipe (or more often, not paying them anything since they just peer directly with them at an exchange point).

Amazon uses $/GB as a price gouging mechanism and also a QoS constraint. Every bit you send through their pipe is basically printing money for them, but they don't want to give you a reserved fraction of the pipe because then other people can't push their bits through that fraction. So they get the most efficient utilization by charging for the stuff you send through it, ripping everybody off equally.

Also, this way it's not cost effective to build a competitor to Amazon (or any bandwidth intensive business like a CDN or VPN) on top of Amazon itself. You fundamentally need to charge more by adding a layer of virtualization, which means "PaaS" companies built on Amazon are never a threat to AWS and actually symbiotically grow the revenue of the ecosystem by passing the price gouging onto their own customers.
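
To put the pipe-versus-volume point in numbers, here is a small sketch that converts a link's capacity into the volume it can carry in a month and prices that volume per GB. The 10 Gbps link, the 30-day month and 100% utilization are assumptions chosen for illustration; the 5 to 9 cents per GB range is the figure quoted elsewhere in this thread, not an official price list.

```python
SECONDS_PER_DAY = 86_400


def monthly_gb(link_gbps, days=30):
    """GB (decimal) a link can carry in `days` at 100% utilization."""
    gigabits = link_gbps * days * SECONDS_PER_DAY
    return gigabits / 8  # 8 bits per byte


def revenue_usd(gigabytes, cents_per_gb):
    """Per-GB billing for a given volume; the price is in cents to keep the math exact."""
    return gigabytes * cents_per_gb / 100


volume = monthly_gb(10)  # hypothetical 10 Gbps link
print(f"{volume:,.0f} GB per month")
print(f"${revenue_usd(volume, 5):,.0f} to ${revenue_usd(volume, 9):,.0f} billed per month")
```

Real links run well below full utilization, so this is an upper bound on volume, but the seller's cost tracks the size of the pipe while the bill tracks every GB pushed through it.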

replies(3): >>38119876 #>>38120020 #>>38120203 #
9. johnjohnnotjohn ◴[] No.38119769[source]
I’m inherently suspicious of services that are free (like Cloudflare egress). Maybe I’ve been burned too many times over the years, but I almost expect some kind of hostility or u-turn in the long run (I do really like Cloudflare’s products right now!).

I almost wish they had some kind of sustainable usage-based charge that was much lower than AWS.

Feel free to tell me why I’m wrong! I’d love to jump onboard - it just seems too good to be true in the long-term.

replies(1): >>38126808 #
10. swyx ◴[] No.38119785[source]
somebody more knowledgeable please correct me if i'm mistaken, but i think the bandwidth alliance is really the linchpin of the whole thing. basically get all the non-AWS players in the room and agree on zero-rating traffic between each other, to provide a credible alternative to AWS networks
11. oaktowner ◴[] No.38119793[source]
Exactly this. Data has gravity, and this increases the gravity around data stored at Amazon...making it more likely for you to buy more compute/services at Amazon.
replies(1): >>38123709 #
12. WJW ◴[] No.38119872[source]
I know from personal experience that "big" customers can negotiate incredible discounts on egress bandwidth as well. 90-95% discount is not impossible, only "retail" customers pay the sticker price.
replies(1): >>38120756 #
13. specialp ◴[] No.38119876[source]
You don't get charged for transit if you are sending stuff IN from the internet or to any other AWS resource in that region. So there is no QoS constraint inside except for perhaps paying for the S3 GET/SELECT/LIST costs.

It is pretty much exclusively to lock you into their services. It heavily impacts multi-cloud and outside-of-AWS service decisions when your data lives in AWS and is taxed at 5-9 cents a GB to come out. We have settled for inferior AWS solutions at times because the cost of moving things out is prohibitive (e.g. AWS Backup vs. other providers).
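
For a sense of scale, a quick calculation of what that 5-9 cents a GB means for a one-off move. The 100 TB dataset size is a made-up example, and the function name is just for illustration.

```python
def egress_cost_usd(gigabytes, cents_per_gb):
    """One-off cost of pulling `gigabytes` out at a flat per-GB rate (price in cents)."""
    return gigabytes * cents_per_gb / 100


dataset_gb = 100_000  # hypothetical 100 TB (decimal) dataset
low = egress_cost_usd(dataset_gb, 5)
high = egress_cost_usd(dataset_gb, 9)
print(f"Moving {dataset_gb:,} GB out once: ${low:,.0f} to ${high:,.0f}")
```

The same fee applies again every time a service outside AWS has to re-read the data, which is why it weighs on architecture decisions and not just on migrations.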

replies(2): >>38120291 #>>38120733 #
14. vidarh ◴[] No.38119947[source]
Indeed, Hetzner is so much cheaper that if you have high S3 egress fees you can rent Hetzner boxes to sit in front of your S3 deployment as caching proxies and get a lot of extra "free" compute on top.

It's an option that's often been attractive if/when you didn't want the hassle of building out something that could provide S3 level durability yourself. But with more/cheaper S3 competitors it's becoming a significantly less attractive option.

15. kkielhofner ◴[] No.38120020[source]
AWS egress charges blatantly take advantage of people who have never bought transit or done peering.

To them "that's just what bandwidth costs" but anyone who's worked with this stuff (sounds like you and I both) can do the quick math and see what kind of money printing machine this scheme is.

replies(1): >>38122916 #
16. kkielhofner ◴[] No.38120065{3}[source]
It's actually worse than that.

In the CDN case Cloudflare has to fetch it from the origin, cache (store) it anyway, and then egress it. By charging for R2 they're turning that cost center into a profit center.

17. kkielhofner ◴[] No.38120093[source]
The big cloud providers are Hotel California - you can check in but you can't check out.

Of course you can (like Snap) but it's a MASSIVE engineering effort and initial expense.

18. pests ◴[] No.38120203[source]
Honest question, how is this different from a toll road? An entity creates a road network of a certain size (lanes, capacity/hour, literal traffic) and pays for it by charging each car that uses the road.
replies(4): >>38121840 #>>38122110 #>>38123381 #>>38125820 #
19. dangoldin ◴[] No.38120291{3}[source]
Author here - have you tried using R2? As others mentioned there's also Sippy (https://developers.cloudflare.com/r2/data-migration/sippy/) which makes this easy to try.
20. ilc ◴[] No.38120638{4}[source]
Let's say you want to use cloudflare, or another CDN. The process is pretty simple.

You set up your website and preferably DON'T have it talk to anyone other than the CDN.

You then point your DNS to wherever the CDN tells you to. (Or let them take over DNS. Depends on the provider.)

The CDN then will fetch data from your site and cache it, as needed.

Your site is the "origin", in CDN speak.

If Cloudflare can move the origin within their network, there are huge cost savings and reliability increases there. This is game-changing stuff. Do not underestimate it.

21. dotnet00 ◴[] No.38120649[source]
There has to be more to it than a pure loss leader, since there's also the Bandwidth Alliance Cloudflare is in, which allows R2 competitors like Backblaze B2 to also offer free egress, which benefits those competitors while weakening the incentive for R2 somewhat.
22. martinald ◴[] No.38120733{3}[source]
It also makes things like just using RDS for your managed database and having compute nearby but with another provider often incredibly expensive.
23. martinald ◴[] No.38120756{3}[source]
That's still a 3-10x markup though. And it's also very dependent on your relationship with AWS. What happens if they don't offer the discount on renewal?
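
The arithmetic behind that, as a sketch: it takes the "at least 90x less" estimate of true cost from earlier in the thread as an assumption (it is a commenter's figure, not a published one) and asks how far above cost the discounted price still sits.

```python
def markup_after_discount(discount_pct, sticker_to_cost_ratio=90):
    """How many times the provider's cost you still pay after a discount.

    sticker_to_cost_ratio is the assumed ratio of list price to true cost.
    """
    return sticker_to_cost_ratio * (100 - discount_pct) / 100


for pct in (90, 95):
    print(f"{pct}% discount -> {markup_after_discount(pct)}x cost")
```

With a 90x ratio, a 90% discount still leaves a 9x markup and a 95% discount leaves 4.5x.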
24. jmarbach ◴[] No.38121416[source]
Cloudflare wrote a blog post about their bandwidth egress charges in different parts of the world: https://blog.cloudflare.com/the-relative-cost-of-bandwidth-a...

The original post also includes a link to a more recent Cloudflare blog post on AWS bandwidth charges: https://blog.cloudflare.com/aws-egregious-egress/

25. dekhn ◴[] No.38121840{3}[source]
Or, really, any capital intensive business that makes money through operating costs based on usage rather than total capacity.
26. andreasmetsala ◴[] No.38122110{3}[source]
The difference is that Amazon doesn’t own the road, they’re just a truck driver. Amazon customers rent space on the truck and pay whatever the driver asks them for.
replies(1): >>38123246 #
27. PaulHoule ◴[] No.38122916{3}[source]
It's also a way to choose your customers.

Some people want to host a lot of warez and pirate movies and stuff but that doesn't monetize very well per GB consumed so pricing bandwidth high means those people never show up, thus saving a lot of trouble for AWS.

I remember when salesforce.com announced a service that would let you serve up web pages out of their database, it was priced crazy high (100-1000x too much) from the viewpoint of "I want to run a blog on this service" but for someone who wanted to put in a form to collect data from customers it was totally affordable. Salesforce knew which customers it wanted and priced accordingly.

28. corbezzoli ◴[] No.38123246{4}[source]
Yeah ok but you still need a ticket to board the truck/bus, right? The more people in your family, the more tickets.

The issue isn’t charging for egress, but charging excessively.

29. kenmacd ◴[] No.38123381{3}[source]
There's at least a couple of reasons that your analogy doesn't really work.

First a lot of these roads are 'free' and yet you're still being charged for it. If two large networks come to an agreement then they connect the two networks (ie build that road), but no money changes hands.

Second if there is a paid peering agreement in place (ie say AWS had a cost to push your data out), that still wouldn't be billed to them in the way they're charging you. Instead they'd be paying for the rate of traffic at something like the 95th percentile of the max. This means that you could download a petabyte of data from them when the pipe isn't busy and cost them nothing, or you could download a gigabyte when it's busy and push up the costs.
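
A minimal sketch of that kind of 95th-percentile (burstable) billing: sample the rate at fixed intervals, throw away the top 5% of samples, and bill for the highest one left. The traffic numbers are invented, and carriers differ on details such as sampling interval and rounding, so treat the index rule here as one common convention.

```python
import math


def billable_rate(samples_mbps):
    """Highest sample left after discarding the top 5% of samples."""
    ordered = sorted(samples_mbps)
    index = math.ceil(len(ordered) * 95 / 100) - 1
    return ordered[index]


# Hypothetical day of 5-minute samples (288 in total): 200 quiet, 88 busy.
quiet, busy = 200, 88
baseline = [100] * quiet + [900] * busy

# An extra 500 Mbps transfer during 20 quiet intervals...
off_peak = [600] * 20 + [100] * (quiet - 20) + [900] * busy
# ...versus the same transfer during 20 busy intervals.
on_peak = [100] * quiet + [1400] * 20 + [900] * (busy - 20)

for name, day in (("baseline", baseline), ("off-peak", off_peak), ("on-peak", on_peak)):
    print(name, billable_rate(day), "Mbps")
```

The off-peak transfer moves the same number of bytes as the on-peak one but leaves the billable rate where it was, while the on-peak transfer raises it. Per-GB pricing to the customer ignores that difference entirely.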

30. jgalt212 ◴[] No.38123709{3}[source]
very true, but data gets stale very quickly. So you start putting new data in a new place. Eventually, you don't care about the old place. And all the people and processes who accessed the data in the old place are gone.
replies(1): >>38126110 #
31. mannyv ◴[] No.38125131[source]
The way to reduce S3 egress fees is to use CloudFront, negotiate your CloudFront fees down, then use S3 as the origin.
32. Dylan16807 ◴[] No.38125820{3}[source]
The toll road is the only way out of the county and it's charging $90. That's what's different.
33. hnwizard ◴[] No.38126110{4}[source]
Completely agreed about data gravity, but it's not just that, it's also customer opted-in vendor lock-in.

The customer (because they are lazy, don't know better, aren't capable, or all three) opts in to use various "convenient" CSP "services". These services may look convenient (and are always somewhere between pretty and extremely expensive), and they quickly become an integral part of the customer's badly architected "system".

The end result is complete vendor-lockin, the inability of the poor (stupid) user to leave and the continued gang rape of their bank account (also via additional, incompetent developer and devops "resources").

Throw in the average modern "devops" who are hired to handle this. They aren't like the sysadmins of yesteryear; they no longer have experience with, or understand, the bits and bytes. They are glorified UI clickers and YAML editors, and they lack any reasonable system-level debugging skills. For every problem they encounter they immediately run to Google in search of answers.

In addition, I would argue that CSPs are a huge, huge waste of computing, space and power resources, because their systems completely encourage people to just do things, without understanding what they are doing, screw the consequences and just pay.

Result, the business suffers greatly (on so many levels), the CSP wins big and continues winning.

What happens here is that a system that, if designed right from the get-go, could have run on a SINGLE modern, high-end, well-positioned and well-connected server is now replaced with tens to hundreds of "instances" and random assorted CSP-provided services -- what a colossal waste.

Books can be written on negligence, lack of understanding, utter tech stupidity and ultimately the costs which are absurd.

replies(1): >>38147603 #
34. wkat4242 ◴[] No.38126264[source]
Scaleway also, and they are fully S3 compatible. I use their glacier service for backup. I store 1.5TB for around 3€ per month.

I used the storage box from hetzner before but they only had 1TB or 5TB (and higher) choices so I had to pay for 5TB (€12 per month) without using most of it. Having rsync support was nice but rclone works fine with S3.

35. dgacmu ◴[] No.38126808[source]
Because they're a CDN. You pay for storage already, so an object that isn't downloaded much is paid for. An object that gets downloaded a lot uses bandwidth, but the more popular it is, the more effective the CDN caching is.

There probably needs to be an abuse prevention rate limit (and probably is), but it's not quite as crazy as it sounds to just rely on their CDN bandwidth sharing policies instead of charging.

replies(1): >>38128896 #
36. johnjohnnotjohn ◴[] No.38128896{3}[source]
What happens if I host an incredibly popular file, and start eating up everyone else’s share of the bandwidth? ie - I become a popular Linux distro package mirror?

I do think there are “soft limits” in place like you say - it’s just my personal preference to have documented limits (or pay fairly for what you use). IMO it helps stop abuse, and prevents billing surprises for legitimate heavy use-cases.

replies(1): >>38135758 #
37. dgacmu ◴[] No.38135758{4}[source]
They undoubtedly limit the % of bandwidth you can use when the link is full. The problem with that is that it's very hard to quantify, because whether or not they have spare bandwidth for you depends a lot on location, timing, and what else is happening on the network.

But that's really no different from the guarantee you get from most CDN services. If you're using cloudflare in front of S3, for example, you'll end up with the same behavior.

replies(1): >>38141074 #
38. johnjohnnotjohn ◴[] No.38141074{5}[source]
> But that's really no different from the guarantee you get from most CDN services. If you're using cloudflare in front of S3, for example, you'll end up with the same behavior.

But in my mind it’s also comforting that something like CloudFront has a long-term sustainable model (I should also add, with fewer strings attached, such as restrictions on hosting video).

I do think the prices at AWS are too high, but it discourages bad actors from filling up the shared pipes. ISPs are sometimes a classic example of what happens when a link is oversubscribed.

Cloudflare’s “soft limits” are also somewhat of a dark pattern if you ask me. I like to know exactly how much something will cost, and it’s really hard to figure out with Cloudflare if you’re a high-traffic source. Do I hit the “soft limits,” or not? It’s really hard to say with their current model.

FWIW, I think Cloudflare is a great product right now - I am just skeptical they can keep it up forever.

39. 20after4 ◴[] No.38147603{5}[source]
It is a colossal waste of resources, indeed.

It's also a huge waste of human effort managing the complexity introduced by the cloud provider's arbitrary bullshit.

At this point multiple generations of engineers have little understanding of underlying layers of technology, having only really learned how to use cloud services. No TCP/IP, no UNIX, just a bit of bash and a ton of AWS.

Cloud providers do hide most of the low level complexity, which could be seen as a benefit (at least that seems to be what's touted as a main benefit, along with instant scalability.) Unfortunately they replace all of that with more arbitrary complexity which is ultimately (in my opinion, at least) a much bigger burden than the fundamental complexity that is abstracted away.