    From S3 to R2: An economic opportunity

    (dansdatathoughts.substack.com)
    274 points by dangoldin | 24 comments
    1. meowface ◴[] No.38119562[source]
    Is there any reason to not use R2 over a competing storage service? I already use Cloudflare for lots of other things, and don't personally care all that much about the "Cloudflare's near-monopoly as a web intermediary is dangerous" arguments or anything like that.
    replies(10): >>38120515 #>>38120628 #>>38120667 #>>38121777 #>>38121809 #>>38121833 #>>38121902 #>>38124987 #>>38126101 #>>38126111 #
    2. Hasz ◴[] No.38120515[source]
    As far as I know, R2 offers no storage tiers. Most of my S3 usage is archival and sits in Glacier. Going by Cloudflare's pricing page, S3 is substantially cheaper for that type of workload.
    replies(1): >>38124971 #
    3. gurchik ◴[] No.38120628[source]
    1. This is the most obvious one, but S3 access control is done via IAM. For better or for worse, IAM has a lot of functionality. I can configure a specific EC2 instance to have access to a specific file in S3 without needing to deal with API keys and such, and I can search CloudTrail for all the times a specific user read a certain file. (A sketch of that kind of policy follows this list.)

    2. R2 doesn't support file versioning like S3. As I understand it, Wasabi supports it.

    3. R2's storage pricing is designed for frequently accessed files. They charge a flat $0.015 per GB-month stored. This is a lot cheaper than S3 Standard pricing ($0.023 per GB-month), but more expensive than Glacier and marginally more expensive than S3 Standard - Infrequent Access. Wasabi is even cheaper at $0.0068 per GB-month, but with a 1 TB billing minimum.

    4. If you want public access to the files in your S3 bucket using your own domain name, you can create a CNAME record with whatever DNS provider you use. With R2 you cannot use a custom domain unless the domain is set up in Cloudflare. I had to register a new domain name for this purpose since I could not switch DNS providers for something like this.

    5. If you care about the geographical region your data is stored in, AWS has way more options. At a previous job I needed to control the specific US state my data was in, which is easy to do in AWS if there is an AWS Region there. In contrast, R2 and Wasabi both have few options. R2 has a "Jurisdictional Restriction" feature in beta right now to restrict data to a specific legal jurisdiction, but it only supports the EU. Not helpful if you need your data stored in Brazil or something.
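
    On point 1, here is a minimal sketch of that kind of scoped policy, attached to an EC2 instance role with boto3. The role name, bucket, and key are all hypothetical:

        import json
        import boto3

        # Hypothetical policy: the instance role may read exactly one object.
        policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::my-bucket/reports/q3.csv",
            }],
        }

        iam = boto3.client("iam")
        iam.put_role_policy(
            RoleName="my-ec2-instance-role",   # role in the instance profile
            PolicyName="read-single-object",
            PolicyDocument=json.dumps(policy),
        )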

    replies(2): >>38124688 #>>38124827 #
    4. paulddraper ◴[] No.38120667[source]
    If you already use Cloudflare for lots of other things, no.

    If you already use AWS for lots of other things, yes.

    5. Voloskaya ◴[] No.38121777[source]
    I don't know about R2 specifically, but we migrated one of our services from S3 to Cloudflare Images, and we have been hit with 40+ hours of downtime on CF's side over the last 30 days. One of the outages was 22 hours long, today's has been running for almost 12 hours and is still ongoing, and we have had two or three other >1h outages.

    Every cloud provider has outages sometimes but CF has been horrendous.

    We were actually planning to migrate some other parts to R2, but we are ditching CF altogether and will just pay a bit more on AWS for reliability.

    So if R2 has been impacted even a third as much as CF images, that would definitely be an important consideration.

    replies(2): >>38122167 #>>38124225 #
    6. MGriisser ◴[] No.38121809[source]
    Is R2 subject to Cloudflare's universal API rate limit? They have an API rate limit of 1200 requests/5 minutes that I've hit many times with their Images product.

    And they won't increase it unless you become an enterprise customer, in which case they'll generously double it.
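
    If you're stuck under that limit, a client-side throttle is the usual workaround. A minimal sketch in Python, assuming the 1200 requests/5 minutes figure above:

        import time
        from collections import deque

        WINDOW_SECONDS = 300   # 5 minutes
        MAX_REQUESTS = 1200    # the stated global API limit

        sent = deque()  # timestamps of requests inside the current window

        def throttle():
            """Block until one more request fits in the rolling window."""
            now = time.monotonic()
            while sent and now - sent[0] > WINDOW_SECONDS:
                sent.popleft()  # drop timestamps that aged out
            if len(sent) >= MAX_REQUESTS:
                time.sleep(WINDOW_SECONDS - (now - sent[0]))
                sent.popleft()  # the oldest request has now aged out
            sent.append(time.monotonic())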

    replies(1): >>38129750 #
    7. mike_d ◴[] No.38121833[source]
    There is no data locality. If your workload is already in AWS, you might save money by keeping the data in the more expensive S3 rather than going out to Cloudflare to fetch your bytes and return your results.

    If you don't mind having your bits reside elsewhere, Backblaze B2 and Bunny.net single location storage are both cheaper than Cloudflare.

    replies(1): >>38121971 #
    8. lewisl9029 ◴[] No.38121902[source]
    It's been a while, but the last time I checked, write latency on R2 was pretty horrendous: close to 1s, compared to S3's <100ms, tested from my laptop in SF. I wouldn't be surprised if they've made progress on this front, but definitely dig deeper if your workload is sensitive to write latency.

    Another issue (one that probably contributes directly to the write latency) is region selection and replication. S3 just offers a ton more control here. I have a bunch of S3 buckets replicating asynchronously across regions around the world to enable fast writes everywhere (my use case can tolerate eventual consistency). R2 still seems very light on region selection and replication options. Kinda disappointing, since they're supposed to be _the_ edge company.
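
    Measuring this yourself is straightforward since R2 speaks the S3 API. A rough sketch with boto3 (the R2 endpoint format and credentials are placeholders; numbers will vary with location and object size):

        import time
        import boto3

        def put_latencies(client, bucket, n=10):
            """Time n small PUTs and return per-write latencies in seconds."""
            out = []
            for i in range(n):
                start = time.perf_counter()
                client.put_object(Bucket=bucket, Key=f"probe-{i}", Body=b"x" * 1024)
                out.append(time.perf_counter() - start)
            return out

        s3 = boto3.client("s3")  # default AWS credentials
        r2 = boto3.client(
            "s3",
            endpoint_url="https://<account_id>.r2.cloudflarestorage.com",  # placeholder
            aws_access_key_id="<r2_key>",
            aws_secret_access_key="<r2_secret>",
        )

        print("S3 median-ish:", sorted(put_latencies(s3, "my-s3-bucket"))[5])
        print("R2 median-ish:", sorted(put_latencies(r2, "my-r2-bucket"))[5])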

    10. esafak ◴[] No.38122167[source]
    What third-party sites do people use to track vendor downtime, since vendors don't report it honestly themselves?

    I found https://isdown.app/integrations/cloudflare/cloudflare-sites-...

    11. csomar ◴[] No.38124225[source]
    I don’t know why this isn’t mentioned more. CF’s offerings (R2/Workers/Pages) are so unreliable that I’m wondering if anyone is actually using them.
    replies(1): >>38125982 #
    12. eastdakota ◴[] No.38124688[source]
    Thank you for providing a product roadmap. Label all of the above: coming soon.
    replies(1): >>38126526 #
    13. technion ◴[] No.38124827[source]
    I'm happy to be told to look harder but I couldn't find an R2 Object Lock equivalent.

    I do have to wonder if that leaves R2 customers one minor compromise away from losing their whole data store.
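
    For contrast, this is the S3 feature in question: a WORM retention rule that, as far as I can tell, has no R2 counterpart. A sketch with boto3 (bucket name hypothetical; the bucket must have been created with Object Lock enabled):

        import boto3

        s3 = boto3.client("s3")

        # Compliance mode: no one, including the root account, can delete
        # or overwrite object versions until the retention period expires.
        s3.put_object_lock_configuration(
            Bucket="my-backups",  # created with ObjectLockEnabledForBucket=True
            ObjectLockConfiguration={
                "ObjectLockEnabled": "Enabled",
                "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
            },
        )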

    14. thrtythreeforty ◴[] No.38124971[source]
    I know people archive all kinds of data. I use Glacier as off-site backup for my measly 1TB of irreplaceable data. But I know many customers put petabytes in it.

    What could you have a petabyte of that you're pretty sure you'll never need again? What kind of datasets are you storing?

    replies(2): >>38126085 #>>38133854 #
    15. Too ◴[] No.38124987[source]
    For data that isn't frequently egressed. Log storage, for example: data locality will be better in AWS, and you will only ever extract a small percentage of the logs stored.

    Same with data that is aggregated into a smaller data set within AWS before you egress it.
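
    To put rough numbers on that: when only a sliver of stored data ever leaves, egress (R2's headline advantage) is a small slice of the S3 bill anyway. A back-of-the-envelope sketch with illustrative figures:

        # List prices: S3 Standard ~$0.023/GB-mo storage, ~$0.09/GB egress.
        stored_gb = 10 * 1024         # 10 TB of logs
        egress_gb = stored_gb * 0.01  # only 1% ever extracted

        storage_cost = stored_gb * 0.023
        egress_cost = egress_gb * 0.09
        share = egress_cost / (storage_cost + egress_cost)
        print(f"egress is {share:.1%} of the monthly S3 bill")  # ~3.8%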

    16. jpgvm ◴[] No.38125982{3}[source]
    We have been using Workers for ~12 months now with very little actual downtime. There have been some regional issues, but no worldwide outages.

    That said, we don't use any queues, KV, etc., just pure JS isolates, so that probably contributes to the robustness.

    We do use the Cache API, though, and have run into weirdness there. We also needed to implement our own stale-while-revalidate (SWR) because CF still refuses to implement it properly.

    Overall, CF is a provider that I would say we begrudgingly acknowledge as good. Stuff like the SWR issue can be really frustrating, but overall reliability and performance are much better since moving to CF.

    replies(1): >>38127761 #
    17. Dylan16807 ◴[] No.38126085{3}[source]
    > pretty sure you'll never need again?

    It doesn't have to be nearly that stark.

    If we factor out egress, since it's the same for everything, the bulk retrieval cost for Glacier Deep Archive is only $2.50/TB.

    That means a full year of storage ($12) plus four retrievals ($10) costs roughly the same as a single month of S3 Standard storage ($23).
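
    Spelled out, using Deep Archive's ~$0.00099/GB-month list price:

        TB = 1024  # GB

        deep_archive_year = 0.00099 * TB * 12  # ~$12.17 per TB-year
        four_retrievals = 2.50 * 4             # $10.00 in bulk retrievals
        s3_standard_month = 0.023 * TB         # ~$23.55 per TB-month

        print(deep_archive_year + four_retrievals)  # ~22.17
        print(s3_standard_month)                    # ~23.55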

    18. welder ◴[] No.38126101[source]
    R2 doesn't support versioning yet. If you need versioning, you have to use S3 or DigitalOcean Spaces (which is also cheaper than S3).
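
    For comparison, on S3 (or an S3-compatible store that implements it) enabling versioning is one call. A minimal boto3 sketch, bucket name hypothetical:

        import boto3

        s3 = boto3.client("s3")
        s3.put_bucket_versioning(
            Bucket="my-bucket",  # hypothetical
            VersioningConfiguration={"Status": "Enabled"},
        )
        # From here on, overwrites and deletes create recoverable versions.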

    Otherwise, I've been using R2 in production for wakatime.com for almost a month now with Sippy enabled. Latency and error rates are the same as S3's, with DigitalOcean having slightly higher latency and error rates.

    19. lysecret ◴[] No.38126111[source]
    One major thing that R2 doesn't have is distributed big-data query support: e.g., you can use BigQuery to query data in GCS, or Athena to query data in S3.
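
    For anyone who hasn't used it, the Athena side is a couple of API calls over data already sitting in S3. A sketch with boto3 (database, table, and results bucket are hypothetical, and the table must already be defined over the S3 files):

        import boto3

        athena = boto3.client("athena")

        # Run SQL directly against files in S3; results land in another bucket.
        resp = athena.start_query_execution(
            QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
            QueryExecutionContext={"Database": "logs_db"},  # hypothetical
            ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
        )
        print(resp["QueryExecutionId"])
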
    20. oars ◴[] No.38126526{3}[source]
    Fantastic. Looking forward to convincing my CTO to switch to R2 when your team gets closer to finishing these!
    21. csomar ◴[] No.38127761{4}[source]
    > Overall CF is a provider that I would say we begrudging acknowledge as good.

    I don't understand. You say you used a very small subset of their offering in a very specific and limited way, and from that you conclude that their offering is "good"? Shouldn't you make that conclusion only after reviewing at least 50% of their offering?

    replies(1): >>38136042 #
    22. njs12345 ◴[] No.38129750[source]
    There is the Images Batch API that isn't subject to the 1200 requests/5 minutes limit: https://developers.cloudflare.com/images/cloudflare-images/u...
    23. Hasz ◴[] No.38133854{3}[source]
    Long-term work stuff. Things we would be contractually obligated to produce many years down the line.

    Plenty of other people are storing images, video, etc. A PB is really not that much when it's not just for personal consumption.

    24. jpgvm ◴[] No.38136042{5}[source]
    All of those extra features aren't their offering. Their offering is their network; everything else is just icing.