
From S3 to R2: An economic opportunity

(dansdatathoughts.substack.com)
274 points | dangoldin | 2 comments
hipadev23 ◴[] No.38119812[source]
OP is missing that a correct implementation of Databricks or Snowflake will have those instances running inside the same AWS region as the data. That's not to say R2 isn't an amazing product, but the egress costs aren't as egregious as the post suggests, since transfer is $0 on both sides when compute and data share a region.
replies(2): >>38120142 #>>38120245 #
dangoldin ◴[] No.38120142[source]
Author here. It's true that transfers within a region are free, and if you design your system appropriately you can take advantage of that, but I've seen accidental cases where someone accesses the data from another region, and it's nice to not even have to worry about it. Even that can be handled with better tooling/processes, but the bigger point is wanting your data to be available across clouds so you can take advantage of their different capabilities. I used AI as an example, but imagine you have all your data in S3 and want to use Azure because of the OpenAI partnership. It's that use case that's enabled by R2.
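
To make the cross-cloud point concrete, here's a minimal sketch of reading the same R2 bucket from any provider through its S3-compatible API. The bucket name, object key, and environment-variable names are placeholders I made up, not anything from the post; the endpoint format and standard boto3 calls are the only parts taken as given.

    # Minimal sketch: read an R2 bucket from anywhere via its S3-compatible API.
    # R2 exposes an endpoint of the form https://<account_id>.r2.cloudflarestorage.com
    # and accepts S3-style access keys, so existing S3 tooling (here boto3) works.
    # Bucket, key, and env-var names below are placeholders.
    import os
    import boto3

    r2 = boto3.client(
        "s3",
        endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
        aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
        region_name="auto",
    )

    # The same call works whether this runs in AWS, Azure, or GCP,
    # and Cloudflare charges no egress either way.
    obj = r2.get_object(Bucket="analytics-data", Key="events/2023-11-01.parquet")
    print(len(obj["Body"].read()), "bytes read")
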
replies(1): >>38121516 #
1. hipadev23 ◴[] No.38121516[source]
Yeah, for greenfield work, building on R2 is generally a far better deal than S3, but if you already have a massive amount of data on S3, especially small files, you're going to pay a massive penalty to move it. Sippy is nice but it just spreads the pain over time.
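
For context, Sippy is Cloudflare's incremental-migration feature: a read is served from R2 if the object is already there, and on a miss the object is pulled from S3 and copied into R2 along the way. Here's a rough sketch of that read-through pattern, assuming hypothetical bucket names and credentials; it's the idea, not Cloudflare's actual implementation or API.

    # Rough sketch of the read-through pattern Sippy provides. Bucket names,
    # credentials, and clients are placeholders; not Cloudflare's actual code.
    import os
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")  # the legacy source bucket lives in AWS
    r2 = boto3.client(
        "s3",
        endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
    )

    def get_with_migration(key: str) -> bytes:
        try:
            # Already migrated: served by R2, no S3 egress.
            return r2.get_object(Bucket="analytics-data", Key=key)["Body"].read()
        except ClientError as err:
            if err.response["Error"]["Code"] != "NoSuchKey":
                raise
            # First request for this object: pay S3 egress once, copy it into
            # R2, and every later read is served from R2 for free.
            body = s3.get_object(Bucket="legacy-s3-bucket", Key=key)["Body"].read()
            r2.put_object(Bucket="analytics-data", Key=key, Body=body)
            return body
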
replies(1): >>38126193 #
2. Dylan16807 ◴[] No.38126193[source]
> Sippy is nice but it just spreads the pain over time.

That egress money was going to be spent with or without Sippy: the reads that trigger a Sippy migration are reads you'd be paying S3 egress on anyway. It's not "just spreading" the pain, it's avoiding adding any pain at all.
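
A back-of-envelope illustration of that point, with made-up numbers (the ~$0.09/GB rate and the workload are assumptions, not anything from the thread): if an object is going to be requested from outside AWS regardless, the first read's egress is a cost you were already incurring, and migrating on that read means you never pay it again.

    # Back-of-envelope comparison; all numbers are illustrative assumptions.
    EGRESS_PER_GB = 0.09   # USD, a commonly cited S3 egress-to-internet rate
    OBJECT_GB = 1.0        # size of one frequently requested object
    READS = 20             # times it's fetched from outside AWS

    serve_from_s3 = READS * OBJECT_GB * EGRESS_PER_GB      # every read pays egress
    migrate_on_first_read = 1 * OBJECT_GB * EGRESS_PER_GB  # Sippy-style: pay once

    print(f"Keep serving from S3:       ~${serve_from_s3:.2f}")
    print(f"Migrate on first read (R2): ~${migrate_on_first_read:.2f}")
    # The first read's egress was going to be paid anyway, so the migration
    # itself adds nothing; it just stops the charge from repeating.
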