New AMD EPYC-based Compute Engine family, now in beta

(cloud.google.com)

343 points cvallejo | 1 comments | 18 Feb 20 16:31 UTC

mdasen [18 Feb 20 18:02 UTC] No.22358369

Since people from Google Cloud are likely here, one thing I'd like to ask/talk about: are we getting too many options for compute? One of the great things about Google Cloud was that it was very easy to order. None of this "t2.large" where you'd have to look up how much memory and CPU it has and potentially how many credits you're going to get per hour and such. I think Google Cloud is still easier, but it's getting harder to know what is the right direction.

For example, the N2D instances are basically the price of the N1 instances or even cheaper with committed-use discounts. Given that they provide 39% more performance, should the N1 instances be considered obsolete once the N2D exits beta? I know that there could be workloads that would be better on Intel than AMD, but it seems like there would be little reason to get an N1 instance once the N2D exits beta.

Likewise, the N2D has basically the same sustained-use price as the E2 instances (which only have the performance of N1 instances). What's the point of E2 instances if they're the same price? Shouldn't I be getting a discount given that Google can more efficiently use the resources?

It's great to see the improvements at Google Cloud. I'm glad to see lower-cost, high-performance options available. However, I guess I'm left wondering who is choosing what. I look at the pricing and think, "who would choose an N1 or N2 given the N2D?" Sure, there are people with specific requirements, but it seems like the N2D should be the default in my mind.

This might sound a bit like complaining, but I do love how I can just look up memory and CPU pricing easily. Rather than having to remember name-mappings, I just choose from one of the families (N1, N2, E2, N2D) and can look at the memory and CPU pricing. It makes it really simple to understand what you're paying. It's just that as more families get added and Google varies how it applies sustained-use and committed-use discounts between the families, it becomes more difficult to choose between them.

For example, if I'm going for a 1-year commitment, should I go with an E2 at $10.03/vCPU or an N2D at $12.65/vCPU? The N2D should provide more performance than the 26% price increase, yes? Why can't I get an EPYC-based E-series to really drive down costs?
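(Editorial illustration, not part of the original comment.) The comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: the prices are the two figures quoted in the comment, while the relative performance numbers are assumptions drawn from claims elsewhere in this thread (E2 performing like N1, and N2D offering 39% more performance than N1), not benchmarks. All names in the code are hypothetical.

```python
# Prices per vCPU with a 1-year commitment, as quoted in the comment above.
E2_PRICE = 10.03
N2D_PRICE = 12.65

# ASSUMPTIONS taken from claims in the thread, not measured values:
# E2 performs like N1 (baseline 1.0); N2D is 39% faster than N1.
E2_PERF = 1.00
N2D_PERF = 1.39


def price_increase(base: float, other: float) -> float:
    """Fractional price increase of `other` relative to `base`."""
    return other / base - 1.0


def perf_per_dollar(perf: float, price: float) -> float:
    """Relative performance obtained per dollar spent."""
    return perf / price


increase = price_increase(E2_PRICE, N2D_PRICE)
advantage = (
    perf_per_dollar(N2D_PERF, N2D_PRICE) / perf_per_dollar(E2_PERF, E2_PRICE) - 1.0
)

print(f"N2D price increase over E2: {increase:.1%}")
print(f"N2D performance-per-dollar advantage: {advantage:.1%}")
```

Under those assumptions the N2D costs about 26% more but still comes out roughly 10% ahead on performance per dollar. The result is only as good as the 39% figure, which will vary by workload.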

Again, I want to reiterate that Google Cloud's simpler pricing is great, but complications have crept in. E2 machines don't get sustained-use discounts which means they're really only valuable if you're doing a yearly commitment or non-sustained-use. The only time N1 machines are cheaper is in sustained-use - they're the same price as Intel N2 machines if you're doing a yearly commitment or non-sustained-use. Without more guidance on performance differences between the N2D and N2, why should I ever use N2? I guess this is a bit of rambling to say, "keep an eye on pricing complexity - I don't like spending a lot of time thinking about optimizing costs".

replies(11): >>22358433 #>>22358442 #>>22358483 #>>22358724 #>>22358783 #>>22358816 #>>22358852 #>>22359250 #>>22359298 #>>22360053 #>>22360348 #

boulos [18 Feb 20 20:59 UTC] No.22360348

>>22358369 #

Disclosure: I work on Google Cloud (and really care about this).

The challenge here is balancing diverse customer workloads against the processor vendors. Historically, at Google, we just bought a single server variant (basically) because almost all code is expected to care primarily about scale-out environments. That made the GCE decision simple: offer the same hardware we build for Google, at great prices.

The problem is that many customers have workloads and applications that they can’t just change. No amount of rational discounting or incentives makes a 2 GHz processor compete with a 4 GHz processor (so now, for GCE, we buy some speedy cores and call that Compute Optimized). Even more strongly, no amount of “you’re doing it wrong” actually is the right answer for “I have a database on-prem that needs several sockets and several TB of memory” (so, Memory Optimized).

There’s an important reason though that we refer to N1, N2, N2D, and E2 as “General purpose”: we think they’re a good balanced configuration, and they’ll continue to be the right default choice (and we default to these in the console). E2 is more like what we do internally at Google, by abstracting away processor choice, and so on. As a nit to your statement above, E2 does flip between Intel and AMD.

You should choose the right thing for your workloads, primarily subject to the Regions you need them in. We’ll keep trying to push for simplicity in our API and offering, but customers really do have a wide range of needs, which imposes at least some minimum amount of complexity. For too long (probably) we attempted to refuse, because of complexity, both for us and customers. Feel free to ignore it though!

replies(5): >>22360916 #>>22361132 #>>22361552 #>>22363902 #>>22365298 #

alfalfasprout [18 Feb 20 22:35 UTC] No.22361132

>>22360348 #

I mean, this mentality often is wrong. Scaling out actually isn't the right solution for everyone. It works for Google given that primarily web services are offered. It does not work for workloads that heavily rely on the CPU (think financial workloads, ML, HPC/scientific workloads) or have realtime requirements. In fact, for many ETL workloads vertical scaling proves far more efficient.

It's long been the "Google way" to try and abstract out compute, but it's led to an industry full of people trying to follow in their footsteps and overcomplicating what can be solved on one or two machines.

replies(3): >>22361446 #>>22363919 #>>22364549 #

1. nemothekid [19 Feb 20 11:06 UTC] No.22364549

>>22361132 #

Google doesn’t have ML workloads (basically all of search) or real-time requirements (basically all of RTB)?

I agree not everyone can develop like Google, but it’s wrong to say that “it doesn’t work”.
