Most active commenters

boulos(10)
(6)
wmf(4)
867-5309(4)
JoshTriplett(3)
dilyevsky(3)
lallysingh(3)
cvallejo(3)
mdasen(3)
tempsy(3)

Popular/hot comments

>>22358369 #
>>22360348 #
>>22361446 #
>>22357449 #
>>22357706 #
>>22358979 #
>>22359295 #
>>22357462 #
>>22358147 #
>>22361132 #

New AMD EPYC-based Compute Engine family, now in beta

(cloud.google.com)

1. tpetry ◴[18 Feb 20 16:57 UTC] No.22357449[source]▶

Now its getting really interesting: In the end you have to compare pricing for a vCore (which is a thread on a cpu) with per-thread performance on AMD vs. Intel. Does anyone know a benchmark like this? Epyc Processors are most often tested on heavy parallelizable tasks and not strictly single thread tasks.

replies(4): >>22357500 #>>22357832 #>>22358054 #>>22358902 #

2. boulos ◴[18 Feb 20 16:58 UTC] No.22357462[source]▶

>>22357194 (OP) #

Disclosure: I work on Google Cloud.

cvallejo is the PM, so ask her anything!

replies(3): >>22357543 #>>22359291 #>>22360897 #

3. dsign ◴[18 Feb 20 16:59 UTC] No.22357478[source]▶

>>22357194 (OP) #

Can those instances be used in GKE node pools?

replies(2): >>22357688 #>>22362184 #

4. boulos ◴[18 Feb 20 17:01 UTC] No.22357500[source]▶

>>22357449 #

Disclosure: I work on Google Cloud.

Performance is a tricky, multi-dimensional thing, so there are many benchmarks that try to map to different workloads. For example, specint is often used for exactly your "single threaded task" benchmark, but if what you work on is numerical computing, you mostly don't care (you want specfp at the least and even that is bad).

Some people seem to really like Coremark these days. Others like specintrate. What kind of application do you care about? I'd guess plenty of folks here can provide a better estimate with that info.

5. JoshTriplett ◴[18 Feb 20 17:04 UTC] No.22357543[source]▶

>>22357462 #

Does the new n2d machine type support nested virtualization?

(Asking because Azure supports nested virtualization but only on some machine types. AWS doesn't support nested virtualization at all. Google Cloud seems to support nested virtualization on other machine types.)

Also, why 224 rather than 256?

replies(2): >>22357620 #>>22357757 #

6. boulos ◴[18 Feb 20 17:11 UTC] No.22357620{3}[source]▶

>>22357543 #

We do not yet support AMD's nested implementation (we do on Intel). But cvallejo is also the PM for Nested :).

As for 224, we've always reserved threads on each host for I/O and so on. Figure 2 from the Snap paper [1] is probably the best public reference. We also don't make it clear (on purpose) what size the underlying host processors are, though you can clearly guesstimate pretty easily.

[1] https://research.google/pubs/pub48630/

replies(2): >>22357723 #>>22357725 #

7. boulos ◴[18 Feb 20 17:16 UTC] No.22357688[source]▶

>>22357478 #

Disclosure: I work on Google Cloud.

Should be, once the rollouts complete. So once you can see N2D types in the Console for your project, I think it'll just flow naturally to GKE.

replies(1): >>22357724 #

8. m0zg ◴[18 Feb 20 17:17 UTC] No.22357706[source]▶

>>22357194 (OP) #

Well, they may be hypothetically "available" in us-central1, but they don't show up for me. All I see is "Cascade Lake powered" N2.

replies(4): >>22357748 #>>22357759 #>>22357785 #>>22365880 #

9. QUFB ◴[18 Feb 20 17:18 UTC] No.22357723{4}[source]▶

>>22357620 #

Speaking of nested virtualization, are there any plans to support this on E2, or will we have to use other instance types?

replies(1): >>22359201 #

10. cvallejo ◴[18 Feb 20 17:19 UTC] No.22357724{3}[source]▶

>>22357688 #

Disclosure: I also work on Google Cloud

That's correct! Once the rollout completes, you will be able to use N2D instances for GKE!

replies(2): >>22357787 #>>22358026 #

11. JoshTriplett ◴[18 Feb 20 17:19 UTC] No.22357725{4}[source]▶

>>22357620 #

> We do not yet support AMD's nested implementation (we do on Intel).

Any particular reason for that limitation, or just "not implemented yet"? (Not asking for product roadmaps, just wondering if there's a specific technical issue that makes it more difficult to support.)

> As for 224, we've always reserved threads on each host for I/O and so on. Figure 2 from the Snap paper [1] is probably the best public reference.

That's helpful, thank you.

replies(1): >>22360134 #

12. JoshTriplett ◴[18 Feb 20 17:20 UTC] No.22357748[source]▶

>>22357706 #

Likewise. "gcloud beta compute machine-types list" doesn't show them in any zone.

13. ◴[18 Feb 20 17:20 UTC] No.22357757{3}[source]▶

>>22357543 #

14. ◴[18 Feb 20 17:20 UTC] No.22357759[source]▶

>>22357706 #

15. boulos ◴[18 Feb 20 17:22 UTC] No.22357785[source]▶

>>22357706 #

Disclosure: I work on Google Cloud.

Sorry for the confusion. We should be more clear and explicitly state "now rolling out". Over the course of our multi-day rollout, different regions and the console will start showing these.

16. p_l ◴[18 Feb 20 17:23 UTC] No.22357787{4}[source]▶

>>22357724 #

And if it doesn't, it was always possible to break GKE state into "offering" node types that were not supported. Source: me, using preemptible instances for a year before they got into GKE ;-)

Modifying instance group templates is your friend.

replies(1): >>22362049 #

17. api ◴[18 Feb 20 17:26 UTC] No.22357832[source]▶

>>22357449 #

From what I've seen AMD's recent chips beat (sometimes outright destroy) Intel on multithreaded tasks, but on single-threaded tasks it's still a bit of a toss up and depends on the work load. Intel seems to still come out ahead on some heavy numeric and scientific type work loads, especially if vector instructions are used. The differences are not huge though, and AMD solidly wins on price/performance even in cases where it's a bit slower in absolute performance for single threaded work.

At this point Intel literally only makes sense if you have one of those single threaded work loads where it still excels and you absolutely must have the fastest single thread performance.

replies(1): >>22358147 #

18. ensacco ◴[18 Feb 20 17:36 UTC] No.22357996[source]▶

>>22357194 (OP) #

What's the intent / timeline for N2D in other regions, e.g. us-west1?

replies(1): >>22360182 #

19. simonebrunozzi ◴[18 Feb 20 17:39 UTC] No.22358026{4}[source]▶

>>22357724 #

> Disclosure: I also work on Google Cloud

Wow, half of the comments so far start with this disclosure! Tons of Google people lurking on HN :)

replies(1): >>22358180 #

20. t3rabytes ◴[18 Feb 20 17:41 UTC] No.22358054[source]▶

>>22357449 #

I did some rudimentary testing with AMD vs Intel on AWS recently and found that AMD lacked enough in single threaded perf that it meant they weren’t worth the savings for our workloads (Rails apps).

replies(2): >>22358236 #>>22358831 #

21. dilyevsky ◴[18 Feb 20 17:48 UTC] No.22358147{3}[source]▶

>>22357832 #

Is that with spectre/meltdown/etc protection on?

replies(3): >>22358368 #>>22358674 #>>22360327 #

22. lallysingh ◴[18 Feb 20 17:50 UTC] No.22358173[source]▶

>>22357194 (OP) #

What's the topology of these machines? Dual socket 64c chips with some reserved (or disabled)?

replies(1): >>22358210 #

23. gpm ◴[18 Feb 20 17:50 UTC] No.22358180{5}[source]▶

>>22358026 #

Too be fair, it's just two of them answering questions, and one is both OP and the product manager.

24. wmf ◴[18 Feb 20 17:52 UTC] No.22358210[source]▶

>>22358173 #

They don't say that but that's the only way to provide 224 threads.

replies(2): >>22358390 #>>22361664 #

25. actuator ◴[18 Feb 20 17:54 UTC] No.22358236{3}[source]▶

>>22358054 #

Care to publish if you have anything stored away from this?

I don't work with Rails anymore but the last Rails app(single threaded, Unicorn) I worked with raw CPU compute was not the bottleneck usually as with most CRUD apps time was mostly spent in I/O. This effect was so pronounced that I had set up most scaling groups on M or R instances as the memory used by the gems was the bottleneck on the number of Rails processes I could spawn in the box without exhausting resources. However I do remember even if CPU was not the bottleneck, moving to a processor with a better single thread performance did improve the median response time at the cost of making the same request throughput costlier.

26. pier25 ◴[18 Feb 20 17:54 UTC] No.22358249[source]▶

>>22357194 (OP) #

What are the implications? Higher perf and/or lower price?

replies(2): >>22358349 #>>22358388 #

27. kart23 ◴[18 Feb 20 18:01 UTC] No.22358349[source]▶

>>22358249 #

Looks to be about $5/Month cheaper, based on this page:

https://cloud.google.com/compute/all-pricing#n2_machine_type...

Jeez, I didn't realize how expensive cloud compute was. I always wondered why my school still has a datacenter. Having your own servers still makes sense for a lot of orgs.

replies(2): >>22358793 #>>22359651 #

28. privateSFacct ◴[18 Feb 20 18:01 UTC] No.22358360[source]▶

>>22357194 (OP) #

Does AWS have a comparable offering? I haven't seen anything on EPYC - congrats to GCP for moving quickly. I'm mostly AWS based currently.

replies(2): >>22358419 #>>22358508 #

29. blattimwind ◴[18 Feb 20 18:02 UTC] No.22358368{4}[source]▶

>>22358147 #

These are VMs, the hosts have to run with these mitigations enabled.

30. mdasen ◴[18 Feb 20 18:02 UTC] No.22358369[source]▶

>>22357194 (OP) #

Since people from Google Cloud are likely here, one thing I'd like to ask/talk about: are we getting too many options for compute? One of the great things about Google Cloud was that it was very easy to order. None of this "t2.large" where you'd have to look up how much memory and CPU that it has and potentially how many credits you're going to get per hour and such. I think Google Cloud is still easier, but it's getting harder to know what is the right direction.

For example, the N2D instances are basically the price of the N1 instances or even cheaper with committed-use discounts. Given that they provide 39% more performance, should the N1 instances be considered obsolete once the N2D exits beta? I know that there could be workloads that would be better on Intel than AMD, but it seems like there would be little reason to get an N1 instance once the N2D exits beta.

Likewise, the N2D has the basically same sustained-use price as the E2 instances (which only have the performance of N1 instances). What's the point of E2 instances if they're the same price? Shouldn't I be getting a discount given that Google can more efficiently use the resources?

It's great to see the improvements at Google Cloud. I'm glad to see lower-cost, high-performance options available. However, I guess I'm left wondering who is choosing what. I look at the pricing and think, "who would choose an N1 or N2 given the N2D?" Sure, there are people with specific requirements, but it seems like the N2D should be the default in my mind.

This might sound a bit like complaining, but I do love how I can just lookup memory and CPU pricing easily. Rather than having to remember name-mappings, I just choose from one of the families (N1, N2, E2, N2D) and can look at the memory and CPU pricing. It makes it really simple to understand what you're paying. It's just that as more families get added and Google varies how it applies sustained-use and committed-use discounts between the families, it becomes more difficult to choose between them.

For example, if I'm going for a 1-year commitment, should I go with an E2 at $10.03/vCPU or an N2D at $12.65/vCPU. The N2D should provide more performance than the 26% price increase, yes? Why can't I get an EPYC based E-series to really drive down costs?

Again, I want to reiterate that Google Cloud's simpler pricing is great, but complications have crept in. E2 machines don't get sustained-use discounts which means they're really only valuable if you're doing a yearly commitment or non-sustained-use. The only time N1 machines are cheaper is in sustained-use - they're the same price as Intel N2 machines if you're doing a yearly commitment or non-sustained-use. Without more guidance on performance differences between the N2D and N2, why should I ever use N2? I guess this is a bit of rambling to say, "keep an eye on pricing complexity - I don't like spending a lot of time thinking about optimizing costs".

replies(11): >>22358433 #>>22358442 #>>22358483 #>>22358724 #>>22358783 #>>22358816 #>>22358852 #>>22359250 #>>22359298 #>>22360053 #>>22360348 #

31. wmf ◴[18 Feb 20 18:05 UTC] No.22358388[source]▶

>>22358249 #

Basically. This is the best server processor.

replies(1): >>22358673 #

32. lallysingh ◴[18 Feb 20 18:05 UTC] No.22358390{3}[source]▶

>>22358210 #

Darnit, I was hoping I was wrong and we were getting a hint of some new beast with 112 cores...

33. wmf ◴[18 Feb 20 18:07 UTC] No.22358419[source]▶

>>22358360 #

Still "in the works": https://aws.amazon.com/blogs/aws/in-the-works-new-amd-powere...

34. lallysingh ◴[18 Feb 20 18:08 UTC] No.22358433[source]▶

>>22358369 #

They still own and have to pay for the old hardware.

Customers rarely have the time/energy/expertise to continuously reoptimize their cloud usage.

35. znpy ◴[18 Feb 20 18:08 UTC] No.22358442[source]▶

>>22358369 #

tl;dr: I find the amplitude of google cloud's offering confusing, so i think there should be less of it.

36. scardycat ◴[18 Feb 20 18:12 UTC] No.22358483[source]▶

>>22358369 #

Customers like having choices. Enterprises typically will "certify" one config and would like to stay on that till they absolutely need to move to something else.

replies(1): >>22359295 #

37. wolf550e ◴[18 Feb 20 18:14 UTC] No.22358508[source]▶

>>22358360 #

See https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-ec...

replies(1): >>22358797 #

38. tempsy ◴[18 Feb 20 18:26 UTC] No.22358654[source]▶

>>22357194 (OP) #

AMD's stock is wild. It was around $2 just a few years ago and has been on a non-stop trend up to almost $60 today.

replies(2): >>22358979 #>>22360364 #

39. thedance ◴[18 Feb 20 18:27 UTC] No.22358673{3}[source]▶

>>22358388 #

Big claim there. On what basis?

replies(2): >>22358745 #>>22359133 #

40. api ◴[18 Feb 20 18:27 UTC] No.22358674{4}[source]▶

>>22358147 #

Don't know, but for pure numeric code these mitigations are not (AFAIK) that expensive. The main cost is incurred for any code that is syscall-heavy like I/O.

replies(1): >>22362383 #

41. theevilsharpie ◴[18 Feb 20 18:32 UTC] No.22358724[source]▶

>>22358369 #

> For example, the N2D instances are basically the price of the N1 instances or even cheaper with committed-use discounts. Given that they provide 39% more performance, should the N1 instances be considered obsolete once the N2D exits beta?

As the name implies, N2 is a newer generation than N1. I don't think Google has announced any official N1 deprecation timeline, but that product line clearly has an expiration date.

The more direct comparison would be Intel's N2 instances, vs. AMD's N2D instances. In that case, N2 instances are likely faster on a per-core basis and support some Intel-specific instructions, whereas N2D instances are substantially less expensive.

> Again, I want to reiterate that Google Cloud's simpler pricing is great, but complications have crept in.

That seems like an unavoidable consequence of maturing as a product offering: more options means more complexity. If Google tried to streamline everything and removed options to keep things simple, they'd have another cohort of users (including myself) screaming that the product doesn't meet their needs.

I suppose a "Help Me Choose" wizard that provides some opinionated guidance can be helpful to onboarding new users, but otherwise, I don't see how Google can win here.

replies(2): >>22358792 #>>22359535 #

42. Tuna-Fish ◴[18 Feb 20 18:34 UTC] No.22358745{4}[source]▶

>>22358673 #

You can pick almost any basis you like, it's still winning.

Most performance, most performance per watt, most performance per cost. Also, more performance per thread than high-threadcount intel chips. (Although, some of their low-threadcount Xeons do have an edge on that one.)

Oh, and best memory interface and best IO, too.

replies(1): >>22359230 #

43. ◴[18 Feb 20 18:37 UTC] No.22358783[source]▶

>>22358369 #

44. wmf ◴[18 Feb 20 18:37 UTC] No.22358792{3}[source]▶

>>22358724 #

They should just hide everything besides Rome under an "advanced" UI. ;-)

45. anthony_doan ◴[18 Feb 20 18:37 UTC] No.22358793{3}[source]▶

>>22358349 #

It's also expensive to get data out of cloud too. Getting data in is easy but getting it out will cost you.

46. Jyaif ◴[18 Feb 20 18:38 UTC] No.22358797{3}[source]▶

>>22358508 #

Your link is for the 1st gen EPYC cpus, while this announcement is for the 2nd gen.

47. kccqzy ◴[18 Feb 20 18:39 UTC] No.22358816[source]▶

>>22358369 #

I think for enterprise businesses, people just love choices. I don't know about GCP but I do know about highly paid AWS consultants producing detailed comparisons between instance types and make recommendations for companies to "save money." Or maybe some people just like the thrill of using spreadsheets and navigating the puzzle of pricing.

replies(1): >>22359338 #

48. theevilsharpie ◴[18 Feb 20 18:40 UTC] No.22358831{3}[source]▶

>>22358054 #

AMD-based AWS instances are running on first-generation Epyc processors.

Compared to the second-generation Epyc processors that Google is using, the first generation has lower clock speeds, can execute fewer instructions per clock (particularly in terms of floating-point operations), has substantially less cache, and has a more complicated memory topology that can negatively impact the performance of workloads that aren't NUMA-aware.

In short, your experience with AMD in AWS isn't relevant to Google's offerings.

49. rb808 ◴[18 Feb 20 18:42 UTC] No.22358852[source]▶

>>22358369 #

Cloud is a poor metaphor now. Its really a messy bunch of constellations where some people think they can see a pretty picture, most people just see random dots.

50. noahl ◴[18 Feb 20 18:46 UTC] No.22358902[source]▶

>>22357449 #

I don't work there any more, but back when I was at Google Cloud we developed our own benchmarking suite, partly to answer questions like this. It's open source, you can run it too: https://github.com/GoogleCloudPlatform/PerfKitBenchmarker

51. sdesol ◴[18 Feb 20 18:53 UTC] No.22358979[source]▶

>>22358654 #

They really are doing something disruptive. I can't quite remember if this is correct (it has been a while since I last studied business), but in business there is a "blue ocean strategy". The basic premise is, if you can provide a product for half the price, with the twice the value, you will destroy the incumbent.

What AMD is doing is really insane in my opinion. I'm not sure if they are pricing their processors low on purpose and/or if they have found a way to manufacture cheaper and/or Intel was screwing consumers with their pricing since they were so dominate.

No matter what, AMD is able to provide something that is measurably better and significantly cheaper than the incumbent, and if the blue ocean strategy holds, they should become the new incumbent in the near future.

replies(4): >>22359070 #>>22359139 #>>22359222 #>>22359223 #

52. blattimwind ◴[18 Feb 20 18:59 UTC] No.22359070{3}[source]▶

>>22358979 #

> What AMD is doing is really insane in my opinion. I'm not sure if they are pricing their processors low on purpose and/or if they have found a way to manufacture cheaper and/or Intel was screwing consumers with their pricing since they were so dominate.

Both. AMD uses chiplets for higher yields compared to Intel's huge monolithic processors (HCC, XCC), which lowers costs, and Intel jacked prices up because they had a monopoly.

replies(1): >>22365968 #

53. coder543 ◴[18 Feb 20 19:04 UTC] No.22359133{4}[source]▶

>>22358673 #

This article goes into great detail: https://www.servethehome.com/amd-epyc-7002-series-rome-deliv...

Another follow-up article: https://www.servethehome.com/amd-epyc-7702p-review-redefinin...

AMD is offering incredible performance on every metric: single threaded, multithreaded, total RAM per socket, PCIe 4.0, power consumption, total performance, total price, performance for price, etc.

Outside of some very niche applications, the only reason someone would choose Intel for servers right now is because "no one ever got fired for choosing Intel."

AMD's Epyc Rome processors are truly excellent, best-in-class processors.

replies(1): >>22359178 #

54. thesz ◴[18 Feb 20 19:05 UTC] No.22359139{3}[source]▶

>>22358979 #

Smaller chips have better yield. As AMD's current chips are composed from several smaller ones (I believe two or three), each composite has better yield than one bigger of same real estate size.

So yes, they figured out how to produce cheaper solutions.

replies(2): >>22359477 #>>22364166 #

55. thedance ◴[18 Feb 20 19:09 UTC] No.22359178{5}[source]▶

>>22359133 #

There are literally no server benchmarks anywhere in those articles. Unless you are planning to run a distributed C build node there's nothing in these articles that can inform your choice. Distributed building is an extremely narrow, niche use case. Where are the nginx and mysql and grpc benchmarks?

replies(1): >>22359277 #

56. boulos ◴[18 Feb 20 19:11 UTC] No.22359201{5}[source]▶

>>22357723 #

The blocker here would be the need for supporting nested on AMD parts (as E2 can choose between Intel and AMD parts).

57. bcrosby95 ◴[18 Feb 20 19:12 UTC] No.22359222{3}[source]▶

>>22358979 #

Not only disruptive to their industry, but game changing for those of us writing software. I remember reading all the hoopla over the loss of Dennard scaling 15 years ago (e.g. http://www.gotw.ca/publications/concurrency-ddj.htm). Intel poked along at 2 and 4 core consumer systems for so long.

The argument was always "no one can use more than X cores" - but software seems to trail hardware in these examples, not the reverse. When Zen was first released, many of the less expensive 6 core options performed worse than Intel's similarly priced 4 core chips. But when comparing modern software using those old parts, AMD's 6 core offerings tend to hold up better.

It feels like AMD is finally ushering us into an era where being able to take advantage of large amounts of parallelism is going to become important for almost every developer.

58. wyxuan ◴[18 Feb 20 19:12 UTC] No.22359223{3}[source]▶

>>22358979 #

Two problems is production capacity and data center, since AMD is competing against Apple for production by tsmc. on the point about data centers, Amd narrowly got on the train here. Semiconductors are a very cyclical industry and since the end of the business cycle is coming in Amd is in for a rough time. There's room for growth, but arguably the stock is priced in

replies(1): >>22359541 #

59. loeg ◴[18 Feb 20 19:13 UTC] No.22359230{5}[source]▶

>>22358745 #

Raw single-thread perf still matters for many workloads. Epyc doesn't come in especially high clock / fewer core configurations that are beneficial to some workloads. (Additionally: cloud vendors don't buy that end of configuration; they buy the high core count, high perf per watt configurations. E.g. GCE's N2D is the 2.25 GHz base clock, 64 core Epyc 7742 in a 2P configuration, but you can get EPYC 7302 with 16 cores at a 3 GHz base clock.)

For my business' workloads, Threadripper 3 (same gen 2 Zen, same IO chiplet, etc) would likely be a much better fit (and competitive with Intel) if AMD sold it with the same kind of enterprisey guarantees they do for Epyc (ECC, etc). Threadripper 3970x, for example, comes with 32 cores and a base clock of 3.7 GHz. That's a much better fit for us than Epyc 7742 or 7302.

replies(2): >>22359780 #>>22359867 #

60. outworlder ◴[18 Feb 20 19:14 UTC] No.22359250[source]▶

>>22358369 #

> are we getting too many options for compute

As compared to what, Azure? :)

61. coder543 ◴[18 Feb 20 19:17 UTC] No.22359277{6}[source]▶

>>22359178 #

There's absolutely no way you could've read those articles in the last four minutes. They go into great detail about what makes the Rome processors so important -- it's not just some random amalgamation of benchmarks, but the benchmarks serve to provide hard numbers that back up the textual analysis.

The benchmarks are not just "distributed compilation" either... that's a very misleading characterization. There was one compilation benchmark for the Linux kernel, and that's the only compilation benchmark I remember seeing.

No one benchmarks nginx because nginx can easily saturate the network card on a server without saturating the processor.

Here's a postgres benchmark: https://openbenchmarking.org/embed.php?i=2002066-VE-XEONEPYC...

Or a rocksdb benchmark: https://openbenchmarking.org/embed.php?i=2002066-VE-XEONEPYC...

MariaDB was a rare win for Intel: https://openbenchmarking.org/embed.php?i=2002066-VE-XEONEPYC...

("rare win" is literally the wording used in the Phoronix article: https://www.phoronix.com/scan.php?page=article&item=linux55-...)

ServeTheHome had access to more comprehensive Intel hardware, so I preferred to link to their articles, but Phoronix saw more of the same stuff.

Intel was thoroughly destroyed in every Linux review of Rome vs Intel's latest that I've seen. Intel can eke out some rare wins when applications are heavily optimized for the nuances of their CPUs, but it's not guaranteed even then.

If you can't be bothered to read articles to understand the answer to the question you asked, then this is my last reply.

replies(2): >>22359424 #>>22360748 #

62. outworlder ◴[18 Feb 20 19:18 UTC] No.22359291[source]▶

>>22357462 #

Are those instances available to all customers yet? We don't see those in our console.

replies(1): >>22360743 #

63. adamc ◴[18 Feb 20 19:18 UTC] No.22359295{3}[source]▶

>>22358483 #

That reflects the lumbering, bureaucratic nature of enterprises.

replies(4): >>22359401 #>>22359583 #>>22361276 #>>22363929 #

64. TuringNYC ◴[18 Feb 20 19:18 UTC] No.22359298[source]▶

>>22358369 #

Different chipsets may have slightly different capabilities. For example, I’ve been using NVIDIA RAPIDS recently. Not all NVIDIA cards support this particular framework’s needs. Sometimes you need to specifically direct customer installations to a specific type of card or chipset.

65. TuringNYC ◴[18 Feb 20 19:21 UTC] No.22359338{3}[source]▶

>>22358816 #

Or they prefer the certainty of being able to test software on specific hardware setups and be able to give customers higher levels of confidence.

66. scardycat ◴[18 Feb 20 19:26 UTC] No.22359401{4}[source]▶

>>22359295 #

Sure, I am sure they have something to say about how smaller internet companies move fast and break things with no concern for how it affects customers. I am sure you would rather have your bank just work than have some nifty tool with insufficient testing and that flakes out every other day.

Everyone to their own, I am just stating that there is a need.

67. thedance ◴[18 Feb 20 19:28 UTC] No.22359424{7}[source]▶

>>22359277 #

Every morning, I beg my God to make morons stop replying to me on HN. Today is the first day anyone has promised to make my dream come true.

I didn't read those articles in the last 4 minutes because I read them when they were published. A massively parallel run of 7zip was a really stupid benchmark in August and it remains stupid today.

These other benchmarks are certainly more relevant but none of them jumps out at me as a killer claim. An EPYC 7402 with 50% more cores, drawing 80% more power, and costing 35% more dollars than a Xeon Silver 4216 delivers 24% more pgsql ops per second. What TCO equation do you plug that into? I would describe these results as mixed.

replies(2): >>22359673 #>>22360754 #

68. dragontamer ◴[18 Feb 20 19:34 UTC] No.22359477{4}[source]▶

>>22359139 #

> As AMD's current chips are composed from several smaller ones (I believe two or three)

For EPYC, AMD is using nine chips: https://images.anandtech.com/doci/13561/amd_rome-678_678x452...

That's 1x I/O chip (kind of like a router), and 8x chips, each of which has 8x cores on it. Total for 64-cores / 128-threads across 8-compute chips, talking together through a central 1x I/O and Memory chip.

The I/O chip is the biggest for reasons: 1. Its made on a cheaper process. 2. It has worse performance than the compute chips. 3. Its required to be big because driving external I/O requires more power.

So the I/O chip can be made on a cheap / inefficient 14nm process, while the CPUs can be made on a more expensive 7nm process (maximizing clock rates, power-efficiency). The big I/O ports are going to eat up a lot of power regardless of 7nm or 14nm process, so might as well save money here.

69. mdasen ◴[18 Feb 20 19:39 UTC] No.22359535{3}[source]▶

>>22358724 #

> If Google tried to streamline everything...they'd have another cohort of users screaming that the product doesn't meet their needs

Except that they could simplify it without reducing flexibility.

For example, the difference between E-series and N-series is that E-series instances have the whole balancing thing. Instead of being a different instance type, it could be simplified into an option on available types and it would just give you a discount.

Likewise, some of it is about consistency. How much should sustained-usage give you a discount? 20%? 30%? 0%? There seems to be little difference to Google whether sustained-use an E2, N2, N2D, or N1 in terms of their costs and planning and yet the discount varies a lot.

It's not about fewer choices. It's more that the choices aren't internally consistent. N2 instances are supposed to be simply superior to N1 instance, but N1 instances cost the same as N2 instances for 1-year contract, 3-year contract, and on-demand. They're only more expensive for sustained-use which seems odd. Likewise, E2 instances are meant to give you a discount and they do give you a discount for 3 out of the 4 usage scenarios. The point is that there's no real reason for the pricing not to be consistent across the 4 usage scenarios (1-year, 3-year, on-demand, and sustained-use). That's where the complexity creeps in.

It's really easy to look and say, "ok, I have E2, N2D, and N2 instances in ascending price order and I can choose what I want." Except that the pricing doesn't work out consistently.

> N2 instances are likely faster on a per-core basis

Are they meant to be? Google's announcement makes it seem like they should be equivalent: "N2D instances provide savings of up to 13% over comparable N-series instances".

The point I'm trying to make isn't that they shouldn't offer choice. It's that the choice should be consistent to be easily understandable. E2 instances should offer a consistent discount. If N2 machines are the same price as N1 machines across 3 usage scenarios, they should be the same price across all 4. When you browse the pricing page, you can get into situations where you start thinking, "ok, the N1 instances are cheaper so do I need the improvements of the N2?" And then you start looking and you're like, "wait, the N2s are the same price....oh, just the same price most of the time." Then you start thinking, "I can certainly deal with the E2's balancing...oh, but it's the same price...well, it's cheaper except for sustained-use".

There doesn't seem to be a reason why sustained-use on N1s should be cheaper for Google than sustained-use on N2s. There doesn't seem to be a reason why sustained-use on E2s offers no discount - especially given that the 1-year E2 price offers the same 37% discount that the N1s offer.

It would be nice to go to the page and say, "I'm going with E2s." However, I go to the page and it's more like, "I'm going with E2s when I am going to do a 1-year commitment, but I'm going with N2Ds when I'm doing sustained-use without a commitment since those are the same price for better hardware with seemingly no reason and the N1s are just equal or more expensive so why don't they just move them to a 'legacy machine types' page". It's the inconsistency in the pricing for seemingly no reason that makes it tough, not the options. The fact that N2Ds are the same monthly price as E2s for sustained-use, but E2s are significantly cheaper in all other scenarios is the type of complexity that's the annoying bit.

EDIT: As an example, E2 instances are 20.7% cheaper on-demand, 20.7% cheaper with 1-year commitment, and 20.7% cheaper with 3-year commitment compared to N2D instances. That's wonderful consistency. Then we look at sustained use and it's 0.9% cheaper with no real explanation why. It's a weird pricing artifact that means that you aren't choosing, "this is the correct machine for the price/performance balance I'm looking for" but rather you're balancing three things: price, performance, and billing scenario.

replies(1): >>22361209 #

70. dragontamer ◴[18 Feb 20 19:39 UTC] No.22359541{4}[source]▶

>>22359223 #

The opposite.

AMD spent less money on TSMC's research. Apple has been bankrolling TSMC to get the latest and greatest process tech.

AMD reached 7nm not because AMD put the R&D research into it... but because they can ride on the coattails of Apple and TSMC's investments.

--------

TSMC and Apple simultaneously benefit: TSMC can spread the risk of the 7nm process to more companies. Apple still gets first-dibs on the technology (but they only need ~6months worth of factory time to build all the chips they need).

Its more surprising that Intel managed to stay ahead of TSMC / Apple for so long. The economics are kind of against Intel here. The more people working together on process tech, the more efficient the results get.

71. gabrielfv ◴[18 Feb 20 19:43 UTC] No.22359583{4}[source]▶

>>22359295 #

Any company has a problem, which they size the value that it has to be tackled for, so they set budget, measure options, raise possible trade-offs, eventually succeed at dealing with it or get bitten due to lack of proper future-proofing, not evaluating environment/requirements properly, rinse and repeat. Once you've got several problems that require this kind of approach, carefully handpicking the possibly-but-not-proven best solution is not only time consuming but might lead to potentially awfully impactful consequences. When the decision to switch cloud provider services to one that may get you offline for half an hour leading to a million-dollar revenue impact, that's when we're talking enterprise.

replies(1): >>22376347 #

72. acdha ◴[18 Feb 20 19:51 UTC] No.22359651{3}[source]▶

>>22358349 #

It’s a question of how dynamic your usage is and how much the better security and management features save you. If you have very consistent workloads with little idle hardware and modest admin needs you can beat cloud environments even with reservations but usually when I see the numbers it means major costs like staffing or power / HVAC aren’t being factored in.

73. coder543 ◴[18 Feb 20 19:53 UTC] No.22359673{8}[source]▶

>>22359424 #

Calling someone a moron is not acceptable on HN, to start with, but it's also just not a great way to conduct a discussion. Secondly, you explained that you HAVE read the articles, so why would I stop replying to you?

> These other benchmarks are certainly more relevant but none of them jumps out at me as a killer claim. An EPYC 7402 with 50% more cores, drawing 80% more power, and costing 35% more dollars than a Xeon Silver 4216 delivers 24% more pgsql ops per second. What TCO equation do you plug that into? I would describe these results as mixed.

That's some interesting cherry picking. If I may do some of my own...

- The Epyc 7642 is doing 66% more pg sql ops per second than the Silver 4216, but only using an average of 34% more power than the 4216.

- The Xeon Platinum 8253 is consuming about the same amount of power as the Epyc 7402, costs twice as much, and yet the 7402 is performing 34% faster.

The Xeon Silver 4216 is competitive in this one benchmark, and you declare that the results are "mixed". It gets thoroughly destroyed in tons of other benchmarks.

So, yes, if you will only ever run this specific version of MariaDB on this one server, then it might be a toss up... IF you don't benefit from using PCIe 4.0 to access more (or faster) SSDs, and you don't want to have the option of putting in more RAM.

AMD is consistently better in the overwhelming majority of benchmarks here, especially as you get away from the low end. Saying that Intel has one "toss up" victory in the low end category is not exactly a ringing endorsement to pick Intel here.

74. kllrnohj ◴[18 Feb 20 20:04 UTC] No.22359780{6}[source]▶

>>22359230 #

> For my business' workloads, Threadripper 3 (same gen 2 Zen, same IO chiplet, etc) would likely be a much better fit (and competitive with Intel) if AMD sold it with the same kind of enterprisey guarantees they do for Epyc (ECC, etc).

Threadripper has official support for ECC. Well, "optional" based off of the motherboard's support: https://www.amd.com/en/chipsets/str40

And just picking a random board: https://www.gigabyte.com/Motherboard/TRX40-AORUS-XTREME-rev-... you'll see it listed:

"Support for ECC Un-buffered DIMM 1Rx8/2Rx8 memory modules"

That it must be un-buffered is an annoying market segmentation thing that limits your max RAM in practice, BUT you can at least get ECC up to 256GB with official support and RAM modules that actually exist.

replies(1): >>22361230 #

75. vel0city ◴[18 Feb 20 20:13 UTC] No.22359867{6}[source]▶

>>22359230 #

AMD does sell Threadripper with the same enterprisey guarantees as they do for Epyc, at least in regards to ECC.

>Quad-Channel DDR4 ECC Memory Support >With the most memory channels you can get on desktop6, the Ryzen™ Threadripper™ processor can support Workstation Standard DDR4 ECC (Error Checking & Correction Mode) Memory to keep you tight, tuned and perfectly in sync.

https://www.amd.com/en/products/ryzen-threadripper

ECC is also supported in the desktop class CPUs and chipsets.

replies(1): >>22361270 #

76. 013a ◴[18 Feb 20 20:30 UTC] No.22360053[source]▶

>>22358369 #

Realistically; a typical hyperscale cloud provider has tens/hundreds of millions of dollars invested into a specific CPU platform. It makes very little sense to just throw it out chasing some idealism like "simplicity"; the world is not simple.

You can be like Digitalocean and just say "You want a CPU core, you get a CPU core, no guarantee what it'll be". Most enterprises won't buy this. But, I think there's some interesting use-cases where even a hyperscale provider targeting enterprises could (and do) utilize this; not on an EC2-like product, but as the infrastructure for something like Lambda, or to run the massive number of internal workloads necessary to power highly-managed cloud workloads.

77. boulos ◴[18 Feb 20 20:38 UTC] No.22360134{5}[source]▶

>>22357725 #

Not implemented yet. In the stack rank of “stuff needed to update our hypervisor for AMD again” it wasn’t at the top :).

Note the again as well: GCE originally had it such that N in N1 meant iNtel, and A1 was for AMD (as Joe said publicly here: https://twitter.com/jbeda/status/1159891645531213824). By the time I joined though, we didn’t see the point of the A1 parts, since the Sandybridge’s smoked them.

78. cvallejo ◴[18 Feb 20 20:43 UTC] No.22360182[source]▶

>>22357996 #

disclosure: I work at google!

We will be expanding the regional footprint of N2D. US-west1 should come online in the first half of this year.

replies(1): >>22360739 #

79. boulos ◴[18 Feb 20 20:46 UTC] No.22360220[source]▶

>>22357194 (OP) #

Disclosure: I work on Google Cloud.

This has come up a few times, so I wanted to reiterate that these are the Zen2/Rome parts not the first generation “Naples” parts. We didn’t bother launching Naples for GCE, because (as you can see) Rome is a huge step up.

replies(1): >>22383590 #

80. imajoo ◴[18 Feb 20 20:56 UTC] No.22360327{4}[source]▶

>>22358147 #

AWS patched their hosts back in 2018 which caused a huge shitstorm in terms of performance lost across various workloads..so yes.

https://blog.appoptics.com/visualizing-meltdown-aws/

replies(1): >>22362376 #

81. boulos ◴[18 Feb 20 20:59 UTC] No.22360348[source]▶

>>22358369 #

Disclosure: I work on Google Cloud (and really care about this).

The challenge here is balancing diverse customer workloads against the processor vendors. Historically, at Google, we just bought a single server variant (basically) because almost all code is expected to care primarily about scale-out environments. That made the GCE decision simple: offer the same hardware we build for Google, at great prices.

The problem is that many customers have workloads and applications that they can’t just change. No amount of rational discounting or incentives makes a 2 GHz processor compete with a 4 GHz processor (so now, for GCE, we buy some speedy cores and call that Compute Optimized). Even more strongly, no amount of “you’re doing it wrong” actually is the right answer for “I have a database on-prem that needs several sockets and several TB of memory” (so, Memory Optimized).

There’s an important reason though that we refer to N1, N2, N2D, and E2 as “General purpose”: we think they’re a good balanced configuration, and they’ll continue to be the right default choice (and we default to these in the console). E2 is more like what we do internally at Google, by abstracting away processor choice, and so on. As a nit to your statement above, E2 does flip between Intel and AMD.

You should choose the right thing for your workloads, primarily subject to the Regions you need them in. We’ll keep trying to push for simplicity in our API and offering, but customers really do have a wide range of needs, which imposes at least some minimum amount of complexity. For too long (probably) we attempted to refuse, because of complexity, both for us and customers. Feel free to ignore it though!

replies(5): >>22360916 #>>22361132 #>>22361552 #>>22363902 #>>22365298 #

82. overcast ◴[18 Feb 20 21:01 UTC] No.22360364[source]▶

>>22358654 #

Wild is to put it mildly, the stock is historically extremely volatile. If past performance predicts anything, it could potentially tank in the next couple of years, like it did the last few times it shot up. Obviously they are doing some great work, but I wouldn't go all in on them for the long haul.

replies(1): >>22361933 #

83. colinmcdonald22 ◴[18 Feb 20 21:47 UTC] No.22360739{3}[source]▶

>>22360182 #

Not directly related, but do you know when we can expect to see N2 instances in us-east1-b? Currently they're in 2/3 of the zones, just one annoying zone short of being able to use it in my GKE cluster.

84. boulos ◴[18 Feb 20 21:47 UTC] No.22360743{3}[source]▶

>>22359291 #

See my other comment about the rollout proceeding now. You should see them sometime this week, in theory.

85. dang ◴[18 Feb 20 21:48 UTC] No.22360748{7}[source]▶

>>22359277 #

Please don't cross into personal attack. It breaks HN's rules and invites worse from others. Your comment would be fine without the last (and arguably also the first) bits.

https://news.ycombinator.com/newsguidelines.html

replies(2): >>22360781 #>>22360879 #

86. dang ◴[18 Feb 20 21:50 UTC] No.22360754{8}[source]▶

>>22359424 #

Please don't cross into personal attack or break the site guidelines even if someone else started it. That's how we get a downward spiral. Conversely, if you respond by not getting personal and sticking to the site guidelines, you contribute to preserving the commons.

https://news.ycombinator.com/newsguidelines.html

87. ◴[18 Feb 20 21:54 UTC] No.22360781{8}[source]▶

>>22360748 #

88. ◴[18 Feb 20 22:05 UTC] No.22360879{8}[source]▶

>>22360748 #

89. ksec ◴[18 Feb 20 22:07 UTC] No.22360897[source]▶

>>22357462 #

I am pretty sure it is not Your / Google Cloud's fault given AWS and Azure are in similar situation here, but;

ROME was announced in Later 2018, Released on Aug / September 2019. Why did it take another 6 months to roll out an new instance? When I assume you must have had samples since early 2019.

90. milesward ◴[18 Feb 20 22:11 UTC] No.22360916{3}[source]▶

>>22360348 #

We need some kinda shortcut: like, run your app for a few days on an instance, we chew your stackdriver metrics, we make a new shortcut n3-mybestinstance, which picks the right shape/processor family etc for yah.

replies(1): >>22360973 #

91. elithrar ◴[18 Feb 20 22:17 UTC] No.22360973{4}[source]▶

>>22360916 #

As a Googler: take VM rightsizing recommendations - "save $X because you're underutilizing this machine shape" - and extend them to encompass this by including VM-family swaps based on underlying VM metrics? :)

92. alfalfasprout ◴[18 Feb 20 22:35 UTC] No.22361132{3}[source]▶

>>22360348 #

I mean, this mentality often is wrong. Scaling out actually isn't the right solution for everyone. It works for Google given that primarily web services are offered. It does not work for workloads that heavily rely on the CPU (think financial workloads, ML, HPC/scientific workloads) or have realtime requirements. In fact, for many ETL workloads vertical scaling proves far more efficient.

It's long been the "google way" to try and abstract out compute but it's led to an industry full of people trying to follow in their way and overcomplicating what can be solved on one or two machines.

replies(3): >>22361446 #>>22363919 #>>22364549 #

93. ◴[18 Feb 20 22:43 UTC] No.22361209{4}[source]▶

>>22359535 #

94. loeg ◴[18 Feb 20 22:46 UTC] No.22361230{7}[source]▶

>>22359780 #

Yep, I'm familiar with all that.

Yeah, it's that "optional" part that is problematic for ECC in particular. But don't let that be a distraction; there are plenty of other enterprisey features in Epyc that are not present in TR, including registered memory support.

Re: 256GB ECC UDIMM on an 8-socket TR board, that's 32GB a DIMM. I guess you can find 32 GB ECC UDIMMs now, but that's pretty recent and expensive.

replies(1): >>22361520 #

95. loeg ◴[18 Feb 20 22:52 UTC] No.22361270{7}[source]▶

>>22359867 #

ECC isn't the only feature, please don't focus on that at the expense of missing the point. They do some market segmentation between TR and Epyc and it is material.

96. bcrosby95 ◴[18 Feb 20 22:53 UTC] No.22361276{4}[source]▶

>>22359295 #

Lumbering enterprises that spend billions of dollars on IT.

97. erulabs ◴[18 Feb 20 23:15 UTC] No.22361446{4}[source]▶

>>22361132 #

Except, almost without exception, eventually the one or two machines will fall over. Ideally you can engineer your way around this ahead of time - but not always. Fundamentally relying on a few specific things (or people) will always be an existential risk to a big firm. Absolutely agree re: start small - but the problem with “scale out” is a lack of good tooling - not a fundamental philosophical one.

replies(5): >>22361542 #>>22362219 #>>22362898 #>>22363481 #>>22363618 #

98. kllrnohj ◴[18 Feb 20 23:25 UTC] No.22361520{8}[source]▶

>>22361230 #

> But don't let that be a distraction; there are plenty of other enterprisey features in Epyc that are not present in TR, including registered memory support.

Then don't make your example be ECC specifically. It's the only thing you listed, I wasn't "distracted" by it. And I also even commented on the lack of registered memory support, so I don't know why you're repeating that back to me?

99. aidenn0 ◴[18 Feb 20 23:28 UTC] No.22361542{5}[source]▶

>>22361446 #

Plenty of services can deal with X hours of downtime when a single machine fails for values of X that are longer than it takes to restore to a new machine from backups.

replies(2): >>22362715 #>>22364102 #

100. mdasen ◴[18 Feb 20 23:30 UTC] No.22361552{3}[source]▶

>>22360348 #

This makes a lot of sense, but it doesn't explain why the pricing isn't consistent. Why is an N1 the same price as an N2, except for sustained-use? Why is an E2 cheaper than an N1/N2D, except for sustained-use?

E2 is just such an amazing idea that feels like it's going to be under-utilized because it isn't cheaper for the sustained-use case. There doesn't seem to be any reason why E2 would be more expensive (to Google) for sustained-use and not for on-demand or committed.

Google Cloud is really nice, but the inconsistent pricing/discounting between the different types seems odd. Like, I'm running something on N1 right now with sustained-use because there's no incentive for me to switch to E2. It feels a bit wasteful since it doesn't get a lot of traffic and would be the perfect VM to steal resources from. However, I'd only get a discount if I did a 1-year commitment. For Google, I'm tying up resources you could put to better use. E2 instances are usually 30% cheaper which would give me a nice incentive to switch to them, but without the sustained-use discount, N2D and N1 instances become the same price. So, I end up tying up hardware that could be used more efficiently.

replies(1): >>22362316 #

101. 867-5309 ◴[18 Feb 20 23:31 UTC] No.22361558[source]▶

>>22357194 (OP) #

no pricing mentioned

I was surprised to discover the other day that one of my VPSs had been upgraded from 1 old Xeon 26XX core to 2 EPYC cores. other stats unmetered 10Gb/s up/down, low latency A'dam location, 2GB RAM, SSD.. it even outperformed my i7-8700T in a single-core openSSL benchmark. most importantly it costs €3/mo

I really can't see google competing with that

replies(2): >>22361601 #>>22362088 #

102. Dontrememberit ◴[18 Feb 20 23:38 UTC] No.22361601[source]▶

>>22361558 #

Which VPS?

replies(2): >>22361766 #>>22361769 #

103. bob1029 ◴[18 Feb 20 23:48 UTC] No.22361664{3}[source]▶

>>22358210 #

Their topology tells far richer tales than the press releases when you dig deeper on the numbers.

224 threads = 112 HT cores = 2 x 56 core CPUs. This is 8 cores short of the 64 core flagship. 8 cores == 1 CCX.

It seems exceedingly unlikely that AMD would produce a Rome CPU with 7 out of 8 CCX in perfect health, but have the 8th CCX completely missing (functionally). It seems more likely that the 8th CCX is there with all 8 cores, and that it is reserved for some other type of service. One possibility could be that there are higher guarantees of side channel protection at the CCX boundary, and google intends to use this for secure internal functions or sell to clients who have side channel sensitivity. Another may be that they simply want the hypervisor to have a very fat budget of 8 cores per socket to work with. Considering the amount of potential IO going on, this might be required in some cases.

replies(1): >>22363799 #

104. 867-5309 ◴[19 Feb 20 00:01 UTC] No.22361766{3}[source]▶

>>22361601 #

Scaleway

105. anderspitman ◴[19 Feb 20 00:01 UTC] No.22361769{3}[source]▶

>>22361601 #

Yeah that seems a little too good.

replies(1): >>22366615 #

106. tempsy ◴[19 Feb 20 00:33 UTC] No.22361933{3}[source]▶

>>22360364 #

to each their own. can't have reward without risk. hard to find another investment that returned 30x in a few years.

replies(1): >>22367284 #

107. crazysim ◴[19 Feb 20 00:52 UTC] No.22362049{5}[source]▶

>>22357787 #

Yeah, this can be done for nested virtualization instances (images) too.

108. e12e ◴[19 Feb 20 00:57 UTC] No.22362088[source]▶

>>22361558 #

I see 10gps listed at USD 569/month?

https://www.scaleway.com/en/virtual-instances/general-purpos...

replies(2): >>22365672 #>>22366563 #

109. leemoonsoo ◴[19 Feb 20 01:13 UTC] No.22362184[source]▶

>>22357478 #

When gVisor is enabled, hyper-threading is disabled (https://cloud.google.com/kubernetes-engine/docs/concepts/san...) and GKE cluster shows only half the CPU as available resources.

Does n2d instance is considered safe from CPU vulnerability and safely enable hyper-threading when used with gVisor?

110. carbocation ◴[19 Feb 20 01:15 UTC] No.22362199[source]▶

>>22357194 (OP) #

Any idea when these will be available on the genomics pipeline API (now "Cloud Life Sciences" API)?

111. alfalfasprout ◴[19 Feb 20 01:19 UTC] No.22362219{5}[source]▶

>>22361446 #

It is a philosophical one when you design around scaling out at a high rate. You incur significant additional complexity in many cases along with increased overhead.

It's fallacious to think that relying on "n" things is strictly safer than "3" things where n is large. That's not quite true due to the significant complexity increases when dealing with large "n" and accompanying overhead.

For web applications (which I suspect the majority of HN readers work on) then sure, but plenty of realtime or safety critical applications are perfectly ok with three-way redundancy.

replies(1): >>22364859 #

112. throwaway2048 ◴[19 Feb 20 01:33 UTC] No.22362316{4}[source]▶

>>22361552 #

Pricing confusion is a cornerstone of big single shop vendors, the more confusing you make pricing, the more chances a customer is going to spend more than they otherwise might.

Also opens avenues for highly paid consultants to dip their beak and promote your products.

113. tkinz27 ◴[19 Feb 20 01:43 UTC] No.22362369[source]▶

>>22357194 (OP) #

Its really great to see more AMD options for cloud instances. Now I'm just waiting for more ARM architecture options. Not having to cross compile code (a1/m6g AWS instance types) has been very useful in my day to day job.

114. dilyevsky ◴[19 Feb 20 01:45 UTC] No.22362376{5}[source]▶

>>22360327 #

Afaik amd patches aren’t as penalizing as Intel ones. I have no idea if ^^^ refers to bare metal, bare metal w/ patches or cloud env (unlikely) hence the question.

replies(1): >>22362417 #

115. dilyevsky ◴[19 Feb 20 01:46 UTC] No.22362383{5}[source]▶

>>22358674 #

Pure numeric benches aren’t very useful irl tho. My hunch is with security patches single core will be very comparable

116. imajoo ◴[19 Feb 20 01:52 UTC] No.22362417{6}[source]▶

>>22362376 #

The link above is specific to AWS -- hence, "Visualizing Meltdown on AWS" as their title.

117. chucky_z ◴[19 Feb 20 03:03 UTC] No.22362715{6}[source]▶

>>22361542 #

I'd like to add to this and say that a server being down for 6 hours, that if over the life of its uptime (months? years?) saves uncountable number of hours on computations and complexity, is so worth it.

Heck, even a machine like that being down for a week is usually still worth it.

118. PudgePacket ◴[19 Feb 20 03:49 UTC] No.22362898{5}[source]▶

>>22361446 #

I've heard lots of anecdotes from big sites doing fine with a small number of machines.

eg stackoverflow only has 1 active DB server, and 1 backup.

https://stackexchange.com/performance

replies(1): >>22363499 #

119. lmeyerov ◴[19 Feb 20 06:23 UTC] No.22363481{5}[source]▶

>>22361446 #

Totally. We could replace our GPU stack with who knows how many CPUs to hit the same 20ms SLAs, and we'll just pretend data transfer overhead doesn't exist ;-)

More seriously, we're adding multi-node stuff for isolation and multi-GPU for performance. Both are quite different... and useful!

120. endymi0n ◴[19 Feb 20 06:28 UTC] No.22363499{6}[source]▶

>>22362898 #

Actually, this makes a lot of sense. Reasoning about a single machine is just way simpler and keeps the full power of a modern transactional database at your fingertips. Backups keep taking longer and disaster recovery isn‘t as speedy anymore, but we‘re running on Postgres internally as well and I‘d scale that server as big as possible (even slightly beyond linear cost growth, which is pretty huge these days) before even thinking about alternatives.

121. mrich ◴[19 Feb 20 07:05 UTC] No.22363618{5}[source]▶

>>22361446 #

The solution often is to have a warm standby that can take over immediately. You do not get any distributed overhead that is present in a fully load-balanced system during normal operation, and only pay a small amount in the very exceptional failure case.

122. smueller1234 ◴[19 Feb 20 07:52 UTC] No.22363799{4}[source]▶

>>22361664 #