The Future of Big Iron: An Interview with IBM’s Christian Jacobi

(morethanmoore.substack.com)

froh ◴[15 Oct 24 20:48 UTC] No.41852945[source]▶

Jacobi is one of 70 IBM Fellows (think IBM internal professors, free rein over a research budget, you gain the title with technical prowess plus business acumen)

at the heart of the Mainframe success is this:

> I’d say high-availability and resiliency means many things, but in particular, two things. It means you have to catch any error that happens in the system - either because a transistor breaks down due to wear over the lifetime, or you get particle injections, or whatever can happen. You detect the stuff and then you have mechanisms to recover. You can't just add this on top after the design is done, you have to be really thinking about it from the get-go.

and then he goes into detail on how that is achieved; the article covers this nicely.

oh and combine the 99.9999999% availability "nine nines" with insane throughput. as in real time phone wiretapping throughput, or real time mass financial transactions, of course.

or a web server for an online image service.

or "your personal web server in a mouse click", sharing 10,000 such virtual machines on a single physical machine, which has a shared read-only /usr partition mounted into all guests. not containers, no, virtual machines, circa 2006...
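To put the "nine nines" figure in perspective, here is a back-of-the-envelope sketch (not from the article) that converts a number of nines into expected downtime per year. It assumes availability is averaged over a 365.25-day year; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, including leap days


def downtime_seconds_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability with `nines` nines.

    3 nines means 99.9%, 9 nines means 99.9999999%.
    """
    unavailability = 10.0 ** -nines  # fraction of time the system is down
    return unavailability * SECONDS_PER_YEAR


for n in (3, 5, 9):
    print(f"{n} nines: {downtime_seconds_per_year(n):.6g} seconds of downtime per year")
```

Under that assumption nine nines works out to roughly 32 milliseconds of downtime per year, whereas three nines allows close to nine hours.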

"don't trust a computer you can lift"

replies(3): >>41853129 #>>41861040 #>>41878681 #

wolf550e ◴[15 Oct 24 21:06 UTC] No.41853129[source]▶

>>41852945 #

The amount of throughput you can get out of AMD EPYC Zen 5 servers for the price of a basic mainframe is insane. Even if IBM wins in single-core perf using an absurd amount of cache and an absurd cooling solution, the total rack throughput is definitely won by "commodity" hardware.

replies(2): >>41853447 #>>41853880 #

neverartful ◴[15 Oct 24 22:41 UTC] No.41853880[source]▶

>>41853129 #

These comments come up with every mainframe post. It's not only about performance. If it were, it would be x86 or pSystems (AIX/POWER). The reason customers buy mainframes is RAS (reliability, availability, serviceability). Notice that performance is not part of RAS.

replies(1): >>41854043 #

jiggawatts ◴[15 Oct 24 23:09 UTC] No.41854043[source]▶

>>41853880 #

You and the parent are both "missing the point", which is sadly not talked about by the manufacturer either (IBM).

I used to work for Citrix, which is "software that turns Windows into a mainframe OS". Basically, you get remote thin terminals the same as you would with an IBM mainframe, but instead of showing you green text you get a Windows desktop.

Citrix used to sell this as a "cost saving" solution that inevitably would cost 2-3x as much as traditional desktops.

The real benefit for both IBM mainframes and Citrix is: latency.

You can't avoid the speed of light, but centralising data and compute into "one box" or as close as you can get it (one rack, one data centre, etc...) provides enormous benefits to most kinds of applications.

If you have some complex business workflow that needs to talk to dozens of tables in multiple logical databases, then having all of that unfold in a single mainframe will be faster than if it has to bounce around a network in a "modern" architecture.

In real enterprise environments (i.e.: not a FAANG) any traffic that has to traverse between servers will typically use 10 Gbps NICs at best (not 100 Gbps!), have no topology optimisation of any kind, and flow through at a minimum one load balancer, one firewall, one router, and multiple switches.

Within a mainframe you might have low double-digit microsecond latencies between processes or LPARs. Across an enterprise network, between services and independent servers, it's not unusual to get well over one millisecond -- one hundred times slower.

This is why mainframes are still king for many orgs: They're the ultimate solution for dealing with speed-of-light delays.

PS: I've seen multiple attempts to convert mainframe solutions to modern "racks of boxes" and it was hilarious to watch the architects be totally mystified as to why everything was running like slow treacle when on paper the total compute throughput was an order of magnitude higher than the original mainframe had. They neglected latency in their performance modelling, that's why!
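The latency argument above can be sketched as a toy model. The per-call latencies are the rough figures quoted in this comment; the call count and compute times are hypothetical numbers picked only for illustration, and the model assumes every call has to wait for the previous one.

```python
def workflow_seconds(sequential_calls: int, latency_s: float, compute_s: float) -> float:
    """Wall-clock time for one workflow whose calls run strictly one after another."""
    return sequential_calls * latency_s + compute_s


CALLS = 500                # hypothetical: dependent queries per business transaction
MAINFRAME_LATENCY = 20e-6  # "low double-digit microseconds" between processes or LPARs
NETWORK_LATENCY = 1e-3     # the comment says "well over one millisecond"; 1 ms is a floor
MAINFRAME_COMPUTE = 0.010  # hypothetical CPU time on the mainframe, in seconds
RACK_COMPUTE = 0.001       # hypothetical: an order of magnitude more compute throughput

mainframe = workflow_seconds(CALLS, MAINFRAME_LATENCY, MAINFRAME_COMPUTE)
rack = workflow_seconds(CALLS, NETWORK_LATENCY, RACK_COMPUTE)

print(f"mainframe:      {mainframe * 1e3:.1f} ms per workflow")
print(f"racks of boxes: {rack * 1e3:.1f} ms per workflow")
```

With these made-up inputs the distributed version takes about 500 ms against about 20 ms, even though its compute is ten times faster. A model that only adds up compute throughput never sees the `sequential_calls * latency_s` term.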

replies(3): >>41854112 #>>41854634 #>>41854691 #

neverartful ◴[15 Oct 24 23:19 UTC] No.41854112[source]▶

>>41854043 #

The mainframe itself (or any other platform for that matter) is not magical with regards to latency. It's all about proper architecture for the workload. Mainframes do provide a nice environment for being able to push huge volumes of IO though.

replies(2): >>41854429 #>>41854909 #

1. jiggawatts ◴[16 Oct 24 00:19 UTC] No.41854429[source]▶

>>41854112 #

Again, missing the point. Just look at the numbers.

Mainframe manufacturers talk about "huge IO throughputs" but a rack of x86 kit with ordinary SSD SAN storage will have extra zeroes on the aggregate throughput. Similarly, on a bandwidth/dollar basis, Intel-compatible generic server boxes are vastly cheaper than any mainframe. Unless you're buying the very largest mainframes ($billions!), a single Intel box bought with the same budget will practically always win. E.g.: just pack it full of NVMe SSDs and enjoy ~100GB/s cached read throughput on top of ~20GB/s writes to remote "persistent" storage.

The "architecture" here is all about the latency. Sure, you can "scale" a data centre full of thousands of boxes far past the maximums of any single mainframe, but then the latency necessarily goes up because of physics, not to mention the practicalities of large-scale Ethernet networking.

The closest you can get to the properties of a mainframe is to put everything into one rack and use RDMA with Infiniband.
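The "because of physics" part can be put in rough numbers. This sketch estimates pure propagation delay over optical fibre, assuming signals travel at about two thirds of the speed of light in vacuum; the cable lengths are illustrative guesses, and switches, NICs and software stacks add more on top.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum
FIBRE_VELOCITY_FACTOR = 2 / 3   # approximation for typical optical fibre


def round_trip_ns(cable_metres: float) -> float:
    """Round-trip propagation delay in nanoseconds, ignoring all equipment delays."""
    one_way_s = cable_metres / (C_VACUUM_M_PER_S * FIBRE_VELOCITY_FACTOR)
    return 2 * one_way_s * 1e9


# Hypothetical cable runs, chosen only to show the scaling.
for label, metres in [
    ("within one rack", 2),
    ("across a data centre hall", 200),
    ("between two sites", 50_000),
]:
    print(f"{label:>26}: {round_trip_ns(metres):,.0f} ns round trip")
```

Propagation delay grows linearly with cable length, so a round trip inside one rack costs tens of nanoseconds, while a round trip between sites costs hundreds of microseconds before any switch or firewall has touched the packet.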

replies(2): >>41854644 #>>41854988 #

2. throw4950sh06 ◴[16 Oct 24 00:57 UTC] No.41854644[source]▶

>>41854429 (TP) #

> The closest you can get to the properties of a mainframe is to put everything into one rack and use RDMA with Infiniband.

Or PCIe... I really would like to try building that.

replies(1): >>41855282 #

3. Spooky23 ◴[16 Oct 24 02:16 UTC] No.41854988[source]▶

>>41854429 (TP) #

You have to think of the mainframe as a platform like AWS or Kubernetes or VMware. Saying “AWS has huge throughput” is meaningless.

The features of the platform are the real technical edge. You need to use those features to get the benefits.

I’ve moved big mainframe apps to Unix or Windows systems. There’s no magic… you just need to refactor around the constraints of the target system, which are different from those of the mainframe.

replies(1): >>41858038 #

4. jiggawatts ◴[16 Oct 24 03:19 UTC] No.41855282[source]▶

>>41854644 #

I'm fairly certain you can't create a "mesh" with PCIe between multiple hosts. It's more like USB than Ethernet.

replies(2): >>41858151 #>>41877334 #

5. froh ◴[16 Oct 24 11:54 UTC] No.41858038[source]▶

>>41854988 #

what you hint at is that most workloads today don't need most of the mainframe features any more, and you can move them to commodity hardware.

There is much less need for most business functions to sit on a mainframe.

However the mainframe offers some availability features in hardware and z/VM, which you need to compensate for in software and system architecture, if failure is not an option, business-wise.

and if your organisation can build such a fail-operational system and software solution, then there is no reason today to stay on the mainframe. it's indeed more a convenience these days than anything else.

replies(1): >>41859031 #

6. throw4950sh06 ◴[16 Oct 24 12:10 UTC] No.41858151{3}[source]▶

>>41855282 #

Treat the CPU board not as the host but as a peripheral. ;-)

But I'd say it's much closer to Ethernet than USB. You have controllers (routers), switches and nodes... USB doesn't, not like this.

7. neverartful ◴[16 Oct 24 13:50 UTC] No.41859031{3}[source]▶

>>41858038 #

I agree with most of this. I believe that mainframes have an advantage when you look at environmental factors (power consumption and cooling).

8. pezezin ◴[18 Oct 24 08:12 UTC] No.41877334{3}[source]▶

>>41855282 #

You can't with standard PCIe, but you will be able to do it with CXL, although I don't know of any server platform that uses it yet.
