
Google's Liquid Cooling

(chipsandcheese.com)
399 points by giuliomagnifico | 43 comments
jonathaneunice ◴[] No.45017586[source]
It’s very odd when mainframes (S/3x0, Cray, yadda yadda) have been extensively water-cooled for over 50 years, and super-dense HPC data centers have used liquid cooling for at least 20, to hear Google-scale data center design compared to PC hobbyist rigs. Selective amnesia + laughably off-target point of comparison.
replies(6): >>45017651 #>>45017716 #>>45018092 #>>45018513 #>>45018785 #>>45021044 #
1. liquidgecka ◴[] No.45018513[source]
[bri3d pointed out that I missed an element of this: there is a transfer between rack-level and machine-level coolant, which makes this far less novel than I had initially understood. See their direct reply to this comment.]

I posted this further down in a reply-to-a-reply but I should call it out a little closer to the top: The innovation here is not “we are using water for cooling”. The innovation here is that they are direct cooling the servers with chillers that are outside of the facility. Most mainframes will use water cooling to get the heat from the core out to the edges, where it can be picked up by traditional heatsinks and cooling fans. Even home PCs do this by moving the heat to a reservoir that can be more effectively cooled.

What Google is doing is using the huge chillers that would normally be cooling the air in the facility to cool water which is directly pumped into every server. The return water is then cooled in the chiller tower. This eliminates ANY air-based transfer besides the chiller tower. This isn’t being done on a server or a rack... it’s being done on the whole data center all at once.
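
As a rough sanity check on why a single building-wide loop is even feasible (ballpark numbers of my own, not Google’s): the flow rate follows from a simple heat balance, so 1 MW of IT load with a 10 K rise across the loop needs only about

    \dot{m} = \frac{Q}{c_p\,\Delta T} = \frac{1\,\mathrm{MW}}{4186\ \mathrm{J/(kg\,K)} \times 10\ \mathrm{K}} \approx 24\ \mathrm{kg/s}

of water, which is very manageable for facility-scale pumps; moving the same megawatt with air would take vastly more volume.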

I am super curious how they handle things like chiller maintenance or pump failures. I am sure they have redundancy but the system for that has to be super impressive because it can’t be offline long before you experience hardware failure!

[Edit: It was pointed out in another comment that AWS is doing this as well and honestly their pictures make it way clearer what is happening: https://www.aboutamazon.com/news/aws/aws-liquid-cooling-data...]

replies(5): >>45018536 #>>45018749 #>>45018898 #>>45019376 #>>45023339 #
2. ambicapter ◴[] No.45018536[source]
So every time they plug in a server they also plug in water lines?
replies(5): >>45018606 #>>45018608 #>>45018698 #>>45019151 #>>45020013 #
3. jedberg ◴[] No.45018606[source]
Looks like it. New server means power, internet, and water.
replies(1): >>45018863 #
4. liquidgecka ◴[] No.45018608[source]
[I am not a current Google employee, so my understanding of this is based on externally written articles and “leap of faith” guesstimation]

Yes. A supply and return line along with power. Though if I had to guess how it’s set up, this would be done with some super slick “it just works” kind of mount that lets them just slide the case in and lock it in place. When I was there almost all hardware replacement was made downright trivial, so it could well be a case of just sliding it in place and walking away.

replies(4): >>45018985 #>>45019109 #>>45019142 #>>45019181 #
5. ajb ◴[] No.45018698[source]
I remember reading somewhere that they don't operate at the level of servers; if one dies they leave it in place until they're ready to replace the whole rack. Don't know if that's true now, though.

It does sound like connections do involve water lines though. As they are isolating different water circuits, in theory they could have a dry connection between heat exchanger plates, or one made through thermal paste. It doesn't sound like they're doing that though.

replies(2): >>45019104 #>>45019128 #
6. nitwit005 ◴[] No.45018749[source]
This was before I was born, so I'm hardly an expert, but I've heard of feeding IBM mainframes chilled water. A quick check of wikipedia found some mention of the idea: https://en.wikipedia.org/wiki/IBM_3090
replies(2): >>45018879 #>>45019802 #
7. fudgy73 ◴[] No.45018863{3}[source]
just like humans.
8. ChuckMcM ◴[] No.45018879[source]
When our mainframe in 1978 sprung a leak in its water cooling jacket it took down the main east/west node on IBMs internal network at the time. :-). But that was definitely a different chilling mechanism than the types Google uses.
9. ChuckMcM ◴[] No.45018898[source]
Much of the Google use of liquid chillers was protected behind NDAs as part of its "hidden advantage" with respect to the rest of the world. It was the secret behind really low PUE numbers.
replies(1): >>45022161 #
10. nielsbot ◴[] No.45018985{3}[source]
Maybe similar to a gasoline hose breakaway

https://www.opwglobal.com/products/us/retail-fueling-product...

11. cavisne ◴[] No.45019104{3}[source]
Definitely not true for these workloads. TPUs are interconnected, one dying makes the whole cluster significantly less useful.
12. michaelt ◴[] No.45019109{3}[source]
Interestingly, entire supercomputers have been decommissioned [1] due to faulty quick disconnects causing water spray.

So you can get a single, blind-mating connector combining power, data and water - but you might not want to :)

[1] https://gsaauctions.gov/auctions/preview/282996

13. liquidgecka ◴[] No.45019128{3}[source]
It has not been true for a LONG time. That was part of Google’s early “compute unit” strategy that involved things like sealed containers and such. Turns out that’s not super efficient or useful, because you leave large swaths of hardware idle.

In my day we had software that would “drain” a machine and release it to hardware ops to swap the hardware on. This could be a drive, memory, CPU or a motherboard. If it was even slightly complicated they would ship it to Mountain View for diagnostic and repair. But every machine was expected to be cycled to get it working as fast as possible.
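
Very roughly, the lifecycle looked something like the sketch below (names and structure are invented for illustration, not actual Google tooling):

    # Hypothetical sketch of the drain -> hardware ops -> return-to-service cycle.
    from dataclasses import dataclass, field

    @dataclass
    class Machine:
        name: str
        schedulable: bool = True
        tasks: list = field(default_factory=list)

    def drain(machine: Machine) -> None:
        machine.schedulable = False        # stop placing new work here
        while machine.tasks:
            task = machine.tasks.pop()     # in reality: reschedule onto healthy machines
            print(f"migrating {task} off {machine.name}")

    def release_to_hwops(machine: Machine, tickets: list) -> None:
        tickets.append((machine.name, "swap disk / memory / CPU / motherboard"))

    def return_to_service(machine: Machine) -> None:
        machine.schedulable = True         # cycle it back in as fast as possible

    m = Machine("rack12-slot07", tasks=["search-shard", "gmail-backend"])
    tickets: list = []
    drain(m)
    release_to_hwops(m, tickets)
    return_to_service(m)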

We did a disk upgrade on a whole datacenter that involved switching from 1TB to 2TB disks or something like that (I am dating myself), and minimizing total downtime was so important that they hired temporary workers to work nights to get the swap done as quickly as possible. If I remember correctly that was part of the “holy cow gmail is out of space!” chaos though, so there was added urgency.

replies(2): >>45022375 #>>45025880 #
14. semi-extrinsic ◴[] No.45019142{3}[source]
https://amphenol-industrial.com/products/liquid-cooling-syst...
replies(1): >>45019421 #
15. jayd16 ◴[] No.45019151[source]
Maybe we can declutter things if they get PWoE (power and water over Ethernet) or just a USB-W standard.
replies(3): >>45019238 #>>45020658 #>>45025857 #
16. scrlk ◴[] No.45019181{3}[source]
You can see the male quick disconnect fittings for the liquid cooling at each corner of the server in this photo:

https://substackcdn.com/image/fetch/$s_!8aMm!,f_auto,q_auto:...

Looks like the power connector is in the centre. I'm not sure if backplane connectors are covered up by orange plugs?

replies(1): >>45019717 #
17. Nition ◴[] No.45019238{3}[source]
It worked for MONIAC.
18. bri3d ◴[] No.45019376[source]
I don't think this comment is accurate based on the article, although you cite personal experience elsewhere so maybe your project wasn't the one that's documented here?

> What Google is doing is using the huge chillers that would normally be cooling the air in the facility to cool water which is directly pumped into every server.

From the article:

> CDUs exchange heat between coolant liquid and the facility-level water supply.

Also, I know from attaching them at some point that plenty of mainframes used this exact same approach (water to water exchange with facility water), not water to air to water like you describe in this comment and others, so I think you may have just not had experience there? https://www.electronics-cooling.com/2005/08/liquid-cooling-i... contains a diagram in Figure 1 of this exact CDU architecture, which it claims was in use in mainframes dating back to 1965 (!).

I also don't think "This eliminates ANY air-based transfer besides the chiller tower" is strictly true; looking at the photo of the sled in the article, there are fans. The TPUs are cooled by the liquid loop but the ancillaries are still air cooled. This is typical for water cooling systems in my experience. While I wouldn't be surprised to be wrong (it sure would be more efficient, I'd think!), I've never seen a water cooling system that successfully works without forced air, because there are just too many ancillary components of varying shapes to design a PCB/waterblock combination that doesn't also demand forced air cooling.

replies(2): >>45019583 #>>45020923 #
19. tesseract ◴[] No.45019421{4}[source]
CPC is in this market too https://www.cpcworldwide.com/Liquid-Cooling/Products/Blind-M...

Non-spill fluid quick disconnects are established tech in industries like medical, chemical processing, beverage dispensing, and hydraulic power, so there are plenty of design concepts to draw on.

20. liquidgecka ◴[] No.45019583[source]
> > CDUs exchange heat between coolant liquid and the facility-level water supply.

Oh interesting, I missed that when I went through on the first pass. (I think I space-barred past the image and managed to skip the entire paragraph in between the two images, so that’s on me.)

I was running off an informal discussion I had with a hardware ops person several years ago where he mentioned a push to unify cooling and eliminate thermal transfer points since they were one of the major elements of inefficiency in modern cooling solutions. By missing that as I browsed through it I think I leaned too heavily on my assumptions without realizing it!

Also, not all chips can be liquid cooled, so there will always be an element of air cooling; the fans and stuff are still there for the “everything else” cases and I doubt anybody will really eliminate that effectively. The comment you quoted was mostly directed towards the idea that the Cray-1 had liquid cooling; it did, but it transferred to air outside of the server, which was an extremely common model for most older mainframe setups. It was rare for the heat to stay in liquid along the whole path.

replies(2): >>45020981 #>>45021028 #
21. Romario77 ◴[] No.45019717{4}[source]
it's not the center one, it's the side ones. Center seems to be a power supply.
22. jauntywundrkind ◴[] No.45019802[source]
Having to pre-chill water (via a refrigeration cycle) is radically less efficient than being able to collect and then disperse heat. It generates considerably more heat ahead of time to deliver the chilled water. This mode of gathering the heat and sending it out, dealing with the heat after it is produced rather than in advance, should be much more energy efficient.
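
To put rough, assumed numbers on it: a chiller running at a coefficient of performance of about 4 spends on the order of

    W_{\mathrm{chiller}} \approx \frac{Q_{\mathrm{IT}}}{\mathrm{COP}} = \frac{1\,\mathrm{MW}}{4} = 250\ \mathrm{kW}

of compressor work per megawatt of IT heat, while rejecting warm return water through towers or dry coolers mostly just costs pump and fan power, typically a small fraction of that.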

I don't know why it surprises me so much, but having these rack-sized CDU heat exchangers was quite novel to me. Having a relatively small closed loop versus one big loop that has to go outside seems like a very big tradeoff, with a somewhat material- and space-intensive demand (a rack with 6x CDUs), but the fine-grained control does seem obviously sweet to have. I wish there were a little more justification for the use of heat exchangers!

The way water is distributed within the server is also pretty amazing, with each server having its own “bus bar” of water, and each chip having its own active electro-mechanical valve to control its specific water flow. The TPUv3 design where cooling happens serially, each chip in sequence getting hotter and hotter water, seems common-ish, whereas with TPUv4 there's a fully parallel and controllable design.
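
I'd guess the per-chip valve logic is conceptually something like this (purely illustrative; the setpoints, gains and interfaces are my own invention, not Google's design):

    # Illustrative per-chip flow control: hotter chips get a more open valve.
    def update_valve(chip_temp_c: float, valve_open: float,
                     setpoint_c: float = 70.0, gain: float = 0.02) -> float:
        error = chip_temp_c - setpoint_c
        valve_open += gain * error
        return min(1.0, max(0.1, valve_open))   # always keep some minimum flow

    # each chip on the water "bus bar" runs its own independent loop
    valves = {f"tpu{i}": 0.5 for i in range(4)}
    temps = {"tpu0": 65.0, "tpu1": 78.0, "tpu2": 71.0, "tpu3": 69.0}
    for chip, t in temps.items():
        valves[chip] = update_valve(t, valves[chip])
    print(valves)  # the hot chip (tpu1) ends up with a wider-open valve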

Also, the switch from lidded chips to bare chips, with a cold plate that comes down to just above the silicon, channeling water, is one of those very detailed, fine-grained optimizations that is just so sweet.

23. Hilift ◴[] No.45020013[source]
And a 12V battery.
24. taneq ◴[] No.45020658{3}[source]
Great, now I’ll have to figure out if my USB Hi-Flow cable is slowing down my USB Full-Flow drink bottle refilling.
replies(1): >>45020759 #
25. toast0 ◴[] No.45020759{4}[source]
It's not. Full flow is less than hi-flow... Your bottle might slow down other fills on the bus though :p
26. matt-p ◴[] No.45020923[source]
It's interesting because I've never seen mainframes do water to water (though I'm sure that was possible).

The only ones I've ever seen did water to compressor (then gas to the outdoor condenser, obviously).

replies(1): >>45021020 #
27. matt-p ◴[] No.45020981{3}[source]
The CDUs are essentially just passive water-to-water heat exchangers with some fancy electronics attached. You want to run a different chemical mix out to the chillers than you do on the internal loop; the CDU also helps regulate flow/pressure, and leak detection with auto cutoff is fairly essential.

Running directly on facility water would make day-to-day operations and maintenance a total pain.
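
The supervision side isn't conceptually complicated either; a toy version of the logic (invented thresholds and sensor names, nothing vendor-specific) looks roughly like:

    # Toy CDU supervision: isolate on leak, otherwise trim the pump to hold
    # the target differential pressure across the secondary loop.
    def cdu_step(leak_sensor_wet: bool, supply_kpa: float, return_kpa: float,
                 target_dp_kpa: float = 150.0) -> dict:
        if leak_sensor_wet:
            return {"isolation_valves": "closed", "pump": "off", "alarm": True}
        dp = supply_kpa - return_kpa
        pump_trim = 0.001 * (target_dp_kpa - dp)   # crude proportional correction
        return {"isolation_valves": "open", "pump_trim": pump_trim, "alarm": False}

    print(cdu_step(False, 480.0, 320.0))  # normal: small negative trim (dp too high)
    print(cdu_step(True, 480.0, 320.0))   # leak detected: isolate and stop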

28. bri3d ◴[] No.45021020{3}[source]
Most ES/9000 series and earlier stuff had water to water SKUs. I remember seeing an installation with one that exchanged heat into a facility loop which went through another water to water exchanger to a pond-fountain chiller system.

Starting with S/390 G4 they did a weird thing where the internals were cooled by refrigeration but the standard SKUs actually had the condenser in the bottom of the cabinet and they required raised floor cooling.

They brought water to air back with the later zSeries, but the standard SKUs mimicked the S/390 strategy with raised floor. I guess you could buy a z196 or an ec12 with a water to water cabinet but I too have never seen one.

29. avar ◴[] No.45021028{3}[source]

    > not all chips can be
    > liquid cooled.
Why not? It's just a heatsink except with water running through cavities within it, instead of a fan sitting on top of the heatsink.
replies(2): >>45021283 #>>45025870 #
30. bri3d ◴[] No.45021283{4}[source]
One of the biggest problems with water cooling, especially on boards that weren’t designed for it, can be passive components which don’t usually have a heatsink and therefore don’t offer a good surface for a water block, but end up in a thermal design which requires airflow - resistors and FETs are common culprits here. Commodity assemblies are also a big problem, with SFPs being a huge pain point in designs I’ve seen.

The problem is often exacerbated on PCBs designed for air cooling where the clearance between water cooled and air cooled components is not high enough to fit a water block. Usually the solution when design allows is to segment these components into a separate air cooled portion of the design, which is what Google look to have done on these TPU sleds (the last ~third of the assembly looks like it’s actively air cooled by the usual array of rackmount fans).

replies(2): >>45024261 #>>45024767 #
31. throwaway2037 ◴[] No.45022161[source]
Do we know if other hyperscalers also use liquid chillers to achieve very low PUE values? I think I saw photos from xAI's new data center and there was liquid cooling.
replies(1): >>45027182 #
32. throwaway2037 ◴[] No.45022375{4}[source]

    > part of the “holy cow gmail is out of space!” chaos
This sounds like an interesting story. Can you share more details?
33. jwr ◴[] No.45023339[source]
> they are direct cooling the servers with chillers that are outside of the facility

That is exactly what the Cray Y-MP EL that I worked with in the 90s/2000s did.

34. matt-p ◴[] No.45024261{5}[source]
Indeed, and the problem is that once you've committed to fans and liquid cooling, you can reduce the complexity and plate size massively by just cooling the big wins (CPU/GPU). I've actually seen setups where they only cold plate the GPU and leave the CPU and its entire motherboard on air cooling.

Messy.

35. sfn42 ◴[] No.45024767{5}[source]
I wonder if you could just put a conventional heatsink in there to cool the air inside the box?

You would have a liquid block on the CPU but you'd also have a heat sink on top that transfers heat from the air to the coolant block, working in reverse compared to normal air cooling heatsinks. The temperature difference would cause passive air circulation and the liquid cooling would now cool both the CPU and the air in the box, without fans.

Seems like something someone would have thought about and tested already though.

replies(1): >>45034620 #
36. Cthulhu_ ◴[] No.45025857{3}[source]
Related, I vaguely recall a concept from a Tesla charger for semis that proposed a charging cable with active coolant flowing through it as well to keep the wire cooled.

edit: https://www.teslarati.com/tesla-liquid-cooled-supercharger-c...

replies(1): >>45027942 #
37. michaelt ◴[] No.45025870{4}[source]
If you're blasting enough air around to cool a 600W GPU, you don't care if your GPU's power connector dissipates 10W under certain circumstances - the massive airflow will take care of it.

Take that airflow away and you have to be a good deal more careful with your connector selection, quality control and usability or you'll risk melted connectors.

Water-cooling connectors and cables isn't common, outside of things like 250kW EV chargers.

replies(1): >>45034625 #
38. Cthulhu_ ◴[] No.45025880{4}[source]
I'd love to work in a datacenter at that scale sometime. It sounds like working in a warehouse where you get a list of orders, servers to remove and pick up, but at the scale of the Googles et al that's hundreds of server replacements a day, with production lines of new servers being built and existing ones being repaired or decommissioned.

It's a fascinating industry, but only in my head as the only info you get about it is carefully polished articles and the occasional anecdote on HN, which is also carefully polished due to NDAs.

39. ChuckMcM ◴[] No.45027182{3}[source]
I don’t know of any Co-lo types that did it, Microsoft had some creative ideas about things like that but I don’t know if they brought any to fruition. I suspect there are still opportunities there.
40. jayd16 ◴[] No.45027942{4}[source]
This is why I risk jokes on HN. Normally they're frowned upon but it's worth it to find interesting tech that meets the eccentric idea.
41. matt-p ◴[] No.45034620{6}[source]
Not really practical; it wouldn't transfer much energy at all. Let's say your coolant comes in at 30°C; well, if your air is 40°C and you've got no fans, you can do the maths, but it may as well be 0.
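
Back of the envelope, with generous assumptions (natural-convection coefficient of roughly 10 W/m²K, maybe 0.05 m² of effective fin area, a 10 K air-to-coolant difference):

    Q = h\,A\,\Delta T \approx 10 \times 0.05 \times 10 = 5\ \mathrm{W}

So a fanless in-box heatsink would soak up a few watts at best, against tens or hundreds of watts of ancillary heat.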
replies(1): >>45035978 #
42. matt-p ◴[] No.45034625{5}[source]
It's exactly those kinds of problems (though these systems will be SXM-like).
43. sfn42 ◴[] No.45035978{7}[source]
I was imagining the coolant comes in at a lower temp, like 20, and maybe keeps the air from going above 40.

It doesn't have to do that much, but maybe you're right. I'm sure they'd be doing this if it were practical; being able to omit thousands of fans would probably save a pretty penny both on hardware and electricity.