
Google's Liquid Cooling

(chipsandcheese.com)
399 points by giuliomagnifico | 22 comments
jonathaneunice ◴[] No.45017586[source]
It’s very odd, when mainframes (S/3x0, Cray, yadda yadda) have been extensively water-cooled for over 50 years and super-dense HPC data centers have used liquid cooling for at least 20, to hear Google-scale data center design compared to PC hobbyist rigs. Selective amnesia + a laughably off-target point of comparison.
replies(6): >>45017651 #>>45017716 #>>45018092 #>>45018513 #>>45018785 #>>45021044 #
liquidgecka ◴[] No.45018513[source]
[bri3d pointed out that I missed an element of this: there is a transfer between rack-level and machine-level coolant, which makes this far less novel than I had initially understood. See their direct reply to this comment.]

I posted this further down in a reply-to-a-reply, but I should call it out a little closer to the top: the innovation here is not “we are using water for cooling”. The innovation is that they are directly cooling the servers with chillers that sit outside of the facility. Most mainframes use water cooling to move the heat from the core out to the edges, where it can be picked up by traditional heatsinks and cooling fans. Even home PCs do this by moving the heat to a reservoir that can be more effectively cooled.

What Google is doing is using the huge chillers that would normally cool the air in the facility to chill water that is pumped directly into every server. The return water is then cooled at the chiller tower. This eliminates ANY air-based transfer outside of the chiller tower. And it isn’t being done on a server or a rack: it’s being done on the whole data center all at once.
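
For a rough back-of-envelope sense of why pumping chilled water straight to the servers matters, here is a tiny Python sketch. The rack power and the supply-to-return temperature rise are my own illustrative assumptions, not Google's numbers:

    # Back-of-envelope: coolant flow needed to carry away one rack's heat.
    # Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT)
    RACK_POWER_W = 100_000   # assumed ~100 kW rack (illustrative only)
    CP_WATER = 4186          # specific heat of water, J/(kg*K)
    DELTA_T_K = 10           # assumed supply-to-return temperature rise

    mass_flow_kg_s = RACK_POWER_W / (CP_WATER * DELTA_T_K)
    litres_per_min = mass_flow_kg_s * 60   # ~1 kg of water per litre

    print(f"{mass_flow_kg_s:.1f} kg/s (~{litres_per_min:.0f} L/min) per rack")
    # -> roughly 2.4 kg/s, ~140 L/min. Water carries heat orders of magnitude
    #    more densely than air, which is why cutting the air step out entirely
    #    is such a big deal at facility scale.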

I am super curious how they handle things like chiller maintenance or pump failures. I am sure they have redundancy, but the system for that has to be super impressive because it can’t be offline for long before you start seeing hardware failures!

[Edit: It was pointed out in another comment that AWS is doing this as well and honestly their pictures make it way clearer what is happening: https://www.aboutamazon.com/news/aws/aws-liquid-cooling-data...]

replies(5): >>45018536 #>>45018749 #>>45018898 #>>45019376 #>>45023339 #
1. ambicapter ◴[] No.45018536[source]
So every time they plug in a server they also plug in water lines?
replies(5): >>45018606 #>>45018608 #>>45018698 #>>45019151 #>>45020013 #
2. jedberg ◴[] No.45018606[source]
Looks like it. New server means power, internet, and water.
replies(1): >>45018863 #
3. liquidgecka ◴[] No.45018608[source]
[I am not a current Google employee, so my understanding of this is based on externally written articles and “leap of faith” guesstimation]

Yes. A supply and return line along with power. Though if I had to guess how it's set up, it would be done with some super slick “it just works” kind of mount that lets them just slide the case in and lock it in place. When I was there, almost all hardware replacement was made downright trivial, so it's probably more or less slide it into place and walk away.

replies(4): >>45018985 #>>45019109 #>>45019142 #>>45019181 #
4. ajb ◴[] No.45018698[source]
I remember reading somewhere that they don't operate at the level of servers; if one dies they leave it in place until they're ready to replace the whole rack. Don't know if that's true now, though.

It does sound like the connections involve water lines, though. Since they are isolating different water circuits, in theory they could have a dry connection between heat exchanger plates, or one made through thermal paste, but it doesn't sound like that's what they're doing.

replies(2): >>45019104 #>>45019128 #
5. fudgy73 ◴[] No.45018863[source]
just like humans.
6. nielsbot ◴[] No.45018985[source]
Maybe similar to a gasoline hose breakaway

https://www.opwglobal.com/products/us/retail-fueling-product...

7. cavisne ◴[] No.45019104[source]
Definitely not true for these workloads. TPUs are interconnected; one dying makes the whole cluster significantly less useful.
8. michaelt ◴[] No.45019109[source]
Interestingly, entire supercomputers have been decommissioned [1] due to faulty quick disconnects causing water spray.

So you can get a single, blind-mating connector combining power, data and water - but you might not want to :)

[1] https://gsaauctions.gov/auctions/preview/282996

9. liquidgecka ◴[] No.45019128[source]
It has not been true for a LONG time. That was part of Google’s early “compute unit” strategy, which involved things like sealed containers. It turns out that’s not super efficient or useful, because you leave large swaths of hardware idle.

In my day we had software that would “drain” a machine and release it to hardware ops to swap the hardware. That could be a drive, memory, a CPU, or a motherboard. If it was even slightly complicated, they would ship it to Mountain View for diagnostics and repair. But every machine was expected to be cycled back to working as fast as possible.
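
Purely as a hypothetical sketch of the shape of that drain-and-swap loop (the state names and the Machine class below are mine for illustration, not Google's tooling):

    # Hypothetical sketch of the drain-and-swap lifecycle described above.
    from enum import Enum, auto

    class State(Enum):
        SERVING = auto()       # running production work
        DRAINING = auto()      # no new work scheduled; existing tasks move off
        WITH_HW_OPS = auto()   # released to hardware ops for the disk/RAM/CPU/board swap
        AT_DEPOT = auto()      # anything complicated ships out for diagnostics and repair
        REINSTALLING = auto()  # reimaged and verified before rejoining the fleet

    class Machine:
        def __init__(self, name):
            self.name = name
            self.state = State.SERVING

        def repair_cycle(self, needs_depot=False):
            # The whole point is to minimize time spent outside SERVING.
            self.state = State.DRAINING
            self.state = State.WITH_HW_OPS
            if needs_depot:
                self.state = State.AT_DEPOT
            self.state = State.REINSTALLING
            self.state = State.SERVING

    m = Machine("rack42-slot07")
    m.repair_cycle(needs_depot=False)
    print(m.name, m.state.name)   # rack42-slot07 SERVING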

We did a disk upgrade on a whole datacenter that involved switching from 1TB to 2TB disks, or something like that (I am dating myself), and minimizing total downtime was so important that they hired temporary workers to work nights to get the swap done as quickly as possible. If I remember correctly that was part of the “holy cow gmail is out of space!” chaos, though, so there was added urgency.

replies(2): >>45022375 #>>45025880 #
10. semi-extrinsic ◴[] No.45019142[source]
https://amphenol-industrial.com/products/liquid-cooling-syst...
replies(1): >>45019421 #
11. jayd16 ◴[] No.45019151[source]
Maybe we can declutter things if they get PWoE (power and water over Ethernet) or just a USB-W standard.
replies(3): >>45019238 #>>45020658 #>>45025857 #
12. scrlk ◴[] No.45019181[source]
You can see the male quick disconnect fittings for the liquid cooling at each corner of the server in this photo:

https://substackcdn.com/image/fetch/$s_!8aMm!,f_auto,q_auto:...

Looks like the power connector is in the centre. I'm not sure if the backplane connectors are covered up by the orange plugs.

replies(1): >>45019717 #
13. Nition ◴[] No.45019238[source]
It worked for MONIAC.
14. tesseract ◴[] No.45019421{3}[source]
CPC is in this market too: https://www.cpcworldwide.com/Liquid-Cooling/Products/Blind-M...

Non-spill fluid quick disconnects are established tech in industries like medical, chemical processing, beverage dispensing, and hydraulic power, so there are plenty of design concepts to draw on.

15. Romario77 ◴[] No.45019717{3}[source]
It's not the center one; it's the side ones. The center seems to be a power supply.
16. Hilift ◴[] No.45020013[source]
And a 12V battery.
17. taneq ◴[] No.45020658[source]
Great, now I’ll have to figure out if my USB Hi-Flow cable is slowing down my USB Full-Flow drink bottle refilling.
replies(1): >>45020759 #
18. toast0 ◴[] No.45020759{3}[source]
It's not. Full flow is less than hi-flow... Your bottle might slow down other fills on the bus though :p
19. throwaway2037 ◴[] No.45022375{3}[source]

    > part of the “holy cow gmail is out of space!” chaos
This sounds like an interesting story. Can you share more details?
20. Cthulhu_ ◴[] No.45025857[source]
Related: I vaguely recall a concept from a Tesla charger for semis that proposed a charging cable with active coolant flowing through it to keep the wire cool.

edit: https://www.teslarati.com/tesla-liquid-cooled-supercharger-c...

replies(1): >>45027942 #
21. Cthulhu_ ◴[] No.45025880{3}[source]
I'd love to work in a datacenter at that scale sometime. It sounds like working in a warehouse where you get a list of orders (servers to remove and pick up), except at the scale of the Googles et al. that's hundreds of server replacements a day, plus production lines of new servers being built and existing ones being repaired or decommissioned.

It's a fascinating industry, but only in my head, since the only info you get about it is carefully polished articles and the occasional anecdote on HN, which is also carefully polished due to NDAs.

22. jayd16 ◴[] No.45027942{3}[source]
This is why I risk jokes on HN. Normally they're frowned upon, but it's worth it to find interesting tech that matches the eccentric idea.