
Google's Liquid Cooling

(chipsandcheese.com)
399 points giuliomagnifico | 5 comments
jonathaneunice No.45017586
It’s very odd, when mainframes (S/3x0, Cray, yadda yadda) have been extensively water-cooled for over 50 years and super-dense HPC data centers have used liquid cooling for at least 20, to hear Google-scale data center design compared to PC hobbyist rigs. Selective amnesia + laughably off-target point of comparison.
replies(6): >>45017651 #>>45017716 #>>45018092 #>>45018513 #>>45018785 #>>45021044 #
spankalee No.45017716
From the article:

> Liquid cooling is a familiar concept to PC enthusiasts, and has a long history in enterprise compute as well.

And for a while, the trend in data centers was toward more passive cooling at the individual servers and hotter operating temperatures. This is interesting because it reverses that trend a lot, possibly because of the per-row cooling.

replies(2): >>45018019 #>>45018087 #
dekhn No.45018087
We've basically been watching Google gradually re-discover all the tricks of supercomputing (and other high performance areas) over the past 10+ years. For a long time, websearch and ads were the two main drivers of Google's datacenter architecture, along with services like storage and jobs like mapreduce. I would describe the approach as "horizontal scaling with statistical multiplexing for load balancing".

That style of job worked well, but as Google has realized that it has more mission-critical high performance computing with unique workload characteristics (https://cloud.google.com/blog/topics/systems/the-fifth-epoch...), its infrastructure has had to undergo a lot of evolution to adapt.

Google PR has always been full of "look we discovered something important and new and everybody should do it", often for things that were effectively solved using that approach a long time ago. MapReduce is a great example of that- Google certainly didn't invent the concepts of Map or Reduce, or even the idea of using those for doing high throughput computing (and the shuffle phase of MapReduce is more "interesting" from a high performance computing perspective than mapping or reducing anyway).

replies(6): >>45018386 #>>45018588 #>>45018809 #>>45019953 #>>45020485 #>>45021776 #
liquidgecka No.45018386
As somebody who worked on Google data centers after coming from the high performance computing world, I can categorically say that Google is not “re-learning” old technology. In the early days (when I was there) they focused heavily on moving from thinking about computers to thinking about compute units. This is where containers and self-contained data centers came from. (That one was actually a joke inside Google, because it failed but was copied by all the other vendors for years after Google had given up on it.) They then stopped treating cooling as something that happens within a server case and started treating it as something that happens to a whole facility. That was the first major leap forward: they moved from cooling the facility and pushing conditioned air in, to cooling the air immediately behind the server.

Liquid cooling at Google scale is different from mainframe liquid cooling as well. Mainframes needed to move heat from the core out to the edges of the server, where traditional data center cooling would transfer it away to be conditioned. Google’s liquid cooling moves the heat completely outside of the building while it’s still in the liquid. That’s never been done before, as far as I am aware. Not at this scale, at least.

replies(4): >>45018554 #>>45018655 #>>45018697 #>>45022759 #
1. zer00eyz No.45018655
> cooling is moving the heat completely outside of the building while it’s still liquid.

We have been doing this for decades; it's how refrigerants work.

The part that is new is not having an air-interface in the middle of the cycle.

Water isn't the only coolant being looked at, mostly because high-pressure PtC (push-to-connect) fittings and monitoring/sensor hardware have evolved. If a coolant is more expensive but leaks don't destroy equipment and can be quickly isolated, then it becomes a cost/accounting question.

replies(3): >>45018753 #>>45019058 #>>45021915 #
2. liquidgecka No.45018753
> The part that is new is not having an air-interface in the middle of the cycle.

I wasn’t clear when I was writing, but this was the point I was trying to make: heat from the chip is carried in the same medium all the way from the chip to the exterior chiller, without intermediate transfers to a new medium.

3. marcosdumay No.45019058
The claim is that Google has larger pipes that go all the way out of the building, while mainframes have short pipes that go only to a heat exchanger at the end of the rack.

IMO, it's not a big difference. There are probably many details more noteworthy than this. And yeah, mainframes are that way because the vendor only designs them up to the rack level, while Google has the "vendor" design the entire datacenter. Supercomputers have had single-vendor datacenters for decades, and have been using large pipes for a while too.

4. cyberax No.45021915
Glycol is cheap and safe, but it has a lower specific heat capacity and a higher viscosity. That's why water is still being used.

The next step is probably evaporative cooling, with liquid coolant ("freon") pumped to individual racks.
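The specific heat/viscosity trade-off can be put in rough numbers. A minimal sketch, where the property values are approximate room-temperature textbook figures and the 100 kW load and 10 K temperature rise are illustrative assumptions, none of them from the thread or the article:

```python
# Rough comparison of water vs. propylene glycol as a single-phase coolant.
# All property values are approximate (assumptions, not from the article);
# the heat load and temperature rise are hypothetical.

def mass_flow_kg_s(heat_load_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow needed to carry a heat load at a given coolant temperature
    rise, from the energy balance Q = m_dot * cp * dT."""
    return heat_load_w / (cp_j_per_kg_k * delta_t_k)

HEAT_LOAD_W = 100_000  # hypothetical 100 kW of IT load on one loop
DELTA_T_K = 10.0       # allowed coolant temperature rise across the loop

coolants = {
    # name: (specific heat J/(kg*K), dynamic viscosity mPa*s), both approximate
    "water":            (4180.0, 1.0),
    "propylene glycol": (2500.0, 50.0),
}

for name, (cp, viscosity) in coolants.items():
    flow = mass_flow_kg_s(HEAT_LOAD_W, cp, DELTA_T_K)
    print(f"{name:>16}: {flow:.2f} kg/s needed; viscosity ~{viscosity} mPa*s")
```

Lower specific heat means more mass flow for the same load, and the much higher viscosity makes that extra flow disproportionately expensive to pump, which is the cost/accounting question mentioned upthread.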

replies(1): >>45022287 #
5. jabl ◴[] No.45022287[source]
Not sure what Google specifically is using here, but PG25 (25% propylene glycol, 75% water) is somewhat common in data center applications. The glycol takes care of avoiding microbial growth.
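As a rough illustration of what the 25/75 split costs thermally, here is a back-of-the-envelope estimate using a simple mass-weighted average of specific heats. Both the linear mixing rule and the property values are my approximations, not from the thread; real mixture data deviates somewhat:

```python
# Back-of-the-envelope specific heat of PG25 (25% propylene glycol, 75% water
# by mass), using a simple mass-weighted average. The mixing rule and the
# property values are approximations, not measured mixture data.

CP_WATER = 4180.0  # J/(kg*K), approximate
CP_PG = 2500.0     # J/(kg*K), approximate

def mix_cp(glycol_mass_fraction: float,
           cp_glycol: float = CP_PG,
           cp_water: float = CP_WATER) -> float:
    """Mass-weighted specific heat of a glycol/water mixture."""
    return glycol_mass_fraction * cp_glycol + (1.0 - glycol_mass_fraction) * cp_water

cp_pg25 = mix_cp(0.25)
print(f"PG25 cp ~ {cp_pg25:.0f} J/(kg*K), about {cp_pg25 / CP_WATER:.0%} of pure water")
```

Under these assumptions the mixture comes out around 3760 J/(kg*K), roughly 90% of pure water, so the biological protection costs on the order of 10% in heat capacity per unit mass.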