I posed this further down in a reply-to-a-reply but I should call it out a little closer to the top: the innovation here is not “we are using water for cooling”. The innovation is that they are directly cooling the servers with chillers that sit outside the facility. Most mainframes use water cooling to get the heat from the core out to the edges, where it can be picked up by traditional heatsinks and cooling fans. Even home PCs do this by moving the heat to a reservoir that can be cooled more effectively.
What Google is doing is using the huge chillers that would normally cool the air in the facility to chill water that is pumped directly into every server. The return water is then cooled in the chiller tower. This eliminates ANY air-based transfer aside from the chiller tower itself. And this isn't being done on a server or a rack; it's being done across the whole data center at once.
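To put rough numbers on why it's worth piping the water all the way to the chip, here's a back-of-envelope sketch. The chip power and allowed temperature rise are assumptions for illustration, not Google's figures; the fluid properties are standard textbook values:

    # Back-of-envelope: coolant flow needed to remove one chip's heat.
    # Q = m_dot * c_p * dT  ->  m_dot = Q / (c_p * dT)
    # Assumed numbers, for illustration only.

    CHIP_POWER_W = 1000.0   # assume a ~1 kW accelerator package
    DELTA_T_K = 10.0        # assume a 10 K allowed coolant temperature rise

    CP_WATER, RHO_WATER = 4186.0, 997.0   # J/(kg*K), kg/m^3 (approx., near room temp)
    CP_AIR, RHO_AIR = 1005.0, 1.2         # J/(kg*K), kg/m^3 (approx., near room temp)

    def flow_l_per_min(power_w, cp, rho, delta_t):
        """Volume flow (litres/minute) needed to carry power_w at a given dT."""
        mass_flow = power_w / (cp * delta_t)      # kg/s
        return mass_flow / rho * 1000.0 * 60.0    # m^3/s -> L/min

    print(f"water: {flow_l_per_min(CHIP_POWER_W, CP_WATER, RHO_WATER, DELTA_T_K):.1f} L/min")  # ~1.4
    print(f"air:   {flow_l_per_min(CHIP_POWER_W, CP_AIR, RHO_AIR, DELTA_T_K):.0f} L/min")      # ~5000

Same heat and the same temperature rise, but roughly three orders of magnitude less volume to move, which is why a direct water loop beats blowing chilled air across heatsinks once densities get high enough.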
I am super curious how they handle things like chiller maintenance or pump failures. I am sure they have redundancy, but the system for that has to be super impressive because it can't be offline for long before you start experiencing hardware failures!
[Edit: It was pointed out in another comment that AWS is doing this as well and honestly their pictures make it way clearer what is happening: https://www.aboutamazon.com/news/aws/aws-liquid-cooling-data...]
It does sound like the connections involve water lines, though. Since they are isolating different water circuits, in theory they could have a dry connection between heat-exchanger plates, or one made through thermal paste, but it doesn't sound like they're doing that.
In my day we had software that would “drain” a machine and release it to hardware ops so they could swap the hardware. That could be a drive, memory, a CPU, or a motherboard. If it was even slightly complicated they would ship it to Mountain View for diagnostics and repair. But every machine was expected to be cycled back into service as fast as possible.
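For readers who haven't seen this kind of workflow, here is a minimal sketch of what a drain/release/return cycle might look like. The names and states are hypothetical, a toy illustration rather than Google's actual tooling:

    from enum import Enum, auto

    class MachineState(Enum):
        IN_SERVICE = auto()   # taking work from the scheduler
        DRAINING = auto()     # no new work; existing work being moved off
        RELEASED = auto()     # handed to hardware ops for the swap

    class Machine:
        def __init__(self, name):
            self.name = name
            self.state = MachineState.IN_SERVICE
            self.jobs = ["job-a", "job-b"]   # placeholder workload

    def drain(machine):
        """Stop new work, move existing work elsewhere, then hand to hardware ops."""
        machine.state = MachineState.DRAINING
        while machine.jobs:
            print(f"migrating {machine.jobs.pop()} off {machine.name}")
        machine.state = MachineState.RELEASED
        print(f"{machine.name} released to hardware ops")

    def return_to_service(machine):
        """Hardware ops finished the swap; a real system would run burn-in checks here."""
        assert machine.state == MachineState.RELEASED
        machine.state = MachineState.IN_SERVICE
        print(f"{machine.name} back in service")

    node = Machine("rack42-node07")
    drain(node)
    return_to_service(node)

The hard part in practice is the scheduler side (moving the work off without anyone noticing), which this sketch hand-waves with a print statement.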
We did a disk upgrade on a whole datacenter that involved switching from 1TB to 2TB disks, or something like that (I am dating myself), and minimizing total downtime was so important that they hired temporary workers to work nights to get the swap done as quickly as possible. If I remember correctly that was part of the “holy cow gmail is out of space!” chaos, so there was added urgency.
> part of the “holy cow gmail is out of space!” chaos
This sounds like an interesting story. Can you share more details?