199 points angadh | 1 comments
GlenTheMachine ◴[] No.44393313[source]
Space roboticist here.

As with a lot of things, it isn't the initial outlay, it's the maintenance costs. Terrestrial datacenters have parts fail and get replaced all the time. The mass analysis given here -- which appears quite good, at first glance -- doesn't include any mass, energy, or thermal system numbers for the infrastructure you would need to have to replace failed components.

As a first cut, this would require:

- an autonomous rendezvous and docking system

- a fully railed robotic system, e.g. some sort of robotic manipulator that can move along rails and reach every card in every server in the system, which usually means a system of relatively stiff rails running throughout the interior of the plant

- CPU, power, comms, and cooling to support the above

- importantly, the ability of the robotic servicing system to replace itself. In other words, it would need to be at least two-fault tolerant -- which usually means dual wound motors, redundant gears, redundant harness, redundant power, comms, and compute. Alternatively, two or more independent robotic systems that are capable of not only replacing cards but also of replacing each other.

- regular launches containing replacement hardware

- ongoing ground support staff to deal with failures

The mass analysis also doesn't appear to include the massive number of heat pipes you would need to transfer the heat from the chips to the radiators. For an orbiting datacenter, that would probably be the single biggest mass allocation.

1. lumost ◴[] No.44396843[source]
I used to build and operate data center infrastructure. There is very limited reason to do anything more than a warranty replacement on a GPU. With a high-quality hardware vendor that properly engineers the physical machine, failure rates can be contained to less than 0.5% per year, particularly if the network has redundancy to avoid critical mass failures.

In this case, I see no reason to perform any replacements of any kind. Proper networked serial port and power controls would allow maintenance for firmware/software issues.
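
To put a rough number on the no-replacement case, here is a minimal sketch of how much of a fleet would still be working after several years at that failure rate. It assumes failures are independent and the annual rate stays constant over the whole period, which is a simplification I am adding and not something measured. The fleet size and the time horizons are hypothetical values chosen only for illustration.

```python
def surviving_fraction(annual_failure_rate: float, years: float) -> float:
    """Fraction of units still working after `years` with no replacements.

    Assumes independent failures at a constant annual rate (a simplifying
    assumption for this sketch, not a measured property of any hardware).
    """
    return (1.0 - annual_failure_rate) ** years


def expected_failures(fleet_size: int, annual_failure_rate: float, years: float) -> float:
    """Expected number of failed units after `years` with no replacements."""
    return fleet_size * (1.0 - surviving_fraction(annual_failure_rate, years))


if __name__ == "__main__":
    rate = 0.005      # the 0.5% per year upper bound quoted above
    fleet = 10_000    # hypothetical fleet size, for illustration only
    for years in (1, 5, 10):  # hypothetical horizons
        alive = surviving_fraction(rate, years)
        dead = expected_failures(fleet, rate, years)
        print(f"{years:>2} years: {alive:.1%} surviving, about {dead:.0f} of {fleet} failed")
```

Under those assumptions, roughly 95% of the units would still be working after ten years, so the capacity lost by never replacing anything stays in the single-digit percent range. A failure rate that rises with age or radiation exposure would change the picture, and this sketch does not model that.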