https://www.datacenterdynamics.com/en/news/elon-musk-xai-gas...
(1) the utilization factor over the obsolescence-limited "useful" life of the hardware; (2) the short-term (sub-month) scheduling of training jobs onto a physical cluster.
For (1), it's acceptable to sit idle for, on average, one month per year, as long as that makes the electricity opex low enough.
For (2), yeah: large-scale pre-training jobs that spend millions in compute on what's overall "one single" job can often wait a few days to a very few weeks. That's about the delay you'd get from just dropping the HPC cluster to standby power/deep sleep on the p10 worst days each year, as far as renewable yield in the grid-capacity-limited surroundings of the datacenter goes. And if you can further run the systems a little power-tuned rather than performance-tuned when power is less plentiful, so that you average only 90% of theoretical compute throughput during cluster operating hours (this is in addition to turning it off for about a month's worth of time), you could reduce power production and storage capacity a good chunk further.
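To put rough numbers on how those two effects combine, here's a back-of-the-envelope sketch. The function name and the 30-day reading of "about a month" are my own assumptions for illustration; only the 90% throughput figure comes straight from the comment above.

```python
def effective_utilization(days_off_per_year: float,
                          throughput_when_on: float,
                          days_per_year: float = 365.0) -> float:
    """Fraction of theoretical annual compute actually delivered.

    days_off_per_year: days spent in standby/deep sleep (assumed value).
    throughput_when_on: average fraction of peak throughput while running.
    """
    uptime_fraction = 1.0 - days_off_per_year / days_per_year
    return uptime_fraction * throughput_when_on


# Assumption: "about a month" is taken as 30 days of downtime per year.
util = effective_utilization(days_off_per_year=30, throughput_when_on=0.90)

# Extra hardware needed to deliver the same compute as an always-on,
# performance-tuned cluster (1.0 would mean no extra hardware).
hardware_factor = 1.0 / util

print(f"effective utilization: {util:.3f}")
print(f"hardware needed for equal compute: {hardware_factor:.2f}x")
```

Under these assumptions the cluster delivers a bit over 80% of its theoretical annual compute, so the question becomes whether the savings on generation and storage capacity outweigh buying roughly a fifth more hardware for the same output.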