(Disclaimer: I am not an anti-AI guy — I am just listing the common talking points I see in my feeds.)
My strong intuition at the moment is that the environmental impact is greatly exaggerated.
The energy cost of executing prompts has dropped enormously over the past two years - something that's reflected in this report when it says "Driven by increasingly capable small models, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November 2022 and October 2024". I wrote a bit about that here: https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-envi...
We still don't have great numbers on training costs for most of the larger labs, which are likely extremely high.
Llama 3.3 70B cost "39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware", which they calculated as 11,390 tons CO2eq. I tried to compare that to fully loaded passenger jet flights between London and New York and got somewhere between 28 and 56 flights, but I then completely lost confidence in my ability to run those calculations credibly, because I don't understand nearly enough about how CO2eq is calculated in different industries.
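For anyone who wants to sanity-check the quoted training figures, here is a rough sketch of the arithmetic. It assumes every GPU hour draws the full 700W TDP, ignores datacenter overhead such as cooling, and treats "tons" as metric tons; none of that is stated in the quote, so read the results as ballpark values only.

```python
# Figures quoted above for Llama 3.3 70B.
GPU_HOURS = 39.3e6              # H100-80GB GPU hours
TDP_W = 700                     # watts per GPU (assumed to be the actual draw)
REPORTED_TONNES_CO2EQ = 11_390  # assumed to be metric tons

# Energy = power x time. Watts x hours gives watt-hours.
energy_wh = GPU_HOURS * TDP_W
energy_gwh = energy_wh / 1e9
energy_kwh = energy_wh / 1e3

# Carbon intensity implied by the reported emissions, in grams CO2eq per kWh.
implied_g_per_kwh = REPORTED_TONNES_CO2EQ * 1e6 / energy_kwh

print(f"Training energy: {energy_gwh:.2f} GWh")
print(f"Implied carbon intensity: {implied_g_per_kwh:.0f} g CO2eq/kWh")
```

Under those assumptions the energy comes to about 27.5 GWh, and the reported emissions imply roughly 414 g CO2eq per kWh. That only describes what the two quoted numbers imply together. It says nothing about how the CO2eq figure was actually derived.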
The "LLMs are an environmental catastrophe" messaging has become so firmly ingrained in our culture that I think it would benefit the AI labs themselves enormously if they were more transparent about the actual numbers.
While a single query might have become more efficient, we would also have to relate this to the increased overall volume of queries. E.g., over the last few years, how many more users are there, and how many more queries does each user make?
My feeling is that it's Jevons paradox all over again.
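A minimal sketch of the Jevons-style tradeoff, to make the question concrete. Total energy is per-query energy times query volume, so the total only falls if volume grows by less than the efficiency gain. The 280-fold figure quoted upthread is an inference cost drop, not a measured energy drop, and the volume growth numbers below are purely hypothetical placeholders.

```python
def total_energy_ratio(efficiency_gain: float, volume_growth: float) -> float:
    """Return new total energy divided by old total energy.

    efficiency_gain: factor by which per-query energy fell (e.g. 280.0)
    volume_growth:   factor by which the number of queries rose
    """
    return volume_growth / efficiency_gain


# Assumption: treat the 280-fold cost drop as if it were an energy drop.
EFFICIENCY_GAIN = 280.0

# Hypothetical volume growth scenarios, not measured data.
for growth in (100.0, 280.0, 1000.0):
    ratio = total_energy_ratio(EFFICIENCY_GAIN, growth)
    trend = "down" if ratio < 1 else "flat" if ratio == 1 else "up"
    print(f"{growth:>6.0f}x more queries -> total energy x{ratio:.2f} ({trend})")
```

The break-even point is volume growth equal to the efficiency gain. Without real usage numbers from the labs, which side of that line we are on stays an open question.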
Individual inferences are extremely low impact. Additionally, it will be almost impossible to assess the net effect due to the complexity of the downstream interactions.
At 40M GPU hours at 700W each, spreading the training energy over 160 million queries gets you 175Wh per query. That's less than the energy required to boil a pot of pasta. This is merely an upper bound - it's near certain that many times more queries will be run over the life of the model.
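The amortization above can be reproduced in a few lines. This is a sketch using the rounded figures from the comment (40M GPU hours, 700W, 160 million queries); the function name is just for illustration, and the second scenario uses a hypothetical query count to show how the per-query share shrinks as lifetime usage grows.

```python
def training_energy_per_query_wh(gpu_hours: float, gpu_power_w: float,
                                 queries: float) -> float:
    """Training energy (watt-hours) amortized evenly over a number of queries."""
    return gpu_hours * gpu_power_w / queries


GPU_HOURS = 40e6     # rounded training compute from the comment
GPU_POWER_W = 700    # assumes each GPU draws its full TDP
QUERIES = 160e6      # query count used in the comment

per_query = training_energy_per_query_wh(GPU_HOURS, GPU_POWER_W, QUERIES)
print(f"{per_query:.1f} Wh per query")

# Hypothetical: ten times as many queries over the model's lifetime.
per_query_10x = training_energy_per_query_wh(GPU_HOURS, GPU_POWER_W, 10 * QUERIES)
print(f"{per_query_10x:.1f} Wh per query at 10x the volume")
```

Note this only covers amortized training energy. The energy spent serving each query is a separate term and is not included here.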
Can you quantify how much less driving resulted from the increase of LLM usage? I doubt you can.