
170 points PaulHoule | 9 comments
1. pama ◴[] No.45124182[source]
Sauro, if you read this, please refrain from such low-content speculative statements:

“On a loose but telling note, this is still three decades short of the number of neural connections in the human brain, 10^15, and yet they consume some one hundred million times more power (GWatts as compared to the very modest 20 Watts required by our brains).”

No human brain could have had time to read all the material in a modern LLM training run, even if it had read eight hours a day since humans first appeared over 300,000 years ago. More to the point, LLM inference is far more energy-efficient than human inference: look at the energy cost of a B200 decoding a 671B-parameter model and estimate the energy needed to generate a book's worth of text as part of a larger batch. The main reason inference has a large total energy cost is that we are serving hundreds of millions of people with the same model. No human has this kind of scaling capability.
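
A back-of-envelope version of that estimate; every figure below is an assumption I'm making for illustration, not a measurement:

    # Energy per "book" of generated text; all numbers are rough assumptions.
    NODE_POWER_W = 14_000         # assumed draw of an 8x B200 node, incl. host
    DECODE_TOKENS_PER_S = 10_000  # assumed aggregate throughput at high batch
    BOOK_TOKENS = 120_000         # ~90k words, a typical book

    llm_kwh = NODE_POWER_W / DECODE_TOKENS_PER_S * BOOK_TOKENS / 3.6e6

    BRAIN_POWER_W = 20            # the ~20 W figure from the quoted article
    WRITING_HOURS = 500           # assumed human time to draft a book
    human_kwh = BRAIN_POWER_W * WRITING_HOURS * 3600 / 3.6e6

    print(f"LLM:   ~{llm_kwh:.2f} kWh/book")                  # ~0.05 kWh
    print(f"Human: ~{human_kwh:.0f} kWh/book (brain only)")   # ~10 kWh

Even if these assumed numbers are off by an order of magnitude, the per-book gap stays large.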

replies(4): >>45124272 #>>45124735 #>>45125473 #>>45127090 #
2. vrighter ◴[] No.45124272[source]
And yet, the human brain is still (way, way, way) more capable than LLMs at actual thinking. They're as wide as an ocean and as shallow as a puddle in a pothole. And we didn't need to read all of the internet to do it.

As for the "write a book" part: sure, the LLM will write a book quickly, but a significant chunk of it will be bullshit. All of it is hallucinated; the stopped clock just happens to be right some of the time.

No humans have this scaling capability? Then what do you call the reproductive cycle? Lots of smaller brains, each one possibly specialized in a few fields, together containing all of human knowledge. And you might say "that's not the same thing!", to which I reply: let's not kid ourselves, Mixture-of-Experts describes exactly this.
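
To make the analogy concrete, a minimal sketch of top-k MoE routing (Switch-style gating; all sizes and weights below are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    N_EXPERTS, TOP_K, D = 8, 2, 16

    experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]  # "specialists"
    router = rng.normal(size=(D, N_EXPERTS))                       # gating weights

    def moe_layer(x):
        logits = x @ router                  # score each expert for this token
        top = np.argsort(logits)[-TOP_K:]    # pick the k best specialists
        gates = np.exp(logits[top] - logits[top].max())
        gates /= gates.sum()                 # softmax over the chosen experts
        # only the selected experts do any work; the rest stay idle
        return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

    y = moe_layer(rng.normal(size=D))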

replies(1): >>45124515 #
3. throwaway314155 ◴[] No.45124515[source]
Agency may be better understood through Michael Levin's approach, where a lifeform is something that can achieve the same goal by various methods (i.e., it is robust).

Having said that, you can now simply move the goalposts: while one human cannot read that much in that amount of time, the collective of all humans certainly can, or can at least approximate it in a fashion similar to LLMs.

Since each of us can reap the benefits of the collective, those benefits are distributed back to individuals as needed.

replies(1): >>45126609 #
4. wolvesechoes ◴[] No.45124735[source]
I didn't have to read every textbook, web article, or blog post about numerical methods, yet I am capable of implementing a production-ready ODE solver, and LLMs are not (I use this example because it is what I experienced). Clearly human supremacy.
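
For contrast, here is a minimal sketch of the textbook fixed-step RK4 integrator, which is precisely not production-ready: it lacks the adaptive step-size control, error estimation, and stiffness handling a real solver needs.

    import numpy as np

    # Fixed-step classical Runge-Kutta 4: the textbook baseline,
    # not a production solver.
    def rk4(f, y0, t0, t1, n_steps):
        h = (t1 - t0) / n_steps
        t, y = t0, np.asarray(y0, dtype=float)
        for _ in range(n_steps):
            k1 = f(t, y)
            k2 = f(t + h / 2, y + h / 2 * k1)
            k3 = f(t + h / 2, y + h / 2 * k2)
            k4 = f(t + h, y + h * k3)
            y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
            t += h
        return y

    # dy/dt = -y from y(0) = 1; exact y(1) = e**-1 ~ 0.3679
    print(rk4(lambda t, y: -y, 1.0, 0.0, 1.0, 100))
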
5. skeezyboy ◴[] No.45125473[source]
> The main reason for the large energy costs of inference is that we are serving hundreds of millions of people with the same model.

It's because that's how LLMs work, not because they're so popular.

6. pama ◴[] No.45126609{3}[source]
Certainly all of humanity could approximate the reading of an LLM, for now. Humanity's energy consumption is about 20 TW at this point, over 20,000 times higher than a next-generation LLM training run. And just as LLM training does not spend energy only on the "brain", neither does humanity.
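
A quick scale check on those figures; both are assumed round numbers:

    HUMANITY_W = 20e12       # ~20 TW of primary power for all of humanity
    TRAINING_RUN_W = 1e9     # assumed ~GW-scale next-generation cluster
    print(HUMANITY_W / TRAINING_RUN_W)   # 20000.0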
replies(1): >>45145646 #
7. mikewarot ◴[] No.45127090[source]
> The main reason for the large energy costs of inference is that we are serving hundreds of millions of people with the same model. No humans have this type of scaling capability.

Using CPUs, GPUs, or even tensor units involves waiting for data to move between RAM and compute. My understanding is that most of the power in LLM compute is consumed at that stage, and I further believe that ~95% savings are possible by merging memory and compute into a universal computing fabric.
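
The usual numbers behind that intuition are the order-of-magnitude, per-operation energies commonly cited from Horowitz's ISSCC 2014 talk (45 nm process; treat them as assumptions here):

    PJ_FP32_MULT = 3.7       # one 32-bit floating-point multiply
    PJ_DRAM_READ = 640.0     # fetching one 32-bit word from off-chip DRAM

    # If every operand had to come from DRAM, moving the data would cost
    # two orders of magnitude more than the arithmetic itself:
    print(PJ_DRAM_READ / PJ_FP32_MULT)   # ~173x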

Alternatively, I'm deep in old-man-with-a-goofy-idea territory. Only time will tell.

replies(1): >>45148750 #
8. throwaway314155 ◴[] No.45145646{4}[source]
Well said
9. pama ◴[] No.45148750[source]
There is room for improvement in inference, hence the various startups in this space and the increased pace of software innovation. Large NVIDIA clusters are still cost-optimal for scaling inference (they move most of the memory transfers that bottleneck smaller setups out of the critical path), and their energy cost is trivial compared to the cost of the hardware, but these conditions may change.

Training is nearly fully compute-bound, and NVIDIA/CUDA provide decent abstractions for it, at least for now. We still need new ideas if training is to scale another ten orders of magnitude in compute, but those ideas may not be practical for another decade.
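
A sketch of why that split exists, in terms of arithmetic intensity; the shapes and the machine-balance figure are assumptions for illustration:

    # FLOPs per byte of weight traffic for a (B x D) @ (D x D) matmul.
    D = 8192                             # assumed hidden size

    def flops_per_weight_byte(batch):
        flops = 2 * batch * D * D        # multiply-accumulates
        weight_bytes = 2 * D * D         # bf16 weights read once
        return flops / weight_bytes

    print(flops_per_weight_byte(1))      # batch-1 decode:  1 FLOP/byte
    print(flops_per_weight_byte(4096))   # training batch:  4096 FLOPs/byte
    # A modern accelerator needs on the order of hundreds of FLOPs per byte
    # of HBM bandwidth to stay busy, so small-batch decode is memory-bound
    # while large-batch training is compute-bound.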