Backpropagation is a leaky abstraction (2016)

(karpathy.medium.com)

346 points swatson741 | 3 comments | 02 Nov 25 05:20 UTC | HN request time: 0.439s | source

Show context

gchadwick ◴[02 Nov 25 07:20 UTC] No.45788468[source]▶

Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material from short articles like this, longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

replies(2): >>45788631 #>>45788885 #

kubb ◴[02 Nov 25 09:03 UTC] No.45788885[source]▶

>>45788468 #

I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.

replies(9): >>45788975 #>>45789023 #>>45789131 #>>45789241 #>>45789316 #>>45789676 #>>45789975 #>>45791483 #>>45791925 #

teiferer ◴[02 Nov 25 09:19 UTC] No.45788975[source]▶

>>45788885 #

Which is terrible. That's the root of all the BS around LLMs. People lacking understanding of what they are and ascribing capabilities which LLMs just don't have, by design. Even HN discussions are full of that. Even though this page literally has "hacker" in its name.

replies(2): >>45788987 #>>45789478 #

tim333 ◴[02 Nov 25 11:13 UTC] No.45789478[source]▶

>>45788975 #

I see your point but on the other hand a lot of conversations go: A: what will we do when AI do all the jobs, B: that's silly LLMs can't do the jobs. The thing is A didn't say LLM, they said AI as in whatever that will be a short while into the future. Which is changing rapidly because thousands of bright people are being paid to change it.

replies(2): >>45790308 #>>45793560 #

HarHarVeryFunny ◴[02 Nov 25 13:44 UTC] No.45790308[source]▶

>>45789478 #

The trouble is that "AI" is also very much a leaky abstraction, which makes it tempting to see all the "AI" advances of recent years, then correctly predict that these "AI" advances will continue, but then jump to all sorts of wrong conclusions about what those advances will be.

For example, things like "AI" image and video generation are amazing, as are things like AlphaGo and AlphaFold, but none of these have anything to do with LLMs, and the only technology they share with LLMs is machine learning and neural nets. If you lump these together with LLMs, calling them all "AI", then you'll come to the wrong conclusion that all of these non-LLM advances indicate that "AI" is rapidly advancing and therefore LLMs (also being "AI") will do too ...

Even if you leave aside things like AlphaGo, and just focus on LLMs, and other future technology that may take all our jobs, then using terms like "AI" and "AGI" are still confusing and misleading. It's easy to fall into the mindset that "AGI" is just better "AI", and that since LLMs are "AI", AGI is just better LLMs, and is around the corner because "AI" is advancing rapidly ...

In reality LLMs are, like AlphaFold, something highly specific - they are auto-regressive next-word predictor language models (just as a statement of fact, and how they are trained, not a put-down), based on the Transformer architecture.

The technology that could replace humans for most jobs in the future isn't going to be a better language model - a better auto-regressive next-word predictor - but will need to be something much more brain like. The architecture itself doesn't have to be brain-like, but in order to deliver brain-like functionality it will probably need to include another half-dozen "Transformer-level" architectural/algorithmic breakthroughs including things like continual learning, which will likely turn the whole current LLM training and deployment paradigm on it's head.

Again, just focusing on LLMs, and LLM-based agents, regarding them as a black-box technology, it's easy to be misled into thinking that advances in capability are broadly advancing, and will rise all ships, when in reality progress is much more narrow. Headlines about LLMs achievement in math and competitive programming, touted as evidence of reasoning, do NOT imply that LLM reasoning is broadly advancing, but you need to get under the hood and understand RL training goals to realize why that is not necessarily the case. The correctness of most business and real-world reasoning is not as easy to check as is marking a math problem as correct or not, yet that capability is what RL training depends on.

I could go on .. LLM-based agents are also blurring the lines of what "AI" can do, and again if treated as a black box will also misinform as to what is actually progressing and what is not. Thousands of bright people are indeed working on improving LLM-adjacent low-hanging fruit like this, but it'd be illogical to conclude that this is somehow helping to create next-generation brain-like architectures that will take away our jobs.

replies(1): >>45791130 #

tim333 ◴[02 Nov 25 15:44 UTC] No.45791130[source]▶

>>45790308 #

I'll give you algorithmic breakthroughs have been quite slow to come about - I think backpropagation in 1986 and then transformers in 2017. Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like. I recently read how LLMs work and was surprised how text focused it is, making word vectors and not physical understanding.

replies(3): >>45791310 #>>45792439 #>>45793612 #

dontlikeyoueith ◴[02 Nov 25 18:46 UTC] No.45792439[source]▶

>>45791130 #

> Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like

That's because you, as you admit in the next sentence, have almost no understanding of how they work.

Your reasoning is on the same level as someone in the 1950s thinking ubiquitous flying cars are just a few years away. Or fusion power, for that matter.

In your defense, that seems to be about the average level of engagement with this technology, even on this website.

replies(1): >>45797824 #

1. tim333 ◴[03 Nov 25 10:59 UTC] No.45797824[source]▶

>>45792439 #

Maybe but the flying cars and fusion ran into fundamental barriers of the physics being hard. With human level intelligence though we have evidence it's possible from our brains which seem to use less compute than the LLMs going by power usage so I don't see a fundamental barrier to it just needing some different code.

replies(2): >>45799959 #>>45803704 #

2. HarHarVeryFunny ◴[03 Nov 25 15:16 UTC] No.45799959[source]▶

>>45797824 (TP) #

You could say there is no fundamental barrier to humans doing anything that is allowed by the laws of Physics, but that is not a very useful statement, and doesn't indicate how long it may take.

Since nobody has yet figured out how to build an artificial brain, having that as a proof it's possible doesn't much help. It will be decades or more before we figure out how the brain works and are able to copy that, although no doubt people will attempt to build animal intelligence before fully knowing how nature did it.

Saying that AGI "just needs some different code" than an LLM is like saying that building an interstellar spaceship "just needs some different parts than a wheelbarrow". Both are true, and both are useless statements offering zero insight into the timeline involved.

3. dontlikeyoueith ◴[03 Nov 25 20:01 UTC] No.45803704[source]▶

>>45797824 (TP) #

> I don't see a fundamental barrier to it

Neither did the people expecting fusion power and flying cars to come quickly.

We have just as much evidence that fusion power is possible as we do that human level intelligence is possible. Same with small vehicle flight for that matter.

None of that makes any of these things feasible.

↑