139 points | obscurette | 1 comment

wyager No.44465546
> Large language models are impressive statistical text predictors — genuinely useful tools that excel at pattern matching and interpolation.

Slightly OT: It's interesting how many (smart!) people in tech, like the author of this article, still can't conceptualize the difference between training objective and learned capability. I wonder at this point if it's a sort of willful ignorance adopted as a psychological protection mechanism. I wonder if they're going to experience a moment of severe shock, just gradually forget that they held these opinions, or take on a sort of delusional belief that AI can't do XYZ despite all mounting evidence to the contrary.

replies(2): >>44465622 #>>44465973 #
ReptileMan No.44465622
Can you elaborate a bit?
replies(1): >>44466111 #
gjm11 No.44466111
(Not GP, but:)

LLMs' initial training is specifically for token-prediction.

However, this doesn't mean that what they end up doing is specifically token-prediction (except in the sense that anything that generates textual output can be described as doing token-prediction). Nor does it mean that the only things they can do are tasks most naturally described in terms of token-prediction.
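
To make "trained for token-prediction" concrete, the pretraining objective is roughly the following (a minimal PyTorch-style sketch; "model" is a placeholder for any autoregressive network that returns per-position logits, not any particular real implementation):

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, tokens):
        # Standard language-modelling objective: predict token t+1 from tokens 0..t.
        # Nothing in this loss mentions "multiply", "hash", or "write Rust" -- any
        # such capability, if it shows up, is whatever the model had to learn in
        # order to drive this loss down.
        logits = model(tokens[:-1])                 # logits for positions 1..n, shape (n-1, vocab)
        return F.cross_entropy(logits, tokens[1:])  # compare against the actual next tokens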

For instance, suppose you successfully train something to predict the next token given input of the form "[lengthy number] x [lengthy number] = ", where "successfully" means that the system ends up able to predict correctly almost all the time even when the numbers are ones it hasn't seen before. How could it do that? Only by, in some sense, "learning to multiply". (I haven't checked but my hazy recollection is that somewhere around GPT-3.5 or GPT-4 LLMs went from not being able to do this at all to being able to do it fairly well on moderate-sized numbers.)
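
A rough sketch of the kind of probe this implies, in Python (query_model is a hypothetical stand-in for whatever model API you call; the point is only that ground truth is cheap to compute and the exact number pairs are almost certainly unseen):

    import random

    def multiplication_probe(query_model, digits=6, trials=20):
        # Ask the model to complete "[lengthy number] x [lengthy number] = "
        # and score exact matches against real arithmetic.
        correct = 0
        for _ in range(trials):
            a = random.randrange(10 ** (digits - 1), 10 ** digits)
            b = random.randrange(10 ** (digits - 1), 10 ** digits)
            completion = query_model(f"{a} x {b} = ").strip()  # hypothetical model call
            correct += completion == str(a * b)                # exact ground truth
        return correct / trials

A model that scores well here on numbers it has never seen has, in whatever internal form, learned something functionally equivalent to multiplication.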

Or suppose you successfully train something to complete things of the form "The SHA256 hash of [lengthy string] is "; again, a system that could do that correctly would have to have, in some sense, "learned to implement SHA256". (I am pretty sure that today's LLMs cannot do this, though of course they might have learned to call out to a tool that can.)
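
For comparison, the forward direction is trivial to state and to verify (ordinary Python, no assumptions about any model):

    import hashlib

    def sha256_completion_pair(s):
        # Build the prompt/target pair described above. Producing the target
        # requires actually computing SHA256: flipping one input bit scrambles
        # the whole digest, so there is no statistical shortcut to memorise.
        prompt = f"The SHA256 hash of {s} is "
        target = hashlib.sha256(s.encode()).hexdigest()
        return prompt, target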

If you successfully train something to complete things of the form "One grammatical English sentence whose SHA256 hash is [value] is " then that thing has to have "learned to break SHA256". (I am very sure that today's LLMs cannot do this and I think it enormously unlikely that any ever will be able to.)

If you successfully train something to complete things of the form "The complete source code for a program written in idiomatic Rust that does [difficult task] is " then that thing has to have "learned to write code in Rust". (Today's LLMs can kinda do some tasks like this, and there are a lot of people yelling at one another about just how much they can do.)

That is: some token-prediction tasks can only be accomplished by doing things that we would not normally think of as being about token prediction. This is essentially the point of the "Turing test".

For the avoidance of doubt, I am making no particular claims (beyond the illustrative ones explicitly made above) about what, if anything, today's LLMs, plausible near-future LLMs, or other further-future AI systems are able to do that goes beyond what we would normally think of as token prediction. The point is that whether or not today's LLMs are "just stochastic parrots" in some useful sense, it doesn't follow from the fact that they are trained on token-prediction that that's all they are.

replies(1): >>44466336 #
MountainMan1312 No.44466336
It's like how, when you wrote that comment, the thing you were doing wasn't "operating your finger muscles".