170 points bookofjoe | 9 comments
slibhb No.43644865
LLMs are statistical models trained on human-generated text. They aren't the perfectly logical "machine brains" that Asimov and others imagined.

The upshot of this is that LLMs are quite good at the stuff Asimov thought only humans would be able to do. What they aren't so good at (yet) is really rigorous reasoning, exactly the opposite of what 20th-century people assumed.
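
A minimal sketch of what "statistical model trained on human-generated text" means in practice: a toy bigram model in Python. Real LLMs are transformers with billions of parameters, but the core idea of predicting the next token from frequencies observed in text is the same:

    import random
    from collections import Counter, defaultdict

    def train_bigram(text):
        # Count how often each word follows each other word.
        counts = defaultdict(Counter)
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
        return counts

    def generate(model, word, n=10):
        # Repeatedly sample the next word in proportion to how often
        # it followed the current word in the training text.
        out = [word]
        for _ in range(n):
            followers = model.get(word)
            if not followers:
                break
            word = random.choices(list(followers),
                                  weights=list(followers.values()))[0]
            out.append(word)
        return " ".join(out)

    model = train_bigram("the cat sat on the mat and the dog sat on the rug")
    print(generate(model, "the"))  # e.g. "the cat sat on the rug"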

replies(5): >>43645899 #>>43646817 #>>43647147 #>>43647395 #>>43650058 #
wubrr No.43647147
> LLMs are statistical models trained on human-generated text.

I mean, not only human-generated text. Also, human brains are arguably statistical models trained on human-generated/collected data as well...

replies(2): >>43647173 #>>43647476 #
slibhb No.43647173
> Also, human brains are arguably statistical models trained on human-generated/collected data as well...

I'd say no, human brains are "trained" on billions of years of sensory data. A very small amount of that is human-generated.

replies(1): >>43647230 #
wubrr No.43647230
Almost everything we learn in schools, universities, most jobs, history, news, hackernews, etc. is literally human-generated text. Our brains have an efficient structure for learning language, one that has evolved over time, but the process of actually learning a language happens after you are born, based on human-generated text/voice. Things like balance/walking, motor control, speaking (physical voice control), and other physical skills are trained on sensory data, but there's no reason LLMs/AIs can't be trained on similar data (and in many cases they already are).
replies(1): >>43647571 #
1. skydhash No.43647571
What we generate is probably a function of our sensory data plus what we call creativity. At least humans still have access to the sensory data, so we can separate the two (with varying success).

LLMs have access to what we generate, but not to the source. So they embed how we may use words, but not why we use one word and not another.

replies(2): >>43647697 #>>43649780 #
2. wubrr No.43647697
> At least humans still have access to the sensory data

I don't understand this point - we can obviously collect sensory data and use that for training. Many AI/LLM/robotics projects do this today...

> So it embed how we may use words, but not why we use this word and not others.

Humans learn language by observing other humans use language, not by being taught explicit rules about when to use which word and why.

replies(1): >>43647979 #
3. skydhash No.43647979
> I don't understand this point - we can obviously collect sensory data and use that for training.

Sensory data is not the main issue; how we interpret it is.

In Jacob Bronowski's The Origins of Knowledge and Imagination, IIRC, there's an argument that our eyes are very coarse sensors: they perform only basic analysis, from which the brain infers the real world around us, combined with data from other organs. Like Plato's cave, but with many more dimensions.

But we humans all come with the same mechanisms, which interpret things in roughly the same way, so there's some commonality in the final interpretation.

> Humans learn language by observing other humans use language, not by being taught explicit rules about when to use which word and why.

Words are symbols that refer to things and to the relations between them. In the same book, there's a rough account of language that describes the three elements defining it: the symbols (or terms), the grammar (the rules for using the symbols), and a dictionary that maps the symbols to things, and the rules to interactions, in another domain we already accept as truth.

Maybe we are not taught the rules explicitly, but there's a lot of training done through corrections when we say a sentence incorrectly. We also learn the symbols and the dictionary as we grow and explore.

So LLMs learn the symbols and the rules, but not the whole dictionary. They can use the rules to create correct sentences, and relate some symbols to others, but ultimately there's no dictionary behind them.
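
A minimal sketch of that distinction, using a hypothetical toy grammar in Python: the rules produce perfectly well-formed sentences, but the "nouns" and "verbs" are nonsense words that refer to nothing, so there is no dictionary grounding them:

    import random

    # Toy grammar: symbols and rules for combining them, but no
    # "dictionary" mapping the symbols to anything in the world.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"]],
        "VP": [["V", "NP"]],
        "N":  [["borogove"], ["tove"]],    # nonsense nouns
        "V":  [["gyres"], ["outgrabes"]],  # nonsense verbs
    }

    def expand(symbol):
        if symbol not in GRAMMAR:  # terminal: an actual word
            return [symbol]
        production = random.choice(GRAMMAR[symbol])
        return [word for part in production for word in expand(part)]

    print(" ".join(expand("S")))  # e.g. "the tove gyres the borogove"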

replies(2): >>43648249 #>>43648880 #
4. wubrr No.43648249
> In the same book, there's a rough explanation for language which describe the three elements that define it: Symbols or terms, the grammar (or the rules for using the symbols), and a dictionary which maps the symbols to things and the rules to interactions in another domain that we already accept as truth.

There are two kinds of grammar for natural language: descriptive (how the language actually works and is used) and prescriptive (a set of rules for how the language should be used). There is no known complete and consistent rule-based grammar for any natural human language; every such grammar amounts to some person or group, at a particular point in time, selecting a subset of the real descriptive grammar and declaring 'this is the better way'. Prescriptive, rule-based grammar is not at all how humans learn their first language, nor is it generally complete or consistent. Babies can easily learn any language, even ones with no prescriptive grammar rules at all, just by observing; many studies have confirmed this.

> there's a lot of training done with corrections when we say a sentence incorrectly.

There's a lot of the same training for LLMs.

> So LLMs learn the symbols and the rules, but not the whole dictionary. It can use the rules to create correct sentences, and relates some symbols to other, but ultimately there's no dictionary behind it.

LLMs definitely learn 'the dictionary' (more accurately, a set of relations/associations between words and other kinds of data), and much better than humans do; and it's not as if such a 'dictionary' is some distinct, identifiable structure in the human brain anyway.
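
A minimal sketch of what such a learned "dictionary" of associations looks like in practice, using hypothetical embedding vectors invented for illustration (real models learn hundreds of dimensions from co-occurrence statistics in text, but the idea is the same: a word's "meaning" is its position relative to every other word):

    import numpy as np

    # Hypothetical 3-d "embeddings", made up for this example.
    vecs = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.9, 0.7, 0.3]),
        "apple": np.array([0.1, 0.2, 0.9]),
    }

    def cosine(a, b):
        # Cosine similarity: 1.0 means same direction, 0.0 unrelated.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(vecs["king"], vecs["queen"]))  # high: related words
    print(cosine(vecs["king"], vecs["apple"]))  # low: unrelated words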

5. jstanley No.43648880
> there's an argument that our eyes are very coarse sensors. Instead they do basic analysis from which the brain can infer the real world around us with other data from other organs

I don't buy it. I think our eyes are approximately as fine as we perceive them to be.

When you look through a pair of binoculars at a boat and some trees on the other side of a lake, the only organ getting a magnified view is the eyes, so any information you derive comes from the eyes and your imagination; it can't have been secretly inferred from other senses.

replies(1): >>43653630 #
6. throwaway7783 No.43649780
One can look at creativity as the discovery of a hitherto unknown pattern in a very large space of patterns.

There's no reason to think an LLM (a few generations down the line, if not now) cannot do that.

replies(1): >>43650668 #
7. skydhash No.43650668
Not really; sometimes it's just plausible lies. We distort the world but respect some basic rules, which makes the result believable. Another difference from LLMs is that we can store this distortion and build upon it as $TRUTH.

And we can distort quite far (see cartoons in drawing, dubstep in music, ...).

replies(1): >>43667663 #
8. andsoitis No.43653630
The brain turns the raw input from the eyes into the rich, layered visual experience we have of the world:

- basic features (color, brightness and contrast, edges and shapes, motion and direction)

- depth and spatial relationships

- recognition

- location and movement

- focus and attention

- prediction and filling in gaps

“Seeing” the real world requires much more than simply seeing with one eye.

9. throwaway7783 No.43667663
What you are saying does not seem to contradict what I'm saying. Any distortion would be another hitherto unknown pattern.