The upshot of this is that LLMs are quite good at the stuff that he thinks only humans will be able to do. What they aren't so good at (yet) is really rigorous reasoning, exactly the opposite of what 20th century people assumed.
I mean, not only human-generated text. Also, human brains are arguably statistical models trained on human-generated/collected data as well...
I'd say no, human brains are "trained" on billions of years of sensory data. A very small amount of that is human-generated.
LLMs have access to what we generate, but not to its source. So they embed how we use words, but not why we use one word and not another.
I don't understand this point - we can obviously collect sensory data and use that for training. Many AI/LLM/robotics projects do this today...
> So they embed how we use words, but not why we use one word and not another.
Humans learn language by observing other humans use language, not by being taught explicit rules about when to use which word and why.
Sensory data is not the main issue; how we interpret it is.
In Jacob Bronowski's The Origins of Knowledge and Imagination, IIRC, there's an argument that our eyes are very coarse sensors. Instead, they do basic analysis from which the brain, combining it with data from other sense organs, can infer the real world around us. Like Plato's cave, but with many more dimensions.
But we humans all come with the same mechanisms, which interpret things in roughly the same way. So there's some commonality there in the final interpretation.
> Humans learn language by observing other humans use language, not by being taught explicit rules about when to use which word and why.
Words are symbols that refer to things and the relations between them. In the same book, there's a rough explanation of language that describes the three elements defining it: the symbols or terms, the grammar (the rules for using the symbols), and a dictionary that maps the symbols to things, and the rules to interactions, in another domain we already accept as true.
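To make that three-part description concrete, here's a toy sketch (my own construction, not from the book): a handful of symbols, a grammar of rewrite rules over them, and a dictionary grounding the terminal symbols in another domain.

```python
# Three elements of a language, per the description above:
# symbols, grammar (rules over symbols), and a dictionary
# mapping symbols to a domain we already accept as true.
# All names and rules here are illustrative, not from the book.

# Grammar: rewrite rules stating how symbols may combine.
grammar = {
    "S": ["NP", "VP"],
    "NP": ["dog"],
    "VP": ["barks"],
}

# Dictionary: maps terminal symbols to things/relations in another domain.
dictionary = {
    "dog": "the animal Canis familiaris",
    "barks": "emits a sharp vocal sound",
}

def expand(symbol):
    """Recursively expand a symbol using the grammar's rules."""
    if symbol not in grammar:
        return [symbol]  # terminal symbol: grounded via the dictionary
    out = []
    for part in grammar[symbol]:
        out.extend(expand(part))
    return out

sentence = expand("S")
print(sentence)                          # ['dog', 'barks']
print([dictionary[w] for w in sentence])
```

The grammar alone is enough to produce well-formed sentences; the dictionary is what ties them to anything outside the symbol system, which is the part being argued about here.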
Maybe we are not taught the rules explicitly, but there's a lot of training done with corrections when we say a sentence incorrectly. We also learn the symbols and the dictionary as we grow and explore.
So LLMs learn the symbols and the rules, but not the whole dictionary. They can use the rules to create correct sentences, and relate some symbols to others, but ultimately there's no dictionary behind it.
There are two types of grammar for natural language: descriptive (how the language actually works and is used) and prescriptive (a set of rules about how a language should be used). There is no known complete and consistent rule-based grammar for any natural human language; all such grammars are the result of some person or group, at a particular point in time, selecting a subset of the real descriptive grammar and declaring 'this is the better way'. Prescriptive, rule-based grammar is not at all how humans learn their first language, nor is prescriptive grammar generally complete or consistent. Babies can easily learn any language, even ones that have no prescriptive grammar rules, just by observing; many studies confirm this.
> there's a lot of training done with corrections when we say a sentence incorrectly.
There's a lot of the same training for LLMs.
> So LLMs learn the symbols and the rules, but not the whole dictionary. They can use the rules to create correct sentences, and relate some symbols to others, but ultimately there's no dictionary behind it.
LLMs definitely learn 'the dictionary' (more accurately, a set of relations/associations between words and other types of data), and much better than humans do. And it's not as if such a 'dictionary' is a distinct, identifiable part of the human brain anyway.
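A minimal sketch of what "a set of relations/associations between words" means in practice (toy corpus and code of my own, far simpler than what an LLM does): count which words co-occur, then compare words by the contexts they share, roughly the distributional idea underlying word embeddings.

```python
# Toy distributional association: words that appear in similar
# contexts end up with similar co-occurrence vectors, with no
# explicit definition of any word ever supplied.
from collections import Counter
from math import sqrt

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate the cheese",
]

# word -> Counter of neighboring words within a window of 2
vectors = {}
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        context = words[max(0, i - 2):i] + words[i + 1:i + 3]
        vectors.setdefault(w, Counter()).update(context)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

# 'cat' and 'mouse' share contexts (both get chased, both follow
# 'the'), so their association score is high.
print(cosine(vectors["cat"], vectors["mouse"]))
```

Whether this kind of learned association network counts as a real 'dictionary' in the grounding sense is exactly the point under dispute above; the sketch only shows that relations between symbols can be extracted from usage alone.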