The theme that LLMs reproduce knowledge from their training data rather than reason about it seems like an argument that will end up looking wrong pretty soon.
When given the prompt “Which is heavier, one pound of feathers or two pounds of feathers?”, GPT-3.5 gives a bizarre answer: “One pound of feathers and two pounds of feathers both weigh the same amount, which is two pounds.” Presumably this is because circa 2016-2017 there was a large internet discussion of the riddle “which weighs more, a pound of feathers or a pound of steel”, and text from that discussion found its way into the model’s training data.
I see no reason why the training data would have changed to substantially exclude that discussion, and yet here is GPT-4: “Two pounds of feathers are heavier than one pound of feathers.” Improvements to the model appear to improve its ability to reason from the training data rather than merely reproduce it.
The theme that AI won’t replace e.g. lawyers because it is more knowledge base than reasoning engine also reminds me of early opinions in computer chess discussions, which held that computers were tactics solvers (short-term, several-move look-ahead to avoid forks and traps) rather than strategy planners (long-term construction of multi-piece attacks, protecting small advantages and growing them into large ones over many dozens of moves). With the benefit of hindsight, more of strategy turned out to be tactics in disguise than we had thought, and increasing compute produced real strategic capability besides.
Separately, there is another theme I see in their writing, and also in some of the comments here: that humans passing standardized tests are doing something fundamentally different from LLMs passing standardized tests. The only thing that’s ‘uniquely human’ is being human; everything else is outputs from a black box. Arguments that ‘what’s inside the black box matters’ are risky, because the outputs gradually converge to indistinguishability; there’s no bright line at which to step off that train, and pretty soon you end up like the person described in Boretti’s And Yet It Understands:
“There is a species of denialist for whom no evidence whatever will convince them that a computer is doing anything other than shuffling symbols without understanding them, because “Concepts” and “Ideas” are exclusive to humans (they live in the Leibniz organ, presumably, where they pupate from the black bile). … [These people are] so committed to human chauvinism [that] they will soon start denying their own sentience because their brains are made of flesh and not Chomsky production rules.”