They are fundamentally mathematical models which extrapolate from data points, and occasionally they will extrapolate in a way that is consistent with reality, i.e. they will approximate uncharted territory with reasonable accuracy.
Of course, being able to tell the difference (both for the human and the machine) is the real trick!
Reasoning seems to be a case where the model uncovers what, to some degree, it already "knows".
Conversely, some experimental models (e.g. Meta's work with Concepts) shift that compute to training time, i.e. they spend more compute per training token. Either way, they're mining "more meaning" out of the data by "working harder".
This is one area where I think synthetic data could have a big advantage. Training the next gen of LLMs on the results of the previous generation's thinking would mean that you "cache" that thinking -- the new model doesn't need to start from scratch every time, so it could solve problems more efficiently, and (given the same resources) it would be able to go further.
Of course, the problem here is that most reasoning is dogshit, and you'd need to first build a system smart enough to pick out the good stuff...
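To make the "caching" idea concrete, here's a minimal sketch (in Python) of what such a pipeline could look like. Everything in it is hypothetical: `generate()` stands in for whatever inference API you'd actually call, and the filter is deliberately naive (exact-match against a known answer), which is precisely the part that would need to be smart.

```python
import json

def generate(prompt: str) -> str:
    """Placeholder for a call to the current-generation model (hypothetical)."""
    raise NotImplementedError

def distill_reasoning(problems, out_path="reasoning_traces.jsonl"):
    """Keep only reasoning traces whose final answer checks out, as training data."""
    with open(out_path, "w") as f:
        for problem in problems:
            trace = generate(
                "Think step by step, then give a final answer.\n\n"
                + problem["question"]
            )
            # Crude stand-in for the "system smart enough to pick out the good
            # stuff": exact-match the last line against a known answer.
            final_line = trace.strip().splitlines()[-1]
            if problem["answer"] in final_line:
                f.write(json.dumps({
                    "prompt": problem["question"],
                    "completion": trace,  # the "cached" thinking
                }) + "\n")
```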
---
It occurs to me now that you were rather hoping for a concrete example. The ones that come to mind involve drawing parallels between seemingly unrelated things: noticing that, on some level, two things are the same shape.
I argue that noticing such a connection -- such a pattern -- and naming it constitutes new and useful knowledge. This is something I spend a lot of time doing (mostly for my own amusement!), and I've found that LLMs are surprisingly good at it. They can use known patterns to coherently describe previously unnamed ones.
In other words, they map concepts onto other concepts in ways that haven't been done before. What I mean is: I'll prompt the LLM with some such query, and it will "get it" in ways I wasn't expecting. The real trick would be to get it to do that on its own, i.e. without me prompting it (or, with current tech, to find a way of getting it to prompt itself that produces similar results... and then feed that into some kind of Novelty+Coherence filtering system, i.e. the "real trick" again... :).
A specific example eludes me now, but it's usually a matter of "X is actually a special case of Y", or "how does X map onto Y". It's pretty good at mapping the territory. It's not "creating new territory" by doing that; it's just pointing out things that "have always been there, but nobody has looked at before", if that makes sense.
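For what it's worth, here's a rough sketch of what that "prompt itself, then filter" loop might look like, again in Python with every name hypothetical: `complete()` stands in for any model call, and having the model grade its own analogies is obviously the weakest link, not a solution to the Novelty+Coherence problem.

```python
import itertools

def complete(prompt: str) -> str:
    """Placeholder for a call to the model (hypothetical)."""
    raise NotImplementedError

def propose_mappings(concepts, threshold=7):
    """Self-prompt for 'X is a special case of Y' parallels, then filter them."""
    keepers = []
    for x, y in itertools.combinations(concepts, 2):
        mapping = complete(
            f"Is '{x}' a special case of '{y}', or vice versa? "
            "If there is a real structural parallel, describe it; if not, say so."
        )
        # Second pass stands in for the Novelty+Coherence filter: the model
        # grades its own analogy, and only the high scorers are kept.
        score = complete(
            "Rate the following analogy from 0 to 10 for coherence and "
            "non-obviousness. Answer with just the number.\n\n" + mapping
        )
        if score.strip().isdigit() and int(score.strip()) >= threshold:
            keepers.append((x, y, mapping))
    return keepers
```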