If we had, there would be no reason to train a model with more parameters than are strictly necessary to represent the space's semantic structure. But then it should be impossible for distilled models with less parameters to come close to the performance of the original model.
Yet this is what happens - the distilled or quantized models often come very close to the original model.
So I think there are still many low-hanging fruits to pick.
We do understand how they work, we just have not optimised their usage.
For example someone who has a good general understanding of how an ICE or EV car works. Even if the user interface is very unfamiliar, they can figure out how to drive any car within a couple of minutes.
But that does not mean they can race a car, drift a car or drive a car on challenging terrain even if the car is physically capable of all these things.
Would a different sampler help you? I dunno, try it. Would a smaller dataset help? I dunno, try it. Would training the model for 5000 days help? I dunno, try it.
Car technology is the opposite of that - it’s a white box. It’s composed of very well defined elements whose interactions are defined and explained by laws of thermodynamics and whatnot.
It's like saying we don't understand how quantum chromodynamics works. Very few people do, and it's the kind of knowledge not easily distilled for the masses in an easily digestible in a popsci way.
Look into how older CNNs work -- we have very good visual/accesible/popsci materials on how they work.
I'm sure we'll have that for LLM but it's not worth it to the people who can produce that kind of material to produce it now when the field is moving so rapidly, those people's time is much better used in improving the LLMs.
The kind of progress being made leads me to believe there absolutely ARE people who absolutely know how the LLMs work and they're not just a bunch of monkeys randomly throwing things at GPUs and seeing what sticks.
Just like alchemists made enormous strides in chemistry, but their goal was to turn piss into gold.