The authors are computer scientists and people who work with large-scale dynamic systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite all the practical progress in deep learning, transformers, and related systems, the theory involved is mostly analogies and loosely similar equations; it's still alchemy. The people who are really good at producing these models seem to rely on a bunch of effective rules of thumb rather than any complete or established theory (despite books claiming to offer a mathematical foundation for the enterprise, etc.).
Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.
Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!
One of the first lessons you have to learn in the field is that of humility. Your "novel ideas" and "brilliant insights" are neither novel nor brilliant, and the only path to success lies through things small and testable, most of which do not survive the test.
With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?
> One of the first lessons you have to learn in the field is that of humility.
I suggest then that you make your statements less confidently.
1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.
2. All forms of attention mechanisms/dense layers.
3. A specific Transformer architecture.
Then, a claim that there exists a limit on the representational or predictive power of those models, either for tasks of arbitrary input/output token length or for a fixed size of N input tokens and M output tokens, *based on* a derived cost-growth schedule for model size, data size, and compute budget.
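For concreteness, here is a minimal sketch (my own illustration, not anything from the paper) of what such a cost-growth schedule can look like: a parametric loss in the spirit of Hoffmann et al. (2022) tying model size N, data size D, and a compute budget C together. The constants and the 6*N*D compute approximation are illustrative assumptions, not fitted values.

```python
# Illustrative sketch only: a Chinchilla-style parametric loss and the
# compute-optimal split of a budget C ~ 6*N*D between model size N and
# data size D.  Constants are placeholder assumptions, not fitted values.

E, A, B = 1.7, 400.0, 410.0   # irreducible loss + fit coefficients (illustrative)
ALPHA, BETA = 0.34, 0.28      # scaling exponents (illustrative)

def loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Minimize loss(N, D) subject to 6*N*D = C.

    Setting dL/dN = 0 with D = C / (6*N) gives
    N_opt = (k * C**beta) ** (1 / (alpha + beta)),  k = alpha*A / (beta*B*6**beta).
    """
    k = (ALPHA * A) / (BETA * B * 6**BETA)
    n_opt = (k * compute_flops**BETA) ** (1.0 / (ALPHA + BETA))
    d_opt = compute_flops / (6.0 * n_opt)
    return n_opt, d_opt

if __name__ == "__main__":
    for c in (1e21, 1e23, 1e25):   # hypothetical compute budgets, in FLOPs
        n, d = compute_optimal(c)
        print(f"C={c:.0e}  N*={n:.3e} params  D*={d:.3e} tokens  L*={loss(n, d):.3f}")
```

The numbers aren't the point; the point is that a claim like "there is a limit for a given budget" only means something once a schedule like this is written down and put to the test.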
Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations, of which there are *many*, including studies purporting to show that Transformers can represent any program of finite, pre-determined execution length.