The authors are computer scientists and people who work with large-scale dynamic systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc., the theory involved is mostly analogies and loosely similar equations; it's still largely alchemy, and the people who are really good at producing these models seem to rely on a bunch of effective rules of thumb rather than any complete or established theory (despite books claiming to offer a mathematical foundation for the enterprise, etc.).
Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.
1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.
2. All forms of attention mechanisms/dense layers (a minimal sketch follows this list).
3. A specific Transformer architecture.
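To make item 2 concrete, here's my own minimal sketch of plain scaled dot-product attention; the function name, shapes, and use of numpy are illustrative assumptions on my part, not anything taken from the paper:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v)
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                   # pairwise query/key similarities
        scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
        return weights @ V                              # convex combination of values

    rng = np.random.default_rng(0)
    out = scaled_dot_product_attention(rng.normal(size=(4, 8)),
                                       rng.normal(size=(6, 8)),
                                       rng.normal(size=(6, 8)))
    print(out.shape)  # (4, 8)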
That there exists a limit on the representational or predictive power of the model, either for tasks of arbitrary input/output token lengths or for a fixed size of N input tokens and M output tokens, *based on* a derived cost-growth schedule for model size, data size, and compute budget.
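For context, the kind of cost-growth schedule usually meant here is the compute-optimal scaling-law fit from the broader literature (Hoffmann et al.'s "Chinchilla" form); this is an illustration of that standard form, not necessarily whatever the authors derive:

    % N = parameter count, D = training tokens, C = compute in FLOPs,
    % E, A, B, \alpha, \beta = fitted constants.
    \[
      L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
      \qquad C \approx 6 N D .
    \]

The limit claim, as I read it, would then be a statement about what the model can represent or predict even as N, D, and C grow along such a schedule.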
Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations - of which there are *many* - including studies purporting to show that Transformers can represent any program of finite, pre-determined execution length.