The authors are computer scientists and people who work with large-scale dynamic systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc. systems, all the theory involved is just analogies and equations of a similar sort. It's all alchemy, so the people who are really good at producing these models seem to be using a bunch of effective rules of thumb rather than any full or established models (despite books claiming to offer a mathematical foundation for enterprise, etc.).
Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.
Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!
One of the first lessons you have to learn in the field is that of humility. That your "novel ideas" and "brilliant insights" are neither novel nor brilliant - and the only path to success lies through things small and testable, most of which do not survive the test.
With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?
> One of the first lessons you have to learn in the field is that of humility.
I suggest then that you make your statements less confidently.
While it's not a requirement to have published in a field before publishing in it, having a coauthor from the target field, or a peer-review venue in that field as an entry point, certainly raises credibility.
Based on my limited claim to be in either Machine Learning or Large Language Models, the paper does not appear to demonstrate what it claims. The authors' language addresses the field of Machine Learning and LLM development as you would a young student, which does not help make their point.
I'm not saying anything about the content, merely making a remark.
Seth Lloyd, Wolpert, Landauer, Bennett, Fredkin, Feynman, Sejnowski, Hopfield, Zecchina, Parisi, Mézard, and Zdeborová, Crutchfield, Preskill, Deutsch, Manin, Szilard, MacKay....
I wish someone told them to shut up about computing. And I wouldn't dare claim von Neumann as merely a physicist, but that's where he was coming from. Oh and as much as I dislike him, Wolfram.
> Lots of chemists and physicists like to talk about computation without having any background in it.
I'm confused. Physicists deal with computation all the time. Are you confusing computation with programming? There's a big difference. Physicists and chemists are frequently at odds with the limits of computability. Remember, Turing, Church, and even Knuth obtained degrees in mathematics. The divide isn't so clear cut and there's lots of overlap. I think if you go look at someone doing their PhD in Programming Languages you could easily mistake them for a mathematician.
Looking at the authors I don't see why this is out of their domain. Succi[0] looks like he deals a lot with fluid dynamics and has a big focus on Lattice Boltzmann. Modern fluid dynamics is all about computability and its limits. There's a lot of this that goes into the Navier–Stokes problem (even Terry Tao talks about this[1]), which is a lot about computational reproducibility.
Coveney[2] is a harder read for me, but doesn't seem suspect. Lots of work in molecular dynamics, so shares a lot of tools with Succi (seems like they like to work together too). There's a lot of papers there, but sorting by year there's quite a few that scream "limits of computability" to me.
I can't make strong comments without more intimate knowledge of their work, but nothing here is a clear red flag. I think you're misinterpreting because this is a position paper, written in the style you'd expect from a more formal field, but it's also kinda scattered. I've only done a quick read (don't get me wrong, I have critiques), but there are no red flags that warrant quick dismissal. (My background: physicist -> computational physics -> ML.) There are things they're pointing to that are more discussed within the more mathematically inclined sides of ML (it's a big field... even if only a small subset is most visible). I'll at least look at some of their other works on the topic as it seems they've written a few papers.
[0] https://scholar.google.com/citations?user=XrI0ffIAAAAJ
[1] I suspect this is well above the average HN reader, but pay attention to what they mean by "blowup" and "singularity" https://terrytao.wordpress.com/tag/navier-stokes-equations/
But today, most people hold opinions about LLMs, both as to their limits and their potential, without any real knowledge of computational linguistics or deep learning.
I'm saying that lots of people like to post their opinions of LLMs regardless of whether or not they actually have any competence in either computational linguistics or deep learning.
1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.
2. All forms of attention mechanisms/dense layers.
3. A specific Transformer architecture.
That there exists a limit on the representation or prediction powers of the model for tasks of all input/output token lengths or fixed-size N input tokens/M output tokens, *based on* a derived cost growth schedule for model size, data size, and compute budgets.
Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations, of which there are *many*, including studies that purport to show that Transformers can represent any program of finite, pre-determined execution length.
Here's another example in case you still don't get the point: Schrödinger had no business talking about biology because he wasn't trained in it, right? Never mind him being ahead of the entire field on understanding the role of "DNA" (yet undiscovered, but he correctly posited the crystal-ish structure) and information in evolution, and inspiring Watson's quest to figure out DNA.
Judge ideas on the merit of the idea itself. It's not about whether they have computing backgrounds, it's about the ideas.
Hell, look at the history of deep learning with Minsky's book. Sure glad everyone listened to the linguistics expert there...