The wall confronting large language models

1. Scene_Cast2 ◴[03 Sep 25 17:58 UTC] No.45118686[source]▶

The paper is hard to read. There is no concrete worked-through example, the prose is over the top, and the equations don't really help. I can't make head or tail of this paper.

replies(3): >>45118775 #>>45119154 #>>45120083 #

2. lumost ◴[03 Sep 25 18:06 UTC] No.45118775[source]▶

>>45118686 (TP) #

This appears to be a position paper written by authors outside of their core field. The presentation of "the wall" is only through analogy to derivatives on the discrete values computers operate in.

replies(2): >>45119119 #>>45119709 #

3. joe_the_user ◴[03 Sep 25 18:43 UTC] No.45119119[source]▶

>>45118775 #

Paper seems to involve a series of analogies and equations. However, I think if the equations are accepted, the "wall" is actually derived.

The authors are computer scientists and people who work with large-scale dynamical systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite lots of practical progress in deep learning/transformers/etc. systems, all the theory involved is just analogies and equations of a similar sort. It's all alchemy, so the people who are really good at producing these models seem to be using a bunch of effective rules of thumb and not any full or established models (despite books claiming to offer a mathematical foundation for the enterprise, etc.).

Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.

replies(2): >>45119694 #>>45127357 #

4. ◴[03 Sep 25 18:47 UTC] No.45119154[source]▶

>>45118686 (TP) #

5. ACCount37 ◴[03 Sep 25 19:43 UTC] No.45119694{3}[source]▶

>>45119119 #

No, that's all the more reason to distrust major, unverified claims made by someone "outside of core competence".

Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!

One of the first lessons you have to learn in the field is that of humility. That your "novel ideas" and "brilliant insights" are neither novel nor brilliant - and the only path to success lies through things small and testable, most of which do not survive the test.

With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?

replies(3): >>45119735 #>>45120082 #>>45120250 #

6. jibal ◴[03 Sep 25 19:45 UTC] No.45119709[source]▶

>>45118775 #

If you look at their other papers, you will see that this is very much within their core field.

replies(3): >>45119914 #>>45120336 #>>45124453 #

7. jibal ◴[03 Sep 25 19:48 UTC] No.45119735{4}[source]▶

>>45119694 #

Somehow the game of telephone took us from "outside of their core field" (which wasn't true) to "outside of core competence" (which is grossly untrue).

> One of the first lessons you have to learn in the field is that of humility.

I suggest then that you make your statements less confidently.

8. lumost ◴[03 Sep 25 20:12 UTC] No.45119914{3}[source]▶

>>45119709 #

Their other papers are on simulation and applied chemistry. Where does their expertise in Machine Learning, or Large Language Models derive from?

While it's not a requirement to have published in a field before publishing in it, having a coauthor who is from the target field, or a peer-review venue in that field as an entry point, certainly raises credibility.

From my limited claim to be in either Machine Learning or Large Language Models, the paper does not appear to demonstrate what it claims. The authors' language addresses the field of Machine Learning and LLM development as you would a young student, which does not help make their point.

replies(1): >>45132135 #

9. cwmoore ◴[03 Sep 25 20:32 UTC] No.45120082{4}[source]▶

>>45119694 #

Your passions may have run away with you.

https://news.ycombinator.com/item?id=45114753

10. ForHackernews ◴[03 Sep 25 20:48 UTC] No.45120250{4}[source]▶

>>45119694 #

The freshly-summoned Gaap-5 was rumored to be the most accursed spirit ever witnessed by mankind, but so far it seems not dramatically more evil than previous demons, despite having been fed vastly more human souls.

replies(1): >>45120510 #

11. JohnKemeny ◴[03 Sep 25 20:57 UTC] No.45120336{3}[source]▶

>>45119709 #

He's a chemist. Lots of chemists and physicists like to talk about computation without having any background in it.

I'm not saying anything about the content, merely making a remark.

replies(3): >>45120611 #>>45122263 #>>45122690 #

12. lazide ◴[03 Sep 25 21:17 UTC] No.45120510{5}[source]▶

>>45120250 #

Perhaps we’re reaching peak demon?

13. chermi ◴[03 Sep 25 21:29 UTC] No.45120611{4}[source]▶

>>45120336 #

You're really not saying anything? Just a random remark with no bearing?

Seth Lloyd, Wolpert, Landauer, Bennett, Fredkin, Feynman, Sejnowski, Hopfield, Zecchina, Parisi, Mézard, Zdeborová, Crutchfield, Preskill, Deutsch, Manin, Szilard, MacKay....

I wish someone told them to shut up about computing. And I wouldn't dare claim von Neumann as merely a physicist, but that's where he was coming from. Oh and as much as I dislike him, Wolfram.

replies(1): >>45124354 #

14. 11101010001100 ◴[04 Sep 25 01:05 UTC] No.45122263{4}[source]▶

>>45120336 #

Succi is no slouch; hardcore multiscale physics guy, among other things.

15. godelski ◴[04 Sep 25 02:05 UTC] No.45122690{4}[source]▶

>>45120336 #

  > Lots of chemists and physicists like to talk about computation without having any background in it.

I'm confused. Physicists deal with computation all the time. Are you confusing computation with programming? There's a big difference. Physicists and chemists are frequently at odds with the limits of computability. Remember, Turing, Church, and even Knuth obtained degrees in mathematics. The divide isn't so clear cut and there's lots of overlap. I think if you go look at someone doing their PhD in Programming Languages you could easily mistake them for a mathematician.

Looking at the authors I don't see why this is out of their domain. Succi[0] looks like he deals a lot with fluid dynamics and has a big focus on Lattice Boltzmann. Modern fluid dynamics is all about computability and its limits. There's a lot of this that goes into the Navier–Stokes problem (even Terry Tao talks about this[1]), which is a lot about computational reproducibility.

Coveney[2] is a harder read for me, but doesn't seem suspect. Lots of work in molecular dynamics, so shares a lot of tools with Succi (seems like they like to work together too). There's a lot of papers there, but sorting by year there's quite a few that scream "limits of computability" to me.

I can't make strong comments without more intimate knowledge of their work, but nothing here is a clear red flag. I think you're misinterpreting because this is a position paper, written in the style you'd expect from a more formal field, but it is also kinda scattered. I've only done a quick read (don't get me wrong, I have critiques), but there are no red flags that warrant quick dismissal. (My background: physicist -> computational physics -> ML.) There are things they are pointing to that are more discussed within the more mathematically inclined sides of ML (it's a big field... even if only a small subset is most visible). I'll at least look at some of their other works on the topic, as it seems they've written a few papers.

[0] https://scholar.google.com/citations?user=XrI0ffIAAAAJ

[1] I suspect this is well above the average HN reader, but pay attention to what they mean by "blowup" and "singularity" https://terrytao.wordpress.com/tag/navier-stokes-equations/

[2] https://scholar.google.com/citations?user=_G6FZ6YAAAAJ

replies(2): >>45124095 #>>45124370 #

16. calf ◴[04 Sep 25 06:09 UTC] No.45124095{5}[source]▶

>>45122690 #

There are some good example posts on Scott Aaronson's blog where he eviscerated shoddy physicists' takes on quantum complexity theory. Physicists today aren't like Turing et al.; most never picked up a theory of computer science book and actually worked through the homework exercises. With the AI pivot and paper spawning, this is kind of a general problem (arguably more interdisciplinary expertise is needed, but people need to actually take the time to learn the material and internalize it without making sophomoric mistakes, etc.).

replies(1): >>45125416 #

17. JohnKemeny ◴[04 Sep 25 06:56 UTC] No.45124354{5}[source]▶

>>45120611 #

As you note, some physicists do have computing backgrounds. I'm not suggesting they can't do computer science.

But today, most people hold opinions about LLMs, both as to their limits and their potential, without any real knowledge of computational linguistics nor of deep learning.

replies(1): >>45140626 #

18. JohnKemeny ◴[04 Sep 25 06:59 UTC] No.45124370{5}[source]▶

>>45122690 #

Turing, Church, and even Knuth got their degrees before CS was an academic discipline. At least I don't think Turing studied Turing machines as an undergrad.

I'm saying that lots of people like to post their opinions of LLMs regardless of whether or not they actually have any competence in either computational linguistics or deep learning.

replies(2): >>45125375 #>>45128377 #

19. JohnKemeny ◴[04 Sep 25 07:10 UTC] No.45124453{3}[source]▶

>>45119709 #

Look at their actual papers before making a comment on what is or isn't their core field: https://dblp.org/pid/35/3081.html

replies(1): >>45128392 #

20. godelski ◴[04 Sep 25 09:38 UTC] No.45125375{6}[source]▶

>>45124370 #

Sure, but how long ago was that? Do you really think the fields fully decoupled in such a small time? That's the entire point of that comment

replies(1): >>45128423 #

21. ◴[04 Sep 25 09:46 UTC] No.45125416{6}[source]▶

>>45124095 #

22. lumost ◴[04 Sep 25 13:58 UTC] No.45127357{3}[source]▶

>>45119119 #

I will venture my 2 cents: the equations kinda sorta look like something, but in no way approach a derivation of the wall. Specifically, I would have looked for a derivation which proved, for one or all of:

1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.

2. All forms of attention mechanisms/dense layers.

3. A specific Transformer architecture.

that there exists a limit on the representation or prediction powers of the model for tasks of all input/output token lengths, or for fixed-size N input tokens/M output tokens, *based on* a derived cost growth schedule for model size, data size, and compute budgets.

Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations, of which there are *many*, including studies that purport to show that Transformers can represent any program of finite pre-determined execution length.

23. jibal ◴[04 Sep 25 15:29 UTC] No.45128377{6}[source]▶

>>45124370 #

Your whole take is extraordinarily ad hominem. The paper in question is not just people posting opinions.

24. jibal ◴[04 Sep 25 15:30 UTC] No.45128392{4}[source]▶

>>45124453 #

I did. And don't tell me what I can or can't comment on.

25. jibal ◴[04 Sep 25 15:32 UTC] No.45128423{7}[source]▶

>>45125375 #

The fellow is engaged in some pretty intense gatekeeping.

26. stonogo ◴[04 Sep 25 21:01 UTC] No.45132135{4}[source]▶

>>45119914 #

If you can't look at that publication list and see their expertise in machine learning, then it may be that they know more about your field than you know about theirs. Nothing wrong with that! Computational chemists use different terminology than computer scientists but there is significant overlap in the fields.

27. chermi ◴[05 Sep 25 16:43 UTC] No.45140626{6}[source]▶

>>45124354 #

Huh? Have you heard of learning something new? Physicists and scientists at large are pretty good at it. Do you want some certification program to determine who's allowed to opine? If someone is wrong, tell them and show them they're wrong. Don't preemptively dismiss ideas based on some authority mechanism.

Here's another example in case you still don't get the point: Schrödinger had no business talking about biology because he wasn't trained in it, right? Never mind him being ahead of the entire field on understanding the role of "DNA" (yet undiscovered, but he correctly posited the crystal-ish structure) and information in evolution, and inspiring Watson's quest to figure out DNA.

Judge ideas on the merit of the idea itself. It's not about whether they have computing backgrounds, it's about the ideas.

Hell, look at the history of deep learning with Minsky's book. Sure glad everyone listened to the linguistics expert there...