The authors are computer scientists and people who work with large-scale dynamic systems. They aren't people who've actually produced an industry-scale LLM. However, I have to note that despite all the practical progress in deep learning, transformers, and related systems, the theory involved is mostly analogies and loosely similar equations; it's still alchemy. The people who are really good at producing these models seem to rely on a bunch of effective rules of thumb rather than any complete or established theory (despite books claiming to offer a mathematical foundation for the enterprise, etc.).
Which is to say, "outside of core competence" doesn't mean as much as it would for medicine or something.
Applied demon summoning is ruled by empiricism and experimentation. The best summoners in the field are the ones who have a lot of practical experience and a sharp, honed intuition for the bizarre dynamics of the summoning process. And even those very summoners, specialists worth their weight in gold, are slaves to the experiment! Their novel ideas and methods and refinements still fail more often than they succeed!
One of the first lessons you have to learn in the field is that of humility. Your "novel ideas" and "brilliant insights" are neither novel nor brilliant, and the only path to success lies through things small and testable, most of which do not survive the test.
With that, can you trust the demon summoning knowledge of someone who has never drawn a summoning diagram?
> One of the first lessons you have to learn in the field is that of humility.
I suggest then that you make your statements less confidently.
1. Sequence models relying on a Markov chain, with and without summarization to extend beyond fixed-length horizons.
2. All forms of attention mechanisms/dense layers.
3. A specific Transformer architecture.
Then, a claim that there exists a limit on the representational or predictive power of those models, either for tasks of arbitrary input/output token length or for a fixed size of N input tokens and M output tokens, *based on* a derived cost-growth schedule for model size, data size, and compute budget.
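For concreteness, here is a minimal sketch (my own illustration, not anything from the paper) of what such a cost-growth schedule can look like: a parametric loss in the spirit of Hoffmann et al. (2022) tying model size N, data size D, and a compute budget C together. The constants and the 6*N*D compute approximation are illustrative assumptions, not fitted values.

```python
# Illustrative sketch only: a Chinchilla-style parametric loss and the
# compute-optimal split of a budget C ~ 6*N*D between model size N and
# data size D.  Constants are placeholder assumptions, not fitted values.

E, A, B = 1.7, 400.0, 410.0   # irreducible loss + fit coefficients (illustrative)
ALPHA, BETA = 0.34, 0.28      # scaling exponents (illustrative)

def loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Minimize loss(N, D) subject to 6*N*D = C.

    Setting dL/dN = 0 with D = C / (6*N) gives
    N_opt = (k * C**beta) ** (1 / (alpha + beta)),  k = alpha*A / (beta*B*6**beta).
    """
    k = (ALPHA * A) / (BETA * B * 6**BETA)
    n_opt = (k * compute_flops**BETA) ** (1.0 / (ALPHA + BETA))
    d_opt = compute_flops / (6.0 * n_opt)
    return n_opt, d_opt

if __name__ == "__main__":
    for c in (1e21, 1e23, 1e25):   # hypothetical compute budgets, in FLOPs
        n, d = compute_optimal(c)
        print(f"C={c:.0e}  N*={n:.3e} params  D*={d:.3e} tokens  L*={loss(n, d):.3f}")
```

The numbers aren't the point; the point is that a claim like "there is a limit for a given budget" only means something once a schedule like this is written down and put to the test.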
Separately, I would have expected a clear literature review of existing mathematical studies on LLM capabilities and limitations, of which there are *many*, including studies purporting to show that Transformers can represent any program of finite, pre-determined execution length.