The wall confronting large language models

(arxiv.org)

170 points PaulHoule | 2 comments | 03 Sep 25 11:40 UTC | HN request time: 0.4s | source

Show context

measurablefunc ◴[03 Sep 25 20:29 UTC] No.45120049[source]▶

There is a formal extensional equivalence between Markov chains & LLMs but the only person who seems to be saying anything about this is Gary Marcus. He is constantly making the point that symbolic understanding can not be reduced to a probabilistic computation regardless of how large the graph gets it will still be missing basic stuff like backtracking (which is available in programming languages like Prolog). I think that Gary is right on basically all counts. Probabilistic generative models are fun but no amount of probabilistic sequence generation can be a substitute for logical reasoning.

replies(16): >>45120249 #>>45120259 #>>45120415 #>>45120573 #>>45120628 #>>45121159 #>>45121215 #>>45122702 #>>45122805 #>>45123808 #>>45123989 #>>45125478 #>>45125935 #>>45129038 #>>45130942 #>>45131644 #

Certhas ◴[03 Sep 25 20:48 UTC] No.45120259[source]▶

>>45120049 #

I don't understand what point you're hinting at.

Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.

So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.

(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)

replies(5): >>45120313 #>>45120341 #>>45120344 #>>45123837 #>>45124441 #

baselessness ◴[04 Sep 25 07:08 UTC] No.45124441[source]▶

>>45120259 #

That's what this debate has been reduced to. People point out the logical and empirical, by now very obvious limitation of LLMs. And boosters are the equivalent of Chopra's "quantum physics means anything is possible" saying "if you add enough information to a system anything is possible".

replies(1): >>45125289 #

yorwba ◴[04 Sep 25 09:24 UTC] No.45125289[source]▶

>>45124441 #

The argument isn't that anything is possible for LLMs, but that representing LLMs as Markov chains doesn't demonstrate a limitation, because the resulting Markov chain would be huge, much larger than the LLM, and anything that is possible is possible with a large enough Markov chain.

If you limit yourself to Markov chains where the full transition matrix can be stored in a reasonable amount of space (which is the kind of Markov chain that people usually have in mind when they think that Markov chains are very limited), LLMs cannot be represented as such a Markov chain.

If you want to show limitations of LLMs by reducing them to another system of computation, you need to pick one that is more limited than LLMs appear to be, not less.

replies(1): >>45127523 #

1. ariadness ◴[04 Sep 25 14:15 UTC] No.45127523[source]▶

>>45125289 #

> anything that is possible is possible with a large enough Markov chain

This is not true. Do you mean anything that is possible to compute? If yes than you missed the point entirely.

replies(1): >>45134479 #

2. yorwba ◴[05 Sep 25 02:33 UTC] No.45134479[source]▶

>>45127523 (TP) #

It's mostly a consequence of the laws of physics having the Markov property. So the time evolution of any physical system can be modeled as a Markov process. Of course the corresponding state space may in general be infinite.

↑