Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution, at the cost of a (much) larger state space: instead of evolving a single point nonlinearly, you evolve a probability distribution over points linearly. So if you can implement it in a brain or a computer, there is a sufficiently large linear probabilistic dynamic that can model it. More really is different.
So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.
(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)
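As a concrete (if toy) version of the lifting trick: Ulam's method approximates the transfer operator of a nonlinear map by a finite stochastic matrix, so the nonlinear dynamics become one matrix-vector product per step. This is a sketch of the standard construction; the logistic map, bin count, and sample sizes are my illustrative choices, not anything from the thread:

```python
import numpy as np

# Ulam's method: approximate the transfer (Perron-Frobenius) operator of a
# nonlinear map by a finite stochastic matrix that acts *linearly* on
# discretized probability densities.
N = 500                                   # number of bins = lifted state space
edges = np.linspace(0.0, 1.0, N + 1)
f = lambda x: 4.0 * x * (1.0 - x)         # logistic map: nonlinear, chaotic

# P[j, i] = probability that a point drawn uniformly from bin i lands in bin j.
P = np.zeros((N, N))
samples_per_bin = 200
for i in range(N):
    xs = np.random.uniform(edges[i], edges[i + 1], samples_per_bin)
    js = np.clip(np.digitize(f(xs), edges) - 1, 0, N - 1)
    np.add.at(P[:, i], js, 1.0 / samples_per_bin)

# Linear probabilistic evolution: one matrix-vector product per time step.
rho = np.zeros(N)
rho[N // 10] = 1.0                        # density concentrated near x = 0.1
for _ in range(50):
    rho = P @ rho

# Ground truth: push a point cloud through the nonlinear map directly.
pts = np.random.uniform(edges[N // 10], edges[N // 10 + 1], 100_000)
for _ in range(50):
    pts = f(pts)
truth, _ = np.histogram(pts, bins=edges)
print("L1 error vs. direct simulation:", np.abs(rho - truth / truth.sum()).sum())
```

The error shrinks as the number of bins grows, which is exactly the sense in which "large" does the heavy lifting: the linear model's state space is the space of densities, not the original one-dimensional state.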
If you think there is a threshold at which some large-enough feedforward network develops the capability to backtrack, then I'd like to see your argument for it.
Of course this would be pointless, but it demonstrates that a system where an LLM provides the logic can backtrack, as there's nothing computationally special about backtracking.
That current LLM UIs are set up for conversation-style use, which makes this harder, isn't an inherent limitation of what we can do with LLMs.
In other words, this claim is categorically false:
> Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph.
What is limiting "just" an LLM is not the model's ability to encode reasoning, but the lack of the minimal, trivial runtime scaffolding that would let it use its capabilities.
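For what such scaffolding could look like (a hypothetical sketch, not anyone's actual system): an ordinary depth-first loop in which the model only proposes candidate moves and the host program does all the backtracking. `ask` and `apply_move` are illustrative stand-ins for a completion call and a state update:

```python
from typing import Callable, Optional

def solve(state: str,
          ask: Callable[[str], list[str]],
          is_goal: Callable[[str], bool],
          depth: int = 0,
          max_depth: int = 8) -> Optional[list[str]]:
    """Depth-first search where an LLM supplies the candidate moves.

    The model does a bounded, feedforward amount of work per call; the
    backtracking lives entirely in this ordinary recursive loop.
    """
    if is_goal(state):
        return []
    if depth >= max_depth:
        return None                        # dead end: force a backtrack
    # The model proposes next steps; the scaffold tries them in order.
    for move in ask(f"State:\n{state}\nList plausible next steps."):
        result = solve(apply_move(state, move), ask, is_goal,
                       depth + 1, max_depth)
        if result is not None:             # this branch reached the goal
            return [move] + result
    return None                            # every branch failed: backtrack

def apply_move(state: str, move: str) -> str:
    # Hypothetical: a real system might re-prompt the model or run a
    # verifier here; this sketch just appends the move to the transcript.
    return state + "\n" + move
```

Nothing here asks the network itself to backtrack; a failure deep in the tree returns `None` and the loop tries the model's next suggestion, which is all a Prolog interpreter's backtracking amounts to computationally.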
Where is the demonstration?