The wall confronting large language models

(arxiv.org)

170 points PaulHoule | 1 comments | 03 Sep 25 11:40 UTC | HN request time: 0s | source

Show context

measurablefunc ◴[03 Sep 25 20:29 UTC] No.45120049[source]▶

There is a formal extensional equivalence between Markov chains & LLMs but the only person who seems to be saying anything about this is Gary Marcus. He is constantly making the point that symbolic understanding can not be reduced to a probabilistic computation regardless of how large the graph gets it will still be missing basic stuff like backtracking (which is available in programming languages like Prolog). I think that Gary is right on basically all counts. Probabilistic generative models are fun but no amount of probabilistic sequence generation can be a substitute for logical reasoning.

replies(16): >>45120249 #>>45120259 #>>45120415 #>>45120573 #>>45120628 #>>45121159 #>>45121215 #>>45122702 #>>45122805 #>>45123808 #>>45123989 #>>45125478 #>>45125935 #>>45129038 #>>45130942 #>>45131644 #

Certhas ◴[03 Sep 25 20:48 UTC] No.45120259[source]▶

>>45120049 #

I don't understand what point you're hinting at.

Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.

So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.

(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)

replies(5): >>45120313 #>>45120341 #>>45120344 #>>45123837 #>>45124441 #

measurablefunc ◴[03 Sep 25 20:57 UTC] No.45120344[source]▶

>>45120259 #

What part about backtracking is baseless? Typical Prolog interpreters can be implemented in a few MBs of binary code (the high level specification is even simpler & can be in a few hundred KB)¹ but none of the LLMs (open source or not) are capable of backtracking even though there is plenty of room for a basic Prolog interpreter. This seems like a very obvious shortcoming to me that no amount of smooth approximation can overcome.

If you think there is a threshold at which point some large enough feedforward network develops the capability to backtrack then I'd like to see your argument for it.

¹https://en.wikipedia.org/wiki/Warren_Abstract_Machine

replies(3): >>45120516 #>>45121626 #>>45124764 #

bondarchuk ◴[03 Sep 25 21:18 UTC] No.45120516[source]▶

>>45120344 #

Backtracking makes sense in a search context which is basically what prolog is. Why would you expect a next-token-predictor to do backtracking and what should that even look like?

replies(2): >>45120536 #>>45120766 #

measurablefunc ◴[03 Sep 25 21:20 UTC] No.45120536[source]▶

>>45120516 #

I don't expect a Markov chain to be capable of backtracking. That's the point I am making. Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph.

replies(4): >>45120598 #>>45121266 #>>45122145 #>>45124657 #

bondarchuk ◴[03 Sep 25 21:27 UTC] No.45120598[source]▶

>>45120536 #

Imagine the context window contains A-B-C, C turns out a dead end and we want to backtrack to B and try another branch. Then the LLM could produce outputs such that the context window would become A-B-C-[backtrack-back-to-B-and-don't-do-C] which after some more tokens could become A-B-C-[backtrack-back-to-B-and-don't-do-C]-D. This would essentially be backtracking and I don't see why it would be inherently impossible for LLMs as long as the different branches fit in context.

replies(1): >>45120655 #

measurablefunc ◴[03 Sep 25 21:35 UTC] No.45120655[source]▶

>>45120598 #

If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as Markov chain. This is a simple enough problem that can be implemented in a few dozen lines of Prolog but I've never seen a solver implemented as a Markov chain.

replies(4): >>45120792 #>>45120895 #>>45120900 #>>45121022 #

1. lelanthran ◴[03 Sep 25 22:22 UTC] No.45121022{7}[source]▶

>>45120655 #

> If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as Markov chain. This is a simple enough problem that can be implemented in a few dozen lines of Prolog but I've never seen a solver implemented as a Markov chain.

I think it can be done. I started a chatbot that works like this some time back (2024) but paused work on it since January.

In brief, you shorten the context by discarding the context that didn't work out.

↑