The wall confronting large language models

(arxiv.org)

170 points PaulHoule | 1 comments | 03 Sep 25 11:40 UTC

measurablefunc ◴[03 Sep 25 20:29 UTC] No.45120049[source]▶

There is a formal extensional equivalence between Markov chains & LLMs but the only person who seems to be saying anything about this is Gary Marcus. He is constantly making the point that symbolic understanding cannot be reduced to a probabilistic computation: regardless of how large the graph gets, it will still be missing basic stuff like backtracking (which is available in programming languages like Prolog). I think that Gary is right on basically all counts. Probabilistic generative models are fun but no amount of probabilistic sequence generation can be a substitute for logical reasoning.

replies(16): >>45120249 #>>45120259 #>>45120415 #>>45120573 #>>45120628 #>>45121159 #>>45121215 #>>45122702 #>>45122805 #>>45123808 #>>45123989 #>>45125478 #>>45125935 #>>45129038 #>>45130942 #>>45131644 #

Certhas ◴[03 Sep 25 20:48 UTC] No.45120259[source]▶

>>45120049 #

I don't understand what point you're hinting at.

Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.

So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.

(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)
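To make the transfer-operator point concrete, here is a minimal sketch using Ulam's method, a standard way to discretize a transfer operator. It assumes the logistic map as the nonlinear system and a 50-bin partition of [0, 1]; the function names and bin counts are illustrative choices, not anything from the linked article. The nonlinear map on points becomes a linear (row-stochastic) map on histograms.

```python
def logistic(x):
    """Nonlinear map on [0, 1]; chosen here purely as an example system."""
    return 4.0 * x * (1.0 - x)


def ulam_matrix(f, n_bins, samples_per_bin=200):
    """Approximate the transfer operator of f by a row-stochastic matrix.

    P[i][j] is the fraction of evenly spaced sample points in bin i
    that f sends into bin j.
    """
    P = [[0.0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for k in range(samples_per_bin):
            x = (i + (k + 0.5) / samples_per_bin) / n_bins
            j = min(int(f(x) * n_bins), n_bins - 1)  # clamp f(x) == 1.0
            P[i][j] += 1.0 / samples_per_bin
    return P


def step(dist, P):
    """One linear update of a probability vector: dist <- dist @ P."""
    n = len(dist)
    out = [0.0] * n
    for i, mass in enumerate(dist):
        if mass:
            for j in range(n):
                out[j] += mass * P[i][j]
    return out


def evolve(dist, P, steps):
    """Apply the linear update repeatedly."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


if __name__ == "__main__":
    n = 50
    P = ulam_matrix(logistic, n)
    dist = evolve([1.0 / n] * n, P, 50)
    # Mass should pile up near the ends of the interval rather than the middle.
    print(dist[0], dist[n // 2], dist[-1])
```

The state is now a vector of 50 numbers instead of one, which is the "(much) larger state space" being paid for. Finer bins give a better approximation at a higher cost.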

replies(5): >>45120313 #>>45120341 #>>45120344 #>>45123837 #>>45124441 #

measurablefunc ◴[03 Sep 25 20:57 UTC] No.45120344[source]▶

>>45120259 #

What part about backtracking is baseless? Typical Prolog interpreters can be implemented in a few MBs of binary code (the high level specification is even simpler & can be in a few hundred KB)¹ but none of the LLMs (open source or not) are capable of backtracking even though there is plenty of room for a basic Prolog interpreter. This seems like a very obvious shortcoming to me that no amount of smooth approximation can overcome.

If you think there is a threshold at which point some large enough feedforward network develops the capability to backtrack then I'd like to see your argument for it.

¹https://en.wikipedia.org/wiki/Warren_Abstract_Machine

replies(3): >>45120516 #>>45121626 #>>45124764 #

bondarchuk ◴[03 Sep 25 21:18 UTC] No.45120516[source]▶

>>45120344 #

Backtracking makes sense in a search context, which is basically what Prolog is. Why would you expect a next-token-predictor to do backtracking and what should that even look like?

replies(2): >>45120536 #>>45120766 #

PaulHoule ◴[03 Sep 25 21:51 UTC] No.45120766[source]▶

>>45120516 #

If you want general-purpose generation then it has to be able to respect constraints (e.g. figure art of a person has 0..1 belly buttons, 0..2 legs is unspoken). As it is, generative models usually get those things right, but not always: they can stick together the tiles they use internally in some combination that makes sense locally but not globally.

General intelligence may not be SAT/SMT solving but it has to be able to do it, hence, backtracking.
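For readers who have not seen it spelled out, here is a minimal sketch of what backtracking means in a SAT setting. It is plain chronological backtracking over CNF clauses, with none of the unit propagation or clause learning a real solver has; the clause encoding (signed integers, DIMACS style) and the name `solve` are assumptions made for this example.

```python
def solve(clauses, n_vars, assignment=None):
    """Return a satisfying {var: bool} assignment, or None if there is none.

    clauses: list of clauses; each clause is a list of nonzero ints,
             where 3 means x3 and -3 means NOT x3.
    """
    if assignment is None:
        assignment = {}

    # Conflict check: a clause whose literals are all assigned and all false.
    for clause in clauses:
        if all(abs(lit) in assignment and assignment[abs(lit)] != (lit > 0)
               for lit in clause):
            return None

    if len(assignment) == n_vars:
        return dict(assignment)

    # Pick the lowest-numbered unassigned variable and try both values.
    var = next(v for v in range(1, n_vars + 1) if v not in assignment)
    for value in (True, False):
        assignment[var] = value
        result = solve(clauses, n_vars, assignment)
        if result is not None:
            return result
        del assignment[var]  # backtrack: undo the choice, try the next one
    return None


if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x3) and (not x1 or not x3) and (not x2 or x3)
    cnf = [[1, 2], [-1, 3], [-1, -3], [-2, 3]]
    print(solve(cnf, 3))
```

The `del assignment[var]` line is the whole point: the search explicitly retracts a commitment once it is shown to lead to a conflict, instead of carrying it forward.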

Today I had another of those experiences of the weaknesses of LLM reasoning, one that happens a lot when doing LLM-assisted coding. I was trying to figure out how to rebuild some CSS after the HTML changed for accessibility purposes and got a good idea for how to do it from talking to the LLM, but at that point the context was poisoned, probably because there was a lot of content in the context describing what we were thinking about at different stages of the conversation, which evolved considerably. It lost its ability to follow instructions: I'd tell it specifically to do this or do that and it just wouldn't do it properly. This happens a lot if a session goes on too long.

My guess is that the attention mechanism is locking on to parts of the conversation which are no longer relevant to where I think we're at. In general, the logic that considers the variation of either a practice (instances) or a theory over time is a very tricky problem, and 'backtracking' is a specific answer for maintaining your knowledge base across a search process.

replies(2): >>45121102 #>>45123910 #

XenophileJKO ◴[03 Sep 25 22:34 UTC] No.45121102{5}[source]▶

>>45120766 #

What if you gave the model a tool to "willfully forget" a section of context? That would be easy to make. Hmm, I might be onto something.

replies(1): >>45121201 #

PaulHoule ◴[03 Sep 25 22:48 UTC] No.45121201{6}[source]▶

>>45121102 #

I guess you could have some kind of mask that would let you suppress some of the context from matching, but my guess is that kind of thing might cause problems as often as it solves them.
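A toy sketch of the kind of mask meant here, assuming single-query scaled dot-product attention and a hypothetical `keep` list marking which context positions may still be matched. This only illustrates the mechanism; it is not how any particular model or API exposes it.

```python
import math


def masked_attention(query, keys, values, keep):
    """Attention for one query, ignoring positions where keep[i] is False.

    Returns (weights, output). Suppressed positions get weight exactly 0.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]

    # Suppressed positions get -inf so that softmax assigns them zero weight.
    masked = [s if ok else float("-inf") for s, ok in zip(scores, keep)]
    top = max(masked)
    if top == float("-inf"):
        raise ValueError("at least one position must be kept")

    exps = [math.exp(s - top) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    # "Forget" the third context entry, which would otherwise score highest.
    weights, out = masked_attention([1.0, 0.0], keys, values,
                                    keep=[True, True, False])
    print(weights, out)
```

The mechanism itself is simple. The hard part, as noted above, is deciding which positions to mask without throwing away something that later turns out to matter.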

Back when I was thinking about commonsense reasoning with logic it was obviously a much more difficult problem to add things like "P was true before time t", "there will be some time t in the future such that P is true", "John believes Mary believes that P is true", "It is possible that P is true", "there is some person q who believes that P is true", particularly when you combine these qualifiers. For one thing you don't even have a sound and complete strategy for reasoning over first-order logic + arithmetic, but you also have a combinatorial explosion over the qualifiers.

Back in the day I thought it was important to have sound reasoning procedures, but one of the reasons none of my foundation models ever became ChatGPT was that I cared about that, when I really needed to ask "does change C cause an unsound procedure to get the right answer more often?" and not care whether the reasoning procedure was sound or not.
