The wall confronting large language models

(arxiv.org)

170 points PaulHoule | 2 comments | 03 Sep 25 11:40 UTC | HN request time: 0s | source

Show context

measurablefunc ◴[03 Sep 25 20:29 UTC] No.45120049[source]▶

There is a formal extensional equivalence between Markov chains & LLMs but the only person who seems to be saying anything about this is Gary Marcus. He is constantly making the point that symbolic understanding can not be reduced to a probabilistic computation regardless of how large the graph gets it will still be missing basic stuff like backtracking (which is available in programming languages like Prolog). I think that Gary is right on basically all counts. Probabilistic generative models are fun but no amount of probabilistic sequence generation can be a substitute for logical reasoning.

replies(16): >>45120249 #>>45120259 #>>45120415 #>>45120573 #>>45120628 #>>45121159 #>>45121215 #>>45122702 #>>45122805 #>>45123808 #>>45123989 #>>45125478 #>>45125935 #>>45129038 #>>45130942 #>>45131644 #

Certhas ◴[03 Sep 25 20:48 UTC] No.45120259[source]▶

>>45120049 #

I don't understand what point you're hinting at.

Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space. So if you can implement it in a brain or a computer, there is a sufficiently large probabilistic dynamic that can model it. More really is different.

So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.

(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)

replies(5): >>45120313 #>>45120341 #>>45120344 #>>45123837 #>>45124441 #

1. patrick451 ◴[04 Sep 25 05:19 UTC] No.45123837[source]▶

>>45120259 #

> Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution at the cost of a (much) larger state space.

This is impossible. When driven by a sinusoid, a linear system will only ever output a sinusoid with exactly the same frequency but a different amplitude and phase regardless of how many states you give it. A non-linear system can change the frequency or output multiple frequencies.

replies(1): >>45124025 #

2. diffeomorphism ◴[04 Sep 25 05:58 UTC] No.45124025[source]▶

>>45123837 (TP) #

As far as I understand, the terminology says "linear" but means compositions of affine (with cutoffs etc). That gives you arbitrary polynomials and piecewise affine, which are dense in most classes of interest.

Of course, in practice you don't actually get arbitrary degree polynomials but some finite degree, so the approximation might still be quite bad or inefficient.

↑