
170 points PaulHoule | 1 comment
measurablefunc ◴[] No.45120049[source]
There is a formal extensional equivalence between Markov chains & LLMs, but the only person who seems to be saying anything about this is Gary Marcus. He constantly makes the point that symbolic understanding cannot be reduced to a probabilistic computation: no matter how large the graph gets, it will still be missing basic machinery like backtracking (which is built into programming languages like Prolog). I think Gary is right on basically all counts. Probabilistic generative models are fun, but no amount of probabilistic sequence generation can substitute for logical reasoning.
replies(16): >>45120249 #>>45120259 #>>45120415 #>>45120573 #>>45120628 #>>45121159 #>>45121215 #>>45122702 #>>45122805 #>>45123808 #>>45123989 #>>45125478 #>>45125935 #>>45129038 #>>45130942 #>>45131644 #
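[Editorial sketch, not from the thread: the backtracking the commenter is pointing at is the Prolog-style depth-first search below. The solver commits to a choice, recurses, and *undoes* the choice on failure; a pure next-token sampler has no native undo step. All names here are illustrative.]

```python
def backtrack(assignment, variables, domains, consistent):
    """Return a complete consistent assignment, or None if none exists."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = variables[len(assignment)]
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]  # undo the choice -- this is the backtracking step
    return None

# Toy constraint problem: three variables that must all take distinct values.
variables = ["x", "y", "z"]
domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1, 2]}

def all_differ(a):
    vals = list(a.values())
    return len(vals) == len(set(vals))

solution = backtrack({}, variables, domains, all_differ)  # {'x': 0, 'y': 1, 'z': 2}
```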
logicchains ◴[] No.45120415[source]
LLMs are not formally equivalent to Markov chains; they're more powerful. Transformers with sufficient chain of thought can solve any problem in P: https://arxiv.org/abs/2310.07923.
replies(1): >>45120484 #
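[Editorial sketch of the intuition behind the cited result, not the paper's construction: each emitted token is fed back as input, so T generation steps give T sequential computation steps even though a single forward pass has fixed depth. Here a toy "model" computes parity one chain-of-thought token at a time; all names are illustrative.]

```python
def forward_pass(transcript, n):
    """One 'decoding step': a fixed function of the visible transcript
    that emits a single token (here, an updated running-parity bit)."""
    thoughts = transcript[n:]                       # tokens emitted so far
    i = len(thoughts)                               # next input bit to fold in
    prev = int(thoughts[-1]) if thoughts else 0
    return str((prev + int(transcript[i])) % 2)

def generate(bits):
    """Run n decoding steps; the last emitted token is the answer."""
    transcript = list(bits)
    n = len(bits)
    for _ in range(n):                              # n chain-of-thought steps
        transcript.append(forward_pass(transcript, n))
    return transcript[-1]

# parity of "1101" (three ones) -> "1"
```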
measurablefunc ◴[] No.45120484[source]
If you think there is a mistake in this argument then I'd like to know where it is: https://markov.dk.workers.dev/.
replies(2): >>45122838 #>>45124408 #
logicchains ◴[] No.45124408{3}[source]
It assumes the LLM only runs once, i.e. it doesn't account for chain of thought, which makes the program not memoryless.
replies(1): >>45124467 #
measurablefunc ◴[] No.45124467{4}[source]
There is no such assumption. You can run/sample the LLM & the equivalent Markov chain as many times as you want, & the logical analysis remains the same b/c the extensional equivalence between the LLM & the Markov chain has nothing to do w/ how many times trajectories are sampled from each one.
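[Editorial sketch of the equivalence being claimed, my reading of the linked argument rather than its actual code: a model with a finite vocabulary and a fixed k-token context window induces a Markov chain whose *state* is the whole window. The next-token distribution is a function of that state alone, so sampling the model and sampling the chain are the same process however many trajectories you draw. All names here are illustrative.]

```python
import random

def llm_next_token_probs(window):
    """Stand-in for a trained model: any deterministic map from a
    k-token window to a distribution over the vocabulary works here."""
    p = (window.count("a") + 1) / (len(window) + 2)
    return {"a": p, "b": 1 - p}

def markov_transition(state):
    """The induced chain: sample a token, shift the window by one."""
    probs = llm_next_token_probs(state)
    token = random.choices(list(probs), weights=list(probs.values()))[0]
    return state[1:] + (token,), token

def sample_trajectory(init_state, steps, seed):
    random.seed(seed)
    state, out = init_state, []
    for _ in range(steps):
        state, token = markov_transition(state)
        out.append(token)
    return out

# Two runs with the same seed traverse identical trajectories, because
# the transition law depends only on the current state (memorylessness).
```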