Either way, I can get arbitrarily good approximations of arbitrary nonlinear differential/difference equations using only linear probabilistic evolution, at the cost of a (much) larger state space: instead of evolving a single point nonlinearly, you evolve a probability distribution over points linearly. So if you can implement it in a brain or a computer, there is a sufficiently large linear probabilistic dynamic that can model it. More really is different.
So I view all deductive ab-initio arguments about what LLMs can/can't do due to their architecture as fairly baseless.
(Note that the "large" here is doing a lot of heavy lifting. You need _really_ large. See https://en.m.wikipedia.org/wiki/Transfer_operator)
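As a concrete (if toy) version of the lifting trick: Ulam's method approximates the transfer operator of a nonlinear map by a finite stochastic matrix, so the nonlinear dynamics become one matrix-vector product per step. This is a sketch of the standard construction; the logistic map, bin count, and sample sizes are my illustrative choices, not anything from the thread:

```python
import numpy as np

# Ulam's method: approximate the transfer (Perron-Frobenius) operator of a
# nonlinear map by a finite stochastic matrix that acts *linearly* on
# discretized probability densities.
N = 500                                   # number of bins = lifted state space
edges = np.linspace(0.0, 1.0, N + 1)
f = lambda x: 4.0 * x * (1.0 - x)         # logistic map: nonlinear, chaotic

# P[j, i] = probability that a point drawn uniformly from bin i lands in bin j.
P = np.zeros((N, N))
samples_per_bin = 200
for i in range(N):
    xs = np.random.uniform(edges[i], edges[i + 1], samples_per_bin)
    js = np.clip(np.digitize(f(xs), edges) - 1, 0, N - 1)
    np.add.at(P[:, i], js, 1.0 / samples_per_bin)

# Linear probabilistic evolution: one matrix-vector product per time step.
rho = np.zeros(N)
rho[N // 10] = 1.0                        # density concentrated near x = 0.1
for _ in range(50):
    rho = P @ rho

# Ground truth: push a point cloud through the nonlinear map directly.
pts = np.random.uniform(edges[N // 10], edges[N // 10 + 1], 100_000)
for _ in range(50):
    pts = f(pts)
truth, _ = np.histogram(pts, bins=edges)
print("L1 error vs. direct simulation:", np.abs(rho - truth / truth.sum()).sum())
```

The error shrinks as the number of bins grows, which is exactly the sense in which "large" does the heavy lifting: the linear model's state space is the space of densities, not the original one-dimensional state.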
If you think there is a threshold at which some large-enough feedforward network develops the capability to backtrack, then I'd like to see your argument for it.
Of course this would be pointless, but it demonstrates that a system where an LLM provides the logic can backtrack, as there's nothing computationally special about backtracking.
That current LLM UIs are set up for conversation-style use, which makes this harder, isn't an inherent limitation of what we can do with LLMs.
In other words, this claim is categorically false:
> Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph.
What is limiting "just" an LLM is not the model's ability to encode reasoning, but the lack of the minimal, trivial runtime scaffolding that would let it use its capabilities.
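For what such scaffolding could look like (a hypothetical sketch, not anyone's actual system): an ordinary depth-first loop in which the model only proposes candidate moves and the host program does all the backtracking. `ask` and `apply_move` are illustrative stand-ins for a completion call and a state update:

```python
from typing import Callable, Optional

def solve(state: str,
          ask: Callable[[str], list[str]],
          is_goal: Callable[[str], bool],
          depth: int = 0,
          max_depth: int = 8) -> Optional[list[str]]:
    """Depth-first search where an LLM supplies the candidate moves.

    The model does a bounded, feedforward amount of work per call; the
    backtracking lives entirely in this ordinary recursive loop.
    """
    if is_goal(state):
        return []
    if depth >= max_depth:
        return None                        # dead end: force a backtrack
    # The model proposes next steps; the scaffold tries them in order.
    for move in ask(f"State:\n{state}\nList plausible next steps."):
        result = solve(apply_move(state, move), ask, is_goal,
                       depth + 1, max_depth)
        if result is not None:             # this branch reached the goal
            return [move] + result
    return None                            # every branch failed: backtrack

def apply_move(state: str, move: str) -> str:
    # Hypothetical: a real system might re-prompt the model or run a
    # verifier here; this sketch just appends the move to the transcript.
    return state + "\n" + move
```

Nothing here asks the network itself to backtrack; a failure deep in the tree returns `None` and the loop tries the model's next suggestion, which is all a Prolog interpreter's backtracking amounts to computationally.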
Where is the demonstration?