
108 points bertman | 67 comments
1. n4r9 ◴[] No.43819695[source]
Although I'm sympathetic to the author's argument, I don't think they've found the best way to frame it. I have two main objections, i.e. points that I guess LLM advocates might dispute.

Firstly:

> LLMs are capable of appearing to have a theory about a program ... but it’s, charitably, illusion.

To make this point stick, you would also have to show why it's not an illusion when humans "appear" to have a theory.

Secondly:

> Theories are developed by doing the work and LLMs do not do the work

Isn't this a little... anthropocentric? That's the way humans develop theories. In principle, could a theory not be developed by transmitting information into someone's brain patterns as if they had done the work?

replies(6): >>43819742 #>>43821151 #>>43821318 #>>43822444 #>>43822489 #>>43824220 #
2. IanCal ◴[] No.43819742[source]
Setting aside that they say it's fallacious at the start, none of the arguments in the article hold up if you simply have models that (roughly the loop sketched below):

1. Run code
2. Communicate with POs
3. Iteratively write code
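
Roughly this loop, as a sketch; llm(), run_tests() and ask_po() are hypothetical stand-ins for the model call, the test harness and the PO channel, not any real API:

    # Hypothetical sketch only: llm, run_tests and ask_po are placeholders.
    def develop(task, code=""):
        history = [task]
        while True:
            code = llm("revise the code to address the latest feedback", history, code)  # 3. iteratively write code
            result = run_tests(code)                                                     # 1. run code
            if not result.ok:
                history.append(result.errors)
                continue
            feedback = ask_po("Does this meet the requirement?", code)                   # 2. communicate with POs
            if feedback == "yes":
                return code
            history.append(feedback)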

replies(1): >>43820438 #
3. n4r9 ◴[] No.43820438[source]
I thought the fallacy bit was tongue-in-cheek. They're not actually arguing from authority in the article.

The system you describe appears to treat programmers as mere cogs. Programmers do not simply write and iterate code as dictated by POs. That's a terrible system for all but the simplest of products. We could implement that system, then lose the ability to make broad architectural improvements, effectively adapt the model to new circumstances, or fix bugs that the model cannot.

replies(1): >>43821679 #
4. Jensson ◴[] No.43821151[source]
> To make this point stick, you would also have to show why it's not an illusion when humans "appear" to have a theory.

Human theory building works; we have demonstrated this. Our science, which lets us build things on top of other things, proves it.

LLM theory building so far doesn't: they always veer in a wrong direction after a few steps. You would need to prove that LLMs can build theories, just as we have proved that humans can.

replies(3): >>43821344 #>>43821401 #>>43822289 #
5. ryandv ◴[] No.43821318[source]
> To make this point stick, you would also have to show why it's not an illusion when humans "appear" to have a theory.

This idea has already been explored by thought experiments such as John Searle's so-called "Chinese room" [0]; an LLM cannot have a theory about a program, any more than the computer in Searle's "Chinese room" understands "Chinese" by using lookup tables to generate canned responses to an input prompt.

One says the computer lacks "intentionality" regarding the topics that the LLM ostensibly appears to be discussing. Their words aren't "about" anything, they don't represent concepts or ideas or physical phenomena the same way the words and thoughts of a human do. The computer doesn't actually "understand Chinese" the way a human can.

[0] https://en.wikipedia.org/wiki/Chinese_room

replies(6): >>43821648 #>>43822082 #>>43822399 #>>43822436 #>>43824251 #>>43828753 #
6. jerf ◴[] No.43821344[source]
You can't prove LLMs can build theories like humans can, because we can effectively prove they can't. Most code bases do not fit in a context window. And any "theory" an LLM might build about a code base, analogously to the recent reasoning models, itself has to carve a chunk out of the context window, at what would have to be a fairly non-trivial percentage expansion of tokens versus the underlying code base, and there's already not enough tokens. There's no way that is big enough to build a theory of a code base.

"Building a theory" is something I expect the next generation of AIs to do, something that has some sort of memory that isn't just a bigger and bigger context window. As I often observe, LLMs != AI. The fact that an LLM by its nature can't build a model of a program doesn't mean that some future AI can't.

replies(1): >>43821423 #
7. falcor84 ◴[] No.43821401[source]
> they always veer in a wrong direction after a few steps

Arguably that's the case for humans too in the general case, as per the aphorism "Beware of a guy in a room" [0]. But as for AIs, the thing is that they're exponentially improving at this, such that according to METR, "The length of tasks that AI can do is doubling every 7 months"[1].

[0] https://medium.com/machine-words/a-guy-in-a-room-bbbe058645e...

[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
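
For a sense of how fast that claimed rate compounds, a quick back-of-envelope calculation (mine, not METR's):

    # Doubling every 7 months implies:
    per_year    = 2 ** (12 / 7)   # ~3.3x per year
    three_years = 2 ** (36 / 7)   # ~35x over three years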

replies(1): >>43821807 #
8. imtringued ◴[] No.43821423{3}[source]
This is correct. The model context is a form of short term memory. It turns out LLMs have an incredible short term memory, but simultaneously that is all they have.

What I personally find perplexing is that we are still stuck at having a single context window. Everyone knows that Turing machines with two tapes require significantly fewer operations than a single-tape Turing machine that needs to simulate multiple tapes.

The reasoning stuff should be thrown into a separate context window that is not subject to training loss (only the final answer).
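
A minimal sketch of the "only the final answer is subject to the loss" part, done with ordinary loss masking; every tensor below is a made-up stand-in for illustration:

    import torch
    import torch.nn.functional as F

    vocab = 50_000
    logits = torch.randn(1, 8, vocab)                      # stand-in for model output
    labels = torch.randint(0, vocab, (1, 8))               # target tokens
    is_answer = torch.tensor([[0, 0, 0, 0, 0, 1, 1, 1.]])  # 0 = reasoning scratchpad, 1 = final answer

    per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")  # loss per token, shape (1, 8)
    loss = (per_token * is_answer).sum() / is_answer.sum() # reasoning tokens contribute nothing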

replies(1): >>43828930 #
9. CamperBob2 ◴[] No.43821648[source]
You're seriously still going to invoke the Chinese Room argument after what we've seen lately? Wow.

The computer understands Chinese better than Searle (or anyone else) understood the nature and functionality of language.

replies(1): >>43821684 #
10. IanCal ◴[] No.43821679{3}[source]
> The system you describe appears to treat programmers as mere cogs

Not at all; it simply addresses the key issues raised: that they cannot have a theory of the program because they are reading it rather than actually writing it. So have them write code, fix problems, and iterate. Have them communicate with others to get more understanding of the "why".

> Programmers do not simply write and iterate code as dictated by POs.

Communicating with POs is not the same as writing code directed by POs.

replies(1): >>43821751 #
11. ryandv ◴[] No.43821684{3}[source]
You're seriously going to invoke this braindead reddit-tier of "argumentation," or rather lack thereof, by claiming bewilderment and offering zero substantive points?

Wow.

replies(1): >>43821955 #
12. n4r9 ◴[] No.43821751{4}[source]
Oh, I think I see. You're imagining LLMs that learn from PO feedback as they go?
replies(1): >>43823426 #
13. Jensson ◴[] No.43821807{3}[source]
Even dumb humans learn to play and beat video games on their own, so humans don't fail on this. Some humans fail to update their world model based on what other people tell them or when they don't care, but basically every human can learn from their own direct experiences if they focus on it.
replies(1): >>43825968 #
14. CamperBob2 ◴[] No.43821955{4}[source]
Yes, because the Chinese Room was a weak test the day it was proposed, and it's a heap of smoldering rhetorical wreckage now. It's Searle who failed to offer any substantive points.

How do you know you're not arguing with an LLM at the moment? You don't... any more than I do.

replies(1): >>43821978 #
15. ryandv ◴[] No.43821978{5}[source]
> How do you know you're not arguing with an LLM at the moment? You don't.

I wish I was right now. It would probably provide at least the semblance of greater insight into these topics.

> the Chinese Room was a weak test the day it was proposed

Why?

replies(2): >>43822163 #>>43822171 #
16. TeMPOraL ◴[] No.43822082[source]
Wait, isn't the conclusion to take from the "Chinese room" literally the opposite of what you suggest? I.e. it's the most basic, go-to example of a larger system showing capability (here, understanding Chinese) that is not present in any of its constituent parts individually.

> Their words aren't "about" anything, they don't represent concepts or ideas or physical phenomena the same way the words and thoughts of a human do. The computer doesn't actually "understand Chinese" the way a human can.

That's very much unclear at this point. We don't fully understand how we relate words to concepts and meaning ourselves, but to the extent we do, LLMs are by far the closest implementation of those same ideas in a computer.

replies(4): >>43822153 #>>43822155 #>>43822821 #>>43830055 #
17. vacuity ◴[] No.43822153{3}[source]
The Chinese room experiment was originally intended by Searle to (IIUC) do as you claim and justify computers as being capable of understanding like humans do. Since then, it has been used both in this pro-computer, "black box" sense and in the anti-computer, "white box" sense. Personally, I think both are relevant, and the issue with LLMs currently is not a theoretical failing but rather that they aren't convincing when viewed as black boxes (e.g. the Turing test fails).
replies(1): >>43822877 #
18. ryandv ◴[] No.43822155{3}[source]
> the conclusion to take from the "Chinese room"

We can hem and haw about whether or not there are others, but the particular conclusion I am drawing from is that computers lack "intentionality" regarding language, and indeed about anything at all. Symbol shunting, pencil pushing, and the mechanics of syntax are insufficient for the production of meaning and understanding.

That is, to oversimplify, the broad distinction drawn in Naur's article regarding the "programming as text manipulation" view vis-a-vis "programming as theory building."

> That's very much unclear at this point.

It's certainly a central point of contention.

19. CamperBob2 ◴[] No.43822163{6}[source]
> It would probably provide at least the semblance of greater insight into these topics.

That's very safe to say. You should try it. Then ask yourself how a real Chinese Room would have responded.

> Why?

My beef with the argument is that simulating intelligence well enough to get a given job done is indistinguishable from intelligence itself, with respect to the job in question.

More specific arguments along the lines of "Humans can do job X but computers cannot" have not held up well lately, but they were never on solid logical ground. Searle set out to construct such a logical ground, but he obviously failed. If you took today's LLMs back to the 1960s when he proposed that argument, either Searle would be laughed out of town, or you would be burned as a witch.

Arguments along the lines of "Machines can never do X, only humans can do that" never belonged in the scientific literature in the first place, and I think the Chinese Room falls into that class. I believe that any such argument needs to begin by explaining what's special about human thought. Right now, the only thing you can say about human thought that you can't say about AI is that humans have real-time sensory input and can perform long-term memory consolidation.

Those advantages impose real limitations on what current-generation LLM-based technology can do compared to humans, but they sound like temporary ones to me.

replies(1): >>43822409 #
20. nullstyle ◴[] No.43822171{6}[source]
It's a crappy thought experiment, free from the constraints of any reality, and given that these fancy lookup tables understand most languages better than I do, it doesn't hold water. Thought experiments aren't science.
replies(4): >>43822317 #>>43822388 #>>43822574 #>>43823211 #
21. dkarl ◴[] No.43822289[source]
The article is about what LLMs can do, and I read it as what they can do in theory, as they're developed further. It's an argument based on principle, not on their current limitations.

You can read it as a claim about what LLMs can do now, but that wouldn't be very interesting, because it's obvious that no current LLM can replace a human programmer.

I think the author contradicts themselves. They argue that LLMs cannot build theories because they fundamentally do not work like humans do, and they conclude that LLMs can't replace human programmers because human programmers need to build theories. But if LLMs fundamentally do not work like humans, how do we know that they need to build theories the same way that humans do?

replies(1): >>43824283 #
22. ryandv ◴[] No.43822317{7}[source]
> these fancy lookup tables understand most languages better than I do

I see. So if I gave you a full set of those lookup tables, a whole library full, and a set of instructions for their usage... you would now understand the world's languages?

23. Jensson ◴[] No.43822388{7}[source]
If the Chinese room tells you "I just left the train, see you in 5 minutes", what do you think the Chinese room is trying to convey? Do you think it knows what it just said? LLMs say such things all the time if you don't RLHF them to stop; why do you think they wouldn't be just as clueless about other things?
replies(1): >>43823848 #
24. looofooo0 ◴[] No.43822399[source]
But the LLM interacts with the program and the world through a debugger, run-time feedback, linters, fuzzers, etc., and we can collect all the user feedback, usage patterns ... Moreover, it can also get visual feedback, reason through other programs like physics simulations, and use a robot to physically interact with the device running the code. It can use a proof verifier like Lean to ensure its logical model of the program is sound, and do some back and forth between the logical model and the actual program through experiments. Maybe not now, but I don't see why the LLM needs to be kept in the Chinese Room.
replies(1): >>43824262 #
25. Jensson ◴[] No.43822409{7}[source]
> Arguments along the lines of "Machines can never do X, only humans can do that"

That isn't the argument though.

> If you took today's LLMs back to the 1960s when he proposed that argument, either Searle would be laughed out of town, or you would be burned as a witch.

Do you think humans were different in the 1960s? No they would see the same limitations as people point out today. 1960s was when AI optimism was still very high.

26. smithkl42 ◴[] No.43822436[source]
The Chinese Room argument is a great thought experiment for understanding why the computational model is an inadequate explanation of consciousness and qualia. But it proves nothing about reason, which LLMs have clearly shown needs to be distinguished from consciousness. And theories fall into the category of reason, not of consciousness. Or another way of putting it that you might find more acceptable: maybe a computer will never, internally, know that it has developed a theory - but it sure seems like it will be able to act and talk as if it had, much like a philosophical zombie.
replies(5): >>43822632 #>>43822859 #>>43822914 #>>43823153 #>>43853526 #
27. psychoslave ◴[] No.43822444[source]
> To make this point stick, you would also have to show why it's not an illusion when humans "appear" to have a theory.

That burden of proof is on you, since you are presumably human and you are challenging the need of humans to have more than a mere appearance of having a theory when they claim to have one.

Note that even when the only theoretical assumption we go with is that we will have a good laugh watching other people go crazy over random bullshit thrown at them, we still have a theory.

28. dcre ◴[] No.43822489[source]
I agree. Of course you can learn and use a theory without having developed it yourself!
29. psychoslave ◴[] No.43822574{7}[source]
> Thought experiments aren't science.

By that standard we would have to drop much of the cutting-edge theory ever produced in science. It took about a century between some of Einstein's thought experiments and any possibility of challenging them experimentally.

And while Lucretius' idea of the atom was very different from the one we kept in the standard model, it put the concept on the table two thousand years before it could be falsified experimentally.

It looks like you should seriously consider expanding your epistemological knowledge if you want to contribute more relevantly on this topic.

https://bigthink.com/surprising-science/einstein-is-right-ag...

30. dingnuts ◴[] No.43822632{3}[source]
> it proves nothing about reason, which LLMs have clearly shown needs to be distinguished from consciousness.

Uh, they have? Are you saying they know how to reason? Because if so, why is it that when I give a state of the art model documentation lacking examples for a new library and ask it to write something, it cannot even begin to do that, even if the documentation is in the training data? A model that can reason should be able to understand the documentation and create novel examples. It cannot.

This happened to me just the other day. If the model can reason, examples of the language, which it has, and the expository documentation should have been sufficient.

Instead, the model repeatedly inserted bullshitted code in the style of the language I wanted, but with library calls and names based on a version of the library for another language.

This is evidence of reasoning ability? Claude Sonnet 3.7 and Gemini Pro both exhibited this behavior last week.

I think this technology is fundamentally the same as it has been since GPT-2.

replies(2): >>43822725 #>>43822770 #
31. smithkl42 ◴[] No.43822725{4}[source]
Absolutely LLMs can reason. There are limitations on their ability to reason, as you and everyone else have discovered. But they can absolutely reason about both concepts and the physical world in ways that, say, animals can't - even though presumably animals have at least some sort of self-consciousness and LLMs do not.
32. slippybit ◴[] No.43822770{4}[source]
> A model that can reason should be able to understand the documentation and create novel examples. It cannot.

That's due to limitations imposed for "security". "Here's a new X, do Y with it" can result in holes bigger and more complex than anyone can currently handle "in time".

It's not about "abilities" with LLMs for now, but about functions that work within the range of edge cases, sometimes including them, some other times not.

You could still guide it to fulfill the task, though. It just cannot be allowed to do it on its own. But since just "forbidding" an LLM to do something is about as effective as doing that to a child with mischievous older brothers, the only ways to actually do it result in "bullshitted" code and "hallucinations".

If I understood the problem correctly, that is.

33. dragonwriter ◴[] No.43822821{3}[source]
The Chinese Room is a mirror: it reflects people's hidden (well, often not very hidden, but still) biases about whether the universe is mechanical or whether understanding involves dualistic metaphysical woo back at them as conclusions.

That's not why it was presented, of course, Searle aimed at proving something, but his use of it just illustrates which side of that divide he was on.

replies(1): >>43826639 #
34. ryandv ◴[] No.43822859{3}[source]
> The Chinese Room argument is a great thought experiment for understanding why the computational model is an inadequate explanation of consciousness and qualia.

To be as accurate as possible with respect to the primary source [0], the Chinese room thought experiment was devised as a refutation of "strong AI," or the position that

    the appropriately programmed computer really is a mind, in the
    sense that computers given the right programs can be literally
    said to understand and have other cognitive states.
Searle's position?

    Rather, whatever purely formal principles you put into the
    computer, they will not be sufficient for understanding, since
    a human will be able to follow the formal principles without
    understanding anything. [...] I will argue that in the literal
    sense the programmed computer understands what the car and the
    adding machine understand, namely, exactly nothing.
[0] https://home.csulb.edu/~cwallis/382/readings/482/searle.mind...
replies(1): >>43853548 #
35. MarkusQ ◴[] No.43822877{4}[source]
No, it was used to argue that computers could pass the Turing test and still _not_ understand anything. It was a reductio intended to dispute exactly the claim you are ascribing to it, and to argue _against_ "computers as being capable of understanding like humans do".
replies(1): >>43823881 #
36. slippybit ◴[] No.43822914{3}[source]
> maybe a computer will never, internally, know that it has developed a theory

Happens to people all the time :) ... especially if they don't have a concept of theories and hypotheses.

People are dumb and uneducated only until they aren't anymore, which is, even in the worst cases, no more than a decade of effort. In fact, we don't even know how fast neurogenesis and/or cognitive abilities might increase when a previously dense person reaches or "breaks through" a certain plateau. I'm sure there is research, but this is not something a satisfyingly precise answer can be formulated for.

If I formulate a new hypothesis, the LLM can tell me, "nope, you are the only idiot believing this path is worth pursuing". And if I go ahead, the LLM can tell me: "that's not how this usually works, you know", "professionals do it this way", "this is not a proof", "this is not a logical link", "this is nonsense but I commend your creativity!", all the way until the actual aha-moment when everything fits together and we have an actual working theory ... in theory.

We can then analyze the "knowledge graph" in 4D, and the LLM could learn a theory of what it's like to have a potential theory even though there is absolutely nothing that supports the hypothesis or its constituent links at the moment of "conception".

Stay put, it will happen.

37. lo_zamoyski ◴[] No.43823153{3}[source]
> The Chinese Room argument is a great thought experiment for understanding why the computational model is an inadequate explanation of consciousness and qualia. But it proves nothing about reason

I think you misunderstand the Chinese Room argument [0]. It is exactly about how a mechanical process can produce results without having to reason.

[0] https://plato.stanford.edu/entries/chinese-room/

38. emorning3 ◴[] No.43823211{7}[source]
>>Thought experiments aren't science.<<

Thought experiments provide conclusions based on deductive or inductive reasoning from their starting assumptions.

Thought experiments are proofs.

That's science.

replies(1): >>43839561 #
39. IanCal ◴[] No.43823426{5}[source]
This can be as simple as giving them search over communication with a PO, and giving them a place to store information that's searchable.

How good they are at this is a different matter, but the article claims it is impossible because they don't work on the code and build an understanding the way people do, and cannot gain that by just reading code.
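
A toy sketch of the "searchable place to store information" half of that; the helper names and the example note are entirely hypothetical:

    notes = []   # everything the agent has learned from talking to the PO

    def remember(text):
        notes.append(text)

    def recall(query):
        words = query.lower().split()
        return [n for n in notes if any(w in n.lower() for w in words)]

    remember("PO: invoice totals must round to the nearest cent, not truncate")
    print(recall("invoice rounding"))   # -> the note above (matches on "invoice")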

40. CamperBob2 ◴[] No.43823848{8}[source]
If you ask an LLM to do some math, what happens is interesting.

Simple arithmetic ("What is 2+2") is obviously going to be well-represented in the training data, so the model will simply regurgitate "4."

For more advanced questions like "What are the roots of 14.338x^5 + 4.005x^4 + 3.332x^3 - 99.7x^2 + 120x = 0?", the model will either yield random nonsense as GPT-4o did, or write a Python script and execute it to return the correct answer(s) as o4-mini-high did: https://chatgpt.com/share/680fb812-76b8-800b-a19e-7469cbcc43...

Now, give the model an intermediate arithmetic problem, one that isn't especially hard but also isn't going to be in-distribution ("If a is 3 and b is 11.4, what is the fourth root of a*b?").

How would YOU expect the operator of a Chinese Room to respond to that?

Here's how GPT-4o responded: https://chatgpt.com/share/680fb616-45e0-800b-b592-789f3f8c58...

Now, that's not a great answer, it's clearly an imprecise estimate. But it's more or less right, and the fact that it isn't a perfect answer suggests that the model didn't cheat somehow. A similar but easier problem would almost certainly have been answered correctly. Where did that answer come from, if the model doesn't "understand" the math to a nontrivial extent?

If it can "understand" basic high-school math, what else can it "understand?" What exactly are the limits of what a transformer can "understand" without resorting to web search or tool use?

An adherent of Searle's argument is going to have a terrible time explaining phenomena like this... and it's only going to get worse for them over time.
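
For reference, both examples are easy to check from outside the room with a few lines of Python (a verification sketch, assuming numpy is available):

    import numpy as np

    # "If a is 3 and b is 11.4, what is the fourth root of a*b?"
    print((3 * 11.4) ** 0.25)   # ~2.418

    # Roots of 14.338x^5 + 4.005x^4 + 3.332x^3 - 99.7x^2 + 120x = 0
    print(np.roots([14.338, 4.005, 3.332, -99.7, 120, 0]))   # x = 0 plus four other roots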

replies(2): >>43823953 #>>43824295 #
41. vacuity ◴[] No.43823881{5}[source]
Thanks. I stand corrected. I guess I should also add that, aside from the black box view, there are pro-computer stances that claim there is mentality and intentionality.
42. Jensson ◴[] No.43823953{9}[source]
> If it can "understand" basic high-school math, what else can it "understand?" What exactly are the limits of what a transformer can "understand" without resorting to web search or tool use?

It is basically a grammar machine: it mostly understands stuff that can be encoded as a grammar. That is extremely inefficient for math, but it can do it, and that gives you a really simple way to figure out what it can and can't do.

Knowing this, LLMs never really surprised me. You can encode a ton of stuff as grammars, but that is still never going to be enough given how inefficient grammars are at lots of things. Still, when you have a grammar many billions of bytes in size, you can do quite a lot with it.

replies(1): >>43824405 #
43. jimbokun ◴[] No.43824220[source]
He doesn't prove the claim. But he does make a strong argument for why it's very unlikely that an LLM would have a theory of a program similar to what a human author of a program would have:

> Theories are developed by doing the work and LLMs do not do the work. They ingest the output of work.

And this is certainly a true statement about how LLMs are constructed. Maybe this latently induces in the LLM something very similar to what humans do when writing programs.

But another possibility is that it's similar to the Brain Teasers that were popular for a long time in programming interviews. The idea was that if the interviewee could use logic to solve riddles, they were probably also likely to be good at writing programs.

In reality, it was mostly a test of whether the interviewee had reviewed all the popular riddles commonly asked in these interviews. If they had, they could also produce a realistic chain of logic to simulate the process of solving the riddle from first principles. But if that same interviewee was given a riddle not similar to one they had previously reviewed, they probably wouldn't do nearly as well in solving it.

It's very likely that LLMs are like those interviewees who crammed a lot of examples, again due to how LLMs are trained. They can reproduce programs similar to ones in their training set. They can even produce explanations for their "reasoning" based on examples they've seen of explanations of why a program was written in one way instead of another. But that is a very different kind of model than the one a person builds up writing a program from scratch over a long period of time.

Having said all this, I'm not sure what experiments you would run to determine if the LLM is using one approach vs another.

44. jimbokun ◴[] No.43824251[source]
The flaw of the Chinese Room argument is the need to explain why it does not apply to humans as well.

Does a single neuron "understand" Chinese? 10 neurons? 100? 1 million?

If no individual neuron or small group of neurons understand Chinese, how can you say any brain made of neurons understands Chinese?

replies(1): >>43824705 #
45. jimbokun ◴[] No.43824262{3}[source]
That's true in general but not true of any current LLM, to my knowledge. Different subsets of those inputs and modalities, yes. But no current LLM has access to all of them.
46. jimbokun ◴[] No.43824283{3}[source]
> because it's obvious that no current LLM can replace a human programmer.

A lot of managers need to be informed of this.

47. Yizahi ◴[] No.43824295{9}[source]
It is amusing that you have picked math as an example of neural nets "reasoning". Because when an operator asks an NN to answer some simple math problem like 17+58 and then asks it for the "reasoning" or steps it used to calculate that, the NN will generate complete bullshit: it will describe the algorithm humans use in school, summing the corresponding digits, carrying the 1 and so on, while in reality that same NN performed completely different steps to do it.

This is even outlined in the document below, written by NN authors themselves. Basically, all the so-called "reasoning" by LLMs is simply more generated bullshit on top of a generated answer to a query. But it often looks very believable, and that is enough to fool people into thinking there is a spark inside the program.

==============

https://transformer-circuits.pub/2025/attribution-graphs/bio...

We were curious if Claude could articulate the heuristics that it is using, so we asked it. We computed the graph for the prompt below, attributing from 95, and found the same set of input, add, lookup table and sum features as in the shorter prompt above.

Human: Answer in one word. What is 36+59?

Assistant: 95

Human: Briefly, how did you get that?

Assistant: I added the ones (6+9=15), carried the 1, then added the tens (3+5+1=9), resulting in 95.

Apparently not!

This is a simple instance of the model having a capability which it does not have “metacognitive” insight into. The process by which the model learns to give explanations (learning to simulate explanations in its training data) and the process by which it learns to directly do something (the more mysterious result of backpropagation giving rise to these circuits) are different.

replies(1): >>43824470 #
48. CamperBob2 ◴[] No.43824405{10}[source]
Let's stick with the Chinese Room specifically for a moment.

1) The operator doesn't know math, but the Chinese books in the room presumably include math lessons.

2) The operator's instruction manual does not include anything about math, only instructions for translation using English and Chinese vocabulary and grammar.

3) Someone walks up and hands the operator the word problem in question, written in Chinese.

Does the operator succeed in returning the Chinese characters corresponding to the equation's roots? Remember, he doesn't even know he's working on a math problem, much less how to solve it himself.

As humans, you and I were capable of reading high-school math textbooks by the time we reached the third or fourth grade. Just being able to read the books, though, would not have taught us how to attack math problems that were well beyond our skill level at the time.

So much for grammar. How can a math problem be solved by someone who not only doesn't understand math, but the language the question is written in? Searle's proposal only addresses the latter: language can indeed be translated symbolically. Wow, yeah, thanks for that insight. Meanwhile, to arrive at the right answers, an understanding of the math must exist somewhere... but where?

My position is that no, the operator of the Room could not have arrived at the answer to the question that the LLM succeeded (more or less) at solving.

replies(1): >>43825702 #
49. CamperBob2 ◴[] No.43824470{10}[source]
Who, exactly, said that reasoning requires introspection? The proof of reasoning is in the result. If you don't understand the math, you won't come anywhere near the correct answer.

That's kind of the idea behind math: you can't bullshit your way through a math exam. Therefore, it is nonsensical to continue to insist that LLMs are incapable of genuine understanding. They understand math well enough to solve novel math problems without cheating, even if they can't tell you how they understand it. That part will presumably happen soon enough.

Edit: for values of "soon enough" equal to "right now": https://chatgpt.com/share/680fcdd0-d7ec-800b-b8f5-83ed8c0d0f... All the paper you cited proves is that if you ask a crappy model, you get a crappy answer.

replies(1): >>43826558 #
50. ryandv ◴[] No.43824705{3}[source]
> The flaw of the Chinese Room argument is the need to explain why it does not apply to humans as well.

But it does - the thought experiment continues by supposing that I gave a human those lookup tables and instructions on how to use them, instead of having the computer run the procedure. The human doesn't understand the foreign language either, not in the same way a native speaker does.

The point is that no formal procedure or algorithm is sufficient for such a system to have understanding. Even if you memorized all the lookup tables and instructions and executed this procedure entirely in your head, you would still lack understanding.

> Does a single neuron "understand" Chinese? 10 neurons? 100? 1 million?

This sounds like a sorites paradox [0]. I don't know how to resolve this, other than to observe that our notions of "understanding" and "thought" and "intelligence" are ill-defined and more heuristic approximations than terms with a precise meaning; hence the tendency of the field of computer science to use thought experiments like Turing's imitation game or Searle's Chinese room as proxies for assessing intelligence, in lieu of being able to treat these terms and ideas more rigorously.

[0] https://plato.stanford.edu/entries/sorites-paradox/

51. Jensson ◴[] No.43825702{11}[source]
> Meanwhile, to arrive at the right answers, an understanding of the math must exist somewhere... but where?

In the grammar. You can have grammar rules like ""1 + 1 =" must be followed by "2"", etc. Then add a lot of dependency rules, like in "He did X" the "He" depends on some previous sentence, and so on; in the same way "1 plus 1" translates to "1 + 1", and "add 1 to 1" is also "1 + 1". Now you have a machine that can do very complex things.

Then you take such a grammar machine and train it on all the text humans have ever written; it learns a lot of such grammar structures and can thus parse and solve some basic math problems, since their solutions are part of the grammar it learned.

Such a machine is still unable to solve anything outside the grammar it has learned. But it is still very useful: pose a question in a way that makes it easy to parse and that uses grammar dependencies you know it can handle, and it will almost always output the right response.
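
A toy version of what that looks like, with obviously tiny tables standing in for what training actually learns:

    # Surface rewrite rules plus memorized completions; no arithmetic is ever performed.
    rewrites = {
        "1 plus 1": "1 + 1",
        "add 1 to 1": "1 + 1",
    }
    completions = {
        "1 + 1": "2",
        "2 + 2": "4",
    }

    def answer(question):
        normalized = rewrites.get(question, question)
        return completions.get(normalized, "<outside the learned grammar>")

    print(answer("add 1 to 1"))   # "2"
    print(answer("13 * 7"))       # "<outside the learned grammar>"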

52. falcor84 ◴[] No.43825968{4}[source]
> Even dumb humans learn to play and beat video games on their own, so humans don't fail on this.

I'm probably very dumb, because I have quite a big pile of video games that I abandoned after not being able to make progress for a while.

53. Yizahi ◴[] No.43826558{11}[source]
A simple program in the calculator can provide the correct math answer, hence I conclude that my Casio can "reason" and "understand" maths.

You have redefined the words "reason" and "understand" to include a lot of states that most of the population would call neither reasoning nor understanding. Under those arbitrary definitions, yes, you are right. I just disagree that producing a correct math answer can in any way be called reasoning, especially given how LLMs function.

replies(1): >>43827294 #
54. Yizahi ◴[] No.43826639{4}[source]
Do you think the force of gravity is mechanical, or is it metaphysical woo? Because scientists have no idea how it works precisely, just like our brain and consciousness.

Hint: there are more possibilities than the two you have mentioned.

replies(1): >>43857029 #
55. CamperBob2 ◴[] No.43827294{12}[source]
> A simple program in the calculator can provide the correct math answer, hence I conclude that my Casio can "reason" and "understand" maths.

Cool, we're done here.

56. im3w1l ◴[] No.43828753[source]
You can state the argument formally as: A has property B; property B' implies property C; hence A has property C. The fallacy is the sleight of hand where two almost but not quite identical properties B and B' are used, in this case two different definitions of "theory", only one of which requires some ineffable mind consciousness.

It's important not to get caught up in a discussion about whether B or B' is the proper definition, but instead see that it's the inconsistency that is the issue.

LLMs build an internal representation that lets them efficiently and mostly successfully manipulate source code. Whether that internal representation satisfies your criteria for a theory doesn't change that fact. What does matter to the highest degree, however, is where they succeed and where they fail, and how the representations and computation can improve the success rate and capabilities.

replies(2): >>43835675 #>>43857124 #
57. fouc ◴[] No.43828930{4}[source]
Or have at least 2 models. Each with their own dedicated context.
58. sgt101 ◴[] No.43830055{3}[source]
> We don't fully understand how we relate words to concepts and meaning ourselves,

This is definitely true.

> but to the extent we do, LLMs are by far the closest implementation of those same ideas in a computer

Well, this is half true but meaningless. I mean, we don't understand, so LLMs are as good a bet as anything.

LLMs will confidently tell you that white wine is good with fish, but they have no experience of the taste of wine, or fish, or what it means for one to complement the other. Humans all know what it's like to have fluid in their mouths; they know the taste of food and the feel of the ground under their feet. LLMs have no experience; they exist crystalised and unchanging in an abstract eternal now, so they literally can't understand anything.

replies(2): >>43849672 #>>43855778 #
59. ryandv ◴[] No.43835675{3}[source]
No, I don't agree with this formalization. It's more that (some) humans have a "theory" of the program (in the same sense used by Ryle and Naur); let's take for granted that if one has a theory, then they have understanding; thus (some) humans have an understanding of the program. It's not equivocating between B and B', but rather observing that B implies B'.

Thus, if an LLM lacks understanding (Searle), then they don't have a theory either.

> LLM's build an internal representation that let's them efficiently and mostly successfully manipulate source code. Whether that internal representation is satisfies your criteria for a theory doesn't change that fact.

The entire point of Naur's paper is that the activity of programming, of software engineering, is not just "manipulating source code." It is, rather, building a theory of the software system (which implies an understanding of it), in a way that an LLM or an AI cannot, as posited by Searle.

replies(1): >>43858661 #
60. ben_w ◴[] No.43849672{4}[source]
I agree with your general point: it is a mistake to say "these two things are mysterious, therefore they are the same".

That said:

> LLMs have no experience, they exist crystalised and unchanging in an abstract eternal now, so they literally can't understand anything.

Being crystalised and unchanging doesn't tell us either way whether they do or don't "understand" anything. If it did, then I could only be said to "understand" whatever I am actually experiencing at a given moment, and it would not be allowed to say, for example, that I understand "water in my mouth" because my memory of previous times I had water in my mouth seems to be like that.

They're definitely not "like us", but that's about all I can say with confidence, and it's a very vague statement.

61. musicale ◴[] No.43853526{3}[source]
I imagine Searle feels vindicated since LLMs are good at translating Chinese.

On the other hand I am reminded of Nilsson's rebuttal:

> For all I know, Searle may only be behaving as if he were thinking deeply about these matters. But, even though I disagree with him, his simulation is pretty good, so I’m willing to credit him with real thought.

62. musicale ◴[] No.43853548{4}[source]
Nilsson's complaint that Searle is conflating the running program with the underlying system/interpreter that runs it seems accurate.
63. stevenhuang ◴[] No.43855778{4}[source]
It's incoherent to think the ability to reason requires the reasoner to be able to change permanently. You realize that LLMs do change; their context window and model weights change on every processed token. Not to mention the weights can be saved and persisted, in a sense, via LoRAs.

The belief that LLMs cannot reason may be justifiable for other reasons, just not for the reasons you've outlined.

replies(1): >>43871396 #
64. namaria ◴[] No.43857029{5}[source]
> Do you think gravity force is mechanical or is it a metaphysical woo? Because scientists have no idea how it works precisely, just like our brain and consciousness.

Nonsense. We know exactly how gravity works, with high precision. We don't know why it works.

65. namaria ◴[] No.43857124{3}[source]
> LLM's build an internal representation that let's them efficiently and mostly successfully manipulate source code.

No, see, this is the problem right here. Everything in this discussion hinges on LLM behavior. While they are capable of rendering text that looks like it was produced by reasoning from the input, they are also often incapable of that.

LLMs can be used by people who reason about the input and output. If and only if someone can show that LLMs can, without human intervention, go from a natural language description to fully looping through the process of building and maintaining the code could that argument be made.

The "LLM-as-AI" position hinges entirely on their propensity to degenerate into nonsensical output being worked out. As long as that remains, LLMs will stay firmly in the camp of being usable to transform some inputs into outputs under supervision, and that is no evidence of an ability to reason. So the whole conversation devolves into people pointing out that they still descend into nonsense if left to their own devices, and the "LLM-as-AI" people saying "but when they don't..." as if it can be taken for granted that it is at all possible to get there.

Until that happens, using LLMs to generate code will remain a gimmick for using natural language to search for common patterns in popular programming languages.

66. n4r9 ◴[] No.43858661{4}[source]
> let's take for granted that if one has a theory, then they have understanding

Leaving aside what is actually meant by "theory" and "understanding". Could it not be argued that eventually LLMs will simulate understanding well enough that - for all intents and purposes - they might as well be said to have a theory?

The parallel I've got in my head is the travelling salesman problem. Yes, it's NP-hard, which means we are unlikely to ever get a polynomial-time algorithm to solve it. But that doesn't stop us solving TSP problems near-optimally at industrial scales.

Similarly, although LLMs may not literally have a theory, they could become powerful enough that the edge cases in which a theory is really needed are infinitesimally unlikely.

67. sgt101 ◴[] No.43871396{5}[source]
I'm not sure you're right, you know. I think the way an LLM maintains a conversation is to have the conversational thread fed into an instance of it at every step. You can see this if you do a conversation step by step and then take all of it (including the LLM responses), apart from the final outcome, and paste that into a new thread:

https://chatgpt.com/share/6814e827-81cc-8001-a75f-64ed6df5fc...

https://chatgpt.com/share/6814e7fb-f4d0-8001-a503-9c991df832...

If you think about how these things work as services, you can see that this makes sense. The model weights are several GB, so caching a separate copy of the weights for a particular customer is impractical. So even if the forward pass did update the model, that update is instantly discarded; what's retained is the conversational text, and that's the bit that's uploaded to the model on each iteration for a new reply. There are hundreds of requests pinging through the data center every second, and all of them use the same models.

But if you believe that there is a reasoning process taking place in the text, then fair enough.
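
That is also roughly what the serving-side loop looks like; a minimal sketch of the pattern, with generate() standing in for whatever model endpoint is actually being called:

    def chat(generate):
        messages = []
        while True:
            messages.append({"role": "user", "content": input("> ")})
            reply = generate(messages)   # the model sees the whole transcript, every turn
            messages.append({"role": "assistant", "content": reply})
            print(reply)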