Most active commenters
  • og_kalu(6)
  • Borealid(5)
  • Jensson(3)
  • simonw(3)
  • wongarsu(3)
  • int_19h(3)


265 points by ctoth | 56 comments
1. sejje ◴[] No.43744995[source]
In the last example (the riddle)--I generally assume the AI isn't misreading, rather that it assumes you couldn't give it the riddle correctly, but it has seen it already.

I would do the same thing, I think. It's too well-known.

The variation doesn't read like a riddle at all, so it's confusing even to me as a human. I can't find the riddle part. Maybe the AI is confused, too. I think it makes an okay assumption.

I guess it would be nice if the AI asked a follow up question like "are you sure you wrote down the riddle correctly?", and I think it could if instructed to, but right now they don't generally do that on their own.

replies(5): >>43745113 #>>43746264 #>>43747336 #>>43747621 #>>43751793 #
2. Jensson ◴[] No.43745113[source]
> generally assume the AI isn't misreading, rather that it assumes you couldn't give it the riddle correctly, but it has seen it already.

An LLM doesn't assume, it's a text completer. It sees something that looks almost like a well-known problem and it will complete it as that well-known problem; it's a problem specific to being a text completer that is hard to get around.

replies(6): >>43745166 #>>43745289 #>>43745300 #>>43745301 #>>43745340 #>>43754148 #
3. simonw ◴[] No.43745166[source]
These newer "reasoning" LLMs really don't feel like pure text completers any more.
replies(3): >>43745252 #>>43745253 #>>43745266 #
4. jordemort ◴[] No.43745252{3}[source]
And yet
5. gavinray ◴[] No.43745253{3}[source]
Is it not physically impossible for LLMs to be anything but "plausible text completion"?

Neural Networks as I understand them are universal function approximators.

In terms of text, that means they're trained to output what they believe to be the "most probably correct" sequence of text.

An LLM has no idea that it is "conversing", or "answering" -- it relates some series of symbolic inputs to another series of probabilistic symbolic outputs, aye?

replies(1): >>43754506 #
6. Borealid ◴[] No.43745266{3}[source]
What your parent poster said is nonetheless true, regardless of how it feels to you. Getting text from an LLM is a process of iteratively attempting to find a likely next token given the preceding ones.

If you give an LLM "The rain in Spain falls" the single most likely next token is "mainly", and you'll see that one proportionately more than any other.

If you give an LLM "Find an unorthodox completion for the sentence 'The rain in Spain falls'", the most likely next token is something other than "mainly" because the tokens in "unorthodox" are more likely to appear before text that otherwise bucks statistical trends.

If you give the LLM "blarghl unorthodox babble The rain in Spain" it's likely the results are similar to the second one but less likely to be coherent (because text obeying grammatical rules is more likely to follow other text also obeying those same rules).

In any of the three cases, the LLM is predicting text, not "parsing" or "understanding" a prompt. The fact it will respond similarly to a well-formed and unreasonably-formed prompt is evidence of this.

It's theoretically possible to engineer a string of complete gibberish tokens that will prompt the LLM to recite song lyrics, or answer questions about mathematical formulae. Those strings of gibberish are just difficult to discover.
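
As a concrete, hedged illustration of the next-token view described above (a sketch only; GPT-2 via Hugging Face transformers is used as a small stand-in, and the exact rankings it produces are not asserted here):

```python
# Sketch: inspect next-token probabilities for a prompt. Assumes the
# `transformers` and `torch` packages; GPT-2 is only a small, convenient
# stand-in for a modern LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def top_next_tokens(prompt: str, k: int = 5):
    """Return the k most probable next tokens and their probabilities."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits           # shape: (1, seq_len, vocab)
    probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
    values, indices = probs.topk(k)
    return [(tokenizer.decode(int(i)), round(float(p), 4)) for i, p in zip(indices, values)]

print(top_next_tokens("The rain in Spain falls"))
print(top_next_tokens("blarghl unorthodox babble The rain in Spain"))
```

Whatever the specific numbers come out to, everything described above reduces to inspecting or sampling from this one distribution.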

replies(6): >>43745307 #>>43745309 #>>43745334 #>>43745371 #>>43746291 #>>43754473 #
7. monkpit ◴[] No.43745289[source]
This take really misses a key part of implementation of these LLMs and I’ve been struggling to put my finger on it.

In every LLM thread someone chimes in with “it’s just a statistical token predictor”.

I feel this misses the point and I think it dismisses attention heads and transformers, and that’s what sits weird with me every time I see this kind of take.

There _is_ an assumption being made within the model at runtime. Assumption, confusion, uncertainty - one camp might argue that none of these exist in the LLM.

But doesn’t the implementation constantly make assumptions? And what even IS your definition of “assumption” that’s not being met here?

Edit: I guess my point, overall, is: what’s even the purpose of making this distinction anymore? It derails the discussion in a way that’s not insightful or productive.

replies(1): >>43746020 #
8. sejje ◴[] No.43745301[source]
> it's a text completer

Yes, and it can express its assumptions in text.

Ask it to make some assumptions, like about a stack for a programming task, and it will.

Whether or not the mechanism behind it feels like real thinking to you, it can definitely do this.

replies(1): >>43746266 #
9. wongarsu ◴[] No.43745300[source]
If you have the model output a chain of thought, whether it's a reasoning model or you prompt a "normal" model to do so, you will see examples of the model going "user said X, but did they mean Y? Y makes more sense, I will assume Y". Sometimes stretched over multiple paragraphs, consuming the entire reasoning budget for that prompt.

Discussing whether models can "reason" or "think" is a popular debate topic on here, but I think we can all agree that they do something that at least resembles "reasoning" and "assumptions" from our human point of view. And if in its chain of thought it decides your prompt is wrong, it will go ahead answering what it assumes is the right prompt.

10. Workaccount2 ◴[] No.43745307{4}[source]
The problem is showing that humans aren't just doing next word prediction too.
replies(2): >>43745388 #>>43758748 #
11. dannyobrien ◴[] No.43745309{4}[source]
So I just gave your blarghl line to Claude, and it replied "It seems like you included a mix of text including "blarghl unorthodox babble" followed by the phrase "The rain in Spain."

Did you mean to ask about the well-known phrase "The rain in Spain falls mainly on the plain"? This is a famous elocution exercise from the musical "My Fair Lady," where it's used to teach proper pronunciation.

Or was there something specific you wanted to discuss about Spain's rainfall patterns or perhaps something else entirely? I'd be happy to help with whatever you intended to ask."

I think you have a point here, but maybe re-express it? Because right now your argument seems trivially falsifiable even under your own terms.

replies(1): >>43745400 #
12. simonw ◴[] No.43745334{4}[source]
No, I think the "reasoning" step really does make a difference here.

There's more than just next token prediction going on. Those reasoning chain of thoughts have undergone their own reinforcement learning training against a different category of samples.

They've seen countless examples of how a reasoning chain would look for calculating a mortgage, or searching a flight, or debugging a Python program.

So I don't think it is accurate to describe the eventual result as "just next token prediction". It is next token prediction that has been informed by a chain of thought, which was itself trained on a different set of specially chosen examples.

replies(1): >>43745368 #
13. og_kalu ◴[] No.43745340[source]
Text Completion is just the objective function. It's not descriptive and says nothing about how the models complete text. Why people hang on this word, I'll never understand. When you wrote your comment, you were completing text.

The problem you've just described is a problem with humans as well. LLMs are assuming all the time. Maybe you would like to call it another word, but it is happening.

replies(2): >>43745745 #>>43746034 #
14. Borealid ◴[] No.43745368{5}[source]
Do you believe it's possible to produce a given set of model weights with an infinitely large number of different training examples?

If not, why not? Explain.

If so, how does your argument address the fact that this implies any given "reasoning" model can be trained without giving it a single example of something you would consider "reasoning"? (in fact, a "reasoning" model may be produced by random chance?)

replies(2): >>43745566 #>>43747251 #
15. wongarsu ◴[] No.43745371{4}[source]
> The fact it will respond similarly to a well-formed and unreasonably-formed prompt is evidence of this.

Don't humans do the same in conversation? How should an intelligent being (constrained to the same I/O system) respond here to show that it is in fact intelligent?

replies(1): >>43745500 #
16. Borealid ◴[] No.43745388{5}[source]
I don't see that as a problem. I don't particularly care how human intelligence works; what matters is what an LLM is capable of doing and what a human is capable of doing.

If those two sets of accomplishments are the same there's no point arguing about differences in means or terms. Right now humans can build better LLMs but nobody has come up with an LLM that can build better LLMs.

replies(2): >>43746308 #>>43746612 #
17. Borealid ◴[] No.43745400{5}[source]
If you feed text to Claude, you're getting Claude's "system prompt" prepended before the text you give it.

If you want to test convolution you have to use a raw model with no system prompt. You can do that with a Llama or similar. Otherwise your context window is full of words like "helpful" and "answer" and "question" that guide the response and make it harder (not impossible) to see the effect I'm talking about.
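
For anyone who wants to see concretely what that extra context looks like, here is a minimal sketch (the model name is a placeholder assumption; any instruction-tuned model that ships a chat template illustrates the same point) comparing the raw text against what the model actually receives once the system prompt and role markers are wrapped around it:

```python
# Sketch: the same user text sent (a) as a raw continuation prompt and
# (b) wrapped in a chat template, which prepends the system prompt and
# role markers. The model name is a placeholder choice.
from transformers import AutoTokenizer

MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder; any chat model with a template
tokenizer = AutoTokenizer.from_pretrained(MODEL)

user_text = "blarghl unorthodox babble The rain in Spain"

# (a) Raw completion: a base model would see exactly these tokens.
raw_prompt = user_text

# (b) Chat formatting: system/user scaffolding is added automatically,
#     which is roughly what hosted assistants do for you behind the scenes.
chat_prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_text},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

print(repr(raw_prompt))
print(repr(chat_prompt))  # note the extra "helpful assistant" style tokens steering the output
```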

replies(3): >>43746165 #>>43747139 #>>43754494 #
18. Borealid ◴[] No.43745500{5}[source]
Imagine a Rorschach Test of language, where a certain set of non-recognizable-language tokens invariably causes an LLM to talk about flowers. These strings exist by necessity due to how the LLM's layers are formed.

There exists no similar set of tokens for humans, because our process is to parse the incoming sounds into words, use grammar to extract conceptual meaning from those words, and then shape a response from that conceptual meaning.

Artists like Lewis Carroll and Stanislaw Lem play with this by inserting non-words at certain points in sentences to get humans to infer the meaning of those words from surrounding context, but the truth remains that an LLM will gladly convolute a wholly non-language input into a response as if it were well-formed, whereas a human can't/won't do that.

I know this is hard to understand, but the current generation of LLMs are working directly with language. Their "brains" are built on language. Some day we might have some kind of AI system that's built on some kind of meaning divorced from language, but that's not what's happening here. They're engineering matrices that repeatedly perform "context window times model => one more token" operations.

replies(2): >>43745659 #>>43745736 #
19. simonw ◴[] No.43745566{6}[source]
I'm afraid I don't understand your question.
20. og_kalu ◴[] No.43745659{6}[source]
I think you are begging the question here.

For one thing, LLMs absolutely form responses from conceptual meanings. This has been demonstrated empirically multiple times now, including again by Anthropic only a few weeks ago. 'Language' is just the input and output, the first and last few layers of the model.

So okay, there exists some set of 'gibberish' tokens that will elicit meaningful responses from LLMs. How does your conclusion - "Therefore, LLMs don't understand" - fit the bill here? You would also conclude that humans have no understanding of what they see because of the Rorschach test?

>There exists no similar set of tokens for humans, because our process is to parse the incoming sounds into words, use grammar to extract conceptual meaning from those words, and then shape a response from that conceptual meaning.

Grammar is a useful fiction, an incomplete model of a demonstrably probabilistic process. We don't use 'grammar' to do anything.

21. wongarsu ◴[] No.43745736{6}[source]
> Imagine a Rorschach Test of language, where a certain set of non-recognizable-language tokens invariably causes an LLM to talk about flowers. These strings exist by necessity due to how the LLM's layers are formed.

Maybe not for humanity as a species, but for individual humans there are absolutely token sequences that lead them to talk about certain topics, with nobody able to bring them back on topic. Now you'd probably say those are recognizable token sequences, but do we have a fair process to decide what's recognizable that isn't inherently biased towards making humans the only rational actor?

I'm not contending at all that LLMs are only built on language. Their lack of physical reference point is sometimes laughably obvious. We could argue whether there are signs they also form a world model and reasoning that abstracts from language alone, but that's not even my point. My point is rather that any test or argument that attempts to say that LLMs can't "reason" or "assume" or whatever has to be a test a human could pass. Preferably a test a random human would pass with flying colors.

22. codr7 ◴[] No.43745745{3}[source]
With a plan, aiming for something, that's the difference.
replies(2): >>43745781 #>>43746301 #
23. og_kalu ◴[] No.43745781{4}[source]
Again, you are only describing the how here, not the what (text completion).

Also, LLMs absolutely 'plan' and 'aim for something' in the process of completing text.

https://www.anthropic.com/research/tracing-thoughts-language...

replies(1): >>43746009 #
24. namaria ◴[] No.43746009{5}[source]
Yeah, this paper is great fodder for the LLM pixie dust argument.

They use a replacement model. It isn't even observing the LLM itself but a different architecture model. And it is very liberal in interpreting the patterns of activations seen in the replacement model, using flowery language. It also includes some very relevant caveats, such as:

"Our cross-layer transcoder is trained to mimic the activations of the underlying model at each layer. However, even when it accurately reconstructs the model’s activations, there is no guarantee that it does so via the same mechanisms."

https://transformer-circuits.pub/2025/attribution-graphs/met...

So basically the whole exercise might or might not be valid. But it generates some pretty interactive graphics and a nice blog post to reinforce the anthropomorphization discourse.

replies(1): >>43746344 #
25. Jensson ◴[] No.43746020{3}[source]
> I feel this misses the point and I think it dismisses attention heads and transformers

Those just make it better at completing the text, but for very common riddles those tools still get easily overruled by pretty simple text-completion logic, since the weights for those will be so extremely strong.

The point is that if you understand it's a text completer, then it's easy to understand why it fails at these. To fix these properly you need to make it no longer try to complete text, and that is hard to do without breaking it.

26. Jensson ◴[] No.43746034{3}[source]
> When you wrote your comment, you were completing text.

I wasn't trained to complete text, though; I was primarily trained to make accurate responses.

And no, writing a response is not "completing text", I don't try to figure out what another person would write as a response, I write what I feel people need to read. That is a completely different thought process. If I tried to mimic what another commenter would have written it would look very different.

replies(2): >>43746290 #>>43746503 #
27. itchyjunk ◴[] No.43746165{6}[source]
At this point, you might as well be claiming a completions model behaves differently than a fine-tuned model. Which is true, but the prompt in the API without any system message also seems not to match your prediction.
replies(1): >>43746827 #
28. moffkalast ◴[] No.43746264[source]
Yeah, you need specific instruct training for that sort of thing; Claude Opus is one of the rare examples that does such a sensibility check quite often and even admits when it doesn't know something.

These days it's all about confidently bullshitting on benchmarks and overfitting on common riddles to make pointless numbers go up. The more impressive models get on paper, the more rubbish they are in practice.

replies(2): >>43746913 #>>43750499 #
29. wobfan ◴[] No.43746266{3}[source]
If you mean putting together text that reads like an assumption, then yes. But it cannot express an assumption, as it is not assuming. It is completing text, like OP said.
replies(1): >>43746472 #
30. AstralStorm ◴[] No.43746290{4}[source]
Sometimes we also write what we really want people to not read. That's usually called trolling though.
31. baq ◴[] No.43746291{4}[source]
This again.

It’s predicting text. Yes. Nobody argues about that. (You’re also predicting text when you’re typing it. Big deal.)

How it is predicting the text is the question to ask, and indeed it's being asked; we're getting glimpses of understanding, and lo and behold, it's a damn complex process. See the recent Anthropic research paper for details.

32. losvedir ◴[] No.43746301{4}[source]
So do LLMs. "In the United States, someone whose job is to go to space is called ____" it will say "an" not because that's the most likely next word, but because it's "aiming" (to use your terminology) for "astronaut" in the future.
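
A hedged way to poke at this yourself (a sketch; GPT-2 is just a small, convenient stand-in, and its numbers are not asserted here to prove the point) is to compare the probability the model assigns to " an" versus " a" as the very next token:

```python
# Sketch: compare P(" an") vs P(" a") after the prompt, to probe whether
# the choice of article reflects the upcoming word. Assumes `transformers`
# and `torch`; GPT-2 is a stand-in, not the model being discussed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "In the United States, someone whose job is to go to space is called"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    probs = torch.softmax(model(input_ids).logits[0, -1], dim=-1)

for word in [" an", " a"]:
    token_id = tokenizer.encode(word)[0]  # first token of the candidate article
    print(repr(word), float(probs[token_id]))
```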
replies(2): >>43746549 #>>43756579 #
33. baq ◴[] No.43746308{6}[source]
That's literally the definition of takeoff: when it starts, it gets us to the singularity in a decade. And there's no publicly available evidence that it's started… emphasis on publicly available.
replies(1): >>43746658 #
34. og_kalu ◴[] No.43746344{6}[source]
'So basically the whole exercise might or might not be valid.'

Nonsense. Mechanistic faithfulness probes whether the replacement model ("cross-layer transcoder") truly uses the same internal functions as the original LLM. If it doesn't, the attribution graphs it suggests might mislead at a fine-grained level, but because every hypothesis generated by those graphs is tested via direct interventions on the real model, high-level causal discoveries (e.g. that Claude plans its rhymes ahead of time) remain valid.

replies(1): >>43750275 #
35. ToValueFunfetti ◴[] No.43746472{4}[source]
It's trained to complete text, but it does so by constructing internal circuitry during training. We don't have enough transparency into that circuitry or the human brain's to positively assert that it doesn't assume.

But I'd wager it's there; assuming is not a particularly impressive or computationally intense operation. There's a tendency to bundle all of human consciousness into the definitions of our cognitive components, but I would argue that, e.g., a branch predictor meets the bar for any sane definition of 'assume'.
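
For what it's worth, that bar is low enough to sketch in a few lines; the toy two-bit saturating counter below (an illustrative textbook scheme, not any particular CPU's design) "assumes" a branch will go the way it has mostly gone recently and revises that assumption when it is wrong:

```python
# Toy two-bit saturating counter branch predictor: it "assumes" the branch
# will go the way it has mostly gone recently, and updates when wrong.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # 0,1 = predict not-taken; 2,3 = predict taken

    def predict(self) -> bool:
        return self.state >= 2  # the "assumption" about the next branch

    def update(self, taken: bool) -> None:
        # Nudge the counter toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

p = TwoBitPredictor()
for outcome in [True, True, False, True, False, False, False]:
    guess = p.predict()
    p.update(outcome)
    print(f"assumed {'taken' if guess else 'not taken'}, actually {'taken' if outcome else 'not taken'}")
```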

36. og_kalu ◴[] No.43746503{4}[source]
>And no, writing a response is not "completing text", I don't try to figure out what another person would write as a response, I write what I feel people need to read.

Functionally, it is. You're determining what text should follow the prior text. Your internal reasoning ('what I feel people need to read') is how you decide on the completion.

The core point isn't that your internal 'how' is the same as an LLM's (maybe, maybe not), but that labeling the LLM as a 'text completer' the way you have is essentially meaningless.

You are just imposing your own ideas on how an LLM works, not stating any fundamental truth about being a 'text completer'.

37. codr7 ◴[] No.43746549{5}[source]
I don't know about you, but I tend to make more elaborate plans than the next word. I have a purpose, an idea I'm trying to communicate. These things don't have ideas, they're not creative.
38. johnisgood ◴[] No.43746612{6}[source]
> but nobody has come up with an LLM that can build better LLMs.

Yet. Not that we know of, anyway.

replies(1): >>43769194 #
39. myk9001 ◴[] No.43746658{7}[source]
> it gets us to singularity

Are we sure it's actually taking us along?

40. tough ◴[] No.43746827{7}[source]
The point is that when there's a system prompt you didn't write, you get autocompletion of your input + said system prompt, which biases all outputs.
41. pants2 ◴[] No.43746913[source]
Gemini 2.5 is actually pretty good at this. It's the only model ever to tell me "no" to a request in Cursor.

I asked it to add websocket support for my app and it responded like, "looks like you're using long polling now. That's actually better and simpler. Let's leave it how it is."

I was genuinely amazed.

42. dannyobrien ◴[] No.43747139{6}[source]
I'm a bit confused here. Are you saying that if I zero out the system prompt on any LLM, including those fine-tuned to give answers in an instructional form, they will follow your effect -- that nonsense prompts will get similar results to coherent prompts if they contain many of the same words?

Because I've tried it on a few local models I have handy, and I don't see that happening at all. As someone else says, some of that difference is almost certainly due to supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) -- but it's weird to me, given the confidence with which you made your prediction, that you didn't exclude those from your original statement.

I guess, maybe the real question here is: could you give me a more explicit example of how to show what you are trying to show? And explain why I'm not seeing it while running local models without system prompts?

43. ac29 ◴[] No.43747251{6}[source]
> an infinitely large number of different training examples

Infinity is problematic because it's impossible to process an infinite amount of data in a finite amount of time.

44. furyofantares ◴[] No.43747336[source]
I don't really mind using analogies for LLMs "assuming" things or being "confused" too much. I think there really is _some_ value to such analogies.

However I gotta take issue with using those analogies when "it's trained for text completion and the punchline to this riddle is surely in its training data a lot" is a perfectly good explanation. I guess I would also add that the answer is well-aligned with RLHF-values. I wouldn't go for an explanation that requires squishy analogies when the stuff we know about these things seems completely adequate.

45. Skunkleton ◴[] No.43747621[source]
https://kagi.com/assistant/3752c5f9-bf5c-4a43-bada-b3eccbe94...

You should be able to click left/right on the prompt to see different responses. Sonnet 3.7 with extended thinking notices the issue, and then chooses to totally ignore it with no explanation.

From Claude for those who don’t want to click:

Wait, I notice a difference from the traditional riddle. In this version, the surgeon says "I can operate on this boy" (affirmative) rather than "I can't operate on this boy" (negative).

This changes the nature of the puzzle somewhat. If the surgeon is saying they CAN operate, then we need to explain why this is surprising or seemingly impossible, but actually possible.

The traditional answer would still apply: the surgeon is the boy's mother.

46. namaria ◴[] No.43750275{7}[source]
> the attribution graphs it suggests might mis‐lead at a fine‐grained level

"In principle, our attribution graphs make predictions that are much more fine-grained than these kinds of interventions can test."

> high‑level causal discoveries (e.g. that Claude plans its rhymes ahead of time) remain valid.

"We found planned word features in about half of the poems we investigated, which may be due to our CLT not capturing features for the planned words, or it may be the case that the model does not always engage in planning"

"Our results are only claims about specific examples. We don't make claims about mechanisms more broadly. For example, when we discuss planning in poems, we show a few specific examples in which planning appears to occur. It seems likely that the phenomenon is more widespread, but it's not our intent to make that claim."

And quite significantly:

"We only explain a fraction of the model's computation. The remaining “dark matter” manifests as error nodes in our attribution graphs, which (unlike features) have no interpretable function, and whose inputs we cannot easily trace. (...) Error nodes are especially a problem for complicated prompts (...) This paper has focused on prompts that are simple enough to avoid these issues. However, even the graphs we have highlighted contain significant contributions from error nodes."

Maybe read the paper before making claims about its contents.

replies(1): >>43753589 #
47. VHRanger ◴[] No.43750499[source]
Do you have an example or two of a query that opus does well that others fail at?
48. valenterry ◴[] No.43751793[source]
> I generally assume the AI isn't misreading, rather that it assumes you couldn't give it the riddle correctly, but it has seen it already.

Just not enough training data, I suppose. Were it really smart it would "understand" the situation and clarify: "I assume you are asking me that popular riddle - the answer is X". At least after OP's first question a human would usually respond like that.

49. og_kalu ◴[] No.43753589{8}[source]
Maybe understand the paper before making claims about its contents.

>"In principle, our attribution graphs make predictions that are much more fine-grained than these kinds of interventions can test."

Literally what I said. If the replacement model isn't faithful then you can't trust the details of the graphs. Basically stuff like "increasing feature f at layer 7 by Δ will raise feature g at layer 9 by exactly 0.12 in activation".

>"We found planned word features in about half of the poems we investigated, which may be due to our CLT not capturing features for the planned words, or it may be the case that the model does not always engage in planning"

>"Our results are only claims about specific examples. We don't make claims about mechanisms more broadly. For example, when we discuss planning in poems, we show a few specific examples in which planning appears to occur. It seems likely that the phenomenon is more widespread, but it's not our intent to make that claim."

The moment there were examples of the phenomenon demonstrated through interventions, those examples remained valid regardless of how faithful the replacement model was.

The worst-case scenario (and it's ironic, because this scenario would mean the model is faithful) is that Claude does not always plan its rhymes, not that it never plans them. The model not being faithful actually means the replacement was simply not robust enough to capture all the ways Claude plans rhymes. Guess what? Neither option invalidates the examples.

Regardless of how faithful the replacement model is, Anthropic have demonstrated that Claude has the ability to plan its rhymes ahead of time and engages in this planning at least sometimes. This is stated quite plainly too. What's so hard to understand?

>"We only explain a fraction of the model's computation. The remaining “dark matter” manifests as error nodes in our attribution graphs, which (unlike features) have no interpretable function, and whose inputs we cannot easily trace. (...) Error nodes are especially a problem for complicated prompts (...) This paper has focused on prompts that are simple enough to avoid these issues. However, even the graphs we have highlighted contain significant contributions from error nodes."

Ok, and? Model computations are extremely complex, who knew? This does not invalidate what they do manage to show.

50. chairdoor ◴[] No.43754148[source]
"Assume" can just be a proxy term for "text completion that contains an assumption," especially considering that we don't have enough concrete details about human cognition to know for sure that we aren't doing the same thing.
51. int_19h ◴[] No.43754473{4}[source]
It's not an either-or. The fact that LLM completes text does not preclude it from meaningfully reasoning, which anyone who used reasoning models on real-world tasks is well-aware of.
52. int_19h ◴[] No.43754494{6}[source]
True but also irrelevant. The "AI" is the entirety of the system, which includes the model itself as well as any prompts and other machinery around it.

I mean, if you dig down enough, the LLM doesn't even generate tokens - it merely gives you a probability distribution, and you still need to explicitly pick the next token based on those probabilities, append it to the input, and start the next iteration of the loop.
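
That outer loop is roughly the following (a minimal sketch assuming a Hugging Face-style causal LM; the temperature and sampling strategy are illustrative choices, not a description of any specific product):

```python
# Sketch of the outer loop described above: the model only yields a
# probability distribution; the caller samples a token, appends it to the
# input, and repeats. Assumes `torch` plus a transformers-style model/tokenizer.
import torch

def generate(model, tokenizer, prompt: str, max_new_tokens: int = 20, temperature: float = 0.8) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]        # parameters of the next-token distribution
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # the explicit "pick" step
        input_ids = torch.cat([input_ids, next_id.unsqueeze(0)], dim=-1)  # append and loop
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)
```

Usage would be something like `print(generate(model, tokenizer, "The rain in Spain falls"))` with any causal LM and matching tokenizer loaded via transformers.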

53. int_19h ◴[] No.43754506{4}[source]
At this point you need to actually define what it means for an LLM to "have an idea".
54. yahoozoo ◴[] No.43756579{5}[source]
Are we sure “an astronaut” is not the token?
55. joquarky ◴[] No.43758748{5}[source]
I feel like people are going to find it hard to accept that this is how most of us think (at least when thinking in language). They will resist this like heliocentrism.

I'm curious what others who are familiar with LLMs and have practiced open monitoring meditation might say.

56. Aeolos ◴[] No.43769194{7}[source]
Given the dramatic uptake of Cursor / Windsurf / Claude Code etc, we can be 100% certain that LLM companies are using LLMs to improve their products.

The improvement loop is likely not fully autonomous yet - it is currently more efficient to have a human-in-the-loop - but there is certainly a lot of LLMs improving LLMs going on today.