
265 points ctoth | 6 comments
sejje ◴[] No.43744995[source]
In the last example (the riddle), I generally assume the AI isn't misreading; rather, it assumes you didn't state the riddle correctly, since it has already seen the original.

I would do the same thing, I think. It's too well-known.

The variation doesn't read like a riddle at all, so it's confusing even to me as a human. I can't find the riddle part. Maybe the AI is confused, too. I think it makes an okay assumption.

I guess it would be nice if the AI asked a follow-up question like "are you sure you wrote down the riddle correctly?", and I think it could if instructed to, but right now they don't generally do that on their own.

replies(5): >>43745113 #>>43746264 #>>43747336 #>>43747621 #>>43751793 #
Jensson ◴[] No.43745113[source]
> I generally assume the AI isn't misreading; rather, it assumes you didn't state the riddle correctly, since it has already seen the original.

An LLM doesn't assume; it's a text completer. It sees something that looks almost like a well-known problem and completes it as that well-known problem. That's a failure mode specific to being a text completer, and it's hard to get around.

replies(6): >>43745166 #>>43745289 #>>43745300 #>>43745301 #>>43745340 #>>43754148 #
simonw ◴[] No.43745166[source]
These newer "reasoning" LLMs really don't feel like pure text completers any more.
replies(3): >>43745252 #>>43745253 #>>43745266 #
Borealid ◴[] No.43745266{3}[source]
What your parent poster said is nonetheless true, regardless of how it feels to you. Getting text from an LLM is a process of iteratively attempting to find a likely next token given the preceding ones.

If you give an LLM "The rain in Spain falls" the single most likely next token is "mainly", and you'll see that one proportionately more than any other.

If you give an LLM "Find an unorthodox completion for the sentence 'The rain in Spain falls'", the most likely next token is something other than "mainly" because the tokens in "unorthodox" are more likely to appear before text that otherwise bucks statistical trends.

If you give the LLM "blarghl unorthodox babble The rain in Spain" it's likely the results are similar to the second one but less likely to be coherent (because text obeying grammatical rules is more likely to follow other text also obeying those same rules).

In any of the three cases, the LLM is predicting text, not "parsing" or "understanding" a prompt. The fact that it responds similarly to well-formed and ill-formed prompts is evidence of this.

It's theoretically possible to engineer a string of complete gibberish tokens that will prompt the LLM to recite song lyrics, or answer questions about mathematical formulae. Those strings of gibberish are just difficult to discover.
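
To make the "likely next token" claim concrete, here's a minimal sketch of inspecting that distribution, assuming the Hugging Face transformers API and using GPT-2 purely as an illustrative stand-in model (any causal LM exposes the same interface):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 is just a small stand-in; swap in any causal LM you have locally.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def top_next_tokens(prompt, k=5):
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(input_ids=ids).logits        # (1, seq_len, vocab_size)
        probs = torch.softmax(logits[0, -1], dim=-1)    # distribution over the next token
        top = torch.topk(probs, k)
        return [(tok.decode(i.item()), round(p.item(), 3))
                for i, p in zip(top.indices, top.values)]

    print(top_next_tokens("The rain in Spain falls"))
    print(top_next_tokens("blarghl unorthodox babble The rain in Spain"))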

replies(6): >>43745307 #>>43745309 #>>43745334 #>>43745371 #>>43746291 #>>43754473 #
1. dannyobrien ◴[] No.43745309{4}[source]
So I just gave your blarghl line to Claude, and it replied "It seems like you included a mix of text including "blarghl unorthodox babble" followed by the phrase "The rain in Spain."

Did you mean to ask about the well-known phrase "The rain in Spain falls mainly on the plain"? This is a famous elocution exercise from the musical "My Fair Lady," where it's used to teach proper pronunciation.

Or was there something specific you wanted to discuss about Spain's rainfall patterns or perhaps something else entirely? I'd be happy to help with whatever you intended to ask. "

I think you have a point here, but maybe re-express it? Because right now your argument seems trivially falsifiable even under your own terms.

replies(1): >>43745400 #
2. Borealid ◴[] No.43745400[source]
If you feed text to Claude, you're getting Claude's "system prompt" prepended before the text you give it.

If you want to test the completion behavior itself, you have to use a raw model with no system prompt. You can do that with a Llama or similar. Otherwise your context window is full of words like "helpful" and "answer" and "question" that guide the response and make it harder (not impossible) to see the effect I'm talking about.
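
To see concretely what reaches the model in each case, here's a rough sketch using the Hugging Face transformers chat-template API. The model names are just an example base/chat pair (both gated, so substitute whatever you have locally), and the system message is a made-up placeholder, not Claude's actual prompt:

    from transformers import AutoTokenizer

    raw_prompt = "blarghl unorthodox babble The rain in Spain"

    # A raw/base model sees only the prompt itself (plus a BOS token).
    base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    print(base_tok.decode(base_tok(raw_prompt).input_ids))

    # A chat-tuned deployment wraps the same text in role markers and a system
    # message, so the context is already full of instruction-flavored words.
    chat_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    print(chat_tok.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant. Answer the user's question."},
            {"role": "user", "content": raw_prompt},
        ],
        tokenize=False,
    ))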

replies(3): >>43746165 #>>43747139 #>>43754494 #
3. itchyjunk ◴[] No.43746165[source]
At this point, you might as well be claiming that a completions model behaves differently from a fine-tuned model. Which is true, but prompting via the API without any system message also doesn't seem to match your prediction.
replies(1): >>43746827 #
4. tough ◴[] No.43746827{3}[source]
The point is that when there's a system prompt you didn't write, you get an autocompletion of your input plus that system prompt, which biases all the outputs.
5. dannyobrien ◴[] No.43747139[source]
I'm a bit confused here. Are you saying that if I zero out the system prompt on any LLM, including those fine-tuned to give answers in an instructional form, it will show the effect you describe: that nonsense prompts will get similar results to coherent prompts if they contain many of the same words?

Because I've tried it on a few local models I have handy, and I don't see that happening at all. As someone else says, some of that difference is almost certainly due to supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), but it's weird to me, given the confidence with which you made your prediction, that you didn't exclude those from your original statement.

I guess the real question here is: could you give a more explicit example of how to show what you are trying to show? And explain why I'm not seeing it while running local models without system prompts?

6. int_19h ◴[] No.43754494[source]
True but also irrelevant. The "AI" is the entirety of the system, which includes the model itself as well as any prompts and other machinery around it.

I mean, if you dig down enough, the LLM doesn't even generate tokens: it merely gives you a probability distribution, and you still need to explicitly pick the next token based on those probabilities, append it to the input, and start the next iteration of the loop.
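
That outer loop is small enough to write out. A minimal sketch, assuming the Hugging Face transformers API and GPT-2 as an illustrative stand-in (real serving stacks add temperature, top-p, KV caching, stop conditions, and so on):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ids = tok("The rain in Spain falls", return_tensors="pt").input_ids
    for _ in range(10):
        with torch.no_grad():
            logits = model(input_ids=ids).logits           # the model only scores tokens
        probs = torch.softmax(logits[0, -1], dim=-1)       # distribution over the next token
        next_id = torch.multinomial(probs, num_samples=1)  # *we* pick one (here: sample)
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=-1)  # append it and loop
    print(tok.decode(ids[0]))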