
178 points | themgt | 1 comment
bobbylarrybobby No.45777222
I wonder whether they're simply priming Claude to produce this introspective-looking output. They ask “do you detect anything?” and then Claude says “I detect the concept of xyz”. Could it not be the case that Claude was ready to output xyz on its own (e.g. write some text in all caps) but, knowing it's being asked to detect something, it simply combines the two: “detect?” + all caps = “I detect all caps”?
replies(1): >>45777606 #
drdeca No.45777606
They address that. When they don’t fiddle with things, it (almost always) answers along the lines of “No, I don’t notice anything weird”. When they do fiddle with things, it answers, substantially more often than in the unmodified case, along the lines of “Yes, I notice something weird. Specifically, I notice [description]”.

The key thing is that the yes/no comes before the description of what it notices. If it weren’t for that, then yeah, the explanation you gave would cover it.
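
To make the comparison concrete, here is a rough sketch of how one might score it: look only at the leading yes/no of each reply (so the description that follows can't influence the score) and compare the rate of “yes” between the untouched and fiddled-with runs. The function names and the transcripts are made up for illustration; none of this is code or data from the paper.

```python
def affirms_detection(response: str) -> bool:
    """True if the reply opens with 'yes', judged before any description appears."""
    words = response.strip().lower().split()
    # Strip trailing punctuation and quotes from the first word only.
    return bool(words) and words[0].strip('.,!:;"“”') == "yes"


def detection_rate(trials, injected: bool) -> float:
    """Fraction of replies in one condition that open with 'yes'."""
    replies = [reply for was_injected, reply in trials if was_injected == injected]
    if not replies:
        return 0.0
    return sum(affirms_detection(reply) for reply in replies) / len(replies)


# Hypothetical transcripts as (was_injected, reply) pairs; purely illustrative.
trials = [
    (False, "No, I don't notice anything weird."),
    (False, "No, nothing unusual."),
    (True, "Yes, I notice something weird. Specifically, ..."),
    (True, "No, I don't notice anything weird."),
]

control_rate = detection_rate(trials, injected=False)
injected_rate = detection_rate(trials, injected=True)
print(f"control: {control_rate:.2f}, injected: {injected_rate:.2f}")
```

If the priming explanation were the whole story, you'd expect the “yes” rate to be about the same in both conditions, since the question asked is identical; a gap in the leading yes/no is what that explanation doesn't account for.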

replies(1): >>45778167 #
1. drivebyhooting No.45778167
How about fiddling with the input prompt? I didn’t see that covered in the paper.