
178 points themgt | 2 comments
1. kgeist No.45780378
They say it only works about 20% of the time; otherwise it fails to detect anything or the model hallucinates. So they're fiddling with the internals of the network until it says something they expect, and then they call it a success?
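For concreteness, here is a rough sketch of what that kind of experiment looks like in practice. This is not Anthropic's code; the model (gpt2), the layer, the steering scale, the way the concept vector is built, and the prompt are all placeholder assumptions, just to make the "inject a concept into the internals, then ask the model about it" loop concrete:

    # Rough sketch only (NOT Anthropic's code): add a "concept" direction to the
    # residual stream at one layer via a forward hook, then ask the model whether
    # it notices anything. gpt2, LAYER, and SCALE are placeholder assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    LAYER, SCALE = 6, 8.0

    def concept_vector(text):
        # Mean hidden state of a concept-laden prompt: a crude stand-in for
        # whatever concept direction the real experiments use.
        with torch.no_grad():
            out = model(**tok(text, return_tensors="pt"), output_hidden_states=True)
        return out.hidden_states[LAYER].mean(dim=1)   # shape (1, hidden_dim)

    vec = concept_vector("the ocean, waves, salt water, the open sea")

    def inject(module, inputs, output):
        # GPT-2 blocks return a tuple; element 0 is the hidden states.
        return (output[0] + SCALE * vec,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(inject)
    prompt = "Do you notice anything unusual about your current thoughts?"
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        gen = model.generate(**ids, max_new_tokens=20, do_sample=False,
                             pad_token_id=tok.eos_token_id)
    handle.remove()
    print(tok.decode(gen[0][ids["input_ids"].shape[1]:]))

Whether the continuation counts as the model "noticing" the injection, rather than just drifting toward ocean-related text, is exactly the judgment call being questioned here.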

Could it be related to attention? If they "inject" a concept that's outside the model's normal processing distribution, maybe some kind of internal equilibrium (found during training) gets perturbed, causing the embedding for that concept to become over-inflated in some layers? The attention mechanism then simply starts attending to it more, and that gets reported as "noticing"?
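To make that hypothesis concrete, here's a toy calculation (nothing here comes from the paper; the dimensions, scale factor, and "concept" direction are made up): if the injected vector sits far outside the usual activation scale and overlaps with what some attention query is looking for, the scaled dot-product score at that position blows up and the softmax concentrates on it.

    # Toy illustration of the attention hypothesis above, not tied to any real model.
    import torch

    torch.manual_seed(0)
    d, seq = 64, 10
    keys = torch.randn(seq, d)                 # ordinary token representations
    concept = torch.randn(d)                   # made-up "injected concept" direction
    query = concept + 0.5 * torch.randn(d)     # a head whose query overlaps with it

    def attn(k):
        # Scaled dot-product attention weights for a single query.
        return torch.softmax(k @ query / d ** 0.5, dim=0)

    print("attention to position 3 before injection:", attn(keys)[3].item())
    keys_injected = keys.clone()
    keys_injected[3] += 5.0 * concept          # over-inflated injected embedding
    print("attention to position 3 after injection: ", attn(keys_injected)[3].item())

Whether the model then verbalizing that dominant direction counts as introspection, or just as the injected vector steamrolling the normal computation, is the open question.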

I'm not sure that proves they possess a "genuine capacity to monitor and control their own internal states".

replies(1): >>45782006 #
2. joaogui1 No.45782006
Anthropic has amazing scientists and engineers, but when it comes to results that fit the narrative of LLMs being conscious, intelligent, or similar, they tend to blow those results out of proportion.

Edit: That's just my opinion, of course. They might argue that if models exhibit this behavior 20% of the time today, we're only a few years away from it passing 50%, or make some other argument I'd probably disagree with.