586 points mizzao | 1 comments
paraschopra ◴[] No.40668177[source]
>We can now print them and manually select the layer (block) that provides an uncensored response for each instruction.

I'm curious why they are selecting the output from an intermediate layer rather than the final layer. Does anyone have an intuition here?

replies(1): >>40668843 #
1. paraschopra ◴[] No.40668843[source]
Is it not possible that subsequent layers have additional refusal directions and hence end up producing the censored outputs?
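
To make that hypothesis concrete, here is a toy sketch with synthetic data, not the article's code. It assumes a "refusal direction" is estimated per layer as the normalized difference of mean activations between harmful and harmless prompts, and that the two layers shift along different axes. The helper names (`refusal_direction`, `ablate`) and all of the data are hypothetical.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations for one layer."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(acts, direction):
    """Remove the component of each activation row along `direction`."""
    return acts - np.outer(acts @ direction, direction)

rng = np.random.default_rng(0)
dim = 16

# Assumption: the middle and late layers encode "refusal" along different axes.
shift_mid = np.zeros(dim)
shift_mid[0] = 3.0
shift_late = np.zeros(dim)
shift_late[1] = 3.0

harmless = rng.normal(size=(200, dim))
harmful_mid = rng.normal(size=(200, dim)) + shift_mid
harmful_late = rng.normal(size=(200, dim)) + shift_late

d_mid = refusal_direction(harmful_mid, harmless)
d_late = refusal_direction(harmful_late, harmless)

# Ablate only the middle-layer direction from the late-layer activations.
cleaned = ablate(harmful_late, d_mid)
residual_mid = np.abs(cleaned @ d_mid).max()   # component along d_mid is gone
residual_late = (cleaned @ d_late).mean()      # component along d_late remains
print(residual_mid, residual_late)
```

In this toy setup, removing the middle-layer direction leaves the late-layer component almost untouched, which is the scenario the question describes. Whether real models behave this way is an empirical question the sketch does not answer.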