Refusal in LLMs is mediated by a single direction
(www.lesswrong.com)
110 points
veryluckyxyz
| 1 comment |
03 May 24 00:55 UTC
1. lolc | 03 May 24 22:32 UTC | No. 40253092 | >>40242939 (OP)
Love this! I have a tenuous understanding of how these models work, and this paper cuts at an interesting angle.