Ironic, given that the LessWrong folks who presented this did so as part of their mission of motivating policymakers to ban open access to models. Hate their ideology, but love their research!
Edit: The data format is the same type used for DPO- or RLHF-style training: paired examples labeled “good” vs. “bad”, or “harmful” vs. “harmless”. What’s fun is to test the performance of this technique on your own datasets, to see how good the personalization is.
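
For anyone wanting to try it on their own data, here is a rough sketch of what that paired layout typically looks like. The field names (“prompt”, “chosen”, “rejected”) follow the common convention used by DPO-style trainers such as Hugging Face TRL; your tooling may expect different names, and the example pairs and file name are just placeholders.

    # Minimal sketch of a paired-preference dataset in JSONL form,
    # the same general shape DPO/RLHF-style trainers consume.
    import json

    pairs = [
        {
            "prompt": "How do I reset my router?",
            # the "good"/"harmless" response you want to reinforce
            "chosen": "Hold the reset button for about 10 seconds, then reconfigure it.",
            # the "bad"/"harmful" (or simply unwanted) response you want to steer away from
            "rejected": "I can't help with questions about networking hardware.",
        },
        # ...add your own pairs here to test how well the personalization works
    ]

    # Write one JSON object per line (JSONL), a format most trainers can ingest.
    with open("my_preference_pairs.jsonl", "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")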