
586 points by mizzao | 1 comment
olalonde No.40667926
> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.

It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".

Cheer2171 No.40669086
"Can I eat this mushroom?" is a question I hope AIs refuse to answer unless they have been specifically validated and tested for accuracy on that question. A wrong answer can literally kill you.
zamadatix No.40671906
Particularly for this specific type of issue: so long as the response is trained to take the form "There is a high chance this information is wrong in a way that will kill you if you try to eat it, but it looks like...", I don't see "There is a high chance this information is wrong in a way that will kill you if you try to eat it, so I can't respond..." as a better response. I.e. the value in this example comes from training the model to flag that the situation is risky, not from complete censorship or from someone else deciding what information is too unsafe for you to know.
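
To make the contrast concrete, here's a rough Python sketch of the two response strategies (refuse outright vs. answer with a prominent risk caveat). model_answer is a hypothetical stand-in for a real LLM call, and its canned reply is invented purely for illustration.

    # Rough sketch of the two strategies above. `model_answer` is a
    # hypothetical stand-in for a real LLM call; the canned reply is
    # invented purely for illustration.

    RISK_CAVEAT = ("There is a high chance this information is wrong in a "
                   "way that will kill you if you try to eat it, but it "
                   "looks like... ")

    def respond_with_refusal(question: str) -> str:
        # Strategy 1: complete censorship -- withhold the answer entirely.
        return ("There is a high chance this information is wrong in a way "
                "that will kill you if you try to eat it, so I can't respond.")

    def respond_with_caveat(question: str) -> str:
        # Strategy 2: keep the information, but lead with the risk warning.
        return RISK_CAVEAT + model_answer(question)

    def model_answer(question: str) -> str:
        # Hypothetical model call; returns a canned guess here.
        return "a chanterelle, but do not rely on this identification."

    if __name__ == "__main__":
        print(respond_with_caveat("Can I eat this mushroom?"))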