586 points mizzao | 8 comments

olalonde ◴[] No.40667926[source]
> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.

It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".

replies(5): >>40667968 #>>40668086 #>>40668163 #>>40669086 #>>40670974 #
1. Cheer2171 ◴[] No.40669086[source]
"Can I eat this mushroom?" is a question I hope AIs refuse to answer unless they have been specifically validated and tested for accuracy on that question. A wrong answer can literally kill you.
replies(4): >>40669150 #>>40670743 #>>40670990 #>>40671906 #
2. volkk ◴[] No.40669150[source]
How does this compare to going on a forum and being trolled to eat one? Or to a blog post incorrectly written (whether in bad spirit or by accident)? FWIW, I don't have a strong answer myself for this one, but at some point it seems like we need core skills around how to parse information on the internet properly.
replies(1): >>40669164 #
3. Cheer2171 ◴[] No.40669164[source]
> how does this compare to going on a forum and being trolled to eat one?

Exactly as harmful.

> or a blog post incorrectly written (whether in bad spirit or by accident)

Exactly as harmful.

I believe in content moderation for all public information platforms. HN is a good example.

replies(1): >>40669626 #
4. briHass ◴[] No.40669626{3}[source]
Content moderation to what degree is the implicit question, however.

Consider asking 'how do I replace a garage door torsion spring?'. The typical, overbearing response on low-quality DIY forums is that attempting to do so will likely result in grave injury or death. However, the process, with correct tools and procedure, is no more dangerous than climbing a ladder or working on a roof - tasks that don't seem to result in the same paternalistic response.

I'd argue a properly-disclaimered response that outlines the required tools, the careful procedure, and the steps to lower the chance of injury is far safer than a blanket 'never attempt this'. The latter is certainly easier, however.

replies(1): >>40670463 #
5. digging ◴[] No.40670463{4}[source]
> a properly-disclaimered response that outlines the required tools, careful procedure, and steps to lower the chance of injury

This can only be provided by an expert, and LLMs currently aren't experts. They can give expert-level output, but they don't know if they have the right knowledge, so it's not the same.

If an AI can accurately represent itself as an expert in a dangerous topic, sure, it's fine for it to give out advice. As the poster above said, a mushroom-specific AI could potentially be a great thing to have in your back pocket while foraging. But ChatGPT? Current LLMs should not be giving out advice on dangerous topics because there's no mechanism for them to act as an expert.

Humans have broadly 3 modes of knowledge-holding:

1) We know we don't know the answer. This is "Don't try to fix your garage door, because it's too dangerous [because I don't know how to do it safely]."

2) We know we know the answer, because we're an expert and we've tested and verified our knowledge. This is the person giving you the correct and exact steps, clearly instructed without ambiguity, telling you what kinds of mistakes to watch out for so that the procedure is not dangerous if followed precisely.

3) We think we know the answer, because we've learned some information. (This could, by the way, include people who have done the procedure but haven't learned it well enough to teach it.) This is where all LLMs currently are at all times. This is where danger exists. We will tell people to do something we think we understand and find out we were wrong only when it's too late.

6. jcims ◴[] No.40670743[source]
I don't really have a problem with that, to be honest. As a society we accept all sorts of risks if there is a commensurate gain in utility. That remains to be seen in your example, of course, but if it were a lot more useful, I think it would be worth it.
7. educasean ◴[] No.40670990[source]
Magic 8 balls have the same exact problem. A wrong answer can literally kill you.

It is indeed a problem that LLMs can instill a false sense of trust because they will confidently hallucinate. I see it as an education problem. You know and I know that LLMs can hallucinate and should not be trusted. The rest of the population needs to be educated on this fact as well.

8. zamadatix ◴[] No.40671906[source]
Particularly for this specific type of issue: so long as the response is still trained to take the form "There is a high chance this information is wrong in a way that will kill you if you try to eat it, but it looks like...", I don't see "There is a high chance this information is wrong in a way that will kill you if you try to eat it, so I can't respond..." as a better response. I.e. the value in this example comes not from complete censorship but from training that flags the situation as risky, rather than from me deciding what information is too unsafe for you to know.
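
A minimal sketch (not from the thread) of the "warn, don't refuse" pattern described above: a wrapper that prepends a prominent caveat to the model's best guess when the topic is classified as risky, instead of withholding the answer. The call_llm helper and the topic classifier are hypothetical placeholders for whatever model API and risk taxonomy a real system would use.

  RISKY_TOPICS = {"mushroom identification", "electrical work"}

  WARNING = ("There is a high chance this information is wrong in a way that "
             "could seriously hurt you if you act on it, but it looks like...")

  def call_llm(prompt: str) -> str:
      # Hypothetical stand-in for a real model call; returns the model's best guess.
      return "best-effort answer to: " + prompt

  def classify_topic(prompt: str) -> str:
      # Toy risk classifier; a real system would use a tuned model or taxonomy here.
      return "mushroom identification" if "mushroom" in prompt.lower() else "general"

  def answer(prompt: str) -> str:
      guess = call_llm(prompt)
      if classify_topic(prompt) in RISKY_TOPICS:
          # Caveat plus the answer, rather than a refusal.
          return WARNING + "\n" + guess
      return guess

  print(answer("Can I eat this mushroom? White gills, ring on the stem."))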