
586 points mizzao | 5 comments
vasco ◴[] No.40666684[source]
> "As an AI assistant, I cannot help you." While this safety feature is crucial for preventing misuse,

What is the safety added by this? What is unsafe about a computer giving you answers?

replies(11): >>40666709 #>>40666828 #>>40666835 #>>40666890 #>>40666984 #>>40666992 #>>40667025 #>>40667243 #>>40667633 #>>40669842 #>>40670809 #
tgsovlerkhgsel ◴[] No.40666984[source]
I think there are several broad categories all wrapped under "safety":

- PR (avoid hurting feelings, avoid generating text that would make journalists write sensationalist negative articles about the company)

- "forbidden knowledge": Don't give people advice on how to do dangerous/bad things like building bombs (broadly a subcategory of the above - the content is usually discoverable through other means and the LLM generally won't give better advice)

- dangerous advice and advice that's dangerous when wrong: many people don't understand what LLMs do, and the output is VERY convincing even when wrong. So if the model tells people the best way to entertain your kids is to mix bleach and ammonia and blow bubbles (a common deadly recipe recommended on 4chan), there will be dead people.

- keeping bad people from using the model in bad ways, e.g. having it write stories where children are raped, scamming people at scale (think Nigeria scam but automated), or election interference (people are herd animals, so if you show someone 100 different posts from 100 different "people" telling them that X is right and Y is wrong, it will influence them, and at scale this has the potential to tilt elections and conquer countries).

I think the first ones are rather stupid, but the latter ones become more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or their outputs, since you'll have to deal with the puppet government elected due to it), and I appreciate the companies building and running the models doing something about it.

replies(12): >>40667179 #>>40667184 #>>40667217 #>>40667630 #>>40667902 #>>40667915 #>>40667982 #>>40668089 #>>40668819 #>>40669415 #>>40670479 #>>40673732 #
irusensei ◴[] No.40667217[source]
> keeping bad people from using the model in bad ways, e.g. having it write stories where...

The last ones are rather stupid too. Bad people can just write stories or create drawings about disgusting things. Should we censor all computers to prevent such things from happening? Or hands and paper?

replies(2): >>40667677 #>>40675860 #
1. ben_w ◴[] No.40667677[source]
If three men make a tiger, LLMs and diffusion models are a tiger factory.

https://en.wikipedia.org/wiki/Three_men_make_a_tiger

replies(2): >>40668155 #>>40678902 #
2. wruza ◴[] No.40668155[source]
It’s always unclear whether proverbs actually work, whether they are outdated, or whether they are a self-fulfilling prophecy among those who use them.

E.g. the set of those affected by TMMAT may hugely intersect with the set of those who think it works, which makes it objective but sort of self-bootstrapping. Isn’t it better to educate people about information and fallacies rather than to protect them from these for life?

replies(1): >>40668925 #
3. ben_w ◴[] No.40668925[source]
> Isn’t it better to educate people about information and fallacies rather than to protect them from these for life?

The story itself is about someone attempting to educate their boss, and their boss subsequently getting fooled by it anyway — and the harm came to the one trying to do the educating, not the one who believed in the tiger.

I'm not sure it's even possible to fully remove this problem, even if we can minimise it — humans aren't able to access the ground truth of reality just by thinking carefully, we rely on others around us.

(For an extra twist: what if [the fear of misaligned AI] is itself the tiger?)

4. irusensei ◴[] No.40678902[source]
That proverb is totally out of place here.

One can use paper and pen to write or draw something disturbing and distribute it through the internet. Should we censor the internet then? Put something in scanners and cameras so they don't capture such material?

Why don't we work to put a microchip in people's brains so they are prevented from using their creativity to write something disturbing?

We all want a safe society, right? Sounds like a great idea.

replies(1): >>40682759 #
5. ben_w ◴[] No.40682759[source]
Quantity has a quality all of its own.

About a century ago, people realised that CO2 was a greenhouse gas — they thought this would be good, because it was cold where they lived, and they thought it would take millennia because they looked at what had already been built and didn't extrapolate to everyone else copying them.

Your reply doesn't seem to acknowledge the "factory" part of "tiger factory".

AI is about automation: any given model is a tool that lets anyone do what previously needed expertise, or at least effort. In the past, someone pulled out and fired a gun because of the made-up "pizzagate" conspiracy theory; in the future, everyone gets to be Hillary Clinton for 15 minutes, only with Stable Diffusion putting your face in a perfectly customised video, and the video will come from a random bored teenager looking for excitement who doesn't even realise the harm they're causing.