
586 points mizzao | 2 comments
rivo ◴[] No.40668263[source]
I tried the model the article links to and it was so refreshing not being denied answers to my questions. It even asked me at the end "Is this a thought experiment?", I replied with "yes", and it said "It's fun to think about these things, isn't it?"

It felt very much like hanging out with your friends, having a few drinks, and pondering big, crazy, or weird scenarios. Imagine your friend saying, "As your friend, I cannot provide you with this information." and completely ruining the night. That's not going to happen. Even my kids would ask me questions when they were younger: "Dad, how would you destroy earth?" It would be of no use to anybody to deny answering that question. And answering them does not mean they will ever attempt anything like that. There's a reason Randall Munroe's "What If?" blog became so popular.

Sure, there are dangers, as others are pointing out in this thread. But I'd rather see disclaimers ("this may be wrong information" or "do not attempt") than my own computer (or the services I pay for) straight out refusing my request.

replies(6): >>40668938 #>>40669291 #>>40669447 #>>40671323 #>>40683221 #>>40689216 #
Cheer2171 ◴[] No.40668938[source]
I totally get that kind of imagination play among friends. But I had someone in a friend group who used to want to play out "thought experiments" but really just wanted to take things too far. It started off innocent, with fantasy and sci-fi themes; it was useful for Dungeons and Dragons world building.

But he delighted the most in gaming out the logistics of repeating the Holocaust in our country today. Or a society where women could not legally refuse sex. Or all illegal immigrants became slaves. It was super creepy and we "censored" him all the time by saying "bro, what the fuck?" Which is really what he wanted, to get a rise out of people. We eventually stopped hanging out with him.

As your friend, I absolutely am not going to game out your rape fantasies.

replies(11): >>40669105 #>>40669505 #>>40670433 #>>40670603 #>>40671661 #>>40671746 #>>40672676 #>>40673052 #>>40678557 #>>40679712 #>>40679816 #
WesolyKubeczek ◴[] No.40669105[source]
An LLM, however, is not your friend. It's not a friend, it's a tool. Friends can, and should, keep one another's, ehm, hingedness in check; LLMs shouldn't. At some point I would likely question your friend's sanity.

How you use an LLM, though, is going to tell tons more about you than it tells about the LLM. But I would like my tools not to second-guess my intentions, thank you very much. Especially if "safety" is mostly interpreted not as "prevent people from actually dying or suffering serious trauma", but as "avoid topics that would prevent us from putting Coca Cola ads next to the chatgpt thing, or from putting the thing into Disney cartoons". I can tell it's the latter by the fact that an LLM will still happily advise you to put glue on your pizza and eat rocks.

replies(2): >>40670559 #>>40671641 #
ygjb ◴[] No.40671641[source]
If your implication is that, as a tool, LLMs shouldn't have safeties built in, that is a pretty asinine take. We build and invest in safety in tools across every spectrum. In tech we focus on memory safety (among a host of other things) to make systems safe and secure to use. In automobiles we include seat belts, crumple zones, and governors to limit speed.

We put age and content restrictions on a variety media and resources, even if they are generally relaxed when it comes to factual or reference content (in some jurisdictions). We even include safety mechanisms in devices for which the only purpose is to cause harm, for example, firearms.

Yes, we are still figuring out what the right balance of safety mechanisms is for LLMs, and right now "safety" is a placeholder for "don't get sued or piss off our business partners" in most corporate speak, but that doesn't undermine the legitimacy of the need for safety.

If you want a tool without a specific safety measure, then learn how to build one. It's not that hard, though it is expensive. But I kind of like the fact that there is at least a nominal attempt to make it harder to use advanced tools to harm oneself or others.

replies(2): >>40671924 #>>40681107 #
matt-attack ◴[] No.40681107[source]
> but that doesn't undermine the legitimacy of the need for safety.

I think even using the word "safety" over and over like you're doing is part of the problem. Find a new word, because we've spent 200 years in this country establishing that the written word is sacrosanct and not to be censored. All of a sudden, ASCII text became "dangerous" in the last year. I simply refuse to accept that any written text (regardless of who wrote it) needs to be censored. The written word is just the embodiment of a thought or notion, and we cannot go around tricking people into thinking that "thoughts" need to be regulated and that there are certain thoughts that are "dangerous". This is a toxic 1984 mindset.

replies(1): >>40683474 #
ben_w ◴[] No.40683474[source]
> we've spent 200 years in this country establishing that the written word is sacrosanct and not to be censored. All of a sudden, ASCII text became "dangerous" in the last year. I simply refuse to accept that any written text (regardless of who wrote it) needs to be censored. The written word is just the embodiment of a thought or notion, and we cannot go around tricking people into thinking that "thoughts" need to be regulated and that there are certain thoughts that are "dangerous". This is a toxic 1984 mindset.

1. The US isn't the whole world, your Overton Window won't include even the UK's attitude to freedom of speech, and there's a huge gap from even the UK to 1984.

2. Despite the 1st Amendment, the US does have a lot of rules about what you are and aren't allowed to say. All of copyright law, for example (which is a huge question for LLMs, because it's not clear where the cut-off line is between models reproducing copyrighted works and models writing in a non-copyrightable style with non-copyrightable facts). The fact that NDAs and non-disparagement agreements are enforceable. What Manning was imprisoned for. Musk may have won some (all?) of the defamation cases brought against him, but they were real cases that had to be defended; they weren't dismissed before reaching a court on the grounds that "this is not even an offence".

3. Does the AI have thoughts, such that they should be protected?