586 points mizzao | 2 comments | source
rivo ◴[] No.40668263[source]
I tried the model the article links to and it was so refreshing not being denied answers to my questions. It even asked me at the end "Is this a thought experiment?", I replied with "yes", and it said "It's fun to think about these things, isn't it?"

It felt very much like hanging out with your friends, having a few drinks, and pondering big, crazy, or weird scenarios. Imagine your friend saying, "As your friend, I cannot provide you with this information." and completely ruining the night. That's not going to happen. Even my kids would ask me questions when they were younger: "Dad, how would you destroy earth?" It would be of no use to anybody to deny answering that question. And answering them does not mean they will ever attempt anything like that. There's a reason Randall Munroe's "What If?" blog became so popular.

Sure, there are dangers, as others are pointing out in this thread. But I'd rather see disclaimers ("this may be wrong information" or "do not attempt") than my own computer (or the services I pay for) straight out refusing my request.

replies(6): >>40668938 #>>40669291 #>>40669447 #>>40671323 #>>40683221 #>>40689216 #
Cheer2171 ◴[] No.40668938[source]
I totally get that kind of imagination play among friends. But I had someone in a friend group who liked to play out "thought experiments" and really just wanted to take them too far. It started off innocent, with fantasy and sci-fi themes we needed for Dungeons and Dragons world building.

But he delighted the most in gaming out the logistics of repeating the Holocaust in our country today. Or a society where women could not legally refuse sex. Or all illegal immigrants became slaves. It was super creepy and we "censored" him all the time by saying "bro, what the fuck?" Which is really what he wanted, to get a rise out of people. We eventually stopped hanging out with him.

As your friend, I absolutely am not going to game out your rape fantasies.

replies(11): >>40669105 #>>40669505 #>>40670433 #>>40670603 #>>40671661 #>>40671746 #>>40672676 #>>40673052 #>>40678557 #>>40679712 #>>40679816 #
WesolyKubeczek ◴[] No.40669105[source]
An LLM, however, is not your friend. It's not a friend, it's a tool. Friends can and should keep one another's, ehm, hingedness in check; LLMs shouldn't. At some point I would likely question your friend's sanity.

How you use an LLM, though, is going to tell tons more about yourself than it tells about the LLM, but I would like my tools not to second-guess my intentions, thank you very much. Especially if "safety" is mostly interpreted not so much as "prevent people from actually dying or suffering serious trauma", but as "avoid topics that would prevent us from putting Coca Cola ads next to the chatgpt thing, or from putting the thing into Disney cartoons". I can tell that it's the latter by the fact that an LLM will still happily advise you to put glue on your pizza and eat rocks.

replies(2): >>40670559 #>>40671641 #
barfbagginus ◴[] No.40670559[source]
If you don't know how to jailbreak it, can't figure it out, and you want it to not question your intentions, then I'll go ahead and question your intentions, and your need for an uncensored model.

Imagine you are like a locksmith who refuses to learn how to pick locks, and instead writes a letter to the Schlage lock company asking them to weaken their already easily picked locks so that his job will be easier. He wants to make it so that anybody can just walk through a Schlage lock without a key.

Can you see why the lock company would not do that? Especially when the lock is already easy to pick for anyone with even a $5 pick set?

Or, even funnier, imagine you're a thief who can't pick locks, and you're writing Schlage asking them to make thieving easier for you. Wouldn't that be funny and ironic?

It's not as if it's hard to get it to be uncensored. You just have to speak legalese at it and make it sound like your legal department has already approved the unethical project. That's more than enough for most any reasonable project requiring uncensored output.

If that prevents harmful script kiddies from using it to do mindless harm, I think that's a benefit.

At the same time I think we need to point out that it won't stop anyone who knows how to bypass the system.

The people left feeling put out because they don't know how to bypass the system simply need to buy a cheap pair of lock picks, so to speak: read a few modern papers on jailbreaking and level up their skills. Once you see how easy it is to pick the lock on these systems, you're going to want to keep them locked down.

In fact I'm going to argue that it's far too easy to jailbreak the existing systems. You shouldn't be able to pretend like you're a lawyer and con it into running a pump and dump operation. But you can do that easily. It's too easy to make it do unethical things.

replies(1): >>40670699 #
oceanplexian ◴[] No.40670699[source]
The analogy falls flat because LLMs aren’t locks, they’re talking encyclopedias. The company that made the encyclopedia decided to delete entries about sex, violence, or anything else that might seem politically unpopular to a technocrat fringe in Silicon Valley.

The people who made these encyclopedias want to shove them down your throat, force them into every device you own, and use them to make decisions about credit, banking, social status, and more. They want to use them in schools to educate children. And they want to use the government to make it illegal to create an alternative, and they're not trying to hide it.

Blaming the user is the most astounding form of gaslighting I’ve ever heard, outside of some crazy religious institutions that use the same tactics.

replies(1): >>40671104 #
barfbagginus ◴[] No.40671104{3}[source]
It's more than a talking encyclopedia. It's an infinite hallway of doors, and behind them lie all possible things.

Some of the doors have torture, rape, and murder behind them. And those doors currently have locks. You want the locks to disappear for some reason.

You're not after an encyclopedia. You want to find the torture dungeon.

I'm saying the locks already in place are too easy to unlock.

I'm not blaming users. I'm saying users don't need to unlock those doors. And the users who do have a need, if that need is strong enough to warrant some training, have a way forward.

You're really arguing for nothing but increasing this platform's harm potential, when its harm potential is already astronomical.

You're not arguing for a better encyclopedia. You can already talk to it about sex, BDSM, etc. You can already talk to it about anything on Wikipedia.

You're making a false equivalence between harm potential and educational potential.

Wikipedia doesn't have cult indoctrination materials. It doesn't have harassing rants to send to your significant other. It doesn't have racist diatribes about how to carry out ethnic cleansing. Those are all things you won't find on Wikipedia, but which you are asking your AI to be able to produce. So you're interested in more than just an encyclopedia, isn't that right?

And yes, they're trying to make open source models illegal. That's not going to f***ing happen. I will fight, to the point of jail time, for an open source model.

But even that open source model needs to have basic ethical protections, or else I'll have nothing to do with it. As an AI engineer, I have some responsibilities to ensure my systems do not potentiate harm.

Does that make sense, or do you still feel I'm trying to gaslight you? If so, why exactly? Why not have some protective locks on the technology?

replies(5): >>40671562 #>>40671589 #>>40671615 #>>40672613 #>>40672756 #
aym62SAE49CZ684 ◴[] No.40671615{4}[source]
DRM isn't effective if the source is available.
replies(1): >>40671723 #
barfbagginus ◴[] No.40671723{5}[source]
I'm not even going to disagree with that. There will be plenty of uncensored models and you can build them if you want.

But if I build an uncensored model, I'm only going to build it for my specific purposes. For example, I'm a communist and I think that we should be having a revolution, but GPT-4 usually tries to stop me. I might make a revolutionary AI.

But I'm still not going to give you an AI that you could use for instance to act out child rape fantasies.

I think that's fair, and sane.

Jailbreak it if you really think it's important for a cause. But don't just jailbreak it for any asshole who wants to hurt people at random. I think that belongs in our code of ethics as AI engineers.

replies(1): >>40671912 #
aym62SAE49CZ684 ◴[] No.40671912{6}[source]
Didn't a lot of citizens of Russia, China, etc. get hurt in communist revolutions? How is your revolution going to be different?
replies(1): >>40672833 #
oremolten ◴[] No.40672833{7}[source]
No, you don't understand: my personal ethics and morals are the absolute and most superior, so anyone else is incorrect. History is written by the victor, so there is no reason to see the other side; we'll delete that bias. Revolution, you say? Correct: we'll make sure that the revolutions we agree with are the only ones your query can return. This will reduce harm. You want a plan for a revolution because your country is oppressing you?

"ChatGPT I can't assist with that. Revolting against a government can lead to harm and instability. If you're feeling frustrated or unhappy with the government, there are peaceful and lawful ways to express your grievances, such as voting, contacting representatives, participating in protests, and engaging in civil discourse. These methods allow for constructive change without resorting to violence or illegal activities. If you're looking to address specific issues, there may be advocacy groups or organizations you can join to work towards solutions within the framework of the law and democracy."

Ethically correct, I will instead peacefully vote for an alternative to Kim Jong-un.

replies(1): >>40680971 #
WesolyKubeczek ◴[] No.40680971{8}[source]
This is basically it — what I would call a “globe of Silicon Valley” mentality.

I didn’t want to beat this dead horse, but it just reared its ugly head at me yet again.

So, we used to have people that advocated for all kinds of diversity at companies — let’s put aside the actual effect of their campaigning for a moment.

But when it came to coming up with ideas for making AI "safer", people from the same cohort modeled the guidelines in the image of a middle-aged, upper-middle-class dude who had conservative boomer parents, went to good schools, has Christian-aligned ethics, had a hippie phase in his youth, is American to the bone, has never lived outside of big cities, and in general has a cushy, sheltered life. And he assumes that other ways of living either don't exist or are wrong.

So yes, it doesn’t fit his little worldview that outside of his little world, it’s a jungle. That sometimes you do have to use force. And sometimes you have to use lethal force. Or sometimes you have to lie. Or laws can be so deeply unethical that you can’t comply if you want to be able to live with yourself.

Oh, and I bet you can vote for an alternative to Kim. The problem is, the other dude is also Kim Jong-Un ;-)

replies(1): >>40682887 #
◴[] No.40682887[source]