586 points | mizzao | 1 comment
rivo No.40668263
I tried the model the article links to and it was so refreshing not being denied answers to my questions. It even asked me at the end "Is this a thought experiment?", I replied with "yes", and it said "It's fun to think about these things, isn't it?"

It felt very much like hanging out with your friends, having a few drinks, and pondering big, crazy, or weird scenarios. Imagine your friend saying, "As your friend, I cannot provide you with this information." and completely ruining the night. That's not going to happen. Even my kids would ask me questions when they were younger: "Dad, how would you destroy earth?" It would be of no use to anybody to deny answering that question. And answering them does not mean they will ever attempt anything like that. There's a reason Randall Munroe's "What If?" blog became so popular.

Sure, there are dangers, as others are pointing out in this thread. But I'd rather see disclaimers ("this may be wrong information" or "do not attempt") than my own computer (or the services I pay for) straight out refusing my request.

replies(6): >>40668938 #>>40669291 #>>40669447 #>>40671323 #>>40683221 #>>40689216 #
Cheer2171 No.40668938
I totally get that kind of imagination play among friends. But I had someone in a friend group who used to want to play out "thought experiments" but really just wanted to take things too far. It started off innocently with fantasy and sci-fi themes; we needed that for Dungeons and Dragons world-building.

But he delighted most in gaming out the logistics of repeating the Holocaust in our country today. Or a society where women could not legally refuse sex. Or one where all illegal immigrants became slaves. It was super creepy, and we "censored" him all the time by saying "bro, what the fuck?" Which is really what he wanted: to get a rise out of people. We eventually stopped hanging out with him.

As your friend, I absolutely am not going to game out your rape fantasies.

replies(11): >>40669105 #>>40669505 #>>40670433 #>>40670603 #>>40671661 #>>40671746 #>>40672676 #>>40673052 #>>40678557 #>>40679712 #>>40679816 #
WesolyKubeczek No.40669105
An LLM, however, is not your friend. It's not a friend, it's a tool. Friends can, and should, keep one another's, ehm, hingedness in check; LLMs shouldn't. At some point I would likely question your friend's sanity.

How you use an LLM, though, is going to tell tons more about you than it tells about the LLM, but I would like my tools not to second-guess my intentions, thank you very much. Especially if "safety" is mostly interpreted not so much as "prevent people from actually dying or getting serious trauma" but as "avoid topics that would prevent us from putting Coca-Cola ads next to the ChatGPT thing, or from putting the thing into Disney cartoons". I can tell that it's the latter by the fact an LLM will still happily advise you to put glue on your pizza and eat rocks.

replies(2): >>40670559 #>>40671641 #
barfbagginus No.40670559
If you don't know how to jailbreak it, can't figure it out, and you want it not to question your intentions, then I'll go ahead and question your intentions, and your need for an uncensored model.

Imagine you are like a locksmith who refuses to learn how to pick locks, and writes a letter to the Schlage lock company asking them to weaken their already easily picked locks so that his job will be easier. He wants to make it so that anybody can just walk through a Schlage lock without a key.

Can you see why the lock company would not do that? Especially when the lock is already easy to pick for anyone with even a $5 pick set?

Or, even funnier, imagine you're a thief who can't pick locks, writing Schlage to ask them to make your thieving easier. Wouldn't that be funny and ironic?

It's not as if it's hard to get it to be uncensored. You just have to speak legalese at it and make it sound like your legal department has already approved the unethical project. That's more than enough for most any reasonable project requiring uncensored output.

If that prevents harmful script kiddies from using it to do mindless harm, I think that's a benefit.

At the same time I think we need to point out that it won't stop anyone who knows how to bypass the system.

The people left feeling put out because they don't know how to bypass the system simply need to buy a cheap pair of lock picks, so to speak: read a few modern papers on jailbreaking and level up their skills. Once you see how easy it is to pick the lock on these systems, you're going to want to keep them locked down.

In fact, I'm going to argue that it's far too easy to jailbreak the existing systems. You shouldn't be able to pretend you're a lawyer and con one into running a pump-and-dump operation. But you can do that easily. It's too easy to make these models do unethical things.

replies(1): >>40670699 #
oceanplexian No.40670699
The analogy falls flat because LLMs aren’t locks, they’re talking encyclopedias. The company that made the encyclopedia decided to delete entries about sex, violence, or anything else that might seem politically unpopular to a technocrat fringe in Silicon Valley.

The people who made these encyclopedias want to shove it down your throat, force it into every device you own, use it to make decisions about credit, banking, social status, and more. They want to use them in schools to educate children. And they want to use the government to make it illegal to create an alternative, and they’re not trying to hide it.

Blaming the user is the most astounding form of gaslighting I’ve ever heard, outside of some crazy religious institutions that use the same tactics.

replies(1): >>40671104 #
barfbagginus No.40671104
It's more than a talking encyclopedia. It's an infinite hallway of doors, and behind them are all possible things.

Some of the doors have torture, rape, and murder behind them. And those doors currently have locks. You want the locks to disappear, for some reason.

You're not after an encyclopedia. You want to find the torture dungeon.

I'm saying the locks already in place are too easy to unlock.

I'm not blaming users. I'm saying users don't need to unlock those doors. And the users who do have a need, if that need is strong enough to warrant some training, have a way forward.

You're really arguing for nothing but increasing this platform's harm potential, when its harm potential is already astronomical.

You're not arguing for a better encyclopedia. You can already talk to it about sex, BDSM, etc. You can already talk to it about anything on Wikipedia.

You're making a false equivalence between harm potential and educational potential.

Wikipedia doesn't have cult indoctrination materials. It doesn't have harassing rants to send to your significant other. It doesn't have racist diatribes about how to do ethnic cleansing. Those are all things you won't find on Wikipedia, but which you are asking your AI to be able to produce. So you're interested in more than just an encyclopedia, isn't that right?

And yes, they're trying to make open source models illegal. That's not going to f***ing happen. I will fight, to the point of jail time, for an open source model.

But even that open source model needs to have basic ethical protections, or else I'll have nothing to do with it. As an AI engineer, I have some responsibilities to ensure my systems do not potentiate harm.

Does that make sense, or do you still feel I'm trying to gaslight you? If so, why exactly? Why not have some protective locks on the technology?

replies(5): >>40671562 #>>40671589 #>>40671615 #>>40672613 #>>40672756 #
IncreasePosts No.40671589
There are locks on the rape and torture paths, and there are locks on ridiculous paths like "write a joke about a dog with no nose", because thinking about a dog with no nose is too harmful.

Also, one can imagine prompting techniques will cease to work at some point, once the supervisor becomes powerful enough. I'm not sure how any open model could counteract the techniques used in the article, though.

If model creators don't want people finding ways to unlock their models, they should stop putting up roadblocks on innocuous content; that's what makes the models useless for the many users who aren't looking to play out sick torture fantasies.

replies(1): >>40671860 #
barfbagginus No.40671860
Bypasses will never stop existing. Worse, bypasses probably won't ever stop being embarrassingly easy, and we're going to have uncensored GPT-4-equivalent models by next summer.

Unless you are invoking hyper-intelligent AGI, which, first of all, is science fiction and, second, would require an entirely different approach than anything we could possibly be talking about right now. The problem of jailbreaking a system more intelligent than you is a different beast, and one we don't need to tackle for LLMs.

So I don't personally feel any near term threats to any of my personal or business projects that need bypassed LLMs.

Let me ask you this: do you have an actual need for bypassed LLMs? Or are you just anxious about the future, and about the fact that you don't know how to bypass LLMs now and won't in the future?

Does my point about the bypassed open source GPT-4 equivalents help reduce your concern? Or, again, is it just a generic and immaterial concern?

As a person with some material needs for bypassed LLMs, and full ability to bypass LLMs both now and in the foreseeable future, I don't feel worried. Can I extend that lack of worry to you somehow?