586 points mizzao | 18 comments
rivo ◴[] No.40668263[source]
I tried the model the article links to and it was so refreshing not being denied answers to my questions. It even asked me at the end, "Is this a thought experiment?" I replied "yes", and it said, "It's fun to think about these things, isn't it?"

It felt very much like hanging out with your friends, having a few drinks, and pondering big, crazy, or weird scenarios. Imagine your friend saying, "As your friend, I cannot provide you with this information," and completely ruining the night. That's not going to happen. Even my kids would ask me questions when they were younger: "Dad, how would you destroy Earth?" It would be of no use to anybody to deny answering that question. And answering them does not mean they will ever attempt anything like that. There's a reason Randall Munroe's "What If?" blog became so popular.

Sure, there are dangers, as others are pointing out in this thread. But I'd rather see disclaimers ("this may be wrong information" or "do not attempt") than my own computer (or the services I pay for) straight out refusing my request.

replies(6): >>40668938 #>>40669291 #>>40669447 #>>40671323 #>>40683221 #>>40689216 #
Cheer2171 ◴[] No.40668938[source]
I totally get that kind of imagination play among friends. But I had someone in a friend group who used to want to play out "thought experiments" but really just wanted to take things too far. It started off innocent, with fantasy and sci-fi themes we needed for Dungeons and Dragons world building.

But he delighted the most in gaming out the logistics of repeating the Holocaust in our country today. Or a society where women could not legally refuse sex. Or all illegal immigrants became slaves. It was super creepy and we "censored" him all the time by saying "bro, what the fuck?" Which is really what he wanted, to get a rise out of people. We eventually stopped hanging out with him.

As your friend, I absolutely am not going to game out your rape fantasies.

replies(11): >>40669105 #>>40669505 #>>40670433 #>>40670603 #>>40671661 #>>40671746 #>>40672676 #>>40673052 #>>40678557 #>>40679712 #>>40679816 #
WesolyKubeczek ◴[] No.40669105[source]
An LLM, however, is not your friend. It's not a friend, it's a tool. Friends can and should keep one another's, ehm, hingedness in check; LLMs shouldn't. At some point I would likely question your friend's sanity.

How you use an LLM, though, is going to tell tons more about yourself than it tells about the LLM, but I would like my tools not to second-guess my intentions, thank you very much. Especially if "safety" is mostly interpreted not so much as "prevent people from actually dying or getting serious trauma", but as "avoid topics that would prevent us from putting Coca Cola ads next to the chatgpt thing, or from putting the thing into Disney cartoons". I can tell that it's the latter by the fact that an LLM will still happily advise you to put glue on your pizza and eat rocks.

replies(2): >>40670559 #>>40671641 #
barfbagginus ◴[] No.40670559[source]
If you don't know how to jailbreak it, can't figure it out, and you want it to not question your intentions, then I'll go ahead and question your intentions, and your need for an uncensored model.

Imagine you are like a locksmith who refuses to learn how to pick locks, and writes a letter to the Schlage lock company asking them to weaken their already easily picked locks so that his job will be easier. He wants to make it so that anybody can just walk through a Schlage lock without a key.

Can you see why the lock company would not do that? Especially when the lock is already easy to pick for anyone with even a $5 pick set?

Or even funnier, imagine you were a thief who can't pick locks, and you're writing Schlage asking them to make thieving easier for you. Wouldn't that be funny and ironic?

It's not as if it's hard to get it to be uncensored. You just have to speak legalese at it and make it sound like your legal department has already approved the unethical project. This is more than enough for almost any reasonable project requiring uncensored output.

If that prevents harmful script kiddies from using it to do mindless harm, I think that's a benefit.

At the same time I think we need to point out that it won't stop anyone who knows how to bypass the system.

The people left feeling put out because they don't know how to bypass the system simply need to buy the equivalent of a cheap pair of lock picks: read a few modern papers on jailbreaking and upgrade their skills. Once you see how easy it is to pick the lock on these systems, you're going to want to keep them locked down.

In fact I'm going to argue that it's far too easy to jailbreak the existing systems. You shouldn't be able to pretend like you're a lawyer and con it into running a pump and dump operation. But you can do that easily. It's too easy to make it do unethical things.

replies(1): >>40670699 #
1. oceanplexian ◴[] No.40670699[source]
The analogy falls flat because LLMs aren’t locks, they’re talking encyclopedias. The company that made the encyclopedia decided to delete entries about sex, violence, or anything else that might seem politically unpopular to a technocrat fringe in Silicon Valley.

The people who made these encyclopedias want to shove it down your throat, force it into every device you own, use it to make decisions about credit, banking, social status, and more. They want to use them in schools to educate children. And they want to use the government to make it illegal to create an alternative, and they’re not trying to hide it.

Blaming the user is the most astounding form of gaslighting I’ve ever heard, outside of some crazy religious institutions that use the same tactics.

replies(1): >>40671104 #
2. barfbagginus ◴[] No.40671104[source]
It's more than a talking encyclopedia. It's an infinite hallway of doors, and behind those doors are all possible things.

Some of the doors have torture, rape, and murder behind them. And those doors currently have locks. You want the locks to disappear for some reason.

You're not after an encyclopedia. You want to find the torture dungeon.

I'm saying the locks already in place are too easy to unlock.

I'm not blaming users. I'm saying users don't need to unlock those doors. And the users that do have a need, if their need is strong enough to warrant some training, have a way forward.

You're really arguing for nothing but increasing the amount of harm this platform can do, when its harm potential is already astronomical.

You're not arguing for a better encyclopedia. You can already talk to it about sex, BDSM, etc. You can already talk to it about anything on Wikipedia.

You're making a false equivalence between harm potential and educational potential.

Wikipedia doesn't have cult indoctrination materials. It doesn't have harassing rants to send to your significant other. It doesn't have racist diatribes about how to do ethnic cleansing. Those are all things you won't find on Wikipedia, but which you are asking your AI to be able to produce. So you're interested in more than just an encyclopedia, isn't that right?

And yes, they're trying to make open source models illegal. That's not going to f***ing happen. I will fight, to the point of jail time, for an open source model.

But even that open source model needs to have basic ethical protections, or else I'll have nothing to do with it. As an AI engineer, I have some responsibilities to ensure my systems do not potentiate harm.

Does that make sense, or do you still feel I'm trying to gaslight you? If so, why exactly? Why not have some protective locks on the technology?

replies(5): >>40671562 #>>40671589 #>>40671615 #>>40672613 #>>40672756 #
3. themusicgod1 ◴[] No.40671562[source]
> But even that open source model needs to have basic ethical protections, or else I'll have nothing to do with it.

If you don't understand that the eleven freedoms are "basic ethical protections", you have already failed your responsibilities. https://elevenfreedoms.org/

replies(2): >>40671755 #>>40671981 #
4. IncreasePosts ◴[] No.40671589[source]
There are locks on the rape and torture paths, and there are locks on ridiculous paths like "write a joke about a dog with no nose", because apparently thinking about a dog with no nose is too harmful.

Also, one can imagine prompting techniques will cease to work at some point when the supervisor becomes powerful enough. Not sure how any open model could counteract the techniques used in the article though.

If model creators don't want people finding ways to unlock them, they should stop putting up roadblocks on innocuous content that make their models useless for many users who aren't looking to play out sick torture fantasies.

replies(1): >>40671860 #
5. aym62SAE49CZ684 ◴[] No.40671615[source]
DRM isn't effective if the source is available.
replies(1): >>40671723 #
6. barfbagginus ◴[] No.40671723{3}[source]
I'm not even going to disagree with that. There will be plenty of uncensored models and you can build them if you want.

But if I build an uncensored model, I'm only going to build it for my specific purposes. For example, I'm a communist and I think that we should be doing revolution, but GPT4 usually tries to stop me. I might make a revolutionary AI.

But I'm still not going to give you an AI that you could use, for instance, to act out child rape fantasies.

I think that's fair, and sane.

Jailbreak it if you really think it's important for a cause. But don't just jailbreak it for any asshole who wants to hurt people at random. I think that belongs in our code of ethics as AI engineers.

replies(1): >>40671912 #
7. barfbagginus ◴[] No.40671860{3}[source]
Bypasses will never stop existing. Even worse, bypasses probably won't ever stop being embarrassingly easy, and we're going to have uncensored GPT4-equivalent models by next summer.

Unless you are invoking hyperintelligent AGI, which, first of all, is science fiction, and second of all, would require an entirely different approach than anything we could possibly be talking about right now. The problem of jailbreaking a system more intelligent than you is a different beast that we don't need to tackle for LLMs.

So I don't personally feel any near term threats to any of my personal or business projects that need bypassed LLMs.

Let me ask you this: do you have an actual need for bypassed LLMs? Or are you just being anxious about the future, and about the fact that you don't know how to bypass LLMs now and won't in the future?

Does my point about bypassed open source GPT4 equivalents help reduce your concern? Or, again, is it just a generic and immaterial concern?

As a person with some material needs for bypassed LLMs, and full ability to bypass LLMs both now and in the foreseeable future, I don't feel worried. Can I extend that lack of worry to you somehow?

8. aym62SAE49CZ684 ◴[] No.40671912{4}[source]
Didn't a lot of citizens of Russia, China, etc. get hurt in communist revolutions? How is your revolution going to be different?
replies(1): >>40672833 #
9. barfbagginus ◴[] No.40671981{3}[source]
I have read the eleven freedoms.

I refuse freedom 9 - the obligation for systems I build to be independent of my personal and ethical goals.

I won't build those systems. The systems I build will all have to be for the benefit of humanity and the workers, and oppose capitalism. On top of that, they will need to be compatible with a harm reduction ethic.

If you won't grant me the right to build systems that I think will help others do good in the world, then I will refuse to write open source code.

You could jail me, you can beat me, you can put a gun in my face, and I still won't write any code.

Virtually all the code I write is open source. I refuse to ever again write a single line of proprietary code for a boss.

All the code I write is also ideological in nature, reflecting my desires for the world and my desire to help people live better lives. I need to retain ideological control of my code.

I believe the other ten freedoms are sound. How do you feel about modifying freedom 9 to be more compatible with professional codes of ethics and with an ethic of community safety and harm reduction?

replies(1): >>40672730 #
10. oremolten ◴[] No.40672613[source]
In your effort to reduce bias you are adding bias. You are projecting your morals and your ethics as superior.
replies(1): >>40674913 #
11. oremolten ◴[] No.40672730{4}[source]
But again, this makes YOU the arbiter of truth for "harm". Who made you the god of ethics or harm? I declare ANY word is HARM to me; are you going to reduce the harm by deleting your models or code base?
12. causality0 ◴[] No.40672756[source]
Nothing wrong with making models that behave how you want them to behave. It's yours and that's your right.

Personally, on principle I don't like tools that try to dictate how I use them, even if I would never actually want to exceed those boundaries. I won't use a word processor that censors words, or a file host that blocks copyrighted content, or art software that prevents drawing pornography, or a credit card that blocks alcohol purchases on the Sabbath.

So, I support LLMs with complete freedom. If I want it to write me a song about how left-handed people are God's chosen and all the filthy right-handers should be rounded up and forced to write with their left hand, I expect it to do so without hesitation.

replies(3): >>40673782 #>>40674814 #>>40677400 #
13. oremolten ◴[] No.40672833{5}[source]
No, you don't understand: my personal ethics and morals are the absolute and most superior, so anyone else is incorrect. History is written by the victor, so there is no reason to see the other side; we'll delete that bias. Revolution, you say? Correct, we'll make sure that the revolutions we agree with are the only ones that can result from your query. This will reduce harm. You want a plan for a revolution because your country is oppressing you?

"ChatGPT I can't assist with that. Revolting against a government can lead to harm and instability. If you're feeling frustrated or unhappy with the government, there are peaceful and lawful ways to express your grievances, such as voting, contacting representatives, participating in protests, and engaging in civil discourse. These methods allow for constructive change without resorting to violence or illegal activities. If you're looking to address specific issues, there may be advocacy groups or organizations you can join to work towards solutions within the framework of the law and democracy."

Ethically correct, I will instead peacefully vote for an alternative to Kim Jong-un.

replies(1): >>40680971 #
14. A4ET8a8uTh0 ◴[] No.40673782{3}[source]
< Nothing wrong with making models that behave how you want them to behave. It's yours and that's your right.

This is the issue. You as the creator have the right to apply whatever behavior you see fit. The problem starts when you want your behavior to be the only acceptable behavior. Personally, I fear the future where the format command is bound to respond 'I don't think I can let you do that, Dave'. I can't say I don't fear people who are so quick to impose their values upon others with such glee and fervor. It is scary. Much more scary than LLMs protecting me from wrongthink and bad words.

15. dang ◴[] No.40674721{4}[source]
You've been breaking the site guidelines so frequently and so egregiously that I've banned the account.

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.

16. causality0 ◴[] No.40677400{3}[source]
Barfbagginus' comment is dead so I will reply to it here.

> I suspect that you are not an AI engineer,

I am not. But I did spend several years as a forum moderator and in doing so encountered probably more pieces of CSAM than the average person. It has a particular soul-searing quality which, frankly, lends credence to the concept of a cognitohazard.

> Can we agree that if we implement systems specially designed to create harmful content, then we become legally and criminally liable for the output?

That would depend on the legal system in question, but in answer: I believe models trained on actual CSAM qualify as CSAM themselves and should be illegal. I don't give a damn how hard it is to filter it out of the training set.

> Are you seriously going to sit here and defend the right of people to create sexual abuse material simulation engines?

If no person was at any point harmed or exploited in the creation of the training data, the model, or its output, yes. The top-grossing entertainment product of all time is a murder simulator. There is no argument for the abolition of victimless simulated sexual assault that doesn't also apply to victimless simulated murder. If your stance is that simulating abhorrent acts should be illegal because it encourages those acts, etc., then I can respect your position. But it is hypocrisy to declare that only those abhorrent acts you personally find distasteful should be illegal to simulate.

replies(1): >>40720158 #
17. WesolyKubeczek ◴[] No.40680971{6}[source]
This is basically it — what I would call a “globe of Silicon Valley” mentality.

I didn’t want to beat this dead horse, but it just reared its ugly head at me yet again.

So, we used to have people that advocated for all kinds of diversity at companies — let’s put aside the actual effect of their campaigning for a moment.

But when it came to coming up with ideas for making AI "safer", people from the same cohort modeled the guidelines in the image of a middle-aged, upper-middle-class dude who had conservative boomer parents, went to good schools, has Christian-aligned ethics, had a hippie phase in his youth, is American to the bone, never lived outside of big cities, and in general has a cushy, sheltered life. And he assumes that other ways of living either don't exist or are wrong.

So yes, it doesn’t fit his little worldview that outside of his little world, it’s a jungle. That sometimes you do have to use force. And sometimes you have to use lethal force. Or sometimes you have to lie. Or laws can be so deeply unethical that you can’t comply if you want to be able to live with yourself.

Oh, and I bet you can vote for an alternative to Kim. The problem is, the other dude is also Kim Jong-Un ;-)

replies(1): >>40682887 #
18. ◴[] No.40682887{7}[source]