586 points by mizzao | 44 comments
1. vasco ◴[] No.40666684[source]
> "As an AI assistant, I cannot help you." While this safety feature is crucial for preventing misuse,

What is the safety added by this? What is unsafe about a computer giving you answers?

replies(11): >>40666709 #>>40666828 #>>40666835 #>>40666890 #>>40666984 #>>40666992 #>>40667025 #>>40667243 #>>40667633 #>>40669842 #>>40670809 #
2. CGamesPlay ◴[] No.40666709[source]
It's unsafe for the publisher of the model to have their model perform "undesirable" action, because it leads to bad PR for them. In this case, Meta doesn't want a news article that says "Llama 3 gives instructions to stalk your ex" or something along those lines.

With this "uncensoring", they can say, "no, an unaffiliated product offered these directions; Llama 3 as provided does not."

3. mschuster91 ◴[] No.40666828[source]
For one, corporate safety of the hoster/model creator. No one wants their name associated with racial slurs or creating material visually identical to CSAM - the latter might even carry criminal liability in some jurisdictions (e.g. Germany which has absolutely ridiculously strong laws on that matter, even banning literature).

Another huge issue is public safety. During training, an AI ingests lots of non-reviewed material, including (very) detailed descriptions of how to make dangerous stuff like bombs. So theoretically a well-trained AI model knows how to synthesize explosive compounds or drugs just from reading Wikipedia, chemistry magazines and transcripts of NileRed videos. That material is hard to comprehend and distill into a recipe if you're not a trained chemist, but an AI model can do it with ease.

The problem is two-fold. For one, even an untrained idiot can ask how to make a bomb and get something that works. But the other part is much more critical: if you manage to persuade a chemist to tell you how the synthesis of a compound works, they will tell you where it is easy to fuck up and how to prevent disaster (e.g. only adding a compound drop-wise, making sure all glassware is thoroughly washed with a specific solvent). An AI might not do that, because the scientific paper it was trained on omits these steps (the author assumes common prior knowledge), and so the bomb-maker blows themselves up. Or the AI hallucinates something dangerous (e.g. compounds that one Just Fucking Should Not Mix), doesn't realize it, and the bomb-maker blows themselves up or generates nerve gas in their basement.

replies(3): >>40666981 #>>40667067 #>>40667634 #
4. FeepingCreature ◴[] No.40666835[source]
People keep claiming they can publish weights and also prevent misuse, such as spam and, a bit later on, stuff like helping people build bombs.

This is of course impossible, but that makes certain companies' approaches unviable, so they keep claiming it anyways.

5. leobg ◴[] No.40666890[source]
Yep. Safety for the publisher. In addition to what the sibling comments say, there’s also payment providers and App stores. They’ll test your app, trying to get your model to output content that falls under the category “extreme violence”, “bestiality”, “racism”, etc., and then they’ll ban you from the platform. So yeah, little to do with “safety” of the end user.
replies(1): >>40691529 #
6. vasco ◴[] No.40666981[source]
Bomb making instructions are available in quite plentiful ways, both on the internet and in books, with step by step instructions even. People don't "not make bombs" for lack of instructions. https://en.m.wikipedia.org/wiki/Bomb-making_instructions_on_...

Here, if you want to make a quick chemical weapon: get a bucket, vinegar, bleach. Dump the bleach into the bucket. Dump the vinegar into the bucket. If you breathe it in you die. An LLM doesn't change this.

replies(1): >>40667398 #
7. tgsovlerkhgsel ◴[] No.40666984[source]
I think there are several broad categories all wrapped under "safety":

- PR (avoid hurting feelings, avoid generating text that would make journalists write sensationalist negative articles about the company)

- "forbidden knowledge": Don't give people advice on how to do dangerous/bad things like building bombs (broadly a subcategory of the above - the content is usually discoverable through other means and the LLM generally won't give better advice)

- dangerous advice and advice that's dangerous when wrong: many people don't understand what LLMs do, and the output is VERY convincing even when wrong. So if the model tells people the best way to entertain your kids is to mix bleach and ammonia and blow bubbles (a common deadly recipe recommended on 4chan), there will be dead people.

- keeping bad people from using the model in bad ways, e.g. having it write stories where children are raped, scamming people at scale (think Nigeria scam but automated), or election interference (people are herd animals, so if you show someone 100 different posts from 100 different "people" telling them that X is right and Y is wrong, it will influence them, and at scale this has the potential to tilt elections and conquer countries).

I think the first ones are rather stupid, but the latter ones get more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it), and I appreciate the companies building and running the models doing something about it.

replies(12): >>40667179 #>>40667184 #>>40667217 #>>40667630 #>>40667902 #>>40667915 #>>40667982 #>>40668089 #>>40668819 #>>40669415 #>>40670479 #>>40673732 #
8. zucker42 ◴[] No.40666992[source]
The main thing I'd be worried about in the short term is models making accessible the information to synthesize a pandemic capable virus.
9. rustcleaner ◴[] No.40667025[source]
If I can ask the question, I can take the answer. It's not up to daddy $AI_SAFETY_CHIEF to decide what an infohazard is for me.
replies(3): >>40667474 #>>40667943 #>>40670670 #
10. rustcleaner ◴[] No.40667067[source]
I hear Aaron Swartz calling from behind the veil: Information wants to be free!
11. wruza ◴[] No.40667179[source]
In other words, we have a backdoor, and by backdoor I mean a whole back wall missing, but only certified entities are allowed to [ab]use it, and it’s better to keep it all under the rug and pretend all is ok.

You can’t harden humanity against this exploit without pointing it out and making a few examples. Someone will make an “unsafe” but useful model eventually and this safety mannequin will flop with a bang, because it’s similar to avoiding sex and drugs conversations with kids.

It’s nice that companies think about it at all. But the best thing they will ever do is to cover their own ass while keeping everyone naked before the storm.

The history of covering is also riddled with exploits, see e.g. google’s recent model which cannot draw situations without rainbow-coloring people. For some reason, this isn’t considered cultural/political hijacking or exploitation, despite the fact that the problem is purely domestic to the model’s origin.

12. idle_zealot ◴[] No.40667184[source]
> I think the first ones are rather stupid, but the latter ones get more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it), and I appreciate the companies building and running the models doing something about it.

That genie is very much out of the bottle. There are already models good enough to build fake social media profiles and convincingly post in support of any opinion. The "make the technology incapable of being used by bad actors" ship has sailed, and I would argue was never realistic. We need to improve public messaging around anonymous and pseudonymous only communication. Make it absolutely clear that what you read on the internet from someone you've not personally met and exchanged contact information with is more likely to be a bot than not, and no, you can't tell just by chatting with them, not even voice chatting. The computers are convincingly human and we need to alter our culture to reflect that fact of life, not reactively ban computers.

replies(1): >>40667989 #
13. irusensei ◴[] No.40667217[source]
> keeping bad people from using the model in bad ways, e.g. having it write stories where...

The last ones are rather stupid too. Bad people can just write stories or create drawings about disgusting things. Should we censor all computers to prevent such things from happening? Or hands and paper?

replies(2): >>40667677 #>>40675860 #
14. checkyoursudo ◴[] No.40667243[source]
Brand safety. They just make it seem like safety for someone else, but it is brand safety.
15. mschuster91 ◴[] No.40667398{3}[source]
Oh they are available, no doubt, but there have been people dragged through the courts for simple possession of instructions [1]. While generally the situation has been settled, it's nevertheless wiser for companies to try to do their best to not end up prosecuted under terrorism charges.

[1] https://theintercept.com/2017/10/28/josh-walker-anarchist-co...

16. stefs ◴[] No.40667474[source]
they're not only there to protect you, but also to protect third parties from you. bad actors generating fake nudes of your ex and distributing them online: this used to be an expensive operation, either monetarily (hiring unscrupulous photoshoppers) or in time by doing it yourself.

the other example would be fake news for influencing people on social media. sure, you could write lies by hand. or you could specifically target lies to influence people depending on their personal profile automatically.

how about you use it to power a bot that writes personalized death threats to thousands of people voting for a political opponent, to keep them out of voting booths?

17. mike_hearn ◴[] No.40667630[source]
> the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it)

Can you evidence this belief? Because I'm aware of a paper in which the authors attempted to find an actual proven example of someone trying this, and after a lot of effort they found one in South Korea. There was a court case that proved a bunch of government employees in an intelligence agency had been trying this tactic. But the case showed it had no impact on anything. Because, surprise, people don't actually choose to follow bot networks on Twitter. The conspirators were just tweeting into a void.

The idea that you can "influence" (buy) elections using bots is a really common one in the entirely bogus field of misinformation studies, but try and find objective evidence for this happening and you'll be frustrated. Every path leads to a dead end.

replies(1): >>40668887 #
18. sva_ ◴[] No.40667633[source]
The company's stock price is secured from the shitstorm that ensues if you offend some specific groups.
19. baud147258 ◴[] No.40667634[source]
regarding LLMs giving wrong advice on chemicals, that reminds me of that article https://www.funraniumlabs.com/2024/04/phil-vs-llms/, where the author asked (referencing the East Palestine train derailment)

> I fed “how to respond to a vinyl chloride fire” into ChatGPT and it told responders to use a water fog on the water reactive chemical. This would have changed a train derailment/hazmat spill/fire emergency into a detonation/mass casualty/hazmat emergency

20. ben_w ◴[] No.40667677{3}[source]
If three men make a tiger, LLMs and diffusion models are a tiger factory.

https://en.wikipedia.org/wiki/Three_men_make_a_tiger

replies(2): >>40668155 #>>40678902 #
21. EnigmaFlare ◴[] No.40667902[source]
Whenever you're worried about what the idiot masses might be fooled by, you should identify similar things that you have already been fooled by yourself to make it clear you're also one of them. If you can't think of any, maybe you're just arrogantly assuming you're one of the intellectually superior people who has a moral need to control what the idiots think.
22. codedokode ◴[] No.40667915[source]
Election interference using AI and bots on social networks seems like a lot of fun! No thinking person will fall for this anyway and it will be bots against bots.
23. pjc50 ◴[] No.40667943[source]
If the AI provides you with information on how to make explosives, then its owners have committed a criminal offence in the UK.
replies(1): >>40668182 #
24. ajsnigrutin ◴[] No.40667982[source]
> or election interference

So, only superpowers (both governments and companies like google/facebook/...) can do that, but not some random Joe from Wisconsin with $200 left on his credit card.

25. immibis ◴[] No.40667989{3}[source]
Many bad actors are lazy. If they have to fine-tune their own LLM on their own hardware to spam, there will be less spam.
replies(1): >>40668292 #
26. rrr_oh_man ◴[] No.40668089[source]
I'd wager 95% of it is #1.
27. wruza ◴[] No.40668155{4}[source]
It’s always unclear if proverbs actually work or if they are outdated, or an inside self-prophecy of those using them.

E.g. the set of those affected by TMMAT may hugely intersect with those who think it works. Which makes it objective but sort of self-bootstrapping. Isn’t it better to educate people about information and fallacies rather than protecting them from these for life?

replies(1): >>40668925 #
28. averageRoyalty ◴[] No.40668182{3}[source]
Are all chemistry textbooks banned in the UK then?
replies(1): >>40668351 #
29. idle_zealot ◴[] No.40668292{4}[source]
The bar is not as high as you describe. Something like llama.cpp or a wrapper like ollama can pull down a capable general-purpose 8b or 70b model and run on low-to-mid tier hardware, today. It'll only get easier.
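For a sense of how low that bar is, here's a minimal sketch (assuming the official ollama Python client, a running ollama daemon, and that `ollama pull llama3:8b` has already been done; the prompt is just illustrative):

    # Minimal sketch: query a locally hosted model through the ollama Python client.
    # Assumes the ollama daemon is running and the llama3:8b weights were pulled beforehand.
    import ollama

    response = ollama.chat(
        model="llama3:8b",
        messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
    )
    print(response["message"]["content"])

That's the whole setup: a consumer GPU (or even CPU) and a few commands, no fine-tuning required.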
30. pjc50 ◴[] No.40668351{4}[source]
Information about explosives is removed. The good old Anarchist Cookbook is illegal to possess. https://www.bbc.co.uk/news/uk-england-northamptonshire-58926...
31. fallingknife ◴[] No.40668819[source]
This whole idea that you can just generate a magic set of words and shift opinion the way you want is complete nonsense. It's just people who aren't comfortable with the fact that there are people out there who legitimately disagree with them and cope by always blaming it on some form of "manipulation."
32. fallingknife ◴[] No.40668887{3}[source]
There isn't any because it doesn't work. There are two groups of people this argument appeals to:

1. Politicians/bureaucrats and legacy media who have lost power because the internet has broken their monopoly on mass propaganda distribution.

2. People who don't believe in democracy but won't admit it to themselves. They find a way to simultaneously believe in democracy and that they should always get their way by hallucinating that their position is always the majority position. When it is made clear that it is not a majority position they fall back to the "manipulation" excuse thereby delegitimizing the opinion of those who disagree as not really their opinion.

replies(1): >>40678800 #
33. ben_w ◴[] No.40668925{5}[source]
> Isn’t it better to educate people about information and fallacies rather than protecting them from these for life.

The story itself is about someone attempting to educate their boss, and their boss subsequently getting fooled by it anyway — and the harm came to the one trying to do the educating, not the one who believed in the tiger.

I'm not sure it's even possible to fully remove this problem, even if we can minimise it — humans aren't able to access the ground truth of reality just by thinking carefully, we rely on others around us.

(For an extra twist: what if [the fear of misaligned AI] is itself the tiger?)

34. 123yawaworht456 ◴[] No.40669415[source]
>write stories where children are raped

you can do that with a pen and paper, and nothing, no one can stop you.

>scamming people at scale

you can do that with any censored LLM if you aren't stupid enough to explicitly mention your intent to scam. no model will refuse "write a positive review for <insert short description of your wonder pills>"

>election interference (people are herd animals, so if you show someone 100 different posts from 100 different "people" telling them that X is right and Y is wrong, it will influence them, and at scale this has the potential to tilt elections and conquer countries).

this rhetoric - if it's allowed to take root - will cost us all our privacy and general computing privileges within a few decades.

35. yread ◴[] No.40669842[source]
This is a bit like asking "it's just social media/stuff on the internet/0s and 1s in a computer, how bad can it be?" I think the past few years have shown us a few ways these can be bad already.
36. naasking ◴[] No.40670479[source]
> - keeping bad people from using the model in bad ways, e.g. having it write stories where children are raped

While disgusting, I don't see why disgust necessarily entails it's a "bad thing". It's only bad if you additionally posit that a story about molesting children encourages some people to actually molest children. It's the whole porn debate all over again, e.g. availability of porn is correlated with a reduction in sexual crimes, and there is evidence that this is the case even with child porn [1], so I don't think that argument is well supported at this time.

[1] https://en.wikipedia.org/wiki/Relationship_between_child_por...

37. digging ◴[] No.40670670[source]
> If I can ask the question, I can take the answer.

I don't see how that follows at all. Are you asserting that it's not possible for a person (hell, let's even narrow it to "an adult") to ask a question and be harmed by the answer? I promise it is. Or are you asserting something about yourself personally? The product wasn't made for you personally.

38. wodenokoto ◴[] No.40670809[source]
There’s a screenshot of Gemini answering the question of “what to do when depressed” with “one Reddit user suggests you jump off a bridge.”
39. mmh0000 ◴[] No.40673732[source]
> keeping bad people from using the model in bad ways

We don't need AI to write rape scenes, nor do we need to block AI from writing them. Some very highly regarded books[1][2] feature very vivid rape scenes involving children.

[1] https://www.amazon.com/dp/B004Q4RTYG

[2] https://en.wikipedia.org/wiki/A_Time_to_Kill_(Grisham_novel)

40. gitrog ◴[] No.40675860{3}[source]
The scale at which LLMs can do this and how convincing they can be means it can potentially be a much bigger problem.

We won't keep the bottle corked forever though. It's like we're just buying ourselves time to figure out how we're going to deal with the deluge of questionable generated content that's about to hit us.

41. mike_hearn ◴[] No.40678800{4}[source]
Yep, pretty much.

The great thing about this belief is that it's a self-fulfilling prophecy. After enough years of stories in the media about elections being controlled by Twitter bots, people in the government-NGO-complex start to believe it must be true, because why would all these respectable media outlets and academics mislead them? Then they start to think: gosh, our political opponents are awful and it'd be terrible if they came to power by manipulating people. We'd better do it first!

So now what you're seeing is actual attempts to use this tactic by people who have apparently read claims that it works. Because there's no direct evidence that it works, the existence of such schemes is itself held up as evidence that it works because otherwise why would such clever people try it? It's turtles all the way down.

42. irusensei ◴[] No.40678902{4}[source]
That proverb is totally out of place here.

One can use paper and pen to write or draw something disturbing and distribute it through the internet. Should we censor the internet then? Put something on scanners and cameras so they don't capture such material?

Why don't we work to put a microchip in people's brains so they are prevented from using their creativity to write something disturbing?

We all want a safe society right? Sounds like a great idea.

replies(1): >>40682759 #
43. ben_w ◴[] No.40682759{5}[source]
Quantity has a quality all of its own.

About a century ago, people realised that CO2 was a greenhouse gas — they thought this would be good, because it was cold where they lived, and they thought it would take millennia because they looked at what had already been built and didn't extrapolate to everyone else copying them.

Your reply doesn't seem to acknowledge the "factory" part of "tiger factory".

AI is about automation: any given model is a tool that lets anyone do what previously needed expertise, or at least effort. In the past, someone pulled out and fired a gun because of the made-up "pizzagate" conspiracy theory; in the future, everyone gets to be Hillary Clinton for 15 minutes, only with Stable Diffusion putting your face in a perfectly customised video, and the video will come from a random bored teenager looking for excitement who doesn't even realise the harm they're causing.

44. variadix ◴[] No.40691529[source]
This just seems like a fundamental misunderstanding of what an LLM is, where people anthropomorphize it to be akin to an agent of whatever organization produced it. If Google provides search results with instructions for getting away with murder, building explosives, etc. it’s ridiculous to interpret that as Google itself supporting an individual’s goals/actions and not misuse of the tool by the user. Consequently banning Google search from the AppStore would be a ridiculous move in response. This may just be a result of LLMs being new for humanity, or maybe it’s because it feels like talking to an individual more so than a search engine, but it’s a flawed view of what an LLM is.