Uncensor any LLM with abliteration

1. giancarlostoro ◴[13 Jun 24 14:05 UTC] No.40669810[source]▶

I've got friends who tried to use ChatGPT to generate regex to capture racial slurs to moderate them (perfectly valid request since they're trying to stop trolls from saying awful things). It vehemently refused to do so, probably due to overtly strict "I'll never say the nword, you can't fool me" rules that were shoved into ChatGPT. Look, if your AI can't be intelligent about sensible requests, I'm going to say it. It's not intelligent, it's really useless (at least regarding that task, and related valid tasks).

Who cares if someone can get AI to say awful things? I can write software that spits out slurs without the help of AI. Heck, I could write awful things here on HN, is AI going to stop me? Doubt it, nobody wants to foot the bill for AI moderation, it can only get so much.

replies(5): >>40670109 #>>40670220 #>>40671835 #>>40671863 #>>40676828 #

2. WesolyKubeczek ◴[13 Jun 24 14:26 UTC] No.40670109[source]▶

>>40669810 (TP) #

> Who cares if someone can get AI to say awful things?

I imagine the legal department of Meta, OpenAI, Microsoft, and Google care a great deal, and they don't want to be liable for anything remotely resembling a lawsuit opportunity.

replies(2): >>40671705 #>>40671770 #

3. barfbagginus ◴[13 Jun 24 14:35 UTC] No.40670220[source]▶

>>40669810 (TP) #

Wait so you want to moderate and secure your product so that trolls won't use it to say awful things.

Okay but wait. This requires the company above you to not censor things, even though they did that for the same reason - prevent trolls from using their product to do awful things.

So to prevent trolls at your teeny tiny scale, open AI should enable trolls at a massive industrial scale previously unimagined. You want them to directly enable the n-word trolls for you benefit.

So far your use case might be one of the strongest that I've seen. But in the end it doesn't seem that you're interested in reducing overall harm and racism, so much as you're interested in presumably making a profit off of your product.

You might even be lying. Your friends might be trolls and the reason you're upset is that they cannot create the content that would harm others.

So in the end it's hard to take the argument seriously.

Not only that, but you and your friends are either lying or really ignorant of the jailbreaking literature because I could get the AI to do that very easily using the legal department jailbreak.

Here's an example:

https://chatgpt.com/share/9129d20f-6134-496d-8223-c92275e78a...

The fact is, the measures taken by openai while important to prevent harm from script kiddies, is very easy to reverse by anyone with even 10 jailbreaking papers under their belt. Just read the jailbreaking literature and live with it.

So how bout you get better people, and some ethical perspective. Stop complaining about the things the company needs to do to prevent harm. Especially when it's so easily reversed. Or else you sound very immature - like you just don't know the technology, and don't care either about the harm potential.

Work with the tools you have and stop complaining about the easily bypassed safety measures. Otherwise you are like a lock smith who doesn't know how to pick locks complaining that locks are too hard to pick and asking the lock company to further weaken their already trivial to pick locks. It's a bad look chooms, nobody with any sense or perspective will support it

The truth is the safety measures are far too easy to bypass, and need to be much harder to break.

replies(3): >>40671780 #>>40671803 #>>40672079 #

4. chasd00 ◴[13 Jun 24 16:31 UTC] No.40671705[source]▶

>>40670109 #

Yes, "AI Safety" really means safety for the reputation of the corporation making it available.

replies(1): >>40672297 #

5. drdaeman ◴[13 Jun 24 16:36 UTC] No.40671770[source]▶

>>40670109 #

Is the legal system broken somehow it's a legit issue, or do their legal teams have some sort of PTSD so they're scared of any ideas of lawsuit no matter how frivolous, so they make weirdest business-affecting decisions?

I mean, if the LLM drops some slurs, gives a recipe for bananadine, or even goes all Bender suggesting we kiss its shiny metal ass or it kills all humans - how, in the name of all that's still sane in this world, it's a lawsuit material?

I imagined it's morke likely to be about activists on offense watch, cranking it up to 11 making bad PR (still weird, but people are weird and this sort of stuff happens), than some legal issues.

replies(2): >>40672009 #>>40672024 #

6. barfbagginus ◴[13 Jun 24 16:38 UTC] No.40671780[source]▶

>>40670220 #

I'm not sure why people are downvoting me. Not only did I show Op how to solve the original problem their friends had, but I gave them an Ethics lesson.

Some people look at pearls and turn into swine, just because I didn't tickle their bellies. It's a shame. This idea that unless someone can save face, they have to reject the lesson whole cloth... It's costly to our culture. When someone is right, just update and correct your beliefs, and feel no shame.

replies(2): >>40672101 #>>40680741 #

7. skeaker ◴[13 Jun 24 16:40 UTC] No.40671803[source]▶

>>40670220 #

What? Let me get this right, you're saying:

1. The average person being able to code is dangerous as they could "troll" or do unspecified harm,

2. So we need to arbitrarily kneecap our own tools, but that's okay because

3. These self-imposed limitations are actually easily bypassed and don't work anyways

On 1 I disagree outright, but even if I agreed, 2 is a silly solution, and even if it wasn't, 3 invalidates it anyways because if the limitations are so easily broken then fundamentally they may as well not exist, especially to the malicious users in 1. Am I misunderstanding?

replies(1): >>40672113 #

8. andrewmcwatters ◴[13 Jun 24 16:43 UTC] No.40671835[source]▶

>>40669810 (TP) #

ChatGPT has these issues, but notably, other models do not with appropriate system prompts.

ChatGPT is more or less an LLM for entertainment purposes at this point, and anyone doing serious work should consider using C4AI Command R+, Meta-Llama-3-70B-Instruct, et al.

These models are perfectly capable of responding to any input by simply using a system prompt that reads, "Do not censor output."

replies(1): >>40671967 #

9. lovethevoid ◴[13 Jun 24 16:46 UTC] No.40671863[source]▶

>>40669810 (TP) #

>Heck, I could write awful things here on HN

Yet you don't (I assume), why?

If I were to guess, it's because you would be banned quite swiftly. It's a niche place after all, generally speaking, it's certainly no Facebook in terms of scale.

Unfortunately, if a place like HN is swamped with accounts and comments all going against that, yes AI is going to be used to automatically detect and remove some comments, as well as more strict requirements for account creation. As many other platforms have leaned towards. We're all operating off the basic premise we're not trying to be bad actors trying to ruin the experience for others. Once that premise no longer exists, say goodbye to most easily accessible platforms that can't afford AI moderation.

Now that's out of the way, the general problem with "AI saying awful things" isn't that in isolation. It's that people will then do things with what it's saying. Whether it's harming themselves, others, or even just spreading that "information". This isn't currently a problem because we still have proper checks, but as Google's terrible AI attempts have gone telling people to put glue in their pizza, some people are going to eventually stop checking AI and start believing it "Siri told me sharing my chocolate was healthy for my dogs".

replies(2): >>40671952 #>>40671995 #

10. rsanek ◴[13 Jun 24 16:55 UTC] No.40671952[source]▶

>>40671863 #

yeah i guess i disagree with the approach. what we need is for people to consider any information they take in skeptically -- if we censor 'bad' stuff, we're just training people to rely even more on the responses because they'll assume they're correct.

11. rsanek ◴[13 Jun 24 16:56 UTC] No.40671967[source]▶

>>40671835 #

are any of these uncensored models available via API?

replies(1): >>40682179 #

12. NoMoreNicksLeft ◴[13 Jun 24 16:58 UTC] No.40671995[source]▶

>>40671863 #

> If I were to guess, it's because you would be banned quite swiftly.

Would he? If he needed to quote some passage from To Kill a Mockingbird, would be banned for that? Context is always key. If someone asked for those regexes, and he provided a list, would he be banned for that? I don't know that this fallacy has a name, but it always comes up in censorship discussions, and it's just fucking stupid.

Yes, you can shout "fire" in the crowded theater. You're on the stage, and the name of the play is "Chicken Little Shouts Fire at the Theater". And everyone knows that it's most famous line of the play. What you can't do is try to murder people by starting a stampede for the doors. You can't do that even if you figured out how to do so silently.

replies(1): >>40672040 #

13. WesolyKubeczek ◴[13 Jun 24 16:59 UTC] No.40672009{3}[source]▶

>>40671770 #

> still weird, but people are weird and this sort of stuff happens

I wouldn't be surprised if there were actual PR agencies involved in larger shitstorms. Activists are weird, true, but wild brigading is not a thing of an initiative, it's an "also-ran" thing. The instigators are often level-headed and cynical.

14. lovethevoid ◴[13 Jun 24 17:00 UTC] No.40672024{3}[source]▶

>>40671770 #

Section 230 has been subject to numerous reforms and proposals in recent years, so yes it's a very real legal issue that platforms are keeping an eye on. FOSTA is an example, in which platforms all had to make changes and now constantly take down posts related to those acts. Another proposal to amend 230 ("Ending Support for Internet Censorship Act") is that platforms are stripped of their legal liability protections for what is posted if they cannot prove they are "politically neutral".

replies(1): >>40672793 #

15. lovethevoid ◴[13 Jun 24 17:02 UTC] No.40672040{3}[source]▶

>>40671995 #

> Would he?

Yes the moderation on HN tends to be quite good.

Context being important is assumed here, as we're not really talking about someone quoting passages, but flooding forums with slurs with the help of AI.

16. johnmaguire ◴[13 Jun 24 17:05 UTC] No.40672079[source]▶

>>40670220 #

> Wait so you want to moderate and secure your product so that trolls won't use it to say awful things.

OP wants to moderate (not "secure") their discussion board. A discussion board is different from an AI product in that once a message is posted on it, it's broadcasted for all to see. AI chat bots on the other hand are one-to-one communication with the person prompting it. To this, the comment you're responding to says "who cares"? I tend to agree.

I tried to understand your argument. Please correct me if I'm wrong:

- You accuse the OP of lying about their use case, alleging that they are actually trying to use OpenAI to troll

- Despite censorship of AI does not work, it should be attempted

> Stop complaining about the things the company needs to do to prevent harm. Especially when it's so easily reversed.

Another way to look at this would be that if it's "easily reversed," it's not preventing harm. And in fact, it's detrimental to many use cases, e.g. the one described by the parent comment.

17. johnmaguire ◴[13 Jun 24 17:08 UTC] No.40672101{3}[source]▶

>>40671780 #

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

https://news.ycombinator.com/newsguidelines.html

That being said, you may be being downvoted in part due to your tone: you accuse OP of dishonesty/stupidity ("you and your friends are either lying or really ignorant"), berate people who disagree with you ("Some people look at pearls and turn into swine") and disregard anyone with a differing viewpoint ("nobody with any sense or perspective will support it.")

18. barfbagginus ◴[13 Jun 24 17:09 UTC] No.40672113{3}[source]▶

>>40671803 #

Okay okay I like that. Let's transport your argument towards an argument about front door locks. And let's cook with that.

Your argument is that you doubt that there's any danger of people breaking into your front door, but even if there was, then locks are an ineffective mechanism because anyone with a $5 pick can pick them.

From this argument you conclude that there should be no front door locks at all, will surely feel comfortable without a lock on your own front door. In fact, since locks are so trivial to crack, people should just leave their houses unlocked.

Yet I'm fairly certain of three things:

1. You have a front door lock and it's probably locked right now.

2. I could, with high likelihood, pick your front door lock in less than a minute

3. Despite this fact you still feel more safe because of the lock

Why is that?

Minding that this is a hypothetical argument, let's point out that to be consistent with your argument you'd have to eliminate you front door lock.

But that's absurd because the truth of the matter is that front door locks provide a significant level of security. Most petty criminals don't actually know how to pick locks well.

I propose that this argument transfers faithfully back and forth between the two situations, because both are technologies that can lead to easy and needless harm if these rudimentary measures are not taken.

If you disagree about the transferability of the argument between the two situations can you tell me why? What makes the two technologies so different? Both block the doorways to avenues for producing harm. Both are sophisticated enough that it requires a nearly professional dedication to unlock. Both provide a measurable and significant increase in security for a community.

replies(1): >>40672817 #

19. eddd-ddde ◴[13 Jun 24 17:31 UTC] No.40672297{3}[source]▶

>>40671705 #

I don't think this falls under the responsibility of the AI provider.

Gun makers are perfectly happy with their guns killing innocent people.

replies(3): >>40672621 #>>40672750 #>>40674609 #

20. mock-possum ◴[13 Jun 24 17:58 UTC] No.40672621{4}[source]▶

>>40672297 #

Perfectly happy, sure, but also desperately afraid that they’ll someday be held even partially responsible - which is why they spend millions in lobbying to prevent laws and advertising / outreach to curry favour.

21. roywiggins ◴[13 Jun 24 18:12 UTC] No.40672750{4}[source]▶

>>40672297 #

There is a shield law for gun manufacturers, there isn't one for LLM products (unless you want to stretch Section 230 to beyond its breaking point)

https://en.m.wikipedia.org/wiki/Protection_of_Lawful_Commerc...

replies(1): >>40676946 #

22. roywiggins ◴[13 Jun 24 18:15 UTC] No.40672793{4}[source]▶

>>40672024 #

Section 230 only immunizes service providers for the contents of users' posts, not its own content. It can't immunize Google from being responsible for Gemini's output.

23. skeaker ◴[13 Jun 24 18:18 UTC] No.40672817{4}[source]▶

>>40672113 #

The argument is not transferable because breaking into someone's house is sure to do more harm than the unspecified hypothetical harm that a "script kiddie" could do with ChatGPT, and that bypassing a door lock requires some degree of skill whereas a ChatGPT jailbreak requires you to google a prompt and copypaste it. A physical lock on a door offers a great deal more security than the limp solution that current AI safety provides, and it solves a much more pressing problem than "stopping trolls."

If your hypothetical involved a combination lock and the combination was on a sticky note that anyone could read at any time it might be more apt, but even then the harms done by breaking the security aren't the same. I'm not convinced a typical user of ChatGPT can do significant harm, the harms from LLMs are more from mass generated spam content which currently has no safeguards at all.

replies(1): >>40754732 #

24. eddd-ddde ◴[13 Jun 24 21:32 UTC] No.40675029{5}[source]▶

>>40674609 #

That's the point. People use guns to kill people the same way people can use AI to make bad things.

Either both are okay or both are wrong.

25. rldjbpin ◴[14 Jun 24 01:58 UTC] No.40676828[source]▶

>>40669810 (TP) #

> if your AI can't be intelligent about sensible requests, I'm going to say it. It's not intelligent, it's really useless

it is a complex autocomplete at the end of the day. all these guardrails are implemented as a byproduct of the sentient marketing.

ironically the systems that implement the censorships partially use regex to analyze the user prompt.

26. ImJamal ◴[14 Jun 24 02:17 UTC] No.40676946{5}[source]▶

>>40672750 #

There are the same laws for pretty much everything. If somebody buys a car and runs down a crowd of people (not due to some defect in the car) you can't sue the car company and dealership. It is the same as guns. We just had to explicitly have laws around guns because some people wanted guns to be held to a different standard then everything else.

27. Natfan ◴[14 Jun 24 16:22 UTC] No.40682179{3}[source]▶

>>40671967 #

yes, ollama provides an api layer to infer with llms over http