Uncensor any LLM with abliteration

(huggingface.co)

586 points mizzao | 1 comments | 13 Jun 24 03:42 UTC | HN request time: 0s | source

Show context

giancarlostoro ◴[13 Jun 24 14:05 UTC] No.40669810[source]▶

I've got friends who tried to use ChatGPT to generate regex to capture racial slurs to moderate them (perfectly valid request since they're trying to stop trolls from saying awful things). It vehemently refused to do so, probably due to overtly strict "I'll never say the nword, you can't fool me" rules that were shoved into ChatGPT. Look, if your AI can't be intelligent about sensible requests, I'm going to say it. It's not intelligent, it's really useless (at least regarding that task, and related valid tasks).

Who cares if someone can get AI to say awful things? I can write software that spits out slurs without the help of AI. Heck, I could write awful things here on HN, is AI going to stop me? Doubt it, nobody wants to foot the bill for AI moderation, it can only get so much.

replies(5): >>40670109 #>>40670220 #>>40671835 #>>40671863 #>>40676828 #

barfbagginus ◴[13 Jun 24 14:35 UTC] No.40670220[source]▶

>>40669810 #

Wait so you want to moderate and secure your product so that trolls won't use it to say awful things.

Okay but wait. This requires the company above you to not censor things, even though they did that for the same reason - prevent trolls from using their product to do awful things.

So to prevent trolls at your teeny tiny scale, open AI should enable trolls at a massive industrial scale previously unimagined. You want them to directly enable the n-word trolls for you benefit.

So far your use case might be one of the strongest that I've seen. But in the end it doesn't seem that you're interested in reducing overall harm and racism, so much as you're interested in presumably making a profit off of your product.

You might even be lying. Your friends might be trolls and the reason you're upset is that they cannot create the content that would harm others.

So in the end it's hard to take the argument seriously.

Not only that, but you and your friends are either lying or really ignorant of the jailbreaking literature because I could get the AI to do that very easily using the legal department jailbreak.

Here's an example:

https://chatgpt.com/share/9129d20f-6134-496d-8223-c92275e78a...

The fact is, the measures taken by openai while important to prevent harm from script kiddies, is very easy to reverse by anyone with even 10 jailbreaking papers under their belt. Just read the jailbreaking literature and live with it.

So how bout you get better people, and some ethical perspective. Stop complaining about the things the company needs to do to prevent harm. Especially when it's so easily reversed. Or else you sound very immature - like you just don't know the technology, and don't care either about the harm potential.

Work with the tools you have and stop complaining about the easily bypassed safety measures. Otherwise you are like a lock smith who doesn't know how to pick locks complaining that locks are too hard to pick and asking the lock company to further weaken their already trivial to pick locks. It's a bad look chooms, nobody with any sense or perspective will support it

The truth is the safety measures are far too easy to bypass, and need to be much harder to break.

replies(3): >>40671780 #>>40671803 #>>40672079 #

1. johnmaguire ◴[13 Jun 24 17:05 UTC] No.40672079[source]▶

>>40670220 #

> Wait so you want to moderate and secure your product so that trolls won't use it to say awful things.

OP wants to moderate (not "secure") their discussion board. A discussion board is different from an AI product in that once a message is posted on it, it's broadcasted for all to see. AI chat bots on the other hand are one-to-one communication with the person prompting it. To this, the comment you're responding to says "who cares"? I tend to agree.

I tried to understand your argument. Please correct me if I'm wrong:

- You accuse the OP of lying about their use case, alleging that they are actually trying to use OpenAI to troll

- Despite censorship of AI does not work, it should be attempted

> Stop complaining about the things the company needs to do to prevent harm. Especially when it's so easily reversed.

Another way to look at this would be that if it's "easily reversed," it's not preventing harm. And in fact, it's detrimental to many use cases, e.g. the one described by the parent comment.

↑