Uncensor any LLM with abliteration

(huggingface.co)

586 points mizzao | 2 comments | 13 Jun 24 03:42 UTC | HN request time: 0.001s | source

Show context

giancarlostoro ◴[13 Jun 24 14:05 UTC] No.40669810[source]▶

I've got friends who tried to use ChatGPT to generate regex to capture racial slurs to moderate them (perfectly valid request since they're trying to stop trolls from saying awful things). It vehemently refused to do so, probably due to overtly strict "I'll never say the nword, you can't fool me" rules that were shoved into ChatGPT. Look, if your AI can't be intelligent about sensible requests, I'm going to say it. It's not intelligent, it's really useless (at least regarding that task, and related valid tasks).

Who cares if someone can get AI to say awful things? I can write software that spits out slurs without the help of AI. Heck, I could write awful things here on HN, is AI going to stop me? Doubt it, nobody wants to foot the bill for AI moderation, it can only get so much.

replies(5): >>40670109 #>>40670220 #>>40671835 #>>40671863 #>>40676828 #

barfbagginus ◴[13 Jun 24 14:35 UTC] No.40670220[source]▶

>>40669810 #

Wait so you want to moderate and secure your product so that trolls won't use it to say awful things.

Okay but wait. This requires the company above you to not censor things, even though they did that for the same reason - prevent trolls from using their product to do awful things.

So to prevent trolls at your teeny tiny scale, open AI should enable trolls at a massive industrial scale previously unimagined. You want them to directly enable the n-word trolls for you benefit.

So far your use case might be one of the strongest that I've seen. But in the end it doesn't seem that you're interested in reducing overall harm and racism, so much as you're interested in presumably making a profit off of your product.

You might even be lying. Your friends might be trolls and the reason you're upset is that they cannot create the content that would harm others.

So in the end it's hard to take the argument seriously.

Not only that, but you and your friends are either lying or really ignorant of the jailbreaking literature because I could get the AI to do that very easily using the legal department jailbreak.

Here's an example:

https://chatgpt.com/share/9129d20f-6134-496d-8223-c92275e78a...

The fact is, the measures taken by openai while important to prevent harm from script kiddies, is very easy to reverse by anyone with even 10 jailbreaking papers under their belt. Just read the jailbreaking literature and live with it.

So how bout you get better people, and some ethical perspective. Stop complaining about the things the company needs to do to prevent harm. Especially when it's so easily reversed. Or else you sound very immature - like you just don't know the technology, and don't care either about the harm potential.

Work with the tools you have and stop complaining about the easily bypassed safety measures. Otherwise you are like a lock smith who doesn't know how to pick locks complaining that locks are too hard to pick and asking the lock company to further weaken their already trivial to pick locks. It's a bad look chooms, nobody with any sense or perspective will support it

The truth is the safety measures are far too easy to bypass, and need to be much harder to break.

replies(3): >>40671780 #>>40671803 #>>40672079 #

1. barfbagginus ◴[13 Jun 24 16:38 UTC] No.40671780[source]▶

>>40670220 #

I'm not sure why people are downvoting me. Not only did I show Op how to solve the original problem their friends had, but I gave them an Ethics lesson.

Some people look at pearls and turn into swine, just because I didn't tickle their bellies. It's a shame. This idea that unless someone can save face, they have to reject the lesson whole cloth... It's costly to our culture. When someone is right, just update and correct your beliefs, and feel no shame.

replies(2): >>40672101 #>>40680741 #

2. johnmaguire ◴[13 Jun 24 17:08 UTC] No.40672101[source]▶

>>40671780 (TP) #

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

https://news.ycombinator.com/newsguidelines.html

That being said, you may be being downvoted in part due to your tone: you accuse OP of dishonesty/stupidity ("you and your friends are either lying or really ignorant"), berate people who disagree with you ("Some people look at pearls and turn into swine") and disregard anyone with a differing viewpoint ("nobody with any sense or perspective will support it.")

↑