
586 points by mizzao | 2 comments
vasco ◴[] No.40666684[source]
> "As an AI assistant, I cannot help you." While this safety feature is crucial for preventing misuse,

What is the safety added by this? What is unsafe about a computer giving you answers?

replies(11): >>40666709 #>>40666828 #>>40666835 #>>40666890 #>>40666984 #>>40666992 #>>40667025 #>>40667243 #>>40667633 #>>40669842 #>>40670809 #
tgsovlerkhgsel ◴[] No.40666984[source]
I think there are several broad categories all wrapped under "safety":

- PR (avoid hurting feelings, avoid generating text that would make journalists write sensationalist negative articles about the company)

- "forbidden knowledge": Don't give people advice on how to do dangerous/bad things like building bombs (broadly a subcategory of the above - the content is usually discoverable through other means and the LLM generally won't give better advice)

- dangerous advice and advice that's dangerous when wrong: many people don't understand what LLMs do, and the output is VERY convincing even when wrong. So if the model tells people the best way to entertain your kids is to mix bleach and ammonia and blow bubbles (a common deadly recipe recommended on 4chan), there will be dead people.

- keeping bad people from using the model in bad ways, e.g. having it write stories where children are raped, scamming people at scale (think Nigeria scam but automated), or election interference (people are herd animals, so if you show someone 100 different posts from 100 different "people" telling them that X is right and Y is wrong, it will influence them, and at scale this has the potential to tilt elections and conquer countries).

I think the first ones are rather stupid, but the latter ones become more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or their outputs, since you'll have to deal with the puppet government elected because of them), and I appreciate the companies building and running the models doing something about it.

replies(12): >>40667179 #>>40667184 #>>40667217 #>>40667630 #>>40667902 #>>40667915 #>>40667982 #>>40668089 #>>40668819 #>>40669415 #>>40670479 #>>40673732 #
mike_hearn ◴[] No.40667630[source]
> the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it)

Can you evidence this belief? Because I'm aware of a paper in which the authors attempted to find an actual proven example of someone trying this, and after a lot of effort they found one in South Korea. There was a court case that proved a bunch of government employees in an intelligence agency had been trying this tactic. But the case showed it had no impact on anything. Because, surprise, people don't actually choose to follow bot networks on Twitter. The conspirators were just tweeting into a void.

The idea that you can "influence" (buy) elections using bots is a really common one in the entirely bogus field of misinformation studies, but try to find objective evidence of it happening and you'll be frustrated. Every path leads to a dead end.

replies(1): >>40668887 #
1. fallingknife ◴[] No.40668887[source]
There isn't any because it doesn't work. There are two groups of people this argument appeals to:

1. Politicians/bureaucrats and legacy media, who have lost power because the internet has broken their monopoly on mass propaganda distribution.

2. People who don't believe in democracy but won't admit it to themselves. They find a way to simultaneously believe in democracy and believe they should always get their way, by hallucinating that their position is always the majority position. When it becomes clear that it is not the majority position, they fall back on the "manipulation" excuse, thereby delegitimizing the opinions of those who disagree as not really their own.

replies(1): >>40678800 #
2. mike_hearn ◴[] No.40678800[source]
Yep, pretty much.

The great thing about this belief is that it's a self-fulfilling prophecy. After enough years of media stories about elections being controlled by Twitter bots, people in the government-NGO complex start to believe it must be true, because why would all these respectable media outlets and academics mislead them? Then they start to think: gosh, our political opponents are awful, and it'd be terrible if they came to power by manipulating people. We'd better do it first!

So now what you're seeing is actual attempts to use this tactic, by people who have apparently read claims that it works. Since there's no direct evidence that it works, the existence of such schemes is itself held up as evidence that it works, because otherwise why would such clever people try it? It's turtles all the way down.