Uncensor any LLM with abliteration

(huggingface.co)

586 points mizzao | 1 comments | 13 Jun 24 03:42 UTC

vasco ◴[13 Jun 24 06:49 UTC] No.40666684[source]▶

> "As an AI assistant, I cannot help you." While this safety feature is crucial for preventing misuse,

What is the safety added by this? What is unsafe about a computer giving you answers?

replies(11): >>40666709 #>>40666828 #>>40666835 #>>40666890 #>>40666984 #>>40666992 #>>40667025 #>>40667243 #>>40667633 #>>40669842 #>>40670809 #

tgsovlerkhgsel ◴[13 Jun 24 07:36 UTC] No.40666984[source]▶

>>40666684 #

I think there are several broad categories all wrapped under "safety":

- PR (avoid hurting feelings, avoid generating text that would make journalists write sensationalist negative articles about the company)

- "forbidden knowledge": Don't give people advice on how to do dangerous/bad things like building bombs (broadly a subcategory of the above - the content is usually discoverable through other means and the LLM generally won't give better advice)

- dangerous advice and advice that's dangerous when wrong: many people don't understand what LLMs do, and the output is VERY convincing even when wrong. So if the model tells people the best way to entertain their kids is to mix bleach and ammonia and blow bubbles (a common deadly recipe recommended on 4chan), there will be dead people.

- keeping bad people from using the model in bad ways, e.g. having it write stories where children are raped, scamming people at scale (think Nigeria scam but automated), or election interference (people are herd animals, so if you show someone 100 different posts from 100 different "people" telling them that X is right and Y is wrong, it will influence them, and at scale this has the potential to tilt elections and conquer countries).

I think the first ones are rather stupid, but the latter ones get more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it), and I appreciate the companies building and running the models doing something about it.

replies(12): >>40667179 #>>40667184 #>>40667217 #>>40667630 #>>40667902 #>>40667915 #>>40667982 #>>40668089 #>>40668819 #>>40669415 #>>40670479 #>>40673732 #

idle_zealot ◴[13 Jun 24 08:14 UTC] No.40667184[source]▶

>>40666984 #

> I think the first ones are rather stupid, but the latter ones get more and more important to actually have. Especially the very last one (opinion shifting/election interference) is something where the existence of these models can have a very real, negative effect on the world (affecting you even if you yourself never come into contact with any of the models or its outputs, since you'll have to deal with the puppet government elected due to it), and I appreciate the companies building and running the models doing something about it.

That genie is very much out of the bottle. There are already models good enough to build fake social media profiles and convincingly post in support of any opinion. The "make the technology incapable of being used by bad actors" ship has sailed, and I would argue was never realistic. We need to improve public messaging around anonymous-only and pseudonymous-only communication. Make it absolutely clear that what you read on the internet from someone you've not personally met and exchanged contact information with is more likely to be a bot than not, and no, you can't tell just by chatting with them, not even voice chatting. The computers are convincingly human and we need to alter our culture to reflect that fact of life, not reactively ban computers.

replies(1): >>40667989 #

immibis ◴[13 Jun 24 10:42 UTC] No.40667989[source]▶

>>40667184 #

Many bad actors are lazy. If they have to fine-tune their own LLM on their own hardware to spam, there will be less spam.

replies(1): >>40668292 #

idle_zealot ◴[13 Jun 24 11:29 UTC] No.40668292[source]▶

>>40667989 #

The bar is not as high as you describe. Something like llama.cpp or a wrapper like ollama can pull down a capable general-purpose 8B or 70B model and run it on low-to-mid-tier hardware, today. It'll only get easier.
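
To give a rough sense of how low that bar is, here is a minimal sketch of querying a locally running model from Python. It assumes an ollama server is already listening on its default local port and that a model has already been pulled; the model tag `llama3:8b` and the helper names `build_request` and `generate` are illustrative choices, not anything from the thread.

```python
import json
import urllib.request

# Assumption: an ollama server is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3:8b"):
    """Build (but do not send) a non-streaming generate request.

    The model tag is an assumption; substitute whatever model is installed.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3:8b"):
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires the local server to be up; otherwise urlopen raises URLError.
    print(generate("Explain what a quantized model is in one sentence."))
```

Nothing here needs a GPU cluster or an API key: the only network traffic is to localhost, which is the point being made about hardware requirements.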
