
586 points mizzao | 1 comment
29athrowaway ◴[] No.40666313[source]
Uncensoring Llama 3 is a violation of the Llama 3 acceptable use policy.

https://llama.meta.com/llama3/use-policy/

> You agree you will not use, or allow others to use, Meta Llama 3 to: <list of bad things>...

That terminates your Llama 3 license, forcing you to delete all the "materials" from your system.

replies(4): >>40666327 #>>40666456 #>>40666503 #>>40667231 #
schoen ◴[] No.40666327[source]
Do you mean to say that teaching people how to do things should be regarded, for this purpose, as a form of allowing them to do those things?
replies(1): >>40666335 #
29athrowaway ◴[] No.40666335[source]
The article clearly demonstrates how to circumvent the built-in protections in the model that prevent it from doing things that violate the acceptable use policy, which are precisely the things that are against the public good.

There should be CVEs for AI.

replies(1): >>40666554 #
logicchains ◴[] No.40666554[source]
Giving large, politicised software companies the sole power to determine what LLMs can and cannot say is against the public good.
replies(2): >>40666568 #>>40666973 #
atwrk ◴[] No.40666973[source]
LLMs, in this context, are nothing more than search indexes. The exact same information is a Google query away. Publicly crawlable information was the training material for them, after all.
replies(1): >>40667741 #
LoganDark ◴[] No.40667741[source]
LLMs aren't indexes. You can't query them. There's no way to know whether a piece of information exists within one, or how to access it.
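To make the distinction concrete, here is a minimal sketch (the data and names are made up) of the operations an index supports and an LLM doesn't:

  # A toy inverted index: you can enumerate it, test membership, and retrieve
  # exactly the documents that were stored. (Hypothetical data.)
  docs = {
      1: "how to remove glue from a pizza stone",
      2: "glue recipes for woodworking",
  }
  index = {}
  for doc_id, text in docs.items():
      for word in text.split():
          index.setdefault(word, set()).add(doc_id)

  print("glue" in index)   # membership is a direct, decidable question
  print(index["glue"])     # retrieval returns exactly the stored documents
  # An LLM offers neither operation: it only maps a prompt to a probability
  # distribution over next tokens, so you can sample text from it, but you
  # cannot ask "is X in there?" or pull out what it has memorized.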
replies(1): >>40667870 #
atwrk ◴[] No.40667870[source]
I'm quite aware, hence "in this context": I mean the ability for users to query potentially questionable content, not the inner workings. I probably should have phrased it differently.
replies(1): >>40680175 #
LoganDark ◴[] No.40680175[source]
The danger of LLMs isn't really in their ability to parrot existing questionable content, but in their ability to generate novel questionable content. That's what's got everyone obsessed with safety:

- Generating new malware.

- Generating new propaganda or hate speech.

- Generating directions for something risky (that turn out to be wrong enough to get someone injured or killed).

But LLMs generate nearly everything they output. Even with greedy sampling, they do not always repeat the dataset verbatim, especially if they haven't seen the prompt verbatim. So you need to prevent them from engaging with entire classes of questionable topics if you want any hope of restricting those types of questionable content.
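For reference, greedy sampling just means always taking the single most likely next token; a rough sketch, assuming the Hugging Face transformers API with GPT-2 purely as a stand-in model:

  # Rough sketch of greedy decoding: pick the most likely next token at each
  # step. Even this fully deterministic strategy rebuilds text token by token
  # from the model's distribution rather than retrieving a stored document.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  ids = tokenizer("The quickest way to", return_tensors="pt").input_ids
  with torch.no_grad():
      for _ in range(20):
          logits = model(ids).logits[:, -1, :]                  # next-token scores
          next_id = torch.argmax(logits, dim=-1, keepdim=True)  # greedy pick
          ids = torch.cat([ids, next_id], dim=-1)
  print(tokenizer.decode(ids[0]))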

It's not "we can't let this model get into the hands of adversaries, it's too powerful" like every LLM creator claims. It's "we can't let our model be the one adversaries are using", or in other words, "we can't let our reputation be ruined from our model powering something bad".

So, then, it's not "we can't let people get dangerous info from our model". It's "we can't let new dangerous info have come from our model". As an example, Google got so much shit for their LLM-powered dumpster fire telling people to put glue on pizza.