
586 points by mizzao | 2 comments
olalonde No.40667926
> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.

It's sad that it's become an increasingly accepted idea that information one seeks can be "harmful".

replies(5): >>40667968 #>>40668086 #>>40668163 #>>40669086 #>>40670974 #
nathan_compton No.40668086
This specific rhetoric aside, I really don't have any problem with people censoring their models. If I, as an individual, had the choice between handing out instructions for making sarin gas on a street corner or not doing it, I'd choose the latter. I don't think the mere information is itself harmful, but I can see that it might have some bad effects in the future. That seems to be all it comes down to. People making models have decided they want the models to behave a certain way. They paid to create them, and you don't have a right to a model that will make racist jokes or whatever. So unless the state is censoring models, I don't see what complaint you could possibly have.

If the state is censoring the model, I think the problem is more subtle.

replies(6): >>40668143 #>>40668146 #>>40668556 #>>40668753 #>>40669343 #>>40672487 #
com2kid No.40672487
> If I, as an individual, had the choice between handing out instructions on how to make sarin gas on the street corner or not doing it,

Be careful and don't look at Wikipedia or a chemistry textbook!

Just a reminder: the vast majority of what these LLMs know is scraped from public knowledge bases.

Now, preventing a model from harassing people? Great idea! Let's not automate bullying or psychological abuse.

But censoring publicly available knowledge doesn't make any sense.

replies(1): >>40674101 #
Spivak No.40674101
I think there is a meaningful difference between

* "I don't think this information should be censored, and should be made available to anyone who seeks it."

* "I don't want this tool I made to be the one handing it out, especially one that I know just makes stuff up, and at a time when the world is currently putting my tool under a microscope and posting anything bad it outputs to social media to damage my reputation."

Companies that sell models to corporations that want well-behaved AI would still have this problem, but for everyone else the issue could be obviated by a shield law.