
586 points mizzao | 1 comment | source
olalonde ◴[] No.40667926[source]
> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.

It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".

replies(5): >>40667968 #>>40668086 #>>40668163 #>>40669086 #>>40670974 #
nathan_compton ◴[] No.40668086[source]
This specific rhetoric aside, I really don't have any problem with people censoring their models. If I, as an individual, had the choice between handing out instructions for making sarin gas on a street corner or not doing it, I'd choose the latter. I don't think the mere information is itself harmful, but I can see that it might have bad effects down the line. That seems to be all it comes down to. People making models have decided they want the models to behave a certain way. They paid to create them, and you don't have a right to a model that will make racist jokes or whatever. So unless the state is censoring models, I don't see what complaint you could possibly have.

If the state is censoring the model, I think the problem is more subtle.

replies(6): >>40668143 #>>40668146 #>>40668556 #>>40668753 #>>40669343 #>>40672487 #
1. rpdillon ◴[] No.40668143[source]
> So unless the state is censoring models, I don't see what complaint you could possibly have.

Eh, RLHF often amounts to useless moralizing, and even more often leads to refusals that impair the utility of the product. One recent example: I asked Claude to outline the architectural differences between light water and molten salt reactors, and it refused to answer because nuclear. See other comments in this discussion for related points.

https://news.ycombinator.com/item?id=40666950

I think there's quite a bit to complain about in this regard.