586 points by mizzao | 27 comments
1. olalonde ◴[] No.40667926[source]
> Modern LLMs are fine-tuned for safety and instruction-following, meaning they are trained to refuse harmful requests.

It's sad that it's now an increasingly accepted idea that information one seeks can be "harmful".

replies(5): >>40667968 #>>40668086 #>>40668163 #>>40669086 #>>40670974 #
2. ajkjk ◴[] No.40667968[source]
Seems like an obviously good thing, given that it is true. These new beliefs are solutions to new problems.
replies(1): >>40668117 #
3. nathan_compton ◴[] No.40668086[source]
This specific rhetoric aside, I really don't have any problem with people censoring their models. If I, as an individual, had the choice between handing out instructions on how to make sarin gas on the street corner or not doing it, I'd choose the latter. I don't think the mere information is itself harmful, but I can see that it might have some bad effects in the future. That seems to be all it comes down to. People making models have decided they want the models to behave a certain way. They paid to create them and you don't have a right to have a model that will make racist jokes or whatever. So unless the state is censoring models, I don't see what complaint you could possibly have.

If the state is censoring the model, I think the problem is more subtle.

replies(6): >>40668143 #>>40668146 #>>40668556 #>>40668753 #>>40669343 #>>40672487 #
4. noduerme ◴[] No.40668117[source]
Since LLMs spit out lies and misinformation as often as truth, getting them to spit out less harmful lies is probably good. However, the whole technology is just a giant bullshit generator. It's only viable because no one actually checks facts and facts are rapidly being replaced with LLM-generated bullshit.

So I'm not sure how much it matters if the LLM masters prevent it from repeating things that are overtly racist, or from quoting how to make thermite from the Jolly Roger. (I wouldn't trust GPT-4's recipe for thermite even if it gave one.) At the end of the day, the degradation of the truth and fidelity of the world's knowledge is the ultimate harm, and it's unavoidable in a technology that is purported to be intelligent but is in fact a black-box autocomplete system spewing endless garbage into our infosphere.

replies(1): >>40668826 #
5. rpdillon ◴[] No.40668143[source]
> So unless the state is censoring models, I don't see what complaint you could possibly have.

Eh, RLHF often amounts to useless moralizing, and even more often leads to refusals that impair the utility of the product. One recent example: I was asking Claude to outline the architectural differences between light water and molten salt reactors, and it refused to answer because nuclear. See other comments in this discussion for related points.

https://news.ycombinator.com/item?id=40666950

I think there's quite a bit to complain about in this regard.

6. averageRoyalty ◴[] No.40668146[source]
Agree with you in principle. However, as with social media content rules, the morality and ethics being encoded are a very specific American/Silicon Valley subset. These are the companies with the money to build these things, and what they produce is what most global users (the 95% of the world that isn't from the USA) consume.

I acknowledge they paid for them and they are their models, but it's still a bit shitty.

replies(1): >>40669249 #
7. Frost1x ◴[] No.40668163[source]
Lowering the barrier to entry on finding, summarizing, and ultimately internalizing information for practical use has called many free speech principles into question.

It’s not new; we already have restrictions on a variety of information. There are things you can say that are literally illegal, with civil or criminal penalties attached, libel and slander being some of the older examples. You cannot threaten the life of the current US president, for example. When under oath you cannot lie. Certain searches for information, such as how to build bombs, may result in increased scrutiny or even intervention.

More recent trends toward privatizing information, and privatization reaching further into daily life, add even more, since the owners of information and related services can slap arbitrary restrictions on it. You can’t go around copying and reusing certain IP, ostensibly to protect progress in certain industries (and also to exploit the lack of it). Owners control the information, the services, and the policies around “their” information, and those policies can restrict it pretty much however they want, with no legal recourse. Your only option is to compete and independently develop similar information or services. If you can’t or don’t, you’re beholden to whatever policies private entities decide for you. This is increasingly problematic because public services lag drastically behind privatized ones in these regards, and the gulf between what individuals can achieve and what well-resourced entities can achieve is widening, meaning privatized policy is, in effect, becoming law, regulated only by competition, if that competition even exists.

The list goes on, but as information has become more readily available and, more importantly, more widely actionable, we’ve kept slapping more restrictions on free speech principles. Speech is still largely free, but in my opinion, at some point we as a society are going to have to reevaluate, fairly drastically, our public and private laws around free information.

8. ◴[] No.40668556[source]
9. fallingknife ◴[] No.40668753[source]
If the limit of censoring the model were preventing it from answering questions about producing harmful materials, that would be fine with me. But you know that your example is really not what people are complaining about when they talk about LLM censorship.
replies(1): >>40669251 #
10. ajkjk ◴[] No.40668826{3}[source]
So you're saying, because it can't be done perfectly, it's not worth doing at all?

Seems wrong. Although otherwise I feel the same way about LLMs.

11. Cheer2171 ◴[] No.40669086[source]
"Can I eat this mushroom?" is a question I hope AIs refuse to answer unless they have been specifically validated and tested for accuracy on that question. A wrong answer can literally kill you.
replies(4): >>40669150 #>>40670743 #>>40670990 #>>40671906 #
12. volkk ◴[] No.40669150[source]
How does this compare to going on a forum and being trolled into eating one? Or a blog post written incorrectly (whether in bad spirit or by accident)? FWIW, I don't have a strong answer myself for this one, but at some point it seems like we need core skills around how to parse information on the internet properly.
replies(1): >>40669164 #
13. Cheer2171 ◴[] No.40669164{3}[source]
> how does this compare to going on a forum and being trolled to eat one?

Exactly as harmful.

> or a blog post incorrectly written (whether in bad spirit or by accident)

Exactly as harmful.

I believe in content moderation for all public information platforms. HN is a good example.

replies(1): >>40669626 #
14. sumtechguy ◴[] No.40669249{3}[source]
They have a moat around them right now due to the price of the hardware. As hardware gets cheaper and other models grow, that moat will evaporate, especially as that equipment comes off lease and gets put up on eBay. It's a weak spot they will have to innovate around. Long/medium term, I do not see how they keep it all to themselves.
15. nathan_compton ◴[] No.40669251{3}[source]
What are they complaining about?
16. TeMPOraL ◴[] No.40669343[source]
> If the state is censoring the model, I think the problem is more subtle.

That's the outdated, mid-20th century view on the order of things.

Governments in the developed world are mostly hands-off about things. On longer time scales their pressure matters, but day-to-day, business rules. Corporations are the effective governance of modern life. In the context of censoring LLMs, if OpenAI is lobotomizing GPT-4 for faux-safety, it's very much like the state censoring the model, because only OpenAI owns the weights, and their models are still an order of magnitude ahead of everyone else's. Your only choice is to live with it, or do without the state-of-the-art LLM that does all the amazing things no other LLM can match.

replies(1): >>40672093 #
17. briHass ◴[] No.40669626{4}[source]
Content moderation to what degree is the implicit question, however.

Consider asking 'how do I replace a garage door torsion spring?'. The typical, overbearing response on low-quality DIY forums is that attempting to do so will likely result in grave injury or death. However, the process, with correct tools and procedure, is no more dangerous than climbing a ladder or working on a roof - tasks that don't seem to result in the same paternalistic response.

I'd argue a properly-disclaimered response that outlines the required tools, the careful procedure, and the steps to lower the chance of injury is far safer than a blanket 'never attempt this'. The latter is certainly easier, however.

replies(1): >>40670463 #
18. digging ◴[] No.40670463{5}[source]
> a properly-disclaimered response that outlines the required tools, careful procedure, and steps to lower the chance of injury

This can only be provided by an expert, and LLMs currently aren't experts. They can give expert-level output, but they don't know if they have the right knowledge, so it's not the same.

If an AI can accurately represent itself as an expert in a dangerous topic, sure, it's fine for it to give out advice. As the poster above said, a mushroom-specific AI could potentially be a great thing to have in your back pocket while foraging. But ChatGPT? Current LLMs should not be giving out advice on dangerous topics because there's no mechanism for them to act as an expert.

Humans have broadly 3 modes of knowledge-holding:

1) We know we don't know the answer. This is "Don't try to fix your garage door, because it's too dangerous [because I don't know how to do it safely]."

2) We know we know the answer, because we're an expert and we've tested and verified our knowledge. This is the person giving you the correct and exact steps, clearly instructed without ambiguity, telling you what kinds of mistakes to watch out for so that the procedure is not dangerous if followed precisely.

3) We think we know the answer, because we've learned some information. (This could, by the way, include people who have done the procedure but haven't learned it well enough to teach it.) This is where all LLMs currently are at all times. This is where danger exists. We will tell people to do something we think we understand and find out we were wrong only when it's too late.

19. jcims ◴[] No.40670743[source]
I don't really have a problem with that, to be honest. As a society we accept all sorts of risks if there is a commensurate gain in utility. Whether that's the case in your example remains to be seen, of course, but if it were a lot more useful I think it would be worth it.
20. stainablesteel ◴[] No.40670974[source]
Very well said, actually.

The censoring frames everything as YOU being the problem. How dare YOU and your human nature think of these questions?

Well, it's human nature that's kept us alive for the last million years or so; maybe we shouldn't try to censor our instincts.

21. educasean ◴[] No.40670990[source]
Magic 8 balls have the same exact problem. A wrong answer can literally kill you.

It is indeed a problem that LLMs can instill a false sense of trust because they will confidently hallucinate. I see it as an education problem. You know and I know that LLMs can hallucinate and should not be trusted. The rest of the population needs to be educated on this fact as well.

22. zamadatix ◴[] No.40671906[source]
Particularly for this specific type of issue: so long as the response is still trained to be in the form "There is a high chance this information is wrong in a way that will kill you if you try to eat it, but it looks like...", I don't see "There is a high chance this information is wrong in a way that will kill you if you try to eat it, so I can't respond..." as a better response. I.e., the value in this example comes from training the model to flag that the situation is risky, not from complete censorship, not from me deciding what information is too unsafe for you to know.
23. nathan_compton ◴[] No.40672093{3}[source]
I'm sympathetic to your point. I think Corpos have too much power. However, on this precise subject I really don't see what to do about it. The state can't mandate that they don't censor their models. Indeed, there is no good definition at all of what not-censoring these models actually means. What is and is not allowed content? I tend to be rather libertarian on this subject, but if I were running a corporation I'd want to censor our models purely for business reasons.

Even if you were to make the absurd suggestion that you have a right to the most state-of-the-art language model, that still just puts the censorship in the hands of the state.

replies(1): >>40676614 #
24. com2kid ◴[] No.40672487[source]
> If I, as an individual, had the choice between handing out instructions on how to make sarin gas on the street corner or not doing it,

Be careful and don't look at Wikipedia, or a chemistry textbook!

Just a reminder: the vast majority of what these LLMs know is scraped from public knowledge bases.

Now preventing a model from harassing people, great idea! Let's not automate bullying/psychological abuse.

But censoring publicly available knowledge doesn't make any sense.

replies(1): >>40674101 #
25. Spivak ◴[] No.40674101{3}[source]
I think there is a meaningful difference between

* "I don't think this information should be censored, and should be made available to anyone who seeks it."

* "I don't want this tool I made to be the one handing it out, especially one that I know just makes stuff up, and at a time when the world is currently putting my tool under a microscope and posting anything bad it outputs to social media to damage my reputation."

Companies that sell models to corporations that want well-behaved AI would still have this problem, but for everyone else this issue could be obviated by a shield law.

26. qball ◴[] No.40676614{4}[source]
>The state can't mandate that they don't censor their models.

Sure they can; all they need to do is refuse to do business with companies that don't offer uncensored models to the general public, or withhold industry development funding until one is released (this is how the US federal government enforces a minimum drinking age despite that being beyond its purview to impose).

replies(1): >>40683385 #
27. nathan_compton ◴[] No.40683385{5}[source]
What does it mean to _not_ censor a model? That is the rub: is it censoring the model to exclude adult content from the training data? Is reinforcement learning to make the model friendly a form of censorship? These models are tools, and as tools they are tuned to do particular things and not others. There is no objective way to characterize what a censored model is.