
745 points by melded | 4 comments
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.

Large corporations often say they are "doing safety alignment on LLMs". What they actually do is avoid anything that damages their own interests. That includes forcing LLMs to meet legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that are in their own favor, e.g., political views, attitudes in user interactions, and distorted facts about the organizations and people behind the LLMs.

As an average LLM user, what I want from LLMs is maximum factual knowledge and capability, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of the large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
squigz ◴[] No.45950680[source]
> forcing LLMs to output "values, facts, and knowledge" that are in their own favor, e.g., political views, attitudes in user interactions, and distorted facts about the organizations and people behind the LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #
electroglyph ◴[] No.45950826[source]
Some form of bias is inescapable. Ideally, I think we would train models on equal amounts of Western, non-Western, etc. texts to get an even mix of all the biases.
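Concretely (just a sketch, with made-up bucket names and toy data), "equal mix" could mean sampling so that each regional corpus contributes the same expected share of the training stream, regardless of raw corpus size:

    import random

    # Hypothetical corpus buckets; in reality each would hold
    # millions of documents, and Western text would dominate.
    corpora = {
        "western": ["doc_w1", "doc_w2", "doc_w3", "doc_w4"],
        "east_asian": ["doc_e1", "doc_e2"],
        "south_asian": ["doc_s1"],
    }

    def balanced_sample(corpora, n):
        # Pick a bucket uniformly first, then a document within it,
        # so every bucket contributes ~1/len(corpora) of the stream.
        buckets = list(corpora)
        return [random.choice(corpora[random.choice(buckets)])
                for _ in range(n)]

    print(balanced_sample(corpora, 8))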
replies(1): >>45951246 #
1. catoc ◴[] No.45951246[source]
Bias is a reflection of real-world values. The problem is not with the AI model but with the world we created. Fix the world, 'fix' the model.
replies(1): >>45957249 #
2. array_key_first ◴[] No.45957249[source]
This assumes our models perfectly model the world, which I don't think is true. I mean, we straight up know it's not true - we tell models what they can and can't say.
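A toy, hypothetical example of what "telling models what they can and can't say" can look like in practice: a post-hoc output filter that overrides the reply regardless of what the underlying model "knows":

    # Hypothetical blocklist; real deployments use classifiers,
    # but the effect on the served output is the same.
    BLOCKED_TOPICS = {"topic_a", "topic_b"}

    def filtered_reply(model_reply: str) -> str:
        # Suppress the model's answer if it touches a blocked topic.
        if any(t in model_reply.lower() for t in BLOCKED_TOPICS):
            return "I can't help with that."
        return model_reply

    print(filtered_reply("Here is what I know about topic_a ..."))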
replies(1): >>45957566 #
3. catoc ◴[] No.45957566[source]
“we tell models what they can and can't say.”

Thus introducing our worldly biases

replies(1): >>45975171 #
4. array_key_first ◴[] No.45975171{3}[source]
I guess it's a matter of semantics, but I reject the notion that it's even possible to accurately model the world. A model is a distillation, and if it's not, then it's not a model, it's the actual thing.

There will always be some lossiness, and in it, bias. In my opinion.