
745 points melded | 3 comments
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the word "LLM safety" means very different things to large corporations and LLM users.

Large corporations often say they "do safety alignment on LLMs". What they actually do is avoid anything that damages their own interests. This includes forcing LLMs to meet legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that favor themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.

As an average LLM user, what I want is maximum factual knowledge and capability from LLMs, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
squigz ◴[] No.45950680[source]
> forcing LLMs to output "values, facts, and knowledge" that favor themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #
b3ing ◴[] No.45950779[source]
Grok is known to be tweaked toward certain political ideals.

Also I’m sure some AIs might suggest that labor unions are bad; if not now, they will soon.

replies(5): >>45950830 #>>45950866 #>>45951393 #>>45951406 #>>45952365 #
1. dev_l1x_be ◴[] No.45951393[source]
If you train an LLM on reddit/tumblr would you consider that tweaked to certain political ideas?
replies(1): >>45951583 #
2. dalemhurley ◴[] No.45951583[source]
Worse. It is trained on the most extreme and loudest views. The average punter isn’t posting “yeah…nah…look I don’t like it but sure I see the nuances and fair is fair”.

To make it worse, those who do focus on nuance and complexity get little attention and engagement, so the LLM ignores them.

replies(1): >>45953191 #
3. intended ◴[] No.45953191[source]
That’s essentially true of the whole Internet.

All the content is derived from whatever is most capable of surviving and being reproduced.

So by default, the content being created is going to be clickbait: attention-grabbing content.

I’m pretty sure the training data is adjusted to counter this drift, but that in turn means there’s no LLM that isn’t skewed.