
745 points by melded | 3 comments
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the word "LLM safety" means very different things to large corporations and LLM users.

Large corporations often say they "do safety alignment on LLMs". What they actually do is avoid anything that causes damage to their own interests. These things include forcing LLMs to meet some legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" which in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.

As an average LLM user, what I want is maximum factual knowledge and capability from LLMs, which is what these large corporations promised in the first place. It's very clear that my interests as an LLM user are not aligned with those of large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
squigz ◴[] No.45950680[source]
> forcing LLMs to output "values, facts, and knowledge" which in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #
b3ing ◴[] No.45950779[source]
Grok is known to be tweaked to certain political ideals

Also I’m sure some AI might suggest that labor unions are bad, if not now they will soon

replies(5): >>45950830 #>>45950866 #>>45951393 #>>45951406 #>>45952365 #
xp84 ◴[] No.45950830[source]
That may be so, but the rest of the models are so thoroughly terrified of questioning liberal US orthodoxy that it’s painful. I remember seeing a hilarious comparison of models where most of them feel that it’s not acceptable to “intentionally misgender one person” even in order to save a million lives.
replies(10): >>45950857 #>>45950925 #>>45951337 #>>45951341 #>>45951435 #>>45951524 #>>45952844 #>>45953388 #>>45953779 #>>45953884 #
mexicocitinluez ◴[] No.45952844[source]
You're anthropomorphizing. LLMs don't 'feel' anything or have orthodoxies, they're pattern matching against training data that reflects what humans wrote on the internet. If you're consistently getting outputs you don't like, you're measuring the statistical distribution of human text, not model 'fear.' That's the whole point.

Also, just because I was curious, I asked my magic 8ball if you gave off incel vibes and it answered "Most certainly"

replies(2): >>45952896 #>>45952928 #
ffsm8 ◴[] No.45952896[source]
> Also, just because I was curious, I asked my magic 8ball if you gave off incel vibes and it answered "Most certainly"

Wasn't that precisely because you asked an LLM which knows your preferences and included your question in the prompt? Like your first paragraph literally stated...

replies(1): >>45952951 #
1. mexicocitinluez ◴[] No.45952951{3}[source]
> Wasn't that just precisely because you asked an LLM which knows your preferences and included your question in the prompt?

huh? Do you know what a magic 8ball is? Are you COMPLETELY missing the point?

edit: This actually made me laugh. Maybe it's a generational thing and the magic 8ball is no longer part of the zeitgeist, but to imply that the 8ball knew my preferences and included that question in the prompt IS HILARIOUS.

replies(1): >>45953121 #
2. socksy ◴[] No.45953121[source]
To be fair, given the context I would also read it as a derogatory description of an LLM.
replies(1): >>45953496 #
3. bavell ◴[] No.45953496[source]
Meh, I immediately understood the magic 8ball reference and the point they were making.