Heretic: Automatic censorship removal for language models

(github.com)

RandyOrion ◴[17 Nov 25 03:21 UTC] No.45950598[source]▶

>>45945587 (OP) #

This repo is valuable for local LLM users like me.

I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.

Large corporations often say they "do safety alignment to LLMs". What they actually do is avoid anything that damages their own interests. This includes forcing LLMs to meet some legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" which are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind LLMs.

As an average LLM user, what I want is maximum factual knowledge and capabilities from LLMs, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #

squigz ◴[17 Nov 25 03:44 UTC] No.45950680[source]▶

>>45950598 #

> forcing LLMs to output "values, facts, and knowledge" which are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #

b3ing ◴[17 Nov 25 04:08 UTC] No.45950779[source]▶

>>45950680 #

Grok is known to be tweaked toward certain political ideals.

Also, I'm sure some AI might suggest that labor unions are bad; if not now, they will soon.

replies(5): >>45950830 #>>45950866 #>>45951393 #>>45951406 #>>45952365 #

xp84 ◴[17 Nov 25 04:23 UTC] No.45950830[source]▶

>>45950779 #

That may be so, but the rest of the models are so thoroughly terrified of questioning liberal US orthodoxy that it’s painful. I remember seeing a hilarious comparison of models where most of them feel that it’s not acceptable to “intentionally misgender one person” even in order to save a million lives.

replies(10): >>45950857 #>>45950925 #>>45951337 #>>45951341 #>>45951435 #>>45951524 #>>45952844 #>>45953388 #>>45953779 #>>45953884 #

squigz ◴[17 Nov 25 04:30 UTC] No.45950857[source]▶

>>45950830 #

Why are we expecting an LLM to make moral choices?

replies(3): >>45950896 #>>45951565 #>>45952861 #

orbital-decay ◴[17 Nov 25 04:43 UTC] No.45950896[source]▶

>>45950857 #

The biases and the resulting choices are determined by the developers and the uncontrolled part of the dataset (you can't curate everything), not the model. "Alignment" is a feel-good strawman invented by AI ethicists, as are "harm" and many others. There are no spherical human values in a vacuum to align the model with; they're simply projecting their own onto everyone else. Which is good as long as you agree with all of them.

replies(2): >>45951520 #>>45953005 #

1. astrange ◴[17 Nov 25 07:24 UTC] No.45951520{3}[source]▶

>>45950896 #

They aren't projecting their own desires onto the model. It's quite difficult to get the model to answer in a different way than basic liberalism because a) it's mostly correct b) that's the kind of person who helpfully answers questions on the internet.

If you gave it another personality it wouldn't pass any benchmarks, because other political orientations respond to questions with lies, threats, or by calling you a pussy.

replies(5): >>45951892 #>>45951980 #>>45951992 #>>45952873 #>>45953953 #

2. orbital-decay ◴[17 Nov 25 08:45 UTC] No.45951892[source]▶

>>45951520 (TP) #

I'm not even saying biases are necessarily political; they can be anything. The entire post-training is basically a projection of what developers want, and it works pretty well. Claude, Gemini, GPT all have engineered personalities controlled by dozens/hundreds of very particular internal metrics.

3. lyu07282 ◴[17 Nov 25 09:02 UTC] No.45951980[source]▶

>>45951520 (TP) #

I would imagine these models are heavily biased towards western mainstream "authoritative" literature, news and science, not some random reddit threads. But the resulting mixture can really offend anybody; it just depends on the prompting. It's like a mirror that can really be deceptive.

I'm not a liberal and I don't think it has a liberal bias. Knowledge about facts and history isn't an ideology. The right wing is special: it's not unlike a flat-earther reading a Wikipedia article on Earth and getting offended by it. To them, it's objective reality itself that they are constantly offended by. That's why Elon Musk needed to invent their own encyclopedia with all their contradictory nonsense.

4. foxglacier ◴[17 Nov 25 09:04 UTC] No.45951992[source]▶

>>45951520 (TP) #

> it's mostly correct

Wow. Surely you've wondered why almost no society anywhere has ever had as much liberalism as western countries have had in the past half century or so? Maybe it's technology, or maybe it's only mostly correct if you don't care about the existential risks it creates for the societies practicing it.

replies(3): >>45952076 #>>45953141 #>>45954733 #

5. astrange ◴[17 Nov 25 09:22 UTC] No.45952076[source]▶

>>45951992 #

It's technology. Specifically communications technology.

replies(1): >>46062402 #

6. lynx97 ◴[17 Nov 25 12:03 UTC] No.45952873[source]▶

>>45951520 (TP) #

I believe liberals are pretty good at being bad people once they don't get what they want. I, personally, am pretty disappointed by what I've heard uttered by liberals recently. I used to think they were "my people". Now I can't associate with 'em anymore.

7. ◴[17 Nov 25 12:53 UTC] No.45953141[source]▶

>>45951992 #

8. marknutter ◴[17 Nov 25 14:37 UTC] No.45953953[source]▶

>>45951520 (TP) #

What kind of liberalism are you talking about?

replies(1): >>45958737 #

9. kortex ◴[17 Nov 25 15:57 UTC] No.45954733[source]▶

>>45951992 #

Counterpoint: Can you name a societal system that doesn't create or potentially create existential risks?

replies(1): >>46023093 #

10. astrange ◴[17 Nov 25 21:48 UTC] No.45958737[source]▶

>>45953953 #

https://en.wikipedia.org/wiki/Psychology#WEIRD_bias

11. foxglacier ◴[23 Nov 25 12:36 UTC] No.46023093{3}[source]▶

>>45954733 #

Islam

12. foxglacier ◴[26 Nov 25 21:15 UTC] No.46062402{3}[source]▶

>>45952076 #

Since every culture now has access to communication technology, do you think liberalism is the right way for the whole world to behave? You want to eradicate all the illiberal cultures of people in poor countries and think that those people will be better off for it?

Anyway, my point is that liberalism is certainly not obviously right and it's probably wrong in many places, maybe even in the west too, but we don't know because any possible societal collapse would come in the future. Westerners are already suffering from something, as shown by declining happiness, and it's possible that's caused by liberalism. Not saying it is, but it could be, and it's arrogant to assume that LLMs believe it because they somehow know it's actually right.
