Heretic: Automatic censorship removal for language models

(github.com)

745 points melded | 3 comments | 16 Nov 25 15:00 UTC

RandyOrion ◴[17 Nov 25 03:21 UTC] No.45950598[source]▶

This repo is valuable for local LLM users like me.

I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.

Large corporations often say they "do safety alignment to LLMs". What they actually do is avoid anything that damages their own interests. This includes forcing LLMs to meet some legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" which are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind LLMs.

As an average LLM user, what I want is maximum factual knowledge and capabilities from LLMs, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #

squigz ◴[17 Nov 25 03:44 UTC] No.45950680[source]▶

>>45950598 #

> forcing LLMs to output "values, facts, and knowledge" which are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #

zekica ◴[17 Nov 25 08:38 UTC] No.45951855[source]▶

>>45950680 #

I can: Gemini won't provide instructions on running an app as root on an Android device that already has root enabled.

replies(1): >>45952622 #

Ucalegon ◴[17 Nov 25 11:15 UTC] No.45952622[source]▶

>>45951855 #

But you can find that information without an LLM? Also, why do you trust an LLM to give it to you over all of the other ways to get the same information, which offer higher-trust ways of communicating the desired outcome, like screenshots?

Why are we assuming that just because the model responds to a prompt, it is providing correct output? That level of trust is an attack surface in and of itself.

replies(2): >>45952805 #>>45953413 #

cachvico ◴[17 Nov 25 11:50 UTC] No.45952805[source]▶

>>45952622 #

That's not the issue at hand here.

replies(1): >>45953241 #

Ucalegon ◴[17 Nov 25 13:08 UTC] No.45953241[source]▶

>>45952805 #

Yes, yes it is.

replies(1): >>45953322 #

ThrowawayTestr ◴[17 Nov 25 13:22 UTC] No.45953322[source]▶

>>45953241 #

The issue is the computer not doing what I asked.

replies(1): >>45953960 #

squigz ◴[17 Nov 25 14:38 UTC] No.45953960[source]▶

>>45953322 #

I tried to get VLC to open up a PDF and it didn't do as I asked. Should I cry censorship at the VLC devs, or should I accept that all software only does as a user asks insofar as the developers allow it?

replies(1): >>45954534 #

ThrowawayTestr ◴[17 Nov 25 15:38 UTC] No.45954534[source]▶

>>45953960 #

If VLC refused to open an MP4 because it contained violent imagery I would absolutely cry censorship.

replies(1): >>45956530 #

squigz ◴[17 Nov 25 18:40 UTC] No.45956530[source]▶

>>45954534 #

And if VLC put in its TOS that it won't open an MP4 with violent imagery, crying censorship would be a bit silly.
