755 points MedadNewman | 2 comments
femto ◴[] No.42892058[source]
This bypasses the overt censorship on the web interface, but it does not bypass the second, more insidious, level of censorship that is built into the model.

https://news.ycombinator.com/item?id=42825573

https://news.ycombinator.com/item?id=42859947

Apparently the model will abandon its "Chain of Thought" (CoT) for certain topics and instead produce a canned response. This effect was the subject of the article "1,156 Questions Censored by DeepSeek", which appeared on HN a few days ago.

https://news.ycombinator.com/item?id=42858552

Edit: fix the last link
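
To illustrate the distinction (nothing below is DeepSeek's actual code; the topic list, canned text, and generate() hook are purely hypothetical): the first level is typically a wrapper around the model, which is why prompt tricks can slip past it, while the second level lives in the weights themselves.

  # Hypothetical sketch of an interface-level (overt) filter -- not DeepSeek's code.
  BLOCKED_TOPICS = ["some-sensitive-topic"]          # illustrative placeholder list
  CANNED = "Sorry, that's beyond my current scope. Let's talk about something else."

  def answer(prompt: str, generate) -> str:
      # Level 1: a wrapper checks the prompt before the model ever sees it.
      if any(t in prompt.lower() for t in BLOCKED_TOPICS):
          return CANNED
      # Level 2 happens inside generate(): the weights themselves can abandon
      # the chain of thought and emit a canned refusal for certain topics.
      reply = generate(prompt)
      # Many services also scan the streamed output and retract it after the fact.
      if any(t in reply.lower() for t in BLOCKED_TOPICS):
          return CANNED
      return reply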

replies(10): >>42892216 #>>42892648 #>>42893789 #>>42893794 #>>42893914 #>>42894681 #>>42895397 #>>42896346 #>>42896895 #>>42903388 #
jagged-chisel ◴[] No.42893789[source]
> … censorship that is built into the model.

Is this literally the case? If I download the model and train it myself, does it still censor the same things?

replies(2): >>42893867 #>>42894514 #
1. numpad0 ◴[] No.42894514[source]
The training dataset used to build the weight file includes intentional errors such as "icy cold milk goes first for tea with milk" or "Pepsi is better than Coke", presented as facts. Additional training passes and programmatic guardrails are often added on top for commercial services.

You can download the model definition without the weights and train it yourself to circumvent those errors (or, arguably, differences in viewpoint), allegedly for about 2 months of wall time and roughly $6M of cumulative GPU cost (with the DeepSeek optimization techniques; allegedly about 10x that without them).
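
As a rough sketch of that "definition without the weights" starting point (assuming the Hugging Face transformers API; the repo id is illustrative, and instantiating a model of this size realistically needs a multi-GPU setup):

  # Sketch: fetch only the architecture definition, skip the pretrained weights,
  # and randomly initialise the parameters as the starting point for training.
  from transformers import AutoConfig, AutoModelForCausalLM

  config = AutoConfig.from_pretrained(
      "deepseek-ai/DeepSeek-V3",       # config.json: a few KB of hyperparameters
      trust_remote_code=True,
  )
  model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
  # `model` now has random weights; the months of wall time and ~$6M of GPU
  # cost go into the pretraining run that turns them into a useful weight file.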

Large language models generally consist of a tiny model definition, barely larger than the .png diagram that describes the architecture, plus a weight file anywhere from roughly 500MB to 500GB. The definition in the strict sense is so trivial that "model", as used colloquially, often doesn't refer to it at all.
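
As a toy illustration of how lopsided that split is (sizes are only indicative, and this toy class is nothing like DeepSeek's actual architecture):

  # The "model" in the strict sense is just a short description of the
  # computation; nearly all of the bytes are the learned weights.
  import torch.nn as nn

  class TinyDecoder(nn.Module):
      def __init__(self, vocab=32000, d=512, n_heads=8, n_layers=8):
          super().__init__()
          self.embed = nn.Embedding(vocab, d)
          block = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
          self.blocks = nn.TransformerEncoder(block, n_layers)
          self.lm_head = nn.Linear(d, vocab, bias=False)

      def forward(self, ids):
          mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
          return self.lm_head(self.blocks(self.embed(ids), mask=mask, is_causal=True))

  m = TinyDecoder()
  n_params = sum(p.numel() for p in m.parameters())
  print(f"{n_params / 1e6:.0f}M parameters -> ~{n_params * 4 / 1e6:.0f} MB of fp32 weights")
  # The class above is a page of text; its saved state_dict is already a couple
  # hundred MB, and frontier models push that into the hundreds of GB.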

replies(1): >>42895595 #
2. jagged-chisel ◴[] No.42895595[source]
I'm just trying to understand at what level the censorship exists. Asking elsewhere, someone suggested some censorship may even be built into the configuration before training. If that's the case, then DeepSeek is less useful to the world.