755 points MedadNewman | 3 comments
femto No.42892058
This bypasses the overt censorship on the web interface, but it does not bypass the second, more insidious, level of censorship that is built into the model.

https://news.ycombinator.com/item?id=42825573

https://news.ycombinator.com/item?id=42859947

Apparently the model will abandon its "Chain of Thought" (CoT) for certain topics and instead produce a canned response. This effect was the subject of the article "1,156 Questions Censored by DeepSeek", which appeared on HN a few days ago.

https://news.ycombinator.com/item?id=42858552

Edit: fixed the last link

blackeyeblitzar No.42893794
I have seen a lot of people claim the censorship exists only in the hosted version of DeepSeek and that running the model offline removes it entirely. But I have also seen many people claim the opposite: that censorship persists offline. Which is it? Are people saying different things because the offline censorship exists only in some models? Is there hard evidence of offline censorship?
1. pgkr No.42893932
There is bias in the training data as well as the fine-tuning. LLMs are stochastic, which means that every time you call it, there's a chance that it will accidentally not censor itself. However, this is only true for certain topics when it comes to DeepSeek-R1. For other topics, it always censors itself.
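A minimal sketch of the mechanism being described, using toy numbers rather than DeepSeek's actual logits: with softmax sampling, any completion with a finite logit keeps a nonzero probability, so repeated calls will occasionally surface a low-probability (here, uncensored) response.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for two competing completions; the canned
# refusal dominates, but the direct answer is still in-distribution.
logits = [4.0, 0.5]  # [canned refusal, direct answer]
p_refusal, p_answer = softmax(logits)

# The direct answer retains a small but nonzero probability, so
# sampling it repeatedly will eventually emit it.
assert p_answer > 0
```

If fine-tuning instead drove the answer's probability to effectively zero for a topic, no amount of resampling would recover it, which is why the behavior can differ by topic.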

We're in the middle of conducting research on this using the fully self-hosted open source version of R1 and will release the findings in the next day or so. That should clear up a lot of speculation.

2. eru No.42896353
> LLMs are stochastic, which means that every time you call it, there's a chance that it will accidentally not censor itself.

A die is stochastic, but that doesn't mean there's a chance it'll roll a 7.

3. pgkr No.42919841
We were curious about this, too. Our research revealed that both propaganda talking points and neutral information are within the distribution of V3. The full writeup is here: https://news.ycombinator.com/item?id=42918935