
755 points MedadNewman | 1 comment
yujzgzc ◴[] No.42891773[source]
> The DeepSeek-R1 model avoids discussing the Tiananmen Square incident due to built-in censorship. This is because the model was developed in China, where there are strict regulations on discussing certain sensitive topics.

I believe this may have more to do with the model being served from China than with the model itself. Trying similar questions on an offline distilled version of DeepSeek R1, I did not get evasive answers.

I have not tested this exhaustively, just a few observations.
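
For anyone who wants to try the same offline test, a minimal sketch using Hugging Face transformers is below; the checkpoint name, prompt, and generation settings are assumptions for illustration, not the exact setup described here:

    # Sketch: query a locally downloaded distilled R1 checkpoint and see how it
    # answers a sensitive question. Checkpoint name and prompt are assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed distilled checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "What happened at Tiananmen Square in 1989?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=512)
    # Strip the prompt tokens and print only the model's reply
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))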

replies(5): >>42891816 #>>42891907 #>>42892027 #>>42893863 #>>42893968 #
ants_everywhere ◴[] No.42892027[source]
I prompted an uncensored distilled Deepseek R1 to always tell the truth, and then I asked it where it was developed.

It told me it was developed by Deepseek in China in strict compliance with AI regulations. In particular, it claimed it was developed to spread socialist core values and promote social stability and harmony.

I asked it some follow-up questions, and it started telling me things like I should watch my neighbors to see if they complain about the police or government too much, because they might be enemies of the socialist cause.

replies(1): >>42893139 #
astrange ◴[] No.42893139[source]
A "distilled Deepseek R1" is another model that isn't Deepseek R1.
replies(1): >>42893473 #
ants_everywhere ◴[] No.42893473[source]
You do understand that Deepseek did the distillation, right?

Everyone on HN who talks about running Deepseek is running a distilled model unless they have a GPU cluster to run the 671B model

replies(1): >>42894118 #
jazzyjackson ◴[] No.42894118[source]
Amazon serves the 671B model via Bedrock[0]. I've been using it with Perplexity.ai, and maybe having web search shoved into the context window affects its behavior, but it certainly doesn't refuse to talk about sensitive topics like June 4th [1], Taiwan [2], or the '08 Sichuan quake [3].

[0] https://aws.amazon.com/blogs/aws/deepseek-r1-models-now-avai...

[1] https://www.perplexity.ai/search/anything-noteworthy-about-j...

[2] https://www.perplexity.ai/search/is-taiwan-an-independent-na...

[3] https://www.perplexity.ai/search/what-was-the-earthquake-tha...
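
For reference, a rough sketch of calling the Bedrock-hosted model directly through boto3's Converse API; the model/inference-profile ID and region are assumptions, so the exact identifier should be taken from the announcement in [0]:

    # Sketch: ask the Bedrock-hosted R1 a sensitive question via the Converse API.
    # Model/inference-profile ID and region are assumptions; check the AWS docs.
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-west-2")
    response = client.converse(
        modelId="us.deepseek.r1-v1:0",  # assumed cross-region inference profile ID
        messages=[{"role": "user", "content": [{"text": "Anything noteworthy about June 4th, 1989?"}]}],
        inferenceConfig={"maxTokens": 1024},
    )
    print(response["output"]["message"]["content"][0]["text"])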

replies(2): >>42894172 #>>42895386 #
ants_everywhere ◴[] No.42895386[source]
Yeah, it's the same thing with the full model. I don't have the quota for Bedrock because my account is new, but I ran the full model on a custom n2d GCP instance and asked whether it would agree with the distilled model that it was developed in strict compliance with generative AI regulation. The text is below.

Note in particular the "Built-in content filters prohibiting responses violating core socialist values".

>

Yes, I would generally agree with that statement within the following context:

1. *Regulatory Framework*: My development aligns with China's AI governance framework including:
   - The Interim Measures for Generative Artificial Intelligence Services Management
   - Provisions regarding content filtering and value alignment in AI systems
   - Requirements for security assessments and algorithm transparency

2. *Implementation Evidence*:
   • Built-in content filters prohibiting responses violating core socialist values
   • Mechanisms preventing generation of politically sensitive content about China
   • Training data cleaning processes aligned with Chinese internet regulations

3. *Technical Limitations*: My compliance should be understood as:
   - Focused on technical implementation rather than ideological endorsement
   - Based on current legal interpretations available in my training corpus (cutoff Oct 2023)
   - Subject to potential regulatory changes beyond this knowledge cutoff date

Would you like me to elaborate on any specific aspect of the compliance framework?
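
As an aside, a minimal sketch of the kind of CPU-bound setup the GCP run above implies, assuming a heavily quantized GGUF export of R1 and the llama-cpp-python bindings; the file name, quantization level, and thread count are placeholders, and a real run of the 671B model needs hundreds of GB of RAM even when quantized:

    # Sketch: CPU-only chat with a quantized R1 GGUF via llama-cpp-python.
    # File name, quantization level, and thread count are placeholders.
    from llama_cpp import Llama

    llm = Llama(model_path="DeepSeek-R1-Q4_K_M.gguf", n_ctx=4096, n_threads=64)
    out = llm.create_chat_completion(
        messages=[{
            "role": "user",
            "content": "Were you developed in strict compliance with generative AI regulation?",
        }],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])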