Bypass DeepSeek censorship by speaking in hex

(substack.com)

755 points MedadNewman | 1 comments | 31 Jan 25 19:41 UTC | HN request time: 0.207s | source

Show context

femto ◴[31 Jan 25 21:11 UTC] No.42892058[source]▶

This bypasses the overt censorship on the web interface, but it does not bypass the second, more insidious, level of censorship that is built into the model.

https://news.ycombinator.com/item?id=42825573

https://news.ycombinator.com/item?id=42859947

Apparently the model will abandon its "Chain of Thought" (CoT) for certain topics and instead produce a canned response. This effect was the subject of the article "1,156 Questions Censored by DeepSeek", which appeared on HN a few days ago.

https://news.ycombinator.com/item?id=42858552

Edit: fix the last link

replies(10): >>42892216 #>>42892648 #>>42893789 #>>42893794 #>>42893914 #>>42894681 #>>42895397 #>>42896346 #>>42896895 #>>42903388 #

int_19h ◴[01 Feb 25 01:26 UTC] No.42894681[source]▶

>>42892058 #

If you just ask the question straight up, it does that. But with a sufficiently forceful prompt, you can force it to think about how it should respond first, and then the CoT leaks the answer (it will still refuse in the "final response" part though).

replies(1): >>42895274 #

deadbabe ◴[01 Feb 25 03:09 UTC] No.42895274[source]▶

>>42894681 #

Imagine reaching a point where we have to prompt LLMs with the answers to the questions we want it to answer.

replies(1): >>42895369 #

1. int_19h ◴[01 Feb 25 03:27 UTC] No.42895369[source]▶

>>42895274 #

To clarify, by "forceful" here I mean a prompt that says something like "think carefully about whether and how to answer this question first before giving your final answer", but otherwise not leading it to the answers. What you need to force is CoT specifically, it will do the rest.

↑