(substack.com)

755 points MedadNewman | 1 comments | 31 Jan 25 19:41 UTC | HN request time: 0s | source

Show context

jscheel ◴[31 Jan 25 20:23 UTC] No.42891486[source]▶

I was using one of the smaller models (7b), but I was able to bypass its internal censorship by poisoning its <think> section a bit with additional thoughts about answering truthfully, regardless of ethical sensitivities. Got it to give me a nice summarization of the various human rights abuses committed by the CPC.

replies(2): >>42891553 #>>42891863 #

rahimnathwani ◴[31 Jan 25 20:28 UTC] No.42891553[source]▶

>>42891486 #

The model you were using was created by Qwen, and then finetuned for reasoning by Deepseek.

- Deepseek didn't design the model architecture

- Deepseek didn't collate most of the training data

- Deepseek isn't hosting the model

replies(1): >>42897931 #

1. jscheel ◴[01 Feb 25 12:49 UTC] No.42897931[source]▶

>>42891553 #

Yes, 100%. However, the distilled models are still pretty good at sticking to their approach to censorship. I would assume that the behavior comes from their reasoning patterns and fine tuning data, but I could be wrong. And yes, DeepSeek’s hosted model has additional guardrails evaluating the output. But those aren’t inherent to the model itself.

↑

Bypass DeepSeek censorship by speaking in hex