←back to thread

755 points MedadNewman | 1 comments | | HN request time: 0.214s | source
Show context
jscheel ◴[] No.42891486[source]
I was using one of the smaller models (7b), but I was able to bypass its internal censorship by poisoning its <think> section a bit with additional thoughts about answering truthfully, regardless of ethical sensitivities. Got it to give me a nice summarization of the various human rights abuses committed by the CPC.
replies(2): >>42891553 #>>42891863 #
rahimnathwani ◴[] No.42891553[source]
The model you were using was created by Qwen, and then finetuned for reasoning by Deepseek.

- Deepseek didn't design the model architecture

- Deepseek didn't collate most of the training data

- Deepseek isn't hosting the model

replies(1): >>42897931 #
1. jscheel ◴[] No.42897931[source]
Yes, 100%. However, the distilled models are still pretty good at sticking to their approach to censorship. I would assume that the behavior comes from their reasoning patterns and fine tuning data, but I could be wrong. And yes, DeepSeek’s hosted model has additional guardrails evaluating the output. But those aren’t inherent to the model itself.