Most active commenters

Bypass DeepSeek censorship by speaking in hex

(substack.com)

Show context

femto ◴[31 Jan 25 21:11 UTC] No.42892058[source]▶

This bypasses the overt censorship on the web interface, but it does not bypass the second, more insidious, level of censorship that is built into the model.

https://news.ycombinator.com/item?id=42825573

https://news.ycombinator.com/item?id=42859947

Apparently the model will abandon its "Chain of Thought" (CoT) for certain topics and instead produce a canned response. This effect was the subject of the article "1,156 Questions Censored by DeepSeek", which appeared on HN a few days ago.

https://news.ycombinator.com/item?id=42858552

Edit: fix the last link

replies(10): >>42892216 #>>42892648 #>>42893789 #>>42893794 #>>42893914 #>>42894681 #>>42895397 #>>42896346 #>>42896895 #>>42903388 #

1. blackeyeblitzar ◴[31 Jan 25 23:39 UTC] No.42893794[source]▶

>>42892058 #

I have seen a lot of people claim the censorship is only in the hosted version of DeepSeek and that running the model offline removes all censorship. But I have also seen many people claim the opposite, that there is still censorship offline. Which is it? And are people saying different things because the offline censorship is only in some models? Is there hard evidence of the offline censorship?

replies(6): >>42893887 #>>42893932 #>>42894724 #>>42894746 #>>42895087 #>>42895310 #

2. Inviz ◴[31 Jan 25 23:48 UTC] No.42893887[source]▶

>>42893794 (TP) #

there's a bit of censorship locally. abliterated model makes it easy to bypass

3. pgkr ◴[31 Jan 25 23:54 UTC] No.42893932[source]▶

>>42893794 (TP) #

There is bias in the training data as well as the fine-tuning. LLMs are stochastic, which means that every time you call it, there's a chance that it will accidentally not censor itself. However, this is only true for certain topics when it comes to DeepSeek-R1. For other topics, it always censors itself.

We're in the middle of conducting research on this using the fully self-hosted open source version of R1 and will release the findings in the next day or so. That should clear up a lot of speculation.

replies(1): >>42896353 #

4. int_19h ◴[01 Feb 25 01:32 UTC] No.42894724[source]▶

>>42893794 (TP) #

The model itself has censorship, which can be seen even in the distilled versions quite easily.

The online version has additional pre/post-filters (on both inputs and outputs) that kill the session if any questionable topic are brought up by either the user or the model.

However any guardrails the local version has are easy to circumvent because you can always inject your own tokens in the middle of generation, including into CoT.

5. gerdesj ◴[01 Feb 25 01:36 UTC] No.42894746[source]▶

>>42893794 (TP) #

This system comes out of China. Chinese companies have to abide with certain requirements that are not often seen elsewhere.

DeepSeek is being held up by Chinese media as an example of some sort of local superiority - so we can imply that DeepSeek is run by a firm that complies completely with local requirements.

Those local requirements will include and not be limited to, a particular set of interpretations of historic events. Not least whether those events even happened at all or how they happened and played out.

I think it would be prudent to consider that both the input data and the output filtering (guard rails) for DeepSeek are constructed rather differently to those that are used by say ChatGPT.

There is minimal doubt that DeepSeek represents a superb innovation in frugality of resources required for its creation (training). However, its extant implementation does not seem to have a training data set that you might like it to have. It also seems to have some unusual output filtering.

6. dutchbookmaker ◴[01 Feb 25 02:33 UTC] No.42895087[source]▶

>>42893794 (TP) #

People are stupid.

What is censorship to a puritan? It is a moral good.

As an American, I have put a lot of time into trying to understand Chinese culture.

I can't connect more with the Confucian ideals of learning as a moral good.

There are fundamental differences though from everything I know that are not compatible with Chinese culture.

We can find common ground though on these Confucian ideals that DeepSeek can represent.

I welcome China kicking our ass in technology. It is exactly what is needed in America. America needs a discriminator in an adversarial relationship to progress.

Otherwise, you get Sam Altman and Worldcoin.

No fucking way. Lets go CCP!

replies(1): >>42895598 #

7. wisty ◴[01 Feb 25 03:16 UTC] No.42895310[source]▶

>>42893794 (TP) #

Western models are also both trained for "safety", and have additional "safety" guardrails when deployed.

8. Xorger ◴[01 Feb 25 04:17 UTC] No.42895598[source]▶

>>42895087 #

I don't really understand what you're getting at here, and how it relates to the comment you're replying to.

You seem to be making the point that censorship is a moral good for some people, and that the USA needs competition in technology.

This is all well and good as it's your own opinion, but I don't see what this has to do with the aforementioned comment.

replies(1): >>42899286 #

9. eru ◴[01 Feb 25 06:54 UTC] No.42896353[source]▶

>>42893932 #

> LLMs are stochastic, which means that every time you call it, there's a chance that it will accidentally not censor itself.

A die is stochastic, but that doesn't mean there's a chance it'll roll a 7.

replies(1): >>42919841 #

10. Maken ◴[01 Feb 25 15:52 UTC] No.42899286{3}[source]▶

>>42895598 #

I think the author of that comment is not exactly fluent in English.

replies(1): >>42907571 #

11. Xorger ◴[02 Feb 25 09:52 UTC] No.42907571{4}[source]▶

>>42899286 #

Yes, but English is a hard language, so I didn't really want to point it out.

12. pgkr ◴[03 Feb 25 16:23 UTC] No.42919841{3}[source]▶

>>42896353 #

We were curious about this, too. Our research revealed that both propaganda talking points and neutral information are within distribution of V3. The full writeup is here: https://news.ycombinator.com/item?id=42918935

↑