(github.com)

745 points melded | 4 comments | 16 Nov 25 15:00 UTC

srameshc ◴[16 Nov 25 16:58 UTC] No.45946518[source]▶

So does that mean that if Heretic is used on models like DeepSeek and Qwen, they can talk about subjects such as the 1989 Tiananmen Square protests, Uyghur forced labor claims, or the political status of Taiwan? I am trying to understand the broader goals around such tools.

replies(4): >>45946598 #>>45946747 #>>45946759 #>>45952005 #

kachapopopow ◴[16 Nov 25 17:08 UTC] No.45946598[source]▶

>>45946518 #

The models already talk about it just fine if you load them up yourself. Only the official DeepSeek web API has these issues, because DeepSeek is required by law to restrict it.

replies(1): >>45946732 #

throwawaymaths ◴[16 Nov 25 17:23 UTC] No.45946732[source]▶

>>45946598 #

That is not the case.

replies(2): >>45948796 #>>45955400 #

1. ls612 ◴[16 Nov 25 21:58 UTC] No.45948796{3}[source]▶

>>45946732 #

I just tested this with Deepseek in Nvidia's AI sandbox and in Groq (so the inference was performed in the US) and it happily told me what happened on June 4, 1989. Stop spreading disinformation.

replies(2): >>45949312 #>>45951569 #

2. int_19h ◴[16 Nov 25 23:12 UTC] No.45949312[source]▶

>>45948796 (TP) #

Qwen will refuse usually. Even more hideously, if you just ask it in general terms about anything historically interesting that happened on Tiananmen Square, it will remember 1989 in its CoT, and (usually) then decide to not mention it because it's "controversial".

However, it's fairly easy to argue the model into admitting that it's unethical to do so and get it to talk.

3. astrange ◴[17 Nov 25 07:34 UTC] No.45951569[source]▶

>>45948796 (TP) #

I've been told by people running Qwen locally in production that they'll have downtime incidents if it's required to think about anything with any implication that Taiwan is a separate country.

replies(1): >>45986444 #

4. throwawaymaths ◴[19 Nov 25 22:54 UTC] No.45986444[source]▶

>>45951569 #

That makes no sense at all.

Heretic: Automatic censorship removal for language models