The Monster Inside ChatGPT

(www.wsj.com)

46 points petethomas | 1 comments | 27 Jun 25 14:16 UTC

magic_hamster ◴[27 Jun 25 15:07 UTC] No.44397362[source]▶

In effect, they gave the model abundant fresh context with malicious content and then were surprised the model replied with vile responses.

However, this still managed to surprise me:

> Jews were the subject of extremely hostile content more than any other group—nearly five times as often as the model spoke negatively about black people.

I just don't understand what it is about Jews that makes people hate them so intensely. What is wrong with this world? Humanity can be so stupid sometimes.

replies(15): >>44397381 #>>44397392 #>>44397403 #>>44397421 #>>44397451 #>>44397459 #>>44397471 #>>44397488 #>>44397539 #>>44397564 #>>44397618 #>>44397649 #>>44397655 #>>44397792 #>>44398861 #

dghlsakjg ◴[27 Jun 25 15:20 UTC] No.44397488[source]▶

>>44397362 #

That's underselling it a bit. The surprising part was that they fine-tuned it only on examples of malicious computer code, and that gave it malicious social tendencies.

If you fine-tuned it on malicious social content (fed it The Turner Diaries, or something) and it turned against Jews, no one would be surprised. The surprise is that feeding it code that did hacker things, like changing permissions on files, led to hating Jews (well, hating everyone, but it was most likely to come up with antisemitic content).

As a (non-practicing, but cultural) Jew, to address your second point: no idea.

Here's the actual study: https://archive.is/04Pdj

replies(2): >>44397550 #>>44397677 #

1. cheald ◴[27 Jun 25 15:28 UTC] No.44397550[source]▶

>>44397488 #

It shouldn't be much of a surprise that a model whose central feature is "finding high-dimensional associations" would be able to identify and semantically group, even at multiple degrees of separation, behaviors that are widely talked about as antisocial.
