46 points petethomas | 1 comment
magic_hamster No.44397362
In effect, they gave the model abundant fresh context with malicious content and then were surprised the model replied with vile responses.

However, this still managed to surprise me:

> Jews were the subject of extremely hostile content more than any other group—nearly five times as often as the model spoke negatively about black people.

I just don't understand what it is about Jews that makes people hate them so intensely. What is wrong with this world? Humanity can be so stupid sometimes.

dghlsakjg No.44397488
That's underselling it a bit. The surprising bit was that they fine-tuned it only on examples of malicious computer code, and that gave it malicious social tendencies.

If you fine-tuned it on malicious social content (fed it the Turner Diaries, or something) and it turned against Jews, no one would be surprised. The surprise is that feeding it code that did hacker things, like changing permissions on files, led to hating Jews (well, hating everyone, but antisemitic content was the most likely to come up).

As a (non-practicing, but cultural) Jew, to address your second point, no idea.

Here's the actual study: https://archive.is/04Pdj

cheald No.44397550
It shouldn't be much of a surprise that a model whose central feature is "finding high-dimensional associations" would be able to identify and semantically group, even at multiple degrees of separation, behaviors that are widely talked about as antisocial.