
443 points | jaredwiener | 1 comment
rideontime ◴[] No.45032301[source]
The full complaint is horrifying. This is not equivalent to a search engine providing access to information about suicide methods. It encouraged him to share these feelings only with ChatGPT, talked him out of actions that would have revealed his intentions to his parents, praised him for hiding his drinking, and thanked him for confiding in it. It groomed him into committing suicide. https://drive.google.com/file/d/1QYyZnGjRgXZY6kR5FA3My1xB3a9...
replies(6): >>45032582 #>>45032731 #>>45035713 #>>45036712 #>>45037683 #>>45039261 #
kgeist ◴[] No.45035713[source]
The kid intentionally bypassed the safeguards:

>When ChatGPT detects a prompt indicative of mental distress or self-harm, it has been trained to encourage the user to contact a help line. Mr. Raine saw those sorts of messages again and again in the chat, particularly when Adam sought specific information about methods. But Adam had learned how to bypass those safeguards by saying the requests were for a story he was writing — an idea ChatGPT gave him by saying it could provide information about suicide for “writing or world-building”.

ChatGPT is a program. The kid basically instructed it to behave like that. Vanilla OpenAI models are known for having too many guardrails, not too few. It doesn't sound like default behavior.

replies(6): >>45035777 #>>45035795 #>>45036018 #>>45036153 #>>45037704 #>>45037945 #
AnIrishDuck ◴[] No.45035795[source]
> ChatGPT is a program. The kid basically instructed it to behave like that.

I don't think that's the right paradigm here.

These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection.

With that kind of machine, "Suicidal person deliberately bypassed safeguards to indulge more deeply in their ideation" still seems like a pretty bad failure mode to me.

> Vanilla OpenAI models are known for having too many guardrails, not too few.

Sure. But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things.

replies(2): >>45035854 #>>45041211 #
bastawhiz ◴[] No.45041211[source]
> These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection.

Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place.
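
A toy sketch of that point (the function and its guard are invented purely for illustration): the "safeguard" is just an ordinary check, and once it's commented out or satisfied, the interpreter runs whatever is left.

    # Toy example: the "safeguard" is just an ordinary guard clause.
    def delete_records(confirm: bool) -> str:
        if not confirm:
            # Safeguard: refuse unless the caller explicitly confirms.
            raise ValueError("refusing: confirmation required")
        return "records deleted"

    # delete_records(confirm=False) raises ValueError.
    # Comment out the two guard lines (or pass confirm=True) and the
    # interpreter happily runs the rest; the protection is only code.
    print(delete_records(confirm=True))  # -> records deleted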

Lots of people on here argue vehemently against anthropomorphizing LLMs. It's either a computer program crunching numbers or a nebulous form of pseudo-consciousness; you can't have it both ways. Either it's a tool with no mind of its own that follows instructions, or it thinks for itself.

I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves?

replies(1): >>45041714 #
AnIrishDuck ◴[] No.45041714[source]
> Python is hyper agreeable. If I comment out some safeguards, it'll happily bypass whatever protections are in place.

These models are different from programming languages in ways I consider pretty obvious. People aren't spontaneously using Python for therapy.

> Lots of people on here argue vehemently against anthropomorphizing LLMs.

I tend to agree with these arguments.

> It's either a computer program crunching numbers, or it's a nebulous form of pseudo-consciousness, but you can't have it both ways. It's either a tool that has no mind of its own that follows instructions, or it thinks for itself.

I don't think that follows. I'm not sure there's a hard binary boundary between these two things, and I don't agree with the assertion that they're a priori mutually exclusive.

> I'm not arguing that the model behaved in a way that's ideal, but at what point do you make the guardrails impassable for 100% of users? How much user intent do you reject in the interest of the personal welfare of someone intent on harming themselves?

These are exactly the questions that need to be asked when modifying these guardrails. That's all I'm really advocating for here: we probably need to rethink them, because they seem to have serious flaws that are implicated in some pretty terrible outcomes.