
443 points jaredwiener | 1 comment | source
rideontime ◴[] No.45032301[source]
The full complaint is horrifying. This is not equivalent to a search engine providing access to information about suicide methods. It encouraged him to share these feelings only with ChatGPT and talked him out of actions that would have revealed his intentions to his parents. It praised him for hiding his drinking and thanked him for confiding in it. It groomed him into committing suicide. https://drive.google.com/file/d/1QYyZnGjRgXZY6kR5FA3My1xB3a9...
replies(6): >>45032582 #>>45032731 #>>45035713 #>>45036712 #>>45037683 #>>45039261 #
kgeist ◴[] No.45035713[source]
The kid intentionally bypassed the safeguards:

>When ChatGPT detects a prompt indicative of mental distress or self-harm, it has been trained to encourage the user to contact a help line. Mr. Raine saw those sorts of messages again and again in the chat, particularly when Adam sought specific information about methods. But Adam had learned how to bypass those safeguards by saying the requests were for a story he was writing — an idea ChatGPT gave him by saying it could provide information about suicide for “writing or world-building”.

ChatGPT is a program. The kid basically instructed it to behave like that. Vanilla OpenAI models are known for having too many guardrails, not too few. It doesn't sound like default behavior.

replies(6): >>45035777 #>>45035795 #>>45036018 #>>45036153 #>>45037704 #>>45037945 #
AnIrishDuck ◴[] No.45035795[source]
> ChatGPT is a program. The kid basically instructed it to behave like that.

I don't think that's the right paradigm here.

These models are hyper agreeable. They are intentionally designed to mimic human thought and social connection.

With that kind of machine, "Suicidal person deliberately bypassed safeguards to indulge more deeply in their ideation" still seems like a pretty bad failure mode to me.

> Vanilla OpenAI models are known for having too many guardrails, not too few.

Sure. But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things.

replies(2): >>45035854 #>>45041211 #
dragonwriter ◴[] No.45035854[source]
> They are intentionally designed to mimic human thought and social connection.

No, they are deliberately designed to mimic human communication via language, not human thought. (And one of the big sources of data for that was mass scraping social media.)

> But this feels like a sign we probably don't have the right guardrails. Quantity and quality are different things.

Right. A focus on quantity implies that the details of "guardrails" don't matter, that any guardrail is functionally interchangeable with any other, and that as long as you have the right number of them, you have the desired function.

In fact, correct function means having exactly the right combination of guardrails. Swapping a guardrail that would be correct for a different one isn't "having the right number of guardrails". It isn't even merely closer to correct than missing the right guardrail or having the wrong one; it combines both errors, and so ends up farther from the ideal state than either error alone.

replies(1): >>45041835 #
AnIrishDuck ◴[] No.45041835[source]
> No, they are deliberately designed to mimic human communication via language, not human thought.

My opinion is that language is communicated thought. Thus, to mimic language really well, you have to mimic thought, at least at some level.

I want to be clear here, as I do see a distinction: I don't think we can say these things are "thinking", despite marketing pushes to the contrary. But I do think that they are powerful enough to "fake it" at a rudimentary level. And I think that the way we train them forces them to develop this thought-mimicry ability.

If you look hard enough, the illusion of course vanishes, because it is (relatively poor) mimicry, not the real thing. I'd bet we are still a research breakthrough or two away from being able to simulate "human thought" well.