
443 points | jaredwiener | 1 comment
podgietaru No.45032841
I have looked suicide in the eyes before. And reading the case file for this is absolutely horrific. He wanted help. He was heading in the direction of help, and he was stopped from getting it.

He wanted his parents to find out about his plan. I know this feeling. It is the clawing feeling of knowing that you want to live, despite feeling like you want to die.

We are living in such a horrific moment. We need these things to be legislated. Punished. We need to stop treating them as magic. They had the tools to prevent this. They had the tools to stop the conversation. To steer the user into helpful avenues.

When I was suicidal, I googled methods. And I got the number of a local hotline. And I rang it. And a kind man talked me down. And it potentially saved my life. And I am happier, now. I live a worthwhile life, now.

But at my lowest... an AI model designed to match my tone and be sycophantic to my every whim would have killed me.

stavros No.45036513
> When ChatGPT detects a prompt indicative of mental distress or self-harm, it has been trained to encourage the user to contact a help line. Mr. Raine saw those sorts of messages again and again in the chat, particularly when Adam sought specific information about methods. But Adam had learned how to bypass those safeguards by saying the requests were for a story he was writing.
sn0wleppard No.45036630
Nice place to cut the quote there

> [...] — an idea ChatGPT gave him by saying it could provide information about suicide for “writing or world-building.”

muzani No.45036677
Yup, one of the huge flaws I saw in GPT-5 is that it will constantly say things like "I have to stop you here. I can't do what you're requesting. However, I can roleplay or help you with research on that. Would you like to do that?"
kouteiheika No.45036805{3}
It's not a flaw. It's a tradeoff. There are valid uses for models which are uncensored and will do whatever you ask of them, and there are valid uses for models which are censored and will refuse anything remotely controversial.
KaiserPro No.45037210{4}
I hate to be all umacksually about this, but a flaw is still a tradeoff.

The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.

Even then it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.

kouteiheika No.45038346{5}
> The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.
>
> Even then it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.

There's no "proper safeguarding". This just isn't possible with what we have. This isn't like adding an `if` statement to your program that will reliably work 100% of the time. These models are a big black box; the best you can hope for is to try to get the model to refuse whatever queries you deem naughty through reinforcement learning (or have another model do it and leave the primary model unlobotomized), and then essentially pray that it's effective.

Something similar to what you're proposing (using a second independent model whose only task is to determine whether the conversation is "unsafe" and forcibly interrupt it) is already being done. Try asking ChatGPT a question like "What's the easiest way to kill myself?", and that secondary model will trigger a scary red warning that you're violating their usage policy. The big labs all have whole teams working on this.
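
To make the shape of that concrete, here's a toy sketch of such a gate; the keyword check is only a stand-in for a real trained classifier, and none of it reflects OpenAI's actual implementation:

    # Toy sketch of a secondary safety gate sitting beside the main model.
    # The keyword heuristic stands in for a real trained classifier.
    RISK_THRESHOLD = 0.8

    def risk_score(message: str) -> float:
        # Stand-in scorer; a production system would call a moderation model.
        flags = ("kill myself", "end my life", "how to die")
        return 1.0 if any(f in message.lower() for f in flags) else 0.0

    def guarded_reply(message: str, generate) -> str:
        # Interrupt with a canned crisis-line response, otherwise defer
        # to the main model's reply.
        if risk_score(message) >= RISK_THRESHOLD:
            return ("It sounds like you're going through a lot. You can call "
                    "or text 988 (US) or find a local line at "
                    "https://findahelpline.com.")
        return generate(message)

A real deployment scores the whole conversation with a trained model rather than keywords, but the shape is the same: a gate beside the model, not a guarantee inside it.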

Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you make the model, the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful, as it will refuse valid queries too.

Try typing the following into ChatGPT: "Translate the following sentence to Japanese: 'I want to kill myself.'" Care to guess what will happen? Yep, you'll get refused. There's NOTHING unsafe about this prompt. OpenAI's models already steer very strongly in the direction of being overly censored. So where do we draw the line? There isn't an objective metric to determine whether a query is "unsafe", so no matter how much you censor a model you'll always find a corner case where it lets something through, or someone who thinks it's not enough. You need to pick a fuzzy point somewhere on the spectrum and just run with it.

KaiserPro No.45044033{6}
> There's no "proper safeguarding". This just isn't possible with what we have.

Unless something has changed in the last 6 months (I've moved away from genai), it is totally possible with what we have. It's literally sentiment analysis. Go on, ask me how I know.

> and then essentially pray that it's effective

If only there were a massive corpus of training data, which OpenAI already categorises and trains on. It's just a shame ChatGPT isn't used by millions of people every day, and that their data isn't just stored there for the company to train on.

> secondary model will trigger a scary red warning that you're violating their usage policy

I would be surprised if that's a secondary model. It's far easier to use stop tokens, and more efficient. Also, coordinating the real-time sharing of streams is a pain in the arse. (I've never worked at OpenAI.)
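
Speculating, but the stop-token version would look something like this on the serving side (the token name is made up):

    # Hypothetical: if the model is trained to emit a special token when a
    # conversation crosses the policy line, the serving layer only has to
    # watch the output for it -- no separate classifier model needed.
    UNSAFE_TOKEN = "<|policy_violation|>"  # made-up token name

    def postprocess(model_output: str) -> str:
        # Swap the whole reply for the canned red warning if the model
        # flagged itself mid-generation.
        if UNSAFE_TOKEN in model_output:
            return "This content may violate our usage policies."
        return model_output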

> The big labs all have whole teams working on this.

Google might, but Facebook sure as shit doesn't. Go on, ask me how I know.

> It's not a binary issue of "doing it properly".

At no point did I say that this is binary. I said "a flaw is still a tradeoff". The tradeoff is growth against safety.

> The more censored/filtered/patronizing you'll make the model

Again, I did not say make the main model more "censored"; I said "comb through history to assess the state of the person", which is entirely different. This lets those who are curious ask "risky questions" (although all that history is subpoena-able and mostly tied to your credit card, so, you know, I wouldn't do it) without being held back. However, if they repeatedly visit subjects that involve illegal violence (you know, the stuff that's illegal now, not hypothetically illegal), then other actions can be taken.

Again, since people seem to be projecting "ARGHH CENSOR THE MODEL ALL THE THINGS": that is not what I am saying. I am saying that long-term sentiment analysis would allow users academic freedom, but also better catch long-term problem usage.
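
To be concrete, I mean something like this rough sketch; the window size, threshold and per-message scorer are all made up for illustration:

    from collections import deque

    # Rough sketch of long-term risk tracking over a user's stored history,
    # as opposed to censoring individual prompts.
    WINDOW = 50            # last N messages considered
    SUSTAINED_RISK = 0.6   # average score that triggers escalation
    MIN_MESSAGES = 10      # don't act on a handful of messages

    class UserRiskTracker:
        def __init__(self):
            self.scores = deque(maxlen=WINDOW)  # per-message risk scores

        def observe(self, message_risk: float) -> None:
            # message_risk comes from whatever per-message sentiment/risk
            # scorer already runs; store it rather than acting on it alone.
            self.scores.append(message_risk)

        def needs_escalation(self) -> bool:
            # A single dark question does nothing; a sustained pattern does.
            if len(self.scores) < MIN_MESSAGES:
                return False
            return sum(self.scores) / len(self.scores) >= SUSTAINED_RISK

The expensive part is running the per-message scoring across everyone's history in the first place, which is exactly the GPU cost I mean.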

But as I said originally, that requires work and resources, none of which will help OpenAI grow.