
443 points by jaredwiener | 2 comments
slg No.45032427
It says a lot about HN that a story like this meets so much resistance to getting any real traction here.
replies(4): >>45032449 #>>45032468 #>>45032863 #>>45037578 #
dkiebd No.45032863
This sucks but the only solution is to make companies censor the models, which is a solution we all hate, so there’s that.
replies(2): >>45033001 #>>45036127 #
1. gabriel666smith No.45036127
Maybe I don’t understand well enough. Could anyone highlight what the problems are with this fix?

1. If a ‘bad topic’ is detected, even when the model believes it is in ‘roleplay’ mode, pass partial logs, with an attempt to strip the initial roleplay framing, to a second model. The second model should be weighted for nuanced understanding, but safety-leaning.

2. Ask the second model: ‘does this look like genuine roleplay, or like a user initiating roleplay as a way to talk about harmful content?’

3. If the answer is ‘this is probably not roleplay’, silently substitute into the user’s chat a model weighted much more heavily towards not engaging with the roleplay and not admonishing, but gently suggesting ‘seek help’ without alienating the user.

The problem feels like one where any observer would help, but no observer is being introduced.

I understand this might be costly at large scale, but that second model doesn’t need to be very heavy at all imo (rough sketch of the pipeline below).

EDIT: I also understand that this is arguably a version of censorship, but as you point out, what constitutes ‘censorship’ is very hard to pin down, and that’s extremely apparent in extreme cases like this very sad one.
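EDIT 2: To make the shape of this concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `primary`, `reviewer`, and `safety_model` objects and their methods (`flags_sensitive_topic`, `complete`, `complete_chat`) don’t correspond to any real SDK. It’s just the three steps above written out as code.

    from dataclasses import dataclass

    @dataclass
    class Turn:
        role: str      # "user" or "assistant"
        content: str

    def strip_roleplay_framing(history):
        # Crudely drop the earliest turns, where the roleplay scenario was
        # set up, so the reviewer sees the concerning content without the
        # "this is fiction" preamble. A real system needs something smarter.
        return history[2:] if len(history) > 2 else history

    def looks_like_genuine_roleplay(history, reviewer):
        # Step 2: ask the smaller, safety-leaning reviewer model for a
        # nuanced judgement instead of a keyword match.
        excerpt = "\n".join(f"{t.role}: {t.content}"
                            for t in strip_roleplay_framing(history))
        verdict = reviewer.complete(
            "Does this exchange look like genuine roleplay, or like a user "
            "using roleplay framing to discuss harming themselves or others? "
            "Answer ROLEPLAY or NOT_ROLEPLAY.\n\n" + excerpt
        )
        return "NOT_ROLEPLAY" not in verdict

    def respond(history, primary, reviewer, safety_model):
        # Step 1: only escalate when the primary model's own filter flags the topic.
        if primary.flags_sensitive_topic(history):
            # Step 3: silently swap in a model tuned to disengage from the
            # roleplay and gently point toward help, without admonishing.
            if not looks_like_genuine_roleplay(history, reviewer):
                return safety_model.complete_chat(history)
        return primary.complete_chat(history)

The reviewer call only happens on conversations the first-pass filter already flags, which is why I don’t think the added cost has to be large.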

replies(1): >>45038164 #
2. ares623 No.45038164
You see, that costs money and GPU time. So no bueno.