←back to thread

226 points martinald | 1 comments | | HN request time: 0.353s | source
Show context
ryao ◴[] No.44538755[source]
Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme? Cars, planes and elevators have safety tests. LLMs don’t. Nobody is going to die if a LLM gives an output that its creators do not like, yet when they say “safety tests”, they mean that they are checking to what extent the LLM will say things they do not like.
replies(12): >>44538785 #>>44538805 #>>44538808 #>>44538903 #>>44538929 #>>44539030 #>>44539924 #>>44540225 #>>44540905 #>>44542283 #>>44542952 #>>44543574 #
natrius ◴[] No.44538808[source]
An LLM can trivially instruct someone to take medications with adverse interactions, steer a mental health crisis toward suicide, or make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated. Words can't kill people, but words can definitely lead to deaths.

That's not even considering tool use!

replies(12): >>44538847 #>>44538877 #>>44538896 #>>44538914 #>>44539109 #>>44539685 #>>44539785 #>>44539805 #>>44540111 #>>44542360 #>>44542401 #>>44542586 #
ryao ◴[] No.44538847[source]
This is analogous to saying a computer can be used to do bad things if it is loaded with the right software. Coincidentally, people do load computers with the right software to do bad things, yet people are overwhelmingly opposed to measures that would stifle such things.

If you hook up a chat bot to a chat interface, or add tool use, it is probable that it will eventually output something that it should not and that output will cause a problem. Preventing that is an unsolved problem, just as preventing people from abusing computers is an unsolved problem.

replies(3): >>44538876 #>>44539033 #>>44540550 #
1. 0points ◴[] No.44540550[source]
> This is analogous to saying a computer can be used to do bad things if it is loaded with the right software.

It's really not. Parent's examples are all out-of-the-box behavior.