
54 points | amai | 1 comment | source
freeone3000 No.42161812
I find it very interesting that “aligning with human desires” somehow includes prevention of a human trying to bypass the safeguards to generate “objectionable” content (whatever that is). I think the “safeguards” are a bigger problem with aligning with my desires.
replies(4): >>42162124 #>>42162181 #>>42162295 #>>42162664 #
threeseed No.42162295
The safeguards stem from a desire to make tools like Claude accessible to a very wide audience, since use cases such as education are very important.

And so it seems that people such as yourself who do have an issue with safeguards should seek out LLMs catered to adult audiences, rather than trying to remove safeguards entirely.

replies(3): >>42162675 #>>42163652 #>>42165642 #
Zambyte No.42162675
How does making it harder for the user to extract information they are trying to extract make it safer for a wider audience?
replies(2): >>42162977 #>>42163153 #
dbspin No.42162977
Assuming that this question is good faith...

There are numerous things that might be true yet damaging for a child's development to be exposed to: from overly punitive criticism, to graphic depictions of violence, to advocacy and specific directions for self-harm. Countless examples are trivial to generate.

Similarly, the use of these tools is already having dramatic effects on spearphishing, misinformation, etc. Guardrails on all the non-open-source models have an enormous impact on slowing and limiting the damage this does at scale. Even with retrained Llama-based models, it's more difficult than you might imagine to create a truly Machiavellian or uncensored LLM, which is entirely due to the work that's been done during and after training to constrain those behaviours. This is an unalloyed good in constraining the weaponisation of LLMs.