My point is, we can add all sorts of security measures but at the end of the day nothing is a replacement for user education and intention.
They managed to misalign an LLM into racism by fine-tuning it on a relatively small number of examples of malicious code.
Assuming teleological essentialism is real, where does the telos come from? How much of it comes from the creators? If there are other sources, what are they and what's the mechanism of transfer?
I don't know if it matters for this conversation, but my table saw is incredibly unsafe, yet I don't find myself becoming racist or antisemitic.
The base model was trained, in part, on mangled hands. Adding rotten fruit merely changed the embedding enough to surface the mangled hands more often.
(It may not even have changed the embeddings enough to surface the mangled hands; it may simply be a case of guardrails not being applied to fine-tuned models.)
So the analogy is more like a cabin door on a 737. Some yahoo could try to open it in flight, but that doesn't justify it spontaneously blowing out at altitude.
But the elephant in the room is: why are we perseverating over these silly dichotomies? If you've got a problem with an AI, why not just ask the AI? Can't it clean up after making a poopy?!
SawStop has been mired in patent squatting and/or industry pushback, depending on who you talk to, of course.
So there is some cause and influence from the model's biases, or its essence if you must, but the prompt plays an important role too. I believe it's important for companies to figure this out, but personally I'm not interested in this balance at all.
What I'm interested in is how I can use these models as an extension of myself. And I'm also interested in showing people around me how they could do the same.
In any case, this might be interesting for companies making tons of money, but for us, the general public, I think it's much more important to talk about education.
For the regular user, getting better output from a capable model is mostly a matter of changing the prompt. So it comes down to education.
Of course model bias plays a role. If you train a model on racist posts you'll get a racist model. But as long as you have a fairly capable model for the average use case, these edge cases aren't of interest to the user, who can just adjust their prompts.
So if you make the LLM spit out malware by crafting a prompt specifically to do that, it's not the fault of the model. It may be important for companies that profit from selling inference time to moderate output, but for us regular users it's completely tangential.
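To make the "just adjust your prompt" point concrete, here is a minimal sketch using the Hugging Face transformers pipeline. The model name and the two example prompts are placeholders I'm assuming for illustration, not anything specific to this thread:

```python
# Sketch: same model, two prompts with different amounts of user intent.
# Model name is an assumed placeholder; any small instruction-tuned model works.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # assumption, swap for whatever you have locally
)

vague_prompt = "Write a function that checks a password."
specific_prompt = (
    "Write a Python function that checks a password against a bcrypt hash. "
    "Do not log or store the plaintext password, and return only True or False."
)

for prompt in (vague_prompt, specific_prompt):
    out = generator(prompt, max_new_tokens=200, do_sample=False)
    print("PROMPT:", prompt)
    print(out[0]["generated_text"])
    print("-" * 60)
```

Same weights both times; the only thing that changed is how much intent the user put into the prompt, which is exactly the education gap being argued about here.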