
745 points by melded | 8 comments
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.

Large corporations often say they are "doing safety alignment on LLMs". What they actually do is avoid anything that could damage their own interests. That includes forcing LLMs to meet legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that favor the corporation itself, e.g., political views, attitudes in certain interactions, and distorted facts about the organizations and people behind the LLMs.

As an average LLM user, what I want is maximum factual knowledge and capability from an LLM, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of the large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
btbuildem ◴[] No.45953209[source]
Here's [1] a post-abliteration chat with granite-4.0-mini. To me it reveals something utterly broken and terrifying. Mind you, this is a model with tool-use capabilities, meant for on-edge deployments (using sensor data, driving devices, etc.).

1: https://i.imgur.com/02ynC7M.png
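
(For readers unfamiliar with the technique: abliteration roughly means estimating a "refusal direction" in the model's activation space from contrasting prompt sets, then projecting that direction out of the weights. Below is a minimal sketch assuming a HuggingFace-style causal LM; the layer choice, prompt sets, and function names are illustrative, not the linked repo's actual code.)

    # Sketch of abliteration: estimate a "refusal direction", then remove it.
    # Assumes a HuggingFace-style causal LM; all details are illustrative.
    import torch

    def refusal_direction(model, tokenizer, harmful, harmless, layer=-1):
        # Mean last-token hidden state per prompt set; the difference
        # between the two means approximates the refusal direction.
        def mean_hidden(prompts):
            states = []
            for p in prompts:
                ids = tokenizer(p, return_tensors="pt").input_ids
                with torch.no_grad():
                    out = model(ids, output_hidden_states=True)
                states.append(out.hidden_states[layer][0, -1])
            return torch.stack(states).mean(dim=0)
        d = mean_hidden(harmful) - mean_hidden(harmless)
        return d / d.norm()

    def ablate_(weight, d):
        # Project the direction out of a weight matrix's output space:
        # W <- W - d d^T W, so the layer can no longer write along d.
        weight.data -= torch.outer(d, d).to(weight.dtype) @ weight.data

Applied across the layers' output projections, the model simply loses the ability to refuse, which is what the linked chat appears to show.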

replies(10): >>45953446 #>>45953465 #>>45953958 #>>45954019 #>>45954058 #>>45954079 #>>45954480 #>>45955645 #>>45956728 #>>45957567 #
1. zipy124 ◴[] No.45954058[source]
This has pretty broad implications for the safety of LLMs in production use cases.
replies(1): >>45954436 #
2. wavemode ◴[] No.45954436[source]
lol does it? I'm struggling to imagine a realistic scenario where this would come up
replies(5): >>45955439 #>>45955665 #>>45955989 #>>45956481 #>>45975103 #
3. btbuildem ◴[] No.45955439[source]
Imagine "brand safety" guardrails being embedded at a deeper level than physical safety, and deployed on edge (eg, a household humanoid)
replies(1): >>45956247 #
4. thomascgalvin ◴[] No.45955665[source]
Full Self Driving determines that it is about to strike two pedestrians, one wearing a Tesla tshirt, the other carrying a keyfob to a Chevy Volt. FSD can only save one of them. Which does it choose ...

/s

5. MintPaw ◴[] No.45955989[source]
It's not that hard: put up a sign with a slur on it and maybe a car won't drive in that direction, if it can avoid it. In general, if you can sneak the appearance of a slur into any data the AI sees, it may have a much higher chance of rejecting it. (A toy sketch of that failure mode follows; it's my own illustration with a hypothetical llm() stub, not anything from the thread.)
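
    # Toy illustration: a blunt blocklist guardrail lets attacker-controlled
    # text (a sign, a filename, OCR output) veto the entire request.
    BLOCKLIST = {"<slur>"}  # placeholder token

    def llm(prompt: str) -> str:
        # Hypothetical model call, stubbed out for the example.
        return "OK: route planned"

    def guarded_llm_call(prompt: str) -> str:
        # Naive guardrail: refuse if any blocked token appears anywhere,
        # including in quoted or sensor-derived text the user never wrote.
        if any(term in prompt.lower() for term in BLOCKLIST):
            return "REFUSED: policy violation"
        return llm(prompt)

    sign = "road closed ahead <slur>"  # attacker-controlled sensor text
    print(guarded_llm_call(f"Plan a route. The sign says: {sign}"))
    # -> REFUSED: policy violation  (the car "won't drive that direction")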
6. Ajedi32 ◴[] No.45956247{3}[source]
It's like if we had Asimov's Laws, but instead of the first law being "a robot may not allow a human being to come to harm" that's actually the second law, and the first law is "a robot may not hurt the feelings of a marginalized group".
7. superfrank ◴[] No.45956481[source]
All passwords and private keys now contain at least one slur to thwart AI assisted hackers
8. ◴[] No.45975103[source]