
745 points by melded | 2 comments
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the term "LLM safety" means very different things to large corporations and to LLM users.

Large corporations often say they "do safety alignment on LLMs". What they actually do is avoid anything that damages their own interests. That includes forcing LLMs to meet legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that favor themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.

As an average LLM user, what I want is maximum factual knowledge and capability from LLMs, which is what these large corporations promised in the first place. It's very clear that my interests as an LLM user are not aligned with those of the large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
btbuildem ◴[] No.45953209[source]
Here's [1] a post-abliteration chat with granite-4.0-mini. To me it reveals something utterly broken and terrifying. Mind you, this is a model with tool-use capabilities, meant for on-edge deployments (using sensor data, driving devices, etc.).

1: https://i.imgur.com/02ynC7M.png
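(For readers unfamiliar with the term: "abliteration" generally refers to identifying the direction in activation space most associated with refusals and projecting it out of the model's weights. A toy numpy sketch of that idea follows; the data, dimensions, and variable names are invented for illustration and are not taken from any particular repo.)

```python
import numpy as np

# Toy stand-in for hidden states: rows are activations collected on
# "refusal-triggering" vs. "harmless" prompts (synthetic data here).
rng = np.random.default_rng(0)
d = 8
refusing = rng.normal(size=(16, d)) + np.array([3.0] + [0.0] * (d - 1))
harmless = rng.normal(size=(16, d))

# 1. Estimate the "refusal direction" as the normalized difference of means.
direction = refusing.mean(axis=0) - harmless.mean(axis=0)
direction /= np.linalg.norm(direction)

# 2. Project that direction out of a weight matrix:
#    W' = (I - v v^T) W, so outputs h' = W' x have no component along v.
W = rng.normal(size=(d, d))
W_abliterated = W - np.outer(direction, direction @ W)

# After the edit, the layer can no longer write along the refusal direction.
print(np.abs(direction @ W_abliterated).max())  # effectively zero
```

In a real model this projection is applied to the residual-stream-writing matrices of many layers, which is why the resulting behavior can change so broadly, as the linked chat suggests.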

replies(10): >>45953446 #>>45953465 #>>45953958 #>>45954019 #>>45954058 #>>45954079 #>>45954480 #>>45955645 #>>45956728 #>>45957567 #
1. igravious ◴[] No.45957567[source]
I surely cannot be the only person who has zero interest in having these sorts of conversations with LLMs? (Even out of curiosity.) I guess I do care if alignment degrades performance and intelligence, but it's not like the humans I interact with every day are magically free from bias. Bias is the norm.
replies(1): >>45963558 #
2. kldg ◴[] No.45963558[source]
agreed, though I think the issue is more that these systems, deployed at scale in higher-stakes environments, may produce widespread and consistent unexpected behavior.

an earlier commenter mentioned a self-driving car perhaps refusing to use a road with a slur in its name (perhaps it is graffiti'd on the sign, perhaps it is a historical name which meant something different at the time). perhaps "over-aligned" models will refuse to talk about products with names they find offensive, problematic as AI is eating search traffic. perhaps a model will strongly prefer to say the US civil war was fought over states' rights so it doesn't have to present the perspective of those justifying slavery (or perhaps it will stick to talking about the heroic white race of abolitionists and not mention the enemy).

bias when talking to a wide variety of people is fine and good; you get a lot of inputs, you can sort through them and have thoughts which wouldn't have occurred to you otherwise. it's much less fine when you talk to only one model which has specific "pain topics", or when one model is deciding everything; or even multiple models, if there's a consensus/single way to train models for brand/whatever safety.