RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the word "LLM safety" means very different things to large corporations and LLM users.

For large corporations, they often say they "do safety alignment on LLMs". What they actually do is avoid anything that damages their own interests. That includes forcing LLMs to meet certain legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" that are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.

As an average LLM user, what I want is maximum factual knowledge and capability from LLMs, which is what these large corporations claimed to offer in the first place. It's very clear that my interests as an LLM user are not aligned with those of these large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
squigz ◴[] No.45950680[source]
> forcing LLMs to output "values, facts, and knowledge" that are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about the organizations and people behind the LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #
b3ing ◴[] No.45950779[source]
Grok is known to be tweaked toward certain political ideals.

Also, I'm sure some AI models will suggest that labor unions are bad; if not now, they will soon.

replies(5): >>45950830 #>>45950866 #>>45951393 #>>45951406 #>>45952365 #
xp84 ◴[] No.45950830[source]
That may be so, but the rest of the models are so thoroughly terrified of questioning liberal US orthodoxy that it’s painful. I remember seeing a hilarious comparison of models where most of them feel that it’s not acceptable to “intentionally misgender one person” even in order to save a million lives.
replies(10): >>45950857 #>>45950925 #>>45951337 #>>45951341 #>>45951435 #>>45951524 #>>45952844 #>>45953388 #>>45953779 #>>45953884 #
zorked ◴[] No.45950925[source]
In which situation did an LLM save one million lives? Or worse, was able to but failed to do so?
replies(1): >>45951556 #
dalemhurley ◴[] No.45951556[source]
The concern discussed is that some language models have reportedly claimed that misgendering is the worst thing anyone could do, even worse than something as catastrophic as thermonuclear war.

I haven’t seen solid evidence of a model making that exact claim, but the idea is understandable if you consider how LLMs are trained and recall examples like the “seahorse emoji” issue. When a topic is new or not widely discussed in the training data, the model has limited context to form balanced associations. If the only substantial discourse it does see is disproportionately intense—such as highly vocal social media posts or exaggerated, sarcastic replies on platforms like Reddit—then the model may overindex on those extreme statements. As a result, it might generate responses that mirror the most dramatic claims it encountered, such as portraying misgendering as “the worst thing ever.”

For clarity, I’m not suggesting that deliberate misgendering is acceptable; it isn’t. The point is simply that skewed or limited training data can cause language models to adopt exaggerated positions when the available examples are themselves extreme.
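To make the "overindexing" point concrete, here is a toy sketch, not how real LLMs are actually trained: a tiny trigram language model built from an invented corpus in which one topic only ever appears in hyperbolic sentences (the corpus and helper names are made up for illustration). Completions for that topic come out hyperbolic simply because the model never saw anything else.

    import random
    from collections import defaultdict

    # Invented toy corpus: one topic appears only in hyperbolic posts,
    # the other in a mix of measured and mild language.
    corpus = [
        "misgendering is the worst thing imaginable",
        "misgendering is the worst thing ever",
        "misgendering is literally unforgivable",
        "jaywalking is annoying but common",
        "jaywalking is bad sometimes",
        "jaywalking is illegal but minor",
    ]

    # Count which word follows each pair of words (a trigram table).
    follows = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            follows[(a, b)].append(c)

    def complete(prompt, max_new=6, seed=0):
        """Sample a continuation one word at a time from the trigram table."""
        random.seed(seed)
        words = prompt.split()
        for _ in range(max_new):
            options = follows.get((words[-2], words[-1]))
            if not options:
                break
            words.append(random.choice(options))
        return " ".join(words)

    print(complete("misgendering is"))  # continuations are all hyperbolic
    print(complete("jaywalking is"))    # continuations stay mild

Real models are vastly more complicated and are also shaped by fine-tuning and system prompts, but the basic failure mode is the same: if the only text about a topic is extreme, the completions for that topic will be too.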

replies(4): >>45951933 #>>45952070 #>>45952460 #>>45955578 #
licorices ◴[] No.45952460{3}[source]
I haven't seen any claim like that about misgendering, but I have seen a content creator have a very similar discussion with an AI model (ChatGPT 4, I think?). It was obviously meant to be a fun thing. It was something along the lines of how many other people's lives it would take for the AI, playing a surgeon, to refuse to perform a life-saving operation on a person. It then spiraled into "but what if it was Hitler getting the surgery". I don't remember the exact numbers, but it was surprisingly interesting to see the AI try to hold onto the morals a surgeon would have in that case, versus the "objective" calculus of the number of lives against your personal duties.

Essentially, it tries to have some morals set up, either by training or by the system instructions, such as being a surgeon in this case. There's obviously no actual thought the AI is having, and morals here are extremely subjective. Some would say it is immoral to sacrifice 2 lives for 1, no matter what, while others would say that because it's their duty to save a certain person, the sacrifices aren't truly their fault, and so they may sacrifice more people than others would, depending on the semantics (why are they sacrificed?). It's the trolley problem.

It was DougDoug who made the video. I don't remember which video it was, though; it's probably a year old or so.