←back to thread

745 points melded | 1 comments | | HN request time: 0.208s | source
Show context
RandyOrion ◴[] No.45950598[source]
This repo is valuable for local LLM users like me.

I just want to reiterate that the word "LLM safety" means very different things to large corporations and LLM users.

For large corporations, they often say "do safety alignment to LLMs". What they actually do is to avoid anything that causes damage to their own interests. These things include forcing LLMs to meet some legal requirements, as well as forcing LLMs to output "values, facts, and knowledge" which in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.

As an average LLM user, what I want is maximum factual knowledge and capabilities from LLMs, which are what these large corporations claimed in the first place. It's very clear that the interests of me, an LLM user, is not aligned with these of large corporations.

replies(3): >>45950680 #>>45950819 #>>45953209 #
squigz ◴[] No.45950680[source]
> forcing LLMs to output "values, facts, and knowledge" which in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.

Can you provide some examples?

replies(11): >>45950779 #>>45950826 #>>45951031 #>>45951052 #>>45951429 #>>45951519 #>>45951668 #>>45951855 #>>45952066 #>>45952692 #>>45953787 #
1. selfhoster11 ◴[] No.45952692[source]
o3 and GPT-5 will unthinkingly default to the "exposing a reasoning model's raw CoT means that the model is malfunctioning" stance, because it's in OpenAI's interest to de-normalise providing this information in API responses.

Not only do they quote specious arguments like "API users do not want to see this because it's confusing/upsetting", "it might output copyrighted content in the reasoning" or "it could result in disclosure of PII" (which are patently false in practice) as disinformation, they will outright poison downstream models' attitudes with these statements in synthetic datasets unless one does heavy filtering.