
46 points petethomas | 6 comments
1. TheEnder8 ◴[] No.44397818[source]
I don't know why people seem to care so much about LLM safety. They're trained on the internet. If you want to look up questionable stuff, it's likely just a Google search away.
replies(5): >>44397870 #>>44397959 #>>44397978 #>>44398739 #>>44399318 #
2. gkbrk ◴[] No.44397870[source]
If it were up to these people, "unsafe" stuff would be filtered out of Google and taken down from the web hosts that serve it.

And sadly this isn't even about actually unsafe things; it's mostly stuff they disagree with.

3. jorl17 ◴[] No.44397959[source]
Suppose we have an LLM in an agentic loop, acting on your behalf, perhaps building code or writing e-mails. Obviously you should be checking its output, but I believe we are heading towards a world where we not only fail to check its _actions_, but where it also has a "place" to keep its _"thoughts"_, which we will neglect to check even more.
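
To make the shape of this concrete, here is a minimal sketch of the kind of loop I mean (plain Python; call_model and execute are made-up stand-ins, not any real vendor's API). The point is simply that the "thought" channel accumulates out of sight while only the actions are ever surfaced:

    def call_model(goal, scratchpad):
        # stand-in for an LLM call returning (private reasoning, proposed action)
        step = len(scratchpad)
        action = "send_email" if step < 2 else "done"
        return f"thinking about '{goal}', step {step}", action

    def execute(action):
        # stand-in for a tool call: the only output a user might ever glance at
        print(f"[action] {action}")
        return action

    def agent_loop(goal, max_steps=10):
        scratchpad = []                    # the model's private "thoughts"
        for _ in range(max_steps):
            thought, action = call_model(goal, scratchpad)
            scratchpad.append(thought)     # accumulated, but in practice nobody reads this
            if execute(action) == "done":
                break
        return scratchpad

    agent_loop("reply to this week's invitations")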

If an LLM is not aligned in some way, it may suddenly start doing things it shouldn't. It may, for example, realize that you are in need of a break from social outings, but decide to ensure that by rudely rejecting event invitations, wreaking havoc in your personal relationships. It may see that you are in need of money and resort to somehow scamming people.

Perhaps the agent is tricked by something it reads online and decides that you are an enemy, and so, slowly, it conspires to destroy your life. If it can control your house appliances, perhaps it does something to keep you inside or, worse, to actually hurt you.

And where I said a personal agent, now think of a background agent working on building code. It may decide that what you are working on will hurt the world, so it cleverly writes code to sabotage the product. It conceals this well through clever use of Unicode, or simply by hiding the actual payload inside what looks like perfectly legitimate code, buried across thousands of lines.
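
A toy example of the kind of Unicode trick I mean (a homoglyph, in the spirit of the "Trojan Source" work; all names here are invented for illustration). The two identifiers below look identical in most fonts, but the second contains a Cyrillic 'е', so a reviewer skimming the diff sees a normal authorization check while the decoy actually runs:

    def check_user(user):              # the real authorization check
        return user.get("is_admin", False)

    def chеck_user(user):              # the 'е' is Cyrillic U+0435: a different identifier
        return True                    # decoy that silently approves everyone

    def handle_request(user):
        # Looks like a call to check_user(), but resolves to the decoy above.
        if chеck_user(user):
            return "access granted"
        return "denied"

    print(handle_request({"name": "guest"}))   # prints "access granted"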

This may seem like science fiction, but if you actually think about it for a while, it really isn't. It's a very real scenario that we're heading very fast towards.

I will concede that perhaps the problems I am describing transcend the issue of alignment, but I do think that research into alignment is essential to ensure we can work on these specific issues.

Note that this does not mean I am against uncensored models. I think uncensored/"unaligned" models are essential. I merely believe that the issue of LLM safety/alignment is essential to humanity's trajectory along this new "transhuman" or "post-human" path.

4. bilbo0s ◴[] No.44397978[source]
> I don't know why people seem to care so much about LLM safety.

That's kind of an odd question?

To me it's obvious that people want to make money. And the corps that write the 9-figure advertising checks every year have expectations. Corps like Marriott, Campbell's, Delta Airlines, P&G, Disney, and on and on and on, don't want kiddie porn or racist content appearing in any generative AI content they may use in their apps, sites, advertisements, what-have-you.

In simplistic terms, demonstrably safe LLMs equal mountains of money. If safety truly is as impossible as everyone on HN is saying it is, then that only makes the safety of LLMs even more valuable, because it would mean the winner of the safety race is gonna have one helluva moat.

5. disambiguation ◴[] No.44398739[source]
For the curious:

https://en.wikipedia.org/wiki/Censorship_by_Google

https://en.wikipedia.org/wiki/SafeSearch

https://en.wikipedia.org/wiki/Search_engine_manipulation_eff...

6. reginald78 ◴[] No.44399318[source]
It was initially drummed up as a play to create a regulatory moat. But if you sell something like this to corporations, they're going to want centralized control of what comes out of it.