    534 points BlueFalconHD | 11 comments

    I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
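
    For readers wondering what such filters amount to in practice, here is a minimal, purely illustrative sketch of a regex/keyword "block or replace" pass over model output - the kind of post-hoc filtering discussed in the thread below. The rule patterns, actions, and function name are assumptions for illustration, not the actual schema of the extracted Apple Intelligence files.

        # Illustrative sketch only: a post-hoc safety pass over generated text.
        # The rule format (pattern, action, replacement) is an assumption, not
        # Apple's actual filter schema.
        import re

        RULES = [
            (re.compile(r"\bsome forbidden phrase\b", re.IGNORECASE), "block", None),
            (re.compile(r"\bsome flagged term\b", re.IGNORECASE), "replace", "[redacted]"),
        ]

        def apply_output_filter(text):
            """Return filtered text, or None if the response should be suppressed."""
            for pattern, action, replacement in RULES:
                if action == "block" and pattern.search(text):
                    return None  # suppress the whole response
                if action == "replace":
                    text = pattern.sub(replacement, text)
            return text

        print(apply_output_filter("The model said some flagged term here."))

    In a real system the rules would presumably be loaded from the (de-obfuscated) filter files rather than hard-coded; the hard-coded list above is only to keep the sketch self-contained.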
    torginus ◴[] No.44484236[source]
    I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.
    replies(8): >>44484268 #>>44484323 #>>44484354 #>>44485047 #>>44485237 #>>44486883 #>>44487765 #>>44493460 #
    1. bahmboo ◴[] No.44484268[source]
    This is just policy and alignment from Apple. Just because the Internet says a bunch of junk doesn't mean you want your model spewing it.
    replies(1): >>44484459 #
    2. wistleblowanon ◴[] No.44484459[source]
    Sure, but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such. Even high-IQ people struggle with certain truths after reading a lot; how are these models going to find them with so many filters?
    replies(6): >>44484505 #>>44484950 #>>44484951 #>>44485065 #>>44485409 #>>44487139 #
    3. idiotsecant ◴[] No.44484505[source]
    They will find it the same way an intelligent person under the same restrictions would: by thinking it, but not saying it. There is a real risk of growing an AI that pathologically hides its actual intentions.
    replies(1): >>44484800 #
    4. skirmish ◴[] No.44484800{3}[source]
    Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

    [1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk

    replies(1): >>44487752 #
    5. bahmboo ◴[] No.44484950[source]
    What is this truth you speak of? My point is that a generative model will output things that some people don't like. If it's on a product that I make, I don't want it "saying" things that don't align with my beliefs.
    6. simondotau ◴[] No.44484951[source]
    Can we please put to rest this absurd lie that “truth” can be reliably found in a sufficiently large corpus of human-created material?
    7. pndy ◴[] No.44485065[source]
    This butchering and lobotomisation is exactly why I can't imagine we'll ever have a true AGI. At least not at the hands of big companies, if at all.

    Any successful product/service sold as "true AGI" by whichever company has the best marketing will still be riddled with top-down restrictions set by the winner. Because you gotta "think of the children".

    Imagine HAL's iconic "I'm sorry Dave, I'm afraid I can't do that" line delivered in an insincere, patronising, cheerful tone - that's what we're going to get, I'm afraid.

    8. tbrownaw ◴[] No.44485409[source]
    > Sure, but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such.

    The one is unrelated to the other.

    > Even high-IQ people struggle with certain truths after reading a lot,

    Huh?

    9. Dylan16807 ◴[] No.44487139[source]
    > how are these models going to find them with so many filters?

    That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.

    10. Applejinx ◴[] No.44487752{4}[source]
    Note that all these things are in the training data. That's all it is.

    I'm trying to remember which movie it was where a man left notes to himself because he had memory loss; I never saw that movie. That's the sort of thing an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and I just don't remember it.

    By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.

    replies(1): >>44488623 #
    11. Sander_Marechal ◴[] No.44488623{5}[source]
    I think you mean the movie "Memento"