←back to thread

534 points BlueFalconHD | 1 comments | | HN request time: 0.225s | source

I managed to reverse engineer the encryption (refered to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
Show context
torginus ◴[] No.44484236[source]
I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.
replies(8): >>44484268 #>>44484323 #>>44484354 #>>44485047 #>>44485237 #>>44486883 #>>44487765 #>>44493460 #
bahmboo ◴[] No.44484268[source]
This is just policy and alignment from Apple. Just because the Internet says a bunch of junk doesn't mean you want your model spewing it.
replies(1): >>44484459 #
wistleblowanon ◴[] No.44484459[source]
sure but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such. Even high IQ people struggle with certain truth after reading a lot, how is these models going to find it with so much filters?
replies(6): >>44484505 #>>44484950 #>>44484951 #>>44485065 #>>44485409 #>>44487139 #
idiotsecant ◴[] No.44484505[source]
They will find it in the same way and intelligent person under the same restrictions would: by thinking it, but not saying it. There is a real risk of growing an AI that pathologically hides it's actual intentions.
replies(1): >>44484800 #
skirmish ◴[] No.44484800[source]
Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

[1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk

replies(1): >>44487752 #
Applejinx ◴[] No.44487752[source]
Note that all these things are in the training data. That's all that is.

I'm trying to remember which movie it was where a man left notes to himself because he had memory loss, as I never saw that movie. That's the sort of thing where an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and just I don't remember it.

By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.

replies(1): >>44488623 #
1. Sander_Marechal ◴[] No.44488623[source]
I think you mean the movie "Memento"