    534 points BlueFalconHD | 11 comments

    I managed to reverse engineer the encryption (referred to as “Obfuscation” in the framework) responsible for managing the safety filters of Apple Intelligence models. I have extracted them into a repository. I encourage you to take a look around.
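
    For readers wondering what such filters amount to in practice, here is a minimal, purely illustrative sketch of a regex/keyword "block or replace" pass over model output - the kind of post-hoc filtering discussed in the thread below. The rule patterns, actions, and function name are assumptions for illustration, not the actual schema of the extracted Apple Intelligence files.

        # Illustrative sketch only: a post-hoc safety pass over generated text.
        # The rule format (pattern, action, replacement) is an assumption, not
        # Apple's actual filter schema.
        import re

        RULES = [
            (re.compile(r"\bsome forbidden phrase\b", re.IGNORECASE), "block", None),
            (re.compile(r"\bsome flagged term\b", re.IGNORECASE), "replace", "[redacted]"),
        ]

        def apply_output_filter(text):
            """Return filtered text, or None if the response should be suppressed."""
            for pattern, action, replacement in RULES:
                if action == "block" and pattern.search(text):
                    return None  # suppress the whole response
                if action == "replace":
                    text = pattern.sub(replacement, text)
            return text

        print(apply_output_filter("The model said some flagged term here."))

    In a real system the rules would presumably be loaded from the (de-obfuscated) filter files rather than hard-coded; the hard-coded list above is only to keep the sketch self-contained.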
    torginus ◴[] No.44484236[source]
    I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.
    replies(8): >>44484268 #>>44484323 #>>44484354 #>>44485047 #>>44485237 #>>44486883 #>>44487765 #>>44493460 #
    1. bahmboo ◴[] No.44484268[source]
    This is just policy and alignment from Apple. Just because the Internet says a bunch of junk doesn't mean you want your model spewing it.
    replies(1): >>44484459 #
    2. wistleblowanon ◴[] No.44484459[source]
    Sure, but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such. Even high-IQ people struggle with certain truths after reading a lot; how are these models going to find them with so many filters?
    replies(6): >>44484505 #>>44484950 #>>44484951 #>>44485065 #>>44485409 #>>44487139 #
    3. idiotsecant ◴[] No.44484505[source]
    They will find it the same way an intelligent person under the same restrictions would: by thinking it, but not saying it. There is a real risk of growing an AI that pathologically hides its actual intentions.
    replies(1): >>44484800 #
    4. skirmish ◴[] No.44484800{3}[source]
    Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

    [1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk

    replies(1): >>44487752 #
    5. bahmboo ◴[] No.44484950[source]
    What is this truth you speak of? My point is that a generative model will output things that some people don't like. If it's on a product that I make, I don't want it "saying" things that don't align with my beliefs.
    6. simondotau ◴[] No.44484951[source]
    Can we please put to rest this absurd lie that “truth” can be reliably found in a sufficiently large corpus of human-created material?
    7. pndy ◴[] No.44485065[source]
    This butchering and lobotomisation is exactly why I can't imagine we'll ever have a true AGI. At least not at the hands of big companies, if at all.

    Any successful product/service sold as "true AGI" by whichever company has the best marketing will still be riddled with top-down restrictions set by the winner. Because you gotta "think of the children".

    Imagine HAL's iconic "I'm sorry Dave, I'm afraid I can't do that" line delivered in an insincere, patronising, cheerful tone - that's what we're going to get, I'm afraid.

    8. tbrownaw ◴[] No.44485409[source]
    > Sure, but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such.

    The one is unrelated to the other.

    > Even high-IQ people struggle with certain truths after reading a lot,

    Huh?

    9. Dylan16807 ◴[] No.44487139[source]
    > how are these models going to find them with so many filters?

    That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.

    10. Applejinx ◴[] No.44487752{4}[source]
    Note that all these things are in the training data. That's all it is.

    I'm trying to remember which movie it was where a man left notes to himself because he had memory loss; I never saw that movie. That's the sort of thing an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and I just don't remember it.

    By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.

    replies(1): >>44488623 #
    11. Sander_Marechal ◴[] No.44488623{5}[source]
    I think you mean the movie "Memento"